cpython hash function

This is a follow up to this post. So went on opening up the Objects/dictobject.c file and start reading up the comments. It’s extremely well documented. I read through the first set of comments(about 50 lines) to learn that python’s dict has four type of slots. i.e: an entry in the python’s dict object can be one of 4 types:

 1. Active : active (key,value pair)
 2. Unused: key=value=NULL
 3. Dummy: key=dummy value=NULL -- was previously active, but was deleted.
 4. Pending not yet inserted or deleted. key !=dummy and key != null

As for further implications and usage of these different states i refer you to go here.

Moving on to the comments on hashing function. From the comment section:

Major subtleties ahead: Most hash schemes depend on having a “good” hash function, in the sense of simulating randomness. Python doesn’t: its most important hash functions (for strings and ints) are very regular in common cases: To the contrary, in a table of size 2**i, taking the low-order i bits as the initial table index is extremely fast, and there are no collisions at all for dicts indexed by a contiguous range of ints. The same is approximately true when keys are “consecutive” strings. So this gives better-than-random behavior in common cases, and that’s very desirable.

Also, the comment goes on to talk about how the simple implementation has problems with collisions and the collision resolution strategy is crucial. But that’s stuff for another post. Ok clearly, my question can’t be answered by looking at this file. i’ll have to look deeper So i look up where the actual hash function is found and realize it’s part of the builtin modules. Therefore found in Python/bltinmodule.c . Now the builtin hash function implementation here just takes a PyObject(the C-representation of a Python object) and calls its’ hash function. I was initiall confused by the PyObject var and was searching for a similar variable anywhere else in the file and implemented. Dang, my C is rusty and broken… Anyway, after asking some awkward newbie questions on the python-dev irc, some developers there clarified that the hash function implementation is type-specific. Again realization hits pile screw driver. I was not thinking about it and somehow had assumed a generic hash function. Curse you anand, you’re letting the dynamic typing of python let form bad habits in your thinking wake up. there’s no magic in the real world. Only people willingly believing there is for whatever reason (pain-avoidance/laziness/sanity etc…)

I will follow this up with an actual implementation of hash function for the long integer object,  in a couple of days.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.