Thread by @ImogenBits, Here's the thread about how hash tables work and my bad twitter [...]

Here's the thread about how hash tables work and my bad twitter joke. It's missing a lot of very cool stuff, but it's got most of the relevant info in it.

Also as images in this tweet if you prefer that.
https://twitter.com/ImogenBits/status/1346855421550718976

https://twitter.com/ImogenBits/status/1346855421550718976

A hash table is a type of associative array, a data structure that lets us store things in pairs of keys and values. E.g. A phone book uses names as keys and phone numbers as values. We take it out and ask it "what's Imogen's number?" and it responds "776-2323".

The reason for a stack is that it gives us fast, O(1), access to the last element we inserted. What exactly we mean with "O(1)" is very technical, but it essentially means that it always takes the same amount of time to push/pop an element regardless of the size of the stack.

Hash tables give us O(1) access to the value behind any key. Take a second to think about how you'd implement something like this.
All I can come up with is some version of looking through a list until we find the key we want. But that will take longer the bigger our table is.

The trick around this is to use a hash function. That's really just a cool word for any function that takes some big value of arbitrary size and returns a value in a fixed range. E.g. the digital root maps any number into the range 0-9. 1337 goes to 5 and 7735596095 goes to 2

Now we can make a regular array, but instead of putting the first value into index 0, we hash the key and that becomes the index we put it at. In our example, we would make an array with 10 slots and if we want to insert the value "Alan" at the key 1337, we'd insert it at index 5

Now if we want to get the value for the key 1337 we don't have to go through the whole array and look for it. We can just hash it and get the correct index right away.

But what happens if we want to insert "Ada" at the key 23? It's a different key, so it shouldn't overwrite "Alan", but it also hashes to 5.
To deal with this we now store all the key/value pairs that get hashed to the same index together in lists.

There are two main ways of doing this.
The first is to have these lists inside the array itself. We define some way of iterating through the array starting from each index.
An example of that would be to just pick the next slot. Then "Ada" would be stored at index 6.

The other way is to turn our array into an array of linked lists. Each node of the linked lists contains a key/value pair. Then the only filled slot is at index 5 and it points to "1337: Alan", which then points to "23: Ada".

Both of these strategies have their advantages. Python uses the first for its dicts and C++ uses the second for its unordered_maps.
There also is much, much more to everything here. But I think these are the parts that are important and fun to know about.

Now to the important part of this:
My bad joke.
It uses C++ so the hash map here uses an array of linked lists. Because I modified the hash function it uses, every key will get mapped to 0. So the hash map now is just a very, very overblown linked list. https://twitter.com/ImogenBits/status/1344056689687982086

https://twitter.com/ImogenBits/status/1344056689687982086

If you now insert something at any key C++ is forced to just append it to the start of the list. The tricky part is getting the elements back out from there without using the key. The entire point of OOP is to hide implementation details like this from us.

So what we need to do is a bunch of pointer dereferencing and casting to get from the object, to the hash map, to the array, to the first element of the linked list. Then we can get our value out and swap some pointers around to remove it from the list.

Thanks for reading all this. I don't have a huge amount of experience explaining stuff, but I really enjoy it. So if you got some nice feedback, let me know!

You can follow @ImogenBits.

Tip: mention @twtextapp on a Twitter thread with the keyword “unroll” to get a link to it.

Latest Threads Unrolled: