Maps are associative containers mapping keys to values. Almost all the programming languages have these data structures. They are called hashmap in java, dictionary in python, associative array in PHP etc. All are an abstract data type — collection of key value pairs.
Maps are always designed keeping in mind the lookup time. We can easily create a list of key value pairs but then the lookup time will be O(N) which is very poor. But with the use of hash functions we can achieve almost constant lookup time. The idea here is to create subsets of the entire collection by categorizing them based on the (hash or part of the hash) of the keys.
For example, lets say we have a registration counter that received 10 thousand registrations, now if we have to search for John Smith, we have to search the entire set of registrations. But imagine if we had three counters that take registrations based on the first letter of the last name.Counter 1: A-I, Counter 2: J-R, Counter 3: S-Z. Now we will have to look only into the third subset and that will be faster.
In golang maps are structured as array of buckets, you can think of buckets as the counters in the above registration example.
So lets see what happens when we use that famous syntax to create a map.
m := make(map[string]string)
Internally a structure called map header is created and the variable m receives a pointer to this structure, the map header contains all the meta information about the map, like:
- The number of entries that are currently in the map
- The number of buckets in a map is always equal to power of two hence the log(buckets) stored to keep the value small
- Pointer to the bucket array that is stored in contiguous memory location,
- Hash seed which is random to create each map differently
Each bucket stores an array of hash codes, a list of key value pairs and an overflow pointer. The array of hash code stores hash code for each key in the bucket, this is used for faster comparison. Each bucket can store maximum of 8 such key value pairs. The overflow pointer can point to a new bucket if more that 8 values are received by a bucket.
What happens when we insert a new value in the map
m[“green”] = “#00ff00”
Hash function is called and a hash code is generated for the given key, based on a part of the hash code a bucket is determined to store the key value pair. Once the bucket is selected the entry needs to be stored in that bucket. The complete hash code of the incoming key is compared with all the hashes from the initial array of hash codes i.e h1, h2, h3…. if no hash code matches that means this is a new entry. Now if the bucket contains an empty slot then the new entry is stored at the end of the list of key value pairs, else a new bucket is created and the entry is stored in the new bucket and the overflow pointer of old bucket points to this new bucket.
Map iterator in golang is random. If you print the same map multiple times you can see that each output will be different.
What happens when map grows
Every time the number of elements in a bucket reaches a certain limit, i.e the load factor which is 6.5, the map will grow in size by doubling the number of buckets. This is done by creating a new bucket array consisting of twice the number of buckets than the old array, and then copying all the buckets from old to new array but this is done very efficiently over time and not at once. During this the map maintains a pointer to the old bucket which is stored at the end of the new bucket array.
What happens when we delete an entry from the map
Maps in golang only grow in size. Even if we delete all the entries from the map, the number of buckets will remain the same
This is just a high level overview of how maps are implemented in go. The actual implementation will consist of complex operations that are out of the scope of this post. Knowing how to use maps is enough to write code however knowing the internals always help.
Your key take aways from the post
- In golang maps are internally array of buckets
- The lookup time for map is O(1)
- You can modify a map while iterating on it
- Map iteration is random
- The load factor for maps is 6.5
- The number of entries in each bucket is 8
- The number of bucket always doubles
- Overflow pointer is used to point to a new bucket when a already full bucket receives a new value
Links: This blog is inspired by the talk by Keith Randall — Inside the Map Implementation