Last active
September 8, 2020 16:11
-
-
Save nicknapoli82/c7cb7484011b08a9806d732ea1a142a6 to your computer and use it in GitHub Desktop.
Very simple analysis of a couple hashing algorithms
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is just comparing my hash function against the djb2 function and xxHash. | |
The purpose is not only to observe speed, but also to observe dispersion into a hash table and collisions that occur | |
considering a full 64 bits. Maybe this is useful? | |
I was more focused on my own hash function which is a simple play on the djb2 algorithm. | |
You may find interesting the times involved based on optimization, and when it concerns unfilled buckets. | |
For filling buckets I am only using the low bits of the full number. So | |
table bucket = 64bit hash & 0xffff; | |
when I want to only use the first 16 bits. | |
The dictionaries used are the cs50 large dictionary for the speller pset, and words.txt which I pulled | |
from here https://github.com/dwyl/english-words | |
Time was calculated by iterating through the given dictionary 1000 times, and timing only the hash function used. | |
Time is additive for each hash function call and is timed within the nano second limitation of clock_gettime. | |
This is for fun, and a little explorative. | |
Results for my hash: | |
clang -O0 -o dictionary dictionary.c | |
./dictionary large | |
Time spent in hash = 6.752705 | |
Using MY HASH and 65536 table buckets | |
There are 7409 zero buckets out of 143091 words | |
84964 collisions occured where 58127 buckets were used | |
The average bucket length was 3.326907 | |
Considering the total 64 bit hash 0 collisions occured | |
./dictionary words.txt | |
Time spent in hash = 20.061957 | |
Using MY HASH and 262144 table buckets | |
There are 44304 zero buckets out of 466551 words | |
248711 collisions occured where 217840 buckets were used | |
The average bucket length was 2.883202 | |
Considering the total 64 bit hash 4 collisions occured | |
clang -O3 -o dictionary dictionary.c - Optimized version for speed up comparison | |
./dictionary words.txt | |
Time spent in hash = 13.446997 | |
Using MY HASH and 262144 table buckets | |
There are 44304 zero buckets out of 466551 words | |
248711 collisions occured where 217840 buckets were used | |
The average bucket length was 2.883202 | |
Considering the total 64 bit hash 4 collisions occured | |
So -O3 optimization results in 32.97 % speed up | |
Results for djb2: | |
clang -O0 -o dictionary dictionary.c | |
./dictionary large | |
Time spent in hash = 5.446917 | |
Using djb2 and 65536 table buckets | |
There are 7331 zero buckets out of 143091 words | |
84886 collisions occured where 58205 buckets were used | |
The average bucket length was 1.577300 | |
Considering the total 64 bit hash 19 collisions occured | |
./dictionary words.txt | |
Time spent in hash = 16.352637 | |
Using djb2 and 262144 table buckets | |
There are 44262 zero buckets out of 466551 words | |
248669 collisions occured where 217882 buckets were used | |
The average bucket length was 1.592592 | |
Considering the total 64 bit hash 268 collisions occured | |
clang -O3 -o dictionary dictionary.c | |
./dictionary words.txt | |
Time spent in hash = 11.092085 | |
Using djb2 and 262144 table buckets | |
There are 44262 zero buckets out of 466551 words | |
248669 collisions occured where 217882 buckets were used | |
The average bucket length was 1.592592 | |
Considering the total 64 bit hash 268 collisions occured | |
Results for xxHash (XXH3 64 bits): | |
clang -O0 -o dictionary dictionary.c ./xxHash-dev/xxhash.c | |
./dictionary large | |
Time spent in hash = 8.331802 | |
Using xxHash and 65536 table buckets | |
There are 7373 zero buckets out of 143091 words | |
84928 collisions occured where 58163 buckets were used | |
The average bucket length was 2.694839 | |
Considering the total 64 bit hash 0 collisions occured | |
./dictionary words.txt | |
Time spent in hash = 26.469801 | |
Using xxHash and 262144 table buckets | |
There are 44042 zero buckets out of 466551 words | |
248449 collisions occured where 218102 buckets were used | |
The average bucket length was 3.851463 | |
Considering the total 64 bit hash 0 collisions occured | |
clang -O3 -o dictionary dictionary.c ./xxHash-dev/xxhash.c | |
./dictionary words.txt | |
Time spent in hash = 9.565276 | |
Using xxHash and 262144 table buckets | |
There are 44042 zero buckets out of 466551 words | |
248449 collisions occured where 218102 buckets were used | |
The average bucket length was 3.851463 | |
Considering the total 64 bit hash 0 collisions occured |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment