Created
May 13, 2024 12:09
-
-
Save joeldrapper/e5e345f8904f93d7a602677ccba4722c to your computer and use it in GitHub Desktop.
Murmur hash in pure Ruby
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
module Murmur | |
MASK32 = 0xffffffff | |
def self.hash(str, seed = 0) | |
# Initialize variables | |
# h1: The main hash value that will be iteratively updated | |
# k1: A temporary value used to process chunks of 4 bytes | |
# i: Counter to keep track of the number of bytes processed | |
h1 = seed | |
k1 = 0 | |
i = 0 | |
# Iterate over each byte of the input string | |
str.each_byte do |byte| | |
# Combine the current byte with the existing k1 value | |
# This is done to build a chunk of 4 bytes | |
k1 |= byte << (i * 8) | |
i += 1 | |
# Process the combined bytes when i reaches 4 | |
# This ensures that the hash is computed in chunks of 4 bytes | |
next unless i == 4 | |
# Perform bitwise operations on k1 | |
# These operations are designed to mix and scramble the bits of k1 | |
# The constants 0xcc9e2d51 and 0x1b873593 are chosen for optimal mixing | |
k1 = (k1 * 0xcc9e2d51) & MASK32 | |
k1 = ((k1 << 15) | (k1 >> (32 - 15))) & MASK32 | |
k1 = (k1 * 0x1b873593) & MASK32 | |
# Update the hash value h1 by XORing it with the scrambled k1 | |
# This combines the current chunk's hash with the overall hash | |
# The shift and addition operations further mix the bits of h1 | |
h1 ^= k1 | |
h1 = ((h1 << 13) | (h1 >> (32 - 13))) & MASK32 | |
h1 = ((h1 * 5) + 0xe6546b64) & MASK32 | |
# Reset i and k1 for the next iteration | |
i = 0 | |
k1 = 0 | |
end | |
# Process any remaining bytes if the string length is not a multiple of 4 | |
# This ensures that all bytes contribute to the final hash value | |
if i > 0 | |
k1 = (k1 * 0xcc9e2d51) & MASK32 | |
k1 = ((k1 << 15) | (k1 >> (32 - 15))) & MASK32 | |
k1 = (k1 * 0x1b873593) & MASK32 | |
h1 ^= k1 | |
end | |
# Finalize the hash value | |
# These operations mix the bits of h1 with the length of the input string | |
# The constants used are chosen for optimal mixing and avalanche effect | |
h1 ^= str.bytesize | |
h1 ^= h1 >> 16 | |
h1 = (h1 * 0x85ebca6b) & MASK32 | |
h1 ^= h1 >> 13 | |
h1 = (h1 * 0xc2b2ae35) & MASK32 | |
h1 ^ (h1 >> 16) | |
end | |
end |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment