Last active
April 8, 2023 23:00
-
-
Save ano/c4dd33e8e538afa170e2a0c3eb7c842c to your computer and use it in GitHub Desktop.
MySQL/ MariaDB Cosign Similarity User Defined Function
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
To write a user-defined function in MariaDB that calculates cosine similarity between a given vector and a vector stored in a BLOB field, you can use the following code: | |
``` sql | |
DELIMITER $$ | |
CREATE FUNCTION cosine_similarity(vector VARCHAR(255), blob_field BLOB) | |
RETURNS FLOAT | |
DETERMINISTIC | |
BEGIN | |
-- Deserialize the BLOB field into a vector | |
SET @serialized_vector = blob_field; | |
SET @vector_type = 'float'; -- Change this to 'double' if the vector contains 64-bit values | |
SET @vector_size = LENGTH(@serialized_vector) / (@vector_type_size := IF(@vector_type = 'float', 4, 8)); | |
SET @values = ''; | |
FOR i IN 0..(@vector_size - 1) DO | |
SET @value = SUBSTRING(@serialized_vector, i * @vector_type_size + 1, @vector_type_size); | |
IF @vector_type = 'float' THEN | |
SET @value = CONVERT(@value, FLOAT); | |
ELSE | |
SET @value = CONVERT(@value, DOUBLE); | |
END IF; | |
SET @values = CONCAT(@values, ',', @value); | |
END FOR; | |
SET @vector = CONCAT('[', SUBSTRING(@values, 2), ']'); | |
-- Calculate the cosine similarity | |
SET @dot_product = 0; | |
SET @norm_a = 0; | |
SET @norm_b = 0; | |
FOR i IN 1..JSON_LENGTH(vector) DO | |
SET @a_i = JSON_EXTRACT(vector, CONCAT('$[', i - 1, ']')); | |
SET @b_i = JSON_EXTRACT(@vector, CONCAT('$[', i - 1, ']')); | |
SET @dot_product = @dot_product + @a_i * @b_i; | |
SET @norm_a = @norm_a + @a_i * @a_i; | |
SET @norm_b = @norm_b + @b_i * @b_i; | |
END FOR; | |
SET @similarity = NULLIF(@dot_product / SQRT(@norm_a * @norm_b), 0); | |
RETURN @similarity; | |
END$$ | |
DELIMITER ; | |
``` | |
This function takes two parameters: a JSON-formatted vector and a BLOB field containing a serialized vector. It deserializes the BLOB field into a vector, and then calculates the cosine similarity between the two vectors using the dot product and unit vectors formula. The function returns the cosine similarity as a floating-point value. | |
To use this function, you can call it like any other SQL function: | |
``` sql | |
SELECT cosine_similarity('[1, 2, 3]', blob_field) FROM my_table; | |
``` |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment