@arian
Created May 16, 2025 11:40
Elasticsearch significant_terms script heuristic: relatedness score in the style of a semantic knowledge graph
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "terms"
      },
      "aggs": {
        "terms": {
          "significant_terms": {
            "field": "terms",
            "min_doc_count": 2,
            "script_heuristic": {
              "script": """
                double fgCount = 1.0*params._subset_freq;
                double fgTotal = 1.0*params._subset_size;
                double bgCount = 1.0*params._superset_freq;
                double bgTotal = 1.0*params._superset_size;
                if (fgTotal == 0 || bgTotal == 0) return 0;
                // Compute background probability
                double bgProb = bgCount / bgTotal;
                // Compute expected count in foreground
                double expected = fgTotal * bgProb;
                // Z-score
                double num = fgCount - expected;
                double denom = Math.sqrt(expected * (1.0 - bgProb));
                denom = (denom == 0) ? 1e-10 : denom;
                double z = num / denom;
                // Inlined sigmoid functions
                double s1 = (z + (-80)) / (50 + Math.abs(z + (-80)));
                double s2 = (z + (-30)) / (30 + Math.abs(z + (-30)));
                double s3 = (z + 0) / (30 + Math.abs(z + 0));
                double s4 = (z + 30) / (30 + Math.abs(z + 30));
                double s5 = (z + 80) / (50 + Math.abs(z + 80));
                double result = 0.2 * s1 + 0.2 * s2 + 0.2 * s3 + 0.2 * s4 + 0.2 * s5;
                return Math.round(result * 1e5) / 1e5;
              """
            }
          }
        }
      }
    }
  }
}
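
The score computed by the script can be sanity-checked outside Elasticsearch. Below is a minimal Python sketch that mirrors the same arithmetic, assuming the four counts (`_subset_freq`, `_subset_size`, `_superset_freq`, `_superset_size`) are supplied by hand. The function names `sigmoid` and `relatedness` are illustrative only and are not part of any Elasticsearch API.

```python
import math


def sigmoid(x, offset, scale):
    # Same shape as each inlined term in the script:
    # (z + offset) / (scale + |z + offset|), bounded to (-1, 1).
    shifted = x + offset
    return shifted / (scale + abs(shifted))


def relatedness(fg_count, fg_total, bg_count, bg_total):
    """Mirror of the script's arithmetic (hypothetical helper for offline checks)."""
    if fg_total == 0 or bg_total == 0:
        return 0.0
    bg_prob = bg_count / bg_total      # background probability
    expected = fg_total * bg_prob      # expected count in the foreground
    num = fg_count - expected
    denom = math.sqrt(expected * (1.0 - bg_prob))
    if denom == 0:
        denom = 1e-10                  # same guard against division by zero
    z = num / denom
    result = (0.2 * sigmoid(z, -80, 50)
              + 0.2 * sigmoid(z, -30, 30)
              + 0.2 * sigmoid(z, 0, 30)
              + 0.2 * sigmoid(z, 30, 30)
              + 0.2 * sigmoid(z, 80, 50))
    # Round to 5 decimals, half up, approximating Math.round(result * 1e5) / 1e5.
    return math.floor(result * 1e5 + 0.5) / 1e5


if __name__ == "__main__":
    # Term over-represented in the foreground: positive score.
    print(relatedness(50, 100, 100, 1000))
    # Term absent from the foreground but common in the background: negative score.
    print(relatedness(0, 100, 500, 1000))
```

Because every sigmoid term lies strictly between -1 and 1 and the weights sum to 1, the score stays within that range: it is 0 when a term occurs in the foreground exactly as often as the background rate predicts, positive when it is over-represented, and negative when it is under-represented.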