drorata/146ce50807d16fd4a6aa (last active June 3, 2024)
Minimal working example of Elasticsearch scrolling using the Python client
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Initialize the scroll
# Note: search_type='scan' only exists on old Elasticsearch versions;
# it was deprecated in 2.x and removed in later releases.
page = es.search(
    index='yourIndex',
    doc_type='yourType',
    scroll='2m',
    search_type='scan',
    size=1000,
    body={
        # Your query's body
    })
sid = page['_scroll_id']
scroll_size = page['hits']['total']

# Start scrolling
while scroll_size > 0:
    print("Scrolling...")
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results that we returned in the last scroll
    scroll_size = len(page['hits']['hits'])
    print("scroll size: " + str(scroll_size))
    # Do something with the obtained page
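On newer Elasticsearch clients (5.x and later), search_type='scan' no longer exists and the helpers.scan generator handles the scroll bookkeeping for you. A minimal sketch of the equivalent loop, assuming a placeholder index name 'yourIndex' and a match_all query:

```python
# helpers.scan issues scroll requests of the given size under the hood and
# clears the scroll context when the results are exhausted.
try:
    from elasticsearch import Elasticsearch, helpers
except ImportError:  # client not installed; the pure helper below still works
    Elasticsearch = helpers = None


def sources(hits):
    # Pull the _source documents out of a sequence of raw hits.
    return [hit['_source'] for hit in hits]


def scan_index(es, index, query):
    # Iterate over every matching document, regardless of the 10k limit.
    return sources(helpers.scan(es, index=index, query=query,
                                size=1000, scroll='2m'))


if __name__ == '__main__':
    es = Elasticsearch()
    docs = scan_index(es, 'yourIndex', {'query': {'match_all': {}}})
```

Because helpers.scan yields hits lazily, you can also process each document inside a for loop instead of materializing the whole list.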
Hi @derekangziying
I have tested it in code very similar to the above. In my case I fetch all of yesterday's events and write the result to a CSV file using the DictWriter class. I wanted to filter the fields that start with ca. to use as the CSV header, for example ca.version, ca.date_time and others that are in my index. The 10k limit is handled by helpers.scan, which issues scroll requests of the given size (8000 in this case) until there is no more data to return; the scroll is cleared by default at the end of the scan process (which is why I don't mind using a '40min' TTL).
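The workflow described in that comment can be sketched roughly as follows. The index name 'yourIndex', the output filename, and the ca. field names are placeholders, and the query is a stand-in for the actual yesterday-events filter:

```python
import csv

try:
    from elasticsearch import Elasticsearch, helpers
except ImportError:  # client not installed; the pure helper below still works
    Elasticsearch = helpers = None


def ca_fields(doc, prefix='ca.'):
    # Restrict a document to the keys starting with the given prefix.
    return {k: v for k, v in doc.items() if k.startswith(prefix)}


def dump_csv(hits, path, prefix='ca.'):
    # Keep only the prefixed fields of each hit and write them out; the
    # header is the union of all prefixed keys seen across documents.
    rows = [ca_fields(hit['_source'], prefix) for hit in hits]
    header = sorted({k for row in rows for k in row})
    with open(path, 'w', newline='') as fh:
        writer = csv.DictWriter(fh, fieldnames=header)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == '__main__':
    es = Elasticsearch()
    hits = helpers.scan(es, index='yourIndex', size=8000, scroll='40m',
                        query={'query': {'match_all': {}}})
    dump_csv(hits, 'yesterday_events.csv')
```

Rows missing some of the header's keys are filled with DictWriter's default empty string, so documents with differing ca. fields still line up under one header.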