Last active March 11, 2020 16:23
vitalibertas/c0aa4560907add2c77b0a9e6ff796378
Python API Download Zipped JSON file, Unzip and Format for Redshift, Upload to S3 as GZip.
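import gzip
import json
import zipfile
from io import BytesIO

import requests


# Wrapper name is illustrative; request_url, file_id and json_header come from the caller.
def download_responses_gzip(request_url, file_id, json_header):
    download_url = "{0}{1}/file".format(request_url, file_id)
    request_download = requests.get(download_url, headers=json_header)
    request_download.raise_for_status()  # fail fast on HTTP errors
    # Read the whole archive in memory; the first member holds the JSON export.
    with zipfile.ZipFile(BytesIO(request_download.content), mode='r') as z:
        json_responses = json.loads(z.read(z.infolist()[0]).decode('utf-8'))['responses']
    # One JSON object per line instead of a single top-level [...] array.
    json_lines = "".join(json.dumps(response) + "\n" for response in json_responses)
    gz_buffer = BytesIO()
    with gzip.GzipFile(mode='wb', fileobj=gz_buffer) as f:
        f.write(json_lines.encode('utf-8'))
    gz_buffer.seek(0)  # rewind so an S3 upload reads from the first byte
    return gz_buffer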
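The snippet stops once the gzip buffer is returned, but the title also covers uploading it to S3 for Redshift. Below is a minimal sketch of that last step. It assumes `s3_client` is a boto3 S3 client (`boto3.client("s3")`), whose `upload_fileobj(Fileobj, Bucket, Key)` method uploads a file-like object. The helper names, bucket, key, table and IAM role are hypothetical placeholders, and the COPY string is built with plain formatting, so only pass it trusted values.

```python
from io import BytesIO


def upload_gzip_to_s3(s3_client, gz_buffer: BytesIO, bucket: str, key: str) -> str:
    """Hypothetical helper: upload an in-memory gzip buffer, return its s3:// URI.

    Assumes s3_client is a boto3 S3 client (boto3.client("s3")).
    """
    gz_buffer.seek(0)  # start the upload at the first byte of the buffer
    s3_client.upload_fileobj(gz_buffer, bucket, key)
    return "s3://{0}/{1}".format(bucket, key)


def redshift_copy_sql(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Hypothetical helper: build a COPY for gzipped, newline-delimited JSON.

    FORMAT AS JSON 'auto' matches object keys to column names;
    GZIP tells COPY the S3 object is gzip-compressed.
    """
    return (
        "COPY {0} FROM '{1}' IAM_ROLE '{2}' FORMAT AS JSON 'auto' GZIP;"
        .format(table, s3_uri, iam_role_arn)
    )


# Example wiring (placeholders only, requires AWS credentials):
#   import boto3
#   uri = upload_gzip_to_s3(boto3.client("s3"), gz_buffer, "my-bucket", "exports/responses.json.gz")
#   sql = redshift_copy_sql("my_table", uri, "arn:aws:iam::123456789012:role/MyRedshiftRole")
```

The COPY statement is then run against Redshift with whatever SQL client you already use. Separating it from the upload keeps the S3 step testable without a database.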
Putting data into Redshift is most efficient when using a gzipped JSON file. However, Redshift's COPY command expects each record to be its own root-level JSON object, while Python's json library serializes a list as a single JSON array wrapped in brackets ([]). So instead of dumping the whole list, write out each list element separately. This is all done in memory, so compare the memory you have available with the size of the data you're pulling down!
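To make the bracket issue concrete, here is a small standalone comparison. The two records are hypothetical stand-ins for the API's 'responses' list. The code contrasts dumping the whole list with dumping each element on its own line, then gzips the result in memory and reads it back.

```python
import gzip
import io
import json

# Hypothetical sample records standing in for the API's 'responses' list.
responses = [{"id": 1, "answer": "yes"}, {"id": 2, "answer": "no"}]

# Dumping the whole list yields one JSON array: a single root-level value.
as_array = json.dumps(responses)

# Dumping each element yields one root-level object per line.
as_lines = "".join(json.dumps(r) + "\n" for r in responses)

# Compress in memory, then decompress to confirm the round trip.
buf = io.BytesIO()
with gzip.GzipFile(mode="wb", fileobj=buf) as f:
    f.write(as_lines.encode("utf-8"))
buf.seek(0)
with gzip.GzipFile(mode="rb", fileobj=buf) as f:
    restored = f.read().decode("utf-8")

print(as_array)
print(as_lines, end="")
```

The first print shows the list wrapped in brackets. The second shows two bare objects, one per line, which is the per-record layout the gist writes before gzipping.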