@orf
Last active November 18, 2020 19:54

Unfuck S3

Use this script if you've managed to make wide-scale modifications to lots of S3 objects and you need to roll them all back to their earliest versions.

Pray you never need this, but be thankful it's here if you do.

It finds all the versions of objects under a specific prefix and deletes all but the earliest one.
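
To make the selection rule concrete, here is a minimal sketch that applies it to plain dictionaries shaped like the entries of the `Versions` list returned by `list_object_versions`. The function name `versions_to_delete`, the `_ts` helper, and the sample data are hypothetical and not part of the script. Nothing here talks to AWS, so it can double as a dry run of the logic.

```python
from datetime import datetime, timezone


def versions_to_delete(versions):
    """Return delete entries for every version except the earliest one per key.

    `versions` is an iterable of dicts carrying "Key", "VersionId" and
    "LastModified", the same fields the script reads.
    """
    earliest = {}  # key -> earliest version seen so far
    doomed = []
    for version in versions:
        current = earliest.get(version["Key"])
        if current is None:
            # First version seen for this key.
            earliest[version["Key"]] = version
        elif version["LastModified"] < current["LastModified"]:
            # Found an older version: the previous candidate must go.
            doomed.append({"Key": current["Key"], "VersionId": current["VersionId"]})
            earliest[version["Key"]] = version
        else:
            # Not older than the candidate, so it can never be the earliest.
            doomed.append({"Key": version["Key"], "VersionId": version["VersionId"]})
    return doomed


def _ts(day):
    # Hypothetical helper that builds sample timestamps.
    return datetime(2020, 11, day, tzinfo=timezone.utc)


sample = [
    {"Key": "a.txt", "VersionId": "v3", "LastModified": _ts(3)},
    {"Key": "a.txt", "VersionId": "v2", "LastModified": _ts(2)},
    {"Key": "a.txt", "VersionId": "v1", "LastModified": _ts(1)},
    {"Key": "b.txt", "VersionId": "only", "LastModified": _ts(1)},
]
doomed = versions_to_delete(sample)
```

With this sample, `a.txt` keeps `v1` and loses `v3` and `v2`, while `b.txt` has a single version and is left alone.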

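# Use this script if you do something terrible and need to delete all but the
# earliest S3 version of every object under a given prefix in a bucket.
from itertools import zip_longest

import boto3
import tqdm

s3 = boto3.client("s3")

BUCKET = "your bucket"
PREFIX = "your prefix"

pages = s3.get_paginator("list_object_versions").paginate(Bucket=BUCKET, Prefix=PREFIX)

# {Key: {"earliest": datetime, "earliest_version": VersionId, "to_delete": [VersionId, ...]}}
results = {}

for page in tqdm.tqdm(pages):
    # A page may hold only delete markers, in which case "Versions" is absent.
    # Delete markers themselves are not touched by this script.
    for version in page.get("Versions", []):
        info = results.get(version["Key"])
        if info is None:
            # First version seen for this key: it is the earliest so far.
            results[version["Key"]] = {
                "earliest": version["LastModified"],
                "earliest_version": version["VersionId"],
                "to_delete": [],
            }
        elif version["LastModified"] < info["earliest"]:
            # Older than the current candidate: queue the candidate for
            # deletion and keep this version instead.
            info["to_delete"].append(info["earliest_version"])
            info["earliest"] = version["LastModified"]
            info["earliest_version"] = version["VersionId"]
        else:
            # Not older than the candidate, so it cannot be the earliest.
            info["to_delete"].append(version["VersionId"])

delete_keys = [
    {"Key": key, "VersionId": version_id}
    for key, info in results.items()
    for version_id in info["to_delete"]
]


def grouper(iterable, n, fillvalue=None):
    """Collect items into fixed-length chunks, padding the last with fillvalue."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# delete_objects accepts at most 1000 keys per request; 600 stays well below that.
for item_slice in tqdm.tqdm(list(grouper(delete_keys, 600))):
    s3.delete_objects(
        Bucket=BUCKET,
        Delete={
            # Drop the None padding that grouper adds to the final chunk.
            "Objects": [item for item in item_slice if item is not None]
        },
    )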
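
The script ignores the return value of `delete_objects`, which can carry an `Errors` list for entries that could not be deleted. Below is a rough sketch of two small helpers, under the assumption that you want to batch without padding and to see those failures. The names `chunked` and `failed_deletes` and the `fake_response` data are hypothetical and not part of the script. The response shape follows the `Deleted` and `Errors` fields that `delete_objects` returns.

```python
def chunked(items, size=1000):
    """Yield consecutive slices of `items` holding at most `size` entries.

    1000 is the maximum number of keys a single delete_objects call accepts.
    """
    for start in range(0, len(items), size):
        yield items[start:start + size]


def failed_deletes(response):
    """Return (key, version_id, code) for each entry in the response's "Errors" list."""
    return [
        (err.get("Key"), err.get("VersionId"), err.get("Code"))
        for err in response.get("Errors", [])
    ]


# Batch sizes for 2500 entries.
sizes = [len(batch) for batch in chunked(list(range(2500)))]

# A made-up response with one success and one failure, for illustration only.
fake_response = {
    "Deleted": [{"Key": "a.txt", "VersionId": "v3"}],
    "Errors": [
        {"Key": "b.txt", "VersionId": "v9", "Code": "AccessDenied", "Message": "Access Denied"}
    ],
}
failures = failed_deletes(fake_response)
```

In the delete loop, one option would be to pass each real response through `failed_deletes` and log or retry whatever comes back, rather than assuming every batch succeeded.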