Imagine you are processing 10,000 source files by faceting them into 500+ languages.
- You have 10,000 files of 2,000 IDs each, for a total of about 20M IDs.
- Each row contains one or more languages (the field is never null, so every ID is kept when you facet by language).
- Rare languages make up 0.01% of the IDs, while common ones like English account for around 1%.
- You partition your dataset into per-language subsets and hand each one to a reliable uploader CLI tool (see the partitioning sketch after this list).
- You get no information back about whether an upload succeeded.
- You can write any record you like to assist your own verification of the multipart upload (see the manifest sketch after this list).
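
Since a row can carry more than one language, the faceting step copies an ID into every subset its languages name. Below is a minimal sketch of that partitioning, assuming records arrive as dicts with `id` and `languages` fields; those field names are illustrative, not from the original pipeline:

```python
from collections import defaultdict

def partition_by_language(records):
    """Facet records into per-language ID lists.

    A record with N languages lands in N subsets, so the subset sizes
    sum to at least the total ID count.
    """
    subsets = defaultdict(list)
    for rec in records:
        # The languages field is never empty, so every ID is kept.
        for lang in rec["languages"]:
            subsets[lang].append(rec["id"])
    return subsets

# Example: two records, one bilingual, yields three subset entries.
demo = [
    {"id": "a1", "languages": ["en"]},
    {"id": "b2", "languages": ["en", "fr"]},
]
print(dict(partition_by_language(demo)))  # {'en': ['a1', 'b2'], 'fr': ['b2']}
```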
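
With no feedback from the uploader, the verification record from the last bullet can be a manifest line written before each upload: the language, the ID count, and a hash of exactly what was sent, so the remote copy can be checked later. A sketch, assuming a JSONL manifest and a hypothetical `uploader` CLI whose real interface may differ:

```python
import hashlib
import json
import subprocess

def upload_with_manifest(lang, ids, manifest_path="manifest.jsonl"):
    """Record what was sent, then fire off the fire-and-forget upload."""
    payload = "\n".join(sorted(ids)).encode("utf-8")
    record = {
        "language": lang,
        "id_count": len(ids),
        # Recompute this hash against the remote copy to verify the upload.
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    # Append one JSON line per subset so the manifest survives partial runs.
    with open(manifest_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    # Hypothetical CLI invocation; the tool reports nothing back, so the
    # manifest written above is the only record of what went out.
    subprocess.run(["uploader", "--language", lang], input=payload, check=False)
```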