@alexeygrigorev
Created September 17, 2016 09:09
Downloading segmented video from vimeo
import base64

import requests
from tqdm import tqdm

master_json_url = 'https://178skyfiregce-a.akamaihd.net/exp=1474107106~acl=%2F142089577%2F%2A~hmac=0d9becc441fc5385462d53bf59cf019c0184690862f49b414e9a2f1c5bafbe0d/142089577/video/426274424,426274425,426274423,426274422/master.json?base64_init=1'
# Cut the URL just before the directory that holds master.json;
# -26 is the length of the trailing '/master.json?base64_init=1'.
base_url = master_json_url[:master_json_url.rfind('/', 0, -26) + 1]

resp = requests.get(master_json_url)
content = resp.json()

# Pick the rendition with the greatest height.
video = max(content['video'], key=lambda d: d['height'])
video_base_url = base_url + video['base_url']
print('base url:', video_base_url)

filename = 'video_%d.mp4' % video['id']
print('saving to %s' % filename)

with open(filename, 'wb') as video_file:
    # The init segment is embedded in the JSON as base64 (base64_init=1).
    init_segment = base64.b64decode(video['init_segment'])
    video_file.write(init_segment)

    for segment in tqdm(video['segments']):
        segment_url = video_base_url + segment['url']
        resp = requests.get(segment_url, stream=True)
        if resp.status_code != 200:
            print('not 200!')
            print(resp)
            print(segment_url)
            break
        for chunk in resp:
            video_file.write(chunk)
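The script finds the base URL by slicing with a hard-coded offset of -26, which only works while the URL ends in exactly '/master.json?base64_init=1'. A possible alternative, sketched here with the standard urllib.parse module, derives the same prefix from the URL path instead. The helper name derive_base_url is hypothetical and not part of the gist.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def derive_base_url(master_json_url):
    """Return the URL of the directory one level above the one holding master.json.

    Hypothetical helper: it assumes the same URL layout as in the script above.
    """
    parts = urlsplit(master_json_url)
    # Directory that contains master.json, e.g. '/.../video/426274424,426274425'
    json_dir = posixpath.dirname(parts.path)
    # One level up, always ending in a single slash, e.g. '/.../video/'
    parent = posixpath.dirname(json_dir).rstrip('/') + '/'
    # Rebuild the URL without the query string or fragment
    return urlunsplit((parts.scheme, parts.netloc, parent, '', ''))
```

With this helper, the slicing line could be replaced by base_url = derive_base_url(master_json_url), and the result would not depend on the length of the query string.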
@kbabanov

@Javi3rV :

I also thought about using multithreading to download all of them like there is no tomorrow, but Vimeo would look at the network traffic and suspect something

If so, you can tweak the number of workers in this line:

with ThreadPoolExecutor(max_workers=15) as executor:

and, if you want, set it to 1 to disable multithreading altogether.
I have also thought about supporting multiple URLs; maybe I will come up with a solution a bit later.
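For readers who have not seen the threaded version, here is a minimal sketch of how a ThreadPoolExecutor with a configurable max_workers might download segments while still writing them in order. The names download_segments and fetch are hypothetical, and fetch is passed in as a parameter so the sketch makes no assumptions about how the real script performs its requests.

```python
from concurrent.futures import ThreadPoolExecutor


def download_segments(urls, fetch, out_file, max_workers=5):
    """Fetch every URL with a thread pool and write the results in input order.

    `fetch` is any callable that takes a URL and returns bytes (an assumption
    of this sketch); `out_file` is a file object opened in binary mode.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of `urls`,
        # even when the downloads themselves finish out of order.
        for data in executor.map(fetch, urls):
            out_file.write(data)
```

With requests, fetch could be a small function that calls requests.get(url), checks the status and returns resp.content. Setting max_workers=1 makes the downloads run one at a time, which matches the suggestion above for disabling multithreading.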

@davidecavestro

davidecavestro commented Jan 31, 2025

I find it useful to have a way to avoid asking for user input, so that the whole thing can be easily scripted.
It's often just a matter of supporting env vars such as

url = os.getenv("SRC_URL") or input('enter [master|playlist].json url: ')
name = os.getenv("OUT_FILE") or input('enter output name: ')
max_workers = min(int(os.getenv("MAX_WORKERS", 5)), 15)

Alternatively, or in addition, the launch arguments could be parsed.
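As a rough sketch of the launch-argument idea, argparse can take its defaults from the same environment variables, so either mechanism works. The function name parse_settings and the flag names --url, --output and --max-workers are assumptions made for illustration; only the variable names SRC_URL, OUT_FILE and MAX_WORKERS and the cap of 15 come from the snippet above. The lower bound of 1 is also an addition of this sketch.

```python
import argparse
import os


def parse_settings(argv=None, env=None):
    """Read settings from command-line flags, falling back to environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description='Download a segmented video')
    # Hypothetical flag names; each one defaults to the matching env var.
    parser.add_argument('--url', default=env.get('SRC_URL'))
    parser.add_argument('--output', default=env.get('OUT_FILE'))
    parser.add_argument('--max-workers', type=int,
                        default=int(env.get('MAX_WORKERS', 5)))
    args = parser.parse_args(argv)
    # Keep the worker count between 1 and 15.
    args.max_workers = max(1, min(args.max_workers, 15))
    return args
```

A script could then prompt with input() only when args.url or args.output is still None, which keeps the interactive behaviour for manual use.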

Anyway IMHO multithreading is a different matter: as too many simultaneous requests from the same IP are a PITA, I consider a simple loop safer.

I'll look into using the JSON contents for tests.
