import base64

import requests
from tqdm import tqdm

master_json_url = 'https://178skyfiregce-a.akamaihd.net/exp=1474107106~acl=%2F142089577%2F%2A~hmac=0d9becc441fc5385462d53bf59cf019c0184690862f49b414e9a2f1c5bafbe0d/142089577/video/426274424,426274425,426274423,426274422/master.json?base64_init=1'
# The -26 end bound skips the final '/master.json?base64_init=1' (26 characters),
# so rfind lands on the slash before the id list and base_url ends with '.../video/'
base_url = master_json_url[:master_json_url.rfind('/', 0, -26) + 1]

resp = requests.get(master_json_url)
content = resp.json()

# Pick the stream with the greatest height (highest resolution)
heights = [(i, d['height']) for (i, d) in enumerate(content['video'])]
idx, _ = max(heights, key=lambda pair: pair[1])
video = content['video'][idx]
video_base_url = base_url + video['base_url']
print('base url:', video_base_url)

filename = 'video_%d.mp4' % video['id']
print('saving to %s' % filename)

with open(filename, 'wb') as video_file:
    # The init segment is embedded in the JSON as base64 and must be written first
    init_segment = base64.b64decode(video['init_segment'])
    video_file.write(init_segment)

    for segment in tqdm(video['segments']):
        segment_url = video_base_url + segment['url']
        resp = requests.get(segment_url, stream=True)
        if resp.status_code != 200:
            print('not 200!')
            print(resp)
            print(segment_url)
            break
        # Append the segment body to the output file chunk by chunk
        for chunk in resp:
            video_file.write(chunk)
@Javi3rV:
"I also thought about using multithreading and download all of them like there is no tomorrow but vimeo would take a look at net traffic and would suspect something"
If so, you can tweak the number of workers in the line
with ThreadPoolExecutor(max_workers=15) as executor:
and, if you want, set it to 1 to disable multithreading altogether.
I also thought about the ability to download multiple URLs, and I may come up with a solution a bit later.
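As a rough illustration of the max_workers knob mentioned above, here is a minimal sketch of fetching segments through a thread pool while keeping them in order. The helper name download_segments and the stand-in fake_fetch are hypothetical, and this is not necessarily how the script under discussion is written; in a real run fetch would wrap something like requests.get(url).content.

```python
from concurrent.futures import ThreadPoolExecutor


def download_segments(urls, fetch, max_workers=15):
    """Fetch every URL with up to max_workers threads, keeping input order."""
    # With max_workers=1 this degrades to a plain sequential download
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of `urls`,
        # regardless of which request finishes first
        return list(executor.map(fetch, urls))


def fake_fetch(url):
    # Hypothetical stand-in for a real HTTP request, so the sketch runs offline
    return url.encode()


chunks = download_segments(['seg-1.m4s', 'seg-2.m4s', 'seg-3.m4s'],
                           fake_fetch, max_workers=2)
```

Because the results come back in input order, the chunks can be written to the output file one after another even though the requests ran concurrently.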
I find it useful to have a way to avoid asking for user input, so that the whole thing can be easily scripted.
It's often just a matter of supporting env vars such as:
url = os.getenv("SRC_URL") or input('enter [master|playlist].json url: ')
name = os.getenv("OUT_FILE") or input('enter output name: ')
max_workers = min(int(os.getenv("MAX_WORKERS", 5)), 15)
and/or, if you prefer, the launch arguments could be parsed.
Anyway IMHO multithreading is a different matter: as too many simultaneous requests from the same IP are a PITA, I consider a simple loop safer.
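For the launch-argument variant, a minimal sketch could combine argparse with the same env vars as defaults. The env var names come from the snippet above; the function name parse_settings, the flag names --url, --name and --max-workers, and the lower bound of 1 are assumptions made for illustration.

```python
import argparse
import os


def parse_settings(argv=None, env=None):
    """Read settings from command-line flags, falling back to env vars."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description='segment downloader settings')
    parser.add_argument('--url', default=env.get('SRC_URL'))
    parser.add_argument('--name', default=env.get('OUT_FILE'))
    parser.add_argument('--max-workers', type=int,
                        default=int(env.get('MAX_WORKERS', 5)))
    args = parser.parse_args(argv)
    # Same upper cap of 15 as in the snippet above; the floor of 1 is an addition
    args.max_workers = max(1, min(args.max_workers, 15))
    return args


settings = parse_settings(['--url', 'https://example.invalid/master.json'],
                          env={'OUT_FILE': 'out.mp4', 'MAX_WORKERS': '40'})
```

A script could still fall back to input() when settings.url or settings.name is None, so interactive use keeps working.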
I'll look into using the JSON contents for tests.
I haven't tried this, but maybe we could download the JSON file and use it for testing?
We know the JSON URL changes, but I'm not sure whether the JSON contents do.
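One way the saved-JSON idea might look, as a sketch only: keep a copy of master.json as a fixture and test the selection and URL-building logic against it without touching the network. The field names (video, height, base_url, id, segments, url) are the ones the gist script reads; the fixture values and the helper names pick_best_video and segment_urls are made up for illustration.

```python
import json

# Made-up fixture that mimics only the fields the gist script reads
SAMPLE_MASTER_JSON = '''
{
  "video": [
    {"id": 1, "height": 360, "base_url": "1/chop/",
     "segments": [{"url": "segment-1.m4s"}]},
    {"id": 2, "height": 720, "base_url": "2/chop/",
     "segments": [{"url": "segment-1.m4s"}, {"url": "segment-2.m4s"}]}
  ]
}
'''


def pick_best_video(content):
    """Return the stream with the greatest height."""
    return max(content['video'], key=lambda v: v['height'])


def segment_urls(base_url, video):
    """Build the absolute URL of every segment of one stream."""
    video_base_url = base_url + video['base_url']
    return [video_base_url + s['url'] for s in video['segments']]


content = json.loads(SAMPLE_MASTER_JSON)
best = pick_best_video(content)
urls = segment_urls('https://example.invalid/video/', best)
```

If the real JSON layout changes, a fixture like this would need refreshing, which is the uncertainty raised above.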
Edit: as an idea, in my personal script I added the option to put more than one URL in a list of dataclasses (url, outputName). It then just iterates over the list and downloads them one by one.
I didn't share it because it was just a personal preference, but it can easily be done in @kbabanov's script as well. I also thought about using multithreading and downloading all of them like there is no tomorrow, but Vimeo would take a look at the net traffic and would suspect something lol.
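The list-of-dataclasses idea could be sketched roughly as follows. This is not the personal script mentioned above, which was not shared; the names Job, run_jobs and fake_download are hypothetical, and download stands for whatever single-video routine is already in use.

```python
from dataclasses import dataclass


@dataclass
class Job:
    url: str          # master.json / playlist.json URL
    output_name: str  # file name to save the video under


def run_jobs(jobs, download):
    """Download the jobs strictly one after another and return the saved names."""
    finished = []
    for job in jobs:
        download(job.url, job.output_name)
        finished.append(job.output_name)
    return finished


calls = []


def fake_download(url, output_name):
    # Hypothetical stand-in that only records what would be downloaded
    calls.append((url, output_name))


jobs = [Job('https://example.invalid/a/master.json', 'a.mp4'),
        Job('https://example.invalid/b/master.json', 'b.mp4')]
finished = run_jobs(jobs, fake_download)
```

Processing the list sequentially keeps the request rate the same as for a single video, which fits the concern about drawing attention with too much traffic.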