Last active
June 17, 2018 12:31
A Ruby program to convert video subtitles from YouTube's XML format to the SubRip format.
Gist title: "Convert video subtitles from YouTube XML format to SubRip (.srt)"
# Convert XML YouTube subtitles to SubRip (srt) format.
#
# To download the subtitles in XML, put the ID of the YouTube video
# at the end of the URL:
#
#   http://video.google.com/timedtext?hl=en&lang=en&v=__youtube_video_ID__
#
# Usage:
#
#   $ ruby youtube2srt.rb input_filename [output_filename]
#
# Where input_filename can be either the name of your XML file
# (probably timedtext.xml) or the ID of your YouTube video.
# The output filename is optional.
require 'nokogiri'
require 'uri'
require 'net/http'

BASE_URL = 'http://video.google.com/timedtext?hl=en&lang=en&v='
TIME_FORMAT = '%02H:%02M:%02S,%3N'

source_filename = ARGV[0]
output_filename = ARGV[1]
abort 'Usage: ruby youtube2srt.rb input_filename [output_filename]' if source_filename.nil?
# Write each <text> element of the parsed XML as a numbered SRT cue.
def create_srt(output_filename, source_filename, source_file)
  # Default output name: the input name with a trailing .xml swapped for .srt.
  output_filename ||= source_filename.sub(/\.xml\z/i, '') + '.srt'
  File.open(output_filename, 'w') do |srt_file|
    source_file.css('text').to_enum.with_index(1) do |sub, i|
      start_time = Time.at(sub['start'].to_f).utc
      end_time = start_time + sub['dur'].to_f
      # The trailing blank line separates cues, as SRT requires.
      srt_file.write(<<~CAPTION)
        #{i}
        #{start_time.strftime(TIME_FORMAT)} --> #{end_time.strftime(TIME_FORMAT)}
        #{Nokogiri::HTML.parse(sub.text).text}

      CAPTION
    end
  end
end
if source_filename =~ /\.xml\z/i
  # Local file: parse it directly.
  source_file = Nokogiri::XML(File.read(source_filename), &:noblanks)
  create_srt(output_filename, source_filename, source_file)
  puts "xml file #{source_filename} converted to srt"
else
  # Otherwise treat the argument as a YouTube video ID and fetch the XML.
  response = Net::HTTP.get_response(URI.parse(BASE_URL + source_filename))
  if response.is_a?(Net::HTTPSuccess)
    source_file = Nokogiri::XML(response.body, &:noblanks)
    create_srt(output_filename, source_filename, source_file)
    puts 'Google timedtext.xml converted to srt'
  else
    puts "Couldn't find subtitles for #{source_filename} at #{BASE_URL + source_filename}"
  end
end
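The script builds its timestamps with Time.at and strftime. The sketch below is an alternative using plain integer arithmetic, not the author's method. It assumes start and duration arrive as seconds (strings or numbers, as in the XML attributes). It rounds to whole milliseconds, which avoids a float offset such as 0.3 possibly rendering as ,299, and it does not wrap the hour field at 24. The helper names srt_timestamp and srt_cue are hypothetical.

```ruby
# Format an offset in seconds as an SRT timestamp (HH:MM:SS,mmm).
# Rounds to the nearest millisecond using integer arithmetic.
def srt_timestamp(seconds)
  total_ms = (seconds.to_f * 1000).round
  hours, rest = total_ms.divmod(3_600_000)
  minutes, rest = rest.divmod(60_000)
  secs, ms = rest.divmod(1000)
  format('%02d:%02d:%02d,%03d', hours, minutes, secs, ms)
end

# Build one SRT cue: index, time range, text, then a blank separator line.
def srt_cue(index, start, dur, text)
  finish = start.to_f + dur.to_f
  "#{index}\n#{srt_timestamp(start)} --> #{srt_timestamp(finish)}\n#{text}\n\n"
end
```

For example, srt_cue(1, '0.3', '1.2', 'Hello') yields a cue running from 00:00:00,300 to 00:00:01,500.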
Well thank you :)
I just found out that you can simply use this URL:
https://www.youtube.com/api/timedtext?fmt=srt&v=YOUR_VIDEO_CODE&lang=YOUR_LANG_CODE
fmt=srt doesn't work anymore, unfortunately.
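For anyone who still wants to try the URL from the comment above, here is a minimal sketch of building it with Ruby's standard uri library. The helper name timedtext_url and its keyword arguments are hypothetical. Per the reply above, fmt=srt is reported not to work anymore, and whether the endpoint responds at all is not verified here.

```ruby
require 'uri'

# Hypothetical helper: build a timedtext URL for a video ID and language code.
# The fmt parameter is only added when given.
def timedtext_url(video_id, lang: 'en', fmt: nil)
  params = { 'v' => video_id, 'lang' => lang }
  params['fmt'] = fmt if fmt
  uri = URI('https://www.youtube.com/api/timedtext')
  uri.query = URI.encode_www_form(params)
  uri.to_s
end
```

Using URI.encode_www_form keeps the query string correctly escaped if an ID or language code ever contains reserved characters.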
This is not bad, although I'd rather see it done with XSL or Python.