@RobThree
Last active December 5, 2024 11:56
Download House M.D. Transcripts
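#!/usr/bin/env ruby
# Downloads House M.D. episode transcripts listed on the clinic-duty LiveJournal
# index page and writes each one to "<season>-<episode>.txt".
require 'nokogiri'
require 'open-uri'

INDEX_PAGE = "https://clinic-duty.livejournal.com/12225.html"

# Use URI.open for URLs: since Ruby 3.0, open-uri no longer patches Kernel#open.
index = URI.open(INDEX_PAGE) { |f| Nokogiri::HTML(f) }

index.css("table > tbody > tr > td > a").each do |a|
  # Skip season links (they contain a <b> element)
  next if a.css("b").any?

  # The "season.episode" label sits two sibling nodes before the link
  season, episode = a.previous.previous.inner_text.strip.split(".")
  title = a.inner_text.strip
  url = a["href"]
  puts "Download %s.%s %s" % [season, episode, title]

  # Download the transcript page
  page = URI.open(url) { |f| Nokogiri::HTML(f) }

  # Turn <br> tags into newlines, then let Nokogiri strip the remaining markup.
  # Use gsub, not gsub!: gsub! returns nil when nothing matches, so chaining
  # .strip onto it would raise NoMethodError on a page without <br> tags.
  html = page.css("div.entryText.s2-entrytext").inner_html.gsub(/<br\s*\/?>/, "\n")
  transcript = Nokogiri::HTML(html).text.strip

  # Write the transcript to a file, prefixed with a header line
  File.open("#{season}-#{episode}.txt", "w") do |f|
    f.write("S#{season}E#{episode}: #{title}\n")
    f.write(transcript)
  end
end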
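The string handling in the script can be tried offline with a few small helpers. This is a minimal sketch, not part of the original gist: the names `br_to_newlines`, `parse_episode_label`, `transcript_filename` and `transcript_header` are hypothetical, and the `1.01` label format is an assumption inferred from the `split(".")` call. It also shows why non-destructive `gsub` is the safer choice: `String#gsub!` returns `nil` when nothing matches, so a chained `.strip` would raise `NoMethodError`.

```ruby
# Hypothetical helpers (not part of the original gist) that isolate the
# script's string handling so it can be tried without network access.

# Replace <br>, <br/> and <br /> with newlines and trim surrounding whitespace.
# Non-destructive gsub always returns a String; gsub! returns nil on no match.
def br_to_newlines(html)
  html.gsub(/<br\s*\/?>/, "\n").strip
end

# Split a "season.episode" label (assumed format, e.g. "1.01") into two strings.
def parse_episode_label(label)
  label.strip.split(".", 2)
end

# Output file name used by the script: "<season>-<episode>.txt".
def transcript_filename(season, episode)
  "#{season}-#{episode}.txt"
end

# First line written to each file: "S<season>E<episode>: <title>".
def transcript_header(season, episode, title)
  "S#{season}E#{episode}: #{title}\n"
end

p br_to_newlines("House: Hello.<br>Wilson: Hi.<br />") # prints "House: Hello.\nWilson: Hi."
p "no breaks here".gsub!(/<br\s*\/?>/, "\n")           # prints nil
p parse_episode_label(" 1.01 ")                         # prints ["1", "01"]
```

Keeping these steps as pure functions makes it easy to check the parsing logic before pointing the script at the live pages.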
RobThree commented Dec 5, 2024

Note that this was based on https://gist.github.com/eungju/767906 and "updated" using ChatGPT. I had a total of about 10 minutes of experience with Ruby.
