Tessa Rhinehart rhine3

2020-11-18

Docs

https://www.psc.edu/resources/bridges/

Nodes:

We will typically use two types:

RSM with 128 GB RAM - analogous to SMP

This script is no longer supported.

Over the years since I posted this script, it has become more and more common to scrape audio files off of Xeno-Canto.org. This has resulted in an overwhelming amount of traffic to their servers.

Please do not scrape Xeno-Canto without contacting the organizers first to ask for permission and for more information. They will be able to advise you on the best time of day to download data from their servers, or any alternative download options that are available.

Script used to create a scrolling spectrogram (e.g. https://twitter.com/TessaRhinehart/status/1045816355612774400)

ffmpeg -i turkey_sound.wav -filter_complex \
"[0:a]showspectrum=s=600x200:slide=scroll,format=yuv420p[v]" \
-map "[v]" -map 0:a turkey_spec.mp4

Introduction to Bioinformatics - EcoEvo17

Goal: trim and map paired-end read files.

Trimming

Each sequenced nucleotide in a .fastq file has an associated quality score. To be sure that we are working with high-quality data, trim off low-quality bases and reads.

To trim, use a program called trim_galore. This program can take several different parameters, specified as "flags":

--paired - indicates that we are trimming paired-end reads
--length 70 - indicates we want to keep only reads that are at least 70 bp long after trimming
--quality 30 - indicates we want to trim low-quality ends from reads, using a Phred quality score cutoff of 30
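
As a rough sketch of how the three flags listed above fit together, the snippet below assembles a trim_galore call in Python and prints it. The function name `build_trim_galore_command` and the input file names are hypothetical, chosen only for illustration. The sketch assumes the two paired-end files are passed as positional arguments after the flags.

```python
import shlex


def build_trim_galore_command(read1, read2, min_length=70, min_quality=30):
    """Assemble a trim_galore call for one pair of FASTQ files."""
    return [
        "trim_galore",
        "--paired",                    # the inputs are paired-end reads
        "--length", str(min_length),   # read length cutoff, in bp
        "--quality", str(min_quality), # quality score cutoff
        read1,
        read2,
    ]


# Hypothetical input file names, used only for illustration.
command = build_trim_galore_command("sample_R1.fastq", "sample_R2.fastq")

# Print the command as it would be typed in a shell.
print(shlex.join(command))
```

To actually run the trimming, the same list could be passed to `subprocess.run(command, check=True)` on a machine where trim_galore is installed.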

	top_dir <- "/Users/tessa/Desktop/moth-tests/M10_0001_test0"
	for (file in list.files(top_dir)) {
	    old_filepath <- paste(top_dir, file, sep="/")
	    print(paste("Old filepath:", old_filepath))
	    hex_code <- tools::file_path_sans_ext(basename(old_filepath))
	    seconds <- strtoi(hex_code, base=16)

	    true_time <- as.POSIXct(seconds, origin="1970-01-01")
	    formatted <- strftime(true_time, "%Y%m%d_%H%M%S")
	    new_filepath <- paste(top_dir, "/", formatted, ".WAV", sep="")

	import pandas as pd

	# Function for processing pandas dataframe
	def process(df, user_col, date_col, save_to):
	    '''
	    Save table of earliest date per user

	    Save dataframe containing earliest value
	    in date_col for each value in user_col. Creates dataframe
	    containing one row per unique value in user_col,

	from datetime import datetime, timedelta

	filename = '5BAC2C6F'

	seconds_after_epoch = int(filename, 16)
	utc_datetime = datetime.utcfromtimestamp(seconds_after_epoch)  # fromtimestamp() alone would return local time, not UTC
	dst_date = datetime(2019, 3, 10) # FOR UTC CONVERSION - 2019 ONLY

	# UTC TO CST CONVERSION
	if utc_datetime < dst_date:
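
The hexadecimal file names used in these snippets encode a number of seconds since the Unix epoch. A minimal, self-contained sketch of the decoding step is shown below. The function name `hex_name_to_timestamp` is hypothetical, and the sketch stops at UTC: it uses a timezone-aware datetime and does not attempt the CST conversion.

```python
from datetime import datetime, timezone


def hex_name_to_timestamp(hex_name):
    """Decode a hexadecimal file name into a UTC timestamp string."""
    # Interpret the name as a base-16 count of seconds since 1970-01-01.
    seconds_after_epoch = int(hex_name, 16)
    # Passing tz=timezone.utc avoids depending on the machine's local time zone.
    utc_datetime = datetime.fromtimestamp(seconds_after_epoch, tz=timezone.utc)
    return utc_datetime.strftime("%Y%m%d_%H%M%S")


print(hex_name_to_timestamp("5BAC2C6F"))  # 20180927_010343
```

The output format matches the `%Y%m%d_%H%M%S` pattern used for the renamed .WAV files, so the result could be used directly as a new file name.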

	name: opso
	channels:
	- conda-forge
	dependencies:
	- python==3.6
	- pip==18.0
	- pandas==0.23.4
	- numpy==1.15.1
	- matplotlib==2.1.2
	- docopt==0.6.2