Andrew White whitead

932 followers · 69 following

University of Rochester/FutureHouse
San Francisco, CA
http://thewhitelab.org
@andrewwhite01
@andrew.diffuse.one
in/andrewdwhite


whitead / self_cite.py

Last active December 1, 2024 11:16

Compute number of self citations with Semantic Scholar

	# License CC0

	import httpx

	async def analyze_self_citations(doi):
	    async with httpx.AsyncClient() as client:
	        response = await client.get(
	            f"https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}",
	            params={"fields": "title,authors,references.authors"}
	        )
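The preview stops after the request, so the counting step is not shown. A minimal sketch of how self citations might be tallied from the response, assuming the JSON has a top-level `authors` list and a `references` list whose entries each carry their own `authors` with an `authorId`. The function name `count_self_citations` and the sample record are hypothetical, not taken from the gist.

```python
def count_self_citations(paper):
    """Return (self_citations, total_references) for one paper record.

    A reference counts as a self citation when it shares at least one
    authorId with the citing paper (an assumed definition).
    """
    author_ids = {a["authorId"] for a in paper.get("authors", []) if a.get("authorId")}
    references = paper.get("references") or []
    self_cited = 0
    for ref in references:
        ref_ids = {a["authorId"] for a in (ref.get("authors") or []) if a.get("authorId")}
        if author_ids & ref_ids:
            self_cited += 1
    return self_cited, len(references)


# Hand-written sample record in the assumed response shape.
sample = {
    "title": "Example paper",
    "authors": [{"authorId": "1", "name": "A"}, {"authorId": "2", "name": "B"}],
    "references": [
        {"authors": [{"authorId": "2", "name": "B"}, {"authorId": "9", "name": "Z"}]},
        {"authors": [{"authorId": "7", "name": "Y"}]},
    ],
}
print(count_self_citations(sample))
```

Matching on `authorId` rather than on names avoids false positives from authors who share a name, at the cost of missing authors whose records lack an ID.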

whitead / data.csv

Created October 8, 2024 16:28

data from https://diffuse.one/p/d1-003 post


	hours,wage,capital,cost,field,explanation,discovery,year,id
	100000,60.0,50000000,141000000,physics,"1. The discovery of surface water on the Moon was made by NASA and SOFIA in 2020 (Wikipedia2022 chunk 11).
	2. SOFIA is a modified Boeing 747SP aircraft equipped with a 2.5-meter telescope used for infrared astronomy.
	3. Operating SOFIA requires significant resources due to the specialized equipment and aircraft operation.
	4. SOFIA's annual operating cost is approximately $85 million (general knowledge).
	5. The capital cost of developing SOFIA, including the aircraft and telescope, is estimated at $1 billion.
	6. Assuming the capital cost is depreciated over 20 years, the annual capital cost is $50 million.
	7. For this discovery, we attribute one year of operating and capital costs.
	8. An estimated 50 scientists worked full-time on this project over one year.
	9. Total man-hours are calculated as 50 scientists * 2,000 hours/year = 100,000 hours.
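The numbers in the explanation can be checked against the row's columns. The sketch below assumes the `cost` column is labor plus annualized capital plus one year of operating cost; that formula is inferred from the figures in this row and is not stated in the file. All variable names are illustrative.

```python
# Figures quoted in the explanation text of the first row.
scientists = 50
hours_per_year = 2_000
wage = 60.0                       # dollars per hour, from the "wage" column
operating_cost = 85_000_000       # annual operating cost of SOFIA
development_cost = 1_000_000_000  # estimated capital cost of developing SOFIA
depreciation_years = 20

hours = scientists * hours_per_year               # "hours" column
capital = development_cost // depreciation_years  # "capital" column
labor = hours * wage

# Assumed formula: labor + annual capital + annual operating cost.
total = labor + capital + operating_cost
print(hours, capital, total)
```

Under that assumption the total reproduces the `cost` value of 141000000 in the row.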

whitead / randomize_smi.py

Created March 21, 2024 21:26

Randomize SMILES

	from rdkit.Chem import MolFromSmiles, MolToSmiles

	smi = "..."
	MolToSmiles(MolFromSmiles(smi), canonical=False, doRandom=True, isomericSmiles=True, kekuleSmiles=True)

whitead / bart.py

Last active February 12, 2024 16:23

Bart Vestaboard

	import click
	import time
	import requests
	import xml.etree.ElementTree as ET
	from vesta import vesta_layout, send_to_vesta


	def get_departures(station_name):
	    api_key = "MW9S-E7SL-26DU-VV8V"
	    base_url = "https://api.bart.gov/api/etd.aspx"
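The rest of `get_departures` is cut off in the preview. As a rough sketch of the parsing step, the code below walks an ETD-style XML document with `xml.etree.ElementTree`. The sample XML is hand-written and assumes `etd` elements that contain a `destination` and one or more `estimate` elements with a `minutes` child; `parse_departures` is a hypothetical helper, not the gist's code.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed response layout (not real API output).
SAMPLE_XML = """<root>
  <station>
    <name>Embarcadero</name>
    <abbr>EMBR</abbr>
    <etd>
      <destination>Richmond</destination>
      <estimate><minutes>4</minutes></estimate>
      <estimate><minutes>19</minutes></estimate>
    </etd>
    <etd>
      <destination>Daly City</destination>
      <estimate><minutes>Leaving</minutes></estimate>
    </etd>
  </station>
</root>"""


def parse_departures(xml_text):
    """Return a list of (destination, minutes) pairs, minutes kept as text."""
    root = ET.fromstring(xml_text)
    departures = []
    for etd in root.iter("etd"):
        destination = etd.findtext("destination")
        for estimate in etd.iter("estimate"):
            departures.append((destination, estimate.findtext("minutes")))
    return departures


print(parse_departures(SAMPLE_XML))
```

Minutes are kept as strings because a value such as "Leaving" would not convert to an integer.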

whitead / review.txt

Created February 15, 2023 20:50

RoboReview

The paper by Caldas (2023) explored an approach that avoids the cost and maintenance of a web server by hosting a static file on sites like GitHub. The application developed was a JavaScript implementation of the TensorFlow framework to predict the solubility of small molecules. The model uses a deep ensemble approach to report uncertainty alongside each prediction. The model was evaluated using RMSE, MAE, and correlation coefficient and outperformed the baseline models (Caldas2023 pages 6-7). The paper also reviews methods for calculating solution free energies and modelling systems in solution (Caldas2023 pages 11-12). The authors' model, kde10LSTM Aug, achieved an RMSE of 0.983 and a %±0.5log of 40.0% on the solubility challenge 1 dataset, outperforming 62% of the published RMSE values and 50% of the published %±0.5log values (Caldas2023 pages 9-10). This paper is significant because it provides an efficient and cost-effective approach to predicting the solubility of small molecules with improved accuracy.
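For readers unfamiliar with the metrics named in the review, here is a small sketch of how RMSE, MAE, and the share of predictions within ±0.5 log units could be computed. It assumes %±0.5log means the percentage of predictions whose absolute error is at most 0.5 log units; the function name and the sample values are made up for illustration and are not from the paper.

```python
import math


def solubility_metrics(y_true, y_pred, tol=0.5):
    """Return (rmse, mae, percent_within_tol) for paired log-solubility values."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    within = 100.0 * sum(abs(e) <= tol for e in errors) / n
    return rmse, mae, within


# Made-up measured and predicted values.
y_true = [-2.0, -3.5, -1.0, -4.2]
y_pred = [-2.3, -2.5, -1.1, -4.9]
rmse, mae, within = solubility_metrics(y_true, y_pred)
print(round(rmse, 3), round(mae, 3), within)
```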

whitead / name.py

Created November 1, 2022 21:22

Automated naming of compounds

	import rdkit
	from rdkit import Chem
	from rdkit.Chem import AllChem
	import exmol
	import skunk
	import math
	import matplotlib.pyplot as plt
	import textwrap
	import matplotlib.font_manager as font_manager

whitead / fetch_pdb.py

Last active May 2, 2022 08:38

Here's a Python function that takes a search string (like "human albumin") and returns a PDB file for @rcsbPDB's top search result.

	import requests
	import tempfile
	def get_pdb(query_string):
	    url = "https://search.rcsb.org/rcsbsearch/v1/query"
	    query = {
	        "query": {
	            "type": "terminal",
	            "service": "full_text",
	            "parameters": {"value": query_string},
	        },
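The preview ends inside the query dictionary. A sketch of the two remaining pieces, kept free of network calls so it can be run offline: building a full-text payload and turning the top hit into a download URL. It assumes the search request takes a `return_type` of `"entry"`, that the response lists hits under `result_set` with an `identifier` field, and that files are served from `files.rcsb.org/download/`. Both helper names and the sample response are hypothetical.

```python
def build_search_payload(query_string):
    """Full-text search payload; "return_type" is an assumed field, not shown in the gist."""
    return {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": query_string},
        },
        "return_type": "entry",
    }


def top_hit_download_url(search_response):
    """Return a PDB download URL for the first hit, or None when there are no hits."""
    hits = search_response.get("result_set") or []
    if not hits:
        return None
    pdb_id = hits[0]["identifier"]
    return f"https://files.rcsb.org/download/{pdb_id}.pdb"


# Hand-written response in the assumed shape.
fake_response = {"result_set": [{"identifier": "1AO6", "score": 1.0}]}
print(build_search_payload("human albumin")["query"]["parameters"])
print(top_hit_download_url(fake_response))
```

Returning `None` for an empty result set lets the caller decide whether a search with no hits is an error.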

whitead / iterzinc.py

Created December 7, 2021 14:39

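	import pandas as pd

	# tranches.txt has no header row, so header=None keeps the first URL as data
	tranches = pd.read_csv('https://gist.githubusercontent.com/whitead/f47887e45bbd2f38332182d2d422da6b/raw/a3948beac9b9034dab432b697c5ec238503ac5d0/tranches.txt', header=None)
	def get_mol_batch(batch_size = 32):
	    # Yield batches of first-column values; rows left over after the last full batch are skipped
	    for t in tranches.values:
	        d = pd.read_csv(t[0], sep=' ')
	        for i in range(len(d) // batch_size):
	            yield d.iloc[i * batch_size:(i + 1) * batch_size, 0].values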
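The floor division in the generator means any rows that do not fill a whole batch are dropped from each tranche. The sketch below isolates that batching logic on a plain list and adds a `drop_last` switch for keeping the remainder. The helper `iter_batches` and its parameter are illustrative and not part of the gist.

```python
def iter_batches(items, batch_size=32, drop_last=True):
    """Yield consecutive slices of length batch_size.

    With drop_last=True the trailing partial batch is skipped, which mirrors
    the len(d) // batch_size loop; with drop_last=False it is yielded too.
    """
    n_full = len(items) // batch_size
    for i in range(n_full):
        yield items[i * batch_size:(i + 1) * batch_size]
    remainder = items[n_full * batch_size:]
    if remainder and not drop_last:
        yield remainder


data = list(range(7))
print(list(iter_batches(data, batch_size=3)))
print(list(iter_batches(data, batch_size=3, drop_last=False)))
```

Dropping the remainder keeps every batch the same shape, which is convenient for model training; keeping it is the safer choice when every molecule must be visited.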

whitead / tranches.txt

Created December 5, 2021 05:37

zinc20 tranches

	http://files.docking.org/2D/AA/AAAA.smi
	http://files.docking.org/2D/AA/AAAB.smi
	http://files.docking.org/2D/AA/AAAC.smi
	http://files.docking.org/2D/AA/AAAD.smi
	http://files.docking.org/2D/AA/AABA.smi
	http://files.docking.org/2D/AA/AABB.smi
	http://files.docking.org/2D/AA/AABD.smi
	http://files.docking.org/2D/AA/AACA.smi
	http://files.docking.org/2D/AA/AACB.smi
	http://files.docking.org/2D/AA/AACD.smi

whitead / animate.py

Created August 27, 2021 14:48

Some code for animating

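	import matplotlib.pyplot as plt
	from matplotlib.collections import LineCollection


	# T, N, paths, and make_segments are not defined in this excerpt
	fps = 60.
	stride = 1
	duration = (T - 5) / fps / stride
	print(duration, fps)
	all_segments = [make_segments(paths, i) for i in range(N)]

	# 6 in x 6 in at 180 dpi gives a 1080 x 1080 pixel figure
	fig = plt.figure(figsize=(1080 // 180, 1080 // 180), dpi=180)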
