Andrew White whitead

@whitead
whitead / self_cite.py
Last active December 1, 2024 11:16
Compute number of self citations with Semantic Scholar
# License CC0
import httpx
async def analyze_self_citations(doi):
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}",
            params={"fields": "title,authors,references.authors"}
        )
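The request above asks Semantic Scholar for the paper's authors and for the authors of each reference. Below is a minimal sketch of how a self-citation count could be derived from the decoded JSON. The helper name `count_self_citations` is hypothetical, and the rule used here (a reference counts as a self citation if it shares at least one `authorId` with the paper) is an assumption, not necessarily what the original gist does.

```python
def count_self_citations(paper):
    """Return (self_citations, total_references) for one paper.

    `paper` is assumed to be the decoded JSON for the fields requested above:
    {"title": ..., "authors": [{"authorId": ..., "name": ...}, ...],
     "references": [{"authors": [...]}, ...]}
    """
    # Authors without an ID cannot be matched reliably, so they are skipped.
    own_ids = {
        a["authorId"] for a in paper.get("authors") or [] if a.get("authorId")
    }
    references = paper.get("references") or []
    self_cites = 0
    for ref in references:
        ref_ids = {
            a["authorId"] for a in ref.get("authors") or [] if a.get("authorId")
        }
        # A non-empty intersection means at least one shared author.
        if own_ids & ref_ids:
            self_cites += 1
    return self_cites, len(references)
```

Matching on `authorId` rather than on names avoids false positives from authors who share a name, at the cost of missing references whose authors have no ID.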
hours,wage,capital,cost,field,explanation,discovery,year,id
100000,60.0,50000000,141000000,physics,"1. The discovery of surface water on the Moon was made by NASA and SOFIA in 2020 (Wikipedia2022 chunk 11).
2. SOFIA is a modified Boeing 747SP aircraft equipped with a 2.5-meter telescope used for infrared astronomy.
3. Operating SOFIA requires significant resources due to the specialized equipment and aircraft operation.
4. SOFIA's annual operating cost is approximately $85 million (general knowledge).
5. The capital cost of developing SOFIA, including the aircraft and telescope, is estimated at $1 billion.
6. Assuming the capital cost is depreciated over 20 years, the annual capital cost is $50 million.
7. For this discovery, we attribute one year of operating and capital costs.
8. An estimated 50 scientists worked full-time on this project over one year.
9. Total man-hours are calculated as 50 scientists * 2,000 hours/year = 100,000 hours.
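The figures in this row can be checked with a few lines of arithmetic. The sketch below assumes the `cost` column is labor (hours times wage) plus one year of depreciated capital plus one year of operating cost. The excerpt does not state that formula, but it reproduces the row's values. The function name `discovery_cost` is hypothetical.

```python
def discovery_cost(scientists, hours_per_year, wage,
                   capital_total, depreciation_years, operating):
    """Recompute the hours, capital, and cost columns for one discovery."""
    hours = scientists * hours_per_year           # step 9: total person-hours
    capital = capital_total / depreciation_years  # step 6: one year of capital
    # Assumed formula: labor + annual capital + annual operating cost.
    cost = hours * wage + capital + operating
    return hours, capital, cost


# Values taken from the SOFIA row: 50 scientists, 2,000 hours/year, $60/hour,
# $1 billion capital over 20 years, $85 million annual operating cost.
hours, capital, cost = discovery_cost(50, 2_000, 60.0, 1_000_000_000, 20, 85_000_000)
print(hours, capital, cost)
```

With these inputs the labor term is $6 million, so the total matches the 141000000 in the `cost` column.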
@whitead
whitead / randomize_smi.py
Created March 21, 2024 21:26
Randomize SMILES
from rdkit.Chem import MolFromSmiles, MolToSmiles
smi = "..."
MolToSmiles(MolFromSmiles(smi), canonical=False, doRandom=True, isomericSmiles=True, kekuleSmiles=True)
@whitead
whitead / bart.py
Last active February 12, 2024 16:23
Bart Vestaboard
import click
import time
import requests
import xml.etree.ElementTree as ET
from vesta import vesta_layout, send_to_vesta
def get_departures(station_name):
    api_key = "MW9S-E7SL-26DU-VV8V"
    base_url = "https://api.bart.gov/api/etd.aspx"
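The preview stops before the response is handled. As a rough sketch of what parsing could look like, the example below reads departure estimates out of an XML document with `xml.etree.ElementTree`. The sample XML, the element names (`etd`, `destination`, `estimate`, `minutes`), and the helper `parse_departures` are assumptions made for illustration; check them against a real response before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like the structure this sketch assumes.
SAMPLE_ETD_XML = """<root>
  <station>
    <name>Example Station</name>
    <etd>
      <destination>Example Terminus</destination>
      <estimate><minutes>Leaving</minutes></estimate>
      <estimate><minutes>12</minutes></estimate>
    </etd>
  </station>
</root>"""


def parse_departures(xml_text):
    """Return (destination, minutes) pairs sorted by departure time."""
    root = ET.fromstring(xml_text)
    departures = []
    for etd in root.iter("etd"):
        destination = etd.findtext("destination", default="")
        for estimate in etd.iter("estimate"):
            minutes = estimate.findtext("minutes", default="")
            # Non-numeric values such as "Leaving" are treated as 0 minutes.
            departures.append(
                (destination, int(minutes) if minutes.isdigit() else 0)
            )
    return sorted(departures, key=lambda d: d[1])


print(parse_departures(SAMPLE_ETD_XML))
```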
@whitead
whitead / review.txt
Created February 15, 2023 20:50
RoboReview
The paper by Caldas (2023) explored an approach that avoids the cost and maintenance of a web server by hosting a static file on sites like GitHub. The application developed was a JavaScript implementation of the TensorFlow framework to predict the solubility of small molecules. The model implements a deep ensemble approach to report model uncertainty alongside each prediction. The model was evaluated using RMSE, MAE, and correlation coefficient and outperformed the baseline models (Caldas2023 pages 6-7). The paper also provides a review of methods for calculating solution free energies and modelling systems in solution (Caldas2023 pages 11-12). The authors' model, kde10LSTM Aug, achieved an RMSE of 0.983 and a %±0.5log of 40.0% on the solubility challenge 1 dataset, outperforming 62% of the published RMSE values and 50% of the published %±0.5log values (Caldas2023 pages 9-10). This paper is significant because it provides an efficient and cost-effective approach to predicting the solubility of small molecules with improved accuracy.
@whitead
whitead / name.py
Created November 1, 2022 21:22
Automated naming of compounds
import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem
import exmol
import skunk
import math
import matplotlib.pyplot as plt
import textwrap
import matplotlib.font_manager as font_manager
@whitead
whitead / fetch_pdb.py
Last active May 2, 2022 08:38
Here's a Python function that takes a search string (like "human albumin") and returns a PDB file using @rcsbPDB's top result.
import requests
import tempfile
def get_pdb(query_string):
    url = "https://search.rcsb.org/rcsbsearch/v1/query"
    query = {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": query_string},
        },
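import pandas as pd

tranches = pd.read_csv('https://gist.githubusercontent.com/whitead/f47887e45bbd2f38332182d2d422da6b/raw/a3948beac9b9034dab432b697c5ec238503ac5d0/tranches.txt')
def get_mol_batch(batch_size = 32):
    # Read each tranche file in turn and yield batches from its first column.
    for t in tranches.values:
        d = pd.read_csv(t[0], sep=' ')
        # Integer division yields only full batches; a trailing partial batch is skipped.
        for i in range(len(d) // batch_size):
            yield d.iloc[i * batch_size:(i + 1) * batch_size, 0].values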
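Because the loop above runs `len(d) // batch_size` times, any rows left over after the last full batch are never yielded. The following is a small standalone sketch of the same batching pattern on a plain list, with an option to keep that remainder. The name `iter_batches` and the `keep_remainder` flag are hypothetical and not part of the gist.

```python
def iter_batches(items, batch_size=32, keep_remainder=True):
    """Yield consecutive slices of `items` of length `batch_size`."""
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # Only the final slice can be short; drop it when asked to.
        if len(batch) < batch_size and not keep_remainder:
            break
        yield batch


rows = list(range(10))
full_only = list(iter_batches(rows, batch_size=4, keep_remainder=False))
with_tail = list(iter_batches(rows, batch_size=4))
print(len(full_only), len(with_tail))
```

Dropping the remainder keeps every batch the same shape, which is often convenient for model training. Keeping it makes sure every molecule is visited.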
@whitead
whitead / tranches.txt
Created December 5, 2021 05:37
zinc20 tranches
http://files.docking.org/2D/AA/AAAA.smi
http://files.docking.org/2D/AA/AAAB.smi
http://files.docking.org/2D/AA/AAAC.smi
http://files.docking.org/2D/AA/AAAD.smi
http://files.docking.org/2D/AA/AABA.smi
http://files.docking.org/2D/AA/AABB.smi
http://files.docking.org/2D/AA/AABD.smi
http://files.docking.org/2D/AA/AACA.smi
http://files.docking.org/2D/AA/AACB.smi
http://files.docking.org/2D/AA/AACD.smi
@whitead
whitead / animate.py
Created August 27, 2021 14:48
Some code for animating
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
# T, N, paths, and make_segments are not defined in this excerpt.
fps = 60.
stride = 1
duration = (T - 5) / fps / stride
print(duration, fps)
all_segments = [make_segments(paths, i) for i in range(N)]
fig = plt.figure(figsize=(1080 // 180, 1080 // 180), dpi=180)