Vadim Kantorov vadimkantorov
vadimkantorov / multiprocessing_pool_batched.py
Last active May 27, 2025 09:45
Example of using multiprocessing with explicitly batched inputs
import multiprocessing
import itertools
inputs = list(range(111))
batchsize = 10
num_workers = 4
batches = itertools.batched(inputs, batchsize)
def reducer(xs):
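The preview above is cut off at `reducer`. As a minimal sketch of the idea in the description (one pool task per batch rather than per item), the following assumes a hypothetical `reducer` that sums its batch and a hypothetical `run` helper; neither is taken from the gist. The local `batched` follows the same contract as `itertools.batched`, which only exists in Python 3.12+.

```python
import itertools
import multiprocessing

def batched(iterable, n):
    # Same contract as itertools.batched (Python 3.12+), usable on older versions
    it = iter(iterable)
    while batch := tuple(itertools.islice(it, n)):
        yield batch

def reducer(xs):
    # Hypothetical per-batch work; the gist's own body is not shown in the preview
    return sum(xs)

def run(inputs, batchsize, num_workers):
    # Each task handed to a worker is a whole batch, so pickling and IPC
    # overhead is paid once per batch instead of once per item
    with multiprocessing.Pool(num_workers) as pool:
        return pool.map(reducer, batched(inputs, batchsize))

if __name__ == '__main__':
    partial_sums = run(list(range(111)), batchsize=10, num_workers=4)
    print(len(partial_sums), sum(partial_sums))
```

With 111 inputs and a batch size of 10, the last batch holds a single element, so the per-batch function has to cope with short batches.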
vadimkantorov / cache_hf_model.sh
Created May 23, 2025 17:26
Downloads and tests import of a HuggingFace model into a meta device (and thus does not use any GPU/CPU memory)
# Usage: bash cache_hf_model.sh Qwen/Qwen3-8B
# export HF_HOME=/my/cache/HF_HOME
python -c 'import sys, transformers; transformers.AutoModel.from_pretrained(sys.argv[-1], trust_remote_code=True, device_map="meta")' "$@"
vadimkantorov / minidotenv.py
Created May 22, 2025 18:40
toml can be abused to read some basic key-value pairs as well
def load_dotenv(dotenv_path = '.env'):
    # https://snarky.ca/use-toml-for-env-files/
    # https://github.com/theskumar/python-dotenv
    '''
    # such simple key-value files are a TOML subset and can be read via tomllib without external packages or hacks
    a="b"
    c="d"
    '''
    import os, tomllib
    # tomllib requires a binary file handle; the with-block closes it after parsing
    with open(dotenv_path, 'rb') as f:
        os.environ.update(tomllib.load(f))
vadimkantorov / catfsspec.py
Last active June 12, 2025 11:50
Basic example of using fsspec explaining some quirks on porting from regular Python I/O
import sys, fsspec
with fsspec.open(sys.argv[1], 'rt') as f: # must pass 'rt' explicitly, as in fsspec the default mode is 'rb'
    print(f.read()) # must use a context manager, as the object returned by fsspec.open(...) itself has no read() method
# echo world > hello.txt
# python catfsspec.py hello.txt
# python catfsspec.py file://hello.txt
# python catfsspec.py s3://mybucket/hello.txt
vadimkantorov / git_private_fork.sh
Created May 16, 2025 15:26
Create a private fork of verl
# reference: https://gist.github.com/0xjac/85097472043b697ab57ba1b1c7530274
git clone --bare git@github.com:volcengine/verl.git
cd verl.git
# first create an empty private repo vadimkantorov/verl on GitHub
git push --mirror git@github.com:vadimkantorov/verl.git
cd .. && rm -rf verl.git
git clone git@github.com:vadimkantorov/verl.git
vadimkantorov / tqdm.py
Last active May 13, 2025 13:05
Extremely simplified single-file, 20 LOC version of https://tqdm.github.io/docs/tqdm/ for debugging tqdm bugs like https://github.com/tqdm/tqdm/issues/760 or dropping the full dependency
# Save as tqdm.py in the project dir; then `from tqdm import tqdm` and `from tqdm.auto import tqdm` should pick up this class. If that fails, use `export PYTHONPATH=.`
# Test run: python tqdm.py
import os, sys
# huggingface_hub/hf_api.py:
# from tqdm.auto import tqdm as base_tqdm
# from tqdm.contrib.concurrent import thread_map
# https://tqdm.github.io/docs/shortcuts/#tqdmauto
sys.modules['tqdm.auto'] = sys.modules[__name__]
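The `sys.modules` line works because the import system checks `sys.modules` for the full dotted name before searching for anything, so a plain module can answer for a submodule name. The sketch below illustrates that mechanism under a made-up package name, `miniprogress`, so that it cannot shadow a real `tqdm` install; `MiniProgress` is a hypothetical stand-in and not the gist's class.

```python
import sys
import types

class MiniProgress:
    # Hypothetical tqdm-like wrapper: yields items and prints a running counter
    def __init__(self, iterable, desc='', file=None):
        self.iterable = iterable
        self.desc = desc
        self.file = file or sys.stderr
        self.total = len(iterable) if hasattr(iterable, '__len__') else None
        self.n = 0

    def __iter__(self):
        total = '?' if self.total is None else self.total
        for item in self.iterable:
            yield item
            self.n += 1
            print(f'\r{self.desc}{self.n}/{total}', end='', file=self.file, flush=True)
        print(file=self.file)

# Register one module object under both the package name and a submodule name
module = types.ModuleType('miniprogress')
module.tqdm = MiniProgress
sys.modules['miniprogress'] = module
sys.modules['miniprogress.auto'] = module  # same aliasing trick as in the gist

from miniprogress.auto import tqdm  # resolved straight from sys.modules

for _ in tqdm(range(3), desc='demo: '):
    pass
```

The counter is updated after each item has been consumed by the loop body, so the last line printed reads `3/3` only once the loop has finished.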
vadimkantorov / pip_install_dependencies_from_pyproject_toml.sh
Last active April 30, 2025 14:28
Install only pip dependencies from pyproject.toml (e.g. from https://github.com/augustepoiroux/LeanInteract )
# https://github.com/pypa/pip/issues/11440
# https://github.com/pypa/pip/issues/7822
# https://stackoverflow.com/a/79598932/445810
# tomllib is available starting from python --version >= 3.11
python -m pip install $(python -c 'import tomllib;print(*tomllib.load(open("pyproject.toml","rb"))["project"]["dependencies"])') # --user --break-system-packages
vadimkantorov / sitecustomize.py
Last active April 22, 2025 19:55
Python trace urllib HTTP requests
import http.client
http.client.HTTPConnection.debuglevel = 1
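Setting the attribute on the class rather than on an instance is what makes this usable from `sitecustomize.py`: every connection object created later, including `HTTPSConnection` (a subclass), inherits the value. A small check that opens no network connection is sketched below; the host name is a placeholder, and whether a given `urllib.request` handler keeps or overrides this value may depend on the Python version.

```python
import http.client

# Class-level default, as in the gist
http.client.HTTPConnection.debuglevel = 1

# Constructing a connection object does not open a socket yet
conn = http.client.HTTPConnection('example.invalid')
https_conn = http.client.HTTPSConnection('example.invalid')
print(conn.debuglevel, https_conn.debuglevel)

# A single connection can still opt out via the instance-level setter
quiet = http.client.HTTPConnection('example.invalid')
quiet.set_debuglevel(0)
print(quiet.debuglevel)
```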
vadimkantorov / ssh.sh
Last active May 12, 2025 13:24
Various ssh commands
# https://superuser.com/questions/1687960/over-ssh-can-you-use-the-same-private-key-on-the-host-side-for-other-purposes
alias sshagentssh='ssh-agent ssh -A -o AddKeysToAgent=yes'
# generate ssh key for github
# https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent
ssh-keygen -t ed25519 -C "[email protected]" -f ./id_ed25519 -N "" # -q ; Ed25519 keys have a fixed length, so -b is not needed; -N "" sets an empty passphrase
# https://stackoverflow.com/questions/4565700/how-to-specify-the-private-ssh-key-to-use-when-executing-shell-command-on-git
# https://github.com/settings/ssh/new
export GIT_SSH_COMMAND="ssh -o IdentitiesOnly=yes -i $PWD/id_ed25519"
vadimkantorov / parquet2npyztsv.py
Last active April 25, 2025 13:37
Convert Parquet tables to npy (as record array) or npz (as columns) or tsv (as text columns)
# Usage: python parquet2npyztsv.py test.npy data/train-*-of-*.parquet
# Usage: python parquet2npyztsv.py test.npz data/train-*-of-*.parquet
# Usage: python parquet2npyztsv.py test.tsv data/train-*-of-*.parquet
import sys
import numpy as np
import pyarrow.parquet as pq
output_path, *input_paths = sys.argv[1:]