Doug Ross doug-ross

  • U.S.
willccbb / grpo_demo.py
Last active May 2, 2025 08:43
GRPO Llama-1B
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
import re
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
cyphunk / softmax.js
Last active April 6, 2025 00:13 — forked from vladimir-ivanov/softmax.js
softmax function implementation in js
// Fork & examples for the one-line version by @vladimir-ivanov:
//let softmax = (arr) => (index) => Math.exp(arr[index]) / arr.map(y => Math.exp(y)).reduce((a, b) => a + b);
//
// Also see comments for improvements
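function softmax(arr) {
  // Subtract the max before exponentiating so large inputs do not overflow to Infinity.
  var max = Math.max.apply(null, arr);
  var exps = arr.map(function (value) { return Math.exp(value - max); });
  // Compute the normalizing sum once instead of once per element.
  var sum = exps.reduce(function (a, b) { return a + b; }, 0);
  return exps.map(function (e) { return e / sum; });
}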
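
A quick way to sanity-check a softmax implementation is to confirm that the outputs sum to 1 and stay finite for large inputs. The sketch below is an illustration only: `stableSoftmax` is a hypothetical helper name, not part of the original gist, and it assumes the common max-subtraction trick, which leaves the result mathematically unchanged while keeping `Math.exp` from overflowing.

```javascript
// Hypothetical helper (not from the gist): softmax with max subtraction.
function stableSoftmax(arr) {
  if (arr.length === 0) return [];
  const max = Math.max(...arr);
  // exp(v - max) is at most 1, so it cannot overflow to Infinity.
  const exps = arr.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Small inputs: roughly [0.090, 0.245, 0.665].
const probs = stableSoftmax([1, 2, 3]);
const total = probs.reduce((a, b) => a + b, 0);
console.log(probs, total);

// Large inputs: a naive Math.exp(1000) would be Infinity and yield NaN.
const large = stableSoftmax([1000, 1001, 1002]);
console.log(large.every(Number.isFinite)); // true
```

Because softmax only depends on the differences between inputs, `[1000, 1001, 1002]` gives the same probabilities as `[1, 2, 3]`.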