From: https://www.reddit.com/r/LocalLLaMA/comments/1ipsnck/how_i_created_llamathink8binstruct/
I recently created LlamaThink-8b-Instruct (full instruct model; GGUF: LlamaThink-8b-Instruct-GGUF), and a few of you were curious how I made it. Here is the process to fine-tune a model with GRPO reinforcement learning.
So our goal is to make a thinker model. It's super easy. First we need a dataset; here is a script for llama-cpp-python to create one.