mrcodetastic

I recently created the LlamaThink-8b-Instruct full instruct model, and a few of you were curious as to how I made it. Here is the process to fine-tune a model with GRPO reinforcement learning.
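For anyone new to GRPO, the core idea is that several completions are sampled for the same prompt, each gets a reward, and each reward is then scored relative to its own group. Here is a minimal sketch of that normalization step only. The function name, the `eps` value, and the choice of population standard deviation are assumptions for illustration (implementations differ on these details), and this is not taken from the training script used for LlamaThink.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's group of sampled completions.

    Each advantage is (reward - group mean) / (group std + eps), so
    completions that beat their siblings get a positive value and the
    rest get a negative one. `eps` (hypothetical default) avoids a
    division by zero when every completion scored the same.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, see lead-in
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt, scored by some reward function.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

If every completion in a group gets the same reward, all advantages come out as zero and that prompt contributes no learning signal, which is why the reward function needs to tell good thinking apart from bad.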

So our goal is to make a thinker model. It's super easy. First we need a dataset. Here is a script for llama-cpp-python to create a dataset.

	#!/bin/bash
	# Run this script on a client computer that needs to connect to a VPS (using an SSH key), and uses a forwarded SSH port
	# that exists on that VPS (from the client that is behind NAT) to connect to the SSH server on the final client,
	# and uses that connection to forward local ports via the overall-SSH connection that is passing via the VPS.
	#
	# Client (forwards port 2201 on the VPS at 2202 locally) -> NAT -> VPS <- NAT <- Host (forwards port 22 as Port 2201 on the VPS)
	# Connecting to 'localhost' on port '2202' on the client, is the same as connecting to port 22 on the Host.
	#
	# Requirements
	# SSH key setup on both VPS and Ultimate Host to avoid interactive login steps.

	// Generate lookup table for converting between perceived LED brightness and PWM

	// Adapted from: https://jared.geek.nz/2013/feb/linear-led-pwm
	// See also: https://ledshield.wordpress.com/2012/11/13/led-brightness-to-your-eye-gamma-correction-no/

	#include <iostream>
	#include <cmath>
	#include <iomanip>

	constexpr int TABLE_SIZE = 256; // Number of steps (brightness)
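The snippet above is cut off before the table is computed. As a rough illustration of the technique its comments describe, here is a sketch that maps evenly spaced perceived-brightness steps to PWM duty values using the standard CIE 1931 lightness-to-luminance formula. The function names and the 8-bit `pwm_max` of 255 are assumptions. Only the 256-entry table size comes from the original.

```python
def cie1931_luminance(lightness):
    """Convert CIE lightness L* (0..100) to relative luminance Y (0..1)."""
    if lightness <= 8.0:
        # Linear segment near black (standard CIE constant, about 903.3).
        return lightness / 903.3
    return ((lightness + 16.0) / 116.0) ** 3


def build_pwm_table(table_size=256, pwm_max=255):
    """Build a lookup table: index = perceived brightness step, value = PWM duty.

    pwm_max=255 assumes an 8-bit PWM peripheral (hypothetical default).
    """
    table = []
    for i in range(table_size):
        lightness = 100.0 * i / (table_size - 1)  # evenly spaced L* steps
        table.append(round(cie1931_luminance(lightness) * pwm_max))
    return table


if __name__ == "__main__":
    table = build_pwm_table()
    # Print 16 values per row, ready to paste into a C array.
    for row in range(0, len(table), 16):
        print(", ".join(f"{v:3d}" for v in table[row:row + 16]) + ",")
```

The table starts at 0 and ends at `pwm_max`, with most of the low indices mapping to very small duty values, because the eye is far more sensitive to changes at low light levels.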

	/*
	RadioLib SX127x Ping-Pong Example

	For default module settings, see the wiki page
	https://github.com/jgromes/RadioLib/wiki/Default-configuration#sx127xrfm9x---lora-modem

	For full API reference, see the GitHub Pages
	https://jgromes.github.io/RadioLib/

	Customised for an ESP32 S3 'hat' that uses the following PIN mappings:

	Cool apps:
	Universal Radio Hacker: https://github.com/jopohl/urh
	SDR Sharp
	-> Use 'Zadig' on Windows to attach a WinUSB driver to the 'Bulk In Interface 0' for the RTL2832 device.



	RTL-SDR read of a 433 MHz relay switch like:
	'kebidu 1Pc RF Transmitter 433 Mhz Remote Controls with Wireless Remote Control Switch DC 12V 1CH relay Receiver Module'
	https://www.aliexpress.com/item/32956103016.html?spm=a2g0o.order_list.0.0.21ef1802ISdJsi

	/*
	* Compile and flash this firmware onto an ESP32-S2 and then use a separate micro USB connector
	* to connect the S2's Pin 19 to USB D- and Pin 20 to USB D+ (and of course +5V to +5V and GND to GND)
	*
	* IMPORTANT: On an ESP32-S2 Mini using Arduino IDE, make sure you have 'USB CDC On Boot' set to 'False'
	* otherwise the 'USB Device Name' that shows up on Windows will not be correct as per the code below.
	*
	* It will appear as a 'USB Mouse' in Windows/Linux/Mac, and move the mouse every 30 seconds or so, so
	* that you are always online/green. The ESP's LED will flash briefly when this occurs.
	*

	class Session {
	/*
	* Use Apache HTTPClient5 WinClient extensions to automatically gain SSO login via the domain credentials of
	* the currently logged-in user.
	*/
	// Store SSO cookies between httpclient and server to maintain session
	private CookieStore cookieStore = new BasicCookieStore();
	// Base http configuration used for all http requests
	private RequestConfig requestConfig = RequestConfig.custom()
	.setConnectTimeout(300, TimeUnit.SECONDS)

	# go run webserver.go
	# Run this in whatever directory you're wanting to quickly serve on port 80

	package main

	import (
	"fmt"
	"log"
	"net/http"
	)

	# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
	# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
	# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
	# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
	# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
	# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
	# THE SOFTWARE.

	# Tested with QT 5.15.2 on Windows 10
	# https://mrfaptastic.github.io