@artimath
Forked from badlogic/01-update-docs.md
Created June 15, 2025 06:17
Yakety Documentation (Ordered) - LLM-optimized docs with concrete file references

Update Documentation

You will generate LLM-optimized documentation with concrete file references and flexible formatting.

Your Task

Create documentation that allows humans and LLMs to:

  • Understand project purpose - what the project does and why
  • Get architecture overview - how the system is organized
  • Build on all platforms - build instructions with file references
  • Add features/subsystems - following established patterns with examples
  • Debug applications - troubleshoot issues with specific file locations
  • Test and add tests - run existing tests and create new ones
  • Deploy and distribute - package and deploy the software

Required Documentation Structure

Each document MUST include:

  1. Timestamp Header - Hidden comment with last update timestamp
  2. Brief Overview (2-3 paragraphs max)
  3. Key Files & Examples - Concrete file references for each major topic
  4. Common Workflows - Practical guidance with file locations
  5. Reference Information - Quick lookup tables with file paths

Timestamp Format

Each generated file MUST start with:

<!-- Generated: YYYY-MM-DD HH:MM:SS UTC -->
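A tool that writes these files could produce the header line along the following lines. This is a minimal sketch in C, assuming the standard `gmtime` and `strftime` functions are available; the function name `format_generated_header` is hypothetical and not part of any existing tooling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

static const char HEADER_PREFIX[] = "<!-- Generated: ";
static const char HEADER_SUFFIX[] = " UTC -->";

/* Writes "<!-- Generated: YYYY-MM-DD HH:MM:SS UTC -->" for time t into buf.
 * Returns 0 on success, -1 if the conversion fails or buf is too small.
 * (Hypothetical helper, for illustration only.) */
int format_generated_header(time_t t, char *buf, size_t size) {
    assert(buf != NULL);

    struct tm *utc = gmtime(&t); /* broken-down time in UTC */
    if (utc == NULL) {
        return -1;
    }

    char stamp[32];
    if (strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", utc) == 0) {
        return -1;
    }

    /* Check the full length up front so the output is never truncated. */
    size_t needed = strlen(HEADER_PREFIX) + strlen(stamp) + strlen(HEADER_SUFFIX) + 1;
    if (needed > size) {
        return -1;
    }

    snprintf(buf, size, "%s%s%s", HEADER_PREFIX, stamp, HEADER_SUFFIX);
    return 0;
}
```

Calling it with `time(NULL)` and a 64-byte buffer yields a line suitable for the first line of each generated file.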

Process

You will:

  1. Analyze the codebase systematically across 7 key areas (merging development+patterns)
  2. Create or update docs in docs/*.md with concrete file references
  3. Synthesize final documentation into a minimal, LLM-friendly README.md
  4. Eliminate all duplication across files

Analysis Methodology

For each area, agents should:

  1. Examine key files: Look for build configs, test files, deployment scripts, main source files
  2. Extract file references: Note specific files, line numbers, and examples
  3. Identify patterns: Find repeated structures, naming conventions, common workflows
  4. Make content LLM-friendly: Token-efficient, reference-heavy, practical examples

Specific File Requirements

Issue the following Task calls in parallel:

Project Overview (docs/project-overview.md): STRUCTURE:

  • Overview: What the project is, core purpose, key value proposition (2-3 paragraphs)
  • Key Files: Main entry points and core configuration files
  • Technology Stack: Core technologies with specific file examples
  • Platform Support: Requirements with platform-specific file locations

Architecture (docs/architecture.md): STRUCTURE:

  • Overview: High-level system organization (2-3 paragraphs)
  • Component Map: Major components with their source file locations
  • Key Files: Core headers and implementations with brief descriptions
  • Data Flow: How information flows with specific function/file references

Build System (docs/build-system.md): STRUCTURE:

  • Overview: Build system with file references to main build configuration
  • Build Workflows: Common tasks with specific commands and config files
  • Platform Setup: Platform-specific requirements with file paths
  • Reference: Build targets, presets, and troubleshooting with file locations

Testing (docs/testing.md): STRUCTURE:

  • Overview: Testing approach with test file locations
  • Test Types: Different test categories with specific file examples
  • Running Tests: Commands with file paths and expected outputs
  • Reference: Test file organization and build system test targets

Development (docs/development.md): STRUCTURE:

  • Overview: Development environment, code style, patterns (merge with old patterns.md if exists)
  • Code Style: Conventions with specific file examples (show actual code from codebase)
  • Common Patterns: Implementation patterns with file references and examples from the codebase
  • Workflows: Development tasks with concrete file locations and examples
  • Reference: File organization, naming conventions, common issues with specific files

Deployment (docs/deployment.md): STRUCTURE:

  • Overview: Packaging and distribution with script references
  • Package Types: Different packages with build targets and output locations
  • Platform Deployment: Platform-specific packaging with file paths
  • Reference: Deployment scripts, output locations, server configurations

Files Catalog (docs/files.md): STRUCTURE:

  • Overview: Comprehensive file catalog with descriptions and relationships (2-3 paragraphs)
  • Core Source Files: Main application logic with purpose descriptions
  • Platform Implementation: Platform-specific code with interface mappings
  • Build System: Build configuration and helper modules
  • Configuration: Assets, scripts, configs - Supporting files and their roles
  • Reference: File organization patterns, naming conventions, dependency relationships

Critical Requirements

LLM-OPTIMIZED FORMAT

  • Token efficient: Avoid redundant explanations, focus on essential information
  • Concrete file references: Always include specific file paths, line numbers when helpful
  • Flexible formatting: Use subsections, code blocks, examples instead of rigid step-by-step
  • Pattern examples: Show actual code from the codebase, not generic examples

NO DUPLICATION

  • Each piece of information appears in EXACTLY ONE file
  • Build information only in build-system.md
  • Code style and patterns only in development.md
  • Deployment information only in deployment.md
  • Cross-references using: "See docs/filename.md"

FILE REFERENCE FORMAT

Always include specific file references:

**Core System** - Core implementation in src/core.h (lines 15-45), platform backends in src/platform/

**Build Configuration** - Main build file (lines 67-89), configuration files

**Module Management** - Interface in src/module.h, implementation in src/module.c (key_function at line 134)

PRACTICAL EXAMPLES

Use actual code from the codebase:

// From src/example.h:23-27
typedef struct {
    bool active;
    void *data;
    int count;
} ExampleState;

Final Steps

After all tasks complete:

  1. Read all docs/*.md files and create README.md with:

    • Project description (2-3 sentences max)
    • Key entry points and core configuration files
    • Quick build commands
    • Documentation links with brief descriptions of what LLMs will find useful
    • Keep it under 50 lines total
  2. Duplication check: Scan all files and remove any duplicated information

  3. File reference check: Ensure all file paths are accurate and helpful

Agent Instructions

Each agent must:

  1. Read existing file if it exists to understand current content
  2. Analyze relevant codebase files systematically
  3. Extract specific file references throughout analysis:
    • Note important headers, source files, configuration files
    • Include line numbers for key functions/sections when helpful
    • Reference actual code examples from the codebase
  4. Create LLM-friendly content:
    • Token-efficient writing (no redundant explanations)
    • Concrete file paths and examples throughout
    • Flexible formatting (subsections, code blocks, practical guidance)
    • Focus on what LLMs need to understand and work with the code
  5. Include practical workflows with specific file references
  6. Create reference sections with file locations and line numbers
  7. Update timestamp at the top with current UTC time
  8. Read generated file and revise for accuracy and completeness

Success criteria: Each file should be a practical reference that helps LLMs quickly understand the codebase and find the right files for specific tasks.

Special note for development.md: Merge content from both old development.md and patterns.md (if they exist) into a single comprehensive development guide with implementation patterns.

The coordinating agent must:

  1. Wait for all agents to complete
  2. Read all generated files
  3. Remove any duplication found
  4. Create a minimal, LLM-optimized README.md with key file references
  5. Update README.md timestamp with current UTC time
  6. Delete docs/patterns.md if it exists since it's merged into development.md

Files Agent Instructions

The Files agent should create a minimal, token-efficient file catalog:

  1. Discover files: Use Glob and LS to find all source files, configs, and build files
  2. Group by function: Organize files into logical categories (core, platform, build, tests, config)
  3. Brief descriptions: One line per significant file describing its primary purpose
  4. Key entry points: Highlight main files, build configs, and important headers
  5. Dependencies: Note major relationships between file groups

Format: Concise lists with file paths and single-sentence descriptions. Focus on helping LLMs quickly locate functionality, not comprehensive documentation.

Success criteria: LLMs can quickly find "where is the main entry point", "which files handle X", "what are the key headers" without reading detailed descriptions.

Yakety

Real-time speech-to-text application with hotkey recording and local Whisper transcription. Records audio while holding a keyboard shortcut, transcribes using on-device AI, and pastes text directly into the active application.

Key Entry Points

  • src/main.c - Application entry point and transcription pipeline
  • CMakeLists.txt - Build system with whisper.cpp integration
  • src/audio.c - Audio recording and processing core
  • src/transcription.cpp - Whisper model integration

Quick Start

# Build release version
cmake --preset release
cmake --build --preset release

# Run CLI version
./build/bin/yakety-cli

# Run GUI version (macOS/Windows)
open ./build/bin/Yakety.app  # macOS (app bundles are launched with `open`)
./build/bin/Yakety.exe       # Windows

Documentation

  • Project Overview - Core purpose, technology stack, platform requirements
  • Architecture - System organization, component map, data flow patterns
  • Build System - CMake configuration, build presets, platform setup
  • Development - Code style, patterns, implementation guidelines
  • Testing - GUI test suite, dialog validation, test execution
  • Deployment - Packaging, distribution, remote deployment

Platform Support

  • macOS 14.0+ (Apple Silicon) - Cocoa/SwiftUI interface
  • Windows 10+ - Win32 native interface
  • Linux - Experimental CLI support

Technology Stack

  • Audio: miniaudio (cross-platform capture)
  • Speech Recognition: whisper.cpp (local AI inference)
  • Build System: CMake 3.20+ with Ninja/Visual Studio
  • GUI: Platform-native (Cocoa/SwiftUI on macOS, Win32 on Windows)

Project Overview

Overview

Yakety is a real-time speech-to-text application that provides instant transcription through keyboard shortcuts. It records audio while a hotkey is held down, transcribes the speech using OpenAI's Whisper model, and automatically pastes the transcribed text into the active application. The application is designed for efficient voice-to-text input across desktop workflows.

The project targets both CLI and GUI usage and supports macOS and Windows through platform-specific implementations. It integrates whisper.cpp for on-device transcription, so no cloud service is required and recorded audio stays private. In GUI mode the application runs from the system tray and monitors the keyboard for the recording hotkey.

Key Files

  • src/main.c: Primary application entry point containing initialization sequence, audio processing pipeline, and keyboard event handling (lines 254-388)
  • src/app.h: Cross-platform application framework with platform-specific entry point macros (lines 6-43) and async execution utilities
  • CMakeLists.txt: Build system configuration managing whisper.cpp integration (lines 28-32), platform-specific compilation (lines 48-85), and distribution packaging (lines 358-535)
  • src/transcription.cpp: Whisper model integration and audio processing core (lines 49-100)

Technology Stack

  • Audio Processing: miniaudio library for cross-platform audio capture in src/audio.c with 16kHz mono configuration (lines 9-11)
  • Speech Recognition: whisper.cpp integration for local transcription processing in src/transcription.cpp (lines 14-15)
  • Platform Abstraction: C-style C++ implementation with platform-specific modules in src/mac/ and src/windows/
  • Build System: CMake with custom modules in cmake/ directory, supporting Ninja and Visual Studio generators
  • GUI Framework:
    • macOS: Objective-C and Swift, with SwiftUI dialogs in src/mac/dialogs/
    • Windows: Win32 API in src/windows/ with native dialog implementations

Platform Support

macOS Requirements:

  • Minimum macOS 14.0 (Apple Silicon only, set in CMakeLists.txt line 22)
  • Accessibility permissions for keyboard monitoring (handled in src/main.c lines 78-117)
  • Metal acceleration support via ggml-metal library integration
  • Menu bar interface in src/mac/menu.m

Windows Requirements:

  • Windows 10+ with Visual Studio 2022 build tools
  • Optional Vulkan support for GPU acceleration
  • WSL development environment supported via scripts in wsl/ directory
  • System tray interface in src/windows/menu.c

Cross-Platform Components:

  • Keyboard monitoring: src/mac/keylogger.c and src/windows/keylogger.c
  • Audio recording: src/audio.c with platform-specific audio device handling
  • Preferences storage: src/preferences.c with platform-specific configuration paths
  • Model management: src/models.c with bundled and downloadable Whisper models defined in src/model_definitions.h

Yakety Architecture

Overview

Yakety is a real-time voice transcription application built with a cross-platform C/C++ core and platform-specific UI layers. The system follows a layered architecture with clear separation between business logic, platform abstraction, and native implementations. The core design prioritizes low-latency audio processing, efficient memory management, and responsive user interaction through a unified hotkey system.

The application operates in two modes: a console (CLI) mode for development and testing, and a GUI tray application for production use. Both modes share the same core transcription pipeline but differ in their initialization and user interaction patterns. The system integrates whisper.cpp, a C/C++ port of OpenAI's Whisper model, for speech recognition, providing local processing without cloud dependencies.

Component Map

Core Business Logic (src/)

  • Main Application: main.c (lines 329-390) - Entry point and initialization flow
  • Audio Processing: audio.c, audio.h - Real-time audio capture and buffering
  • Transcription Engine: transcription.cpp, transcription.h - Whisper.cpp integration
  • Model Management: models.c, models.h - Model loading and fallback logic
  • Input Handling: keylogger.h - Cross-platform hotkey detection
  • Menu System: menu.c, menu.h - Tray/menubar interface

Platform Abstraction Layer (src/)

  • Application Framework: app.h - Cross-platform app lifecycle management
  • Preferences: preferences.c, preferences.h - Configuration persistence
  • Utilities: utils.h - Platform-agnostic helper functions
  • Dialog System: dialog.h - Native dialog abstractions

macOS Implementation (src/mac/)

  • App Backend: app.m - NSApplication integration and event loop
  • UI Dialogs: dialogs/*.swift - SwiftUI-based native dialogs
  • System Integration: menu.m, clipboard.m, overlay.m - Cocoa services
  • Input Capture: keylogger.c - Carbon event monitoring
  • Threading: dispatch.m, dispatch.h - GCD-based async execution

Windows Implementation (src/windows/)

  • App Backend: app.c - Win32 application and message loop
  • UI Components: dialog.c, overlay.c - Win32 GUI elements
  • System Services: menu.c, clipboard.c - Windows shell integration
  • Input Capture: keylogger.c - Low-level keyboard hooks

Build System

  • CMake Configuration: CMakeLists.txt (lines 1-535) - Cross-platform build
  • Whisper Integration: cmake/BuildWhisper.cmake - Whisper.cpp compilation
  • Platform Setup: cmake/PlatformSetup.cmake - Platform-specific configuration

Key Files

Core Headers and Data Structures

src/app.h - Application lifecycle management

  • APP_ENTRY_POINT macro (lines 7-43): Platform-specific main() generation
  • app_main() function (line 46): Unified entry point for CLI and GUI modes
  • AppReadyCallback typedef (line 48): Deferred initialization pattern

src/keylogger.h - Input event handling

  • KeyCombination struct (lines 17-20): Multi-key hotkey support
  • KeyCallback typedef (line 8): Event handler function signature
  • KeyInfo struct (lines 11-14): Platform-agnostic key representation

src/transcription.h - Speech processing interface

  • transcription_process() (line 15): Main audio-to-text pipeline
  • transcription_init() (line 8): Whisper model initialization
  • Thread-safe C/C++ boundary with extern "C" wrapper

src/models.h - Model management

  • models_load() (line 7): Unified model loading with fallback logic
  • models_get_current_path() (line 10): Active model path resolution

src/model_definitions.h - Model catalog and metadata

  • ModelInfo struct (lines 6-12): Model metadata for UI and downloads
  • DOWNLOADABLE_MODELS[] array (lines 15-30): Available models with URLs
  • SUPPORTED_LANGUAGES[] array (lines 49-65): Language configuration

Implementation Files

src/main.c - Application bootstrap and flow control

  • on_app_ready() (lines 254-283): Deferred initialization sequence
  • setup_keylogger() (lines 120-148): Permission handling and hotkey setup
  • process_recorded_audio() (lines 169-215): Complete transcription pipeline
  • AppState struct (lines 28-31): Recording state management

src/transcription.cpp - Whisper.cpp integration

  • whisper_context *ctx (line 17): Global Whisper model instance
  • utils_mutex_t *ctx_mutex (line 18): Thread safety for model access
  • null_log_callback() (lines 29-34): Whisper log suppression

src/preferences.c - Configuration persistence

  • Cross-platform config file handling with JSON-like key-value storage
  • KeyCombination serialization for hotkey preferences
  • Platform-specific config directory resolution

Data Flow

Initialization Sequence

  1. Entry Point: main() → app_main() (main.c:329)
  2. Core Setup: Logging, preferences, signal handlers (main.c:340-360)
  3. Platform Init: app_init() calls platform-specific initialization
  4. Deferred Loading: on_app_ready() callback triggered after event loop starts
  5. Model Loading: models_load() → transcription_init() (models.c:24-42)
  6. UI Setup: Menu creation, keylogger initialization with permissions
  7. Ready State: Application monitoring for hotkey events

Transcription Pipeline

  1. Input Trigger: Hotkey press detected by platform keylogger (keylogger.c)
  2. Recording Start: on_key_press() → audio_recorder_start() (main.c:217-231)
  3. Audio Capture: Platform-specific audio recording via miniaudio
  4. Recording Stop: on_key_release() → process_recorded_audio() (main.c:233-250)
  5. Audio Processing: audio_recorder_get_samples() retrieves float buffer
  6. Speech Recognition: transcription_process() → Whisper inference (transcription.cpp:15)
  7. Text Output: clipboard_copy() → clipboard_paste() for immediate insertion
  8. UI Feedback: Overlay shows "Recording" and "Transcribing" states
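The pipeline steps above can be read as a small state machine driven by key events. The sketch below models only the state transitions; the `Sketch*` types and `sketch_*` functions are hypothetical, and the comments mark where the real application would presumably call into the recorder, Whisper, and the clipboard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    SKETCH_IDLE,        /* waiting for the hotkey */
    SKETCH_RECORDING,   /* hotkey held, audio being captured */
    SKETCH_TRANSCRIBING /* hotkey released, inference in progress */
} SketchState;

typedef struct {
    SketchState state;
    int handoffs; /* recordings handed to transcription so far */
} SketchPipeline;

/* Hotkey pressed: start recording only from the idle state, so key
 * repeats and presses during transcription are ignored. */
void sketch_on_key_press(SketchPipeline *p) {
    assert(p != NULL);
    if (p->state == SKETCH_IDLE) {
        p->state = SKETCH_RECORDING; /* real app: start recorder, show overlay */
    }
}

/* Hotkey released: hand the recording to transcription.
 * Returns true if a hand-off happened. */
bool sketch_on_key_release(SketchPipeline *p) {
    assert(p != NULL);
    if (p->state != SKETCH_RECORDING) {
        return false;
    }
    p->state = SKETCH_TRANSCRIBING; /* real app: stop recorder, run Whisper */
    p->handoffs++;
    return true;
}

/* Transcription finished: text has been pasted, return to idle. */
void sketch_on_transcription_done(SketchPipeline *p) {
    assert(p != NULL);
    if (p->state == SKETCH_TRANSCRIBING) {
        p->state = SKETCH_IDLE; /* real app: copy, paste, hide overlay */
    }
}
```

Whether the actual code blocks new recordings while a transcription is running is not stated in the source; the sketch simply picks the most conservative behavior.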

Cross-Platform Abstraction

  • Main Thread Dispatch: macOS uses dispatch_async(dispatch_get_main_queue()), Windows uses PostMessage()
  • Event Loop Integration: macOS NSRunLoop, Windows GetMessage()/DispatchMessage()
  • Permission Handling: macOS Accessibility API, Windows UAC/Admin privileges
  • Resource Management: macOS app bundles with Info.plist, Windows resource files (.rc)

Threading Model

  • Main Thread: UI, event handling, clipboard operations
  • Audio Thread: Real-time audio capture (managed by miniaudio)
  • Background Thread: Whisper inference (CPU/GPU intensive)
  • Synchronization: Mutex protection for Whisper context, atomic operations for state flags

The architecture emphasizes minimal latency for the complete transcription cycle while maintaining thread safety and platform compatibility across macOS and Windows environments.

Build System Documentation

Overview

Yakety uses CMake 3.20+ as its primary build system with multi-language support (C, C++, Swift). The build system is configured through:

  • Main CMake file: CMakeLists.txt - Primary build configuration
  • Build presets: CMakePresets.json - Platform-specific build configurations
  • Helper modules: cmake/ directory with modular build logic
    • cmake/BuildWhisper.cmake - Whisper.cpp dependency management
    • cmake/PlatformSetup.cmake - Platform-specific libraries and frameworks
    • cmake/GenerateIcons.cmake - Asset generation from SVG sources

The system automatically handles whisper.cpp dependency building, model downloading, icon generation, and platform-specific configurations.

Build Workflows

Quick Start Commands

Development builds:

# Release build (recommended for development)
cmake --preset release
cmake --build --preset release

# Debug build
cmake --preset debug
cmake --build --preset debug

Windows debugging with Visual Studio:

# Only on Windows - enables Visual Studio debugging
cmake --preset vs-debug
cmake --build --preset vs-debug

Distribution packaging:

# Build and package for current platform
cmake --build --preset release
cmake --build build --target package

# Platform-specific packages
cmake --build build --target package-macos    # macOS only
cmake --build build --target package-windows  # Windows only

# Upload to server (requires SSH access)
cmake --build build --target upload

Build Targets

The build system generates these executables in build/bin/:

  • yakety-cli - Command-line interface
  • yakety-app - GUI application (platform-specific bundle)
  • recorder - Audio recording utility
  • transcribe - Standalone transcription tool
  • test-* - Platform-specific test executables (macOS only)

Asset Generation

Icons are automatically generated from assets/yakety.svg:

# Manual icon regeneration (automatic during build)
cmake --build build --target generate_icons

Requires: rsvg-convert (librsvg) and magick (ImageMagick)

Platform Setup

macOS Requirements

  • Xcode Command Line Tools: Required for Swift compiler
  • macOS 14.0+: Minimum deployment target
  • Apple Silicon (ARM64): Target architecture
  • System frameworks: Automatically linked
    • CoreFoundation, AppKit, AudioToolbox, AVFoundation
    • Metal frameworks for GPU acceleration

Dependencies:

# Install build tools via Homebrew
brew install librsvg imagemagick ninja cmake

Windows Requirements

  • Visual Studio 2022: For MSVC compiler and debugging
  • CMake 3.20+: Build system
  • Ninja: Fast builds (included in VS2022)
  • Vulkan SDK: Optional GPU acceleration

Environment setup:

  • Set VULKAN_SDK environment variable for GPU support
  • Use winvs.bat script for proper Visual Studio environment

Windows/WSL Remote Development

For development from macOS to Windows via SSH:

Setup scripts:

  • wsl/start-wsl-ssh.bat - Run as Administrator on Windows
  • wsl/setup-wsl-ssh.sh - Configure SSH in WSL

Sync and build workflow:

# 1. Sync source files (excludes build directories)
rsync -av --exclude='build/' --exclude='build-debug/' --exclude='whisper.cpp/' \
  . [email protected]:/mnt/c/workspaces/yakety/

# 2. Configure build
ssh [email protected] "cd /mnt/c/workspaces/yakety && \
  /mnt/c/Windows/System32/cmd.exe /c 'cd c:\\workspaces\\yakety && \
  c:\\workspaces\\winvs.bat && cmake --preset release'"

# 3. Build
ssh [email protected] "cd /mnt/c/workspaces/yakety && \
  /mnt/c/Windows/System32/cmd.exe /c 'cd c:\\workspaces\\yakety && \
  c:\\workspaces\\winvs.bat && cmake --build --preset release'"

# 4. Run CLI
ssh [email protected] "cd /mnt/c/workspaces/yakety && \
  build/bin/yakety-cli.exe"

Linux Requirements (Experimental)

  • GCC/Clang: C/C++ compiler
  • ALSA/PulseAudio: Audio system libraries
  • CMake 3.20+, Ninja: Build tools

Reference

CMake Presets

From CMakePresets.json:

Configure presets:

  • release - Ninja generator, Release build, build/ directory
  • debug - Ninja generator, Debug build, build-debug/ directory
  • vs-debug - Visual Studio 2022, Windows-only debugging

Build presets:

  • release - Build release configuration
  • debug - Build debug configuration
  • vs-debug - Build Windows VS debug configuration

Whisper.cpp Integration

Automatic dependency management via cmake/BuildWhisper.cmake:

  • Auto-clone: Downloads whisper.cpp from GitHub if missing
  • Platform optimization:
    • macOS: Metal GPU acceleration, ARM64 architecture
    • Windows: Native CPU optimization, optional Vulkan GPU
  • Model download: Automatically downloads ggml-base-q8_0.bin (110MB)
  • Static linking: All whisper libraries statically linked

Code Signing (macOS)

Automatic ad-hoc signing via cmake/PlatformSetup.cmake:

# Manual signing
./sign-app.sh  # Signs and removes quarantine

Troubleshooting

Whisper.cpp build failures:

  • Verify internet connection for auto-download
  • Check disk space (whisper.cpp ~500MB + model ~110MB)
  • On Windows: Ensure Visual Studio environment is loaded

Swift compilation warnings:

  • Incremental compilation disabled via CMAKE_Swift_FLAGS
  • Normal for mixed C/Swift projects

Icon generation failures:

  • Install librsvg: brew install librsvg (macOS) or apt install librsvg2-bin (Linux)
  • Install ImageMagick: brew install imagemagick (macOS)

Windows Vulkan not detected:

  • Install Vulkan SDK and set VULKAN_SDK environment variable
  • Restart command prompt after installation

Linking errors:

  • Clean build directories: rm -rf build build-debug
  • Rebuild whisper.cpp: rm -rf whisper.cpp/build
  • On Windows: Match Debug/Release configuration with whisper.cpp

Development Guide

Overview

Yakety is a voice transcription tool using a C-style C++ architecture with platform abstraction. The codebase follows C conventions with minimal C++ usage (only for whisper.cpp integration). Core features:

  • Cross-platform: macOS (Objective-C/Swift) and Windows (Win32 API)
  • Singleton patterns: Audio recorder, preferences, models
  • Platform abstraction: Clean separation between core logic and platform code
  • Minimal dependencies: Uses system APIs directly

Code Style

C-Style C++ Conventions

File Extensions:

  • .c - Pure C code
  • .cpp - C++ code (only when whisper.cpp features needed)
  • .m - Objective-C (macOS platform layer)
  • .swift - SwiftUI dialogs (macOS only)

Naming Conventions:

// Functions: module_action_object
bool audio_recorder_init(void);           // src/audio.c:82
int keylogger_set_combination(combo);     // Keylogger API
void preferences_set_string(key, value);  // src/preferences.h:24

// Types: CamelCase with descriptive names
typedef struct {
    ma_device device;
    float *buffer;
    bool is_recording;        // Atomic access required
} AudioRecorder;              // src/audio.c:14-32

// Constants: UPPER_CASE
#define WHISPER_SAMPLE_RATE 16000        // src/audio.c:10
#define MIN_RECORDING_DURATION 0.1       // src/main.c:26

C-Style Casting:

AudioRecorder *recorder = (AudioRecorder *) pDevice->pUserData;  // src/audio.c:39
const float *input = (const float *) pInput;                     // src/audio.c:45

Header Guards:

#ifndef AUDIO_H
#define AUDIO_H
// ... content ...
#endif // AUDIO_H

Common Patterns

Singleton Pattern

Audio Recorder (src/audio.h, src/audio.c):

// Global singleton instance
static AudioRecorder *g_recorder = NULL;

bool audio_recorder_init(void) {
    if (g_recorder) {
        return false; // Already initialized
    }
    g_recorder = (AudioRecorder *) calloc(1, sizeof(AudioRecorder));
    if (!g_recorder) {
        return false; // Allocation failed
    }
    // ... initialization ...
    return true;
}

void audio_recorder_cleanup(void) {
    if (!g_recorder) return;
    // ... cleanup ...
    free(g_recorder);
    g_recorder = NULL;
}

Platform Abstraction

Directory Structure:

src/
├── audio.h/c          # Cross-platform core logic
├── utils.h            # Platform abstraction interface
├── mac/               # macOS implementations
│   ├── app.m          # NSApplication handling
│   ├── utils.m        # Platform-specific utilities
│   └── dialogs/       # SwiftUI dialog implementations
└── windows/           # Windows implementations
    ├── app.c          # Win32 application handling
    └── utils.c        # Platform-specific utilities

Interface Pattern (src/utils.h):

// Cross-platform interface
void utils_open_accessibility_settings(void);
bool utils_set_launch_at_login(bool enabled);
double utils_get_time(void);

// Platform implementations differ:
// - src/mac/utils.m: Uses NSWorkspace, CFAbsoluteTimeGetCurrent
// - src/windows/utils.c: Uses ShellExecute, GetTickCount64

App Initialization Pattern (src/app.h):

typedef void (*AppReadyCallback)(void);
int app_init(const char *name, const char *version, bool is_console, AppReadyCallback on_ready);

// Platform-specific implementations:
// - src/mac/app.m: Uses NSApplication, NSApplicationDelegate
// - src/windows/app.c: Uses CreateWindow, message pump
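The deferred-initialization idea behind `AppReadyCallback` can be illustrated with a platform-free mock. This is only a sketch: `sketch_app_init`, `sketch_app_loop_started`, and the example callback are hypothetical stand-ins, and the real `app_init` takes additional arguments and hands control to NSApplication or the Win32 message pump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*AppReadyCallback)(void); /* same shape as the typedef in src/app.h */

static AppReadyCallback g_on_ready = NULL;
static bool g_ready_fired = false;

/* Stores the callback without running it; expensive setup is deferred. */
int sketch_app_init(AppReadyCallback on_ready) {
    assert(on_ready != NULL);
    g_on_ready = on_ready;
    g_ready_fired = false;
    return 0;
}

/* Stand-in for "the event loop has started": fires the callback once. */
void sketch_app_loop_started(void) {
    if (g_on_ready != NULL && !g_ready_fired) {
        g_ready_fired = true;
        g_on_ready();
    }
}

/* Example callback: a real one would load models and create the menu. */
int example_ready_calls = 0;
void example_on_ready(void) {
    example_ready_calls++;
}
```

Deferring the heavy work until the loop is running keeps the UI responsive during startup, which appears to be the motivation for the callback in src/app.h.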

Thread Safety

Atomic Operations (src/audio.c:41, src/utils.h:43-46):

// Thread-safe boolean access
bool utils_atomic_read_bool(bool *ptr);
void utils_atomic_write_bool(bool *ptr, bool value);

// Usage in audio callback (audio thread → main thread)
if (!utils_atomic_read_bool(&recorder->is_recording)) {
    return;
}
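For readers unfamiliar with the pattern, the same early-return guard can be written with standard C11 atomics. This is a sketch under the assumption that `<stdatomic.h>` is available; it is not how `utils_atomic_read_bool` is implemented (those helpers take a plain `bool *`), and the `Sketch*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_bool is_recording; /* written by the main thread, read by the audio thread */
    size_t frames_captured;   /* touched only by the audio callback */
} SketchRecorder;

void sketch_recorder_init(SketchRecorder *r) {
    assert(r != NULL);
    atomic_init(&r->is_recording, false);
    r->frames_captured = 0;
}

/* Main thread: flip the flag when the hotkey is pressed or released. */
void sketch_set_recording(SketchRecorder *r, bool on) {
    assert(r != NULL);
    atomic_store(&r->is_recording, on);
}

/* Audio thread: drop incoming frames unless recording is active. */
void sketch_data_callback(SketchRecorder *r, size_t frame_count) {
    assert(r != NULL);
    if (!atomic_load(&r->is_recording)) {
        return;
    }
    r->frames_captured += frame_count;
}
```

The atomic flag guarantees the audio thread sees a consistent value without taking a lock inside the real-time callback.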

Error Handling

Return Code Pattern:

// Success: 0, Failure: -1 or non-zero
int audio_recorder_start(void);          // src/audio.h:17
int keylogger_init(callbacks, userdata); // Returns 0 on success

// Boolean for simple operations
bool audio_recorder_init(void);          // src/audio.h:10
bool preferences_init(void);             // src/preferences.h:9

Failure Handling (reset state, return error code):

if (ma_device_start(&recorder->device) != MA_SUCCESS) {
    utils_atomic_write_bool(&recorder->is_recording, false);
    return -1;
}

SwiftUI Dialog Pattern

Modal Dialog Implementation (src/mac/dialogs/dialog_utils.swift:18-23):

func runModalDialog<T: View, StateType: ModalDialogState>(
    content: T,
    state: StateType,
    windowSize: NSSize = NSSize(width: 400, height: 200),
    windowTitle: String = ""
) -> StateType.ResultType

Dialog State Protocol:

protocol ModalDialogState: ObservableObject {
    associatedtype ResultType
    var isCompleted: Bool { get set }
    var result: ResultType { get }
    func reset()
}

Workflows

Adding a New Feature

  1. Core Logic - Implement in src/ using C-style conventions
  2. Platform Interface - Add declarations to appropriate header (e.g., src/utils.h)
  3. Platform Implementation - Implement in src/mac/ and src/windows/
  4. Integration - Wire up in src/main.c app lifecycle

Platform-Specific Dialog

  1. macOS: Create SwiftUI view in src/mac/dialogs/
  2. Windows: Implement Win32 dialog in src/windows/dialog.c
  3. Interface: Add C function declaration in src/dialog.h

Audio Processing

Audio pipeline follows whisper.cpp requirements:

#define WHISPER_SAMPLE_RATE 16000  // Fixed 16kHz
#define WHISPER_CHANNELS 1         // Mono only

Recording flow (src/main.c:168-215):

Key Press → audio_recorder_start() → data_callback() fills buffer
Key Release → audio_recorder_stop() → get_samples() → transcription_process()

Reference

File Organization

Core Modules:

  • src/main.c - App entry point and lifecycle (329-391)
  • src/audio.c/h - Audio recording singleton
  • src/preferences.c/h - Configuration management
  • src/models.c/h - Whisper model loading
  • src/transcription.cpp/h - Whisper.cpp integration (C++)

Platform Abstraction:

  • src/utils.h - Cross-platform interface definitions
  • src/app.h - Application framework interface
  • src/mac/ - macOS implementations (Objective-C/Swift)
  • src/windows/ - Windows implementations (Win32 C)

Build System:

  • CMakeLists.txt - Main build configuration
  • cmake/PlatformSetup.cmake - Platform-specific setup
  • cmake/BuildWhisper.cmake - Whisper.cpp integration

Key Constants

#define WHISPER_SAMPLE_RATE 16000           // Audio format for transcription
#define WHISPER_CHANNELS 1                  // Mono audio
#define MIN_RECORDING_DURATION 0.1          // Minimum recording length
#define PERMISSION_RETRY_DELAY_MS 500       // macOS permission retry delay
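As a rough illustration of how these constants relate, the sketch below converts a sample count to a duration and compares it against the minimum. It assumes MIN_RECORDING_DURATION is expressed in seconds, which the source does not state, and the `sketch_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WHISPER_SAMPLE_RATE 16000
#define WHISPER_CHANNELS 1
#define MIN_RECORDING_DURATION 0.1 /* ASSUMPTION: unit is seconds */

/* The arithmetic below treats one sample as one frame. */
static_assert(WHISPER_CHANNELS == 1, "sketch assumes mono audio");

/* Duration in seconds of a mono buffer holding sample_count samples. */
double sketch_duration_seconds(size_t sample_count) {
    return (double)sample_count / (WHISPER_SAMPLE_RATE * WHISPER_CHANNELS);
}

/* True if the recording is long enough to be worth transcribing. */
bool sketch_is_long_enough(size_t sample_count) {
    return sketch_duration_seconds(sample_count) >= MIN_RECORDING_DURATION;
}
```

Under that assumption, a 0.1 second minimum corresponds to 1,600 samples at 16 kHz mono.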

Build Presets

Development:

cmake --preset debug    # Debug build with Ninja
cmake --preset release  # Release build with Ninja

Windows Debugging:

cmake --preset vs-debug # Visual Studio generator for debugging

Common Issues

macOS Accessibility Permissions:

  • Handle in src/main.c:78-117 with dialog prompts
  • Retry mechanism for permission granting

Thread Safety:

  • Audio callback runs on separate thread
  • Use utils_atomic_* for shared state access
  • Main app state in src/main.c:28-33
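The signatures of the utils_atomic_* helpers are not shown here, so the following sketch uses C11 `<stdatomic.h>` as a stand-in to illustrate the same idea: a flag shared between the audio callback thread and the main thread. All `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared flag; stands in for state guarded by utils_atomic_*. */
static atomic_bool sketch_recording = false;

/* Safe to call from any thread, including the audio callback. */
bool sketch_is_recording_now(void) {
    return atomic_load(&sketch_recording);
}

/* Atomically move from idle to recording.
   Returns false if a recording was already in progress. */
bool sketch_try_begin_recording(void) {
    bool expected = false;
    return atomic_compare_exchange_strong(&sketch_recording, &expected, true);
}

/* Atomically clear the flag; ending a recording that never began
   is treated as a logic error in debug builds. */
void sketch_end_recording(void) {
    bool was_recording = atomic_exchange(&sketch_recording, false);
    assert(was_recording);
    (void)was_recording; /* silence unused warning when NDEBUG is defined */
}
```

The compare-and-exchange makes "check then set" a single step, so two rapid key presses cannot both start a recording.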

Model Loading:

  • Single unified function: models_load() in src/models.c
  • Handles download dialogs and fallback logic
  • Path management through preferences system

Memory Management:

  • Consistent use of malloc/free for C compatibility
  • Audio buffer auto-resizing in src/audio.c:54-66
  • Caller owns returned buffers (e.g., audio_recorder_get_samples)
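A minimal sketch of the two memory conventions above: a buffer that grows on demand, and an accessor that returns a copy the caller must free. The `sketch_buffer` type, the doubling strategy, and the initial capacity are assumptions for illustration and may differ from src/audio.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable sample buffer. */
typedef struct {
    float *data;
    size_t count;
    size_t capacity;
} sketch_buffer;

/* Append n samples, doubling capacity as needed.
   Returns false on allocation failure and leaves the buffer unchanged. */
bool sketch_buffer_append(sketch_buffer *b, const float *src, size_t n) {
    assert(b != NULL && (src != NULL || n == 0));
    if (n == 0) return true;
    size_t needed = b->count + n;
    if (needed > b->capacity) {
        size_t cap = b->capacity ? b->capacity : 1024;
        while (cap < needed) cap *= 2;
        float *grown = realloc(b->data, cap * sizeof *grown);
        if (grown == NULL) return false;
        b->data = grown;
        b->capacity = cap;
    }
    memcpy(b->data + b->count, src, n * sizeof *src);
    b->count = needed;
    return true;
}

/* Return a malloc'd copy of the samples; the caller owns it and must free() it.
   Returns NULL with *out_count == 0 if the buffer is empty or allocation fails. */
float *sketch_buffer_get_samples(const sketch_buffer *b, size_t *out_count) {
    *out_count = 0;
    if (b->count == 0) return NULL;
    float *copy = malloc(b->count * sizeof *copy);
    if (copy == NULL) return NULL;
    memcpy(copy, b->data, b->count * sizeof *copy);
    *out_count = b->count;
    return copy;
}

/* Release the buffer's own storage; copies handed out earlier stay valid. */
void sketch_buffer_free(sketch_buffer *b) {
    free(b->data);
    b->data = NULL;
    b->count = 0;
    b->capacity = 0;
}
```

Returning a copy keeps ownership unambiguous: the recorder can reuse or free its internal storage while the caller still holds valid samples.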

Testing Documentation

Overview

Yakety uses manual interactive tests for GUI components and dialog validation. All test programs are located in /src/tests/ and test specific platform dialog implementations using the app's GUI framework integration.

Test File Locations: /src/tests/test_*.c

Test Types

Dialog Integration Tests

Manual GUI tests that validate platform-specific dialog implementations:

  • /src/tests/test_model_dialog.c - Models & Language selection dialog
  • /src/tests/test_keycombination_dialog.c - Hotkey capture dialog with keylogger integration
  • /src/tests/test_download_dialog.c - Model download progress dialog

Test Structure

All tests follow this pattern:

  • Initialize platform app framework (app_init)
  • Set up test-specific dependencies (keylogger, callbacks)
  • Execute dialog function with test parameters
  • Validate results and clean up resources
  • Exit with status code
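The pattern above might look roughly like the following. The real app_init, dialog, and cleanup functions have signatures not shown in this document, so the sketch replaces them with hypothetical `sketch_*` stubs and simulates the user's choice with a flag instead of opening a GUI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the platform app framework. */
static bool sketch_initialized = false;

bool sketch_app_init(void) {
    sketch_initialized = true;
    return true;
}

void sketch_app_cleanup(void) {
    sketch_initialized = false;
}

/* Simulated dialog: writes a placeholder selection and returns true
   when the (simulated) user confirms, false on cancel. */
bool sketch_show_dialog(bool simulate_confirm, char *out, size_t out_size) {
    assert(sketch_initialized); /* dialogs require an initialized app */
    if (!simulate_confirm) return false;
    snprintf(out, out_size, "%s", "example-model"); /* placeholder value */
    return true;
}

/* Test body: init, run dialog, report result, clean up, return status.
   A real test program would return this value from main(). */
int sketch_run_dialog_test(bool simulate_confirm) {
    if (!sketch_app_init()) {
        fprintf(stderr, "Failed to initialize app\n");
        return 1;
    }
    char selection[64] = "";
    if (sketch_show_dialog(simulate_confirm, selection, sizeof selection)) {
        printf("Selected: %s\n", selection);
    } else {
        printf("Dialog cancelled\n");
    }
    sketch_app_cleanup();
    return 0;
}
```

Cancellation is reported but still exits with status 0 in this sketch, since a cancelled dialog is a valid outcome of a manual test rather than a failure.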

Running Tests

Prerequisites

# Configure and build project first
cmake --preset release  # or debug
cmake --build --preset release

Test Execution Commands

Model Dialog Test:

./build/bin/test-model-dialog

Expected: GUI dialog opens for model/language selection, prints selected values or cancellation.

Key Combination Dialog Test:

./build/bin/test-keycombination-dialog

Expected: GUI dialog captures key combinations, prints key codes and modifier flags.

Download Dialog Test:

./build/bin/test-download-dialog

Expected: Downloads test file (1KB from httpbin.org), shows progress dialog, cleans up temp file.

Platform Availability

The tests are built only on macOS because they depend on the Cocoa and SwiftUI frameworks. Windows equivalents would require separate implementations using platform-specific dialog APIs.

Reference

Test File Organization

src/tests/
├── test_model_dialog.c         # Model selection dialog test
├── test_keycombination_dialog.c # Hotkey capture dialog test  
└── test_download_dialog.c      # Download progress dialog test

CMake Test Targets

Test executables are defined in /CMakeLists.txt lines 502-532:

# Test programs (macOS only)
add_executable(test-model-dialog src/tests/test_model_dialog.c)
add_executable(test-keycombination-dialog src/tests/test_keycombination_dialog.c)  
add_executable(test-download-dialog src/tests/test_download_dialog.c)

Test Dependencies

  • Platform library: Core app and dialog functions
  • Cocoa framework: macOS GUI integration (-framework Cocoa)
  • SwiftUI framework: Modern dialog implementations (-framework SwiftUI)
  • Keylogger: Required for hotkey capture testing

Test Output Location

All test executables built to: /build/bin/test-*

Creating New Tests

  1. Add test source file in /src/tests/test_<feature>.c
  2. Follow existing test pattern with app_init, test logic, app_cleanup
  3. Add CMake target in main /CMakeLists.txt (macOS section)
  4. Link required platform libraries and frameworks
  5. Ensure proper cleanup and exit codes for automation

Deployment Documentation

Overview

Yakety provides multiple packaging and distribution options for cross-platform deployment. The build system includes automated packaging targets for creating distributable archives, DMG installers, and deployment to remote servers.

Package Types

CLI Distribution Packages

  • Target: package-cli-macos | package-cli-windows
  • Output: yakety-cli-{platform}.zip
  • Location: ${CMAKE_BINARY_DIR}/
  • Contents: CLI tools (yakety-cli, recorder, transcribe), models, assets

App Distribution Packages

  • Target: package-app-macos | package-app-windows
  • Output: macOS: Yakety-macos.dmg + Yakety-macos.zip | Windows: Yakety-windows.zip
  • Location: ${CMAKE_BINARY_DIR}/
  • Contents: Application bundles with embedded resources

Universal Package Target

  • Target: package
  • Behavior: Platform-conditional (package-macos on Darwin, package-windows on Windows)

Platform Deployment

macOS

# Build release binaries
cmake --preset release
cmake --build --preset release

# Create CLI distribution
cmake --build build --target package-cli-macos
# Output: build/yakety-cli-macos.zip

# Create app distribution with DMG
cmake --build build --target package-app-macos
# Output: build/Yakety-macos.dmg, build/Yakety-macos.zip

# Create all packages
cmake --build build --target package-macos

DMG Creation Process:

  1. Copies app bundle to temp directory
  2. Creates Applications symlink for drag-install
  3. Generates DMG with hdiutil create
  4. Creates compressed ZIP of DMG

Code Signing: Automatic ad-hoc signing with codesign --force --deep --sign -

Windows

# Build release binaries
cmake --preset release
cmake --build --preset release

# Create CLI distribution
cmake --build build --target package-cli-windows
# Output: build/yakety-cli-windows.zip

# Create app distribution
cmake --build build --target package-app-windows
# Output: build/Yakety-windows.zip

# Create all packages
cmake --build build --target package-windows

Windows-specific:

  • CLI executable: yakety-cli.exe
  • GUI executable: Yakety.exe (WIN32 app without console)
  • Vulkan acceleration support (if VULKAN_SDK available)

Remote Deployment

# Upload packages to server
cmake --build build --target upload

Upload Destinations:

  • Target Server: [email protected]
  • Path: /home/badlogic/mariozechner.at/html/uploads/
  • Method: Windows: SCP via batch script | Unix: rsync

Website Deployment

Frontend-only Deploy:

cd website
./publish.sh

Full Deploy with Server Restart:

cd website
./publish.sh server

Docker Control:

cd website/docker
./control.sh start      # Production mode
./control.sh startdev   # Development mode
./control.sh stop       # Stop services
./control.sh logs       # View logs
./control.sh restart    # Restart services

Reference

Build Outputs

  • Binary Directory: ${CMAKE_BINARY_DIR}/bin/
  • CLI Tools: yakety-cli, recorder, transcribe
  • GUI Apps: Yakety.app (macOS bundle) | Yakety.exe (Windows)
  • Models: bin/models/ggml-base-q8_0.bin
  • Assets: bin/menubar.png

Distribution Archives

  • macOS CLI: yakety-cli-macos.zip
  • macOS App: Yakety-macos.dmg, Yakety-macos.zip
  • Windows CLI: yakety-cli-windows.zip
  • Windows App: Yakety-windows.zip

Website Configuration

  • Production Domain: yakety.ai, www.yakety.ai
  • SSL: Let's Encrypt via nginx-proxy
  • Server Stack: Docker (Nginx + Node.js)
  • Deployment: rsync to slayer.marioslab.io

Build Presets

  • Release: cmake --preset release (Ninja, optimized)
  • Debug: cmake --preset debug (Ninja, debugging symbols)
  • VS Debug: cmake --preset vs-debug (Visual Studio, Windows only)

WSL/Remote Development

  • Target: Windows machine at 192.168.1.21
  • Sync Command: rsync -av --exclude='build/' --exclude='whisper.cpp/' . [email protected]:/mnt/c/workspaces/yakety/
  • Build via SSH: Uses cmd.exe with winvs.bat environment setup