@tomaslucas
Created June 9, 2025 17:45
# Graphical representation of the Continuous Improvement Flywheel
flowchart TD
    %% Style definitions
    classDef development fill:#F8F4FF,stroke:#6B46C1,stroke-width:2px,color:#1F2937
    classDef evaluation fill:#F0FDF4,stroke:#16A34A,stroke-width:2px,color:#1F2937
    classDef ci fill:#FEF2F2,stroke:#DC2626,stroke-width:2px,color:#1F2937
    classDef deployment fill:#F0F9FF,stroke:#2563EB,stroke-width:2px,color:#1F2937
    classDef monitoring fill:#FFFBEB,stroke:#D97706,stroke-width:2px,color:#1F2937
    classDef analysis fill:#FAFAF9,stroke:#78716C,stroke-width:2px,color:#1F2937
    classDef improvement fill:#FFF7ED,stroke:#EA580C,stroke-width:2px,color:#1F2937

    %% Main cycle nodes
    A["🛠️ 1. Develop & Analyze (Initial)<br/><br/>📥 Inputs:<br/>• Initial prompts<br/>• RAG configuration<br/>• Connected tools<br/><br/>📤 Outputs:<br/>• Initial failure modes<br/>• Error analysis (1st pass)"]
    
    B["📊 2. Measure & Build Evaluations<br/><br/>🎯 Actions:<br/>• Translate failures to metrics<br/>• Create automated evaluators<br/>• Build initial golden dataset<br/>• Collaborative evaluation<br/><br/>📁 Artifacts:<br/>• Golden dataset<br/>• Evaluators<br/>• Metrics"]
    
    C["⚙️ 3. Integrate Evaluation in CI<br/><br/>🔧 Actions:<br/>• Integrate evaluators as tests<br/>• Add golden dataset as regressions<br/><br/>📋 Artifacts:<br/>• CI pipeline with tests"]
    
    D["🚀 4. Deploy (CD) with Observability<br/><br/>📡 Actions:<br/>• Instrumented deployment<br/>• Evaluators on real traffic<br/>• LLM-Judge pinning"]
    
    E["📈 5. Monitor Online Performance<br/><br/>👀 Actions:<br/>• Track quality metrics (θ^, CI)<br/>• Dashboards + alerts<br/>• Product metrics<br/>• User feedback"]
    
    F["🚨 6. Identify Deviations/New Failures<br/><br/>🔍 Actions:<br/>• Drift detection<br/>• Feedback analysis<br/>• Proactive discovery<br/>• Human sampling"]
    
    G["🧠 7. Re-Analyze (New Errors)<br/><br/>🔬 Actions:<br/>• Error analysis on problem traces<br/>• Recent issues identified"]
    
    H["🔄 8. Update Evaluation Artifacts<br/><br/>✨ Actions:<br/>• Add to golden dataset<br/>• Refine evaluators<br/>• Validate LLM-judge (TPR/TNR)<br/>• Update CI tests"]
    
    I["⚡ 9. Improve Pipeline<br/><br/>🎨 Strategies:<br/>• Refine prompts<br/>• Decompose tasks<br/>• Adjust RAG<br/>• Improve tools<br/>• Judicious fine-tuning"]
    
    J["🔁 10. Redeploy & Iterate<br/><br/>🚀 Actions:<br/>• Launch improved version<br/>• Resume monitoring"]

    %% Main cycle flow
    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G --> H
    H --> C
    H --> I
    I --> J
    J --> D

    %% Feedback connections
    F -.-> E
    G -.-> B
    I -.-> A

    %% Apply styles
    class A development
    class B evaluation
    class C ci
    class D deployment
    class E monitoring
    class F analysis
    class G analysis
    class H evaluation
    class I improvement
    class J deployment

    %% Explanatory notes (left side)
    K["💡 CYCLE CHARACTERISTICS<br/><br/>🔄 Non-linear loop:<br/>   Phases can be executed<br/>   in different order by context<br/><br/>⚡ Step skipping:<br/>   Possible to omit phases<br/>   when not necessary<br/><br/>🔁 Internal cycles:<br/>   Iterations within<br/>   process subsections<br/><br/>📡 Continuous feedback:<br/>   Information flows in<br/>   multiple directions"]
    
    %% Position notes to the left
    K ~~~ A
    
    style K fill:#FAFAFA,stroke:#6B7280,stroke-width:1px,color:#374151,stroke-dasharray: 3 3
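
Steps 5 and 8 of the diagram mention tracking a quality estimate with a confidence interval (θ^, CI) and validating the LLM judge by TPR/TNR. The diagram does not define θ^; the sketch below assumes it is the true pass rate, estimated from the judge's observed pass rate and corrected for judge error with the standard Rogan-Gladen formula, with a percentile bootstrap interval around it. The function names, the (human_pass, judge_pass) data layout, and the 95% level are illustrative assumptions, not part of the original flowchart.

```python
import random


def rate(flags):
    """Fraction of truthy entries in a non-empty sequence."""
    return sum(flags) / len(flags)


def corrected_pass_rate(p_obs, tpr, tnr):
    """Rogan-Gladen correction: recover the true pass rate from the
    judge's observed pass rate, given the judge's TPR and TNR."""
    denom = tpr + tnr - 1.0
    if denom <= 0:
        raise ValueError("judge is no better than chance (TPR + TNR <= 1)")
    # Clip to [0, 1] because sampling noise can push the estimate outside.
    return min(1.0, max(0.0, (p_obs + tnr - 1.0) / denom))


def estimate_theta(labeled, judge_passes, n_boot=2000, seed=0):
    """Return (theta_hat, (ci_low, ci_high)).

    labeled:      (human_pass, judge_pass) bool pairs, e.g. from the golden
                  dataset; assumed to contain both passing and failing items.
    judge_passes: bool judge verdicts on unlabeled production traces.
    """
    pos = [j for h, j in labeled if h]          # judge says pass on true passes
    neg = [not j for h, j in labeled if not h]  # judge says fail on true fails
    tpr, tnr = rate(pos), rate(neg)
    theta_hat = corrected_pass_rate(rate(judge_passes), tpr, tnr)

    # Percentile bootstrap: resample the labeled set and the production
    # verdicts so the interval reflects uncertainty in TPR, TNR and p_obs.
    rng = random.Random(seed)
    samples = []
    for _ in range(n_boot):
        b_pos = rng.choices(pos, k=len(pos))
        b_neg = rng.choices(neg, k=len(neg))
        b_prod = rng.choices(judge_passes, k=len(judge_passes))
        try:
            samples.append(
                corrected_pass_rate(rate(b_prod), rate(b_pos), rate(b_neg))
            )
        except ValueError:
            continue  # skip degenerate resamples where the judge looks random
    samples.sort()
    n = len(samples)
    ci_low = samples[int(0.025 * n)]
    ci_high = samples[min(n - 1, int(0.975 * n))]
    return theta_hat, (ci_low, ci_high)
```

For example, a judge with TPR = 0.9 and TNR = 0.8 that passes 69% of production traces gives a corrected estimate of (0.69 + 0.8 - 1) / (0.9 + 0.8 - 1) = 0.70. A wide interval usually signals that the labeled set behind TPR/TNR is too small, which is one reason to keep adding to the golden dataset in step 8.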