flowchart TD
%% Style definitions
classDef development fill:#F8F4FF,stroke:#6B46C1,stroke-width:2px,color:#1F2937
classDef evaluation fill:#F0FDF4,stroke:#16A34A,stroke-width:2px,color:#1F2937
classDef ci fill:#FEF2F2,stroke:#DC2626,stroke-width:2px,color:#1F2937
classDef deployment fill:#F0F9FF,stroke:#2563EB,stroke-width:2px,color:#1F2937
classDef monitoring fill:#FFFBEB,stroke:#D97706,stroke-width:2px,color:#1F2937
classDef analysis fill:#FAFAF9,stroke:#78716C,stroke-width:2px,color:#1F2937
classDef improvement fill:#FFF7ED,stroke:#EA580C,stroke-width:2px,color:#1F2937
%% Main cycle nodes
A["1. Develop & Analyze (Initial)<br/><br/>Inputs:<br/>• Initial prompts<br/>• RAG configuration<br/>• Connected tools<br/><br/>Outputs:<br/>• Initial failure modes<br/>• Error analysis (1st pass)"]
B["2. Measure & Build Evaluations<br/><br/>Actions:<br/>• Translate failures to metrics<br/>• Create automated evaluators<br/>• Build initial golden dataset<br/>• Collaborative evaluation<br/><br/>Artifacts:<br/>• Golden dataset<br/>• Evaluators<br/>• Metrics"]
C["3. Integrate Evaluation in CI<br/><br/>Actions:<br/>• Integrate evaluators as tests<br/>• Add golden dataset as regressions<br/><br/>Artifacts:<br/>• CI pipeline with tests"]
D["4. Deploy (CD) with Observability<br/><br/>Actions:<br/>• Instrumented deployment<br/>• Evaluators on real traffic<br/>• LLM-Judge pinning"]
E["5. Monitor Online Performance<br/><br/>Actions:<br/>• Track quality metrics (θ̂, CI)<br/>• Dashboards + alerts<br/>• Product metrics<br/>• User feedback"]
F["6. Identify Deviations/New Failures<br/><br/>Actions:<br/>• Drift detection<br/>• Feedback analysis<br/>• Proactive discovery<br/>• Human sampling"]
G["7. Re-Analyze (New Errors)<br/><br/>Actions:<br/>• Error analysis on problem traces<br/>• Recent issues identified"]
H["8. Update Evaluation Artifacts<br/><br/>Actions:<br/>• Add to golden dataset<br/>• Refine evaluators<br/>• Validate LLM-judge (TPR/TNR)<br/>• Update CI tests"]
I["9. Improve Pipeline<br/><br/>Strategies:<br/>• Refine prompts<br/>• Decompose tasks<br/>• Adjust RAG<br/>• Improve tools<br/>• Judicious fine-tuning"]
J["10. Redeploy & Iterate<br/><br/>Actions:<br/>• Launch improved version<br/>• Resume monitoring"]
%% Main cycle flow
A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
G --> H
H --> C
H --> I
I --> J
J --> D
%% Feedback connections
F -.-> E
G -.-> B
I -.-> A
%% Apply styles
class A development
class B evaluation
class C ci
class D deployment
class E monitoring
class F analysis
class G analysis
class H evaluation
class I improvement
class J deployment
%% Explanatory notes (left side)
K["CYCLE CHARACTERISTICS<br/><br/>Non-linear loop:<br/> Phases can be executed<br/> in different order by context<br/><br/>Step skipping:<br/> Possible to omit phases<br/> when not necessary<br/><br/>Internal cycles:<br/> Iterations within<br/> process subsections<br/><br/>Continuous feedback:<br/> Information flows in<br/> multiple directions"]
%% Position notes to the left
K ~~~ A
style K fill:#FAFAFA,stroke:#6B7280,stroke-width:1px,color:#374151,stroke-dasharray: 3 3
Created
June 9, 2025 17:45
# Graphical representation of the Continuous Improvement Flywheel