Readme Example

This is a proposal for how to label AI-generated code so that it can be distinguished from human-authored code.

These annotations are useful for human-driven code review, giving reviewers more context on where the code comes from. They are also useful for AI and code tooling: AIs can find places where generated code can be optimized, and future models can better distinguish generated code from human code.

Core Spec

Any file with annotated AI code must contain @ai-generated in a comment within the file. AI-generated sections are labelled with a preceding line and a following line, taking inspiration from projects like HackCodegen.

The preceding line comment must begin with BEGIN AI SECTION and the following line comment must begin with END AI SECTION.
The comment format uses the line comment format native to the given programming language. Shell languages and Python use # followed by a space, and C-style languages, including JavaScript, Java, and Rust, use // followed by a space.
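
The marker rules above can be sketched as a small helper that wraps a code snippet in the two marker lines. This is only an illustration: the LINE_COMMENT_PREFIX table and the wrap_ai_section function are hypothetical names, not part of the proposal, and the table covers only the languages named above.

```python
# Hypothetical helper; names are illustrative and not defined by the proposal.
# Maps a language name to its line-comment prefix, including the trailing space.
LINE_COMMENT_PREFIX = {
    "shell": "# ",
    "python": "# ",
    "javascript": "// ",
    "java": "// ",
    "rust": "// ",
}


def wrap_ai_section(code, language):
    """Surround `code` with BEGIN/END AI SECTION line comments."""
    prefix = LINE_COMMENT_PREFIX[language]
    body = code.rstrip("\n")  # avoid a blank line before the END marker
    return f"{prefix}BEGIN AI SECTION\n{body}\n{prefix}END AI SECTION\n"


wrapped = wrap_ai_section("def add(a, b):\n    return a + b\n", "python")
print(wrapped)
```

Note that the file would still need an @ai-generated comment somewhere for the annotation to be complete; this sketch only produces the section markers.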

Metadata

AI-generated sections can also contain metadata about how the code was generated. Common properties include the model used, such as gpt-4, and the prompt.

Properties are cascading. Properties defined in the @ai-generated section apply to all code sections, unless overridden by properties defined in the BEGIN AI SECTION comment.
Properties are key-value assignments in TOML format. This format allows for multi-line strings, as shown in the examples here.
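
One way to picture the cascade is as a dictionary merge in which section-level values win. The sketch below assumes the TOML properties have already been parsed into plain dictionaries; resolve_properties is a hypothetical name, and the section-level prompt is made up for illustration.

```python
def resolve_properties(file_props, section_props):
    """Merge file-level and section-level properties; section values win."""
    resolved = dict(file_props)      # copy so the file-level defaults stay untouched
    resolved.update(section_props)   # section-level keys override the defaults
    return resolved


# Properties from the @ai-generated comment (file level).
file_props = {
    "model": "gpt-4",
    "prompt": "Write a simple function to add two numbers.",
}
# Hypothetical properties from one BEGIN AI SECTION comment.
section_props = {"prompt": "Write a function to subtract two numbers."}

resolved = resolve_properties(file_props, section_props)
print(resolved)
```

Here the section inherits model from the file level and replaces only prompt. A section that defines no properties resolves to the file-level values unchanged.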

FAQ

What is the purpose of labeling AI-generated code?

Labeling AI-generated code helps in the code review process by providing more context on the source of the code. This information can be useful for both human reviewers and AI applications. Human reviewers can better understand the code's origin, while AI applications can use these labels to optimize generated code or improve the distinction between human and AI-generated code.

How do I label AI-generated code?

To label AI-generated code, you must include an @ai-generated comment within the file. You should also mark the beginning and end of the AI-generated section using line comments. For example:

# @ai-generated model="gpt-4" prompt="Write a simple function to add two numbers."
# BEGIN AI SECTION
def add(a, b):
    return a + b
# END AI SECTION
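
A tool that consumes these labels might locate the annotated regions with a simple line scan. The following is a minimal sketch, assuming sections are not nested and that the comment prefix is known in advance; has_ai_marker and find_ai_sections are hypothetical names, not part of the proposal.

```python
def has_ai_marker(source):
    """Return True if the file text contains the @ai-generated marker."""
    return "@ai-generated" in source


def find_ai_sections(source, prefix="# "):
    """Return (first_line, last_line) pairs, 1-based, for the code inside each AI section."""
    sections = []
    start = None
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(prefix + "BEGIN AI SECTION"):
            start = number
        elif stripped.startswith(prefix + "END AI SECTION") and start is not None:
            # The section body is everything between the two marker lines.
            sections.append((start + 1, number - 1))
            start = None
    return sections


example = "\n".join([
    '# @ai-generated model="gpt-4" prompt="Write a simple function to add two numbers."',
    "# BEGIN AI SECTION",
    "def add(a, b):",
    "    return a + b",
    "# END AI SECTION",
])
print(has_ai_marker(example), find_ai_sections(example))
```

For a C-style language the same scan would be called with prefix="// ".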

Can I include metadata in the AI-generated code labels?

Yes, you can include metadata related to the AI-generated code. Common properties to include are the model (e.g., gpt-4) and the prompt used to generate the code. These properties are defined using key-value assignments in TOML format.

What is the TOML format?

TOML (Tom's Obvious, Minimal Language) is a simple and easy-to-read configuration file format. It uses key-value assignments and can handle multi-line strings. In the context of labeling AI-generated code, TOML is used to define properties such as model and prompt.

How do properties cascade in AI-generated code labels?

Properties defined in the @ai-generated section apply to all AI-generated code sections within the file, unless they are overridden by properties defined in the BEGIN AI SECTION comment. This means that if a property is set in the @ai-generated section, it will be used for all AI-generated code sections unless specifically overridden for a particular section.
