This is a proposal for how to label AI-generated code so that it can be distinguished from human-authored code.
These annotations are useful for human-driven code review, giving reviewers more context on where the code came from. They are also useful for AI and code applications: AI tools can find places where generated code can be optimized, and future models can better distinguish generated code from human code.
Any file with annotated AI code must contain @ai-generated in a comment within the file. AI-generated sections are labelled with a preceding line and a following line, taking inspiration from projects like HackCodegen.
- The preceding line comment must begin with BEGIN AI SECTION and the following line comment must begin with END AI SECTION.
- The comment format uses the line comment format native to the given programming language. Shell languages and Python use # followed by a space, and C-style languages including JavaScript, Java, and Rust use // followed by a space.
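The marker rules above can be sketched as a small helper. This is only an illustration: the `LINE_COMMENT_PREFIX` table and the `wrap_ai_section` function are hypothetical names, not part of the proposal, and the table covers only the languages mentioned above.

```python
# Hypothetical lookup table: line-comment prefix per language (not part of the proposal).
LINE_COMMENT_PREFIX = {
    "shell": "#",
    "python": "#",
    "javascript": "//",
    "java": "//",
    "rust": "//",
}


def wrap_ai_section(code: str, language: str) -> str:
    """Surround code with BEGIN/END AI SECTION line comments for the given language."""
    prefix = LINE_COMMENT_PREFIX[language.lower()]
    begin = f"{prefix} BEGIN AI SECTION"  # prefix, one space, then the marker
    end = f"{prefix} END AI SECTION"
    return "\n".join([begin, code.rstrip("\n"), end])


print(wrap_ai_section("fn add(a: i32, b: i32) -> i32 { a + b }", "rust"))
```

A language missing from the table raises `KeyError`, which seems preferable to guessing a comment style.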
AI-generated sections can also contain metadata about how the code was generated. Common properties include the model used, such as gpt-4, and the prompt.
- Properties are cascading. Properties defined in the @ai-generated section apply to all code sections, unless overridden by properties defined in the BEGIN AI SECTION comment.
- Properties are key-value assignments in TOML format. This format allows for multi-line strings, shown in the examples here.
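The cascading rule can be illustrated with a plain dictionary merge. This sketch assumes that overriding happens key by key, which the proposal does not spell out; `resolve_properties` is a hypothetical name and the section-level prompt is made up for the example.

```python
def resolve_properties(file_props: dict, section_props: dict) -> dict:
    """Merge properties, letting section-level values override file-level ones.

    Assumption: overrides apply per key, so keys the section does not set
    fall back to the @ai-generated values.
    """
    return {**file_props, **section_props}


# Properties from the @ai-generated comment (file level).
file_props = {"model": "gpt-4", "prompt": "Write a simple function to add two numbers."}
# Properties from one BEGIN AI SECTION comment (section level, illustrative only).
section_props = {"prompt": "Rewrite the function with type hints."}

resolved = resolve_properties(file_props, section_props)
print(resolved)
```

Here the section keeps the file-level `model` and replaces only `prompt`.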
Labeling AI-generated code helps in the code review process by providing more context on the source of the code. This information can be useful for both human reviewers and AI applications. Human reviewers can better understand the code's origin, while AI applications can use these labels to optimize generated code or improve the distinction between human and AI-generated code.
To label AI-generated code, you must include an @ai-generated comment within the file. You should also mark the beginning and end of the AI-generated section using line comments. For example:
# @ai-generated model="gpt-4" prompt="Write a simple function to add two numbers."
# BEGIN AI SECTION
def add(a, b):
    return a + b
# END AI SECTION
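A tool that consumes these labels might locate the marked sections roughly as follows. This is a minimal sketch, not a reference implementation: `is_annotated` and `find_ai_sections` are hypothetical names, only the # and // comment styles are recognized, and the @ai-generated check is a plain substring test rather than a real comment parser.

```python
import re

# Markers must start a line comment: prefix, one space, then the marker text.
BEGIN_RE = re.compile(r"^\s*(#|//) BEGIN AI SECTION")
END_RE = re.compile(r"^\s*(#|//) END AI SECTION")


def is_annotated(source: str) -> bool:
    """Approximate check that the file declares @ai-generated somewhere."""
    return "@ai-generated" in source


def find_ai_sections(source: str) -> list[tuple[int, int]]:
    """Return 1-based (begin_marker_line, end_marker_line) pairs."""
    sections = []
    start = None
    for number, line in enumerate(source.splitlines(), start=1):
        if BEGIN_RE.match(line):
            start = number
        elif END_RE.match(line) and start is not None:
            sections.append((start, number))
            start = None
    return sections


sample = '''# @ai-generated model="gpt-4" prompt="Write a simple function to add two numbers."
# BEGIN AI SECTION
def add(a, b):
    return a + b
# END AI SECTION
'''

print(is_annotated(sample), find_ai_sections(sample))
```

An END marker without a preceding BEGIN is ignored here; a stricter tool might report it as an error instead.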
Yes, you can include metadata related to the AI-generated code. Common properties to include are the model (e.g., gpt-4) and the prompt used to generate the code. These properties are defined using key-value assignments in TOML format.
TOML (Tom's Obvious, Minimal Language) is a simple and easy-to-read configuration file format. It uses key-value assignments and can handle multi-line strings. In the context of labeling AI-generated code, TOML is used to define properties such as model and prompt.
Properties defined in the @ai-generated section apply to all AI-generated code sections within the file, unless they are overridden by properties defined in the BEGIN AI SECTION comment. This means that if a property is set in the @ai-generated section, it will be used for all AI-generated code sections unless specifically overridden for a particular section.