Repo-Local Python Helpers
AgentV’s Python surface currently starts as a repo-local helper example, not a separate runner or published package.
- It mirrors the existing AgentV YAML and stdin/stdout wire shapes.
- It writes canonical YAML and JSONL.
- It still runs evaluations through the AgentV CLI.
The helper lives in examples/features/sdk-python/.
- agentv_py.grader wraps Python code-grader scripts over canonical snake_case fields.
- agentv_py.evals builds AgentV-shaped eval definitions and JSONL datasets.
- run_agentv_eval() shells out to agentv eval or the repo source CLI.
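The shell-out step can be pictured with a small stdlib-only sketch. This is not the helper's code: the real signature of run_agentv_eval() is not shown on this page, so build_eval_command and run_eval are hypothetical names. Only the agentv eval subcommand comes from the text above. Passing the eval file as a positional argument is an assumption.

```python
import os
import shutil
import subprocess


def build_eval_command(eval_file, executable="agentv"):
    """Return the argv list for an `agentv eval` run (hypothetical helper)."""
    # Assumption: the eval file is passed as a single positional argument.
    return [executable, "eval", os.fspath(eval_file)]


def run_eval(eval_file, executable="agentv"):
    """Run the CLI if it is on PATH.

    Returns the process exit code, or None when the executable is not found.
    """
    if shutil.which(executable) is None:
        return None
    completed = subprocess.run(
        build_eval_command(eval_file, executable),
        check=False,  # leave exit-code handling to the caller
    )
    return completed.returncode
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is executed.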
Canonical fields only
Deprecated wire aliases like output_text, input_text, and reference_answer are not accepted as stdin fields by the Python helper.
Use canonical fields instead:
- input
- input_files
- output
- expected_output
- trace
- trace_summary
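A minimal sketch of what rejecting the deprecated aliases could look like when parsing a grader's stdin payload. It is an illustration under stated assumptions, not the helper's actual validation: deprecated_fields and parse_grader_input are hypothetical names, and the payload is assumed to be a single JSON object. The alias names are the three listed above.

```python
import json

# The three deprecated wire aliases named in this section.
DEPRECATED_ALIASES = frozenset({"output_text", "input_text", "reference_answer"})


def deprecated_fields(payload):
    """Return the deprecated alias names found at the top level, sorted."""
    return sorted(DEPRECATED_ALIASES.intersection(payload))


def parse_grader_input(raw):
    """Parse a JSON payload and refuse deprecated aliases (hypothetical)."""
    payload = json.loads(raw)
    found = deprecated_fields(payload)
    if found:
        raise ValueError("deprecated fields not accepted: " + ", ".join(found))
    return payload
```

The sketch only flags the aliases. It does not rename them to canonical fields, because this page does not state which canonical field each alias corresponds to.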
Example
    from agentv_py.grader import Assertion, CodeGraderResult, define_code_grader


    def evaluate(context):
        actual = context.output or ""
        expected = context.expected_output[0]["content"]
        passed = actual.strip() == expected.strip()
        return CodeGraderResult(
            score=1.0 if passed else 0.0,
            assertions=[
                Assertion(
                    text="Candidate output matches expected output",
                    passed=passed,
                )
            ],
        )


    if __name__ == "__main__":
        define_code_grader(evaluate)

Authoring evals
    from agentv_py.evals import EvalDefinition, JsonlCase, write_eval_yaml, write_jsonl

    write_jsonl(
        "evals/dataset.jsonl",
        [
            JsonlCase(
                id="hello",
                input=[{"role": "user", "content": "Reply with exactly: hi"}],
                expected_output=[{"role": "assistant", "content": "hi"}],
            )
        ],
    )

    write_eval_yaml(
        "evals/dataset.eval.yaml",
        EvalDefinition(
            name="python-helper",
            execution={"target": "local_cli"},
            tests="./dataset.jsonl",
        ),
    )

This keeps Python aligned with existing AgentV files instead of introducing a separate code-first definition language.
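For readers unfamiliar with JSONL, the following stdlib-only sketch shows the general shape of such a dataset: one JSON object per line. It is not the implementation of write_jsonl, and to_jsonl is a hypothetical name. It also assumes the case fields are serialized under the same names used in the example above (id, input, expected_output).

```python
import json


def to_jsonl(cases):
    """Serialize an iterable of dicts as JSONL text, one object per line."""
    return "".join(json.dumps(case, ensure_ascii=False) + "\n" for case in cases)


# Same case as in the example above, written as a plain dict.
case = {
    "id": "hello",
    "input": [{"role": "user", "content": "Reply with exactly: hi"}],
    "expected_output": [{"role": "assistant", "content": "hi"}],
}

jsonl_text = to_jsonl([case])

# Each line parses back to the original object on its own.
round_tripped = [json.loads(line) for line in jsonl_text.splitlines()]
```

Because every line is an independent JSON document, a dataset in this format can be appended to or streamed without parsing the whole file.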