snapshot testing: the cure in the AI software generation era
Background
A typical dev workflow:
- Know your goal
- (Ask an AI Agent to) Implement the code
- Make sure the code works
- Fix bugs, add features
- Keep making sure the code works
The challenges are that:
- We don’t know if the code from AI will work
- We don’t know if the code will keep working
- Verifying that the code works involves reading and writing large amounts of data
So we are afraid of changing the code.
Proposed solution
- Snapshot testing: create scripts that generate a snapshot reflecting the property you want to maintain
- Use rg/sed etc. to reduce the size of the checked-in snapshot file (see the sketch after this list)
- Before coding (or asking AI to code), define the spec and the snapshot-testing process
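For example, here is a minimal sketch of such a filter (the executable name, paths, and regex are illustrative): keep only the result lines and mask run-dependent fields such as timestamps, so only the stable property lands in the checked-in snapshot.

# illustrative filter: keep result lines and mask timestamps before writing the snapshot
./build/cpp_executable --config ./config.json \
  | rg "Results:" -A 100 \
  | sed -E 's/[0-9]{2}:[0-9]{2}:[0-9]{2}/<TIME>/g' \
  > snapshots/test_1/test_1.received.txt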
Every snapshot-testing workflow should have a single bash-file entry point; we don't want to have to remember arguments.
Example:
scripts/test_1.sh
#!/usr/bin/env bash
set -euo pipefail  # fail fast on errors, unset variables, and broken pipes
export RUNTIME_DIR=$PWD/local_data/test_1
export REPO_DIR=$PWD
export SNAPSHOT_DIR=$REPO_DIR/snapshots/test_1
mkdir -p "$RUNTIME_DIR" "$SNAPSHOT_DIR"
cd "$RUNTIME_DIR"
# Generate the runtime config consumed by the executable under test.
cat << EOF > config.json
{
  "repo_dir": "$REPO_DIR",
  "runtime_dir": "$RUNTIME_DIR"
}
EOF
# Keep only the lines that reflect the property we care about; this becomes the "received" snapshot.
"$REPO_DIR/build/cpp_executable" --config ./config.json | rg "Results:" -A 100 > "$SNAPSHOT_DIR/test_1.received.txt"
# The test passes iff the received snapshot matches the approved one checked into the repo.
diff "$SNAPSHOT_DIR/test_1.received.txt" "$SNAPSHOT_DIR/test_1.approved.txt"
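When the diff shows an intentional change, the usual snapshot-testing convention is to review the received file and promote it to the approved one, which is then checked in:

# run manually from the repo root after reviewing the diff: promote received -> approved
cp snapshots/test_1/test_1.received.txt snapshots/test_1/test_1.approved.txt
git add snapshots/test_1/test_1.approved.txt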
Dev projects are workflows: pipeline scaffolding and node-oriented vibe coding
For any dev project, there is a limited set of workflows, such as:
- build
- test
- benchmark
- integration test
If you look at this from a higher level, you can see that they consist of Nodes:
- build = core code + application code + compile
- test = core code + application code + test code + compile + run test
- benchmark = core code + application code + run_benchmark + benchmark input
- integration test = core code + application code + test input gen + run test + test output verification
And you will find that the number of workflows is quite limited, roughly O(number of Nodes).
So I propose a way of setting up your project, especially if you want to vibe code:
./scripts/
  compile.sh
  run_test.sh
./workflows/
  1.sh -> compile.sh
  2.sh -> compile.sh + run_test.sh
Then instruct the AI to run workflows/*.sh every time it finishes a task.
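A minimal sketch of such a workflow entry, using the file names from the layout above (the internals of compile.sh and run_test.sh are up to you):

workflows/2.sh
#!/usr/bin/env bash
set -euo pipefail
REPO_DIR=$(cd "$(dirname "$0")/.." && pwd)   # resolve the repo root relative to this script
"$REPO_DIR/scripts/compile.sh"               # node: compile
"$REPO_DIR/scripts/run_test.sh"              # node: run tests (the snapshot diff decides pass/fail)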
The core idea is that, inside the workflow, you can define data-input generation and data-output verification easily using Python etc., and you can leverage a pre-seeded random data generator. This part of the code might be 200 lines; let the AI worry about the middle 20K lines.
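As a minimal sketch of the input-generation node (you could equally write it in Python; the file name scripts/gen_input.sh is hypothetical, and RUNTIME_DIR is assumed to be exported by the calling workflow): fixing the seed makes the generated input, and therefore the snapshot, reproducible.

scripts/gen_input.sh
#!/usr/bin/env bash
set -euo pipefail
RANDOM=42                             # seed bash's RNG so every run produces identical input
for i in $(seq 1 1000); do
  echo "$i,$((RANDOM % 100000))"      # hypothetical CSV rows: id, random value
done > "$RUNTIME_DIR/input.csv"       # consumed later by the executable under test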
Defining a DAG or pipeline is not a new idea whatsoever,