snapshot testing: the cure in the AI software generation era
Background
A typical dev workflow:
- Know your goal
- (Ask an AI Agent to) Implement the code
- Make sure the code works
- Fix bugs, add features
- Keep making sure the code works
The challenges are that:
- We don’t know if the code from AI will work
- We don’t know if the code will keep working
- Verifying that the code works involves reading and writing large amounts of data
So we are afraid of changing the code.
Proposed solution
- Snapshot testing: create scripts that generate a snapshot reflecting the property you want to maintain
- Use rg/sed etc. to reduce the size of the checked-in snapshot file (see the sketch after this list)
- Before coding (or asking AI to code), define the spec and the snapshot-testing process
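For example, here is a minimal sketch of such a filter (the executable name, paths, and regex are illustrative): keep only the result lines and mask run-dependent fields such as timestamps, so only the stable property lands in the checked-in snapshot.

# illustrative filter: keep result lines and mask timestamps before writing the snapshot
./build/cpp_executable --config ./config.json \
  | rg "Results:" -A 100 \
  | sed -E 's/[0-9]{2}:[0-9]{2}:[0-9]{2}/<TIME>/g' \
  > snapshots/test_1/test_1.received.txt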
Every snapshot-testing workflow should have a single bash-file entry point; we don't want to have to remember arguments.
Example:
scripts/test_1.sh
#!/usr/bin/env bash
set -euo pipefail  # fail fast on errors, unset variables, and broken pipes
export RUNTIME_DIR=$PWD/local_data/test_1
export REPO_DIR=$PWD
export SNAPSHOT_DIR=$REPO_DIR/snapshots/test_1
mkdir -p "$RUNTIME_DIR" "$SNAPSHOT_DIR"
cd "$RUNTIME_DIR"
# Generate the runtime config consumed by the executable under test.
cat << EOF > config.json
{
  "repo_dir": "$REPO_DIR",
  "runtime_dir": "$RUNTIME_DIR"
}
EOF
# Keep only the lines that reflect the property we care about; this becomes the "received" snapshot.
"$REPO_DIR/build/cpp_executable" --config ./config.json | rg "Results:" -A 100 > "$SNAPSHOT_DIR/test_1.received.txt"
# The test passes iff the received snapshot matches the approved one checked into the repo.
diff "$SNAPSHOT_DIR/test_1.received.txt" "$SNAPSHOT_DIR/test_1.approved.txt"
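When the diff shows an intentional change, the usual snapshot-testing convention is to review the received file and promote it to the approved one, which is then checked in:

# run manually from the repo root after reviewing the diff: promote received -> approved
cp snapshots/test_1/test_1.received.txt snapshots/test_1/test_1.approved.txt
git add snapshots/test_1/test_1.approved.txt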
Dev projects are workflows: pipeline scaffolding and node-oriented vibe coding
For any dev project, there is a limited set of workflows, such as:
- build
- test
- benchmark
- integration test
If you look at this from a higher level, you can see that they consist of Nodes:
- build = core code + application code + compile
- test = core code + application code + test code + compile + run test
- benchmark = core code + application code + run_benchmark + benchmark input
- integration test = core code + application code + test input gen + run test + test output verification
And you will find that the number of workflows is quite limited, roughly O(number of Nodes).
So I propose a way of setting up your project, especially if you want to vibe code:
./scripts/
  compile.sh
  run_test.sh
./workflows/
  1.sh -> compile.sh
  2.sh -> compile.sh + run_test.sh
Then instruct the AI to run workflows/*.sh every time it finishes a task.
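A minimal sketch of such a workflow entry, using the file names from the layout above (the internals of compile.sh and run_test.sh are up to you):

workflows/2.sh
#!/usr/bin/env bash
set -euo pipefail
REPO_DIR=$(cd "$(dirname "$0")/.." && pwd)   # resolve the repo root relative to this script
"$REPO_DIR/scripts/compile.sh"               # node: compile
"$REPO_DIR/scripts/run_test.sh"              # node: run tests (the snapshot diff decides pass/fail)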
The core idea is that, inside the workflow, you can define data-input generation and data-output verification easily using Python etc., and you can leverage a pre-seeded random data generator. This part of the code might be 200 lines; let the AI worry about the middle 20K lines.
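As a minimal sketch of the input-generation node (you could equally write it in Python; the file name scripts/gen_input.sh is hypothetical, and RUNTIME_DIR is assumed to be exported by the calling workflow): fixing the seed makes the generated input, and therefore the snapshot, reproducible.

scripts/gen_input.sh
#!/usr/bin/env bash
set -euo pipefail
RANDOM=42                             # seed bash's RNG so every run produces identical input
for i in $(seq 1 1000); do
  echo "$i,$((RANDOM % 100000))"      # hypothetical CSV rows: id, random value
done > "$RUNTIME_DIR/input.csv"       # consumed later by the executable under test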
Defining a DAG or pipeline is not a new idea whatsoever,