AI Filter Validation on Backtest Data
TradeJS can turn AI-enabled backtests into reusable datasets for AI filter validation.
Instead of treating AI review as a live-only gate, you can capture replayable AI rows during backtests and run the same historical trades through updated prompts, models, and approval thresholds later.
This replay is historical, not provider-free. By default, ai-train sends prompt requests to the configured AI provider again; only --localOnly switches replay into a deterministic local mode without provider calls.
Why It Matters
- Iterate on prompt behavior without rerunning the full market simulation every time.
- Compare models, providers, and quality thresholds on the same historical sample.
- Measure false accepts and false rejects before changing live AI gating.
- Keep AI evaluation attached to realized trade outcomes, not synthetic examples.
What TradeJS Records During Backtests
When AI dataset export is enabled in backtests, TradeJS writes per-trade rows with:
- signal identity, symbol, direction, and timestamp
- strategy name
- structured AI payload used to rebuild strategy prompts later
- realized trade profit for historical scoring
- optional test metadata for backtest traceability
Rows are written into per-worker chunk files and later merged into one replay dataset.
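As a rough illustration of the row shape and the merge step, here is a minimal TypeScript sketch. The field names (signalId, payload, testMeta, and so on) and both helper functions are hypothetical, chosen to mirror the list above rather than taken from the TradeJS schema. Sorting the merged rows by timestamp is also an assumption.

```typescript
// Hypothetical row shape: field names are illustrative, not the actual TradeJS schema.
interface AiDatasetRow {
  signalId: string;
  symbol: string;
  direction: string; // e.g. "long" or "short" (assumed labels)
  timestamp: number; // assumed to be epoch milliseconds
  strategy: string;
  payload: Record<string, unknown>; // structured context used to rebuild prompts
  profit: number; // realized trade profit
  testMeta?: Record<string, unknown>; // optional backtest traceability
}

// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text: string): AiDatasetRow[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as AiDatasetRow);
}

// Combine per-worker chunk contents into one list.
// Ordering by timestamp is an assumption made for this sketch.
function mergeChunks(chunks: string[]): AiDatasetRow[] {
  const rows: AiDatasetRow[] = [];
  for (const chunk of chunks) {
    rows.push(...parseJsonl(chunk));
  }
  return rows.sort((a, b) => a.timestamp - b.timestamp);
}
```

Keeping one JSON object per line means chunk files can be concatenated or streamed without loading a single large array.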
How It Works
- Run a backtest with AI dataset export enabled.
- Merge worker chunk files into one timestamped dataset.
- Replay that dataset through the current AI prompt logic.
- Compare approval decisions against realized outcomes before changing live gating.
During replay, TradeJS rebuilds the strategy prompt pair from the saved signal and payload context, so prompt and adapter changes can be evaluated on the same historical sample.
Reproducible CLI Flow
npx @tradejs/cli backtest --config TrendLine:base --ai
npx @tradejs/cli ai-export
npx @tradejs/cli ai-train -n 50 --minQuality 4
Artifacts:
- backtest --ai writes data/ai/export/ai-dataset-<strategy>-chunk-<chunkId>.jsonl
- ai-export merges them into data/ai/export/ai-dataset-<strategy>-merged-<timestamp>.jsonl
- ai-train replays rows from the latest merged file by default
Important:
- default replay still calls your configured AI provider
- --localOnly is the provider-free deterministic gate mode
Useful ai-train flags:
- -n, --recent: evaluate the latest N trades from the end (0 = all rows)
- --minQuality: minimum AI quality threshold required to approve entry
- -s, --strategy: pick the latest merged file for one strategy
- -f, --file: replay a specific merged dataset file
What You Can Validate
- prompt changes in strategy aiAdapter
- provider or model changes
- minQuality threshold tuning
- agreement with the original strategy direction
- tp / fp / tn / fn behavior for approval vs realized profitability
- deterministic local gate experiments with --localOnly
How ai-train Scores Approval
- a trade is approved only when AI returns the same direction as the original signal and quality >= minQuality
- historical correctness is measured against realized trade outcome (profit > 0)
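The two rules above can be sketched as a small classifier. This is an illustration, not the ai-train source: the ReplayedTrade type and the function names are hypothetical, and the tp / fp / tn / fn mapping assumes the conventional reading in which "positive" means approved and "true" means the approval decision matched realized profitability.

```typescript
// Hypothetical shape of one replayed trade; names are illustrative.
interface ReplayedTrade {
  signalDirection: string; // direction of the original strategy signal
  aiDirection: string; // direction returned by the AI review
  quality: number; // AI quality score
  profit: number; // realized profit from the backtest
}

type Outcome = "tp" | "fp" | "tn" | "fn";

// Approved only when the AI agrees with the signal AND quality meets the threshold.
function isApproved(t: ReplayedTrade, minQuality: number): boolean {
  return t.aiDirection === t.signalDirection && t.quality >= minQuality;
}

// Assumed mapping: approved + profitable = tp, approved + not profitable = fp,
// rejected + profitable = fn, rejected + not profitable = tn.
function classify(t: ReplayedTrade, minQuality: number): Outcome {
  const approved = isApproved(t, minQuality);
  const profitable = t.profit > 0;
  if (approved) return profitable ? "tp" : "fp";
  return profitable ? "fn" : "tn";
}

function countOutcomes(
  trades: ReplayedTrade[],
  minQuality: number
): Record<Outcome, number> {
  const counts: Record<Outcome, number> = { tp: 0, fp: 0, tn: 0, fn: 0 };
  for (const t of trades) {
    counts[classify(t, minQuality)] += 1;
  }
  return counts;
}
```

Note that a break-even trade (profit of exactly 0) counts as not profitable under the profit > 0 rule, so approving it registers as a false accept.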
Core Metrics
- approval rate
- precision by quality bucket
- impact on net expectancy proxy
- disagreement rate with strategy direction
- tp / fp / tn / fn counts for approval vs realized profitability
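One way to compute the rate-style metrics is sketched below. The exact formulas used by ai-train are not stated here, so these definitions are assumptions: precision per bucket is the share of approved trades at a given quality score that were profitable, and the expectancy proxy is the mean profit of approved trades minus the mean profit of all trades. All type and function names are hypothetical.

```typescript
// Hypothetical per-trade input; names are illustrative.
interface ScoredTrade {
  agrees: boolean; // AI direction matches the original signal direction
  quality: number; // AI quality score, used directly as the bucket key
  profit: number; // realized profit
}

interface GateMetrics {
  approvalRate: number;
  disagreementRate: number;
  precisionByQuality: Record<number, number>;
  expectancyDelta: number; // assumed proxy: mean(approved) - mean(all)
}

function mean(xs: number[]): number {
  return xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;
}

function gateMetrics(trades: ScoredTrade[], minQuality: number): GateMetrics {
  const approved = trades.filter((t) => t.agrees && t.quality >= minQuality);

  // Count wins and totals per quality score among approved trades.
  const buckets = new Map<number, { wins: number; total: number }>();
  for (const t of approved) {
    const b = buckets.get(t.quality) ?? { wins: 0, total: 0 };
    b.total += 1;
    if (t.profit > 0) b.wins += 1;
    buckets.set(t.quality, b);
  }
  const precisionByQuality: Record<number, number> = {};
  buckets.forEach((b, q) => {
    precisionByQuality[q] = b.wins / b.total;
  });

  const n = trades.length;
  return {
    approvalRate: n === 0 ? 0 : approved.length / n,
    disagreementRate: n === 0 ? 0 : trades.filter((t) => !t.agrees).length / n,
    precisionByQuality,
    expectancyDelta:
      mean(approved.map((t) => t.profit)) - mean(trades.map((t) => t.profit)),
  };
}
```

A positive expectancyDelta under this definition suggests the gate is keeping better-than-average trades, though small samples make the number noisy.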
Cost-Saving Option
For cost-sensitive manual review loops, some teams run and inspect ai-train from coding agents such as OpenAI Codex or Claude Code instead of building every replay iteration around raw API calls in their own tooling.
This can make prompt-review cycles faster, and depending on your plan, included limits, and usage pattern, it can sometimes be cheaper than running the same workflow through direct per-request API usage. Treat that as an operational option, not a guaranteed pricing advantage, and verify current pricing for your Codex, Claude Code, and provider plans before standardizing the workflow.
Recommended Evaluation Flow
- Change prompt logic or adapter rules in code.
- Replay the same historical rows.
- Check whether approval coverage and realized quality improve.
- Promote the new gate only after the tradeoff is defensible.
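The comparison step in this flow can be sketched as a diff between two replays of the same rows. The sketch assumes each replay yields one decision per signal with a stable identifier. The GateDecision shape, the signalId field, and both functions are hypothetical names introduced for illustration.

```typescript
// Hypothetical per-trade decision from one replay run.
interface GateDecision {
  signalId: string;
  approved: boolean;
  profit: number; // realized profit from the backtest
}

interface GateSummary {
  coverage: number; // share of trades approved
  precision: number; // share of approved trades with profit > 0
}

function summarize(decisions: GateDecision[]): GateSummary {
  const approved = decisions.filter((d) => d.approved);
  const wins = approved.filter((d) => d.profit > 0).length;
  return {
    coverage: decisions.length === 0 ? 0 : approved.length / decisions.length,
    precision: approved.length === 0 ? 0 : wins / approved.length,
  };
}

// Compare a candidate gate against a baseline on the same historical rows
// and list the signals whose approval decision changed.
function compareGates(baseline: GateDecision[], candidate: GateDecision[]) {
  const baseById = new Map<string, GateDecision>();
  for (const d of baseline) baseById.set(d.signalId, d);

  const flipped = candidate
    .filter((d) => {
      const b = baseById.get(d.signalId);
      return b !== undefined && b.approved !== d.approved;
    })
    .map((d) => d.signalId);

  return {
    baseline: summarize(baseline),
    candidate: summarize(candidate),
    flipped,
  };
}
```

Reviewing the flipped list by hand is often more informative than the aggregate deltas, since it shows exactly which trades the new prompt or threshold treats differently.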
That makes AI gating auditable, repeatable, and easier to discuss with strategy authors than ad-hoc prompt testing.