
AI Filter Validation on Backtest Data

TradeJS can turn AI-enabled backtests into reusable datasets for AI filter validation.

Instead of treating AI review as a live-only gate, you can capture replayable AI rows during backtests and run the same historical trades through updated prompts, models, and approval thresholds later.

This replay is historical, not provider-free. By default, ai-train sends prompt requests to the configured AI provider again; only --localOnly switches replay into a deterministic local mode without provider calls.
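For example:

# default replay sends prompts to the configured provider again
npx @tradejs/cli ai-train

# deterministic local replay with no provider calls
npx @tradejs/cli ai-train --localOnly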

Why It Matters

  • Iterate on prompt behavior without rerunning the full market simulation every time.
  • Compare models, providers, and quality thresholds on the same historical sample.
  • Measure false accepts and false rejects before changing live AI gating.
  • Keep AI evaluation attached to realized trade outcomes, not synthetic examples.

What TradeJS Records During Backtests

When AI dataset export is enabled in backtests, TradeJS writes per-trade rows with:

  • signal identity, symbol, direction, and timestamp
  • strategy name
  • structured AI payload used to rebuild strategy prompts later
  • realized trade profit for historical scoring
  • optional test metadata for backtest traceability

Rows are written into per-worker chunk files and later merged into one replay dataset.
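A minimal sketch of what one replay row might look like, based on the fields listed above (the property names are illustrative assumptions, not the documented schema):

// Hypothetical row shape for the JSONL dataset; names are assumptions.
interface AiDatasetRow {
  signalId: string;                    // signal identity
  symbol: string;                      // traded instrument
  direction: 'long' | 'short';         // original signal direction
  timestamp: number;                   // signal time (ms epoch assumed)
  strategy: string;                    // strategy name
  aiPayload: Record<string, unknown>;  // structured context for rebuilding prompts
  profit: number;                      // realized trade profit for historical scoring
  testMeta?: Record<string, unknown>;  // optional backtest traceability metadata
}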

How It Works

  1. Run a backtest with AI dataset export enabled.
  2. Merge worker chunk files into one timestamped dataset.
  3. Replay that dataset through the current AI prompt logic.
  4. Compare approval decisions against realized outcomes before changing live gating.

During replay, TradeJS rebuilds the strategy prompt pair from the saved signal and payload context, so prompt and adapter changes can be evaluated on the same historical sample.
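A rough sketch of that step, reusing the AiDatasetRow shape from the earlier sketch; rebuildPrompt and evaluate are assumed names, not the actual TradeJS adapter API:

interface AiVerdict {
  direction: 'long' | 'short'; // direction the model recommends
  quality: number;             // model-reported quality score
}

interface AiAdapter {
  // Rebuild the strategy prompt pair from the saved payload context
  rebuildPrompt(payload: Record<string, unknown>): { system: string; user: string };
  // Send the prompt pair to the current provider/model
  evaluate(prompt: { system: string; user: string }): Promise<AiVerdict>;
}

async function replayRow(row: AiDatasetRow, adapter: AiAdapter, minQuality: number) {
  const prompt = adapter.rebuildPrompt(row.aiPayload);
  const verdict = await adapter.evaluate(prompt);
  // Same gate as described below: direction agreement plus a quality floor
  const approved = verdict.direction === row.direction && verdict.quality >= minQuality;
  return { approved, verdict };
}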

Reproducible CLI Flow

npx @tradejs/cli backtest --config TrendLine:base --ai
npx @tradejs/cli ai-export
npx @tradejs/cli ai-train -n 50 --minQuality 4

Artifacts:

  • backtest --ai writes data/ai/export/ai-dataset-<strategy>-chunk-<chunkId>.jsonl
  • ai-export merges them into data/ai/export/ai-dataset-<strategy>-merged-<timestamp>.jsonl
  • ai-train replays rows from the latest merged file by default

Important:

  • default replay still calls your configured AI provider
  • --localOnly is the provider-free deterministic gate mode

Useful ai-train flags:

  • -n, --recent: evaluate only the most recent N rows (0 = all rows)
  • --minQuality: minimum AI quality score required to approve an entry
  • -s, --strategy: replay the latest merged file for one strategy
  • -f, --file: replay a specific merged dataset file
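For example, to tune the quality threshold against one strategy's latest merged dataset (strategy name taken from the earlier backtest example):

# replay the 200 most recent TrendLine rows, approving only quality >= 5
npx @tradejs/cli ai-train -s TrendLine -n 200 --minQuality 5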

What You Can Validate

  • prompt changes in a strategy's aiAdapter
  • provider or model changes
  • minQuality threshold tuning
  • agreement with the original strategy direction
  • tp / fp / tn / fn behavior for approval vs realized profitability
  • deterministic local gate experiments with --localOnly

How ai-train Scores Approval

  • a trade is approved only when the AI returns the same direction as the original signal and a quality score >= minQuality
  • historical correctness is measured against realized trade outcome (profit > 0)
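A compact sketch of that rule and the resulting confusion-matrix labels (types follow the earlier sketches, not TradeJS internals):

type Outcome = 'tp' | 'fp' | 'tn' | 'fn';

function scoreRow(row: AiDatasetRow, verdict: AiVerdict, minQuality: number): Outcome {
  // Approval: direction agreement plus the quality floor
  const approved = verdict.direction === row.direction && verdict.quality >= minQuality;
  // Historical correctness: realized trade outcome
  const profitable = row.profit > 0;
  if (approved) return profitable ? 'tp' : 'fp'; // fp = false accept
  return profitable ? 'fn' : 'tn';               // fn = false reject
}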

Core Metrics

  • approval rate
  • precision by quality bucket
  • impact on net expectancy proxy
  • disagreement rate with strategy direction
  • tp / fp / tn / fn counts for approval vs realized profitability
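A sketch of how the first two aggregates could be computed from replayed rows (shapes and names are assumptions, not TradeJS output):

interface ReplayResult {
  quality: number;     // model quality score for the row
  approved: boolean;   // gate decision
  profitable: boolean; // realized outcome (profit > 0)
}

function summarize(results: ReplayResult[]) {
  const approved = results.filter(r => r.approved);
  const approvalRate = approved.length / results.length;
  // Precision per quality score: of approved rows at this score, how many won
  const byQuality = new Map<number, { n: number; wins: number }>();
  for (const r of approved) {
    const b = byQuality.get(r.quality) ?? { n: 0, wins: 0 };
    b.n += 1;
    if (r.profitable) b.wins += 1;
    byQuality.set(r.quality, b);
  }
  const precisionByQuality = [...byQuality.entries()].map(([quality, b]) => ({
    quality,
    precision: b.wins / b.n,
  }));
  return { approvalRate, precisionByQuality };
}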

Cost-Saving Option

For cost-sensitive manual review loops, some teams run and inspect ai-train from coding agents such as OpenAI Codex or Claude Code instead of building every replay iteration around raw API calls in their own tooling.

This can make prompt-review cycles faster, and, depending on your plan, included limits, and usage pattern, sometimes cheaper than the same workflow built on direct per-request API calls. Treat it as an operational option, not a guaranteed pricing advantage, and verify current pricing for your Codex, Claude Code, and provider plans before standardizing the workflow.

Whichever tooling runs the replay, the review loop stays the same:

  1. Change prompt logic or adapter rules in code.
  2. Replay the same historical rows.
  3. Check whether approval coverage and realized quality improve.
  4. Promote the new gate only after the tradeoff is defensible.

That makes AI gating auditable, repeatable, and easier to discuss with strategy authors than ad-hoc prompt testing.
