ML Pipeline and Configuration
Main Flow
- Backtests can write ML chunk files (--ml).
- npx @tradejs/cli ml-export merges chunk files into one JSONL dataset.
- Train script builds derived windows (holdout, prod, walk-forward).
- Python train container generates model artifacts and reports.
- Runtime inference uses gRPC (ML_GRPC_ADDRESS).
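The merge step in the flow above can be pictured as concatenating JSONL chunk files record by record. The sketch below is an illustration only, not the actual ml-export implementation: the function name merge_chunks, the record fields (timestamp, label), and the output file name merged.jsonl are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def merge_chunks(chunk_paths, merged_path):
    """Concatenate JSONL chunk files into one JSONL file and return the row count."""
    rows = 0
    with open(merged_path, "w", encoding="utf-8") as out:
        for path in sorted(chunk_paths):
            with open(path, encoding="utf-8") as src:
                for line in src:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines between records
                    json.loads(line)  # fail early on a malformed record
                    out.write(line + "\n")
                    rows += 1
    return rows


# Demo in a temporary folder; file contents are made-up sample records.
with tempfile.TemporaryDirectory() as tmp:
    export_dir = Path(tmp)
    (export_dir / "ml-dataset-trendline-chunk-1.jsonl").write_text(
        '{"timestamp": 1, "label": 0}\n{"timestamp": 2, "label": 1}\n',
        encoding="utf-8",
    )
    (export_dir / "ml-dataset-trendline-chunk-2.jsonl").write_text(
        '{"timestamp": 3, "label": 1}\n\n', encoding="utf-8"
    )
    merged = export_dir / "merged.jsonl"
    merged_rows = merge_chunks(
        export_dir.glob("ml-dataset-*-chunk-*.jsonl"), merged
    )
    merged_lines = merged.read_text(encoding="utf-8").splitlines()
```

Parsing each line before writing it means a corrupted chunk is caught at merge time rather than during training.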
Real Commands
npx @tradejs/cli backtest --config trendline --ml
npx @tradejs/cli ml-export
npx @tradejs/cli ml-inspect
npx @tradejs/cli ml-train:trendline:xgboost
Useful .env Example
ML_GRPC_ADDRESS=127.0.0.1:50051
ML_TRAIN_RECENT_DAYS=60
ML_TRAIN_TEST_DAYS=30
ML_TRAIN_WALK_FORWARD_FOLDS=2
ML_TRAIN_FEATURE_PROFILE=all
ML_TRAIN_FEATURE_SET=enriched
ML_TRAIN_ENSEMBLE=1
ML_TRAIN_ENSEMBLE_MEMBERS=3
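To make the value types in this example explicit, here is a minimal sketch that parses the same keys into typed settings. It is not the project's loader: TrainSettings, parse_env, and load_train_settings are hypothetical names, and reading ML_TRAIN_ENSEMBLE=1 as "enabled" is an assumption.

```python
from dataclasses import dataclass

ENV_TEXT = """\
ML_GRPC_ADDRESS=127.0.0.1:50051
ML_TRAIN_RECENT_DAYS=60
ML_TRAIN_TEST_DAYS=30
ML_TRAIN_WALK_FORWARD_FOLDS=2
ML_TRAIN_FEATURE_PROFILE=all
ML_TRAIN_FEATURE_SET=enriched
ML_TRAIN_ENSEMBLE=1
ML_TRAIN_ENSEMBLE_MEMBERS=3
"""


@dataclass
class TrainSettings:
    grpc_address: str
    recent_days: int
    test_days: int
    walk_forward_folds: int
    feature_profile: str
    feature_set: str
    ensemble: bool
    ensemble_members: int


def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_train_settings(env):
    """Convert string values to typed settings (assumes "1" means enabled)."""
    return TrainSettings(
        grpc_address=env["ML_GRPC_ADDRESS"],
        recent_days=int(env["ML_TRAIN_RECENT_DAYS"]),
        test_days=int(env["ML_TRAIN_TEST_DAYS"]),
        walk_forward_folds=int(env["ML_TRAIN_WALK_FORWARD_FOLDS"]),
        feature_profile=env["ML_TRAIN_FEATURE_PROFILE"],
        feature_set=env["ML_TRAIN_FEATURE_SET"],
        ensemble=env["ML_TRAIN_ENSEMBLE"] == "1",
        ensemble_members=int(env["ML_TRAIN_ENSEMBLE_MEMBERS"]),
    )


settings = load_train_settings(parse_env(ENV_TEXT))
```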
Dataset File Patterns
Source chunk files:
ml-dataset-<strategy>-<chunkId>.jsonl
Derived split files (generated automatically):
*.holdout-train.<key>.jsonl
*.holdout-test.<key>.jsonl
*.prod.<key>.jsonl
*.walk-forward-fold-<N>.train.<key>.jsonl
*.walk-forward-fold-<N>.test.<key>.jsonl
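When browsing the export folder, it can help to sort files by the patterns listed above. The following sketch does that with regular expressions. It assumes that <key> contains no dots and that <N> is a decimal integer; neither is confirmed by the patterns themselves, and classify is a hypothetical helper, not part of the CLI.

```python
import re

# (kind, pattern) pairs mirroring the derived split file patterns.
# Assumption: <key> has no dots, <N> is a decimal integer.
SPLIT_PATTERNS = [
    ("holdout-train", re.compile(r"\.holdout-train\.(?P<key>[^.]+)\.jsonl$")),
    ("holdout-test", re.compile(r"\.holdout-test\.(?P<key>[^.]+)\.jsonl$")),
    ("prod", re.compile(r"\.prod\.(?P<key>[^.]+)\.jsonl$")),
    (
        "walk-forward-train",
        re.compile(r"\.walk-forward-fold-(?P<fold>\d+)\.train\.(?P<key>[^.]+)\.jsonl$"),
    ),
    (
        "walk-forward-test",
        re.compile(r"\.walk-forward-fold-(?P<fold>\d+)\.test\.(?P<key>[^.]+)\.jsonl$"),
    ),
]

SOURCE_PATTERN = re.compile(r"ml-dataset-.+\.jsonl")


def classify(filename):
    """Return a dict describing the file kind, or None if nothing matches."""
    for kind, pattern in SPLIT_PATTERNS:
        match = pattern.search(filename)
        if match:
            info = {"kind": kind, "key": match.group("key")}
            if "fold" in match.groupdict():
                info["fold"] = int(match.group("fold"))
            return info
    # Anything else named ml-dataset-*.jsonl is treated as a source file.
    if SOURCE_PATTERN.fullmatch(filename):
        return {"kind": "source"}
    return None
```

For example, classify("x.walk-forward-fold-2.test.k1.jsonl") yields the kind walk-forward-test with fold 2 and key k1, where the file name itself is made up for illustration.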
Local Artifact Chain (Export -> Train -> Signals)
- npx @tradejs/cli backtest --ml writes chunk files to local project folder: data/ml/export/ml-dataset-<strategy>-chunk-<chunkId>.jsonl
- npx @tradejs/cli ml-export merges chunks into: data/ml/export/ml-dataset-<strategy>-merged-<timestamp>.jsonl
- npx @tradejs/cli ml-train:latest reads export files from data/ml/export and writes model aliases to: data/ml/models/<Strategy>.joblib or ensemble aliases data/ml/models/<Strategy>.modelN.joblib
- ML infer service must read the same model directory (MODEL_DIR).
- npx @tradejs/cli signals (with strategy config ML_ENABLED=true) sends signal.strategy to gRPC Predict. Inference service loads <Strategy>.joblib / <Strategy>.modelN.joblib for that strategy.
If the strategy name and the model alias prefix match, the runtime automatically uses the trained local model.
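The name matching described above can be sketched as a lookup of alias files in the model directory. This is an illustration under assumptions, not the inference service's code: find_model_files is a hypothetical helper, the strategy names TrendLine and Other are made up, and preferring ensemble members over the single alias when both exist is an arbitrary choice for the example.

```python
import re
import tempfile
from pathlib import Path


def find_model_files(model_dir, strategy):
    """Return alias files for a strategy: ensemble members first, else the single alias."""
    model_dir = Path(model_dir)
    member_re = re.compile(rf"^{re.escape(strategy)}\.model(\d+)\.joblib$")
    members = sorted(
        (p for p in model_dir.iterdir() if member_re.match(p.name)),
        key=lambda p: int(member_re.match(p.name).group(1)),  # order by N
    )
    if members:
        return members
    single = model_dir / f"{strategy}.joblib"
    return [single] if single.exists() else []


# Demo with empty placeholder files in a temporary model directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("TrendLine.model2.joblib", "TrendLine.model1.joblib", "Other.joblib"):
        (Path(tmp) / name).touch()
    ensemble_names = [p.name for p in find_model_files(tmp, "TrendLine")]
    single_names = [p.name for p in find_model_files(tmp, "Other")]
    missing_names = [p.name for p in find_model_files(tmp, "Missing")]
```

An empty result for a strategy is the case where no trained local model matches its name.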
Quality and Causality
- The train step checks timestamp-like fields for lookahead leakage.
- Guard can be disabled only for debugging:
ML_TRAIN_DISABLE_CAUSALITY_GUARD=1
- Runtime inference uses the same feature trimming policy as the train step.
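A lookahead check on timestamp-like fields could look roughly like the sketch below: any time-valued field later than the signal's own time points to information from the future. The actual guard may work differently. The function name, the field names (timestamp, entryTime, exitTime, rsi), and the suffix rule for spotting timestamp-like fields are all assumptions.

```python
def find_lookahead_leaks(row, signal_time_key="timestamp"):
    """Return names of timestamp-like fields whose value is after the signal time."""
    signal_time = row[signal_time_key]
    leaks = []
    for name, value in row.items():
        if name == signal_time_key:
            continue
        # Assumption: a field is "timestamp-like" if its name ends this way.
        is_time_like = name.lower().endswith(("timestamp", "time", "_ts"))
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_time_like and is_number and value > signal_time:
            leaks.append(name)
    return leaks


# Hypothetical row: exitTime lies after the signal time, so it is flagged.
sample_row = {"timestamp": 1000, "entryTime": 900, "exitTime": 1500, "rsi": 55.0}
sample_leaks = find_lookahead_leaks(sample_row)
```

Here exitTime would be flagged because its value is later than the row's timestamp, while entryTime and the non-time field rsi pass.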
Reports
Each train run saves markdown and HTML reports with:
- holdout metrics,
- walk-forward metrics,
- threshold tables,
- holdout top-feature table.