Production Runbook
Use this runbook as a daily operational checklist.
Daily Health Checks
- docker compose ps or service manager status.
- npx @tradejs/cli doctor (or equivalent in runtime environment).
- Verify API responsiveness and basic UI pages.
- Check Redis/Postgres connectivity and resource usage.
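The checks above can be bundled into one function. This is a minimal sketch, not the project's own tooling: API_URL and its /health path are hypothetical, timescale and redis are assumed to be compose service names, and doctor is assumed to exit non-zero on failure.

```shell
# Hypothetical endpoint; substitute whatever your deployment exposes.
API_URL="${API_URL:-http://localhost:3000/health}"

# Run one named check quietly, print OK/FAIL, and return non-zero on failure.
run_check() {
  name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $name"
  else
    echo "FAIL $name"
    return 1
  fi
}

daily_checks() {
  failed=0
  # Note: "docker compose ps" exits 0 even when a container has stopped,
  # so this only proves the compose project is reachable.
  run_check "compose"  docker compose ps || failed=1
  run_check "doctor"   npx @tradejs/cli doctor || failed=1
  run_check "api"      curl -fsS --max-time 10 "$API_URL" || failed=1
  run_check "redis"    docker compose exec -T redis redis-cli ping || failed=1
  run_check "postgres" docker compose exec -T timescale pg_isready || failed=1
  return "$failed"
}
```

Calling daily_checks prints one line per check and returns non-zero if any check failed. Resource usage and the container states listed by docker compose ps still need a manual look.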
Incident Triage
- Identify failing layer: app, connectors, Redis, Postgres, ML, AI provider.
- Capture logs first (do not restart immediately unless the incident is severe).
- Confirm whether the issue is global or symbol/strategy-specific.
- Apply scoped mitigation.
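To make "capture logs first" concrete, the sketch below saves container state and recent logs before anything is restarted. It assumes a docker compose deployment; INCIDENT_DIR and the one-hour default window are illustrative choices, not values from this runbook.

```shell
# Build a timestamped (UTC) directory path for this incident's artifacts.
incident_path() {
  echo "${INCIDENT_DIR:-./incidents}/$(date -u +%Y%m%dT%H%M%SZ)"
}

# Save container state and recent logs for all compose services.
# Usage: capture_logs [since]   e.g. capture_logs 30m
capture_logs() {
  since="${1:-1h}"
  dir="$(incident_path)"
  mkdir -p "$dir" || return 1
  docker compose ps --all > "$dir/ps.txt" 2>&1
  docker compose logs --no-color --timestamps --since "$since" > "$dir/compose.log" 2>&1
  echo "$dir"
}
```

capture_logs prints the directory it wrote to, so the path can be pasted into the incident notes.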
Restart Order (Typical)
- Data services (timescale, redis)
- ML inference service
- App service
- Reverse proxy / ingress if needed
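The order above can be scripted as follows. This is a sketch under assumptions: timescale and redis are the names given in the list, while ml-infer and app are assumed to be the compose service names for the ML inference and app services. The reverse proxy is left out because it is only restarted if needed.

```shell
# Restart order: data services first, then ML inference, then the app.
RESTART_ORDER="timescale redis ml-infer app"

# Restart each service in sequence and stop at the first failure.
# Set DRY_RUN=1 to print the plan without touching any container.
restart_in_order() {
  for svc in $RESTART_ORDER; do
    echo "restarting $svc"
    if [ -z "${DRY_RUN:-}" ]; then
      docker compose restart "$svc" || return 1
    fi
  done
}
```

docker compose restart returns once the container has been started again, not once the service is ready, so it may be worth confirming that the data services accept connections before the later services come up.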
Nightly Research Automation
If you run automated strategy research in production, keep it in a separate
agent container rather than inside the main app process.
Recommended pattern:
- app container handles UI, APIs, runtime signals, and normal operational cron.
- agent container runs only the nightly research/agent cron.
- nightly flow starts with npx @tradejs/cli research:auto on a fixed schedule such as 00:00 in Europe/Moscow.
- research:auto picks the strategy with the stalest or missing research run, snapshots the effective strategy config into <Strategy>:research, runs backtest --ai -> ai-export -> ai-train --localOnly, saves the run in Redis, sends a Telegram report, and then invokes npx @tradejs/cli agent-run.
- agent-run uses OpenRouter with openai/gpt-5.4, reasoning effort medium, creates a review branch from stable, validates the patch, and pushes a separate branch for manual inspection.
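One possible way to schedule the nightly flow inside the agent container is a small wrapper invoked by cron. The wrapper path, the lock directory, and the RESEARCH_CMD override are hypothetical, and the overlap guard is an addition that this runbook does not prescribe.

```shell
# Example crontab for the agent container (hypothetical script path).
# CRON_TZ is honoured by cronie; other cron implementations may need the
# container's TZ set instead.
#   CRON_TZ=Europe/Moscow
#   0 0 * * * /usr/local/bin/nightly-research.sh

LOCK_DIR="${LOCK_DIR:-/tmp/tradejs-research.lock}"
RESEARCH_CMD="${RESEARCH_CMD:-npx @tradejs/cli research:auto}"

nightly_research() {
  # mkdir is atomic, so it doubles as a simple lock against overlapping runs.
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "research run already in progress, skipping" >&2
    return 0
  fi
  status=0
  $RESEARCH_CMD || status=$?
  rmdir "$LOCK_DIR"
  return "$status"
}
```

A script built from this would end by calling nightly_research. If a run is killed before it releases the lock, the lock directory has to be removed by hand.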
Deployment requirements:
- Mount a writable repo clone with .git into the agent container.
- Provide GitHub push credentials to the agent container, typically via an SSH key dedicated to branch pushes.
- Store OpenRouter and Telegram credentials in the runtime user settings used by the CLI.
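A quick preflight inside the agent container can confirm the repo mount before the first nightly run. This is a sketch: the /workspace/repo default and the remote name origin are assumptions, and the remote check only proves read access to the stable branch, not push rights.

```shell
# Verify that the mounted clone has a .git directory and is writable.
# Usage: check_agent_repo [path]
check_agent_repo() {
  repo="${1:-${REPO_DIR:-/workspace/repo}}"
  if [ ! -d "$repo/.git" ]; then
    echo "missing .git in $repo" >&2
    return 1
  fi
  if [ ! -w "$repo" ] || [ ! -w "$repo/.git" ]; then
    echo "$repo is not writable" >&2
    return 1
  fi
  echo "repo ok: $repo"
}

# Confirm the remote is reachable with the container's credentials and that
# the stable branch exists (exit status 2 means no matching ref was found).
check_remote() {
  repo="${1:-${REPO_DIR:-/workspace/repo}}"
  git -C "$repo" ls-remote --exit-code origin refs/heads/stable >/dev/null
}
```

Push rights are easiest to confirm by pushing a throwaway branch once and deleting it afterwards.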
Minimum settings:
- OPENAI_API_ENDPOINT=https://openrouter.ai/api/v1
- OPENAI_API_KEY=<OpenRouter key>
- TG_BOT_TOKEN
- TG_CHAT_ID
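If these settings are exposed to the agent container as environment variables (an assumption, since the CLI may read them from its own user settings store instead), a guard like the following can fail fast before the nightly run starts.

```shell
# Report every required setting that is unset or empty.
require_settings() {
  missing=""
  for name in OPENAI_API_ENDPOINT OPENAI_API_KEY TG_BOT_TOKEN TG_CHAT_ID; do
    # Indirect lookup of the variable named by $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing settings:$missing" >&2
    return 1
  fi
}
```

The function prints only the names of missing settings, never their values, so its output is safe to keep in logs.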
Rollback
- Keep previous image tags for app and ml-infer.
- Roll back app and model aliases independently when required.
- Verify runtime with doctor and smoke checks after rollback.
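An image rollback for a single service might look like the sketch below. It assumes the compose file reads image tags from APP_TAG and ML_INFER_TAG (hypothetical variable names) and that the services are named app and ml-infer. Model aliases are not covered here.

```shell
# Recreate one service on a previous image tag without touching its dependencies.
# Usage: rollback_service <app|ml-infer> <tag>
# Set DRY_RUN=1 to print the action without running docker.
rollback_service() {
  svc="$1"; tag="$2"
  case "$svc" in
    app)      var=APP_TAG ;;
    ml-infer) var=ML_INFER_TAG ;;
    *) echo "unknown service: $svc" >&2; return 2 ;;
  esac
  echo "rolling back $svc to $tag"
  if [ -z "${DRY_RUN:-}" ]; then
    env "$var=$tag" docker compose up -d --no-deps "$svc" || return 1
  fi
}
```

After the service is back up, run npx @tradejs/cli doctor and the usual smoke checks to confirm the runtime.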