TM4 separates autonomy into four governed layers, described below.
No layer can bypass another. Autonomy is permitted, but only under governance.
L1 (execution): Sandboxed, time-limited, deterministic
• All code runs in isolated environments
• Strict time and resource limits enforced
• Deterministic execution for reproducibility
• No external network access or side effects
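As a rough illustration of the time-limit and determinism points above, here is a minimal Python sketch. It is not TM4's implementation: `run_sandboxed` and its parameters are hypothetical names, and a child process with a timeout gives only a wall-clock limit and a fixed hash seed. Blocking network access and other side effects would need OS-level isolation (containers, seccomp, or similar), which this sketch does not attempt.

```python
import os
import subprocess
import sys
import tempfile

# Only these variables are forwarded to the child; everything else is dropped.
_PASSTHROUGH = ("PATH", "LD_LIBRARY_PATH", "SYSTEMROOT")


def run_sandboxed(code: str, timeout_s: float = 5.0) -> dict:
    """Run Python source in a child interpreter with a wall-clock limit.

    Hypothetical helper for illustration. It does NOT block network access
    or filesystem writes outside the scratch directory.
    """
    env = {k: os.environ[k] for k in _PASSTHROUGH if k in os.environ}
    env["PYTHONHASHSEED"] = "0"  # fixed hash seed so repeated runs agree

    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                cwd=workdir,            # scratch directory, removed afterwards
                env=env,
                capture_output=True,
                text=True,
                timeout=timeout_s,      # child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            return {"status": "timeout", "stdout": "", "returncode": None}

    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "stdout": proc.stdout, "returncode": proc.returncode}
```

Calling `run_sandboxed("while True: pass", timeout_s=0.5)` returns a result with status `"timeout"` rather than hanging the caller.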
L2 (generation): Proposal generation without evaluation access
• LLM generates candidate solutions
• No access to test results or scores
• Prevents reward hacking and overfitting
• Blind generation ensures honest exploration
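One way to picture blind generation is to make the generator's input type carry no evaluation data at all. The sketch below is an assumption about how such a boundary could look, not TM4's actual interface: `TaskSpec`, `EvaluationRecord`, `generate_candidates`, and the `propose` callback (a stand-in for an LLM call) are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class TaskSpec:
    """Everything the generator is allowed to see."""
    task_id: str
    description: str


@dataclass
class EvaluationRecord:
    """Owned by the evaluation layer; never passed to the generator."""
    task_id: str
    candidate: str
    scores: dict = field(default_factory=dict)


def generate_candidates(
    spec: TaskSpec,
    propose: Callable[[str, int], str],
    n: int = 3,
) -> List[str]:
    """Request n candidates using only the task spec.

    `propose(prompt, index)` stands in for an LLM call. The prompt is built
    from the spec alone, so no score or test result can leak into it.
    """
    prompt = f"Task {spec.task_id}: {spec.description}"
    return [propose(prompt, i) for i in range(n)]
```

Because `generate_candidates` has no parameter that accepts an `EvaluationRecord`, passing scores to the generator would require changing its signature, which is easy to spot in review.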
L3 (evaluation): Tournaments, multi-axis scoring, evidence capture
• Candidates compete in tournaments
• Multi-dimensional fitness scoring
• All results logged and auditable
• Adversarial test cases prevent gaming
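The source does not say how tournaments combine scores across axes. As one possible reading, the sketch below runs a round-robin in which a candidate wins a match only if it Pareto-dominates its opponent (at least as good on every axis, strictly better on one). The function names, the axis names, and the choice of Pareto dominance are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict


def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if a is >= b on every axis and > b on at least one (higher is better)."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def run_tournament(scores: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Round-robin: a candidate earns one win per opponent it dominates."""
    wins = {name: 0 for name in scores}
    for x, y in combinations(scores, 2):
        if dominates(scores[x], scores[y]):
            wins[x] += 1
        elif dominates(scores[y], scores[x]):
            wins[y] += 1
        # Otherwise the pair trades off across axes and nobody scores.
    return wins


scores = {
    "a": {"correctness": 0.90, "speed": 0.7},
    "b": {"correctness": 0.80, "speed": 0.6},
    "c": {"correctness": 0.95, "speed": 0.2},
}
print(run_tournament(scores))  # {'a': 1, 'b': 0, 'c': 0}
```

Here `a` beats `b` on both axes, while `c` is more correct but much slower than either, so it neither wins nor loses. Keeping the axes separate this way avoids collapsing them into a single number that a candidate could inflate on one dimension.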
L4 (governance): Invariants, anti-cheat rules, progression logic
• Enforces system-wide invariants
• Anti-cheat detection and prevention
• Controls when evolution is allowed
• Maintains audit trail of all changes
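The gating role described above can be sketched as a function that checks every invariant before a candidate is allowed to progress. This is an illustrative sketch only: `Proposal`, `no_test_tampering`, `min_score`, and `may_promote` are hypothetical names, and the two example invariants are assumptions rather than rules documented for TM4.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Proposal:
    candidate_id: str
    scores: Dict[str, float]  # axis name -> score from the evaluation layer
    touched_tests: bool       # True if the candidate modified evaluation assets


# An invariant returns None when satisfied, or a reason string when violated.
Invariant = Callable[[Proposal], Optional[str]]


def no_test_tampering(p: Proposal) -> Optional[str]:
    """Anti-cheat rule: candidates may not modify evaluation assets."""
    return "candidate modified evaluation assets" if p.touched_tests else None


def min_score(axis: str, threshold: float) -> Invariant:
    """Build an invariant requiring a minimum score on one axis."""
    def check(p: Proposal) -> Optional[str]:
        if p.scores.get(axis, 0.0) < threshold:
            return f"{axis} below {threshold}"
        return None
    return check


def may_promote(p: Proposal, invariants: List[Invariant]) -> Tuple[bool, List[str]]:
    """Allow progression only if every invariant holds; return reasons otherwise."""
    violations = [reason for inv in invariants if (reason := inv(p)) is not None]
    return (not violations, violations)
```

Returning the list of violated invariants, not just a boolean, gives the audit trail something concrete to record for each rejected change.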
Generation (L2) is blind to evaluation (L3). Because the generator never sees test results or scores, the system cannot learn to game the tests.
Adversarial test cases are designed to catch shortcuts, brittle heuristics, and superficial improvements.
Every proposal, evaluation, and decision is logged, so evolution can be replayed and verified.
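The source does not describe the log format. One common way to make a log verifiable is a hash chain, where each entry commits to the one before it. The sketch below assumes that approach; `append_entry` and `verify_log` are hypothetical helpers, and the event fields are made up for the example.

```python
import hashlib
import json
from typing import List


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with this event (canonical JSON)."""
    body = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log: List[dict], event: dict) -> dict:
    """Append an event, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_log(log: List[dict]) -> bool:
    """Replay the chain from the start; any edited or reordered entry fails."""
    prev_hash = "0" * 64
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

With this structure, changing a recorded score after the fact breaks the hash of that entry and every check after it, so `verify_log` returns `False`.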
No layer can bypass another. This is the foundation of honest evolution.