I'm working on an open source codebase intelligence tool. One layer of it scores every file 1-10 using 15 deterministic biomarkers. No LLM. AST parsing via tree-sitter plus git history.

Wanted to know if the scores actually mean anything. So I ran a time-travel experiment.

Setup

Scored every file at time T, then counted bug-fix commits over the following 6 months. Three repos: FastAPI (104 files), Pydantic (216 files), Django (542 files). 862 files total.

The biomarkers fall into four buckets:

- Structural (7): brain_method, nested_complexity, bumpy_road, complex_method, large_method, complex_conditional, primitive_obsession
- Duplication (1): dry_violation (Rabin-Karp rolling hash over tree-sitter tokens, survives variable renames)
- Test coverage (2): untested_hotspot, coverage_gap
- Organizational (5): developer_congestion, knowledge_loss, hidden_coupling, function_hotspot, code_age_volatility

What I found

On Django: Spearman ρ = -0.34 (p < 0.0001).…
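
For anyone who wants to sanity-check the correlation step, here is a minimal sketch of Spearman's ρ in plain Python: rank both columns (ties get the average rank), then take the Pearson correlation of the ranks. The function names and the sample numbers are made up for illustration and are not taken from the tool or from the repos above, and the sketch does not compute a p-value.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks of xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)


# Hypothetical data: one entry per file (NOT real measurements).
file_scores = [9, 8, 7, 5, 3, 2]       # score at time T
bug_fix_commits = [0, 1, 0, 2, 4, 6]   # fixes in the following 6 months
rho = spearman_rho(file_scores, bug_fix_commits)
print(f"Spearman rho = {rho:.2f}")
```

A negative ρ here means files with higher scores tended to collect fewer bug-fix commits afterwards. Because the statistic only looks at rank order, it does not assume the relationship is linear, which suits a bounded 1-10 score paired with skewed commit counts.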