Computer Science: Theory and Application·/u/Obvious_Gap_5768·4 days ago

I'm working on an open source codebase intelligence tool. One layer of it scores every file 1-10 using 15 deterministic biomarkers. No LLM. AST parsing via tree-sitter plus git history.

Wanted to know if the scores actually mean anything. So I ran a time-travel experiment.

Setup

Scored every file at time T, then counted bug-fix commits over the following 6 months. Three repos: FastAPI (104 files), Pydantic (216 files), Django (542 files). 862 files total.

The biomarkers fall into four buckets:

- Structural (7): brain_method, nested_complexity, bumpy_road, complex_method, large_method, complex_conditional, primitive_obsession
- Duplication (1): dry_violation (Rabin-Karp rolling hash over tree-sitter tokens, survives variable renames)
- Test coverage (2): untested_hotspot, coverage_gap
- Organizational (5): developer_congestion, knowledge_loss, hidden_coupling, function_hotspot, code_age_volatility

What I found

On Django: Spearman ρ = -0.34 (p < 0.0001).…
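For anyone who wants to run the same kind of check on their own repo, here is a minimal sketch of how a Spearman ρ between per-file scores and later bug-fix counts could be computed. This is not the tool's code: the function names and the sample numbers are made up for illustration, and it is plain Python with no dependencies. It ranks both lists (ties get the average rank) and takes the Pearson correlation of the ranks.

```python
from math import sqrt
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one score per file at time T, and the number of
# bug-fix commits touching that file in the following window.
scores_at_t = [9, 7, 8, 3, 5, 2, 6]
bugfix_commits = [0, 1, 0, 4, 2, 6, 1]

rho = spearman_rho(scores_at_t, bugfix_commits)
print(f"Spearman rho = {rho:.2f}")
```

With the variables in this order, a negative ρ means files with higher scores tended to get fewer bug-fix commits afterwards. The sketch does not compute a p-value; if SciPy is available, `scipy.stats.spearmanr` returns both the coefficient and a p-value.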
