Update: This was found in an older version of CrewAI's code interpreter. CrewAI has since removed the tool entirely. The finding is real but historical. The point is the class of problem, not the specific instance.

I've been building a static effect analyzer. You point it at a Python or TypeScript file and it tells you what each function does to the outside world: network, filesystem, database, subprocess, etc.

I ran it on CrewAI's code interpreter and got this:

```
$ libgaze check code_interpreter.py

run_code_unsafe:347  can Unsafe
  365 | os.system(f"pip install {library}")
  370 | exec(code, {}, exec_locals)

2/13 functions are pure.
```

Line 365. `library` is a string from the LLM. No validation, no allowlist. The LLM decides what gets pip-installed on your machine via `os.system` with an f-string.

The function is honestly named `run_code_unsafe`. But the function that calls it, `_run`, is 150 lines long and picks between the "safe" and "unsafe" paths based on a config flag.…
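To make "tells you what each function does to the outside world" concrete, here is a minimal sketch of the direct-effect half of such an analyzer. It uses Python's standard `ast` module. This is not libgaze's implementation, which the post doesn't show. The rule table, the effect labels, and the helper names `call_name` and `direct_effects` are all hypothetical.

```python
from __future__ import annotations

import ast

# Hypothetical rule table: dotted call name -> effect label.
# libgaze's real rules are not shown in the post; these are illustrative only.
EFFECT_RULES = {
    "os.system": "Unsafe",
    "exec": "Unsafe",
    "eval": "Unsafe",
    "subprocess.run": "Process",
    "subprocess.Popen": "Process",
    "open": "Fs",
    "requests.get": "Net",
    "requests.post": "Net",
}


def call_name(node: ast.Call) -> str | None:
    """Return a dotted name such as 'os.system' for simple call targets, else None."""
    parts: list[str] = []
    target = node.func
    while isinstance(target, ast.Attribute):
        parts.append(target.attr)
        target = target.value
    if isinstance(target, ast.Name):
        parts.append(target.id)
        return ".".join(reversed(parts))
    return None  # e.g. handlers[key]() has no plain dotted name


def direct_effects(source: str) -> dict[str, list[tuple[int, str, str]]]:
    """Map each function name to the (line, call, effect) hits in its body.

    Simplifications: import aliases (from os import system) are not resolved,
    and calls inside nested functions also count toward the enclosing one.
    """
    report: dict[str, list[tuple[int, str, str]]] = {}
    for fn in ast.walk(ast.parse(source)):
        if not isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        hits = []
        for node in ast.walk(fn):
            if isinstance(node, ast.Call):
                name = call_name(node)
                if name in EFFECT_RULES:
                    hits.append((node.lineno, name, EFFECT_RULES[name]))
        report[fn.name] = sorted(hits)
    return report


# Toy input shaped like the flagged function (not CrewAI's actual code).
SAMPLE = '''
import os

def run_code_unsafe(code, libraries):
    for library in libraries:
        os.system(f"pip install {library}")
    exec(code, {}, {})

def add(a, b):
    return a + b
'''

for fn, hits in direct_effects(SAMPLE).items():
    print(fn, hits or "pure")
```

On the toy input, `run_code_unsafe` reports `os.system` at line 6 and `exec` at line 7, and `add` comes out pure. That is the same shape as the report above, but it is produced by a far cruder matcher.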
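Here is one generic way to close the hole at line 365, sketched under assumptions. The allowlist contents and the helper names `canonical`, `pip_install_argv` and `install` are hypothetical, and this is not what CrewAI did (per the update, they removed the tool). The sketch validates the LLM's string as a bare, allowlisted package name. It then calls pip through `subprocess.run` with an argument list, so no shell ever parses that string.

```python
from __future__ import annotations

import re
import subprocess
import sys

# Hypothetical allowlist of canonical (PEP 503-normalized) project names.
ALLOWED_PACKAGES = frozenset({"numpy", "pandas", "requests", "python-dateutil"})

# Bare project name per PEP 508: letters/digits, with . _ - allowed only in the middle.
_NAME = re.compile(r"[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?")


def canonical(name: str) -> str:
    """PEP 503 normalization: lowercase, collapse runs of '-', '_', '.' into one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pip_install_argv(library: str) -> list[str]:
    """Validate an LLM-supplied package name and build a pip argv list.

    Raises ValueError for anything that is not a bare, allowlisted name,
    which rules out shell metacharacters, version pins, URLs and pip options.
    """
    if not _NAME.fullmatch(library):
        raise ValueError(f"not a bare package name: {library!r}")
    name = canonical(library)
    if name not in ALLOWED_PACKAGES:
        raise ValueError(f"package not on allowlist: {library!r}")
    # pip for the current interpreter, one argument per list element, no shell.
    return [sys.executable, "-m", "pip", "install", name]


def install(library: str) -> None:
    """Run pip without a shell; check=True raises CalledProcessError on failure."""
    subprocess.run(pip_install_argv(library), check=True)
```

`os.system` hands its whole string to the shell, so a `library` value like `numpy; rm -rf ~` would run a second command. In the sketch, that input fails the name check before anything executes. An allowlist only narrows the risk, though: installing even an allowlisted package still runs that package's code on your machine.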
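The caller problem is about propagation. If `_run` can reach `run_code_unsafe`, then `_run` can do everything `run_code_unsafe` does, whichever branch a config flag usually picks. Below is a minimal sketch of that step. It assumes per-function direct effects and an intra-file call graph are already available, for example from an AST pass. The post doesn't say how libgaze propagates effects. `_run` and `run_code_unsafe` come from the post, while `invoke`, `run_in_sandbox`, the helper `propagate`, and the call edges are hypothetical.

```python
from __future__ import annotations


def propagate(direct: dict[str, set[str]], calls: dict[str, set[str]]) -> dict[str, set[str]]:
    """Give every function the union of its own effects and those of everything it may call.

    Iterates to a fixpoint, so transitive and recursive calls are handled.
    Branches are ignored: a call guarded by a config flag still counts
    (a conservative "may" analysis). The input sets are not mutated.
    """
    effects = {fn: set(eff) for fn, eff in direct.items()}
    changed = True
    while changed:
        changed = False
        for caller, callees in calls.items():
            current = effects.setdefault(caller, set())
            for callee in callees:
                missing = effects.get(callee, set()) - current
                if missing:
                    current |= missing
                    changed = True
    return effects


# Toy input: `invoke` and `run_in_sandbox` are hypothetical stand-ins.
direct = {
    "invoke": set(),
    "_run": set(),
    "run_in_sandbox": {"Process"},
    "run_code_unsafe": {"Unsafe"},
}
calls = {
    "invoke": {"_run"},
    "_run": {"run_in_sandbox", "run_code_unsafe"},  # picked by a config flag at runtime
}

for fn, eff in propagate(direct, calls).items():
    print(fn, sorted(eff) or "pure")
```

In this sketch, `_run` and its hypothetical caller `invoke` both come out as `Process` and `Unsafe`, even though neither contains an effectful call itself. Ignoring the flag is a deliberate choice here. A static tool can't know how the flag is set in every deployment, so it reports what the function *can* do.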