I'm working on an AI Data Analyst in MLJAR Studio. The idea is simple: you ask a question in natural language, the AI writes Python code, executes it, and shows the result. But recently I found a small example that reminded me why AI data analysis needs more than code generation.

The code worked

I was testing a medical data analysis use case with a diabetes CSV file. The first task was simple: load data from this URL.

The AI generated Pandas code with read_csv(). The code executed without errors. The dataframe was displayed. The shape looked correct: 768 rows and 9 columns. So everything looked fine.

But then I looked at the dataframe.

148 pregnancies?

In the first row, the Pregnancies column had the value 148. That immediately looked wrong. Values like 0, 1, 2, 6, or 8 make sense for a number of pregnancies. But 148? No.…
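The check I did by eye (a pregnancy count of 148 is not believable) can also be written as code that runs right after loading. Below is a minimal sketch, not what MLJAR Studio does: the `find_implausible` helper, the `PLAUSIBLE_RANGES` table, and the limits in it are my assumptions for illustration, and the sample dataframe is made up to mimic the symptom.

```python
import pandas as pd

# Hypothetical plausibility limits, chosen for illustration only.
# They are rough common-sense bounds, not clinical thresholds.
PLAUSIBLE_RANGES = {
    "Pregnancies": (0, 20),
    "Age": (0, 120),
}


def find_implausible(df, ranges):
    """Return {column: number of values outside the inclusive range}.

    Columns missing from the dataframe are skipped. Missing values (NaN)
    are counted as implausible, because between() returns False for them.
    """
    problems = {}
    for column, (low, high) in ranges.items():
        if column not in df.columns:
            continue
        out_of_range = ~df[column].between(low, high)
        count = int(out_of_range.sum())
        if count:
            problems[column] = count
    return problems


# Made-up sample that reproduces the symptom: 148 in the Pregnancies column.
sample = pd.DataFrame({"Pregnancies": [148, 85, 1], "Age": [50, 31, 32]})
report = find_implausible(sample, PLAUSIBLE_RANGES)
print(report)
```

A check like this says nothing about why the values are wrong. It only turns "the code ran and the shape is right" into "the code ran, the shape is right, and the values are believable", which is the gap this example exposed.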