
I Replaced GPT-4 with a Local SLM and My CI/CD Pipeline Stopped Failing | Towards Data Science

Towards Data Science·Benjamin Nweke·about 1 month ago

rewrite of the same system prompt.

You MUST return ONLY valid JSON. No markdown. No code fences. No explanation. JUST the JSON object.

I had written MUST in all-caps. To a language model. As if emphasis would work on something that doesn’t have feelings or, apparently, a consistent definition of “valid JSON.”

It didn’t work. Here’s what did.

How GPT-4 Ended Up in a Nightly Batch Job

Our team consumes research documents: PDFs, plain text, and occasionally those pesky semi-structured reports that some vendor clearly exported from a spreadsheet they were very proud of. Part of that pipeline classifies them and extracts structured fields before anything touches the data warehouse: methodology type, dataset source, key metrics.

This sounds like a solved problem. It usually is, until there are about forty methodology types listed and the documents stop looking anything like the ones you trained on. For a while, we handled this using regex, rule-based extractors, and a fine-tuned BERT model.…
