
#Eval

18 posts

Anthropic Message Batching: When 50% Off Is Worth the Latency

DEV Community·Gabriel Anhaia·28 days ago
#ai #anthropic #python #llm #batch #requests

Anthropic's Message Batches API gives you half-price tokens in exchange for a processing window of up to 24 hours. Here is when it earns its keep, and a Python script that runs 1,000 evals through it.

go-eval: The Missing Piece for Testing Agents in Go

DEV Community·igcodinap·about 1 month ago
#el #go #ai #testing #fullscreen #eval

A while back I started to feel an odd discomfort building LLM applications in Go. Go...

Your RAG Eval Set Is Probably Wrong. The Test That Catches It.

DEV Community·Gabriel Anhaia·about 1 month ago
#ai #rag #llm #eval #drift #queries

Three ways eval sets go wrong in production: leakage, drift, and judge bias. Plus a 40-line drift detector you can ship today.

5 RAG Failure Modes Nobody Warns You About in the Tutorials

DEV Community·Gabriel Anhaia·about 1 month ago
#ai #rag #llm #database #chunks #eval

The five RAG failures that survive your eval suite and break in production. Each one with a small mitigation snippet you can paste in today.

Anthropic April 23 Postmortem: 3 Confounding Changes Behind Claude Code's Month-Long Quality Drop

DEV Community·정상록·about 1 month ago
#anthropic #change #ai #code #claude #eval

From DEV Community: Anthropic April 23 Postmortem: 3 Confounding Changes Behind Claude Code's Month-Long Quality Drop
