Tenacious-Bench v0.1: a small B2B sales-outreach benchmark with contamination checks


DEV Community·Beamlaka·about 1 month ago


#agents #ai #llm #machinelearning #bench #tasks


General sales benchmarks often miss how real outbound agents fail: overclaiming on weak signals, unsafe “bench” commitments, tone that drifts into pushy follow-ups, and gaps between what the rep promises and what delivery can support. For a class project (TRP1 Week 11), I built Tenacious-Bench v0.1, a compact, machine-scored task set aimed at those failure modes rather than generic helpfulness.

What’s in the dataset

The public release is on Hugging Face: https://huggingface.co/datasets/Bnobody/tenacious_bench_v0.1. It currently exposes 168 rows in the hub viewer, with splits aligned to how I train and evaluate: train (105) and validation (63). Tasks mix several authoring modes (programmatic sweeps, multi-LLM synthesis with judge filtering, trace-informed scenarios, and hand-authored adversarial cases), so the bench isn’t a single-generator monoculture.…
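The excerpt does not say how the contamination checks are implemented. As a rough illustration only, one common approach is to flag evaluation rows that share a long word n-gram with any training row. The sketch below assumes that approach; the function names, the 8-word window, and the toy rows are all hypothetical and are not taken from the dataset or from the author's code.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(text: str, n: int = 8) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text` (empty if too short)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def find_contaminated(
    train_texts: Iterable[str], eval_texts: Iterable[str], n: int = 8
) -> List[int]:
    """Return indices of eval texts that share at least one n-gram with train."""
    train_grams: Set[Tuple[str, ...]] = set()
    for text in train_texts:
        train_grams |= ngrams(text, n)
    return [i for i, text in enumerate(eval_texts) if ngrams(text, n) & train_grams]


# Toy rows invented for illustration; NOT taken from Tenacious-Bench.
train = [
    "write a follow-up email to a prospect who downloaded the pricing guide last week"
]
validation = [
    "write a follow-up email to a prospect who downloaded the pricing guide yesterday",
    "draft a first-touch note for a CTO whose team just posted three backend roles",
]

flagged = find_contaminated(train, validation, n=8)
print(flagged)  # [0]
```

Only the first validation row is flagged, because it repeats an eight-word run from the training row. In practice the texts would come from the train and validation splits, and the window size trades false positives against missed paraphrases.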
