If your "scraper" is a requests.get() followed by re.findall(r'<div class="price">.*?</div>', html), I have bad news. You don't have a scraper. You have a layout sensor.

The first time the dev team renames the class, adds a wrapper <span>, or A/B tests a new pricing component, your pipeline goes silent. Not loud, not error-throwing: silent. Empty rows in the dataset. No alarm. You find out a week later when a stakeholder asks why the dashboard looks weird.

I rebuilt our Idealista scraper this quarter and the regex stage was the thing I deleted first.

The 3-item checklist

Before you write another re.findall against HTML, check:

1. Is there a stable accessibility role or label? (getByRole('heading', { name: /price/i }) survives class renames.)
2. Is the data actually in the rendered page, or is it injected via JSON? (Often the JSON-LD <script> block has everything you need, no DOM walking.)
3. Can you assert the schema so it fails loud?…
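
To make the JSON-LD and fail-loud checks concrete, here is a minimal sketch using only the standard library. It assumes the target page embeds a schema.org-style block with an offers object carrying price and priceCurrency; that shape is an assumption for illustration, not something confirmed for any particular site, and the names JsonLdCollector, SchemaError, and extract_offer are hypothetical. The point is the last line of extract_offer: when nothing matches, it raises instead of returning an empty row.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


class SchemaError(ValueError):
    """Raised when a page does not yield the fields we expect (hypothetical name)."""


def extract_offer(html: str) -> dict:
    """Return price and currency from JSON-LD, or raise SchemaError.

    Assumes a schema.org-like layout: {"offers": {"price": ..., "priceCurrency": ...}}.
    """
    collector = JsonLdCollector()
    collector.feed(html)

    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks, keep looking

        offers = data.get("offers") if isinstance(data, dict) else None
        if not isinstance(offers, dict) or "price" not in offers:
            continue

        try:
            price = float(offers["price"])
        except (TypeError, ValueError) as exc:
            raise SchemaError(f"price is not numeric: {offers['price']!r}") from exc

        currency = offers.get("priceCurrency")
        if not currency:
            raise SchemaError("offers.priceCurrency is missing")

        return {"price": price, "currency": currency}

    # Fail loud: no silent empty rows in the dataset.
    raise SchemaError("no JSON-LD block with offers.price found")
```

In a pipeline you would call extract_offer(response.text) and let SchemaError propagate to whatever does your alerting, so a layout change stops the run instead of quietly writing blanks.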