We’ve been running a series of experiments using ChatGPT 5.4 integrated into a website chatbot across different environments:

🌐 a main website
🛒 a 1,000-product e-commerce demo store
🍳 a 570-page cooking blog

🎯 Goal: simulate realistic user behavior and observe how the model responds over time.

⚙️ Test setup

The chatbot is designed to (no self-promo here, just context):

📌 answer strictly based on website content (a RAG-like approach)
🧭 guide users through product discovery and content navigation

Over time, we intentionally tested recurring patterns:

🔎 product comparisons
💰 price-based filtering
🔀 cross-entity queries (multiple products or categories)
🧠 more complex “shopping intent” scenarios

💡 The idea was to approximate real-world usage, not synthetic benchmarks.…