We merged networking and security a couple months ago. Triage time went up.

Environment is AWS with Transit Gateway, inline Palo Alto firewalls, and Okta for identity. Mix of EC2, EKS, and some on-prem VMware. Traffic goes through centralized inspection.

Symptoms show up as latency and intermittent drops. Hard to tell if it's routing, firewall policy, or identity timing. This has turned into a recurring SASE troubleshooting problem where no single layer gives a complete picture. We pull VPC flow logs, firewall logs, and packet captures, but each view is partial. Changes in one layer don't line up with the others.

Recent incident took hours to isolate. Traffic was blocked by a firewall app-id override while identity hadn't propagated yet. Looked like a network issue at first.

How are you isolating the failure domain quickly in setups like this?

submitted by /u/Upper_Caterpillar_96
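One way to answer the question above is to stop reading each log in its own console and instead pin the three partial views to one timeline for the single flow under investigation, then name the layer where the first failure shows up and check whether the identity change that should have permitted the flow landed before or after that failure. The script below is an added example and is not code from the original post or from any of the vendors mentioned. It assumes the default AWS VPC flow log v2 field order for the network view, and uses constructed, simplified dict records in place of the Palo Alto traffic log CSV and Okta system log JSON (field names `src`, `dst`, `rule`, `app`, `action`, `time`, `user`, `eventType`, `published` are stand-ins chosen here); all sample data is constructed.

```python
"""Correlate network, firewall and identity views of one flow on a single timeline.

Added example; log shapes are simplified stand-ins (see lead-in) and all
sample records below are constructed.
"""
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Event:
    ts: datetime
    layer: str    # "network" | "firewall" | "identity"
    verdict: str  # "ok" | "fail"
    detail: str


def parse_vpc_flow(line: str, src: str, dst: str) -> Optional[Event]:
    """Default VPC flow log v2 record:
    version account eni src dst sport dport proto pkts bytes start end action status"""
    f = line.split()
    if len(f) < 14 or (f[3], f[4]) != (src, dst):
        return None
    action = f[12]
    return Event(datetime.fromtimestamp(int(f[10]), tz=timezone.utc), "network",
                 "fail" if action == "REJECT" else "ok", f"{f[2]} {action}")


def parse_firewall(rec: dict, src: str, dst: str) -> Optional[Event]:
    """Simplified firewall traffic record (constructed field names, not the vendor schema)."""
    if (rec["src"], rec["dst"]) != (src, dst):
        return None
    return Event(datetime.fromisoformat(rec["time"]), "firewall",
                 "ok" if rec["action"] == "allow" else "fail",
                 f"rule={rec['rule']} app={rec['app']} action={rec['action']}")


def parse_identity(rec: dict, user: str) -> Optional[Event]:
    """Simplified identity system-log record (constructed shape)."""
    if rec["user"] != user:
        return None
    return Event(datetime.fromisoformat(rec["published"]), "identity", "ok", rec["eventType"])


def isolate(events) -> dict:
    """Sort every layer's events for the flow by time; the layer of the first 'fail'
    is the failure domain. If that layer is the firewall, also report how many
    seconds later the identity change arrived (None if no identity event was seen)."""
    timeline = sorted([e for e in events if e is not None], key=lambda e: e.ts)
    result = {
        "failure_domain": None,
        "identity_lag_seconds": None,
        "timeline": [(e.ts.isoformat(), e.layer, e.verdict, e.detail) for e in timeline],
    }
    first_fail = next((e for e in timeline if e.verdict == "fail"), None)
    if first_fail is None:
        return result
    result["failure_domain"] = first_fail.layer
    if first_fail.layer == "firewall":
        ident = [e for e in timeline if e.layer == "identity"]
        if ident:
            lag = (ident[0].ts - first_fail.ts).total_seconds()
            result["identity_lag_seconds"] = max(lag, 0.0)
    return result


SRC, DST, USER = "10.10.1.5", "10.20.2.9", "jdoe@example.com"

# Constructed sample: flow forwarded by the VPC (ACCEPT), denied at the firewall
# by an app-id override, and the user's group membership pushed 35 s too late.
FLOW_ACCEPT = "2 123456789012 eni-0abc1234 10.10.1.5 10.20.2.9 51234 443 6 12 3400 1700000000 1700000060 ACCEPT OK"
FW_DENY = {"time": "2023-11-14T22:13:25+00:00", "src": SRC, "dst": DST,
           "rule": "app-override-finance", "app": "ssl", "action": "deny"}
OKTA_ADD = {"published": "2023-11-14T22:14:00+00:00", "user": USER,
            "eventType": "group.user_membership.add"}


def demo_firewall_case() -> dict:
    return isolate([parse_vpc_flow(FLOW_ACCEPT, SRC, DST),
                    parse_firewall(FW_DENY, SRC, DST),
                    parse_identity(OKTA_ADD, USER)])


# Constructed sample: the same flow rejected already in the VPC (SG/NACL), so the
# firewall never saw it and identity timing is irrelevant.
FLOW_REJECT = FLOW_ACCEPT.replace("ACCEPT", "REJECT")


def demo_network_case() -> dict:
    return isolate([parse_vpc_flow(FLOW_REJECT, SRC, DST)])


if __name__ == "__main__":
    for r in (demo_firewall_case(), demo_network_case()):
        print(r["failure_domain"], r["identity_lag_seconds"])
        for row in r["timeline"]:
            print("  ", *row)
```

In the first case the output names the firewall, not the network, as the failure domain and shows a 35 second identity lag, which is exactly the shape of the incident described in the post; in the second the VPC REJECT comes first and the firewall and identity layers are never implicated. In practice the three parsers would be replaced by readers for the real exports, and the flow key would be a full 5-tuple plus a time window rather than a source/destination pair.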