After weeks of no problems in STAGING and load tests looking great, it turns out there's just one teensy little flaw in caching Drupal pages for longer than a couple of days...
Drupal nukes the optimized/consolidated CSS and JS temp files every couple of days.
So the monitoring sees nice, valid HTML with an HTTP 200, but humans see a theme-less site, because the cached page still points at CSS/JS files that no longer exist.
Worse, the thing corrects itself somehow, or our guys jump on it and flush the cache, and it's all good...
For a couple days.
I dug even deeper into the Drupal caching source code, and I'm convinced: this thing wasn't architected; it just grew.
It's a spaghetti code mess.
I defy any Drupal core dev to correctly describe the caching "architecture" for D6 pages, CSS, JS, blocks, views, etc. in any coherent way.
I officially give up now.
When the VP of Marketing complains the site is slow, I'll just say "Yes, it is. That's Drupal."
Actually, instead of doing that, we'll have to spider the silly thing after every cron job. Which sucks, but there it is.
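Since an HTTP 200 on the HTML alone doesn't catch this, one way a post-cron spider could is to fetch each page, pull out the stylesheet and script URLs it references, and confirm each of those still resolves. Here's a minimal Python sketch, assuming plain `<link rel="stylesheet">` and `<script src>` tags and a server that answers HEAD requests. The function names are hypothetical; this isn't Drupal's API or our actual crawler.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class AssetCollector(HTMLParser):
    """Collects the stylesheet and script URLs a page references."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel_tokens = (a.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel_tokens and a.get("href"):
            self.assets.append(a["href"])
        elif tag == "script" and a.get("src"):
            self.assets.append(a["src"])


def asset_urls(html, base_url):
    """Return absolute URLs of the CSS/JS files referenced by the HTML."""
    parser = AssetCollector()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, u) for u in parser.assets]


def http_status(url, timeout=10):
    """HEAD the URL and return its status code, or None if unreachable.

    Assumption: the server answers HEAD; some reply 405 and would need GET.
    """
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code
    except URLError:
        return None


def missing_assets(html, base_url, status_of=http_status):
    """List referenced CSS/JS URLs that don't come back as HTTP 200."""
    return [u for u in asset_urls(html, base_url) if status_of(u) != 200]


def check_page(page_url):
    """Fetch one page and report assets that have gone missing."""
    with urlopen(page_url, timeout=10) as resp:
        html = resp.read().decode("utf-8", "replace")
    return missing_assets(html, page_url)
```

Run `check_page()` over the site's URL list after cron. Any non-empty result means a cached page is serving a theme-less site despite its 200, which is the cue to flush the page cache. Passing `status_of` in as a parameter keeps the parsing logic testable without hitting the network.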