I asked Ian Christian to describe the issue and how it was resolved:
Well… explaining it is a little hard…. The key to figuring it out was this:
At the bottom of every page it shows when the page was generated, and how long it took. I suspect WordPress might have told you this somewhere too, but I’m not sure.
What we were seeing was the browser occasionally hanging. So I turned to a command line tool to grab the pages, and noticed it was taking 300 seconds to generate a single page. This seemed like a large clue that *something* was timing out, since 300 seconds is a likely timeout value. My guess was your web site was trying to make an HTTP call to somewhere to get data, and that call was failing. The reason the issue was temperamental was that most of the time pages were being served directly out of cache: a page took 5 minutes to generate, but once generated the result was cached.
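For anyone wanting to repeat that kind of measurement, here is a minimal sketch of timing a page fetch from the command line. Ian did not say which tool he used, so curl, the helper name time_page and the 600 second cap are all assumptions for illustration.

```shell
# Hypothetical helper (not from Ian's account): print the total time, in
# seconds, that curl takes to fetch the URL given as the first argument.
time_page() {
  # --silent hides the progress meter, --output /dev/null discards the body,
  # --max-time 600 gives up after 10 minutes (an arbitrary cap chosen so a
  # 300 second stall is still measured), --write-out prints the elapsed time.
  curl --silent --output /dev/null --max-time 600 \
       --write-out '%{time_total}\n' "$1"
}
```

Calling `time_page http://www.trefor.net` a few times in a row should make a stall like the one described stand out, because a cached page returns quickly while an uncached one sits until the timeout.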
So the next step was to look for outbound HTTP connections from your server:
$ netstat -anpl | grep :80
This showed a lot of connections sitting in SYN_SENT state. This indicates that a SYN (the first part of the TCP three-way handshake) had been sent, but the connection was still waiting for the remote host to reply with a SYN-ACK (the second part, the third part being an ACK).
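A rough sketch of how those stuck connections could be tallied per remote address follows. It assumes the usual Linux netstat column layout (protocol, two queue columns, local address, foreign address, state); the helper name count_syn_sent is made up for this example and is not something Ian described using.

```shell
# Hypothetical helper: read `netstat -an` style output on stdin and print,
# for each remote address on port 80, how many connections are in SYN_SENT.
# Assumes column 5 is the foreign address and column 6 is the state.
count_syn_sent() {
  awk '$1 ~ /^tcp/ && $5 ~ /:80$/ && $6 == "SYN_SENT" { n[$5]++ }
       END { for (addr in n) print n[addr], addr }'
}
```

It would be used as `netstat -an | count_syn_sent`. A large count against a single address points at one unreachable destination, which is the pattern described above.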
The IP address that these connections were to was the external IP address of your server, which sits behind a firewall on a private address.
It turned out that, for some reason, your site was trying to talk to itself via http://www.trefor.net URLs. This is probably how the caching engine works.
From a command line, a simple test ( curl -D - http://www.trefor.net ) proved that your server couldn’t talk to itself via its external IP. This might be because the firewall deliberately disallows it, to prevent loops.
The workaround was to modify the local hosts file, /etc/hosts, so that your server resolves www.trefor.net to its internal IP address.
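As an illustration only: an /etc/hosts entry of this kind is a single line containing an address followed by one or more names, for example `10.0.0.5 www.trefor.net`. The address 10.0.0.5 is a placeholder, since the server's real internal IP is not given here. The sketch below shows one way to check what a hosts-format file maps a name to; the helper name hosts_lookup is an assumption, not part of the original fix.

```shell
# Hypothetical helper: print the address that a hosts-format file (first
# argument) assigns to a hostname (second argument), or nothing if absent.
hosts_lookup() {
  awk -v name="$2" '
    {
      sub(/#.*/, "")                 # drop comments; awk re-splits the fields
      for (i = 2; i <= NF; i++)      # field 1 is the address, the rest are names
        if ($i == name) { print $1; exit }
    }' "$1"
}
```

After the edit, `hosts_lookup /etc/hosts www.trefor.net` should print the internal address rather than nothing.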
Time to generate a page was then drastically reduced from 300 seconds to 0.4ish seconds.
I’d like to thank Ian, Andy Goldschmidt and the team for their help in getting this sorted. “Under the hood” the web is a complicated system to run. The problem could have been down to many things: a problematic plug-in, a different version of Apache from the one used on the previous server, and so on.
It feels great to have a site that loads properly again.