Been having intermittent problems with trefor.net since moving the site to a new virtual platform at Christmas. It’s all sorted now. Thanks to the lads at the Timico Datacentre.
I asked Ian Christian to describe the issue and how it was resolved:
Well… explaining it is a little hard…. The key to figuring it out was this:
At the bottom of every page it shows when the page was generated, and how long it took. I suspect somewhere in WordPress it might have told you this too – but I’m not sure.
What we were seeing was the browser occasionally hanging. So I turned to a command line tool to grab the pages, and noticed it was taking 300 seconds to generate a single page. This seemed like a large clue that *something* was timing out: 300 seconds is exactly five minutes, which looked like a timeout value. My guess was that your web site was trying to make an HTTP call to somewhere to get data, and that call was failing. The reason the issue was intermittent was that most of the time pages were being served directly out of cache. A page took 5 minutes to generate, but once generated the result was cached.
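The post doesn't say which command line tool was used for the timing, so the following is only a sketch of the idea using curl. The helper names `time_fetch` and `looks_like_timeout` are made up for illustration, and the 600 second cap is an arbitrary choice so that a 300 second stall is still measured rather than cut short.

```shell
# Hypothetical helper: print the total time (in seconds) taken to fetch a URL.
# -s silences progress output, -o /dev/null discards the page body,
# -w '%{time_total}' prints curl's own measurement of the whole transfer.
time_fetch() {
    curl -s -o /dev/null --max-time 600 -w '%{time_total}\n' "$1"
}

# Hypothetical helper: succeed if an elapsed time is at or above a suspected
# timeout value. awk is used because the shell cannot compare fractions.
# $1 = elapsed seconds (may be fractional), $2 = suspected timeout in seconds
looks_like_timeout() {
    awk -v t="$1" -v limit="$2" 'BEGIN { exit !(t >= limit) }'
}

# Example usage (needs network access, so it is left commented out):
# elapsed=$(time_fetch http://www.trefor.net)
# if looks_like_timeout "$elapsed" 300; then
#     echo "suspiciously close to a timeout: ${elapsed}s"
# fi
```

A fetch that keeps landing on a round figure such as 300 seconds points at a timeout somewhere rather than at a server that is merely slow.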
So the next step was to look for HTTP connections out from your server:
$ netstat -anpl | grep :80
This showed a lot of connections sitting in SYN_SENT state. This indicates that a SYN (the first part of the TCP three-way handshake) had been sent, but the connection was still waiting for the remote host to reply with a SYN-ACK (the 2nd part, with the 3rd part being an ACK).
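To see at a glance where those half-open connections are heading, the netstat output can be grouped by remote address. This is a rough sketch rather than what was actually run: the function name `syn_sent_by_remote` is invented here, and it assumes the usual Linux netstat column layout (foreign address in column 5, state in column 6).

```shell
# Hypothetical helper: read netstat output on stdin and print a count of
# SYN_SENT connections per remote address, busiest first.
syn_sent_by_remote() {
    awk '$1 ~ /^tcp/ && $6 == "SYN_SENT" { count[$5]++ }
         END { for (remote in count) print count[remote], remote }' |
        sort -rn
}

# Example usage on the server itself:
# netstat -ant | syn_sent_by_remote
```

If one remote address dominates the list, that is the host that isn't answering.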
The IP address that these connections were to was the external IP address of your server, which sits behind a firewall on a private address.
It turned out that, for some reason, your site was trying to talk to itself via http://www.trefor.net URLs. This is probably how the caching engine works.
From a command line, a simple test ( curl -D - http://www.trefor.net ) proved that your server couldn’t talk to itself via its external IP. This might be because the firewall deliberately doesn’t allow it, to prevent loops.
The workaround was to modify the local hosts file, /etc/hosts, to make your server resolve www.trefor.net to its internal IP address.
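As an illustration of that kind of override, here is a small sketch for checking what a hosts file maps a name to. The internal address is not given above, so 10.0.0.5 is purely a placeholder, and `hosts_lookup` is a made-up helper name.

```shell
# The line added to /etc/hosts would look something like this
# (10.0.0.5 is a placeholder, not the server's real internal address):
#
#   10.0.0.5    www.trefor.net

# Hypothetical helper: print the address that a hosts-format file assigns
# to a given name, if any.
# $1 = hostname, $2 = hosts file (defaults to /etc/hosts)
hosts_lookup() {
    awk -v name="$1" '
        { sub(/#.*/, "") }   # strip comments before looking at fields
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Example usage:
# hosts_lookup www.trefor.net
```

Bear in mind that an entry like this only changes how the server itself resolves the name. Everyone else still reaches the site through public DNS and the firewall.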
Time to generate a page was then drastically reduced from 300 seconds to 0.4ish seconds.
I’d like to thank Ian, Andy Goldschmidt and the team for their help in getting this sorted. “Under the hood” the web is a complicated system to run. The problem could have been down to many areas: a problematic plug-in, a different version of Apache to that used on the previous server, etc.
It feels great to have a site that loads properly again.
5 replies on “Diagnosing very slow website loading problem”
This is one reason we need IPv6, to get rid of NAT and this nonsense of “inside” and “outside” IP addresses.
Not 100% sure the Tref server was behind NAT; he talks of a firewall, which becomes very important when hosting something on a public IP. In essence there will be services that they wish to restrict to certain IP ranges, or they will want to deny access to those IP blocks that always seem to be posting 99.9% of the spam to blogs.
Keeping a blog spam free is almost a full time job in itself.
IPv6, while well overdue, is still struggling at the consumer CPE level. I have seen a good many devices mess it up or fail to provide a basic set of firewall rules to protect the novice user.
Well, it’s a shame there’s no “like” button for Aled’s comment.
But I also think that’s possibly a utopian view of the world. There is now a whole industry built around the “inside/outside” philosophy, and orgs actually buy into this as well. So there’s not just the mindset change needed to “deploy IPv6” but “remove NAT/proxies, etc.” as well.
I thought it was the toilet roll of trackers on your blog – 48 according to Ghostery 🙂
I’ve had similar issues in the past and used iptables firewall rules to route traffic sent from within the same LAN to a server’s external IP back to the server internally. Some routers with MultiNAT etc. do this out of the box.
“Well, it’s a shame there’s no “like” button for Aled’s comment.”
— added to the list…. 🙂