The International Simutrans Forum

Topic started by: Isaac Eiland-Hall on August 24, 2010, 03:44:21 am

Title: Server downtime of ~7 hours
Post by: Isaac Eiland-Hall on August 24, 2010, 03:44:21 am
What a comedy of errors. It would be funny (i.e. a "comedy") if it hadn't taken us offline all afternoon and evening. ARGH.

Long story short: I have been having issues with the attempts to find the source of the hacking. I was told I shouldn't "bump" my ticket. So when the server went offline, I assumed it was another hacking situation.

After 1.5 hours of being down, I did bump the ticket; but for $30/mo, they don't provide emergency service. So I had to wait until it had been down for ~3 hours to follow up with a complaint...

Another hour to hear a reply, which advised that it wasn't a result of their work.

So, an emergency ticket was opened with iWeb. The response there WAS immediate; but it took them ~30 minutes to investigate (including getting a keyboard to the server itself); then they had to pass it on to another department. All told, it took nearly two hours to figure out the problem, which was an error in the network configuration. Technically, iWeb's fault - I downgraded from a 100Mbps port to a 10Mbps port (since we don't use it, and it's $10/mo I'm paying for no use).

So the length of the outage is due to an extremely rare mistake from iWeb, combined with dealing with the hacking issue.

I apologize for the downtime.
Title: Re: Server downtime of ~7 hours
Post by: Isaac Eiland-Hall on August 24, 2010, 04:24:37 pm
Update: Downtime this morning of ~2 hours was somewhat related. I'm too tired right now to go into much detail, but basically it was a different problem caused by an attempt to kill a hacking attempt in progress.

Then an additional ~30 minutes of unavailability because the nameserver wasn't working right after repairs to the system.

Everything *should* be up again finally/fully/firmly - a final check is being done (plus an additional reboot and checks to try to prevent further problems).