Alright. What do we know? We know that the asset servers were taken down for an out-of-sequence maintenance window on Thursday morning. This could be because of a progressive failure or breakdown, or because of an upgrade to stave off a failure. Failure in this case would be capacity being exceeded by load on a more or less consistent basis. The work was done, and everything was smiles. And then things went wrong.
Now we know very little, except that there were assorted minor asset issues, finally culminating in what was described by one person as "totally screwed up behavior". Things got taken down. Things got brought back up. Things got taken down, again.
"Second Life is still down while we update our central storage hardware". Okay, this makes it look like a capacity problem. New hardware not running up to spec? Firmware upgrades? Faster drives?
Ten minutes later: "Second Life is still down while we work with our storage hardware vendor". Oh, dear. Now that isn't a good sign. You don't talk to your hardware vendor at two in the morning if something isn't shaped like a piece of fruit. [ colloq. (chiefly Brit., orig. R.A.F. slang). to go (also turn) pear-shaped: to go (badly) wrong, to go awry. ]
This suggests either an upgraded piece of hardware that is -- frankly -- not doing the job, or a bad firmware update, or something that is just crashing under load. Maybe a dud RAID card. Maybe a SCSI controller with inadequate grounding. What we can reasonably assume is that the vendor's hardware isn't doing the job that it's supposed to be doing, and that Linden Lab are trying to get their assistance to determine why.
So, this would mean that the vendor isn't happy, Linden Lab isn't happy, the people who handle the grid hardware and systems aren't happy, and we're not happy. All the major parties are unhappy. Shakespeare would schedule a pirate attack around this point in proceedings.
[Update: 7:30AM SLT -- Linden Lab reports that the grid may be inoperable as late as noon today]
[Update: The grid reopened around 11am SLT, and Linden Lab have provided a detailed explanation of the issues with the storage array.]

1. Prior to things going south, I was teaching a basic scripting lesson on creating a color-shifting titler when my scripts stopped saving properly after compiles. TUi Lecturer Compulov Weeks reported that requests to take a copy of his own stuff were taking ages to resolve, and a lot of NCi newbies also reported issues with summoning scripts out of their inventories.
When things went pear-shaped, you didn't know how bad they were till the rolling boot went through your sim :(
P.S. Tateru? I guess you can see the monkeys and black slab beating on the secondlife.com website now. Cold Solace, really...
Posted at 10:13AM on Dec 29th 2006 by Patchouli Woollahra