Server hardware failure??

General topics and discussion on Valor.
Carloboy
Lancer
Posts: 5
Joined: Wed Sep 26, 2012 4:16 pm

Postby Carloboy » Tue Apr 23, 2013 8:09 pm

Are you guys leasing these servers or using colo? Any decent DC should be capable of N+1 redundancy at the very least... a single hardware failure is causing this whole thing? If so, I'm assuming this must be your authentication server?

Guysssss, come on now, it's a money-making business... redundancy should have been the first priority of your admins... a failover should have been there, and a simple DNS change would have kept downtime minimal while you figured out what was going on with the first server.
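For readers who want the failover idea spelled out, here is a minimal sketch of the decision logic only, not anything Quark is known to run. The hostnames, the `FailoverMonitor` class, and the threshold of three failed checks are all assumptions made up for illustration; a real setup would run actual health checks and then update a DNS record or load balancer at the marked point.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Tracks health-check results and decides which server gets traffic."""

    primary: str
    standby: str
    failure_threshold: int = 3  # assumed: consecutive failed checks before failing over
    consecutive_failures: int = 0
    active: str = ""

    def __post_init__(self) -> None:
        # Traffic starts on the primary.
        self.active = self.primary

    def record_check(self, primary_healthy: bool) -> str:
        """Record one health-check result for the primary; return the active server."""
        if primary_healthy:
            # Primary is fine (or has recovered): reset the counter and fail back.
            self.consecutive_failures = 0
            self.active = self.primary
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                # A real deployment would repoint DNS (or a load balancer)
                # at the standby here.
                self.active = self.standby
        return self.active


# Hypothetical hostnames, for illustration only.
monitor = FailoverMonitor("primary.example.net", "standby.example.net")
for healthy in (True, False, False, False):
    target = monitor.record_check(healthy)
print(target)
```

Requiring several consecutive failures before switching is a common way to avoid flapping on a single dropped check. Note that DNS-based failover is only as fast as the record's TTL allows, so "minimal downtime" still depends on how that TTL is set.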

If I were an admin/head of IT and had a server down for 3 days just from a hardware failure... I'd quit in shame and find another career, seriously... all this time I was thinking your critical servers were getting DDoS'd and the DC had null-routed your IP... then come to find out it was something infinitely simple... I hope to the god of Linux that you guys used RAID and had offsite backups...

PS: not bashing Quark... but hopefully next time your "admins" have contingency plans for events like this... 3 days down is unacceptable in the hosting world, no matter the reason... especially when you have players who pay to play the game.

mclueppers2
Knight
Posts: 50
Joined: Tue Apr 23, 2013 4:10 am
Location: W198, GW2

Postby mclueppers2 » Wed Apr 24, 2013 4:19 am

Carloboy wrote: Are you guys leasing these servers or using colo? Any decent DC should be capable of N+1 redundancy at the very least... a single hardware failure is causing this whole thing? If so, I'm assuming this must be your authentication server?

Guysssss, come on now, it's a money-making business... redundancy should have been the first priority of your admins... a failover should have been there, and a simple DNS change would have kept downtime minimal while you figured out what was going on with the first server.

If I were an admin/head of IT and had a server down for 3 days just from a hardware failure... I'd quit in shame and find another career, seriously... all this time I was thinking your critical servers were getting DDoS'd and the DC had null-routed your IP... then come to find out it was something infinitely simple... I hope to the god of Linux that you guys used RAID and had offsite backups...

PS: not bashing Quark... but hopefully next time your "admins" have contingency plans for events like this... 3 days down is unacceptable in the hosting world, no matter the reason... especially when you have players who pay to play the game.


Obviously PlayMesh didn't learn from the previous outage. Now I suppose that Quarkgames just inherited the admin stuff and didn't do much in terms of redundancy and avoiding single points of failure. The T&C still don't give us any rights in terms of an SLA, so the only people that really care now are the real money spenders.
Of course I don't believe that everybody on the engineering team is engaged in fixing the problem - no one can manage to spend 2-3 days in a DC without sleep and proper food. Or maybe the senior sysadmin is on vacation atm and the junior staff is unable to handle the situation.

Phantom
Lancer
Posts: 7
Joined: Fri Oct 05, 2012 5:04 pm

Postby Phantom » Wed Apr 24, 2013 4:34 am

mclueppers2 wrote: Obviously PlayMesh didn't learn from the previous outage. Now I suppose that Quarkgames just inherited the admin stuff and didn't do much in terms of redundancy and avoiding single points of failure. The T&C still don't give us any rights in terms of an SLA, so the only people that really care now are the real money spenders.
Of course I don't believe that everybody on the engineering team is engaged in fixing the problem - no one can manage to spend 2-3 days in a DC without sleep and proper food. Or maybe the senior sysadmin is on vacation atm and the junior staff is unable to handle the situation.


Since Quark is simply a rebrand of PlayMesh, there is no reason to believe that any of the staff have changed (over and above general churn).

LordFirefall
Posts: 1002
Joined: Thu May 31, 2012 4:15 am
Location: Montival

Postby LordFirefall » Wed Apr 24, 2013 4:43 am

I'd be interested to know how many of the tech experts here have ever actually run a tech department in a decent-sized company. If so, did you have unlimited funds, or did you have constraints on what you could spend? Every decision I've ever had to make ended up boiling down to having more needs and wants than actual budget. As such, it forced me to make hard decisions and assume a fair amount of risk. Functional decisions boiled down to winnowing a dozen great ideas down to the 1-2 I could afford, and everyone thought their great idea should be one of those 1-2.
W95 Praetorian Guard Guild Leader
Kakao: LordFirefall or Firefall

MyName999
Scholar
Posts: 475
Joined: Wed Mar 14, 2012 4:26 pm

Postby MyName999 » Wed Apr 24, 2013 5:11 am

Fact is, we're adults, and a sooooooo loooooong shutdown has to be clearly explained.

Silence is a war strategy, so Quark, plz explain to us what's going on. Just saying "We've got a problem and are trying to solve it. You'll see when it's gone" isn't enough.

Lots of people paid to play this game, which makes them Quark's clients. As clients, we have the right to know exactly what's up!

mclueppers2
Knight
Posts: 50
Joined: Tue Apr 23, 2013 4:10 am
Location: W198, GW2

Postby mclueppers2 » Wed Apr 24, 2013 6:18 am

LordFirefall wrote: I'd be interested to know how many of the tech experts here have ever actually run a tech department in a decent-sized company. If so, did you have unlimited funds, or did you have constraints on what you could spend? Every decision I've ever had to make ended up boiling down to having more needs and wants than actual budget. As such, it forced me to make hard decisions and assume a fair amount of risk. Functional decisions boiled down to winnowing a dozen great ideas down to the 1-2 I could afford, and everyone thought their great idea should be one of those 1-2.


Now, does the fact that I run $2 mil worth of infrastructure that makes millions in revenue each day make me an expert? Do you know what will happen if that infrastructure stops working? Worst-case scenario: recovery from bare metal in less than 6 hours, not 3 days.
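To put the 6-hour versus 3-day comparison in the usual uptime terms, here is a small sketch that converts downtime into an availability percentage. It assumes, purely for illustration, that the outage is the only downtime in a 365-day year; the function name is made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability_percent(downtime_hours: float,
                         period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the share of the period the service was up, as a percentage."""
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must be between 0 and the period length")
    return 100.0 * (1 - downtime_hours / period_hours)


# Assumption: one outage per year and no other downtime.
three_day_outage = availability_percent(3 * 24)
six_hour_recovery = availability_percent(6)

print(f"3-day outage:    {three_day_outage:.3f}% availability")
print(f"6-hour recovery: {six_hour_recovery:.3f}% availability")
```

Under that assumption, a single 3-day outage leaves yearly availability a little under 99.2%, while a 6-hour recovery keeps it above 99.9%.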

DirtySouthATL
Guardian
Posts: 233
Joined: Sun Nov 13, 2011 10:57 am

Postby DirtySouthATL » Wed Apr 24, 2013 6:30 am

mclueppers2 wrote: Now, does the fact that I run $2 mil worth of infrastructure that makes millions in revenue each day make me an expert? Do you know what will happen if that infrastructure stops working? Worst-case scenario: recovery from bare metal in less than 6 hours, not 3 days.



I'm going to have to go with lueppers on this one. Had my company pulled something of this nature, we'd be at about $15,000,000.00 worth of revenue loss due to the downtime we've now experienced with this game. Yes, that's millions. I sure as **** wouldn't have a job anymore, and neither would anyone else in my company for that matter. My company specializes in hosted environments and, well, I can tell you, if our system goes down, we have a backup that takes over within 10 minutes of it failing. We don't have unlimited funds, but we do require an obscene amount of money from our customers to deliver a service that's running 24/7 without interruption.
Guess what: the same location where you guys have your servers is where we had ours, so when this happened the last time, we did what anyone with common sense would do: we dropped everything we had planned, created a backup plan, and rolled it out. Thanks to that plan, we know the minute a server goes down and we're able to get it back online within minutes.
This is not rocket science, you guys. Quark promised us a new way of doing things the last time this happened, and it's VERY obvious that didn't take place. We are all just pointing that out. If y'all are going to continue taking our money, at least put it to use where it needs to be used the most. Screw the graphics. Get a stable environment.
Valor is the eHarmony of the nerd world

fortheLOVE
Posts: 76
Joined: Fri Oct 05, 2012 10:51 am
Location: Texas

Postby fortheLOVE » Wed Apr 24, 2013 6:38 am

DirtySouthATL wrote: I'm going to have to go with lueppers on this one. Had my company pulled something of this nature, we'd be at about $15,000,000.00 worth of revenue loss due to the downtime we've now experienced with this game. Yes, that's millions. I sure as **** wouldn't have a job anymore, and neither would anyone else in my company for that matter. My company specializes in hosted environments and, well, I can tell you, if our system goes down, we have a backup that takes over within 10 minutes of it failing. We don't have unlimited funds, but we do require an obscene amount of money from our customers to deliver a service that's running 24/7 without interruption.
Guess what: the same location where you guys have your servers is where we had ours, so when this happened the last time, we did what anyone with common sense would do: we dropped everything we had planned, created a backup plan, and rolled it out. Thanks to that plan, we know the minute a server goes down and we're able to get it back online within minutes.
This is not rocket science, you guys. Quark promised us a new way of doing things the last time this happened, and it's VERY obvious that didn't take place. We are all just pointing that out. If y'all are going to continue taking our money, at least put it to use where it needs to be used the most. Screw the graphics. Get a stable environment.


hahahahhahahahha!!! where's the 'like' button..? don't think i could agree with this more.
-enjoythefall
-278392193
w110 Anarchy (Anonymous)

LordFirefall
Posts: 1002
Joined: Thu May 31, 2012 4:15 am
Location: Montival

Postby LordFirefall » Wed Apr 24, 2013 7:03 am

mclueppers2 wrote: Now, does the fact that I run $2 mil worth of infrastructure that makes millions in revenue each day make me an expert? Do you know what will happen if that infrastructure stops working? Worst-case scenario: recovery from bare metal in less than 6 hours, not 3 days.


It gives you better insight than most. In your case, double and triple redundancy would be appropriate. However, Quark does not have the same level of loss potential. As such, how is it a sound business decision for them to have that same level of redundancy?

We are approaching two days of outage right now. While I don't like it, it's not unreasonable for them.
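The trade-off above can be framed as simple arithmetic: redundancy is worth buying when the outage losses it prevents exceed what it costs. The sketch below is only an illustration of that comparison; every figure in it is a made-up placeholder, not a number from Quark or from anyone in this thread, and the function names are hypothetical.

```python
def expected_annual_outage_loss(outages_per_year: float,
                                hours_per_outage: float,
                                revenue_loss_per_hour: float) -> float:
    """Expected yearly revenue lost to outages."""
    return outages_per_year * hours_per_outage * revenue_loss_per_hour


def redundancy_pays_off(annual_redundancy_cost: float,
                        loss_without_redundancy: float,
                        loss_with_redundancy: float) -> bool:
    """True when the loss avoided is larger than the cost of the redundancy."""
    loss_avoided = loss_without_redundancy - loss_with_redundancy
    return loss_avoided > annual_redundancy_cost


# Placeholder figures, chosen only to show the calculation.
loss_without = expected_annual_outage_loss(
    outages_per_year=1, hours_per_outage=72, revenue_loss_per_hour=1_000)
loss_with = expected_annual_outage_loss(
    outages_per_year=1, hours_per_outage=1, revenue_loss_per_hour=1_000)

print(redundancy_pays_off(50_000, loss_without, loss_with))
print(redundancy_pays_off(100_000, loss_without, loss_with))
```

This only counts direct revenue. It leaves out harder-to-price effects such as players quitting after a long outage, which would push the balance further toward redundancy.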
W95 Praetorian Guard Guild Leader

Kakao: LordFirefall or Firefall

mclueppers2
Knight
Posts: 50
Joined: Tue Apr 23, 2013 4:10 am
Location: W198, GW2

Postby mclueppers2 » Wed Apr 24, 2013 7:16 am

LordFirefall wrote: It gives you better insight than most. In your case, double and triple redundancy would be appropriate. However, Quark does not have the same level of loss potential. As such, how is it a sound business decision for them to have that same level of redundancy?

We are approaching two days of outage right now. While I don't like it, it's not unreasonable for them.


You're talking about the same Quark that proudly announced it was top 25 in App Store revenue? A queue expander and 3 res boosts cost 200 gold, which is a good $10. Now just think about how much money Quark makes each time they open a new world, and each week after that! Now, obviously you're familiar with IT stuff and prices: how much do you think they'd need to spend to build double resilience? I'm not talking about complicated scenarios with 2+ redundancy factors.

