INN

Eve Online: Behind the July 15 Downtime

by TMC Archives August 9, 2015

On July 15, Eve Online experienced its longest downtime since Incarna was released in 2011, and a recent devblog explains why. Some veterans of the game remember the warnings to set a long skill training before patch day. Back when CCP released expansions every six months, it wasn’t uncommon for these massive patches to take the system down for several hours or even days. That time is long past, though, and the six-week deployments have been uneventful for the most part. Newer Eve players have likely never experienced a downtime longer than an hour.

When the second half of the Aegis patch deployed on July 14, it brought a sweeping change to nullsec sovereignty mechanics, which should already be familiar to readers of this site. That change rolled out with little issue, and the denizens of New Eden rallied behind the call to entosis all the things. This resulted in hundreds of sovereignty event timers. All seemed well, and the developers in Reykjavík likely had no idea that their small hotfix patch the following day would bring Tranquility to its knees.

Eve Online is unique in the MMORPG genre because all the players reside in a single shard where they can interact with each other. Some may mistakenly assume single shard means single server, but this couldn’t be further from the truth. The Tranquility cluster consists of hundreds of server blades and, back in 2013, was reported to have the equivalent of 4 terabytes of memory and 2.5 terahertz of CPU speed. At one time the cluster was regarded as one of the world’s largest supercomputers.

Rebooting Tranquility requires tight coordination between the individual servers as the cluster comes online. Each server transitions through four stages before it is considered ready for use. On July 15, several of these servers were stuck in the final stage, which prevented Tranquility from coming online. The developers worked frantically to determine the cause and get the cluster operational. After some trial and error, they found that deleting the sovereignty event and vulnerability data allowed Tranquility to boot. Perhaps the players were too zealous when waving their entosis wands.
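
To make the startup rule concrete, here is a minimal sketch of an all-nodes-ready check. It is not CCP's code: the stage names, node names, and the functions `stuck_nodes` and `cluster_ready` are hypothetical, and the only detail taken from the devblog summary is that each server passes through four stages before it is ready.

```python
from enum import IntEnum


class Stage(IntEnum):
    # Hypothetical names; the article only says there are four stages.
    NOT_STARTED = 0
    STAGE_1 = 1
    STAGE_2 = 2
    STAGE_3 = 3
    STAGE_4 = 4


def stuck_nodes(progress):
    """Return the sorted names of nodes that have not completed the final stage."""
    return sorted(name for name, done in progress.items() if done < Stage.STAGE_4)


def cluster_ready(progress):
    """The cluster counts as online only when every node has finished every stage."""
    return not stuck_nodes(progress)


# Each value is the last stage a node has completed (assumed bookkeeping).
progress = {
    "node-a": Stage.STAGE_4,
    "node-b": Stage.STAGE_3,  # still working through the final stage
    "node-c": Stage.STAGE_4,
}

print(cluster_ready(progress))
print(stuck_nodes(progress))
```

Under this assumed model, a single node that never completes its last stage is enough to keep the whole cluster offline, which matches the behavior described above.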

Resetting the prior day’s sovereignty data wasn’t an acceptable solution, so the developers kept searching for the root cause. They narrowed the problem down to the processing of sovereignty events and were able to boot by removing that data. However, the nodes would fail when the developers manually added that data back after the cluster was online. Next they tried turning off the server log messages, and the cluster came online without any issues. Satisfied that they had a working solution, the developers removed all log messages from the new sovereignty code and brought Tranquility online, ending the nearly 12-hour ordeal.

Log messages are important for finding bugs in a distributed system like Tranquility, so removing them is not a valid long-term fix. The developers are performing experiments on Tranquility during downtime to search for the underlying problem. There were two log message channels used in the new sovereignty code: one for generic messages and one for sovereignty campaign messages. Each worked fine with low volumes of data similar to what would be seen on the test server. However, the campaign channel caused processing to grind to a halt when presented with a large volume of data like that seen on the live server.
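
The idea of isolating one log channel can be illustrated with Python's standard `logging` module. This is only a sketch under stated assumptions: the channel names `sov.generic` and `sov.campaign`, the `CountingHandler` class, and the `process_events` workload are all hypothetical and do not reflect Tranquility's actual logging system.

```python
import logging


class CountingHandler(logging.Handler):
    """Counts records instead of writing them, so the example needs no I/O."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def emit(self, record):
        self.count += 1


# Hypothetical channel names; the article only says there were two channels.
generic = logging.getLogger("sov.generic")
campaign = logging.getLogger("sov.campaign")

handler = CountingHandler()
for channel in (generic, campaign):
    channel.setLevel(logging.INFO)
    channel.addHandler(handler)
    channel.propagate = False  # keep records away from the root logger


def process_events(n):
    """Stand-in workload: one message per channel for each event."""
    for i in range(n):
        generic.info("processed event %d", i)
        campaign.info("campaign update for event %d", i)


process_events(1000)
both_channels = handler.count

# Silence only the suspect channel by raising its threshold above every level.
campaign.setLevel(logging.CRITICAL + 1)
handler.count = 0
process_events(1000)
generic_only = handler.count

print(both_channels, generic_only)
```

Switching channels off one at a time like this, then repeating the run at test-server and live-server volumes, is one way to tell a channel-specific problem apart from a general logging problem.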

The developers at CCP continue to search for the root cause of the problems with the logging channel. Since these issues only appear on the Tranquility cluster, they have only a few minutes each day to run tests. It will likely take them a while to narrow down the problem and restore logging.

This article originally appeared on TheMittani.com, written by Turk Fezzik.

©2023 - All Right Reserved. Designed and Developed by Imperium News
