[TECHNICAL] A case for an alternative server design for EVE (EVE 2.0 when?)
Given the huge battle that just happened in M2- and the pretty bad lag that was all around, I've been thinking about what Platform Architecture could keep up with a battle like that.
What follows is going to be a quite technical approach to backend game design. If there are people who specialize in this area, please comment below; I'm eager to hear about the limitations of this approach or the mistaken assumptions I'm making. In the same breath: if CCP wants to comment on this or correct my wrong assumptions, please do, I'm eager to hear from you!
Furthermore, please realize that most of my suggestions would be closer to an EVE 2.0 than to a simple patch applied to the server. Many of these ideas hadn't been thought of when this game launched 17 years ago; hell, half of the stuff didn't even exist yet.
TLDR: Double sided Event Driven Design with scalable workers
Current design and its flaws
So what's currently happening? Currently we have what is called a synchronous design, which can also be described as a blocking design. Every system lives on a node (we can just call this a server for simplicity's sake) which constantly takes in commands. It executes all of these commands once every second; this is known as a server tick.
assumption: the blocking effect comes from the fact that once you issue a command, nothing actually happens until the server has confirmed it has received the command and responded to the client with a confirmation.
Let's use a quick example:
- User initiates warp to a planet
- Server receives the command and executes it/queues it
- Server responds to the client confirming the command
- Client shows the warp animation and warps
problem: As a single server has to handle all these commands from more and more people, the calculations no longer scale linearly with N. Because every pilot can interact with every other pilot, the work grows roughly with N² in the worst case. A single pilot warping to a stargate is 1 command. Two pilots shooting missiles at each other require input from both sides, and the server has to calculate the damage and the reach of the missiles, and then tell both pilots the damage they did / received.
As such the only logical solution is to allow the server more time to do its calculations. This is tidi (time dilation): a single server tick becomes 2, 3, 4, … seconds, allowing the server to crunch everything it needs to and respond in a coherent manner.
Some might be thinking that this is a network issue. On the input side, it's simply not. API gateways can easily handle 10,000 requests per second, and that can be scaled much higher. Hell, an average web server can easily handle 1,000 requests per second.
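To put rough numbers on the scaling problem and on tidi, here is a minimal Python sketch. The function names, the 1-second tick and the 10% floor on the slowdown are my own assumptions for illustration, not something taken from CCP's server.

```python
def pairwise_interactions(pilots: int) -> int:
    """Upper bound on pilot-to-pilot pairs on one grid: N * (N - 1) / 2."""
    return pilots * (pilots - 1) // 2


def dilation_factor(work_seconds: float, tick_seconds: float = 1.0,
                    floor: float = 0.1) -> float:
    """Fraction of real-time speed the simulation runs at.

    1.0 means no tidi; 0.25 means one game second takes four real seconds.
    `floor` is an assumed lower bound on how far the game may slow down.
    """
    if work_seconds <= tick_seconds:
        return 1.0  # the tick fits in its budget, no slowdown needed
    return max(floor, tick_seconds / work_seconds)
```

With 1,000 pilots on grid the worst case is already 499,500 possible pairs, and a tick that needs 4 seconds of work gives a dilation factor of 0.25.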
Event Driven Design
In essence the server is already running a somewhat event driven design, it's just currently doing it in a blocking way. Event driven design is an approach in which the only things being sent to your platform (backend) are commands. The output of those commands arrives after the server is done handling them.
The client does not wait for the server to finish its business, it just carries on. Where might you have seen this kind of design? Think banks: these almost exclusively work with event driven architecture, for the simple fact that it's great for auditing and reversing actions.
So how would this work with our previous example?
- User initiates warp to a planet
- Server receives the timestamped warp command
- Server acknowledges receipt of the command
- Client shows the user warping
- Server simply adds the command to a queue (this can be a queue per system / global / grid / …)
- A worker node picks up the command from the queue, verifies it (timestamp, is ship alive, is ship warp scrambled, etc etc etc)
- The worker executes the command and modifies the ship's SQL entry with the status of warping to the location
- The worker also creates a new event: warp start with timestamp of the original timestamp + align time
- The client is "subscribed" to the status of the ship, so any changes are broadcast from the server to all the users who are subscribed to this ship
- The client receives from the server that at the original timestamp warp was initiated
- The client further assumes that warp has started at timestamp + align time, since this requires no input, no further commands are needed
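The steps above can be sketched in a few lines of Python. This is a toy, single-process version under my own assumptions: `Broker`, `Command`, `Ship` and `warp_worker` are hypothetical names, the queue is an in-memory deque rather than a real message queue, and the ship state is a plain object instead of an SQL row.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ship:
    ship_id: str
    alive: bool = True
    scrambled: bool = False
    align_time: float = 5.0            # seconds, made-up value
    status: str = "idle"
    warp_start: Optional[float] = None


@dataclass
class Command:
    kind: str
    ship_id: str
    timestamp: float


class Broker:
    """In-memory stand-in for the queue plus the subscription fan-out."""

    def __init__(self):
        self.queue = deque()
        self.subscribers = {}          # ship_id -> list of callbacks

    def submit(self, command):
        self.queue.append(command)     # nothing is executed yet
        return "ack"                   # the client only gets a receipt

    def subscribe(self, ship_id, callback):
        self.subscribers.setdefault(ship_id, []).append(callback)

    def publish(self, ship_id, event):
        for callback in self.subscribers.get(ship_id, []):
            callback(event)


def warp_worker(broker, ships):
    """Drain the queue, verify each warp command, and publish the result."""
    while broker.queue:
        cmd = broker.queue.popleft()
        ship = ships[cmd.ship_id]
        if cmd.kind != "warp" or not ship.alive or ship.scrambled:
            broker.publish(cmd.ship_id,
                           {"type": "warp_rejected", "at": cmd.timestamp})
            continue
        ship.status = "aligning"
        ship.warp_start = cmd.timestamp + ship.align_time
        broker.publish(cmd.ship_id, {"type": "warp_initiated",
                                     "at": cmd.timestamp,
                                     "warp_start": ship.warp_start})
```

The point of the split is that `submit` returns immediately, while the verification and the state change happen later in whichever worker picks the command up.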
This accomplishes a few items:
- your "entrypoint" is just a huge queue to which you can send commands. This can be something along the lines of Kafka or another message queue; you just need a high-throughput queueing system. Secondly, the whole server can sit behind this queue, which can be a fault-tolerant, cloud-hosted queueing system.
- you can scale your workers and specialize them: you'll have warping workers, market order workers, damage calculation workers, grid workers, etc.
- scale your workers horizontally and/or vertically: the moment the market order worker reaches 80% CPU or Memory load, you simply assign it more CPU or Memory, or simply create another one
- crashes can be handled more smoothly: if a worker crashes, the message it was working on is never acknowledged, so it stays in the queue and another worker can pick it up when one is available, while the broken worker is restarted (container, vm, whatever). Messages that keep failing can be parked in a dead letter queue
- subscribe to what you wish: you don't want to subscribe to local chat? Close it, the client unsubscribes from local chat and done
- replay-ability: you can now just store / request a replay file from the server: it contains the initial grid (upon your warp-in, e.g.) and all the events you received after that. This file will be tiny, so you can play in potato mode and render it for video effects in 12K+ afterwards
- allows "overkills", which are easy to calculate: 500 Ravens shoot a volley at a single Typhoon, with 1 event for each Raven that fires. Workers process all the Ravens' damage and apply it to the Typhoon with a simple check:
if typhoon_hp > 0. If it hits 0, send an event that it has died. 500 Ravens may trigger this more than once, which is not a problem: a worker sets hp = 0 and sends a message to everybody subscribed that the target has died. The other events that come in from the parallel damage applications are simply ignored since the target is already dead. Calculate the killmail later (or have another worker just for calculating killmails).
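The overkill check from the last bullet could look roughly like this. It is a sketch with hypothetical names (`Target`, `apply_damage`, the `ship_destroyed` event) and made-up hit points; the important part is that a volley arriving after the target is dead is simply dropped.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    hp: float
    death_announced: bool = False


def apply_damage(target, damage, publish):
    """Apply one volley; return True if the volley was counted."""
    if target.hp <= 0:
        return False                   # already dead: late volleys are ignored
    target.hp = max(0.0, target.hp - damage)
    if target.hp == 0 and not target.death_announced:
        target.death_announced = True  # announce the death exactly once
        publish({"type": "ship_destroyed", "ship": target.name})
    return True


# 500 Ravens each land a 400 damage volley on a 10,000 hp Typhoon
events = []
typhoon = Target("typhoon", 10_000.0)
counted = sum(apply_damage(typhoon, 400.0, events.append) for _ in range(500))
```

Only the first 25 volleys are counted and a single death event goes out. With truly parallel workers the check-and-update would have to be atomic (for example a database transaction or a compare-and-set), which this single-threaded sketch does not show.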
Double sided EDD
So how is this double sided? Well, your game client would also be doing all of this, pretending to be a server and executing and calculating these things internally. Only when the server sends an update to something you subscribe to that isn't in line with what the client has assumed will the client correct itself.
Back to our example:
- User initiates warp to a planet
- Client begins warp procedure, sends message to server it's starting warp procedure
- Server receives the message and starts the warp procedure
- Everybody who is subscribed to this grid receives this command (every client can now render that ship starting warp procedure)
- Hostile locks the target and activates a warp disruptor
- Server receives the warp disruptor command, processes it and cancels the warp command (if the warp disruptor command was given before the initial warp timestamp + align time)
- People subscribed to the grid (everybody on grid) receive the warp disruption command
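Here is a small sketch of both sides of that example, assuming the rule stated above (the disruptor only counts if it lands before warp timestamp + align time). `resolve_warp` and `ClientShip` are hypothetical names of mine; the client predicts the warp locally and only changes course when a server event disagrees.

```python
def resolve_warp(warp_timestamp, align_time, disruptor_timestamp=None):
    """Server side: decide the outcome of a warp attempt."""
    warp_start = warp_timestamp + align_time
    if disruptor_timestamp is not None and disruptor_timestamp < warp_start:
        return {"type": "warp_cancelled", "at": disruptor_timestamp}
    return {"type": "warp_started", "at": warp_start}


class ClientShip:
    """Client side: act immediately, correct later if the server disagrees."""

    def __init__(self):
        self.state = "idle"

    def initiate_warp(self):
        self.state = "aligning"        # predicted locally, before any reply

    def on_server_event(self, event):
        if event["type"] == "warp_cancelled":
            self.state = "disrupted"   # correction: prediction was wrong
        elif event["type"] == "warp_started":
            self.state = "warping"     # confirmation: prediction was right
```

A disruptor at t=103 against a warp issued at t=100 with a 5 second align cancels the warp; the same disruptor at t=106 arrives too late.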
Wait, how does this differ from the current game? Well, your client will have more "independence", it'll run a lot smoother since it'll execute the actions on its own and only apply corrections or new inputs to the running state.
Obviously the server always holds the "ultimate state of truth". A person landing on this grid will ask the server for the grid contents, not another client.
The database(s), synchronizations, and failures
Obviously the game still needs to maintain quite a few databases: ship status, positions on grid, damage applied, etc. Now, a part of these databases will be double sided, living both on the client and on the server. Furthermore, much of the server's logic will also move to the client to some extent.
So how will this look from a client's point of view?
- Warp to grid, when you are in warp (and landing point has been determined), send a request to the server
- Server sends you the grid at the current time
- Server subscribes you to this grid and will start streaming all the relevant events of this grid
==> This means the client can "build" the grid itself based on the current status + all the events which are added on top of that.
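Building the grid from a snapshot plus events might look like the sketch below. The event types and the dictionary layout are invented for the example; the same function would also replay a stored battle, since a replay file is just a snapshot followed by events.

```python
def apply_event(grid, event):
    """Apply a single event to the client's copy of the grid."""
    kind = event["type"]
    if kind == "ship_arrived":
        grid[event["ship"]] = {"status": "idle", "hp": event["hp"]}
    elif kind == "ship_left":
        grid.pop(event["ship"], None)
    elif kind == "status_changed" and event["ship"] in grid:
        grid[event["ship"]]["status"] = event["status"]
    return grid


def build_grid(snapshot, events):
    """Start from the server snapshot and replay every newer event in order."""
    grid = {ship: dict(state) for ship, state in snapshot["ships"].items()}
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["timestamp"] > snapshot["timestamp"]:
            apply_event(grid, event)   # older events are already in the snapshot
    return grid
```

Events with a timestamp at or before the snapshot are skipped, so it does not matter if the stream and the snapshot overlap a little.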
Every so often (10 seconds, 30 seconds, a minute, …) the client requests the current grid from the server. It checks the timestamp of the new grid it has received and corrects whatever its own state got wrong or missed. If you apply the fix abruptly, you'll get some tearing as things suddenly seem to teleport to new positions. Obviously you can do this gently and it'll be a lot nicer for the players.
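One way to do the "gentle" fix is to move each ship only part of the way toward the server's position on every frame, and to snap only when the error is huge. This is a sketch; `blend` and `snap_distance` are made-up tuning values, not anything from the actual client.

```python
def correct_position(client_pos, server_pos, blend=0.2, snap_distance=10_000.0):
    """Nudge the client's position toward the server's authoritative one."""
    delta = [s - c for c, s in zip(client_pos, server_pos)]
    error = sum(d * d for d in delta) ** 0.5
    if error > snap_distance:
        return tuple(server_pos)       # too far off: teleport and accept the tear
    # otherwise close only a fraction of the gap this frame
    return tuple(c + blend * d for c, d in zip(client_pos, delta))
```

Called once per frame, this converges on the server position over a handful of frames instead of jumping there in one.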
What happens if an event that was sent from the server never makes it to the client or vice versa? Well, simply have the request go out again. Now this is where you can end up with some weird shenanigans. In our example, the client could have never received the "warp scramble" event from the server, thus causing it to be in warp already on your screen. Everybody else (including the server) would have you warp scrambled though. This however, is something that can never be truly avoided.
This is the typical "I WAS BEHIND THE CORNER HOW CAN I BE KILLED???" moment: cue baby rage.
Doing this whole thing would require a massive rework on (most likely) both server & client side, however no wheels need to be reinvented. All the formulas, damage applications, speed calculations, fittings, etc etc can all be copied over 100%.
Another note: EVE has not made its own life very simple in these things.
- Missiles need a worker that constantly (probably ticking…) recalculates the trajectory of the missile (if it just simply approaches)
- Alternatively rewrite missiles so that they would intercept a target – this requires a much heftier calculation up front, but only changes in the direction of the target would require updates
- Drones, don't get me started on these: each drone is an entity on the grid, with its own "brain", its own orbit, its own damage, rate of fire, … What a PITA, no real idea how to deal with these in a straightforward way
- Fighters, same stuff applies here: so many extra events just from launching them and setting them to attack & orbit a target
- Is some Tidi needed? Imagine the last battle of M2-, now imagine there's 0% tidi: it'll be over in a whiff, and in my opinion it'll lose some of its magic. I would suggest that if huge numbers load onto the same grid, the developers simply "introduce" some tidi to let the game run at 50% speed. This will also help older systems, keeping them alive and allowing them to calculate "some" frames :D.
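For the missile bullet above, the "intercept" alternative boils down to one quadratic solved up front. A sketch, assuming the target flies in a straight line at constant velocity and the missile at constant speed (real EVE missile mechanics may differ; `intercept_time` is a hypothetical helper):

```python
import math


def intercept_time(rel_pos, target_vel, missile_speed):
    """Earliest time t > 0 at which the missile can meet the target.

    rel_pos is the target position relative to the missile. Solves
    |rel_pos + target_vel * t| = missile_speed * t; returns None if the
    missile can never catch the target.
    """
    a = sum(v * v for v in target_vel) - missile_speed ** 2
    b = 2 * sum(p * v for p, v in zip(rel_pos, target_vel))
    c = sum(p * p for p in rel_pos)
    if abs(a) < 1e-12:                 # equal speeds: equation becomes linear
        return -c / b if b < 0 else None
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    times = [t for t in ((-b - root) / (2 * a), (-b + root) / (2 * a)) if t > 0]
    return min(times) if times else None
```

The worker would only need to redo this calculation when the target changes direction or speed, instead of re-steering the missile on every tick.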
Finally: thanks for sticking with me in this lengthy post. Feel free to leave your suggestions / discussions / whatever else in the comments.
Leave the EVE politics out of this, the post has nothing to do with the war =)