Recent Development Update & Performance Improvement

It’s been a few weeks since I’ve written a development update on the blog, but I’d like to share some of what I’ve been working on lately. I’ve been programming hard to push OC to the point where all major features for the planned version 1.0 release are in the game, and I’ve put a lot of the peripheral work (like blogging) off to the side in order to accomplish that goal. I think I’m finally there. All sorts of things from media consumption by NPCs to disease modeling have been improved significantly, and I need to write about all these subsystems.

I need to write more about combat and provide tutorials, too, so there should be plenty of updates and posts in the near future.

For now, I’d just like to touch on the work I’ve done over the past week or two to improve the performance of OC. I’ll cover the following topics, in brief:

  • Targeted minimum system requirements.
  • OC’s serialization overhaul.
  • Memory optimization.
  • Processing optimization.

Targeted Minimum System Requirements

*Keep in mind, this section is tentative as of today’s date, and it’s subject to change as testing progresses.*

This is probably the part of the article that most people care about, since it’ll give a feel for how well your computer will be able to play Outer Colony. Before I get into numbers, it’s important to note that OC is not designed like most other games. OC does a great many things that most game architectures don’t have to worry about, and it doesn’t do the high-end 3D graphics of most AAA games. This means that OC needed to be designed differently from most games, and it uses your computer’s hardware in a different sort of way.

In short, the two most important parts of your computer for running OC are:

  • Your CPU.
  • Your RAM.

Specifically, with respect to your CPU: the more cores you have (and the faster they are), the better. OC’s engine is fundamentally multi-threaded, so the more cores you throw at it, the more entities it will be able to handle.
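To illustrate the general idea (this is a minimal sketch, not OC’s actual engine code, and the `EntityUpdater` and `Entity` names are made up), a multi-threaded entity update in Java might fan tasks out across however many cores the host machine offers:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: update entities in parallel across all available cores.
public class EntityUpdater {
    private final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    public void updateAll(List<Entity> entities) throws Exception {
        List<Future<?>> tasks = new ArrayList<>();
        for (Entity e : entities) {
            tasks.add(pool.submit(e::update)); // each entity's AI tick runs as its own task
        }
        for (Future<?> t : tasks) {
            t.get(); // wait for the whole simulation tick to finish
        }
    }

    // Placeholder entity type for the sketch.
    interface Entity { void update(); }
}
```

The more cores the thread pool can draw on, the more of those per-entity tasks can run at once, which is why core count matters so much here.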

RAM becomes important for running larger world sizes and multiplayer servers with more players. Given a naive implementation, OC would demand impossibly large volumes of memory. As such, it relies on a two-tiered, in-memory compression scheme to offload parts of the world that aren’t being processed at any given moment. This is discussed a bit more below, but the important thing to keep in mind is that OC loves RAM, and more RAM = larger worlds and bigger multiplayer games.

So, how much RAM and how fast a CPU will you need? The short answer is that I’m currently targeting 3 GB of RAM and a reasonably capable dual-core CPU as a bare minimum. I’ll give a more detailed answer as we get closer to release, but if you have a PC built within the last few years, you should be in fine shape. I’ll probably recommend 6-8+ GB of RAM and a modern 4+ core CPU; with that, you’ll be in good shape! If you’ve got a heavy-duty rig with 16+ GB of RAM and 6+ cores at your disposal, you’re going to have a blast with giant worlds and larger populations. OC will scale with your hardware, so the more you’ve got, the better it’ll be.

Other system components to keep in mind:

  • Your GPU. Your graphics card can help OC’s performance by handling aspects of world rendering. Though the graphics aren’t complicated, OC will use hardware acceleration on most systems under most circumstances.
  • Your hard drive. Faster hard drives can give you marginally faster world save and load times.

OC’s Serialization Overhaul

One of the biggest development tasks over the last two weeks was completely changing OC’s underlying serialization system. This was important for decreasing OC’s memory footprint and decreasing the amount of bandwidth it needed for multiplayer games.

What is serialization, and how does it relate to OC? Without getting too technical and boring, serialization is one way that computer programs can format, store, and send the data that they use. In Java, there are as many serialization formats as there are stars in the sky, each with its own set of pros and cons. Up to this point, OC was using Java’s native serialization to persist, compress, and communicate objects. Java serialization is amazing, and I can’t emphasize enough how much I like it. It’s a no-brainer to use in a basic way, which made it a useful placeholder for rapid development in the early going.

However, it probably wasn’t the best permanent solution for OC. First, its performance is not wonderful. The binaries that it produces are large (more memory use), and its serialization / deserialization performance can be comparatively sluggish (more processing). For a performance-sensitive application like OC, this matters.

Second, while Java serialization is easy to use, it’s also easy to make mistakes with, because nearly all of its internal mechanics are implicit. While it’s very possible to version these serialized objects, doing so is more error-prone than with serialization formats that demand explicit schemas.

To address these concerns, I switched from Java’s native serialization to Google protocol buffers. Again, without getting into too many details, the end result was a significant decrease in memory footprint and faster region load times. Much, much faster! OC also now has an explicitly defined schema and a robust mechanism for maintaining backwards compatibility. This is a huge help for maintaining world saves through new releases after the version 1.0 launch.

The only downside of the process: man, was it time-consuming. Every serializable class (there are hundreds and hundreds of them) needed to be touched. Not just touched, but mapped to a protocol buffer message. Code needed to be written to convert each object to its serialized form and to construct the object from serialized data. The schema itself, which is rather large, needed to be written. Constantly compiling the protocol buffers adds steps to my development workflow. There are always costs to this kind of thing, and it took me more than a week of extremely long days to overhaul the entire code base.
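To give a feel for what that mapping work looks like (a hypothetical sketch only: `Settler`, `SettlerMapper`, and the generated `SettlerProto` class and its fields are all made up, and don’t reflect OC’s real schema), each serializable class ends up with conversion code along these lines:

```java
// "SettlerProto" stands in for a class that protoc would generate from a made-up
// schema roughly like this:
//
//   message SettlerProto {
//     int64  id     = 1;
//     string name   = 2;
//     int32  health = 3;
//   }
public final class SettlerMapper {

    // Plain in-memory domain object (simplified stand-in, not OC's real class).
    record Settler(long id, String name, int health) {}

    static SettlerProto toProto(Settler settler) {
        return SettlerProto.newBuilder()
                .setId(settler.id())
                .setName(settler.name())
                .setHealth(settler.health())
                .build();
    }

    static Settler fromProto(SettlerProto proto) {
        // Rebuild the in-memory object from the raw message fields.
        return new Settler(proto.getId(), proto.getName(), proto.getHealth());
    }
}
```

Multiply that by hundreds of classes, plus the schema itself, and the week of long days starts to make sense.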

One of the biggest changes was going from a cyclic graph structure for the data with Java serialization to a tree structure for protocol buffers. When you serialize an object graph in Java and deserialize it, references within the serialized structure are maintained and restored. Very neat! Protocol buffers, however, don’t work like this. You’re just jamming bits into data holders and pulling bits out of data holders. It doesn’t do any JVM magic. In a way, this is good, as it forces you to be very conscious of what is referencing what and how those references are restored throughout deserialization. This is something a developer should be doing (at least in their head) anyway, but Java serialization makes it easy to skip this work, which can be somewhat tedious.

For the most part, going from cyclic to acyclic graphs for my object hierarchies wasn’t an issue. Since OC wasn’t serializing everything into one giant graph to begin with, I was already doing some of my own work to restore references properly on deserialization. There are, however, a few instances where my object hierarchy does contain cyclic references that were being handled automatically by Java’s native serialization, and the new deserialization processes had to account for this. This is why the serialization format change wasn’t just a matter of mapping fields in objects to fields in protocol buffers. Some real thought had to go into it. But that thought was productive and helped me to further clean OC’s overarching object model.
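As an illustration of the kind of bookkeeping involved (not OC’s exact approach; the `Npc` and `ReferenceResolver` names and fields are invented), one common pattern is to serialize cross-references as IDs and then wire the real object references back up in a second pass after deserialization:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: store cross-references as IDs in the serialized form,
// then resolve them once every object has been deserialized.
public class ReferenceResolver {

    static class Npc {
        final long id;
        long employerId;   // what actually gets serialized
        Npc employer;      // restored after deserialization

        Npc(long id, long employerId) {
            this.id = id;
            this.employerId = employerId;
        }
    }

    // Second pass: wire the in-memory references back up from the stored IDs.
    static void resolveReferences(Map<Long, Npc> npcsById) {
        for (Npc npc : npcsById.values()) {
            npc.employer = npcsById.get(npc.employerId);
        }
    }

    public static void main(String[] args) {
        Map<Long, Npc> world = new HashMap<>();
        world.put(1L, new Npc(1L, 2L));
        world.put(2L, new Npc(2L, 1L)); // cyclic: each employs the other
        resolveReferences(world);
        System.out.println(world.get(1L).employer.id); // prints 2
    }
}
```

Java serialization does this wiring for you automatically; with protocol buffers, the cyclic cases have to be handled explicitly, which is exactly the work described above.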

Overall, the serialization format switch was a lot of work, but it’s yielded a great many benefits. Well worth the investment in time, from both a performance and a code cleanliness standpoint.

Memory Optimization

Once the serialization overhaul was complete, I implemented a round of memory improvements that will help OC run on lesser hardware.

[Screenshot from my profiler of choice, showing OC running on less than 2 GB of memory. Not bad!]

I spent some time experimentally determining ideal region sizes for compression and improving OC’s overall compression scheme. As I mentioned above, OC’s compression mechanism operates using two tiers of compression. Obviously, the naive implementation for storing OC data (a giant, 3D matrix) wouldn’t work. It’d call for terabytes of memory for the largest worlds that have been generated. Since that’s not practical, steps are necessary to make it work.
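For a rough sense of scale, using completely made-up numbers: a hypothetical 8,192 × 8,192 × 512 tile world is roughly 34 billion tiles, and at even 32 bytes per tile, a plain 3D matrix of that would be over 1 TB of raw data before a single NPC, item, or job has been stored.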

Maybe someday I’ll write in detail about OC’s compression scheme, but the first (and most important) compression layer involves exploiting non-randomness in OC’s world structure. OC uses several internal matrix representations to store world regions in an efficient way. Sometimes I can define implicit values for sections of the world based on metadata, sometimes I can use sparse matrix representations, and sometimes I can crunch groups of tiles together into fewer values. When I don’t need to access a section of world data in a fast way, it can live in this compressed form.
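Here’s a minimal sketch of what that kind of tiered representation can look like (illustrative only; these classes and thresholds are invented and aren’t OC’s real data structures):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: pick the cheapest representation for a block of tile data
// based on how "non-random" it actually is.
public class RegionStorage {

    interface TileBlock { short get(int index); }

    // Every tile in the block is the same (e.g. solid stone, open air): store one value.
    record UniformBlock(short value) implements TileBlock {
        public short get(int index) { return value; }
    }

    // Only a few tiles differ from a default: store just the exceptions.
    record SparseBlock(short defaultValue, Map<Integer, Short> overrides) implements TileBlock {
        public short get(int index) { return overrides.getOrDefault(index, defaultValue); }
    }

    // Genuinely varied data: fall back to a plain dense array.
    record DenseBlock(short[] tiles) implements TileBlock {
        public short get(int index) { return tiles[index]; }
    }

    // Choose a representation for a block of raw tile data.
    static TileBlock compress(short[] tiles) {
        Map<Integer, Short> overrides = new HashMap<>();
        short first = tiles[0];
        for (int i = 0; i < tiles.length; i++) {
            if (tiles[i] != first) overrides.put(i, tiles[i]);
        }
        if (overrides.isEmpty()) return new UniformBlock(first);
        if (overrides.size() < tiles.length / 8) return new SparseBlock(first, overrides);
        return new DenseBlock(tiles);
    }
}
```

The point is simply that a block of open air or solid rock can live as a single value, while genuinely varied terrain falls back to a dense array.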

The second tier of compression involves serializing the structures compressed by the first tier, then subjecting the bytes to a traditional compression algorithm. Right now, I’m using deflate, but I’ve considered using LZ4 instead. There are memory footprint vs. speed considerations here, and while I’ll have to conduct some more experimentation before release, my current solution is good enough for now. I’ve got OC worlds running on less than 2GB of memory, which is pretty good, in my book.
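A minimal sketch of that second tier, assuming the region has already been reduced to a byte array by serialization (the class and method names here are placeholders), using java.util.zip’s deflate support:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Sketch of the second tier: take the serialized bytes of an already-compacted region
// and run them through deflate before parking them in memory.
public class RegionCompressor {

    static byte[] deflate(byte[] serializedRegion) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos =
                     new DeflaterOutputStream(out, new Deflater(Deflater.BEST_SPEED))) {
            dos.write(serializedRegion);
        }
        return out.toByteArray();
    }

    static byte[] inflate(byte[] compressedRegion) throws IOException {
        try (InflaterInputStream iis =
                     new InflaterInputStream(new ByteArrayInputStream(compressedRegion))) {
            return iis.readAllBytes();
        }
    }
}
```

Swapping in LZ4 would mostly mean replacing this layer with a binding such as lz4-java, trading some compression ratio for faster compression and decompression.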

Processing Optimization

The last bit of work I’ve done in this round of optimization was aimed at cutting down on processing. I also did some engine tinkering to more radically decouple NPC processing from frame rendering, in order to keep OC’s performance as smooth as possible.
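In broad strokes (a sketch of the general pattern, not OC’s actual loop code; the class and method names are placeholders), that kind of decoupling looks like running the simulation on its own fixed-rate thread while rendering reads whatever world state is currently available:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative only: a slow frame never stalls the simulation, and a heavy
// simulation tick never stalls rendering.
public class GameLoops {

    public static void main(String[] args) {
        ScheduledExecutorService simulation = Executors.newSingleThreadScheduledExecutor();

        // Simulation runs at a fixed tick rate, independent of the renderer.
        simulation.scheduleAtFixedRate(GameLoops::simulationTick, 0, 50, TimeUnit.MILLISECONDS);

        // Rendering runs as fast as it likes, reading the latest published world state.
        while (true) {
            renderLatestWorldState();
        }
    }

    static void simulationTick()         { /* update NPCs, pathing, world state */ }
    static void renderLatestWorldState() { /* draw whatever state is currently visible */ }
}
```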

Of all the AI problems that OC tackles, you might be surprised to hear that pathing is one of the most consistently difficult challenges, from a computational standpoint. Large numbers of entities finding their way around a gigantic, constantly changing, open world is hard to do in a way that won’t overwhelm a processor. One of the things I worked on was a heuristic for recognizing when a path can’t be generated. Sometimes, entities simply can’t get from point A to point B in OC worlds. Imagine a tall mesa, surrounded on all sides by vertical cliffs. There’s no way for a person to get to its top without building stairs.

But if an NPC sees something it wants to get on the top of that mesa, it’ll attempt to find a path up there. Without a heuristic to tell the NPC that it can’t get up there, this will cause the pathing system to furiously flail, considering node after node after node in a futile attempt to find a way up.

But introducing a set of heuristics to determine “I can’t get there” or “I shouldn’t get there” is a hard problem. You might be thinking of a few straightforward solutions, but there are weird scenarios that have caused issues for every straightforward answer I’ve come up with. Perhaps at some point I’ll write an article on pathing in OC, but I think I’ve taken some good steps to decrease the likelihood of pathing choking the processor when individual entities run into strange sets of pathing circumstances. It was a solid day of work, but it’s yielded some promising results. More play testing will be the key to getting all of this as right as possible.
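As a purely illustrative example of the kind of guard that helps here (this is not OC’s actual heuristic; the names and structure are invented), even a hard budget on node expansions keeps a doomed search, like pathing to the top of that sealed mesa, from flooding the processor:

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Purely illustrative guard: give the search a node-expansion budget so a path
// to an unreachable spot fails fast instead of expanding node after node forever.
public class BudgetedPathSearch {

    interface Graph { List<Long> neighbors(long node); }

    static boolean reachableWithinBudget(Graph world, long start, long goal, int maxExpansions) {
        Queue<Long> frontier = new ArrayDeque<>();
        Set<Long> visited = new HashSet<>();
        frontier.add(start);
        visited.add(start);

        int expansions = 0;
        while (!frontier.isEmpty() && expansions < maxExpansions) {
            long current = frontier.poll();
            expansions++;
            if (current == goal) return true;
            for (long next : world.neighbors(current)) {
                if (visited.add(next)) frontier.add(next);
            }
        }
        // Budget exhausted or frontier empty: treat the goal as unreachable for now.
        return false;
    }
}
```

A budget alone doesn’t solve the “I shouldn’t go there” side of the problem, which is where the trickier heuristics come in, but it illustrates the flavor of keeping hopeless searches cheap.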

Conclusion

It was a busy couple of weeks working to improve the performance of OC, but it’s approaching a point where it’ll comfortably run sizable expeditions on lower-end computers.

More updates and posts about all the gameplay introduced in February should be coming soon! Keep an eye on our Twitter @VoyagerGames for frequent updates!
