Friday, April 29, 2016

ZorbaTHut Talks Multicore!

Keep up with the amazing progress that we’ve made with Multicore Rendering in RIFT with this latest update from Lead Rendering Engineer Ben “ZorbaTHut” Rog-Wilhelm!


Hello Telarans!

As many of you know, we’ve been working hard on upgrading Multicore Rendering. Now that we’ve implemented improvements, let’s talk more about multithreading as it pertains to Rendering in RIFT. Warning: a lot of this is technical talk and may not be suited to all readers – some may want to escape back into RIFT to experience the changes directly rather than read about them! For our fellow techno-geeks, let’s continue…

In terms of the code that runs on your computer, “rendering” can be roughly split into two parts: “deciding exactly how to render stuff” and “sending those render commands to the graphics card.” Both of these tend to be expensive, and RIFT, like most other games, used to do all of that work in a single thread. Note that while I’m dividing this into two parts, the actual rendering process isn’t a simple matter of doing one part followed by the other – the “render” process consists of both interleaved in a very complicated manner.
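
To make that split concrete, here’s a deliberately tiny sketch – invented names, nothing like our actual code – of a single-threaded frame, with the two parts woven together:

    #include <vector>

    // Hypothetical minimal types for illustration; RIFT's real classes differ.
    struct DrawCall { int objectId; };
    struct Device {
        void SetRenderTarget(int target) { /* talk to the graphics API */ }
        void Submit(const DrawCall& call) { /* send one command to the card */ }
    };

    // Part 1 ("decide how to render") and part 2 ("send the commands")
    // interleave: each visible object is decided and then immediately sent.
    void RenderFrameSingleThreaded(Device& device, const std::vector<int>& visible) {
        device.SetRenderTarget(0);      // part 2: state change on the card
        for (int obj : visible) {       // part 1: walk the visible set
            DrawCall call{obj};         // part 1: decide exactly what to draw
            device.Submit(call);        // part 2: send the command
        }
    }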

With the exception of the newest rendering interfaces, all modern rendering APIs effectively require developers to send render commands to the graphics card on a single thread. There’s not much we can do about this. “Multicore rendering” therefore mostly involves the first part, while working within the limitation imposed by the second.

When you’re dealing with any project the size of RIFT’s multicore rendering system, you have to split the job into manageable chunks. This feature took over a year to complete, and a lot of careful scheduling went into breaking the work down.

First, we had to deal with global state. What does this mean? Every time a graphics card renders something, it needs a destination, known as a “render target”. The screen is the most obvious destination, but we frequently render to textures, for use in later render steps. (In fact, if you’re using the high-quality renderer, virtually all of our rendering is done to intermediate textures!) Our rendering system assumed that the graphics system would have exactly one render target at a time. This is a perfectly reasonable assumption with a single-threaded renderer, but has to be fixed for multicore, where you might have five threads all generating commands for different render targets.

That information was in our core rendering module, “Renderer”, which represented the device itself, handled resource allocation, and provided information about the device’s capabilities. We created a new module, “Context”, intended to represent the rendering state of a single thread (including the render target and many other similar chunks of rendering state), then moved thousands of lines of code from Renderer into our new Context. Our rendering system was still single-threaded, so we still had exactly one Context, but it was a necessary organizational step.
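
In rough outline, the split looks something like this – Renderer and Context are the real module names, but everything inside them here is invented for illustration:

    #include <deque>

    struct Texture {};

    // Renderer keeps only what is truly global to the device: resource
    // allocation and capability queries.
    class Renderer {
    public:
        Texture* AllocateTexture() { textures.emplace_back(); return &textures.back(); }
        int MaxTextureSize() const { return 16384; } // device capabilities
    private:
        std::deque<Texture> textures; // deque so handed-out pointers stay valid
    };

    // Context holds the rendering state of a single thread, so five threads
    // can each aim at a different render target without colliding.
    class Context {
    public:
        explicit Context(Renderer& r) : renderer(r) {}
        void SetRenderTarget(Texture* t) { renderTarget = t; } // per-thread now
    private:
        Renderer& renderer;
        Texture* renderTarget = nullptr; // one target per context, not per device
    };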

An important concept in programming is “abstraction.” Take something like DirectX. It’s designed to give developers extensive control over graphics hardware, and it succeeds, but many of the features it provides are difficult to harness directly. When a programmer sees something like this they often build a system on top of it that is easier to use and less bug-prone. Unfortunately this always introduces limitations, and so high-performance areas are sometimes built “to the metal,” avoiding the abstractions and interacting directly with DirectX for the sake of sheer speed. Since all our multithreading work took place in our abstraction layer, these “fast” areas were, ironically, now standing in the way of performance.
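
As a toy example of what such a layer looks like – “RawApi” here is a made-up stand-in for DirectX:

    // RawApi stands in for DirectX: powerful, but easy to misuse directly.
    struct RawApi {
        void SetState(int stateId, int value) { /* dozens of ways to get this wrong */ }
    };

    // The abstraction layer exposes a small set of safe, named operations.
    class EasyRenderer {
    public:
        explicit EasyRenderer(RawApi& a) : api(a) {}
        void EnableDepthTest() { api.SetState(7, 1); } // one call, one meaning
    private:
        RawApi& api;
    };

    // "To the metal" code skips EasyRenderer and calls RawApi directly for
    // speed – exactly the code that later stood in the way of multithreading.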

Some areas could be changed easily; others had to be rewritten almost entirely. RIFT’s lighting code is new as of several months ago, and for weeks before that, I was running around the world flipping between the new and old systems just to be absolutely certain the new one behaved exactly like the old.

Finally, we could extract that second part of rendering, “sending the render commands to the graphics card,” from everything else. As long as we were sending render commands directly to the graphics hardware we would never be able to multithread the rest of our rendering pipeline. We essentially inserted our own layer between the rendering subsystem and DirectX; instead of sending commands to DirectX, it would store them in a carefully-coded, memory-dense buffer so we could stream those commands out as quickly as possible later. This took a lot of work to get right. The process ended up being rolled into the above-mentioned “Context” module; we split it into ImmediateContext, which sent commands straight to DirectX, and BufferedContext, which stored up commands for future dispatching in a rapid burst.
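
Here’s a heavily simplified sketch of that split. ImmediateContext and BufferedContext are the real names; the command encoding shown is invented, and our actual buffer is far denser:

    #include <cstdint>
    #include <vector>

    struct Device { // stand-in for DirectX
        void SetRenderTarget(uint32_t target) {}
        void Draw(uint32_t mesh) {}
    };

    enum class Op : uint8_t { SetRenderTarget, Draw };
    struct Command { Op op; uint32_t arg; }; // hypothetical dense encoding

    class Context { // the shared interface
    public:
        virtual ~Context() = default;
        virtual void SetRenderTarget(uint32_t target) = 0;
        virtual void Draw(uint32_t mesh) = 0;
    };

    // ImmediateContext: commands go straight to the graphics API.
    class ImmediateContext : public Context {
    public:
        explicit ImmediateContext(Device& d) : device(d) {}
        void SetRenderTarget(uint32_t target) override { device.SetRenderTarget(target); }
        void Draw(uint32_t mesh) override { device.Draw(mesh); }
    private:
        Device& device;
    };

    // BufferedContext: commands are recorded, then replayed in a rapid burst.
    class BufferedContext : public Context {
    public:
        void SetRenderTarget(uint32_t target) override { commands.push_back({Op::SetRenderTarget, target}); }
        void Draw(uint32_t mesh) override { commands.push_back({Op::Draw, mesh}); }
        void Replay(Device& device) { // the later single-threaded dispatch
            for (const Command& c : commands) {
                switch (c.op) {
                    case Op::SetRenderTarget: device.SetRenderTarget(c.arg); break;
                    case Op::Draw:            device.Draw(c.arg);            break;
                }
            }
            commands.clear();
        }
    private:
        std::vector<Command> commands; // contiguous memory keeps replay fast
    };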

At this point we could change the entire renderer into “buffered” mode – processing everything in a single thread, storing all the commands in a temporary buffer, and then sending them in a batch. This was much slower than our original single-threaded mode but useful for debugging the buffering system in isolation; we’ve preserved that option in our debug builds all the way up to today.

The next step was to actually use these tools to split our rendering into multiple threads. That should be easy, right? After all, we’ve dealt with our global state, we’ve set up a serialization system so all our actual commands can be sent to the graphics card in the right order – we should be able to just create a pile of Contexts, aim each one at a chunk of our game, and it should just work! Well, as anyone who’s tried to multithread an existing massive system knows, it’s never that easy. While we had a semi-functioning renderer working quite quickly, we spent months tracking down weird timing issues, thread contention bugs, and bits of global state that we were not aware were global. This was completely expected – there’s no way to do this besides trying it and observing what happens – but it was still a very long and gradual process.
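
The fan-out/fan-in shape we were aiming for, reusing the hypothetical BufferedContext from the earlier sketch, looks roughly like this:

    #include <cstddef>
    #include <cstdint>
    #include <thread>
    #include <vector>

    // Each worker thread records one chunk of the scene into its own
    // BufferedContext; replay then happens in a fixed order on one thread,
    // which keeps the API's "single submitting thread" rule intact.
    void RenderFrameMulticore(Device& device,
                              const std::vector<std::vector<uint32_t>>& chunks) {
        std::vector<BufferedContext> contexts(chunks.size());
        std::vector<std::thread> workers;
        for (size_t i = 0; i < chunks.size(); ++i) {
            workers.emplace_back([&contexts, &chunks, i] {
                for (uint32_t mesh : chunks[i]) contexts[i].Draw(mesh); // record only
            });
        }
        for (std::thread& w : workers) w.join();               // all recording done...
        for (BufferedContext& c : contexts) c.Replay(device);  // ...then submit in order
    }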

As we squashed bugs, it became clear that this was also not providing the performance gains we’d hoped to see at this stage. I’m going to make up some numbers here; bear with me. Pretend that, before this change, the entire rendering system from beginning to end took 10ms, including generating the rendering commands on a single thread and sending those commands to the graphics card. After all this work, we found that we were spending about 4ms on generating the render commands across multiple threads and storing them in buffers, but then another 4ms sending those render commands out to the graphics card. That’s a gain of only 2ms – measured against everything else that happens in a frame, perhaps a 10% framerate increase at best. We started our Multicore Closed Beta around this time to help us squash the remaining bugs, but we knew we had a lot more work to do for the Multicore update to achieve the goals we’d set.

Up until this point, we’d simply replaced our single-threaded rendering with a chunk of multicore rendering that internally ran in parallel, but returned to the main thread only when all of that processing was complete. (That’s an oversimplification, but it’s basically accurate.) In order to gain the performance we wanted, we’d have to start processing the next frame while still sending the rendering commands from the previous frame.

This was a pretty significant challenge. Like most games, we rendered our UI last, overlaying it on top of a finished 3d scene. However, our UI system is implemented in a third-party package; we have source code for it, but converting it to use our rendering abstraction would be an enormous job. Instead, we re-organized our render pathways so we rendered our UI first, onto a separate temporary buffer. Then we’d render our 3d scene, and as a final step, we’d composite our UI onto that 3d scene.
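
Shape-wise, the reordered frame looks like this – every name below is a hypothetical stand-in for our Context abstraction and the third-party UI library:

    #include <cstdint>

    constexpr uint32_t UI_TEXTURE = 1, BACK_BUFFER = 0; // hypothetical handles
    void SetRenderTarget(uint32_t target) {}
    void DrawUi() {}                          // third-party UI, runs up front
    void DrawScene() {}                       // our 3d scene, fully bufferable
    void CompositeUiOntoScene(uint32_t ui) {} // one blend of the UI texture

    void RenderFrameReordered() {
        SetRenderTarget(UI_TEXTURE);      // 1. UI first, into a temporary buffer
        DrawUi();
        SetRenderTarget(BACK_BUFFER);     // 2. then the 3d scene as usual
        DrawScene();
        CompositeUiOntoScene(UI_TEXTURE); // 3. finally, UI on top of the scene
    }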

This let us continue sending render commands to the graphics card until the next frame was about halfway done, overlapping all the network communication and game state updates that have to happen before rendering the next frame. In most cases, that segment takes more than 4ms, so sending our render commands to the graphics card is effectively “free” – it happens simultaneously with something else we need to do anyway. This led to the next change, one of the most important, which really started to deliver the improvements we wanted.
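
One minimal way to picture that overlap – a sketch, not our actual scheduler, and all three functions are hypothetical:

    #include <thread>

    void ReplayPreviousFrame() {} // hypothetical: ~4ms of buffered commands
    void UpdateGameState() {}     // hypothetical: network + simulation for N+1
    void RecordNextFrame() {}     // hypothetical: the "decide" pass for N+1

    // Frame N's command submission overlaps frame N+1's update, so those
    // ~4ms stop costing us anything on the critical path.
    void PipelineOneFrame() {
        std::thread submit(ReplayPreviousFrame); // frame N, on its own thread
        UpdateGameState();                       // frame N+1, on the main thread
        RecordNextFrame();                       // frame N+1's multicore pass
        submit.join();                           // usually finished long before this
    }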

RIFT has a step called the “scene update”. This is where we update the positions of all objects, along with bounding boxes, animations, level-of-detail mesh states, fades in progress, and a small mountain of other tasks. RIFT has always had some elements that utilized multiple cores, and the scene update is one of them, but it has nonetheless always been limited by a single thread. Up until this point, the rest of the game was paced such that our serial scene update always finished on time, but the multicore rendering optimizations meant that this part of RIFT needed to speed up to avoid becoming the bottleneck. The final improvement we made (so far!) was to do a better job of threading that scene update process. In theory this could be done in single-threaded mode as well, without the multicore renderer, and we’ll probably enable it by default for everyone once we’re satisfied it works, but right now it’s tied to the multicore checkbox.
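
A minimal sketch of the idea: a chunked parallel-for over the scene’s objects. A real engine would go through its job system rather than spawning threads every frame, and the names here are invented:

    #include <algorithm>
    #include <cstddef>
    #include <thread>
    #include <vector>

    struct SceneObject { /* position, bounds, animation state, ... */ };
    void UpdateOne(SceneObject& obj) { /* bounding boxes, LOD, fades, ... */ }

    // Split the object list into one contiguous chunk per thread; each chunk
    // is independent, so no locking is needed (threadCount must be >= 1).
    void SceneUpdateParallel(std::vector<SceneObject>& objects, size_t threadCount) {
        std::vector<std::thread> workers;
        const size_t chunk = (objects.size() + threadCount - 1) / threadCount;
        for (size_t t = 0; t < threadCount; ++t) {
            workers.emplace_back([&objects, t, chunk] {
                const size_t begin = t * chunk;
                const size_t end = std::min(begin + chunk, objects.size());
                for (size_t i = begin; i < end; ++i) UpdateOne(objects[i]);
            });
        }
        for (std::thread& w : workers) w.join();
    }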

Multicore is officially in “open beta” now, and is available for use by everyone. We’ve been watching stability and crash reports, and while we still see a few very uncommon issues, we’re at the point where multicore is just as stable as the old single-threaded renderer. We’re seeing performance gains ranging up to 50% (sometimes higher). We strongly recommend giving it a try!
Note that there are issues in the low-quality renderer that currently prevent us from offering a multicore low-quality renderer; however, if you’re using the low-quality renderer, you may find the high-quality multicore renderer is actually faster – give it a shot!

At Trion, we’re always looking for ways to improve the gaming experience for everyone, and this Multicore update has been a really productive effort. It’s exciting to see the very positive feedback from players, and we hope that you’ll log in soon to try it out too!

Many thanks to all of the players who helped us Alpha test the Multicore Update – without their contribution, this wouldn’t have gone nearly as smoothly.

ZorbaTHut
Ben Rog-Wilhelm, RIFT Lead Rendering Engineer
