Performance
While I've been working on the lower decks, some performance problems have been building up for a while now. Even though the game's resolution is super low and there are no obviously fancy surface shaders going on, the geometry/object count is pretty high. Normally that wouldn't be a problem, but since the 1-bit rendering technique requires two passes, sending all the geometry to the GPU twice eats up a lot of frame time. Yesterday I decided to sit down and see if I could get all the rendering done in a single pass (plus post-processing).
The Old Way
The original technique required rendering the scene in two passes:
Scene Pass 1: Sectioning
The sectioning pass just draws the vertex colors, which have been pre-processed to define areas that should be separated by edges. Wireframe lines are later drawn along these edges in post.
Old method: Sectioning pass
RED = Tool-generated random hash (lower bits)
GREEN = Manually-set adjustment color
BLUE = Tool-generated random hash (upper bits)
ALPHA = unused
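To make the channel layout above concrete, here is a minimal sketch of how a tool might write a section hash into a vertex color. It assumes a 16-bit hash and 8 bits per channel; the function names and the hash width are hypothetical and not taken from the actual tool.

```python
from typing import Tuple


def section_color(section_hash: int, adjustment: int = 0) -> Tuple[int, int, int, int]:
    """Encode a section hash as an (R, G, B, A) vertex color, 8 bits per channel.

    Assumption: the tool-generated hash is 16 bits wide.
    """
    if not 0 <= section_hash <= 0xFFFF:
        raise ValueError("section_hash must fit in 16 bits")
    red = section_hash & 0xFF           # lower bits of the hash
    green = adjustment & 0xFF           # manually-set adjustment color
    blue = (section_hash >> 8) & 0xFF   # upper bits of the hash
    return (red, green, blue, 0)        # alpha is unused in this pass


def section_hash_from_color(color: Tuple[int, int, int, int]) -> int:
    """Recover the 16-bit hash from a color produced by section_color."""
    red, _green, blue, _alpha = color
    return (blue << 8) | red
```

Two neighboring pixels whose colors differ belong to different sections, which is what the later edge detection looks for.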
Scene Pass 2: Lighting/Texturing
Lighting/texturing runs the full Unity SurfaceShader pipeline to generate lightmaps + dynamic light + textures. Dithering and other logic is applied to these results in post.
Old method: Lighting/texturing pass
RED = Light value
GREEN = Markup value (0: normal, 0.5: ordered dithering, 1: dust)
BLUE = Texture value
ALPHA = unused
Post-processing Combiner
These scene passes are written to two render textures which are then combined in a post process to make the final buffer. Having the light and texture in separate channels from the lighting/texturing buffer enables me to adjust their gradient ramps separately, which I use to make the hard shadows and blown-out textures that help with legibility.
The final post-processed output, 30fps
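The idea of ramping light and texture separately can be sketched roughly as follows. This is only an illustration: the ramp endpoints are made-up numbers, and multiplying the two ramped values is an assumption about how they get combined, not the game's actual shader math.

```python
def ramp(value: float, low: float, high: float) -> float:
    """Remap value so that low maps to 0 and high maps to 1, clamped to [0, 1]."""
    t = (value - low) / (high - low)
    return min(1.0, max(0.0, t))


def combine(light: float, texture: float,
            light_ramp=(0.4, 0.6), texture_ramp=(0.0, 0.5)) -> float:
    """Combine separately ramped light and texture values into one shade.

    Assumptions: both ramp ranges are hypothetical, and the two results
    are simply multiplied together.
    """
    lit = ramp(light, *light_ramp)        # narrow ramp gives a hard shadow edge
    tex = ramp(texture, *texture_ramp)    # ramp that saturates early blows out the texture
    return lit * tex
```

Because the light and texture live in separate channels, each ramp can be tuned without touching the other; once they are merged into a single value, that freedom is gone.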
Separating the two passes like this makes sense for a couple of reasons:
1. Easy to visualize the two main features of the rendering style: wireframe lines and dithered 1-bit surfaces
2. Two 32-bit RGBA buffers give plenty of space for the data.
Neither of those reasons is worth the framerate hit, though. Unity's not very good at reusing scene data for subsequent passes, so even when the sectioning pass runs at 100+ fps on its own, sending all the geometry twice bogs things down too much.
Combining
So the goal was to combine the two scene passes into one, with the hope that performance improves on lower-end video cards. That last bit is important because it precludes me from using multiple render targets (MRT). For a single pass, everything has to fit in 32 bits.
One particular complication is that Unity's SurfaceShaders by default don't allow writing to the alpha channel, so you're effectively limited to just 24 bits in an RGBA buffer. I tried to be happy with that for a long time before finally tracking down a fix, which only became possible in Unity 5.
The problem. Alpha being "helped" to 1 for all surface shaders.
You can't easily edit the generated code directly, but you can redefine the offending macro in your own code, which is thankfully included after UnityCG.cginc:
The fix requires undefining the macro in your own surface shaders
With that, you have use of the full 32 bits and can write anything to the alpha as long as it's non-zero.
Single Scene Pass
So now I just had to pack 48 bits from the two separate passes into 32 bits for the single scene pass. The basic idea was to chop off the lower 8 bits of the sectioning hash (leaving 16 bits), reduce the lighting/texturing output to a single 8-bit channel, and use the alpha as a markup value to specify which "pass" the RGB values were coming from. Because the final output is dithered 1-bit, very few bits are actually required for the lighting/texturing. The result:
RGB
Alpha
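The packing described above could look something like this sketch. The exact channel assignment (section bits in R and G, shade in B), the markup codes, and the way light and texture are merged are all assumptions made for illustration; the only constraints taken from the text are 16 section bits, one 8-bit shade channel, and a non-zero alpha.

```python
from typing import Tuple

# Hypothetical markup codes. Alpha has to be non-zero, so none of them is 0.
MARK_NORMAL = 1
MARK_ORDERED_DITHER = 2
MARK_DUST = 3


def pack_pixel(section24: int, light: float, texture: float,
               markup: int) -> Tuple[int, int, int, int]:
    """Pack sectioning and lighting/texturing data into one (R, G, B, A) pixel.

    Assumed layout: R/G hold the upper 16 bits of the 24-bit section value,
    B holds a single merged 8-bit shade, A holds the markup code.
    """
    if not 1 <= markup <= 255:
        raise ValueError("markup must be a non-zero 8-bit value")
    section16 = (section24 >> 8) & 0xFFFF     # chop off the lower 8 bits
    shade = min(1.0, max(0.0, light * texture))
    return (section16 & 0xFF,                 # R: low byte of remaining section bits
            section16 >> 8,                   # G: high byte of remaining section bits
            round(shade * 255),               # B: 8-bit lighting/texturing value
            markup)                           # A: markup, never zero


def unpack_section(pixel: Tuple[int, int, int, int]) -> int:
    """Recover the 16 section bits that survive packing."""
    red, green, _blue, _alpha = pixel
    return (green << 8) | red
```

Dropping 8 section bits raises the chance that two adjacent sections end up with the same value, but 16 bits still leaves 65,536 distinct values to tell neighbors apart.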
I was also able to use Unity's shader Queue tags to control draw order: set the sky shader to "Queue = Background" and the dust shader to "Queue = Transparent".
Post-processing Combiner
The post-processing step now just has to do the edge detection (with 2 channels instead of 3), the darkness check (to invert wireframe lines in darkness), the dithering (blue noise or ordered based on the alpha bits), and the dust inversion. There is some extra cost to doing more of the lighting/texturing combination in the scene shader instead of the post-processing shader. But the final output looks identical to the old two-pass method, and it runs significantly faster.
Single scene pass, 60fps
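For readers unfamiliar with these post-processing steps, here is a rough CPU-side sketch of three of them: edge detection on a two-channel section ID, the darkness check, and ordered dithering. It uses a standard 4x4 Bayer matrix and a simple right/below neighbor comparison; the darkness threshold, the line colors, and the function names are assumptions, and the blue-noise and dust paths are left out.

```python
# Standard 4x4 Bayer matrix for ordered dithering.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def is_edge(ids, x, y):
    """True if the two-channel section ID at (x, y) differs from the right or below neighbor."""
    here = ids[y][x]
    right = ids[y][min(x + 1, len(ids[0]) - 1)]   # clamp at the image border
    below = ids[min(y + 1, len(ids) - 1)][x]
    return here != right or here != below


def ordered_dither(shade, x, y):
    """Turn a shade in [0, 1] into a single bit using the Bayer threshold at (x, y)."""
    threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) / 16.0
    return 1 if shade > threshold else 0


def final_bit(ids, shades, x, y, dark_below=0.25):
    """Compute the 1-bit output for one pixel.

    Assumption: lines are drawn dark on lit surfaces and inverted to
    light when the underlying shade falls below dark_below.
    """
    if is_edge(ids, x, y):
        return 1 if shades[y][x] < dark_below else 0
    return ordered_dither(shades[y][x], x, y)
```

With a flat 50% shade, exactly half of the pixels in any 4x4 block come out lit, which is the property that makes ordered dithering read as an even gray.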
Optimizing Early
There's always a danger in doing optimizations like this before the game content is mostly done. Now that I have an extra 30fps to work with, there's a good chance I'll paint myself back into poor performance. A common trick for smarter programmers is to keep a few secret optimizations in their pocket until the very end. If you optimize too early, the artists will just fill up the scene again and you'll be in a spot where it's much harder to get in framerate. I'll have to rely on self-discipline instead.
I am glad this worked out OK, though. There's a (justified) perception that a 640x360 1-bit game should not have framerate problems on any machine. There's a lot going on behind the scenes that makes the rendering more intensive than it looks initially, but I'd rather match the perception than make excuses.