I've been busy with moving / conferences / job / etc. and haven't done too much work on Dwarfcorp in the meantime.
A few days ago, I was looking at the core of the rendering system to figure out why the game sometimes runs at >60 FPS, while at other times it runs at something like 10 FPS. I narrowed the problem down to trees/shrubs. If there are more than something like 20 trees on the screen, the game slows to a crawl. It was pretty embarrassing, as it caused me to avoid making any videos in tree-filled areas.

In the core of the rendering system, I was doing something called "batching", where I would draw objects that shared the same properties (like texture) together, but still make a large number of draw calls. For grass motes, I was instead putting all the grass sprites into one huge model, which wasted a bunch of memory.
The trouble was that for trees, having too many on the screen caused a huge drop in FPS due to the overhead of making all those draw calls. Ditto for other batched objects, like the particles in the particle system. The grass got decent-ish FPS, but each grass chunk used a large amount of memory.
I discovered an alternative called "hardware instancing", where a single draw call tells the GPU to draw the same set of triangles over and over again, each time with different per-instance properties. This is perfect for my game, since I mostly just draw the same (simple) model repeatedly. With hardware instancing, I can get the speed benefits of putting everything into one model, along with the memory benefits of batching.
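To make that concrete, here's a rough sketch of an instanced draw. I'm showing it with OpenGL calls just because they're compact; DwarfCorp is XNA, where the analogous entry point is GraphicsDevice.DrawInstancedPrimitives. The buffer names, attribute slots, and TreeInstance layout are made up for illustration, and the shared tree quad, its shaders, and the GL context are assumed to already exist:

```cpp
// Sketch of hardware instancing in OpenGL terms (illustrative names, not the
// real DwarfCorp code). One shared quad is drawn once per TreeInstance record,
// all from a single draw call.
#include <GL/glew.h>
#include <cstddef>
#include <vector>

struct TreeInstance {
    float position[3];   // world-space position of this tree
    float scale;         // per-instance size variation
};

void DrawTrees(GLuint instanceVbo, const std::vector<TreeInstance>& trees)
{
    // Upload one small record per tree -- not a full copy of the tree geometry,
    // which is what the old "merge everything into one big model" path did.
    glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
    glBufferData(GL_ARRAY_BUFFER, trees.size() * sizeof(TreeInstance),
                 trees.data(), GL_DYNAMIC_DRAW);

    // Attribute 2 = instance position, attribute 3 = instance scale.
    // A divisor of 1 means "advance once per instance, not once per vertex".
    glEnableVertexAttribArray(2);
    glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, sizeof(TreeInstance),
                          (void*)offsetof(TreeInstance, position));
    glVertexAttribDivisor(2, 1);

    glEnableVertexAttribArray(3);
    glVertexAttribPointer(3, 1, GL_FLOAT, GL_FALSE, sizeof(TreeInstance),
                          (void*)offsetof(TreeInstance, scale));
    glVertexAttribDivisor(3, 1);

    // The quad's own positions/UVs (attributes 0 and 1) are assumed to be
    // configured already in the bound VAO. One call renders every tree.
    glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, (GLsizei)trees.size());
}
```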
I also discovered that since I'm using pixel art without anti-aliasing, I can just skip Z-sorting and instead use alpha testing for binary transparency. This is *way* faster than sorting the sprites, and fixes the z-fighting issues I was having.
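The alpha test itself is basically a one-liner in the pixel shader. Here's a minimal GLSL version of the idea, embedded as a C++ string constant (again just a sketch; in XNA/HLSL the equivalent is a clip() call):

```cpp
// Binary transparency via alpha testing: pixels below the cutoff are discarded
// outright, so overlapping sprites resolve correctly through the depth buffer
// and no back-to-front sorting is needed.
const char* kSpriteFragmentShader = R"glsl(
#version 330 core
uniform sampler2D spriteTexture;
in  vec2 uv;
out vec4 color;

void main()
{
    vec4 texel = texture(spriteTexture, uv);
    if (texel.a < 0.5)   // binary cutoff: the pixel is either fully there or gone
        discard;
    color = texel;
}
)glsl";
```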
I threw together a little test of these two things together to see if they worked in principle. I give you 10,000 tree sprites sorted perfectly and running at >60 FPS:
In the future, I will re-write the way static geometry is handled so that I can take advantage of this and run the game much faster.
I also fixed the AI problems I was talking about by hacking at the AI and adding a few special cases to prevent two dwarves from, say, trying to pick up the same resource or place a block at the same location. They do this by "reserving" certain resources for themselves, similar to how a mutex works in multithreaded applications.
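In code, the reservation idea is little more than a lookup table of who owns what. A minimal sketch, with made-up names rather than the actual DwarfCorp classes:

```cpp
// A dwarf tries to claim a resource before committing to a plan that uses it;
// other dwarves' planners skip anything already claimed, much like acquiring a
// mutex before touching shared state.
#include <unordered_map>

using ResourceId = int;
using DwarfId    = int;

class ReservationTable {
public:
    // Returns true if the dwarf now owns the resource (or already did).
    bool TryReserve(ResourceId resource, DwarfId dwarf) {
        auto [it, inserted] = owners_.try_emplace(resource, dwarf);
        return inserted || it->second == dwarf;
    }

    // Only the owner may release its own reservation.
    void Release(ResourceId resource, DwarfId dwarf) {
        auto it = owners_.find(resource);
        if (it != owners_.end() && it->second == dwarf)
            owners_.erase(it);
    }

    bool IsReserved(ResourceId resource) const {
        return owners_.count(resource) != 0;
    }

private:
    std::unordered_map<ResourceId, DwarfId> owners_;
};
```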
One of the cool things that the new Sim City did was to treat all moving entities as a particle system, essentially seeing the homes as faucets turning on to spill their inhabitants down the streets, flowing into the factories, which acted like buckets with localized gravity in them... sort of.*
Maybe that sort of abstraction isn't necessary here though.
Another new idea is the "flow fields" of Planetary Annihilation, if you care to watch that video. I found it really incredible. Maybe these ideas aren't suited here, but perhaps considering them will yield some new inspiration?
* Sort of not cool that the marketing department then stated the exact opposite, but still.
I think flow fields and other techniques for pathfinding are cool for some applications. They're very efficient, and don't require much intelligence on the part of the agent. The trouble is that an agent in a flow field can't easily form a "long term plan" and will fall into local minima, perhaps getting stuck in a loop. I want the dwarves in my game to be able to decide "I need to go to this stockpile, grab this thing, and then take it over there" without the player telling them each step. At the moment, planning isn't a huge bottleneck -- it's just annoying for me to code.
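For anyone who hasn't seen the technique, here's roughly what a flow field looks like in toy form (a sketch for illustration, not code from DwarfCorp or Planetary Annihilation): a breadth-first pass from the goal labels every walkable cell with its distance, and each agent just steps to the lowest-valued neighbor every tick.

```cpp
// Toy flow field on a small grid (illustrative only). BuildFlowField runs a
// breadth-first search outward from the goal; StepAgent then follows the
// gradient. Cheap per agent, but the agent only ever knows "downhill from
// here" -- it can't express "fetch from the stockpile, then deliver" without
// a separate field for each subgoal.
#include <array>
#include <limits>
#include <queue>
#include <vector>

constexpr int W = 8, H = 8;
constexpr int INF = std::numeric_limits<int>::max();

std::vector<int> BuildFlowField(const std::vector<bool>& walkable, int goal) {
    std::vector<int> dist(W * H, INF);
    std::queue<int> frontier;
    dist[goal] = 0;
    frontier.push(goal);
    const std::array<int, 4> steps = {-1, +1, -W, +W};
    while (!frontier.empty()) {
        int cell = frontier.front();
        frontier.pop();
        for (int d : steps) {
            int next = cell + d;
            if (next < 0 || next >= W * H || !walkable[next]) continue;
            if ((d == -1 || d == +1) && next / W != cell / W) continue; // no row wrap
            if (dist[next] > dist[cell] + 1) {
                dist[next] = dist[cell] + 1;
                frontier.push(next);
            }
        }
    }
    return dist;
}

// One tick of movement: step to whichever neighbor is closer to the goal.
int StepAgent(int cell, const std::vector<int>& dist) {
    int best = cell;
    for (int d : {-1, +1, -W, +W}) {
        int next = cell + d;
        if (next < 0 || next >= W * H) continue;
        if ((d == -1 || d == +1) && next / W != cell / W) continue;
        if (dist[next] < dist[best]) best = next;
    }
    return best;
}
```

That per-agent cheapness is the appeal, but it's also why a single field can't encode a multi-step errand on its own.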
So what's the bottleneck now? Is the actual rendering an issue, or is it mostly iterating through the data? Are you storing full chunks by slice/row/column, or as smaller chunks for (better?) cache access? Since there appear to be sizable swaths of the same voxel type, can those be aggregated and rendered together or would that be too complex for not much reward?
Right now I apparently spend 36% of my time on updating (AI, physics, input, etc.), and the remainder on rendering. Most of that rendering time goes to things like trees and shrubs. The actual voxel rendering is now so optimized that it's negligible in comparison. Even rendering the water is pretty fast.
Any thoughts on using behavior trees for your AI? Don't take that as a hint to go off on a coding tangent though. For my own debugging, I've had difficulty balancing the need to visualize what's going on in my behavior tree with the time it takes to implement the feedback visuals. So far, I really don't have much feedback beyond drawing simple debug gizmos and relying on lots of unit tests for the core.
At my job (at a robotics lab), we started out implementing behavior trees and coding everything at that level. It turns out that the benefits of behavior trees (online code creation, inspecting the logic as it runs, automatic parallelization, etc.) are *far* outweighed (at least in our experience) by the numerous disadvantages. To implement even basic logical constructs (like nested if-else statements, loops, switches, etc.) requires a ton of boilerplate code. Sharing state requires boilerplate code as well. Even with operator overloading, or using data files, constructing behaviors is still tedious. I guess this is why most games I have heard of resort to scripting languages of some kind rather than state machines, action planning, or behavior trees.
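To give a flavor of that boilerplate, here's a generic sketch (not our lab's actual framework; the node types are just the textbook ones) of what a one-line if/else turns into once it has to live inside a tree:

```cpp
// The one-liner
//     if (HasResource()) Build(); else FetchResource();
// expressed as behavior tree nodes: a Selector over a Sequence plus a fallback.
#include <functional>
#include <memory>
#include <utility>
#include <vector>

enum class Status { Success, Failure, Running };

struct Node {
    virtual ~Node() = default;
    virtual Status Tick() = 0;
};

// Wraps an arbitrary condition or action as a leaf node.
struct Leaf : Node {
    std::function<Status()> fn;
    explicit Leaf(std::function<Status()> f) : fn(std::move(f)) {}
    Status Tick() override { return fn(); }
};

// Runs children in order; stops at the first one that doesn't succeed.
struct Sequence : Node {
    std::vector<std::unique_ptr<Node>> children;
    Status Tick() override {
        for (auto& c : children)
            if (Status s = c->Tick(); s != Status::Success) return s;
        return Status::Success;
    }
};

// Runs children in order; stops at the first one that doesn't fail.
struct Selector : Node {
    std::vector<std::unique_ptr<Node>> children;
    Status Tick() override {
        for (auto& c : children)
            if (Status s = c->Tick(); s != Status::Failure) return s;
        return Status::Failure;
    }
};

// Assembling the if/else. Note it isn't even a faithful translation: if
// HasResource() succeeds but Build() fails, control still falls through to
// the fetch branch -- exactly the kind of subtlety that eats time.
std::unique_ptr<Node> MakeIfElse(std::function<Status()> hasResource,
                                 std::function<Status()> build,
                                 std::function<Status()> fetch) {
    auto ifBranch = std::make_unique<Sequence>();
    ifBranch->children.push_back(std::make_unique<Leaf>(hasResource));
    ifBranch->children.push_back(std::make_unique<Leaf>(build));

    auto root = std::make_unique<Selector>();
    root->children.push_back(std::move(ifBranch));
    root->children.push_back(std::make_unique<Leaf>(fetch));
    return root;
}
```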