Edit:
OK, I got an experimental release out.
See post #10 below for release info!
Finally came up with a design I actually like for batching sprites with texture atlases. The basic idea is to create atlases at run-time using Least Recently Used (LRU) caches. If anyone has a moment, please look over the design; maybe you'll spot something stupid before I go and implement it.
Typical atlas problems:
- edge bleeding
- building the atlas
- picking what images go in what atlas
- supporting texture hotswapping/hotloading
- making it dead-simple to integrate new images
Some of these harder problems are solved for the user by Unity or other engines. But those of us doing custom tech actually have to worry about the last two bullet points. The upside is that tackling these problems ourselves leaves room for novel solutions that can grant an advantage over pre-packaged engines.
Note: I don't support hardware UV repeating or mipmaps. This simplification works well for 2D games, and can work in 3D games if strict limitations on texturing are set in place at project inception.
Now for some potential solutions!
- edge bleeding - Pinch UV coordinates inward slightly to create a numeric buffer zone between different images. I've tested this myself for the last couple of years and have never seen problems. There's no reason to pad atlases with buffer pixels. That's silly.
- building the atlas - I've made a single-file C header for loading images, and it can construct pretty decent atlases in-memory.
- picking what images go in what atlas - Typically images are hand-picked for each atlas, trying to keep images that are drawn at the same time in the same atlas. Instead I've opted for a run-time atlas builder: it keeps a rolling set of the most recently used textures and builds atlases on the fly (more details later).
- supporting texture hotswapping/hotloading - Usually super annoying if atlases are constructed as a preprocessing step. A good option could simply be to *not* place textures into atlases while the game is running, and just load new or hotswapped textures individually. Instead, if atlases are built at run-time, supporting hotswapping of textures is trivial.
- making it dead-simple to integrate new images - Ideally an artist can simply save an image into a directory to make the texture available. This problem is very similar to hotswapping. Luckily, adding new images is also trivial if atlases are hidden away behind a run-time API.
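A minimal sketch of the UV pinching from the edge-bleeding bullet, assuming each image sits in the atlas as an integer pixel rectangle. `UVRect`, `PinchedUVs` and the `inset_texels` parameter are hypothetical names, and how far to pinch is a tuning assumption: with bilinear filtering an inset of half a texel keeps the filter footprint on the edge texel itself, while nearest-neighbor sampling needs less.

```cpp
#include <cassert>

// Hypothetical type: normalized UV rectangle of one image inside an atlas.
struct UVRect {
    float u0, v0; // min corner
    float u1, v1; // max corner
};

// Hypothetical helper: UVs for an image stored at pixel rect (x, y, w, h)
// in an atlas of atlas_w x atlas_h pixels, pinched inward by inset_texels
// on every side so sampling stays clear of neighboring images.
UVRect PinchedUVs(int x, int y, int w, int h, int atlas_w, int atlas_h, float inset_texels)
{
    assert(atlas_w > 0 && atlas_h > 0);
    assert(w > 0 && h > 0);
    // The pinched rectangle must not collapse or invert.
    assert(2.0f * inset_texels < (float)w && 2.0f * inset_texels < (float)h);

    // One texel spans 1/atlas_w in U and 1/atlas_h in V.
    const float du = inset_texels / (float)atlas_w;
    const float dv = inset_texels / (float)atlas_h;

    UVRect r;
    r.u0 = (float)x / atlas_w + du;
    r.v0 = (float)y / atlas_h + dv;
    r.u1 = (float)(x + w) / atlas_w - du;
    r.v1 = (float)(y + h) / atlas_h - dv;
    return r;
}
```

Two images packed side by side end up with a small gap between their UV ranges, which is the numeric buffer zone described in the edge-bleeding bullet; no padding pixels are stored in the atlas.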
Run-time atlases:
For 2D sprite-based games I believe building atlases at run-time is a superior solution to preprocessing them. Ideally the atlases can be hidden behind an API for a sprite batching service. Love2D has a fairly good sprite batch API (its SpriteBatch object): sprites are pushed into a buffer from game code, and that's it. The batcher can then be flushed to the screen. Here's an example of such an API:
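```cpp
#include <cstdint>

// v2, transform and make_transform() come from the game's own math
// library and are not shown here.
struct Texture
{
    uint64_t id;   // unique id of the image
    int gl_id;     // gl_texture this image is drawn from (atlas or standalone)
    int w, h;      // size in pixels
    v2 u, v;
};

#define SPRITE_MAX_FRAMES 32

struct Sprite
{
    int depth = 0;
    v2 scale = v2(1, 1);
    transform tx = make_transform();
    int paused = 0;
    float seconds = 0;
    int frame_current = 0;
    int frame_count = 0;
    Texture frames[SPRITE_MAX_FRAMES]; // one Texture per animation frame
};

// Buffers a sprite; nothing is drawn until Flush is called.
void PushSprite(void* ctx, Sprite* s);
// Batches and draws everything pushed since the last Flush.
void Flush(void* ctx);
```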
Internally, the sprite batch API (PushSprite) can be implemented in many ways. My design simply buffers every pushed sprite; all the fun happens when the batch is flushed. Flush looks like this:
```
// preprocess
for each sprite
    look up which atlas its texture belongs to and assign that atlas's gl_texture
    if it belongs to no atlas, it goes to the lonely_buffer
    update the texture's timestamp

// render
sort sprites by gl_texture
stable sort sprites by depth (sprites at equal depth stay grouped by gl_texture)
make a batch per run of consecutive sprites sharing a gl_texture
draw each batch

// postprocess
for each atlas
    check the texture decay metric (more than <N> of its textures are "old enough")
    if the metric is hit, flush the atlas's textures to the lonely_buffer and remove the atlas
sort gl_textures in the lonely_buffer by timestamp
for each texture in the lonely_buffer
    if old enough, remove it from the buffer
    *note* this is trivial to implement thanks to the sort order (old entries end up contiguous)
if more than <N> items are in the lonely_buffer
    construct an atlas from the most recently used textures
```
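To make the render stage concrete, here is a minimal sketch, assuming every sprite already carries the gl_texture assigned during preprocessing. `QueuedSprite`, `Batch` and `BuildBatches` are hypothetical names, the handle is a plain `uint32_t` standing in for a `GLuint`, and lower depth is assumed to draw first. The second sort must be stable; otherwise it would scramble the gl_texture grouping produced by the first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a buffered sprite: only the fields the render
// stage needs. gl_texture is the atlas (or standalone) handle.
struct QueuedSprite {
    uint32_t gl_texture;
    int depth;
};

// Hypothetical batch: one draw call covering sorted sprites [first, first + count).
struct Batch {
    uint32_t gl_texture;
    size_t first;
    size_t count;
};

// Sort by gl_texture, then stable sort by ascending depth: the result is
// ordered by depth, and sprites at equal depth stay grouped by gl_texture.
// Each maximal run of consecutive sprites sharing a gl_texture is one batch.
std::vector<Batch> BuildBatches(std::vector<QueuedSprite>& sprites)
{
    assert(sprites.size() < SIZE_MAX);
    std::sort(sprites.begin(), sprites.end(),
              [](const QueuedSprite& a, const QueuedSprite& b) { return a.gl_texture < b.gl_texture; });
    std::stable_sort(sprites.begin(), sprites.end(),
                     [](const QueuedSprite& a, const QueuedSprite& b) { return a.depth < b.depth; });

    std::vector<Batch> batches;
    for (size_t i = 0; i < sprites.size(); ++i) {
        // Start a new batch whenever the texture changes.
        if (batches.empty() || batches.back().gl_texture != sprites[i].gl_texture)
            batches.push_back({sprites[i].gl_texture, i, 0});
        batches.back().count++;
    }
    return batches;
}
```

Each returned `Batch` maps to one draw call, so the size of the result is the frame's draw call count. When every sprite shares one depth, this comes out to one batch per distinct gl_texture.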
There is a distinction between a texture and a gl_texture. A texture is the struct shown above in the sprite batch example. A gl_texture is an OpenGL texture name generated by glGenTextures. An atlas is a single gl_texture. A texture in the lonely_buffer belongs to no atlas yet, and has its own standalone gl_texture.
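The preprocess step (pick the gl_texture, fall back to the lonely_buffer, refresh the timestamp) could look roughly like this. It is a sketch under assumptions: `TextureEntry`, `TextureTable` and `ResolveGLTexture` are made-up names, timestamps are plain integers (a frame counter would do), and every texture is assumed to have been loaded and registered before it is drawn.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Hypothetical record of where a texture currently lives.
struct TextureEntry {
    uint32_t gl_texture; // atlas handle if in_atlas, else its standalone handle
    bool in_atlas;
    uint64_t last_used;  // timestamp refreshed every time it is drawn
};

// Hypothetical lookup table keyed by Texture::id.
struct TextureTable {
    std::unordered_map<uint64_t, TextureEntry> entries;
    std::unordered_set<uint64_t> lonely_buffer; // ids that belong to no atlas
};

// Preprocess for one sprite: returns the gl_texture to draw with, puts
// atlas-less textures in the lonely_buffer and updates the timestamp.
uint32_t ResolveGLTexture(TextureTable& table, uint64_t texture_id, uint64_t now)
{
    auto it = table.entries.find(texture_id);
    assert(it != table.entries.end()); // must be registered beforehand
    TextureEntry& e = it->second;
    if (!e.in_atlas)
        table.lonely_buffer.insert(texture_id); // no-op if already present
    e.last_used = now;
    return e.gl_texture;
}
```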
In summary: textures are first put into the lonely_buffer. Each of them is also its own gl_texture, which implies a separate draw call. Once the lonely_buffer contains enough entries to make an atlas (checked at the end of the previous Flush, i.e. the last render tick), it is sorted and the most recently used entries are prioritized to construct an atlas. To draw, all sprite instances are sorted by gl_texture, then by depth (and/or other material/shader parameters), then batches are constructed and issued as draw calls. Every drawn sprite updates the timestamp of its texture. After rendering, each atlas is scanned to see whether it contains a lot of old, unused textures. If so, the atlas's recently used textures are flushed to the lonely_buffer. Then the lonely_buffer is sorted and all entries that are too old are culled. If the lonely_buffer is still large enough, the most recently timestamped entries are removed from it to construct a new atlas.
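A sketch of the postprocess bookkeeping, assuming each texture is tracked as an id plus a last-used timestamp. `AtlasPolicy` gathers the `<N>` thresholds from the pseudocode into named fields; those names, their values, and the rule that a new atlas takes exactly `atlas_build_count` textures are all assumptions (a real packer would more likely stop when the atlas runs out of space). Packing pixels and creating the gl_texture are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A texture id plus the timestamp of the last time it was drawn.
struct TextureUse {
    uint64_t id;
    uint64_t last_used;
};

// Hypothetical names for the <N> thresholds in the pseudocode.
struct AtlasPolicy {
    uint64_t max_age;         // unused for longer than this means "old enough"
    size_t atlas_decay_count; // remove an atlas once more than this many textures are old
    size_t atlas_build_count; // build an atlas once the lonely_buffer holds more than this
};

static bool IsOld(const TextureUse& t, uint64_t now, const AtlasPolicy& policy)
{
    assert(now >= t.last_used);
    return now - t.last_used > policy.max_age;
}

// Atlas decay metric: more than atlas_decay_count of its textures are old.
bool AtlasHasDecayed(const std::vector<TextureUse>& atlas_textures, uint64_t now, const AtlasPolicy& policy)
{
    size_t old_count = 0;
    for (const TextureUse& t : atlas_textures)
        if (IsOld(t, now, policy)) ++old_count;
    return old_count > policy.atlas_decay_count;
}

// Tear down a decayed atlas by moving its textures back to the lonely_buffer.
// Stale ones are moved too; UpdateLonelyBuffer culls them right after.
void FlushAtlasToLonely(const std::vector<TextureUse>& atlas_textures, std::vector<TextureUse>& lonely)
{
    lonely.insert(lonely.end(), atlas_textures.begin(), atlas_textures.end());
}

// Sort newest first, cull the stale tail, and if enough textures remain,
// remove the most recently used ones and return their ids for packing.
std::vector<uint64_t> UpdateLonelyBuffer(std::vector<TextureUse>& lonely, uint64_t now, const AtlasPolicy& policy)
{
    std::sort(lonely.begin(), lonely.end(),
              [](const TextureUse& a, const TextureUse& b) { return a.last_used > b.last_used; });

    // Thanks to the sort, every old entry sits at or after the first old one.
    auto first_old = std::find_if(lonely.begin(), lonely.end(),
                                  [&](const TextureUse& t) { return IsOld(t, now, policy); });
    lonely.erase(first_old, lonely.end());

    std::vector<uint64_t> to_pack;
    if (lonely.size() > policy.atlas_build_count) {
        for (size_t i = 0; i < policy.atlas_build_count; ++i)
            to_pack.push_back(lonely[i].id);
        lonely.erase(lonely.begin(), lonely.begin() + static_cast<std::ptrdiff_t>(policy.atlas_build_count));
    }
    return to_pack;
}
```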
The entire run-time atlas system is completely hidden. It proactively groups textures used at roughly the same time into atlases and adapts as time goes on. The number of draw calls issued is always predictable: assuming no differences in depth/material stuff, there is one draw call per atlas plus one per unique gl_texture in the lonely_buffer.
The cons of this runtime system:
- Performance hits in the pre- and post-process stages
- System is fairly complex and nontrivial to implement
- If each texture comes from a separate image on disk, opening all those individual files can be very slow. For example, on Windows the permission checks and related work performed just to open a file are time-consuming.
Luckily, the first two points can be mitigated with good old-fashioned engineering chops. The performance of this kind of system will be dominated by cache coherency, and some clever planning can keep the pre- and post-process costs minimal, negligible compared to the time saved by dramatically lowering the draw call count.
The third problem is a big one. It can be solved by some kind of file abstraction system, and probably shouldn't be solved by a sprite batching system. For my own game I'm using a virtual path system that can mount directories or .zip files without requiring any code differences between the two. If images are zipped up and mounted, the file-open and permission overhead is paid only once, for the zip file itself, effectively avoiding the entire problem.
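A bare-bones mount table illustrates the virtual path idea. This is not the author's implementation: `VirtualFS`, `MountDir` and `Resolve` are hypothetical, and only directory mounts are handled; a .zip mount would need an archive reader behind the same lookup.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mount: a virtual prefix such as "/sprites" mapped to a real
// directory such as "assets/sprites".
struct Mount {
    std::string virtual_prefix;
    std::string real_path;
};

// Hypothetical virtual file system: directory mounts only in this sketch.
struct VirtualFS {
    std::vector<Mount> mounts; // searched in order, first match wins

    void MountDir(std::string virtual_prefix, std::string real_path) {
        assert(!virtual_prefix.empty() && virtual_prefix[0] == '/');
        mounts.push_back({std::move(virtual_prefix), std::move(real_path)});
    }

    // Translates e.g. "/sprites/player.png" into "assets/sprites/player.png".
    std::optional<std::string> Resolve(const std::string& virtual_path) const {
        for (const Mount& m : mounts) {
            const std::string& p = m.virtual_prefix;
            bool prefix_match = virtual_path.compare(0, p.size(), p) == 0;
            // Require a path-component boundary so "/spritesheet.png"
            // does not match the "/sprites" mount.
            if (prefix_match && (virtual_path.size() == p.size() || virtual_path[p.size()] == '/'))
                return m.real_path + virtual_path.substr(p.size());
        }
        return std::nullopt;
    }
};
```

Game code only ever sees virtual paths, so swapping the directory mount for a zip mount at ship time would not touch any sprite code.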
Finally, another system can watch the batcher and log which atlases get built in different parts of the game. When the game is ready to ship, the logs can drive a single preprocessing step that constructs on-disk atlases, and the run-time atlas system can be swapped for a different batch API implementation that uses the preprocessed atlases. This is all optional; there's nothing stopping anyone from just shipping the run-time atlas system too.
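The recording side can be as simple as appending one line per atlas the run-time system builds. This sketch assumes a made-up text format (`label: id id id`) and hypothetical `AtlasLog`/`ParseAtlasLog` names; labels are assumed not to contain ':'. The parser stands in for the first half of the offline preprocessing step, which would then hand each group of ids to an atlas packer.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical recorder: one line per atlas built, tagged with a label for
// the part of the game (level, menu, ...) active at the time.
struct AtlasLog {
    std::ostringstream out;

    void RecordAtlas(const std::string& label, const std::vector<uint64_t>& texture_ids) {
        assert(!texture_ids.empty());
        out << label << ':';
        for (uint64_t id : texture_ids) out << ' ' << id;
        out << '\n';
    }

    std::string Text() const { return out.str(); }
};

// Hypothetical offline side: parse the log back into groups of texture ids
// that should be baked into the same on-disk atlas.
std::vector<std::vector<uint64_t>> ParseAtlasLog(const std::string& text)
{
    std::vector<std::vector<uint64_t>> groups;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        size_t colon = line.find(':');
        if (colon == std::string::npos) continue; // skip malformed lines
        std::istringstream ids(line.substr(colon + 1));
        std::vector<uint64_t> group;
        uint64_t id;
        while (ids >> id) group.push_back(id);
        if (!group.empty()) groups.push_back(group);
    }
    return groups;
}
```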
Conclusion:
- Run-time atlases can make the asset pipeline for hotswapping or adding new images to the game trivial. All the atlas complexity is hidden behind a run-time API instead of living on disk.
- Atlases are built at run-time based on usage metrics, so textures that share an atlas are ones actually drawn at around the same time as their neighbors.
- Additional bookkeeping and complexity are required at run-time. Though they can be hidden behind a good API, there are still performance costs.
- For large numbers of textures some kind of file abstraction would become necessary to avoid the large run-time cost of opening many individual files.
- Pixels are queried more often than with traditional preprocessed assets. Since pixels are needed whenever A) a texture is placed in the lonely_buffer, or B) a texture is placed into an atlas, there is a significant RAM or disk I/O hit involved. Either image pixels must be kept in RAM in case they are needed, or they must be fetched off disk as needed.
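One way to soften that RAM-versus-disk trade-off is to bound the RAM side: keep decoded pixels in an LRU cache with a byte budget and only go back to disk on a miss. This is a swapped-in technique, not part of the design above; `PixelCache` and its loader callback are hypothetical, and the loader stands in for whatever decodes an image (for example through a virtual file system).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical LRU cache of decoded pixels with a byte budget.
class PixelCache {
public:
    using Pixels = std::vector<uint8_t>;
    using Loader = std::function<Pixels(uint64_t id)>; // called only on a miss

    PixelCache(size_t budget_bytes, Loader loader)
        : budget_(budget_bytes), loader_(std::move(loader)) {
        assert(budget_bytes > 0);
    }

    // Returns the pixels for id, loading them on a miss and then evicting
    // least recently used images until the cache fits the budget again.
    // The reference stays valid only until a later Get evicts this entry.
    const Pixels& Get(uint64_t id) {
        auto it = index_.find(id);
        if (it != index_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second); // mark most recently used
            return it->second->second;
        }
        lru_.emplace_front(id, loader_(id));
        index_[id] = lru_.begin();
        used_ += lru_.front().second.size();
        // Never evict the entry just inserted, even if it alone exceeds the budget.
        while (used_ > budget_ && lru_.size() > 1) {
            Entry& victim = lru_.back();
            used_ -= victim.second.size();
            index_.erase(victim.first);
            lru_.pop_back();
        }
        return lru_.front().second;
    }

    size_t BytesUsed() const { return used_; }
    bool Contains(uint64_t id) const { return index_.count(id) != 0; }

private:
    using Entry = std::pair<uint64_t, Pixels>;
    size_t budget_;
    Loader loader_;
    size_t used_ = 0;
    std::list<Entry> lru_; // front = most recently used
    std::unordered_map<uint64_t, std::list<Entry>::iterator> index_;
};
```

Images that keep getting packed into atlases stay hot in the cache, while rarely used ones fall back to a disk read, so the budget sets where on the RAM-versus-I/O spectrum the game sits.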
Thoughts? Comments? Concerns?