TIGSource Forums > Developer > Technical (Moderator: ThemsAllTook) > Run-time Texture Atlases - Design Thread (2D games)
Topic: Run-time Texture Atlases - Design Thread (2D games)
qMopey
« on: November 15, 2017, 08:03:18 AM »

Edit: OK, got out an experimental release. See post #10 below for release info!

Finally came up with a design for batching sprites with texture atlases in a way I actually like. Basic idea is to do runtime atlas creation via Least Recently Used (LRU) caches. If anyone has a moment, please look over the design -- maybe you'll spot something stupid before I go and implement it.

Typical atlas problems:
  • edge bleeding
  • building the atlas
  • picking what images go in what atlas
  • supporting texture hotswapping/hotloading
  • making it dead-simple to integrate new images

Some of these harder problems are solved for the user by Unity or other engines. But for those of us doing custom tech, we actually have to worry about the last two bullet points. On the upside, by worrying about these problems ourselves they can be solved in novel ways that grant an advantage over pre-packaged engines.

Note: I don't support hardware UV repeating or mipmaps. This simplification works well for 2D games, and can work in 3D games if strict limitations on texturing are set in place at project inception.

Now for some potential solutions!
  • edge bleeding - Pinch UV coordinates inward slightly to create a numeric buffer zone between different images. I've tested this myself for the last couple years, and have never seen problems. There's no reason to pad atlases with buffer pixels. That's silly.
  • building the atlas - I've made a single-file C header for loading images, and it can construct pretty decent atlases in-memory.
  • picking what images go in what atlas - Typically the images are hand-picked to sit together in atlases to try and get images that draw at the same time in the same atlas. Instead I've opted for a run-time atlas builder; it keeps a rolling set of most recently used textures and builds atlases on the fly (more details later).
  • supporting texture hotswapping/hotloading - Usually super annoying if atlases are constructed as a preprocessing step. One option is simply to *not* place textures into atlases while the game is running, and just load up new or hotswapped textures individually. Alternatively, supporting hotswapping of textures is trivial if atlases are built at run-time.
  • making it dead-simple to integrate new images - Ideally an artist can simply save an image in a directory to grant access to the texture. This problem is very similar to hotswapping. Luckily adding new images is trivial if atlases are hidden away behind a run-time API.
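
As a rough illustration of the UV pinching idea from the list above, here is a minimal C sketch. The names `uv_rect_t` and `pinched_uvs` and the `inset` parameter are assumptions made for this example, not something taken from the header mentioned in the post. A suitable inset (some small fraction of a texel) depends on filtering mode and sprite scaling, so treat the value as a tuning knob.

```c
#include <assert.h>

// Normalized UV rectangle of one image inside an atlas.
typedef struct { float u0, v0, u1, v1; } uv_rect_t;

// Computes UVs for the sub-image at pixel rect (x, y, w, h) inside an atlas
// of atlas_w x atlas_h pixels, pulling every edge inward by `inset` texels.
// `inset` is a hypothetical tuning value, expected to be below half a texel.
uv_rect_t pinched_uvs(int x, int y, int w, int h, int atlas_w, int atlas_h, float inset)
{
    assert(atlas_w > 0 && atlas_h > 0);
    assert(w > 0 && h > 0);
    assert(inset >= 0.0f && inset < 0.5f);

    uv_rect_t r;
    r.u0 = ((float)x + inset) / (float)atlas_w;
    r.v0 = ((float)y + inset) / (float)atlas_h;
    r.u1 = ((float)(x + w) - inset) / (float)atlas_w;
    r.v1 = ((float)(y + h) - inset) / (float)atlas_h;
    return r;
}
```

With an inset of 0 the UVs land exactly on the image's pixel borders; any positive inset keeps bilinear sampling a little further away from the neighboring image.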

Run-time atlases. For 2D sprite based games I believe building atlases at run-time is a superior solution to preprocessing atlases. Ideally an API can be created for a sprite batching service. Love2D has a fairly good sprite batch API. Sprites are pushed into a buffer from game code, and that's it. The batcher can then be flushed to the screen. Here's an example:

Code:
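struct Texture
{
    uint64_t id; // unique id for the source image
    int gl_id;   // gl_texture handle (an atlas, or a standalone texture)
    int w, h;    // size in pixels
    v2 u, v;     // UVs locating the image inside its gl_texture
};

#define SPRITE_MAX_FRAMES 32

struct Sprite
{
    int depth = 0;                   // sort key for draw order
    v2 scale = v2(1, 1);
    transform tx = make_transform();

    // animation state
    int paused = 0;
    float seconds = 0;
    int frame_current = 0;
    int frame_count = 0;
    Texture frames[SPRITE_MAX_FRAMES];
};

void PushSprite(void* ctx, Sprite* s); // buffer a sprite for the next Flush
void Flush(void* ctx);                 // build batches and draw them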

Internally the sprite batch API (PushSprite) can be implemented in many ways. My design buffers all the sprites pushed. When the batch is flushed all the fun happens. Flush looks like this:

Code:
// preprocess
for each sprite
    lookup what atlas it belongs to and assign the gl_texture
    if it belongs to no atlas, it goes to the lonely_buffer
    update timestamp of texture

// render
sort sprites by gl_texture
sort sprites by depth
make a batch per gl_texture
draw each batch

// postprocess
for each atlas
    check for texture decay metric (greater than <N> textures are "old enough")
    if hit metric, flush atlas's textures to lonely buffer and remove atlas

sort gl_textures in lonely buffer by timestamp
for each texture in lonely_buffer
    if old enough, remove from buffer
    *note* this can be trivially implemented due to sort order

if greater than <N> items in lonely buffer
    construct atlas of most recently used textures

There is a distinction between a texture and a gl_texture. A texture is the struct shown above in the sprite batch example. A gl_texture is a handle returned from glGenTextures. An atlas is one gl_texture. A texture in the lonely buffer belongs to no atlas yet, and is its own standalone gl_texture.
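
The "sort by timestamp, then drop what is old enough" step of the postprocess can be sketched in C roughly like this. The struct layout, the function names and the `max_age` threshold are all assumptions for illustration; the post does not specify them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// One entry of the lonely buffer (hypothetical layout).
typedef struct
{
    uint64_t image_id; // user-facing texture id
    unsigned gl_id;    // standalone gl_texture handle
    int timestamp;     // last tick this texture was drawn
} lonely_entry_t;

// qsort comparator: most recently used entries first.
static int newest_first(const void* a, const void* b)
{
    const lonely_entry_t* x = (const lonely_entry_t*)a;
    const lonely_entry_t* y = (const lonely_entry_t*)b;
    return (x->timestamp < y->timestamp) - (x->timestamp > y->timestamp);
}

// Sorts the buffer and drops entries older than max_age ticks.
// Returns the new entry count. Thanks to the sort order all expired
// entries sit at the tail, so culling is just shrinking the count.
int lonely_buffer_cull(lonely_entry_t* entries, int count, int now, int max_age)
{
    assert(count >= 0 && max_age >= 0);
    qsort(entries, (size_t)count, sizeof(*entries), newest_first);
    while (count > 0 && now - entries[count - 1].timestamp > max_age)
    {
        // A real implementation would release entries[count - 1].gl_id here.
        --count;
    }
    return count;
}
```

After the cull, the front of the array holds the most recently used textures, which are the candidates for the next atlas.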

The overall summary is: textures are first put into the lonely_buffer. Each texture is also a gl_texture, which implies a separate draw call. The lonely_buffer is sorted once it contains enough entries to make an atlas (on the previous Flush from the last render tick). The most recently used entries are prioritized to construct an atlas. To draw, all sprite instances are sorted first by gl_texture, then depth (and/or other material/shader parameters), then batches are constructed and issued as draw calls. All sprites drawn have their associated texture timestamps updated. After rendering, each atlas is scanned to see if it contains a lot of old and unused textures. If so, all atlas textures that were recently used are flushed to the lonely_buffer. Then the lonely_buffer is sorted and all entries that are too old are culled. If the lonely_buffer is large enough, the most recent timestamped entries are removed to construct a new atlas.

The entire run-time atlas system is completely hidden. It proactively groups textures used at roughly the same time into atlases and adapts as time goes on. The number of draw calls issued can always be calculated. Assuming no differences in depth/material stuff, there is one draw call per atlas and one per unique gl_texture in the lonely_buffer.
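
To make the draw call count concrete, here is a minimal C sketch of the render-stage sort: sprites are ordered by gl_texture first and depth second, and every run of sprites sharing a gl_texture becomes one batch. The type and function names are made up for this example. Note that with this texture-major ordering, depth only orders sprites within a batch; a game that needs strict depth ordering across textures would have to sort depth-major and accept more batches.

```c
#include <assert.h>
#include <stdlib.h>

// A sprite after the preprocess step (hypothetical layout).
typedef struct
{
    unsigned gl_id; // atlas or standalone gl_texture this sprite resolved to
    int depth;
} sprite_instance_t;

// qsort comparator: by gl_texture first, then by depth within a gl_texture.
static int by_texture_then_depth(const void* a, const void* b)
{
    const sprite_instance_t* x = (const sprite_instance_t*)a;
    const sprite_instance_t* y = (const sprite_instance_t*)b;
    if (x->gl_id != y->gl_id) return x->gl_id < y->gl_id ? -1 : 1;
    return (x->depth > y->depth) - (x->depth < y->depth);
}

// Sorts sprites in place and returns the number of batches (draw calls):
// one per run of consecutive sprites that share a gl_texture.
int sort_and_count_batches(sprite_instance_t* sprites, int count)
{
    assert(count >= 0);
    qsort(sprites, (size_t)count, sizeof(*sprites), by_texture_then_depth);
    int batches = 0;
    for (int i = 0; i < count; ++i)
    {
        if (i == 0 || sprites[i].gl_id != sprites[i - 1].gl_id) ++batches;
    }
    return batches;
}
```

Five sprites spread over three gl_textures come out as three batches, regardless of the order they were pushed in.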

The cons of this runtime system:
  • Performance hits upon pre and post process stages
  • System is fairly complex and nontrivial to implement
  • If each texture comes from a separate image on-disk, opening individual files on disk can be very slow. For example on Windows just performing permissions and related work to open a file is time-consuming.

Luckily the first two points can be largely mitigated by good old-fashioned engineering chops. This kind of system will be performance dominated by cache coherency. Some clever planning can ensure the performance hits are minimal and negligible compared to the time saved by dramatically lowering the draw call count.

The third problem is a big one. This can be solved by some kind of file abstraction system, and probably shouldn't be solved by a sprite batching system. For my own game I'm using a virtual path system that can mount directories or .zip files without requiring any code differences between the two. If images are zipped up and mounted, then file io permissions only happen once on the zip file, effectively avoiding the entire problem.

Finally, another system can watch the batcher and record atlases at different parts of the game with a log. When ready to be released and shipped, the logs can be used to do a single preprocessing step and construct on-disk atlases. The run-time atlas system can be swapped by a different batch API implementation that uses the preprocessed atlases. This is all optional, and there's nothing stopping anyone from just shipping the run-time atlas system too :)

Conclusion:
- Run-time atlas can make the asset-pipeline for hotswapping or adding new images to the game trivial. All the atlas complexity is hidden behind a run-time API, instead of living on-disk.
- Atlases are built at run-time based on some metrics to guarantee textures are actually drawn at the same time as their neighboring textures.
- Additional bookkeeping and complexity is required at run-time. Though it can be hidden behind a good API, there are still performance costs.
- For large numbers of textures some kind of file abstraction would become necessary to avoid the large run-time cost of opening many individual files.
- Querying for pixels happens more often than with traditional preprocessed assets. Since pixels are needed whenever either A) a texture is placed in the lonely buffer, or B) a texture is placed into an atlas, there is a significant RAM hit or disk i/o hit involved. Either image pixels must be stored in RAM in case they are needed, or pixels must be fetched off disk as needed.

Thoughts? Comments? Concerns?
« Last Edit: November 20, 2017, 10:19:11 AM by qMopey »

Crimsontide
« Reply #1 on: November 15, 2017, 12:40:01 PM »

I'm sort of confused, doesn't bindless texturing make atlases obsolete?

qMopey
« Reply #2 on: November 15, 2017, 12:43:42 PM »

Yep totally does. But this stuff is necessary if you want to run on hardware that doesn't support bindless texturing. And so this kind of complexity is still relevant, and will be for some time :(

Edit: Nice thing about a good sprite pushing API is a bindless texture implementation can be trivially supported without touching user code :D
« Last Edit: November 15, 2017, 12:55:21 PM by qMopey »

Crimsontide
« Reply #3 on: November 15, 2017, 01:35:25 PM »

Ok, sorry, not trying to undermine what you're doing. Was just curious. Another question if I may (playing devil's advocate): what hardware out there doesn't support bindless textures?

qMopey
« Reply #4 on: November 15, 2017, 01:43:23 PM »

No problem! I appreciate the question :)

I was trying to look this up last night actually for OpenGL/ES. From what I read, it seems OpenGL 4.4 and GLES 3.2 are the minimum GL requirements for bindless textures. Anything that doesn't support those versions or newer would need an alternative solution like atlases. For iOS it seems like most popular devices can support GLES 3.0, but 3.2 is needed for EXT_gpu_shader5 and IMG_bindless_texture:
https://en.wikipedia.org/wiki/OpenGL_ES
https://www.khronos.org/registry/OpenGL/extensions/IMG/IMG_bindless_texture.txt
https://www.khronos.org/registry/OpenGL/extensions/EXT/EXT_gpu_shader5.txt

Any older graphics cards for desktops won't have GL 4.4+ or GLES 3.2+ support.

JWki
« Reply #5 on: November 16, 2017, 12:29:04 AM »

I always feel so condescending when people tell me they're trying to support as wide a range of GPUs as they can and thus use like GL 3.0 or whatever and I'm like I won't even touch anything < 4.5 for my own sanity. But then again I mostly do 3D and either exactly know what GPUs my code will have to run on or just assume that it won't be finished too soon anyways so targeting GeForce400+ / Radeon HD7000+ is fine.

qMopey
« Reply #6 on: November 16, 2017, 07:35:58 AM »

Quote from: JWki
I always feel so condescending when people tell me they're trying to support as wide a range of GPUs as they can

You feel like you are condescending to them, or them to you? :P

It is definitely a matter of preference. Personally I just really like older 2D games and grew up playing them. In a way I don’t want to “betray” the older hardware. There’s also the idea that sales could potentially be higher if more hardware is supported. Especially outside the US.

Polly
« Reply #7 on: November 16, 2017, 07:43:01 AM »

Quote from: JWki
I always feel so condescending when people tell me they're trying to support as wide a range of GPUs as they can and thus use like GL 3.0 or whatever and I'm like I won't even touch anything < 4.5 for my own sanity.

Don't worry about it. Different people / projects, different priorities ;)

qMopey
« Reply #8 on: November 16, 2017, 11:39:44 AM »

Experimenting with an implementation. Turns out this problem can make for a pretty decent single-header lib. My WIP currently uses 4 different user-defined callbacks... That's kind of a lot and is pushing on the boundary of acceptable. We'll see if it turns out OK in the end *fingers crossed*.

The exciting part is this cool Swedish guy Mattias Gustavsson has been posting high quality single-file headers on github, one of which was a hashtable implementation in C. It's a clean and cache-efficient implementation. It can also return to the user an array of entries/keys, and they can be looped over like a typical array! Brilliant.

I was talking to Mattias on Twitter, and discovered the table can trivially be sorted. It's an actual sorted array, but also a hash table. Perfect for implementing priority queues or LRU caches. Amazing. His whole hashtable implementation is quite an epiphany!

https://twitter.com/RandyPGaul/status/931243522383945728
https://github.com/mattiasgustavsson/libs/issues/7
https://github.com/mattiasgustavsson/libs

Whenever I finish up this experiment I'll see about posting the whole atlas header-lib here for fun :)

qMopey
« Reply #9 on: November 16, 2017, 05:11:21 PM »

Got an implementation working. Still need to test it out some more, but so far API looks like this:


qMopey
« Reply #10 on: November 20, 2017, 09:51:37 AM »

OK got out an experimental release.

Initialization:

Code:
// setup tinyspritebatch configuration
// this configuration is specialized to test out the demo. don't use these settings
// in your own project. Instead, start with `spritebatch_set_default_config`.

spritebatch_config_t config = get_demo_config();
//spritebatch_set_default_config(&config); // turn default config off to test out demo

// assign the 4 callbacks
config.batch_callback = batch_report;                       // report batches of sprites from `spritebatch_flush`
config.get_pixels_callback = get_pixels;                    // used to retrieve image pixels from `spritebatch_flush` and `spritebatch_defrag`
config.generate_texture_callback = generate_texture_handle; // used to generate a texture handle from `spritebatch_flush` and `spritebatch_defrag`
config.delete_texture_callback = destroy_texture_handle;    // used to destroy a texture handle from `spritebatch_defrag`

// initialize tinyspritebatch
spritebatch_init(&sb, &config);

Pushing sprites from the game to tinyspritebatch:

Code:
#define push_sprite(sp) \
    spritebatch_push(&sb, sp.image_id, images[sp.image_id].w, images[sp.image_id].h, sp.x, sp.y, sp.sx, sp.sy, sp.c, sp.s, (SPRITEBATCH_U64)sp.depth)

void scene0()
{
    sprite_t basu = make_sprite(0, 0, 0, 1.0f, 0, 0);
    sprite_t bat = make_sprite(1, 30, 30, 1.0f, 0, 0);
    sprite_t behemoth = make_sprite(2, 80, 30, 1.0f, 3.14159265f / 4.0f, 0);
    sprite_t crow = make_sprite(3, 70, -50, 1.0f, -3.14159265f / 4.0f, 0);

    push_sprite(basu);
    push_sprite(bat);
    push_sprite(behemoth);
    push_sprite(crow);
}

And requesting batches:

Code:
// Run tinyspritebatch to find sprite batches.
// This is the most basic usage of tinyspritebatch: one defrag, tick and flush per game loop.
// It is also possible to only use defrag once every N frames.
// tick can also be called at different time intervals (for example, once per game update
// but not necessarily once per screen render).
spritebatch_defrag(&sb);
spritebatch_tick(&sb);
spritebatch_flush(&sb);
sprite_verts_count = 0;

If anyone is so inclined, take a peek at the header, and maybe even try out the demo, featuring some of Cave Story's sprites as test images :)


Overall I'm not sure how I feel about needing sporadic access to image pixels... At the very least, images could be stored compressed in RAM and sent to tinyspritebatch after a decompress step. This could be a pretty good compromise to lower RAM usage. Still though, even with the performance hits, for 2D games I am still feeling like this strategy may be superior to preprocessing atlases, at least for development. Personally, getting rid of on-disk atlases is just too nice for trivializing asset hotloading for me to pass up :)

The initial design for tinyspritebatch mostly worked, but I realized after implementing it that I forgot something important. When an atlas is decayed, only the live textures remain, and they are put into a new atlas. Once the decayed textures come back, they are again placed into a new atlas. This leaves two fragmented atlases. Unless the user stops drawing the images in each atlas, they will remain in two separate atlases. The solution was to introduce a new code path for merging two atlases together based on some metric :) In the end it worked out without too much fuss.
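
The post doesn't say which metric drives the merge, so the following C sketch is only one plausible shape for it: merge two atlases when both are sparsely populated and their live textures would fit into a single atlas. The function name, the texture-count capacity model and the `ratio` threshold are all hypothetical.

```c
#include <assert.h>

// Hypothetical merge metric. live_a / live_b are the numbers of live textures
// in each atlas, capacity is how many textures a single atlas can hold, and
// ratio is the fill fraction at or below which an atlas counts as sparse.
// Returns 1 if the two atlases should be merged, 0 otherwise.
int should_merge_atlases(int live_a, int live_b, int capacity, float ratio)
{
    assert(live_a >= 0 && live_b >= 0);
    assert(capacity > 0);
    assert(ratio > 0.0f && ratio <= 1.0f);

    float sparse_limit = ratio * (float)capacity;
    int a_sparse = (float)live_a <= sparse_limit;
    int b_sparse = (float)live_b <= sparse_limit;
    int fits = live_a + live_b <= capacity;
    return a_sparse && b_sparse && fits;
}
```

A check like this would run over pairs of atlases during the defrag step, so two half-empty atlases left behind by a decay collapse back into one draw call.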
« Last Edit: November 20, 2017, 10:13:16 AM by qMopey »

ferreiradaselva
« Reply #11 on: November 20, 2017, 03:52:48 PM »

Just passing by to say this is looking awesome :D

I like that it's "context-based" (a context structure with callbacks).

qMopey
« Reply #12 on: November 24, 2017, 09:59:03 PM »

Hey thanks! I realize this topic is not something people think about very often, and most prefer to just sweep atlases under the rug with a preprocessing step. But these kinds of changes are what can make a custom engine worth the hassle :)

qMopey
« Reply #13 on: April 14, 2018, 06:24:18 PM »

Finished first release!



It seems to be working quite well, better than I had hoped for. This header has turned out to be very valuable, and I'm happy to share it, or at least the overall idea, here on these forums :)

The tiles in the above gif are loaded with another header called tinytiled.h. It loads JSON exported map files from the Tiled map editor.

Source: https://github.com/RandyGaul/tinyheaders/blob/master/tinyspritebatch.h
Demo from above gif via SDL and other similar headers: https://github.com/RandyGaul/tinyheaders/tree/master/examples_tinygl_and_tinytiled_and_tinyspritebatch