TIGSource Forums

Developer => Technical => Topic started by: giantrobotbee on November 07, 2013, 06:40:15 AM



Title: Validate this Architectural Idea
Post by: giantrobotbee on November 07, 2013, 06:40:15 AM
I hate working in a bubble, especially when I'm trying to learn something new. I'm rolling an engine from scratch for a 2D game I'm working on. I know that I could use one of the many tools, libraries, or engines out there to do this more easily, but I like the holistic approach.

Anyways, here's where I'm at:

I'm using the OpenGL 3.x programmable pipeline with C++ and a small library that closely resembles the Artemis framework for Entity/Component deliciousness. That aside, what I want to do is manage a few texture atlases, perhaps one for all dynamic (animated) sprites and another for static backgrounds and other things that don't animate.

My understanding is that, because each texture's actual texture data must be uploaded as a sampler2D uniform, and I can't change uniforms mid draw-call, I'd need to rebind the texture uniforms for my current texture and then call draw on that batch.

So my idea was to have a "MaterialManager" which generates Material objects that encompass a Shader and a Texture. When I draw, I want to take all active meshes that share a material, batch them into a single VBO, bind the texture, and draw. Then repeat for the next active Material. The idea here is that I'd only be making two MAYBE three draw calls per frame. My assumption is that the vertex layout is identical for each batch (since they're all just 2D quads with XYZ, RGB, and UV coords per vertex), so I can reuse the VAO and actually the VBO too, since I'd just be replacing the data with glBufferData() for each batch.
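A minimal sketch of the grouping step described above, assuming hypothetical `Sprite` and `Vertex` types (not taken from Artemis or any other library) and an integer material id per sprite. Each sprite becomes two triangles (6 vertices of XYZ, RGB, UV), and each entry in the resulting map would correspond to one buffer upload plus one draw call; the OpenGL calls themselves are left out.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical vertex layout matching the post: XYZ, RGB, UV per vertex.
struct Vertex {
    float x, y, z;
    float r, g, b;
    float u, v;
};

// Hypothetical sprite record: a material id plus placement and atlas UVs.
struct Sprite {
    int materialId;        // Material = shader + atlas texture
    float x, y, w, h;      // quad position and size
    float u0, v0, u1, v1;  // sub-rectangle of the atlas
};

// Groups sprites by material; each sprite becomes two triangles (6 vertices).
// Each map entry would be one glBufferData upload followed by one draw call.
std::map<int, std::vector<Vertex>> buildBatches(const std::vector<Sprite>& sprites) {
    std::map<int, std::vector<Vertex>> batches;
    for (const Sprite& s : sprites) {
        std::vector<Vertex>& out = batches[s.materialId];
        // White vertex color; corners are bottom-left, bottom-right, top-right, top-left.
        const Vertex bl{s.x,       s.y,       0.0f, 1.0f, 1.0f, 1.0f, s.u0, s.v0};
        const Vertex br{s.x + s.w, s.y,       0.0f, 1.0f, 1.0f, 1.0f, s.u1, s.v0};
        const Vertex tr{s.x + s.w, s.y + s.h, 0.0f, 1.0f, 1.0f, 1.0f, s.u1, s.v1};
        const Vertex tl{s.x,       s.y + s.h, 0.0f, 1.0f, 1.0f, 1.0f, s.u0, s.v1};
        // Two counter-clockwise triangles when y points up.
        out.insert(out.end(), {bl, br, tr, bl, tr, tl});
    }
    return batches;
}
```

Five sprites spread over two materials produce two batches (18 and 12 vertices), i.e. two draw calls instead of five. `std::map` keeps the batch order deterministic; an `unordered_map` or a sort by material id would work as well.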

My other assumption here is that I would transform all vertices on the CPU. Each Mesh would own its model matrix, the Camera object owns the View matrix, and then right before I upload a batch, I multiply the entire batch by the Projection matrix. My understanding is that this is acceptable since the transforms I've seen look like: finalVert = (Proj * (View * (Model * vert)))
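Matrix multiplication is associative, so `Proj * (View * (Model * vert))` gives the same result as `(Proj * View * Model) * vert`. On the CPU that means you can compute `Proj * View` once per batch and one combined matrix per sprite, rather than doing three matrix-vector products per vertex. The sketch below assumes a hypothetical column-major `Mat4` (OpenGL's default storage order) and an orthographic projection built with the same formula as `glOrtho`.

```cpp
#include <array>
#include <cassert>

// Hypothetical 4x4 matrix, column-major like OpenGL: element (row, col) is m[col * 4 + row].
struct Mat4 {
    std::array<float, 16> m{};

    static Mat4 identity() {
        Mat4 out;
        for (int i = 0; i < 4; ++i) out.m[i * 4 + i] = 1.0f;
        return out;
    }

    static Mat4 translate(float tx, float ty) {
        Mat4 out = identity();
        out.m[12] = tx;
        out.m[13] = ty;
        return out;
    }

    // Same matrix that glOrtho(left, right, bottom, top, zNear, zFar) produces.
    static Mat4 ortho(float left, float right, float bottom, float top, float zNear, float zFar) {
        Mat4 out;
        out.m[0] = 2.0f / (right - left);
        out.m[5] = 2.0f / (top - bottom);
        out.m[10] = -2.0f / (zFar - zNear);
        out.m[12] = -(right + left) / (right - left);
        out.m[13] = -(top + bottom) / (top - bottom);
        out.m[14] = -(zFar + zNear) / (zFar - zNear);
        out.m[15] = 1.0f;
        return out;
    }
};

struct Vec4 { float x, y, z, w; };

// Standard matrix product: (a * b)(row, col) = sum over k of a(row, k) * b(k, col).
Mat4 operator*(const Mat4& a, const Mat4& b) {
    Mat4 out;
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) sum += a.m[k * 4 + row] * b.m[col * 4 + k];
            out.m[col * 4 + row] = sum;
        }
    return out;
}

// Matrix times column vector.
Vec4 operator*(const Mat4& a, const Vec4& v) {
    const float in[4] = {v.x, v.y, v.z, v.w};
    float res[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int row = 0; row < 4; ++row)
        for (int k = 0; k < 4; ++k) res[row] += a.m[k * 4 + row] * in[k];
    return {res[0], res[1], res[2], res[3]};
}
```

If every vertex arrives already in clip space, the vertex shader only needs to pass the position through (for example `gl_Position = vec4(pos, 1.0);`).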

Does this make sense? Have I totally gone off the rails? Have I missed the point? I'm willing to accept this approach is totally bonkers, but I'll never know if I don't ask. Again, I hate working in a bubble.

Any feedback is appreciated. Cheers!


Title: Re: Validate this Architectural Idea
Post by: motorherp on November 07, 2013, 06:53:43 AM
(http://www.aic.cuhk.edu.hk/web8/Hi%20res/car%20and%20wheel.jpg)

Batching objects by material type to save draw calls has been the industry standard way of doing things since the beginning of time, there's really nothing new or out of the ordinary about what you are suggesting.  Even if you want to write your own engine, I'd really recommend reading up on current methods and techniques, you'd save yourself a lot of time.


Title: Re: Validate this Architectural Idea
Post by: giantrobotbee on November 07, 2013, 07:04:16 AM


Save time? What do I look like, a developer? Oh, wait... :facepalm:

No, but seriously, thanks for the feedback. I know I should go read more, but I worry that time spent reading is time spent not coding. Though there's a balance in there somewhere. TO THE LIBRARY!


Title: Re: Validate this Architectural Idea
Post by: soryy708 on November 07, 2013, 07:09:03 AM
Quote
has been the industry standard way of doing things
Well, he works in a bubble...


Title: Re: Validate this Architectural Idea
Post by: wbahnassi on November 07, 2013, 10:13:00 AM
Unless you plan on drawing over 500 different objects in a frame, there is little point in batching at all (assuming PC here; mobiles are different). If your many objects are of a few different kinds, then the batching system you suggested may be too much of a solution. You can also just batch your sprites by texture/state the way XNA sprites do (keep batching while drawing sprites that use the same texture, then send the batch and open a new one upon receiving a new sprite texture).
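To make the XNA-style approach concrete, here is a minimal sketch of a batcher that keeps appending quads while the texture stays the same and flushes when it changes or when the frame ends. `SpriteBatcher` and its flush callback are hypothetical; the callback stands in for the bind/upload/draw calls a real implementation would make.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical batcher in the spirit of XNA's SpriteBatch: keep collecting
// quads while the texture stays the same, flush when it changes.
class SpriteBatcher {
public:
    // Stands in for "bind texture, upload the batch's vertices, issue one draw call".
    using FlushFn = std::function<void(unsigned texture, std::size_t quadCount)>;

    explicit SpriteBatcher(FlushFn flush) : flush_(std::move(flush)) {
        assert(flush_ && "a flush callback is required");
    }

    void draw(unsigned texture) {
        if (quads_ > 0 && texture != current_) flush();
        current_ = texture;
        ++quads_;  // a real batcher would append this quad's vertices here
    }

    // Submits whatever is still pending; call once at the end of the frame.
    void end() {
        if (quads_ > 0) flush();
    }

private:
    void flush() {
        flush_(current_, quads_);
        quads_ = 0;
    }

    FlushFn flush_;
    unsigned current_ = 0;
    std::size_t quads_ = 0;
};
```

Drawing sprites whose textures arrive in the order 1, 1, 1, 2, 2, 1 yields three flushes (three quads, two quads, one quad). The number of draw calls depends on submission order rather than on how many distinct textures exist, which is why putting everything in one atlas collapses it all into a single flush.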

Now the true downside of your approach is that you cannot scale or rotate your sprites without adding padding borders around their art in the atlas (or else you bleed from adjacent sprite art in the atlas). I hate even going down that road, so take this into consideration... or wait until HW vendors allow texture addressing modes to work on sub-rectangles :noir:


Title: Re: Validate this Architectural Idea
Post by: Gregg Williams on November 07, 2013, 10:25:07 AM
Yes, that's pretty normal. For rotation/scaling/linear filtering bleed, you'll probably want to use a texture packer that can add extra border pixels or transparency around each sprite automatically as part of the atlas packing process.
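A rough sketch of the "extra border pixels" option, assuming a hypothetical single-channel `Image` type (a real packer would work on RGBA and also handle placement). Each border pixel repeats the nearest edge pixel of the sprite, so bilinear samples that spill past the sprite's edge read copies of its own pixels rather than a neighbour's. The transparency option would instead leave the border at zero alpha.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-channel image; a real packer would use RGBA.
struct Image {
    int w = 0, h = 0;
    std::vector<std::uint8_t> px;  // row-major, w * h entries

    std::uint8_t at(int x, int y) const {
        return px[static_cast<std::size_t>(y) * w + x];
    }
};

// Surrounds the sprite with `border` pixels on every side, each repeating the
// nearest edge pixel, so filtering at the sprite's edges in the atlas reads
// copies of its own pixels instead of a neighbouring sprite's.
Image extrude(const Image& src, int border) {
    assert(src.w > 0 && src.h > 0 && border >= 0);
    assert(src.px.size() == static_cast<std::size_t>(src.w) * src.h);
    Image out;
    out.w = src.w + 2 * border;
    out.h = src.h + 2 * border;
    out.px.resize(static_cast<std::size_t>(out.w) * out.h);
    for (int y = 0; y < out.h; ++y) {
        for (int x = 0; x < out.w; ++x) {
            // Map back into the source rectangle and clamp to its nearest edge.
            const int sx = std::min(std::max(x - border, 0), src.w - 1);
            const int sy = std::min(std::max(y - border, 0), src.h - 1);
            out.px[static_cast<std::size_t>(y) * out.w + x] = src.at(sx, sy);
        }
    }
    return out;
}
```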


Title: Re: Validate this Architectural Idea
Post by: Fallsburg on November 07, 2013, 10:51:48 AM
Maybe I'm not understanding, but why are scaling/rotation an issue with bleeding?  He wouldn't need to scale/rotate the UVs, only the end sprite, which should be unaffected by such rotation.  Now, there can be texture filtering issues, so yeah, you would either need to pad or pack your sprites properly to handle that, but other than that there shouldn't be any issues.
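The point about UVs can be shown directly: the UV rectangle depends only on where the sprite sits in the atlas, while position, rotation and scale are applied to the quad's vertices. The sketch below (hypothetical `atlasUv` helper) also takes an optional inset; half a texel is one common way to keep bilinear filtering inside the sprite's own pixels when there is no padding. It assumes no mipmapping; with mipmaps, lower levels can still mix neighbouring sprites unless they are padded.

```cpp
#include <cassert>

// Normalized texture coordinates of one sprite's sub-rectangle in an atlas.
struct UvRect { float u0, v0, u1, v1; };

// Hypothetical helper: converts a sprite's pixel rectangle (x, y, w, h) inside
// an atlasW x atlasH texture into UVs. Rotation and scaling are applied to the
// quad's vertices, not here, so these UVs never change as the sprite turns.
// `inset` pulls each edge in by that many texels; 0.5 keeps bilinear samples
// inside the sprite (without mipmaps) at the cost of half a texel per edge.
// Assumes v increases in the same direction as the atlas pixel rows.
UvRect atlasUv(int x, int y, int w, int h, int atlasW, int atlasH, float inset) {
    assert(w > 0 && h > 0 && atlasW > 0 && atlasH > 0);
    assert(inset >= 0.0f && 2.0f * inset < w && 2.0f * inset < h);
    const float aw = static_cast<float>(atlasW);
    const float ah = static_cast<float>(atlasH);
    return {(x + inset) / aw, (y + inset) / ah,
            (x + w - inset) / aw, (y + h - inset) / ah};
}
```

For example, a 32*32 sprite at (32, 0) in a 256*256 atlas with a 0.5 inset maps to u in [32.5/256, 63.5/256] and v in [0.5/256, 31.5/256], however the quad is rotated.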


Title: Re: Validate this Architectural Idea
Post by: giantrobotbee on November 07, 2013, 03:15:03 PM

Forgive me if I'm overblowing the issue, but my rationale for batching, at the core, is to avoid running out of texture memory on the GPU and also to keep the overall size of the game down. Not so much because any target platform demands it (it is PC-only at the moment), but because I like keeping things small whenever possible. It's the web developer in me. Even if I can only cut it down to three or four batches per frame to draw <500 objects, that's still far fewer than 500 textures and 500 draw calls.


Title: Re: Validate this Architectural Idea
Post by: wbahnassi on November 10, 2013, 02:02:20 PM
I'm not sure how batching can save on texture memory. If anything, it actually wastes memory, because every sprite now has to grow by at least 8 pixels in each dimension.


Title: Re: Validate this Architectural Idea
Post by: giantrobotbee on November 10, 2013, 02:20:25 PM

I don't understand. If I have two big texture atlases, isn't that less than 500 separate images that I load for every mesh on the screen?


Title: Re: Validate this Architectural Idea
Post by: wbahnassi on November 10, 2013, 02:37:35 PM
I don't follow. What do you mean by 500 separate images for every mesh? ???

Grouping sprites in an atlas simply means collecting the sprite images into one big image that has them laid out next to each other. Let's say our game is made of 64 sprites, 25*25 pixels each. You may lay them out in one big atlas that is 200*200, or keep them separate as 64 images and load them as such in the game. What I was saying is that an atlas will require adding borders around your sprites, so it won't be 200*200 anymore, but something around 232*232, for example (a 2-pixel border on each side of every sprite). Now you are wasting 232*232 - 200*200 = 13,824 pixels of memory... not that it's a big number at all, but it is definitely not "memory savings" :giggle:
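The arithmetic in this example can be checked with a small helper (hypothetical `atlasCost`, assuming square sprites packed on a square grid). Without borders the atlas holds exactly as many pixels as the separate images; 232*232 corresponds to a 2-pixel border on every side of each 25*25 sprite (8 * 29 = 232).

```cpp
#include <cassert>
#include <cstdint>

struct AtlasCost {
    std::int64_t separatePixels;  // every sprite stored as its own image
    std::int64_t atlasPixels;     // all sprites in one padded atlas
};

// Hypothetical helper: `count` square sprites of `side` pixels laid out on a
// square grid, each with `border` padding pixels added on every side.
AtlasCost atlasCost(int count, int side, int border) {
    int perRow = 0;
    while ((perRow + 1) * (perRow + 1) <= count) ++perRow;
    assert(perRow * perRow == count && "this sketch only handles square grids");
    const std::int64_t atlasSide = static_cast<std::int64_t>(perRow) * (side + 2 * border);
    return {static_cast<std::int64_t>(count) * side * side, atlasSide * atlasSide};
}
```

With those numbers the padded atlas is 53,824 pixels versus 40,000 for the separate images: 13,824 extra pixels, roughly 35% more, or about 54 KiB at 4 bytes per pixel.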


Title: Re: Validate this Architectural Idea
Post by: giantrobotbee on November 10, 2013, 02:57:13 PM

Right, ok I understand that part. But the alternative to a texture atlas is having separate images for each sprite (2d Quad Mesh, in my terminology here), right? So if I have 100 things on the screen, then my options are 100 different images or one texture atlas. What I'm trying to do is avoid the 100 different images. My use of the word "texture" is the one big atlas (compound image, as you've described) as opposed to unique images per sprite. Surely I'm getting savings in that respect, no?


Title: Re: Validate this Architectural Idea
Post by: Christian Knudsen on November 10, 2013, 03:58:29 PM
I believe uploading one big texture atlas to the GPU is considerably faster than having to upload each individual texture when needed for the relevant mesh.


Title: Re: Validate this Architectural Idea
Post by: wbahnassi on November 10, 2013, 06:42:56 PM
Quote
Right, ok I understand that part. But the alternative to a texture atlas is having separate images for each sprite (2d Quad Mesh, in my terminology here), right?

Right.

Quote
So if I have 100 things on the screen, then my options are 100 different images or one texture atlas. What I'm trying to do is avoid the 100 different images. My use of the word "texture" is the one big atlas (compound image, as you've described) as opposed to unique images per sprite. Surely I'm getting savings in that respect, no?

Nope. No savings there at all unfortunately :shrug2:  To make this concrete, you can try this small experiment: save a few sprites in .raw format and note down the sum of their file sizes, then group the sprites together in one atlas, save that as one .raw file, and note its file size. The result? Both will be equal... This is very close to what you will end up with at run-time. No memory savings... just a different way of storing things :)

Quote
I believe uploading one big texture atlas to the GPU is considerably faster than having to upload each individual texture when needed for the relevant mesh.

Uploading (as in moving the data to VRAM) is not going to differ much between 1 large atlas and many images of equal total size, as the speed of this operation depends on the bus speed and the way you upload the data (best in serial contiguous blocks). Regardless, this operation usually only happens once at start-of-day (or so it should, I hope :) ), which means optimizing for it is probably a waste of dev time, as no one will complain that the game takes an additional 2 seconds to launch.

On the other hand, binding the texture to a shader for drawing is almost a no-op compared to uploading texture data to VRAM. The binding operation won't start to show in your profiler until you repeat it hundreds of times in a single frame, which brings me back to the point of my original note:
If you are drawing up to about 500 different sprites on screen, then I wouldn't bother doing such optimization. And even if I'm exceeding 500 different sprites, I'd first consider the XNA approach (batch while possible) as it has no disadvantages compared to the atlas approach.


Title: Re: Validate this Architectural Idea
Post by: Christian Knudsen on November 11, 2013, 05:29:41 AM
I shouldn't have said uploading. I'm only really familiar with OpenGL's immediate mode, and there binding a texture is expensive, so you want to make texture atlases to lower the amount of texture bindings.


Title: Re: Validate this Architectural Idea
Post by: giantrobotbee on November 11, 2013, 06:21:03 AM

I see. I think I understand what you're saying now. I definitely need to do more research anyways, but I appreciate your clarification. One thing I still see as a problem though is that if I need to actually draw a texture on a mesh by uploading the texture to the shader as a uniform, then for each new texture, won't I need to make another call to glDraw*? I've been led to understand that I'd want to make as few draw calls per frame as possible. If I'm dealing with even 100 unique textures, wouldn't that amount to 100 separate draw calls? Or am I totally misunderstanding how this works?

Also, I am talking about modern programmable pipeline OpenGL.


Title: Re: Validate this Architectural Idea
Post by: wbahnassi on November 11, 2013, 07:59:54 AM
Quote
One thing I still see as a problem though is that if I need to actually draw a texture on a mesh by uploading the texture to the shader as a uniform, then for each new texture, won't I need to make another call to glDraw*? I've been led to understand that I'd want to make as few draw calls per frame as possible. If I'm dealing with even 100 unique textures, wouldn't that amount to 100 separate draw calls? Or am I totally misunderstanding how this works?

Also, I am talking about modern programmable pipeline OpenGL.

First, let's use the proper terminology. When you want to use a texture for drawing, you bind it not upload it. Not that I'm pedantic to these kinds of things, but the latter makes it sound too expensive :)

You are right though. 100 completely different textures will require 100 separate draw calls. But 100 draw calls is not a large number at all. You should easily be able to do 500 draw calls and only lose 1-2 milliseconds on the CPU for it (if you lose more than that then you must be doing something terribly wrong with the API).

On the GPU side, the texture atlas approach is definitely faster. But again, it should not save any more than 0.5ms. Is half a millisecond of performance really worth the effort of having to pack sprites in an atlas, add borders to them, write a table of their offsets/dimensions in the atlas, and lose the ability to use cool addressing modes such as wrap and mirror?

Quote
I shouldn't have said uploading. I'm only really familiar with OpenGL's immediate mode, and there binding a texture is expensive, so you want to make texture atlases to lower the amount of texture bindings.

True, binding one texture is definitely faster than doing the thing 500 different times. But keep the scale in mind. It is "expensive" in relation to other API operations such as submitting draw calls and switching states. All these operations basically take nothing on the CPU (they just push a command on a buffer). On the GPU, the "expensive" part comes from the fact that the GPU cannot process further draw commands if a texture switch is involved. So it does something like:

 --> Draw with texture 1
 --> Draw with texture 1 (executes in parallel with the previous)
 --> Draw with texture 1 (executes in parallel with the previous)
 --> Draw with texture 1 (executes in parallel with the previous)
 --> Draw with texture 2 (waits until all previous are done)
 --> Draw with texture 2 (executes in parallel with the previous)
 --> Draw with texture 1 (waits until all previous are done)
 --> Draw with texture 1 (executes in parallel with the previous)
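In the model this diagram describes, each texture switch is a point where later draws wait for earlier ones, so the cost scales with the number of switches rather than the number of textures. A minimal sketch (hypothetical `Draw` record) that counts switches and shows the usual mitigation when draw order does not matter: group draws by texture. The diagram's sequence has two switches; sorted, it has one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical draw record; only the texture matters here.
struct Draw { unsigned texture; int id; };

// Counts how many times consecutive draws change texture.
std::size_t textureSwitches(const std::vector<Draw>& draws) {
    std::size_t switches = 0;
    for (std::size_t i = 1; i < draws.size(); ++i)
        if (draws[i].texture != draws[i - 1].texture) ++switches;
    return switches;
}

// Groups draws by texture. stable_sort keeps the original relative order
// within each texture, but the order between textures changes, so this is
// only safe for sprites that do not overlap or whose blending is order-independent.
void sortByTexture(std::vector<Draw>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const Draw& a, const Draw& b) { return a.texture < b.texture; });
}
```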

A sprite-based game should never have trouble drawing 100 different textures on the screen at 60 FPS or even 120 FPS. Point is: think practical :coffee:


Title: Re: Validate this Architectural Idea
Post by: Gregg Williams on November 11, 2013, 08:02:30 AM
You're not misunderstanding how it works, other than the idea that batching saves memory (although it can, slightly). For the most part, batching is just there so you don't have to do a ton of draw calls and texture binds. With the 100-individual-textures approach, you're also doing 100 draw calls where you send the GPU a single quad (2 triangles) at a time, which is hardly efficient. Although, as has been pointed out, if you're on a desktop with a decent GPU, this probably isn't a bottleneck, at least until you add your first particle system or similar.



Title: Re: Validate this Architectural Idea
Post by: giantrobotbee on November 11, 2013, 11:26:45 AM
Awesome. I really appreciate all the feedback. Sorry about the real beginner level stuff here. I realize I'm probably over-architecting a solution here, but now I have some new insight to consider. Thanks again!