Glaiel-Gamer
Guest
« on: March 27, 2009, 08:41:09 PM »
I'm gonna consolidate all my OpenGL questions in here from now on. This one is slightly faster for large numbers of sprites, but slightly slower for small numbers (40 = small, 400 = large):
void Texture::bind(){ glBindTexture(GL_TEXTURE_2D, tex); }
as opposed to this one:
void Texture::bind(){ if(cbound != this){ glBindTexture(GL_TEXTURE_2D, tex); cbound = this; } }
Anyway, as of now I'm not sorting sprites by their currently bound texture. Any suggestions? Should I splice all the textures into one massive texture and not worry about rebinding them? Or will consciously sorting the order in which textures are swapped solve most lag problems? (stress test = rendering 400 animated sprites, then doing a 4-pass post-process shader)
« Last Edit: March 31, 2009, 04:17:37 PM by Glaiel-Gamer »
Glaiel-Gamer
Guest
« Reply #1 on: March 27, 2009, 09:24:32 PM »
keeping them on the same texture actually slows it down for some reason
nihilocrat
« Reply #2 on: March 28, 2009, 10:48:28 AM »
The guy who made the Canvas plugin for OGRE3D uses an "Atlas", where he uses some sort of packing algorithm to store all the sprites needed onto a single texture, and simply slices out the regions he needs for each particular sprite. He also does something to alleviate the need for material-switching, keeping batch count to one, but that's probably a higher-level OGRE-specific thing. I'm also really confused why the one-texture approach isn't faster. I think the best way of optimizing OpenGL is to reduce the communication between CPU and GPU to the minimum, but I'm not a guru or anything. This means vertex buffers, batching, etc.
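A rough sketch of what "some sort of packing algorithm" could look like. This is a plain shelf packer, which is an assumption chosen for brevity and not the algorithm the Canvas plugin uses; `ShelfPacker`, `Rect` and `insert` are hypothetical names.

```cpp
#include <cassert>

// Pixel rectangle inside the atlas texture.
struct Rect { int x, y, w, h; };

// Hypothetical shelf packer: fills the atlas left to right in rows
// ("shelves"). A stand-in for illustration, not the Canvas plugin's packer.
class ShelfPacker {
public:
    ShelfPacker(int atlasW, int atlasH) : w_(atlasW), h_(atlasH) {}

    // Tries to place a w x h sprite. On success writes its rectangle to
    // `out` and returns true; returns false if it does not fit.
    bool insert(int w, int h, Rect& out) {
        assert(w > 0 && h > 0);
        if (w > w_) return false;            // wider than the whole atlas
        if (cursorX_ + w > w_) {             // current shelf is full
            shelfY_ += shelfH_;              // open a new shelf below it
            cursorX_ = 0;
            shelfH_ = 0;
        }
        if (shelfY_ + h > h_) return false;  // no vertical room left
        out = Rect{cursorX_, shelfY_, w, h};
        cursorX_ += w;
        if (h > shelfH_) shelfH_ = h;        // shelf is as tall as its tallest sprite
        return true;
    }

private:
    int w_, h_;
    int cursorX_ = 0, shelfY_ = 0, shelfH_ = 0;
};
```

Inserting sprites sorted by descending height tends to waste less space with this kind of packer. The returned rectangles are what each sprite would later use to slice its region out of the shared texture.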
Glaiel-Gamer
Guest
« Reply #3 on: March 28, 2009, 12:01:18 PM »
Well, as of now I can set up my texture objects to have a "crop" rectangle which maps 0-1 onto that crop area, so it wouldn't be too much of a stretch to have each "texture" reference the same underlying texture, and it would propagate up through the rest of the program.
But it seems useless if it isn't gonna actually speed up much.
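For illustration, the crop-rectangle idea boils down to a linear remap of the sprite's local 0-1 coordinates into the sub-rectangle of the shared texture. A minimal sketch; `CropRect`, `cropFromPixels` and `cropToAtlas` are made-up names, not anything from the poster's code:

```cpp
#include <cassert>

struct UV { float u, v; };

// Crop area expressed in normalized coordinates of the shared texture.
struct CropRect { float u0, v0, u1, v1; };

// Builds a crop rectangle from a pixel region of an atlasW x atlasH texture.
CropRect cropFromPixels(int x, int y, int w, int h, int atlasW, int atlasH) {
    assert(atlasW > 0 && atlasH > 0);
    return CropRect{ float(x) / atlasW,     float(y) / atlasH,
                     float(x + w) / atlasW, float(y + h) / atlasH };
}

// Maps a sprite-local coordinate (0..1 on both axes) into the crop area.
UV cropToAtlas(const CropRect& c, UV local) {
    return UV{ c.u0 + local.u * (c.u1 - c.u0),
               c.v0 + local.v * (c.v1 - c.v0) };
}
```

With bilinear filtering, neighbouring sprites in the shared texture can bleed into each other at the crop edges, so a pixel or so of padding between regions is a common precaution.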
Saint
« Reply #4 on: March 28, 2009, 12:16:37 PM »
keeping them on the same texture actually slows it down for some reason
If you have one very large texture instead of several small ones, you might have to do more fetches, as the memory isn't laid out in a way that's optimized for the texture cache. The card will have to access memory several times and copy parts of the texture to the cache, while it might be able to fit an entire smaller texture in the cache and thus only need to copy once. Accessing memory takes time, so this is likely why you see a slowdown. Sorting the draw calls by texture is a good idea. Trying to keep your state changes to a minimum will also help.
Glaiel-Gamer
Guest
« Reply #5 on: March 28, 2009, 12:59:48 PM »
If I was sorting by texture, I'd have to enable the z-buffer to keep my layering correct. Is using the z-buffer slower or faster than switching texture states more times than necessary?
If I bind an already bound texture, is there overhead in that or should I manually check for that?
Saint
« Reply #6 on: March 28, 2009, 01:14:49 PM »
If I was sorting by texture, I'd have to enable the z-buffer to keep my layering correct. Is using the z-buffer slower or faster than switching texture states more times than necessary?
If I bind an already bound texture, is there overhead in that or should I manually check for that?
Using the Z buffer is likely somewhat slower since that's a per-pixel test, and you also need to push an additional coordinate for each vertex, resulting in lower cache efficiency. It depends on your GPU and drivers though, as most of these things do. Yes, there is an overhead for binding an already bound texture, but at the same time there's an overhead for checking for it, so it depends a lot on how often you expect the test to fail. This is also highly dependent on drivers. I would suggest doing as you have already done and simply testing it with the content you will be using.
Snakey
« Reply #7 on: March 28, 2009, 01:23:50 PM »
It depends on how you are drawing your actual sprites. There are three primary methods of drawing quads onto the screen for example.
1. Sending GL commands
2. Precompiling GL commands into call lists (display lists), then invoking the call lists per frame
3. Using extensions
Method 1 is the slowest but is the most flexible. It's slow because you're sending the commands to the gpu per frame. The gpu has no chance of really caching anything. Adopting an atlas method within this method is pretty easy.
Method 2 is quite a lot faster than method 1 since you're precompiling the commands into a list on the gpu and then just invoking it via the call list call. Adopting an atlas method is a bit harder since you can't modify the call list easily (modifying the call list per frame is sort of pointless).
Method 3 is pretty much as fast as you're going to get since you are then dealing directly with memory i/o, particularly if you're using vertex buffer objects with or without shaders. Vertex buffer objects with vertex shaders can adopt the atlas method really easily, provided the gpu has said extensions.
It is preferable to batch as much as you can rather than doing a lot of pointless checking. For example, if the renderer wanted to render five sprites using texture binds 1, 3, 1, 1, 5, it is much better to batch them like so: 1, 1, 1, 3, 5. Since you're batching, there's no need to check if you've already bound to a number, plus you'll never rebind to the same texture id again. A simplistic checking system won't stop the system from rebinding to the same texture if the binds follow alternating or random patterns. Batching will.
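To make the 1, 3, 1, 1, 5 example concrete, here is a small sketch that counts how many real binds a "skip if already bound" check would issue before and after batching. It uses plain integers instead of GL calls so it runs anywhere; `countBinds` and `batchByTexture` are hypothetical helpers, and texture ids are assumed to be non-negative.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Counts the binds issued when a bind is skipped only if the requested
// texture is the one bound by the previous draw.
int countBinds(const std::vector<int>& textureIds) {
    int binds = 0;
    int bound = -1;                 // sentinel: nothing bound yet
    for (int id : textureIds) {
        assert(id >= 0);            // assumption: ids are non-negative
        if (id != bound) {
            ++binds;
            bound = id;
        }
    }
    return binds;
}

// Groups draws that share a texture by sorting on texture id.
// stable_sort keeps the original order among draws with the same texture.
std::vector<int> batchByTexture(std::vector<int> textureIds) {
    std::stable_sort(textureIds.begin(), textureIds.end());
    return textureIds;
}
```

For the sequence above the unsorted order costs 4 binds even with the check, while the batched order costs 3, one per distinct texture. Sorting changes draw order, so this only applies where the sprites' relative order does not matter or is handled some other way.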
At the end of the day, reducing the number of GL commands is the best optimization. Texture binding isn't the frame killer it used to be, but even then batching + reduction of GL commands is going to be the best thing you can do.
As for the z-buffer, since that's primarily dealt with by the GPU, it's hardware accelerated ... so I doubt it has any performance impact at all.
Glaiel-Gamer
Guest
« Reply #8 on: March 31, 2009, 04:18:05 PM »
Anyone know the math behind GL_LINEAR for magnification? Also some links on how to use glReadPixels properly would be appreciated.
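For reference, GL_LINEAR on a 2D texture is a bilinear blend of the four texels nearest the sample point, with texel centers sitting at half-integer positions. A CPU-side sketch of that math, assuming a single-channel texture and clamp-to-edge wrapping (both assumptions; `Texture1f` and `sampleLinear` are invented names):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Single-channel texture, row-major: texel (x, y) lives at data[y * width + x].
struct Texture1f {
    int width, height;
    std::vector<float> data;

    // Clamp-to-edge lookup (assumed wrap mode; GL_REPEAT would wrap instead).
    float texel(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return data[y * width + x];
    }
};

// Bilinear sample at normalized coordinates (u, v).
float sampleLinear(const Texture1f& t, float u, float v) {
    assert(t.width > 0 && t.height > 0);
    // Shift by half a texel so that texel centers land on integers.
    float x = u * t.width - 0.5f;
    float y = v * t.height - 0.5f;
    float fx0 = std::floor(x), fy0 = std::floor(y);
    int x0 = static_cast<int>(fx0), y0 = static_cast<int>(fy0);
    float a = x - fx0;              // horizontal blend weight, 0..1
    float b = y - fy0;              // vertical blend weight, 0..1
    float top    = (1.0f - a) * t.texel(x0, y0)     + a * t.texel(x0 + 1, y0);
    float bottom = (1.0f - a) * t.texel(x0, y0 + 1) + a * t.texel(x0 + 1, y0 + 1);
    return (1.0f - b) * top + b * bottom;
}
```

Sampling exactly at a texel center returns that texel unchanged, and sampling midway between four texels returns their average.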
Glaiel-Gamer
Guest
« Reply #9 on: March 31, 2009, 06:58:45 PM »
Anyone know the math behind GL_LINEAR for magnification? Also some links on how to use glReadPixels properly would be appreciated.
Solved
Anyone have a good 2-pass bloom shader? The one I have isn't really very good
Glaiel-Gamer
Guest
« Reply #10 on: March 31, 2009, 08:44:31 PM »
Anyone know the math behind GL_LINEAR for magnification? Also some links on how to use glReadPixels properly would be appreciated.
Solved
Anyone have a good 2-pass bloom shader? The one I have isn't really very good
Solved it :D
mcc
« Reply #11 on: March 31, 2009, 09:44:02 PM »
I'm not sure what's happening in that screenshot, but whatever it is it's attractive
havchr
« Reply #12 on: April 01, 2009, 05:23:00 AM »
Here are my performance tips:
- Figure out what is outside of your view (do this step fast) and don't send that as OpenGL commands to render.
- Sort your list of things to render front-to-back for early z-cull. In my engine I have a "layer number", which describes the overall rendering order. This allows me to render a set of transparent sprites at the very end, so they blend.
- Only calculate new positions in a scene graph if you need to.
- Use Vertex Buffer Objects (search for OpenGL VBO) or display lists. If you are not using those for rendering, you are lazy and your code runs slow.
- Batch together draw calls. Drawing a million boxes with a million draw calls is slow; drawing a million boxes with one draw call is fast. If you want the boxes to animate independently, store an ID per vertex and use that in the vertex shader to fuck your million boxes up...
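One way the "layer number" idea could be combined with the texture sorting discussed earlier in the thread: sort by layer first and by texture only within a layer, so the overall rendering order is kept without a z-buffer. This is a sketch under the assumption that sprites inside one layer may be drawn in any order; `Sprite`, `sortForDrawing` and `countTextureSwitches` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Sprite {
    int layer;      // overall rendering order, lower layers draw first
    int textureId;  // texture the sprite needs bound
    int id;         // only here to make the resulting order visible
};

// Orders sprites by layer, then by texture inside each layer.
// Assumes draw order within a single layer does not matter.
void sortForDrawing(std::vector<Sprite>& sprites) {
    std::stable_sort(sprites.begin(), sprites.end(),
        [](const Sprite& a, const Sprite& b) {
            if (a.layer != b.layer) return a.layer < b.layer;
            return a.textureId < b.textureId;
        });
}

// Counts how often the bound texture changes when drawing in the given order.
int countTextureSwitches(const std::vector<Sprite>& sprites) {
    int switches = 0;
    int bound = -1;                      // sentinel: nothing bound yet
    for (const Sprite& s : sprites) {
        assert(s.textureId >= 0);        // assumption: ids are non-negative
        if (s.textureId != bound) {
            ++switches;
            bound = s.textureId;
        }
    }
    return switches;
}
```

Whether the saved binds outweigh the cost of sorting every frame depends on the scene and the driver, so it is worth measuring with real content.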
Oddball
« Reply #13 on: April 01, 2009, 02:52:46 PM »
- Use Vertex Buffer Objects (search for opengl VBO) or display lists if you are not using that for rendering, you are lazy and code runs slow
In my experience VBOs are slower than Vertex Arrays on systems with shared video memory. Not an issue if you're aiming at the hardcore market, but most casual gamers have integrated graphics with shared memory. Just something to consider.