Topic: Generating MipMaps on the GPU, GLES2 iOS
PompiPompi
« on: March 14, 2013, 11:26:10 AM »

Is it possible to generate mipmaps faster than glGenerateMipmap on OpenGL ES 2 on iOS? For instance by doing this in a fragment shader?
I assume glGenerateMipmap is computed on the CPU, since otherwise it might stall the GPU just to generate mipmaps, which is bad.
I can just try and see, but I know it will take time because of poorly documented GL functions and the unclear, side-effect-heavy C nature of OpenGL.

Have you done this before?

Master of all trades.

ThemsAllTook
« Reply #1 on: March 14, 2013, 11:43:50 AM »

Quote
Is it possible to generate mipmaps faster than glGenerateMipmap on OpenGL ES 2 on iOS?

Pregenerating them for all of your textures and uploading each mipmap level at texture upload time might be faster.
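
If pregenerating is an option, each level can be produced with a simple box filter and then uploaded with its own glTexImage2D call. A minimal sketch in plain C, assuming tightly packed RGBA8 pixels and power-of-two sizes. `next_mip_size` and `downsample_rgba8` are made-up names for illustration, not part of any GL API:

```c
#include <assert.h>
#include <stdint.h>

/* Size of the next mip level along one axis: halve, but never drop below 1. */
static int next_mip_size(int size)
{
    return size > 1 ? size / 2 : 1;
}

/* Hypothetical helper: fill dst with the next mip level of src using a 2x2
 * box filter. src is src_w x src_h RGBA8 pixels; dst must hold
 * next_mip_size(src_w) x next_mip_size(src_h) RGBA8 pixels. */
static void downsample_rgba8(const uint8_t *src, int src_w, int src_h, uint8_t *dst)
{
    assert(src != 0 && dst != 0 && src_w > 0 && src_h > 0);
    const int dst_w = next_mip_size(src_w);
    const int dst_h = next_mip_size(src_h);

    for (int y = 0; y < dst_h; ++y) {
        /* Clamp the second row/column so the 1-pixel-wide or 1-pixel-high
         * levels at the end of a non-square chain still work. */
        const int y0 = 2 * y;
        const int y1 = (y0 + 1 < src_h) ? y0 + 1 : y0;
        for (int x = 0; x < dst_w; ++x) {
            const int x0 = 2 * x;
            const int x1 = (x0 + 1 < src_w) ? x0 + 1 : x0;
            for (int c = 0; c < 4; ++c) {
                const int sum = src[(y0 * src_w + x0) * 4 + c]
                              + src[(y0 * src_w + x1) * 4 + c]
                              + src[(y1 * src_w + x0) * 4 + c]
                              + src[(y1 * src_w + x1) * 4 + c];
                /* Rounded average of the four source texels. */
                dst[(y * dst_w + x) * 4 + c] = (uint8_t)((sum + 2) / 4);
            }
        }
    }
}
```

Each level produced this way would be uploaded with glTexImage2D(GL_TEXTURE_2D, level, ...), repeating until both dimensions reach 1.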

PompiPompi
« Reply #2 on: March 15, 2013, 03:21:19 AM »

You are answering a question I did not ask, so it doesn't help me much.
Let's just say that I cannot do any offline calculations.

ham and brie
« Reply #3 on: March 15, 2013, 01:58:02 PM »

Quote
I assume glGenerateMipmap is computed on the CPU, since otherwise it might stall the GPU just to generate mipmaps, which is bad.

I'd actually hope it would be done on the GPU, especially if I were generating mipmaps for a texture created each frame by drawing to an FBO. Profiling on an iPad and an iPod touch, it does seem to be.

PompiPompi
« Reply #4 on: March 16, 2013, 02:27:23 AM »

It's very unlikely that it's being done on the GPU.
If it were done on the GPU, OpenGL would need to either keep a texture permanently allocated for this kind of work or allocate a new texture each time glGenerateMipmap is called.
It would also mean that OpenGL needs to compile shaders at some point in the program, which costs memory and CPU time.
It would also mean you cannot generate mipmaps in parallel with other OpenGL operations that are being processed on the same thread.
What makes you think it's being done on the GPU?

Edit: it is also possible that glGenerateMipmap runs on a separate thread but not on the GPU. That would make it look like it's on the GPU, because the command doesn't block the thread you call it from.

Polly
« Reply #5 on: March 16, 2013, 03:39:14 AM »

Useful article http://www.g-truc.net/post-0256.html

ham and brie
« Reply #6 on: March 16, 2013, 04:30:53 AM »

Quote
What makes you think it's being done on the GPU?

Because when profiling you can see CPU and GPU utilisation.

...It has just occurred to me to see what difference a non-square texture makes.

On my 3rd gen iPod touch (older device, supports OpenGL ES 2, single core CPU), generating mipmaps for a 512x512 texture can be done at 60fps. GPU utilisation is around 44% and CPU is low. If I change it to 512x256 (area halved), then it slows to less than 20fps, GPU utilisation is low, and the CPU is the clear bottleneck (nearly all time spent in glGenerateMipmap()).

Similar on my 3rd gen iPad: it can manage 2048x2048 at 60fps, but 512x256 is under 30, bottlenecked by the CPU.

So, looks like if the GPU can do the work, it is much faster at doing it (which is why I hoped it would be), but it won't do non-square textures.
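
For reference, the sizes in a mip chain can be worked out as below. This is only a sketch of the usual halving rule (each dimension halves per level and is clamped to 1), with hypothetical helper names. It shows that 512x512 and 512x256 both have 10 levels, and that the non-square chain ends with a 2x1 level before 1x1. It says nothing about why the driver would take a different path for one of them.

```c
#include <assert.h>

/* Number of levels in a full mip chain: the largest dimension is halved
 * until it reaches 1. */
static int mip_level_count(int width, int height)
{
    assert(width > 0 && height > 0);
    int largest = width > height ? width : height;
    int levels = 1;
    while (largest > 1) {
        largest >>= 1;
        ++levels;
    }
    return levels;
}

/* Size of one dimension at a given level, clamped so it never reaches 0. */
static int mip_dimension(int base, int level)
{
    assert(base > 0 && level >= 0 && level < 31);
    const int d = base >> level;
    return d > 0 ? d : 1;
}
```

For a 512x256 texture, level 8 is mip_dimension(512, 8) by mip_dimension(256, 8), which is 2x1.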

PompiPompi
« Reply #7 on: March 16, 2013, 07:51:45 AM »

Err, I don't see from your "benchmark" that the mipmaps are generated on the GPU.
Are you just putting a glGenerateMipmap inside a loop?
If you put something inside a loop then you might get 100% CPU usage no matter what you put inside the loop, unless you are blocking.
Getting 20 fps for half the texture instead of 60 fps sounds very wrong. Sounds like you are doing it wrong.

Polly: Except for one bullet point saying "hardware acceleration", I don't see any mention that it is actually done on the GPU.

ham and brie
« Reply #8 on: March 16, 2013, 09:31:46 AM »

Quote
Are you just putting a glGenerateMipmap inside a loop?
I am using it once each frame.

Each frame:
I use a framebuffer object to clear the top layer of the texture to a colour which varies each frame
I use glGenerateMipmap()
I use the texture to colour a rotating cube

Quote
Getting 20 fps for half the texture instead of 60 fps sounds very wrong. Sounds like you are doing it wrong.
That result makes good sense. What I saw from the profiler was consistent with the GPU taking the load of generating mipmaps when the texture is square. The GPU device utilisation is high, CPU load is fairly low. Changing the texture size (but keeping it square) affects the GPU load. If I try to use a 1024x1024 texture on my iPod touch, the framerate drops, GPU device utilisation is up to 100%, while CPU even goes down (because with fewer frames drawn it has even less to do).

When the texture is not square (but still power-of-two) then the load is clearly on the CPU, and the CPU is much slower at doing the work.

You seem to want to believe your guesswork and assumptions over someone who has got evidence from trying it in practice.

PompiPompi
« Reply #9 on: March 16, 2013, 09:45:10 AM »

Correlation does not imply causation.

It seems absurd that generating mipmaps on the GPU would only be possible for square textures. I could easily implement power-of-two GPU mipmap generation for both square and non-square textures.

ham and brie
« Reply #10 on: March 16, 2013, 10:23:32 AM »

Quote
Correlation does not imply causation.
Do you use that line to ignore any measurements that don't fit your guesswork? Do you do any profiling?

Quote
It seems absurd that generating mipmaps on the GPU would only be possible for square textures. I could easily implement power-of-two GPU mipmap generation for both square and non-square textures.
It's not really a matter of whether it's easy to write in code. Restrictions such as something only working quickly in hardware for square textures are usually to keep the chip simpler.

PompiPompi
« Reply #11 on: March 16, 2013, 10:53:42 AM »

I think you don't know what you are talking about. You measured a specific scenario that involves several things; it doesn't isolate glGenerateMipmap.
Your explanations of the cause from the very simple data you collected are misleading rationalizations.
Anyway, whatever... I will keep looking for the answer myself.

Columbo
« Reply #12 on: March 16, 2013, 03:05:03 PM »

Quote
I think you don't know what you are talking about. You measured a specific scenario that involves several things; it doesn't isolate glGenerateMipmap.
Your explanations of the cause from the very simple data you collected are misleading rationalizations.
Anyway, whatever... I will keep looking for the answer myself.

Wow - that's a pretty rude response to someone who's taken the time to help with a question you've posted. Sounds to me like Ham and Brie, for one, at least knows to measure rather than assume, and has some experience in interpreting profile results.

If, with a non-square texture, he's seeing a load of CPU time in glGenerateMipmap, and with a square texture he's seeing little CPU time in glGenerateMipmap and a measurable increase in GPU utilization, then that's compelling evidence that some work is being done on the GPU.

Also, I'm not sure your reasoning for why a GPU implementation is so unlikely is sound. I don't see any obstacle to the driver allocating memory, compiling shaders and inserting the necessary items into the command list to generate the mipmaps. Any GPU stalls/flushes that it'll have to insert between the GPU mipmap generation and the first use of the texture by the GPU would be relatively small (remember this is a platform where shaders are compiled lazily, so avoiding stalls obviously isn't particularly high on the driver writers' priorities). Nor would I be that surprised if the driver writers only bothered handling the most common case and fall back onto the CPU implementation for non-square textures.

Still, if you find any evidence to the contrary or if you find you can significantly beat glGenerateMipmap, that'd be interesting.

PompiPompi
« Reply #13 on: March 17, 2013, 10:32:19 AM »

I don't have any evidence because the OpenGL driver is a black box to me.
Unless I find documents online detailing the implementation of GLES2 on iOS, or get access to the source code, I really can't tell that much.

Your measurements are equivalent to judging which sports car is faster by measuring their engine revs in neutral.

ham and brie
« Reply #14 on: March 17, 2013, 04:21:27 PM »

Not remotely an accurate analogy. I used profiling to answer the question of whether the load was on the GPU or CPU and show how the load on each varied with texture size. That is very much the kind of question the instrumentation is able to answer.

Perhaps it would be easier for you to believe if you could try it for yourself?

I'm using Xcode 4.6.1.

Create a project using the iOS OpenGL Game template as the starting point. Use Mipmap as the class prefix.

Replace the contents of the MipmapViewController.m file with:
Code:
#import "MipmapViewController.h"

@interface MipmapViewController () {}
@property (strong, nonatomic) EAGLContext *context;
@end
@implementation MipmapViewController

- (void)viewDidLoad
{
    [super viewDidLoad];
    // Create an OpenGL ES 2 context and make it current for this view.
    self.context = [[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES2];
    self.preferredFramesPerSecond = 60;
    GLKView *view = (GLKView *)self.view;
    view.context = self.context;
    [EAGLContext setCurrentContext:self.context];

    // Texture dimensions; change one of these to compare square and non-square sizes.
    const GLsizei texture_width = ( 1 << 10 );
    const GLsizei texture_height = ( 1 << 10 );

    GLuint texture;
    glGenTextures(1, &texture);
    glBindTexture(GL_TEXTURE_2D, texture);
    // Allocate storage for level 0 only; no pixel data is uploaded (NULL).
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, texture_width, texture_height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
}

- (void)glkView:(GLKView *)view drawInRect:(CGRect)rect
{
    // The texture bound in viewDidLoad is still bound, so this regenerates its mipmaps every frame.
    glGenerateMipmap(GL_TEXTURE_2D);
    glClear(GL_COLOR_BUFFER_BIT);
}

@end

Since you felt I hadn't isolated glGenerateMipmap(), this is cut down to just calling glGenerateMipmap() and clearing the screen each frame.

You'll need a device attached to run on.

Start profiling by pressing ⌘I and it should come up with a window that lets you choose OpenGL ES Driver from the Graphics section.

The "Instruments" program should start.

Next to the chart for the "OpenGL ES Driver" there should be an "i" in a circle. Click it, then on the configure button. There should be a list of possible "statistics to list" to select from. Toggle on "Device Utilization %", click done. Under "Statistics to Observe" there should now be a "Device Utilization %" to toggle on.

You can stop and restart profiling by clicking the red circle button.

The textual tables are more useful than the graphs. If you select the OpenGL ES Driver instrument you should see a table with Device Utilization % as one of the columns. If you select the Time Profiler instrument you can see in the call tree the CPU time used and break it down by calls. You can compare the CPU time with the time you've run for.

You can leave Instruments open, modify the code (such as changing the texture size) and then run the new version to profile it by pressing ⌘I. Instruments will keep measurements from previous runs so you can compare.

PompiPompi
« Reply #15 on: March 17, 2013, 09:30:58 PM »

Ok thanks.
Well you are still bound by the GLKView's frame rate, so your test still doesn't isolate glGenerateMipmap.
Perhaps a more plausible test would be to create and fill as many textures of the same size as possible, then sleep a bit, then measure the overall time it takes to perform glGenerateMipmap on all the textures.
Assuming that they didn't "optimize" glGenerateMipmap to happen only when the texture is actually used, it might give a better result.
Maybe simply binding the texture would make it generate the mipmaps in case they are lazy.
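
A rough sketch of that kind of batch timing in plain C, so it can run without a GL context. Everything named here (`now_ms`, `time_batch_ms`, `count_call`) is hypothetical, and clock_gettime is assumed to be available on the target. In a real test the work callback would bind texture number `index` and call glGenerateMipmap(GL_TEXTURE_2D), and the finish callback would call glFinish() so that any queued GPU work is included in the measurement:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Monotonic wall-clock time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Hypothetical callback types: per-texture work, and a final "wait until
 * everything is done" step. */
typedef void (*work_fn)(int index, void *user);
typedef void (*finish_fn)(void *user);

/* Run work() count times, then finish() once, and return the elapsed
 * wall-clock time in milliseconds. */
static double time_batch_ms(int count, work_fn work, finish_fn finish, void *user)
{
    assert(count >= 0 && work != NULL);
    const double start = now_ms();
    for (int i = 0; i < count; ++i)
        work(i, user);
    if (finish != NULL)
        finish(user);
    return now_ms() - start;
}

/* Stand-in workload so the sketch runs anywhere: it only counts calls. */
static void count_call(int index, void *user)
{
    (void)index;
    ++*(int *)user;
}
```

This measures wall-clock time only, so by itself it cannot tell whether the time went to the CPU or the GPU.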

It seems odd that it would use GPU implementation for square textures and CPU for non square, because that would defeat the purpose of making an effort to optimize for square textures. It's like making the job half done, it doesn't help much.

ham and brie
« Reply #16 on: March 17, 2013, 11:33:31 PM »

Quote
Well you are still bound by the GLKView's frame rate, so your test still doesn't isolate glGenerateMipmap.
This doesn't make sense. You can comment out glGenerateMipmap() and see how that changes the work done. What you are looking for is whether the call to glGenerateMipmap() adds to the GPU utilisation or the CPU time. This test program does answer the question of whether the work is being done by the GPU or CPU.
Quote
Perhaps a more plausible test would be to create and fill as many textures of the same size as possible, then sleep a bit, then measure the overall time it takes to perform glGenerateMipmap on all the textures.
That isn't really any more "plausible" and it makes it a bit harder to see whether the work was done by the GPU.
Quote
It seems odd that it would use GPU implementation for square textures and CPU for non square, because that would defeat the purpose of making an effort to optimize for square textures. It's like making the job half done, it doesn't help much.
This sort of thing is common when you are using a GPU, which is why it occurred to me to try non-square. Some things the hardware will do quickly. Some things it won't. It makes mipmap generation for square textures many times faster, so it's not true to say it doesn't help much. Where it's important that glGenerateMipmap() is fast, you can make the texture square.
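
One way to "make the texture square" is to allocate a square power-of-two texture big enough for the content and only sample the part that is used. A small sketch with made-up names (`next_pow2`, `square_layout_for`), assuming the content sits in the corner at texture coordinate (0, 0):

```c
#include <assert.h>

/* Smallest power of two that is >= v. */
static int next_pow2(int v)
{
    assert(v > 0 && v <= (1 << 30));
    int p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

typedef struct {
    int side;    /* width and height of the square texture */
    float u_max; /* texture coordinate where the content ends along u */
    float v_max; /* texture coordinate where the content ends along v */
} square_layout;

/* Hypothetical helper: pick a square power-of-two size that fits a
 * width x height image, plus the texture coordinates covering it. */
static square_layout square_layout_for(int width, int height)
{
    assert(width > 0 && height > 0);
    square_layout s;
    s.side = next_pow2(width > height ? width : height);
    s.u_max = (float)width / (float)s.side;
    s.v_max = (float)height / (float)s.side;
    return s;
}
```

Padding like this costs extra memory, and at the smaller mip levels the unused area gets averaged into the content near its edge. If the texture is rendered through an FBO anyway, drawing the content stretched across the whole square target avoids that.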

PompiPompi
« Reply #17 on: March 18, 2013, 12:09:39 AM »

You see, the thing is... when you draw to a GLKView and cap it at 60 FPS, the CPU time will increase because the CPU is stalled, waiting for the GPU to sync to 60 FPS!

ham and brie
« Reply #18 on: March 18, 2013, 02:33:45 AM »

Though there is a bit of CPU work used to do the screen update loop, when the CPU doesn't have work to do, the vast majority of the time between frames does not count as CPU time.
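
The difference between waiting and working can be seen outside GL as well. A small POSIX sketch (this is not how Instruments measures anything, and `clock_ms` and `measure_idle` are made-up names): a thread that blocks for a while accumulates wall-clock time but almost no CPU time.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime and nanosleep */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Read the given POSIX clock in milliseconds. */
static double clock_ms(clockid_t id)
{
    struct timespec ts;
    clock_gettime(id, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

typedef struct {
    double wall_ms; /* elapsed real time */
    double cpu_ms;  /* CPU time charged to this process */
} idle_timing;

/* Block for sleep_ms milliseconds and report how much wall-clock time and
 * how much CPU time passed while doing so. */
static idle_timing measure_idle(long sleep_ms)
{
    assert(sleep_ms >= 0);
    struct timespec req;
    req.tv_sec = sleep_ms / 1000;
    req.tv_nsec = (sleep_ms % 1000) * 1000000L;

    const double wall0 = clock_ms(CLOCK_MONOTONIC);
    const double cpu0 = clock_ms(CLOCK_PROCESS_CPUTIME_ID);
    nanosleep(&req, NULL); /* the thread is blocked here, not running */

    idle_timing t;
    t.wall_ms = clock_ms(CLOCK_MONOTONIC) - wall0;
    t.cpu_ms = clock_ms(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
    return t;
}
```

Calling measure_idle(50) should report roughly 50 ms of wall-clock time and a much smaller CPU time, which is the sense in which time spent blocked between frames is not counted as CPU work.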

Anyway, the main point of the test is to see whether the GPU is doing the work. Given how the profiler's GPU measurements work, it's better to give the GPU work each frame rather than try to give it just a single lump.

PompiPompi
« Reply #19 on: March 18, 2013, 11:05:51 AM »

You measure the time the CPU waits for the GPU, not the actual time the CPU works. What is so hard to understand?