Well, you are still bound by the GLView's frame rate, so your test still doesn't isolate glGenerateMipmap.
That objection doesn't make sense. The frame rate is the same in both runs, so you can comment out glGenerateMipmap() and see how that changes the work done per frame. What you are looking for is whether the call to glGenerateMipmap() adds to the GPU utilisation or to the CPU time. The test program does answer the question of whether the work is being done by the GPU or the CPU.
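To make the comparison concrete, here is a rough sketch in C of the with/without measurement. It is not the actual test program: cpu_seconds_for, frame_fn, frame_counters and stub_frame are hypothetical names made up for illustration, and stub_frame only counts calls where the real per-frame code would upload the texture, optionally call glGenerateMipmap(GL_TEXTURE_2D), and draw. Note that clock() only reports CPU time for the process; GPU utilisation would have to be read from a separate profiling tool while the loop runs.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical per-frame callback. In a real test this would render one
   frame, calling glGenerateMipmap() only when generate_mipmaps is non-zero. */
typedef void (*frame_fn)(int generate_mipmaps, void *user);

/* Returns the CPU seconds this process spent running `frames` frames.
   Run it once with generate_mipmaps = 0 and once with 1, then compare. */
static double cpu_seconds_for(frame_fn render, int generate_mipmaps,
                              int frames, void *user)
{
    assert(frames > 0);
    clock_t start = clock();
    for (int i = 0; i < frames; ++i)
        render(generate_mipmaps, user);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Placeholder state and frame function (assumptions, not real GL code). */
struct frame_counters {
    long frames;        /* frames rendered */
    long mipmap_calls;  /* times the mipmap step was reached */
};

static void stub_frame(int generate_mipmaps, void *user)
{
    struct frame_counters *c = (struct frame_counters *)user;
    c->frames++;
    if (generate_mipmaps)
        c->mipmap_calls++;  /* real code: glGenerateMipmap(GL_TEXTURE_2D); */
}
```

If the CPU time barely changes between the two runs while the GPU utilisation goes up, the work is being done on the GPU; if the CPU time jumps, it is being done on the CPU.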
Perhaps a more plausible test would be to create and fill as many textures as possible of the same size, then sleep a bit, then measure the overall time it takes to perform glGenerateMipmap on all the textures.
That isn't really any more "plausible" and it makes it a bit harder to see whether the work was done by the GPU.
It seems odd that it would use a GPU implementation for square textures and a CPU one for non-square textures, because that would defeat the purpose of making an effort to optimize for square textures. It's like leaving the job half done; it doesn't help much.
This sort of thing is common when you are using a GPU, which is why it occurred to me to try non-square textures. Some things the hardware will do quickly. Some things it won't. The GPU path makes mipmap generation for square textures many times faster, so it's not true to say it doesn't help much. Where it's important that glGenerateMipmap() is fast, you can make the texture square.