Author Topic: OpenGL Matrix 2D Function Override  (Read 1620 times)
Glaiel-Gamer
« on: June 07, 2009, 05:21:46 PM »

I took over the OpenGL matrix stack and overrode the GL calls with some defines. That way I can eventually route the functions to DirectX without changing the OpenGL rendering calls in my engine. It also lets me convert my immediate-mode rendering to vertex pointer lists, and perhaps apply the transform to the vertices manually whenever the matrix isn't the identity, since most of the time I'm only sending 4 coordinates at a time.

This is optimized for 2D, so I basically ignore all Z values, and the "axis of rotation" (just allowing Z-rotation).

Results are promising so far: I got over a 2x speed boost on my benchmark test.

Is there anything I'm doing blatantly wrong here that could get me even more speed? Compiler optimizations seem to do a pretty good job here.

Code:
////////////////////////////////////////////////////////////////////////////////
//Matrix Manipulation///////////////////////////////////////////////////////////

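#include <cmath> //std::sin, std::cos

//matrix stack / multiplication
//2x3 affine matrix: m[row][col], column 2 holds the translation
typedef float matrix[2][3];

matrix MatrixStack[32];
int M = 0; //index of the current (top) matrix
#define m MatrixStack[M]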


void glaielTranslatef(float x, float y, float z){ //ignore Z
  //glTranslatef(x, y, z);
  m[0][2] = m[0][0] * x + m[0][1] * y + m[0][2];
  m[1][2] = m[1][0] * x + m[1][1] * y + m[1][2];
}
void glaielScalef(float x, float y, float z){ //ignore Z
  //glScalef(x, y, z);
  m[0][0] *= x;
  m[1][0] *= x;
  /////////////
  m[0][1] *= y;
  m[1][1] *= y;
}
void glaielRotatef(float theta, float x, float y, float z){ //ignore Axis
  //glRotatef(theta, x, y, z);
  if(theta == 0) return;
  theta *= 3.141592653589793f/180.0f;
  float s = std::sin(theta);
  float c = std::cos(theta);
 
  float A = m[0][0];
  float D = m[1][0];
 
  m[0][0] = A*c + m[0][1]*s;
  m[0][1] = m[0][1]*c - A*s;
 
  m[1][0] = D*c + m[1][1]*s;
  m[1][1] = m[1][1]*c - D*s;
}
void glaielLoadIdentity(){
  //this one still forwards to GL; glaielBegin() replaces the GL matrix via glLoadMatrixf anyway
  glLoadIdentity();
  m[0][0] = 1; m[0][1] = 0; m[0][2] = 0;
  m[1][0] = 0; m[1][1] = 1; m[1][2] = 0;
}
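void glaielPushMatrix(){
  //glPushMatrix();
  //n is the old top; after M++ the macro m refers to the new top, so this copies old -> new
  //no bounds check: the caller must keep M below 31
  matrix & n = MatrixStack[M++];
  m[0][0] = n[0][0]; m[0][1] = n[0][1]; m[0][2] = n[0][2];
  m[1][0] = n[1][0]; m[1][1] = n[1][1]; m[1][2] = n[1][2];
}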
void glaielPopMatrix(){
  //glPopMatrix();
  M--;
}



void glaielBegin(int mode){
  float mat[16] = {m[0][0], m[1][0], 0, 0,
                   m[0][1], m[1][1], 0, 0,
                   0,       0,       1, 0,
                   m[0][2], m[1][2], 0, 1};
  glLoadMatrixf(mat);
  glBegin(mode);
}
void glaielEnd(){
  glEnd();
}

Benchmark:
1000*20*1000 matrix operations, stack going from 0 to 20 depth repeatedly

OVERRIDDEN OPENGL MATRIX TRANSFORMATIONS:
Time: 12423ms

OPENGL MATRIX TRANSFORMATIONS:
Time: 26122ms
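
The post mentions possibly multiplying the vertices by the matrix on the CPU when it isn't the identity. A minimal sketch of that step, assuming the same 2x3 layout as the code above (`m[row][col]`, translation in column 2). `Vec2`, `transformPoint`, `isIdentity` and `transformQuad` are hypothetical names, not part of the posted code:

```cpp
typedef float matrix[2][3]; // same layout as the posted code: m[row][col]

struct Vec2 { float x, y; }; // hypothetical helper type

// Apply the 2x3 affine matrix to a point: p' = [A B; C D] * p + (X, Y).
inline Vec2 transformPoint(const matrix &m, Vec2 p){
  Vec2 r;
  r.x = m[0][0] * p.x + m[0][1] * p.y + m[0][2];
  r.y = m[1][0] * p.x + m[1][1] * p.y + m[1][2];
  return r;
}

// Exact comparison against the identity, so untransformed quads can skip the multiply.
inline bool isIdentity(const matrix &m){
  return m[0][0] == 1 && m[0][1] == 0 && m[0][2] == 0 &&
         m[1][0] == 0 && m[1][1] == 1 && m[1][2] == 0;
}

// Transform the 4 corners of a quad in place; does nothing for the identity.
inline void transformQuad(const matrix &m, Vec2 (&quad)[4]){
  if(isIdentity(m)) return;
  for(int i = 0; i < 4; ++i) quad[i] = transformPoint(m, quad[i]);
}
```

Whether this beats loading the matrix into GL for 4 vertices at a time would still need to be measured; the sketch only shows the arithmetic.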
Kaelan
« Reply #1 on: June 07, 2009, 05:36:38 PM »

You might want to use independent translation/scale/rotation values instead of matrices if you're really after speed, but it's hard to say whether that would work without knowing how your rendering architecture works. If you're using a scenegraph of some sort, it should be pretty easy to do, since you don't need a matrix for any node of the graph that isn't actually rendering geometry. This also lets you go without using the matrix stack, which is a good thing since GL's matrix stack tends to be tiny anyway.

Why is your matrix 2x3? GL and D3D's are both 4x4. Dropping columns/rows may have nasty implications later, even if your game is straight 2D. This may be why you're seeing a performance boost.

P.S. why the individual element copies in PushMatrix? You should be able to just use memcpy.
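
For reference, a minimal sketch of what the memcpy version of the push could look like, assuming the same `MatrixStack` and `M` globals as the posted code. The depth asserts and the constant `kMaxDepth` are additions for illustration; the original has no such checks:

```cpp
#include <cassert>
#include <cstring> // std::memcpy

typedef float matrix[2][3];

const int kMaxDepth = 32; // hypothetical name for the stack size used in the post
matrix MatrixStack[kMaxDepth];
int M = 0; // index of the current (top) matrix

void glaielPushMatrix(){
  assert(M + 1 < kMaxDepth); // guard against overflowing the stack
  // Copy the current top into the next slot, then make that slot current.
  std::memcpy(MatrixStack[M + 1], MatrixStack[M], sizeof(matrix));
  ++M;
}

void glaielPopMatrix(){
  assert(M > 0); // guard against popping past the bottom
  --M;
}
```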

Glaiel-Gamer
« Reply #2 on: June 07, 2009, 05:43:55 PM »

Quote
You might want to use independent translation/scale/rotation values instead of matrices if you're really after speed, but it's hard to say whether that would work without knowing how your rendering architecture works. If you're using a scenegraph of some sort, it should be pretty easy to do, since you don't need a matrix for any node of the graph that isn't actually rendering geometry. This also lets you go without using the matrix stack, which is a good thing since GL's matrix stack tends to be tiny anyway.
It doesn't work that way for me since I have an odd order of transformations allowing me to specify a pivot for a sprite and stuff, plus I don't see how I can do this without a matrix stack. Besides, you don't see any calls to push/popmatrix in my override do you? I took over the stack so I can make it as large as I want.

Quote
Why is your matrix 2x3? GL and D3D's are both 4x4. Dropping columns/rows may have nasty implications later, even if your game is straight 2D. This may be why you're seeing a performance boost.
2x3 is for 2D, i.e. simplifying it down, the only elements that can change are
Code:
A B 0 X
C D 0 Y
0 0 1 0
0 0 0 1

so 2x3 seemed like the easiest way to implement this. If you look at glaielBegin, I pad it with the other elements before loading it, which might cost some speed compared to keeping the matrices pre-padded, but it's not a huge deal and makes it easier for me to wrap GL to DirectX since its matrices are row-major instead of column-major
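
A rough sketch of the padding step in both storage orders, assuming the 2x3 layout from the first post. `toColumnMajor4x4` and `toRowMajor4x4` are hypothetical names. The column-major array is the one the posted glaielBegin passes to glLoadMatrixf; the row-major one is simply its transpose. Which of the two a particular Direct3D call expects also depends on whether vectors are treated as rows or columns, so check the API docs before relying on either:

```cpp
typedef float matrix[2][3];

// Column-major 4x4: each group of 4 floats is one column (layout used with glLoadMatrixf).
void toColumnMajor4x4(const matrix &m, float out[16]){
  const float cm[16] = {m[0][0], m[1][0], 0, 0,
                        m[0][1], m[1][1], 0, 0,
                        0,       0,       1, 0,
                        m[0][2], m[1][2], 0, 1};
  for(int i = 0; i < 16; ++i) out[i] = cm[i];
}

// Row-major 4x4: each group of 4 floats is one row, i.e. the transpose of the array above.
void toRowMajor4x4(const matrix &m, float out[16]){
  const float rm[16] = {m[0][0], m[0][1], 0, m[0][2],
                        m[1][0], m[1][1], 0, m[1][2],
                        0,       0,       1, 0,
                        0,       0,       0, 1};
  for(int i = 0; i < 16; ++i) out[i] = rm[i];
}
```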

Quote
P.S. why the individual element copies in PushMatrix? You should be able to just use memcpy.
is memcpy really faster?

also for load identity, would memset(0) then setting the other two to 1 be faster than what I have?
Kaelan
« Reply #3 on: June 07, 2009, 05:54:27 PM »

Quote
It doesn't work that way for me since I have an odd order of transformations allowing me to specify a pivot for a sprite and stuff, plus I don't see how I can do this without a matrix stack. Besides, you don't see any calls to push/popmatrix in my override do you? I took over the stack so I can make it as large as I want.
That's unfortunate. It seems like in most cases, the only part of the matrix you're going to be using is the translate until you get to an object that's pivoted or scaled, so I would hope that you could get by just storing a translation field and constructing the necessary matrix on the fly. If it works, it works, though!
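
To make the "construct the matrix on the fly" idea concrete, here is a minimal sketch assuming each node stores plain translation/rotation/scale values. `Transform2D` and `buildMatrix` are hypothetical names, and the translate * rotate * scale order is an assumption; an engine with pivots, like the one described above, would need a different order:

```cpp
#include <cmath>

typedef float matrix[2][3]; // same 2x3 layout as the first post

// Hypothetical per-node transform: plain values instead of a matrix.
struct Transform2D {
  float tx, ty;   // translation
  float degrees;  // rotation about Z, in degrees like glRotatef
  float sx, sy;   // scale
};

// Build the 2x3 matrix for translate * rotate * scale (an assumed order).
void buildMatrix(const Transform2D &t, matrix &out){
  const float rad = t.degrees * 3.141592653589793f / 180.0f;
  const float s = std::sin(rad);
  const float c = std::cos(rad);
  out[0][0] = c * t.sx; out[0][1] = -s * t.sy; out[0][2] = t.tx;
  out[1][0] = s * t.sx; out[1][1] =  c * t.sy; out[1][2] = t.ty;
}
```

This gives the same matrix as calling the posted translate, rotate and scale functions in that order on an identity matrix.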

Quote
2x3 is for 2D, i.e. simplifying it down, the only elements that can change are
Code:
A B 0 X
C D 0 Y
0 0 1 0
0 0 0 1

so 2x3 seemed like the easiest way to implement this. If you look at glaielBegin, I pad it with the other elements before loading it, which might cost some speed compared to keeping the matrices pre-padded, but it's not a huge deal and makes it easier for me to wrap GL to DirectX since its matrices are row-major instead of column-major
Ah, I see. Personally, I'd still lean towards 4x4, since it gives you the ability to utilize public-domain/open-source code for your graphics pipeline instead of having to reimplement everything atop your particular style of matrix. D3D's handedness insanity is a good example of how changing up your math can make it harder to reuse code - look at all the trouble you're going to!

Quote
is memcpy really faster?

also for load identity, would memset(0) then setting the other two to 1 be faster than what I have?
Sorry, I didn't mean to imply that memcpy is faster. It just seems silly to write out the copy by hand when you could do it in one statement with memcpy.

Memset and memcpy are going to translate into effectively the same code as writing it by hand. However, calling them is much easier than writing out a bunch of array operations, especially since writing out your operations by hand introduces a larger chance for error.

(pedantic footnote: for large copies on the order of kilobytes or megabytes, the standard library implementations of memcpy/memset may use SIMD operations to actually outperform a hand-written copy. in the case of a transform matrix, the number of bytes is small enough to render this irrelevant.)

When dealing with graphics pipelines, correctness is often as important as performance. When you're trying to ship a demo by the end of the day and your graphics pipeline is freaking out in some bizarre manner, the last thing you want to be doing is digging through your math books trying to figure out if your matrix multiplication code is correct.
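
One cheap way to get that kind of confidence is to compare the hand-optimized operations against a plain reference multiplication. A minimal sketch, assuming the 2x3 layout from the first post; `multiplyReference`, `translateInPlace`, `rotateInPlace` and `nearlyEqual` are hypothetical names, and the two in-place functions are the posted formulas rewritten to take the matrix as a parameter:

```cpp
#include <cmath>

typedef float matrix[2][3];

// Reference: out = a * b, treating both as 3x3 matrices with an implied last row (0, 0, 1).
// out must not alias a or b.
void multiplyReference(const matrix &a, const matrix &b, matrix &out){
  for(int r = 0; r < 2; ++r){
    out[r][0] = a[r][0] * b[0][0] + a[r][1] * b[1][0];
    out[r][1] = a[r][0] * b[0][1] + a[r][1] * b[1][1];
    out[r][2] = a[r][0] * b[0][2] + a[r][1] * b[1][2] + a[r][2];
  }
}

// In-place translate, same arithmetic as the posted glaielTranslatef.
void translateInPlace(matrix &m, float x, float y){
  m[0][2] = m[0][0] * x + m[0][1] * y + m[0][2];
  m[1][2] = m[1][0] * x + m[1][1] * y + m[1][2];
}

// In-place Z rotation, same arithmetic as the posted glaielRotatef.
void rotateInPlace(matrix &m, float degrees){
  const float rad = degrees * 3.141592653589793f / 180.0f;
  const float s = std::sin(rad);
  const float c = std::cos(rad);
  const float A = m[0][0];
  const float D = m[1][0];
  m[0][0] = A * c + m[0][1] * s;
  m[0][1] = m[0][1] * c - A * s;
  m[1][0] = D * c + m[1][1] * s;
  m[1][1] = m[1][1] * c - D * s;
}

// True if every element differs by at most eps.
bool nearlyEqual(const matrix &a, const matrix &b, float eps){
  for(int r = 0; r < 2; ++r)
    for(int c = 0; c < 3; ++c)
      if(std::fabs(a[r][c] - b[r][c]) > eps) return false;
  return true;
}
```

A test would build the translation or rotation as a full matrix, run it through `multiplyReference`, apply the in-place version to a copy of the same starting matrix, and check the two results with `nearlyEqual`.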

skyy
« Reply #4 on: June 07, 2009, 07:51:26 PM »

I'd say using the "standard" 4x4 matrix has just minimal overhead anyway. And as said above, it allows you to have tighter integration with "thingies", if ever required, instead of reinventing the wheel all the time.

Either way, it's always fun to tinker around.


BorisTheBrave
« Reply #5 on: June 08, 2009, 11:44:31 AM »

Memcpy does quite a few neat tricks to be AFAP. For 6 words, it's unlikely to make a difference. FWIW, you can initialize to the identity by memcpying a premade identity matrix.
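
A minimal sketch of the premade-identity idea, assuming the 2x3 `matrix` typedef from the first post. `kIdentity` and `loadIdentity` are hypothetical names:

```cpp
#include <cstring> // std::memcpy

typedef float matrix[2][3];

// Premade identity, stored once.
static const matrix kIdentity = {{1, 0, 0},
                                 {0, 1, 0}};

// Reset a matrix to the identity with a single copy.
void loadIdentity(matrix &out){
  std::memcpy(out, kIdentity, sizeof(matrix));
}
```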

If you were really so concerned about cutting corners, you'd shave off C and D from your matrices, as D=A and C=-B (for any rotation+translation+scale). Down from 16 floats to 4, woo.

But don't do this. I'm a teensy bit concerned that this may be premature optimization. You may have doubled your speed (lack of inlining perhaps?), but that's pointless until you show that your program spends a substantial amount of time on this matrix stack. Particularly after you factor in your eventual conversion to 4x4 matrices to pass to the drawing API.
Glaiel-Gamer
« Reply #6 on: June 08, 2009, 11:49:54 AM »

Quote
If you were really so concerned about cutting corners, you'd shave off C and D from your matrices, as D=A and C=-B (for any rotation+translation+scale). Down from 16 floats to 4, woo.
Didn't know that, thanks, although isn't that assuming my scales are uniform?

Quote
But don't do this. I'm a teensy bit concerned that this may be premature optimization. You may have doubled your speed (lack of inlining perhaps?), but that's pointless until you show that your program spends a substantial amount of time on this matrix stack. Particularly after you factor in your eventual conversion to 4x4 matrices to pass to the drawing API.
Well, this wasn't 100% for optimization purposes. I'm preparing to make porting to D3D as easy as possible, which means overriding some of the OpenGL functions which behave differently from D3D.

Note: I don't know D3D yet.
BorisTheBrave
« Reply #7 on: June 08, 2009, 12:46:05 PM »

Quote
Didn't know that, thanks, although isn't that assuming my scales are uniform?
Yeah. Most people don't go for non-uniform scales, so I assumed you didn't either. Sorry.
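
For completeness, a sketch of the 4-float form discussed above. It only holds for rotation + uniform scale + translation, as agreed in the replies. `Compact2D`, `fromRotationScale`, `compose` and `toMatrix` are hypothetical names; the letters follow the A B X / C D Y layout from earlier in the thread, with C = -B and D = A:

```cpp
#include <cmath>

typedef float matrix[2][3];

// Hypothetical compact form of  [ a  b  x ]
//                               [-b  a  y ]
// valid only for rotation + uniform scale + translation.
struct Compact2D { float a, b, x, y; };

// Rotation by `degrees` about Z combined with a uniform scale k, no translation.
Compact2D fromRotationScale(float degrees, float k){
  const float rad = degrees * 3.141592653589793f / 180.0f;
  Compact2D t = { k * std::cos(rad), -k * std::sin(rad), 0.0f, 0.0f };
  return t;
}

// result = lhs * rhs: rhs is applied to a point first, then lhs.
Compact2D compose(const Compact2D &l, const Compact2D &r){
  Compact2D o;
  o.a = l.a * r.a - l.b * r.b;
  o.b = l.a * r.b + l.b * r.a;
  o.x = l.a * r.x + l.b * r.y + l.x;
  o.y = -l.b * r.x + l.a * r.y + l.y;
  return o;
}

// Expand back to the 2x3 layout used in the thread.
void toMatrix(const Compact2D &t, matrix &out){
  out[0][0] =  t.a; out[0][1] = t.b; out[0][2] = t.x;
  out[1][0] = -t.b; out[1][1] = t.a; out[1][2] = t.y;
}
```

A non-uniform scale such as (2, 3) produces A = 2 and D = 3, which this form cannot represent, so an engine that needs it has to keep all six values.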