It doesn't work that way for me, since I have an odd order of transformations that lets me specify a pivot for a sprite and so on, and I don't see how I can do that without a matrix stack. Besides, you don't see any calls to push/popmatrix in my override, do you? I took over the stack so I can make it as large as I want.
That's unfortunate. It seems like in most cases, the only part of the matrix you're going to be using is the translate until you get to an object that's pivoted or scaled, so I would hope that you could get by just storing a translation field and constructing the necessary matrix on the fly. If it works, it works, though!
2x3 is for 2D. Simplifying it down, the only elements that can change are
A B 0 X
C D 0 Y
0 0 1 0
0 0 0 1
so 2x3 seemed like the easiest way to implement this. If you look at glBegin, I pad it with the other elements before loading it. That might cost some speed compared to keeping the matrices pre-padded, but it's not a huge deal, and it makes it easier for me to wrap GL to DirectX, since its matrices are row-major instead of column-major.
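For readers following along, the padding step described above might look roughly like this. It is only a sketch: the `Mat2x3` struct, its field names, and `mat2x3_to_gl` are hypothetical and not taken from the poster's code. The only thing assumed about GL is that `glLoadMatrixf` takes 16 floats in column-major order.

```c
#include <string.h>

/* Hypothetical 2x3 transform: the six values that can change in 2D. */
typedef struct {
    float a, b, x;   /* first row:  A B X */
    float c, d, y;   /* second row: C D Y */
} Mat2x3;

/* Expand to the full 4x4 shown above, written out in column-major order
 * so the result could be handed to glLoadMatrixf. */
static void mat2x3_to_gl(const Mat2x3 *m, float out[16])
{
    memset(out, 0, 16 * sizeof out[0]);  /* start from all zeros */

    out[0]  = m->a;  out[1]  = m->c;     /* column 0: A C 0 0 */
    out[4]  = m->b;  out[5]  = m->d;     /* column 1: B D 0 0 */
    out[10] = 1.0f;                      /* column 2: 0 0 1 0 */
    out[12] = m->x;  out[13] = m->y;     /* column 3: X Y 0 1 */
    out[15] = 1.0f;
}
```

A row-major target would need the same eight values written to the transposed positions, which is the only part of the wrapper that would have to change.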
Ah, I see. Personally, I'd still lean towards 4x4, since it gives you the ability to utilize public-domain/open-source code for your graphics pipeline instead of having to reimplement everything atop your particular style of matrix. D3D's handedness insanity is a good example of how changing up your math can make it harder to reuse code - look at all the trouble you're going to!
Is memcpy really faster?
Also, for load identity, would memset(0) and then setting the other two to 1 be faster than what I have?
Sorry, I didn't mean to imply that memcpy is faster. It just seems silly to write out the copy by hand when you could do it in one statement with memcpy.
Memset and memcpy are going to translate into effectively the same code as writing it by hand. However, calling them is much easier than writing out a bunch of array operations, especially since writing out your operations by hand introduces a larger chance for error.
(Pedantic footnote: for large copies on the order of kilobytes or megabytes, the standard library implementations of memcpy/memset may use SIMD operations to actually outperform a hand-written copy. In the case of a transform matrix, the number of bytes is small enough to render this irrelevant.)
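To make the memcpy/memset suggestion concrete, here is a minimal sketch. The `Mat2x3` layout and the function names are hypothetical, since the original code isn't shown in this thread.

```c
#include <string.h>

/* Hypothetical 2x3 transform: A B X on the first row, C D Y on the second. */
typedef struct {
    float a, b, x;
    float c, d, y;
} Mat2x3;

/* One statement instead of six hand-written assignments. */
static void mat2x3_copy(Mat2x3 *dst, const Mat2x3 *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Zero everything, then set the two diagonal terms.
 * All-zero bytes read back as 0.0f on IEEE 754 platforms. */
static void mat2x3_load_identity(Mat2x3 *m)
{
    memset(m, 0, sizeof *m);
    m->a = 1.0f;
    m->d = 1.0f;
}
```

Plain struct assignment (`*dst = *src;`) does the same job as the memcpy here. Either way the size comes from the type, so there is no per-element indexing to get wrong.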
When dealing with graphics pipelines, correctness is often as important as performance. When you're trying to ship a demo by the end of the day and your graphics pipeline is freaking out in some bizarre manner, the last thing you want to be doing is digging through your math books trying to figure out if your matrix multiplication code is correct.