Author Topic: What is the fastest way to store VBO content?  (Read 4881 times)
digitalgibs
« on: April 29, 2010, 05:23:46 PM »

I am looking to store my vertex data in a VBO, but I am wondering what the best approach would be. Originally I created a vertex format that was more interlaced.

Code:
struct InterlacedVertex
{
    vec3f pos;
    vec3f normal, tan, binormal;
    vec2f texcoord;
    byte  color[4];
};

This worked okay, but it was pretty limiting. I couldn't use more than one set of texture coordinates, and my animation code was forced to upload the entire vertex array, since there was no easy way to update only the subset that changes (pos, normal, tan, binormal).
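To make the limitation concrete, here is a minimal sketch of how the offsets and stride of an interleaved format like the one above could be computed for calls such as glVertexAttribPointer. The vec3f, vec2f and byte definitions and the AttribDesc helper are stand-ins assumed for illustration; they are not from the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed stand-ins for the math types used in the post.
struct vec3f { float x, y, z; };
struct vec2f { float u, v; };
typedef std::uint8_t byte;

struct InterlacedVertex
{
    vec3f pos;
    vec3f normal, tan, binormal;
    vec2f texcoord;
    byte  color[4];
};

// Hypothetical description of one attribute: component count, byte offset
// inside a vertex, and byte stride from one vertex to the next.
struct AttribDesc
{
    int         components;
    std::size_t offset;
    std::size_t stride;
};

inline AttribDesc positionAttrib()
{
    return { 3, offsetof(InterlacedVertex, pos), sizeof(InterlacedVertex) };
}

inline AttribDesc texcoordAttrib()
{
    return { 2, offsetof(InterlacedVertex, texcoord), sizeof(InterlacedVertex) };
}

inline AttribDesc colorAttrib()
{
    return { 4, offsetof(InterlacedVertex, color), sizeof(InterlacedVertex) };
}

// Bytes per vertex that animation rewrites (pos, normal, tan, binormal).
// They come first in the struct, so this equals the offset of texcoord.
inline std::size_t animatedBytesPerVertex()
{
    return offsetof(InterlacedVertex, texcoord);
}
```

On a typical platform with 4-byte floats, the animated components take up the first 48 bytes of every 60-byte vertex. They are therefore scattered through the buffer in N separate pieces, which is why a single contiguous upload has to cover the whole array.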

Now I am thinking about creating separate arrays for each component, but this seems like it would be a cache nightmare.

VBO Memory Layout:
N = number of verts
M = number of uv sets
[N points][N normals][N tangents][N colors][N*M texcoords]

  • CON - for each vertex, the driver will have to sample across huge spans of memory...
  • PRO - can update subsets of the model without having to re-upload unchanged data.
  • PRO - can support multiple texture coordinates, now that it is not interlaced.

Code:
std::vector<vec3f> points;
std::vector<Tangents> normals;
std::vector<vec2f> texcoords;
std::vector<VertexColor> colors;
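
The packed layout can be sketched as plain offset arithmetic. The element sizes below (12-byte points and normals, 24-byte tangent entries holding tangent plus binormal, 4-byte colors, 8-byte texcoords) are assumptions for illustration, as are the names PackedLayout and makePackedLayout.

```cpp
#include <cassert>
#include <cstddef>

// Assumed element sizes in bytes; adjust to the real vertex types.
const std::size_t kPointBytes    = 12; // 3 floats
const std::size_t kNormalBytes   = 12; // 3 floats
const std::size_t kTangentBytes  = 24; // tangent + binormal, 3 floats each
const std::size_t kColorBytes    = 4;  // 4 bytes
const std::size_t kTexcoordBytes = 8;  // 2 floats

// Byte offsets of each block in the layout
// [N points][N normals][N tangents][N colors][N*M texcoords].
struct PackedLayout
{
    std::size_t pointsOffset;
    std::size_t normalsOffset;
    std::size_t tangentsOffset;
    std::size_t colorsOffset;
    std::size_t texcoordsOffset;
    std::size_t totalBytes;
};

inline PackedLayout makePackedLayout(std::size_t numVerts, std::size_t numUvSets)
{
    assert(numUvSets >= 1);
    PackedLayout layout;
    layout.pointsOffset    = 0;
    layout.normalsOffset   = layout.pointsOffset   + numVerts * kPointBytes;
    layout.tangentsOffset  = layout.normalsOffset  + numVerts * kNormalBytes;
    layout.colorsOffset    = layout.tangentsOffset + numVerts * kTangentBytes;
    layout.texcoordsOffset = layout.colorsOffset   + numVerts * kColorBytes;
    layout.totalBytes      = layout.texcoordsOffset
                           + numVerts * numUvSets * kTexcoordBytes;
    return layout;
}

// Size of the contiguous range that animation rewrites: points, normals and
// tangents sit at the front, so the range is [0, colorsOffset).
inline std::size_t animatedRangeBytes(const PackedLayout& layout)
{
    return layout.colorsOffset;
}
```

Because the animated blocks sit next to each other at the front, that range could be refreshed with a single glBufferSubData(target, offset, size, data) call while the colors and texcoords stay untouched.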
David Pittman
« Reply #1 on: April 30, 2010, 08:20:01 AM »

Are you implying that those four std::vectors would be of different lengths (or that a single vertex might have a different index of each)? Because AFAIK you can't use separate indices for different elements of a vertex.

digitalgibs
« Reply #2 on: May 01, 2010, 06:36:39 PM »

Yes, all the arrays would be the same length. The only difference is that each type (pos, normal, etc.) would be stored as a packed array in the VBO and not interlaced. I just wasn't sure if that was smart...

I'm curious if anyone out there has tried both ways and noticed any performance differences.  I'll most likely be using it for a lot of dynamic geometry.
zacaj
« Reply #3 on: May 03, 2010, 04:46:56 AM »

I haven't tried it myself, but, at least on the iPhone, interlaced is supposed to be faster.

westquote
« Reply #4 on: May 03, 2010, 08:45:57 PM »

I think you mean 'interleaved', not 'interlaced'.

Interleaved data should generally be more efficient with regard to cache coherency and cache misses. What I do in my codebase is define a runtime structure that maps components to offsets within vertex buffers. This allows me to interleave the data all I want, or break it into separate buffers, or any combination thereof. I tend to put all my static components (those that do not change every frame) into a single interleaved buffer, and all my dynamic components (those that will change every frame) into a single different interleaved buffer. This lets me take advantage of the static buffer being cached on the hardware, which is a nice bandwidth savings, while still interleaving the data that is going to change every tick.
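
One way such a runtime mapping might look is sketched below. This is not westquote's actual code; VertexLayout, Component and AttribBinding are hypothetical names, and the sketch only tracks which buffer each component lives in and at what byte offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Component { Position, Normal, Texcoord, Color };

// Where one component lives: which vertex buffer, and its byte offset
// inside each interleaved vertex of that buffer.
struct AttribBinding
{
    Component   component;
    int         buffer;
    std::size_t offset;
};

class VertexLayout
{
public:
    // Appends a component to a buffer, interleaved after whatever that
    // buffer already holds.
    void add(Component component, int buffer, std::size_t sizeBytes)
    {
        assert(buffer >= 0);
        const std::size_t index = static_cast<std::size_t>(buffer);
        if (strides_.size() <= index)
            strides_.resize(index + 1, 0);
        bindings_.push_back({ component, buffer, strides_[index] });
        strides_[index] += sizeBytes;
    }

    // Returns the binding for a component, or nullptr if it was never added.
    const AttribBinding* find(Component component) const
    {
        for (const AttribBinding& binding : bindings_)
            if (binding.component == component)
                return &binding;
        return nullptr;
    }

    // Bytes per vertex in the given buffer (sum of its component sizes).
    std::size_t strideOf(int buffer) const
    {
        assert(buffer >= 0 && static_cast<std::size_t>(buffer) < strides_.size());
        return strides_[static_cast<std::size_t>(buffer)];
    }

private:
    std::vector<AttribBinding> bindings_;
    std::vector<std::size_t>   strides_;
};
```

With static components added to buffer 0 and dynamic ones to buffer 1, the renderer would bind each VBO in turn and pass the stored offset and strideOf(buffer) to glVertexAttribPointer. Only buffer 1 would then need re-uploading each frame.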

Remember that OpenGL provides you with all these options because there is no globally-optimal choice on all hardware for all use cases.  Particles, sprites, static meshes, dynamic geometry, skeletally-animated meshes, etc... all represent different optimization spaces, and you will most likely find that the best data regarding optimization is the data you gather yourself.  I certainly would not trust anyone else's data to the point of architecting my graphics engine around it without verifying their findings locally first.

As for all current iPhone hardware, it is confusing to say that interleaved VBOs are "supposed to be faster."  VBOs do not offer you any performance gain on the iPhone over normal vertex arrays, as they are not cached in video memory.  Here is an analysis that breaks this fact down fairly clearly:

http://blackpixel.com/blog/399/iphone-vertex-buffer-object-performance/

VBOs aside, as I mentioned above, the performance gains for a given use case can't be broken down into a heuristic as simple as "always use interleaved VBOs".  Also, the performance gains associated with interleaving are unlikely to be truly significant except in very specific situations.

I recommend the following page to anyone interested in getting off to a good start in learning about vertex specifications:

http://www.opengl.org/wiki/Vertex_Specification_Best_Practices

Do post your findings as you continue your experiments! :)

Will Vale
« Reply #5 on: May 04, 2010, 03:41:00 PM »

One pattern I've seen quite often is splitting your vertex data into a "position only" stream and an "everything else" stream. This makes it easy to do cheaper rendering for shadow maps, depth prime, etc. since you can just bind the positions in that case.
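
A minimal sketch of that split, done on the CPU before upload, might look like this. The Vec3, Vec2, FullVertex and Attributes types are assumed for illustration, and a real vertex would likely carry more than a normal and one UV set.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Vec2 { float u, v; };

// Assumed source format: one fully interleaved vertex.
struct FullVertex { Vec3 pos; Vec3 normal; Vec2 uv; };

// The "everything else" stream.
struct Attributes { Vec3 normal; Vec2 uv; };

struct SplitStreams
{
    std::vector<Vec3>       positions;  // bound alone for depth-only passes
    std::vector<Attributes> attributes; // bound as well for full shading
};

// Splits interleaved vertices into a position-only stream and an
// "everything else" stream, preserving vertex order so one index
// buffer still works for both.
inline SplitStreams splitStreams(const std::vector<FullVertex>& verts)
{
    SplitStreams out;
    out.positions.reserve(verts.size());
    out.attributes.reserve(verts.size());
    for (const FullVertex& v : verts)
    {
        out.positions.push_back(v.pos);
        Attributes rest = { v.normal, v.uv };
        out.attributes.push_back(rest);
    }
    assert(out.positions.size() == out.attributes.size());
    return out;
}
```

Each stream would go into its own VBO. A shadow-map or depth pass then fetches only 12 bytes per vertex instead of the full interleaved size.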

More generally, the hardware companies provide advice on this kind of thing: ATI/AMD and NVIDIA have optimisation guides on their developer sites. That'd be my first port of call. I seem to recall one common point was to fit your vertex size to a multiple of the IA cache line size (and keep it as small as possible using compressed components) to avoid overfetching.
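
As a rough illustration of compressed components, the sketch below packs normals and tangents into signed bytes and texture coordinates into 16-bit values so that a vertex with two UV sets comes to 32 bytes. The 32-byte target is only an assumed example figure; the right size depends on the hardware and should come from the vendor guides. PackedVertex, packSnorm8 and packUnorm16 are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Maps a float in [-1, 1] to a signed byte in [-127, 127].
inline std::int8_t packSnorm8(float value)
{
    if (value < -1.0f) value = -1.0f;
    if (value >  1.0f) value =  1.0f;
    return static_cast<std::int8_t>(std::lround(value * 127.0f));
}

// Maps a float in [0, 1] to an unsigned 16-bit value in [0, 65535].
inline std::uint16_t packUnorm16(float value)
{
    if (value < 0.0f) value = 0.0f;
    if (value > 1.0f) value = 1.0f;
    return static_cast<std::uint16_t>(std::lround(value * 65535.0f));
}

// The packed attributes would be declared to OpenGL as normalized integer
// types (for example GL_BYTE or GL_UNSIGNED_SHORT with normalized = GL_TRUE),
// so the shader still sees floats.
struct PackedVertex
{
    float         pos[3];     // 12 bytes, kept at full precision
    std::int8_t   normal[4];  //  4 bytes, xyz + padding
    std::int8_t   tangent[4]; //  4 bytes, xyz + handedness sign
    std::uint16_t uv0[2];     //  4 bytes
    std::uint16_t uv1[2];     //  4 bytes, second UV set
    std::uint8_t  color[4];   //  4 bytes
};

static_assert(sizeof(PackedVertex) == 32, "PackedVertex is meant to be 32 bytes");
```

The binormal is dropped here on the assumption that the shader can rebuild it from the normal, the tangent and a sign stored in tangent[3]. Whether the lost precision is acceptable is something to verify on real assets.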

HTH,

Will