How do you glue together differently sorted arrays in data-oriented programming (structs of arrays) for optimal locality of reference?
First of all, let me say this is for educational purposes. The topic is optimizing data layout for locality of reference (CPU caching) and CPU branch prediction.
I'm not trying to optimize prematurely without a measured bottleneck, so please leave that kind of comment aside.
I'm using C++11 but will write examples in pseudo-code. If something doesn't make sense, please point it out!
Data-oriented programming uses separate arrays for each of an object's components, improving CPU caching and prediction by keeping related data compact and hot in the cache.
But sometimes I need the same components in different orders (sorts) for different optimizations, e.g. a particle with update data and render data.
If I sort new particles at insertion by time of death, I can skip a lot of branch mispredictions while updating:
```
array updateData = [alive] + [just died] + [dead]

now = time()
for (i = 0 to nAlive) {           // mostly true
    part = updateData[i]          // sequential array access: cached and fast
    if (part.timeOfDeath > now) { // mostly true
        part.update()
    } else {                      // mispredict once
        for (j = i to nAlive) {   // everything from i up to nAlive is dead
            destroy(j)
        }
        nAlive = i
        break
    }
}
```
But if I want rendering to stay in first-in, depth-sorted order (vertices) to avoid flicker, the particles may die in a different order than they were drawn.
(I mostly can't use an OpenGL depth buffer because alpha blending won't work that way.)
The render array is also kept partitioned into visible/invisible to avoid branch mispredictions:

```
array renderData = [visible (depth ordered)] + [invisible]

for (i = 0 to nVisible) { // mostly true
    render(renderData[i])
}
```
Sometimes I don't even control the black-box internal ordering of the data: e.g. Box2D pools its b2Body objects itself, and I can't predict how its internal allocation orders them. I also need to write vertex data in a specific format convenient for the graphics card.
So my problem is: I can keep an index mapping from update data to render data (or vice versa), but when one array is used to update the other (e.g. physics results written into vertex data), the accesses start jumping all around in memory.
That mapping index also has to be kept up to date on removal (death of particles) if both arrays are to stay compact while sorted by different criteria.
A pass like this runs between update and render:

```
for (i = 0 to nAlive) {                   // mostly true
    udata = updateData[i]
    rdata = renderData[udata.renderIndex] // different order: memory jumps
    rdata.vertex = udata.position
    rdata.color  = udata.color
    // ...
}
```
How is this handled in frameworks and engines? Which trade-off is better:

- branch mispredictions on alive/dead checks, keeping both arrays in matching depth order, or
- jumping around in memory while updating the other array's data? (I can't see another option for black-box frameworks.)
I'd greatly appreciate if you shed some light on this.
I must admit that even after reading a lot about CPU caching and branch prediction, I don't completely understand them yet, nor do I have the empirical knowledge to apply them in more complex scenarios.
Thanks for your time.