TIGSource Forums › Developer › Technical (Moderator: ThemsAllTook) › General thread for quick questions
Author Topic: General thread for quick questions  (Read 132730 times)
BorisTheBrave
« Reply #960 on: February 18, 2017, 03:57:40 PM »

Ugh... hashmaps are such a pain. I found another perfect non-minimal hashmap implementation, but it frequently failed for small inputs. I tried a non-perfect hash function, but the lookup table was huge and I had to dynamically reallocate buckets to resolve collisions.

In the end I just settled on qsort + binary search for O(log n) access. Not optimal, but it lets me verify whether entity names exist, and I can replace it later without affecting the rest of my code.
I simply don't believe that a bog-standard hash map was too slow for your use case. I think you're overcomplicating things, or trying to write too much from scratch.
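For reference, the qsort + binary search approach described in the quote can be sketched in C++ like this (the EntityTable name and its fields are illustrative, not from the original code):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical entity-name table: sort once, then answer membership
// queries with binary search in O(log n).
struct EntityTable {
    std::vector<std::string> names;

    // Call once after all names are inserted.
    void finalize() { std::sort(names.begin(), names.end()); }

    // Binary search via std::lower_bound; requires finalize() first.
    bool contains(const std::string& name) const {
        auto it = std::lower_bound(names.begin(), names.end(), name);
        return it != names.end() && *it == name;
    }
};
```

Since lookups go through one function, swapping in a hash map later only means changing `finalize` and `contains`, not the call sites.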
oahda
« Reply #961 on: February 19, 2017, 03:02:29 AM »

So how do people deal with serialisation when it comes to passing data between 32-bit and 64-bit systems, a serialisation that might have been produced on one system and then read on another? Might it be as simple as sticking to types of known size (in C++, like using std::int32_t instead of int and avoiding using those that are not available on both) or does it get more involved than that?

JWki
« Reply #962 on: February 19, 2017, 04:54:11 AM »

So how do people deal with serialisation when it comes to passing data between 32-bit and 64-bit systems, a serialisation that might have been produced on one system and then read on another? Might it be as simple as sticking to types of known size (in C++, like using std::int32_t instead of int and avoiding using those that are not available on both) or does it get more involved than that?

Sticking to explicitly sized types available on both will do the trick.
It's a good idea in general to use explicit type sizes almost everywhere, but for anything that gets serialized, sent over the network, etc., it's crucial.
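As a sketch of what that looks like in practice (struct and field names are made up for illustration):

```cpp
#include <cstdint>

// Platform-dependent: these widths can differ between targets.
struct SaveHeaderBad {
    int version;        // usually 32-bit, but not guaranteed
    long entity_count;  // 32-bit on 64-bit Windows, 64-bit on 64-bit Linux
};

// Fixed-width: the same widths on every conforming compiler,
// so 32-bit and 64-bit builds agree on the data.
struct SaveHeaderGood {
    std::int32_t version;
    std::int64_t entity_count;
};
```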
ThemsAllTook
« Reply #963 on: February 19, 2017, 04:15:16 PM »

As long as you're not serializing pointers or raw memory dumps, you shouldn't run into any problems. Just be vigilant about types that change size between architectures, struct packing, and endianness. Also remember that 64-bit numbers have pretty much always been available on 32-bit systems with types like long long and double.
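A common way to handle the endianness part (a general technique, not something specific to this thread) is to read and write integers byte by byte in a fixed order, so the host's byte order never matters. A minimal sketch:

```cpp
#include <cstdint>

// Serialize a 32-bit value as little-endian bytes, regardless of
// the host CPU's endianness.
inline void write_u32_le(std::uint8_t* out, std::uint32_t v) {
    out[0] = static_cast<std::uint8_t>(v);
    out[1] = static_cast<std::uint8_t>(v >> 8);
    out[2] = static_cast<std::uint8_t>(v >> 16);
    out[3] = static_cast<std::uint8_t>(v >> 24);
}

// Reassemble the value from the same fixed byte order.
inline std::uint32_t read_u32_le(const std::uint8_t* in) {
    return static_cast<std::uint32_t>(in[0])
         | static_cast<std::uint32_t>(in[1]) << 8
         | static_cast<std::uint32_t>(in[2]) << 16
         | static_cast<std::uint32_t>(in[3]) << 24;
}
```

Because the byte order is pinned down by the shifts rather than by memcpy of raw memory, the same file round-trips between big- and little-endian machines.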

JWki
« Reply #964 on: February 20, 2017, 01:18:04 AM »

What 32-bit systems are you interested in anyway? On desktop you can pretty much ignore them, looking at the Steam hardware surveys.
I don't even do 32-bit builds anymore.
« Last Edit: February 20, 2017, 08:45:33 AM by JWki »
oahda
« Reply #965 on: February 23, 2017, 07:34:50 AM »

Dunno, just checking!

oahda
« Reply #966 on: February 23, 2017, 08:10:21 AM »

I know I should just benchmark the options and see, but it's always nice to hear people's thoughts...

I'm just barely starting to learn about cache stuff, so I really have no idea what I'm talking about, but... School me!

Say I have a hierarchy of objects in my scene (a tree), and I have to allocate each one on the heap dynamically; no way around it right now. I need to iterate recursively over all of them every frame.

Considering things like cache misses, alongside the fact that I have to put stuff on the heap anyway, is it better to just connect them as a linked list and jump from pointer to pointer, or will I still be able to take advantage of cache hits if I pack the pointers in a contiguous array and iterate over it, even though I have to jump to dynamic memory (which is not contiguous) from each element anyway?

Consider also that stuff will be inserted into and removed from the list fairly often, which may make the linked list the better option. But I sometimes hear people say linked lists are basically never the best option and should mostly just be avoided, so I was wondering whether caches had anything to do with that.

Doesn't Box2D use a linked list for its objects (or used to)?

qMopey
« Reply #967 on: February 23, 2017, 08:22:00 AM »

Box2D is not very cache optimized at all. Internally it's so cache unfriendly that on every update it builds giant temporary arrays of state by copying pieces out of each live body, uses those arrays to solve collisions, and then copies the results back into the bodies. It's pretty inefficient, but for 2D games this often doesn't matter.

As for your tree scenario: if you can get the tree small enough to fit 100% inside the CPU cache, then even if you jump around in random-access fashion it will be very fast. The idea is to get all memory touched at a certain time into the CPU cache. This happens naturally linearly looping over arrays of any size, but can also happen if the amount of data touched is small enough. Then the memory access can be anything, not just linear array traversals.

Just googled the i7 L1 cache size (the fastest cache); it says 64 KB per core. That's quite a bit, but it's also shared with the operating system and potentially other processes, so as a very rough estimate say you have maybe 32 KB of L1 to play with. If your algorithm's working set fits into 32 KB (you can get it all into the cache without too many misses), you're going to have some very fast code. A typical cache line is 64 bytes, so if every single cache line pulled into L1 misses, that's 512 cache misses. If a cache miss takes, say, 150 cycles on average (idk, I'm making this number up), you'll be staring at 76,800 cycles of overhead. The algorithm itself might only take 10k cycles, so effectively a 7x slowdown if that's the case.

Anyway for 2D games this stuff does not matter too much except in a few spots, like collision broadphase, or maybe one or two other places... Like simulating cloth, or particles on the CPU. Or maybe running tons of pathfinding/raycasts.
« Last Edit: February 23, 2017, 08:30:43 AM by qMopey »
qMopey
« Reply #968 on: February 23, 2017, 08:37:06 AM »

Anyway caches are simple. A cache miss means a cache line is pulled into the cache. Typically a cache line is 64 bytes. There are different levels of the cache. Pulling something from L3 to L2 is much less expensive than pulling a line from RAM to L3. Pulling a line from L2 to L1 is very fast.

The CPU will start prefetching memory for you if it notices you're looping over an array, even if you're looping in a weird way (like for (int i = 0; i < x; i += stride / 4)); CPUs automatically prefetch linear traversals. A successful prefetch (the cache line arrives before it's needed) avoids the cost of a cache miss. Unsuccessful prefetches are quite bad, since pulling a cache line in evicts another. This makes manual prefetching very tricky.

For example, say we want to sort an array of integers. It can be smart to first make sure the array is well under 32 KB in size. Then we can loop over the elements and touch them all (this pulls each element into the cache, a cache line at a time). Running the sort algorithm after that, even with random accesses, will be very fast. Similarly, if the array is still in the cache, doing a binary search on it after sorting will also be very fast. A tree broadphase can work this way.
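A minimal sketch of that idea (the size and fill pattern are arbitrary choices for illustration): 4096 32-bit ints is 16 KB, comfortably under a typical 32 KB L1 data cache.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Keep the working set small (16 KB here), touch every element once so it
// is cache-resident, then do the random-access work (the sort).
std::vector<std::int32_t> make_and_sort() {
    std::vector<std::int32_t> v(4096);  // 4096 * 4 bytes = 16 KB
    for (std::size_t i = 0; i < v.size(); ++i) {
        // Pseudo-random fill; writing each element also warms the cache.
        v[i] = static_cast<std::int32_t>((i * 2654435761u) & 0x7fffffff);
    }
    std::sort(v.begin(), v.end());  // random accesses, but all within cache
    return v;
}
```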
JWki
« Reply #969 on: February 23, 2017, 08:58:16 AM »

I know I should just benchmark the options and see, but it's always nice to hear people's thoughts...

I'm just barely starting to learn about cache stuff, so I really have no idea what I'm talking about, but... School me!

Say I have a hierarchy of objects in my scene (a tree), and I have to allocate each one on the heap dynamically; no way around it right now. I need to iterate recursively over all of them every frame.

Considering things like cache misses, alongside the fact that I have to put stuff on the heap anyway, is it better to just connect them as a linked list and jump from pointer to pointer, or will I still be able to take advantage of cache hits if I pack the pointers in a contiguous array and iterate over it, even though I have to jump to dynamic memory (which is not contiguous) from each element anyway?

Consider also that stuff will be inserted into and removed from the list fairly often, which may make the linked list the better option. But I sometimes hear people say linked lists are basically never the best option and should mostly just be avoided, so I was wondering whether caches had anything to do with that.

Doesn't Box2D use a linked list for its objects (or used to)?

To answer your question more directly: linked lists are never what you want if you ever iterate over all the elements, and you do in this case, so you really don't want a linked list.
However, iterating over pointers that point to random heap locations is no better.

What matters is where the actual data lives and you want that to be in contiguous memory locations.

What's your reason for having to heap allocate the objects individually?
oahda
« Reply #970 on: February 23, 2017, 09:14:30 AM »

Very informative! Thanks a lot. c:

So how much does dynamic vs. non-dynamic memory play into this? Is it always faster, and significantly faster, to iterate over a bunch of non-pointers (so that the entire contents of the array, and everything in it that I access, is contiguous in memory) than over a bunch of pointers to the same data (where only the pointers themselves, but not the data blocks they point to, are contiguous, although the data in each block may itself be contiguous)? If so, why, and if not, why not? Is it worth worrying about if the latter solution might be easier to work with and more flexible?

Concrete examples:

Non-pointers:

Code:
struct Data
{
    int a;
    …
};

std::vector<Data> data;



for (auto &i : data)
{
    i.a *= 2;
    …
}

Pointers:

Code:
struct Data
{
    int a;
    …
};

std::vector<Data *> data;



for (auto &i : data)
{
    i->a *= 2;
    …
}

I'm fairly sure the latter is worse, but I can't say I know the exact low-level reason besides "contiguous memory is good".



@JWki:
Flexibility. I'm actually pondering ways to go about it differently without losing too much flexibility, but I wanted this to be the situation in my question for now, if nothing else so that I can learn more about caching, which is one of those topics I still haven't really explored but decided it might be time for.

I'll probably come back with a different question if I conclude that the other idea is flexible enough and can actually work.

I guess one of my biggest problems when it comes to avoiding those pointers is how to support different configurations of data in the same list without losing information. I guess the only solution is to keep one list per type of data block, but if I end up having lots of different types, maybe that will end up less efficient after all, if I have to do lots of separate iterations anyway? I have some learning to do!

qMopey
« Reply #971 on: February 23, 2017, 09:21:24 AM »

The latter is worse because you touch pointer memory, and then touch the data memory. That's a lot more memory to touch, and fills up the cache more. Also the CPU may have trouble prefetching the data array for you, and instead only prefetch the pointer array. That means you might get a bunch of misses during the loop.

For 2D games this stuff usually doesn't matter. Starting with a flexible solution is a pretty good idea; then, when you hit a bottleneck and get experience with performance problems, applying knowledge about the cache becomes important.
JWki
« Reply #972 on: February 23, 2017, 09:25:43 AM »

So just to go into the heap allocation thing one more time - if you have to allocate your objects with new for some reason, that's completely compatible with putting them in memory in a contiguous fashion.
Placement new exists for that. Just allocate the memory manually and placement new them into it.
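A minimal sketch of that pattern (the NodePool name, fixed capacity, and Node type are illustrative; a real pool would also handle growth, alignment of mixed types, and error cases):

```cpp
#include <cstddef>
#include <new>  // placement new

struct Node {
    int value;
    explicit Node(int v) : value(v) {}
};

// One contiguous buffer; objects are constructed into it in place,
// so iteration touches adjacent memory even though each object
// went through "new".
struct NodePool {
    alignas(Node) unsigned char storage[16 * sizeof(Node)];
    std::size_t count = 0;

    Node* create(int v) {
        // Placement new: runs the constructor at an address we chose.
        Node* n = new (storage + count * sizeof(Node)) Node(v);
        ++count;
        return n;
    }

    ~NodePool() {
        // Placement new means we must call destructors manually.
        for (std::size_t i = count; i > 0; --i)
            reinterpret_cast<Node*>(storage + (i - 1) * sizeof(Node))->~Node();
    }
};
```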
qMopey
« Reply #973 on: February 23, 2017, 09:28:31 AM »

So just to go into the heap allocation thing one more time - if you have to allocate your objects with new for some reason, that's completely compatible with putting them in memory in a contiguous fashion.
Placement new exists for that. Just allocate the memory manually and placement new them into it.

To add to this: you can also just call Init( ptr ) instead of placement new. Then it also works in C, you don't have to include <new>, and you can take a pointer to the Init function -- not possible with constructors.
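A sketch of that Init-function pattern (Body and BodyInit are made-up names):

```cpp
// C-compatible alternative to a constructor: a plain Init function
// that sets up an object in storage the caller already owns.
struct Body {
    float x, y;
};

void BodyInit(Body* b) {
    b->x = 0.0f;
    b->y = 0.0f;
}

// Unlike a constructor, the initializer's address can be taken and
// stored, e.g. in a table of per-type init routines.
void (*init_fn)(Body*) = &BodyInit;
```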
gimymblert
« Reply #974 on: February 23, 2017, 09:31:09 AM »

Quote
So how much does dynamic and non-dynamic memory play into this? Is it always faster, and significantly faster, to iterate over a bunch of non-pointers (so that the entire contents of the array and everything in it that I access is continuous in memory) than over a bunch of pointers to the same data (where only the pointers themselves, but not the data blocks they point to, will be continuous, although the data in each block may itself be continuous), and if so, why, or if not, why not? Is it worth worrying about if the latter solution might be easier to work with and more flexible?

I don't deal with that level of programming yet, but it sounds like the AoS (array of structures) vs. SoA (structure of arrays) problem...

Here are some videos, if they're any help, from the series Math for Game Programmers by Jorge Rodriguez:
https://www.youtube.com/watch?v=byyda5ZX2V4
Code for Game Developers - The Memory Hierarchy
https://www.youtube.com/watch?v=qZTt8JtQOmw
Code for Game Developers - Cache Levels
https://www.youtube.com/watch?v=B4DuT61lIPU
Code for Game Developers - Cache Levels Demonstration
https://www.youtube.com/watch?v=ScvpoiTbMKc
Code for Game Developers - SoA vs AoS
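For concreteness, a minimal sketch of the two layouts (the particle fields are made up for illustration):

```cpp
#include <vector>

// Array of Structures (AoS): each particle's fields sit together,
// so a loop over positions also drags lifetime through the cache.
struct ParticleAoS {
    float x, y;
    float lifetime;
};
std::vector<ParticleAoS> particles_aos;

// Structure of Arrays (SoA): each field gets its own contiguous array,
// so a loop that only touches x and y never loads lifetime at all.
struct ParticlesSoA {
    std::vector<float> x, y;
    std::vector<float> lifetime;
};
```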


JWki
« Reply #975 on: February 23, 2017, 09:33:09 AM »

So just to go into the heap allocation thing one more time - if you have to allocate your objects with new for some reason, that's completely compatible with putting them in memory in a contiguous fashion.
Placement new exists for that. Just allocate the memory manually and placement new them into it.

To add to this: you can also just call Init( ptr ) instead of placement new. Then it also works in C, you don't have to include <new>, and you can take a pointer to the Init function -- not possible with constructors.

That doesn't work when you need a constructor to be called, which seems to be the requirement at hand.
oahda
« Reply #976 on: February 23, 2017, 09:36:49 AM »

As I thought, then. Good to confirm. Thanks again!

I know well not to prematurely optimise—I just ask questions from time to time to increase my knowledge, even if I won't end up using that knowledge. c:

I actually have ideas right now for a bit of a hybrid system that only uses the flexible stuff where desired (mostly useful for testing things quickly, or for a couple of one-off things tied to particular gameplay, kind of like where you would and wouldn't use scripting) and a less flexible system for the bulk of the game. We'll have to see what happens, but this knowledge is nice food for thought. It's always nice to think about things in new ways.

(it's a 3D game I'm setting stuff up for BTW)

@JWki:
I was about to ask about / look up placement new, actually! Another thing I've barely ever used. Was going to experiment with or ponder that for a bit as well.

qMopey
« Reply #977 on: February 23, 2017, 09:42:23 AM »

@JWki:
I was about to ask about / look up placement new, actually! Another thing I've barely ever used. Was going to experiment with or ponder that for a bit as well.

somebody is psychic
JWki
« Reply #978 on: February 23, 2017, 09:55:21 AM »

@JWki:
I was about to ask about / look up placement new, actually! Another thing I've barely ever used. Was going to experiment with or ponder that for a bit as well.

somebody is psychic

No, I just still use constructors a lot AND do memory allocation myself most of the time, so I could anticipate the need for that.

Looking forward to seeing what Prinsessa is prepping, in any case.
oahda
« Reply #979 on: February 26, 2017, 11:23:00 AM »

Been trying to find some reading, but I don't feel like every question has been answered.

Memory-wise, I know virtual inheritance isn't a huge blow, since there is only one vtable per class and each instance only stores a pointer to it. My question is about virtual functions. The way I've managed to organise things right now, my code always knows the exact type of each object, and the objects do not derive from a common base class, so there is no virtual dispatch involved.

However, there are a couple of functions I would like to have on objects that each could implement individually. The way it works right now I can actually have non-virtual inheritance and just hide functions with the same name in a non-virtual base class since I'm always operating on the exact (in this scenario derived) type, and so there is no confusion: the redefined function always gets called. But this might be bad practice.

I wonder, if the compiler always sees the exact type and never needs to resolve virtual calls down the chain, but can go directly to the version it needs to call, is this any less efficient than hiding instead of overriding? How does virtual inheritance and overridden functions work out in memory compared to the non-virtual version? Is it worse for caching? And will the compiler also be able to inline some of these virtual functions since it knows the exact type and no resolution needs to happen?
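One concrete compiler behavior relevant here (general C++, not specific to any engine or to this code): when the exact type is statically known, or a class is marked final, the compiler is allowed to devirtualize the call and then inline it. A sketch with illustrative names:

```cpp
struct Entity {
    virtual ~Entity() = default;
    virtual int update() { return 0; }
};

// 'final' promises no further overrides exist, so even calls through a
// Player reference or pointer can be resolved statically (devirtualized)
// and then inlined, just like a non-virtual call.
struct Player final : Entity {
    int update() override { return 42; }
};

int tick(Player& p) {
    // The dynamic type can only be Player here, so no vtable lookup
    // is required; the compiler may call (or inline) Player::update
    // directly.
    return p.update();
}
```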
