Show Posts

1 | Developer / Technical / Re: The grumpy old programmer room | on: February 22, 2011, 10:05:17 PM
So code an editor with as little GUI as possible. Make it all cryptic hotkeys and mouse motions; it's quite fun that way and acceptable for in-house tools.
Sounds like an emacs mode... 

3 | Developer / Technical / Re: The happy programmer room | on: February 21, 2011, 10:28:54 AM
Man, I use set and map for, like, *everything*.
Aside: This is near and dear to my heart, since I work on a C++ compiler written in C++ for my day job. std::set and std::map are designed for generality rather than for performance, so in that sense they're very convenient. They provide very strong guarantees about iterator invalidation, for instance, which makes them quite easy to use. The downside, though, is that they're not generally a good choice for performance, except in certain rare edge cases. If you notice a lot of time being spent in malloc/free/new/delete in the bottom-up graph of your profiler, consider using a hashtable instead.
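As a rough sketch of that swap (assuming a C++11 compiler; in the TR1 era you'd reach for std::tr1::unordered_map or a vendor hash_map instead - none of these names are from the original post):

#include <map>
#include <string>
#include <unordered_map>

int main() {
    // std::map: node-based tree, one allocation per insert, O(log n) lookups,
    // but iterators and references stay valid as other elements come and go.
    std::map<std::string, int> ordered_counts;

    // std::unordered_map: hash table with O(1) average lookup. Note that most
    // standard-library implementations are still node-based, so for truly
    // allocation-bound code an open-addressing (dense/probed) table wins bigger.
    std::unordered_map<std::string, int> hashed_counts;
    hashed_counts.reserve(1024);  // pre-size the bucket array to avoid rehashing

    for (int i = 0; i < 1000; ++i) {
        ordered_counts["frame"] += 1;
        hashed_counts["frame"] += 1;
    }
    return 0;
}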

4 | Developer / Technical / Re: The happy programmer room | on: February 21, 2011, 10:07:44 AM
Right, I believe typically you'd malloc a big block of memory upfront and then manage it yourself after that.
Yes, this is exactly how you do it. Implementing your own general-purpose allocator isn't too common, since the allocator that comes with your OS tends to be pretty decent (at least on Unix-like OSes), and writing an allocator that performs well globally is pretty tricky.

That said, writing custom allocators for special-purpose use is a common technique in C++, usually called pool or slab allocation. Imagine you have some scenario where you know you're going to allocate a bunch of Foo objects, manipulate them, and then free them all. Rather than calling malloc()/free() for each Foo object, you write your own allocator that uses malloc()/free() to allocate a single large chunk of memory, saving the overhead of calling them once per object. Then you allocate your objects out of that slab, typically skipping the costlier parts of a normal allocator, like trying to reuse space after an object is dead. You just let the final free() of the slab get rid of all of them at once.

This is one of the reasons why hash tables generally crush std::map on performance: std::map is a balanced binary tree, and performs at least one allocation per insertion. Hash tables, by contrast, allocate a single large array to act as their backing store, and re-allocate a larger one only when they run out of space.
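A minimal sketch of that pool/slab pattern (Foo and FooPool are made-up names for illustration; a real one would handle alignment of odd types, growth, and non-trivial destructors):

#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical object type with a common lifetime.
struct Foo {
    float x, y, z;
    int id;
};

// One big malloc up front, bump-allocate Foo-sized slots out of it,
// and release everything with a single free() when the pool goes away.
class FooPool {
public:
    explicit FooPool(std::size_t capacity)
        : buffer_(static_cast<char*>(std::malloc(capacity * sizeof(Foo)))),
          capacity_(capacity),
          used_(0) {}

    ~FooPool() { std::free(buffer_); }  // one free() reclaims every Foo at once

    Foo* allocate() {
        if (used_ == capacity_) return 0;           // slab exhausted
        void* slot = buffer_ + used_ * sizeof(Foo);
        ++used_;
        return new (slot) Foo();                    // placement-new into the slab
    }

private:
    char* buffer_;          // the single large chunk from malloc()
    std::size_t capacity_;
    std::size_t used_;
};

int main() {
    FooPool pool(1024);
    for (int i = 0; i < 100; ++i) {
        Foo* f = pool.allocate();  // no per-object malloc()/free()
        if (f) f->id = i;
    }
    return 0;  // the pool destructor frees the whole slab in one call
}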

6 | Developer / Technical / Re: The grumpy old programmer room | on: February 21, 2011, 01:53:00 AM
Also a minor nitpick, sorry:
T1: Write A, Write B
T2: Wait for A, Read B
Should that be
T1: Write B, Write A
T2: Wait for A, Read B
as the case which can get broken by reordering? As posted, it's broken if the writes execute serially as written.
Yes, sorry. That's what happens when I write things while sleepy.

7 | Developer / Technical / Re: The happy programmer room | on: February 21, 2011, 01:51:54 AM
Face it, boy. GC is for whimps.
*shrug* I don't use GC in my own projects, but I don't think it's necessary to deride those who do. Plus, there's value in understanding how it works, why it works, and under what circumstances it works well/poorly.

8 | Developer / Technical / Re: The happy programmer room | on: February 21, 2011, 01:46:12 AM
5x as much memory for minor speed increases could be better used in speeding up whatever algorithm you're working with (caching values, lookup tables, etc) rather than marginally improving allocation/deallocation speed.
Actually, value caches or lookup tables would tend to be more or less invisible to the GC, since their lifetime is effectively infinite. A generational GC will promote them to a long-lived generation and then ignore them 99% of the time. :-)

The premise of a copying GC is that, when it does a collection, it'll copy/compact your objects around in memory. This tends to be good for run time, but the cost is that the GC needs sufficient free space to copy/compact your objects into. The takeaway of the study is that, to achieve runtime parity, the GC needs approximately 5x the size of the objects being copied/compacted. The long-lived generation won't get copied/compacted much, so it won't have a significant impact on the overall necessary free space.

That said, I wasn't really trying to convince anyone that they should use GC in order to gain performance. The claim was that modern GCs are as fast as, or faster than, explicit memory management. As the paper showed, that's true, but only under fairly ideal circumstances. If you're targeting a smartphone, for instance, where available physical memory is quite constrained, your mileage will likely vary.

9 | Developer / Technical / Re: The happy programmer room | on: February 21, 2011, 01:23:36 AM
I don't believe you. This sounds false. Show me data. There are lots of reasons I can think of to use languages with built-in GC, but execution speed is not one. In most cases (i.e. cases where you're not building a big garbage collection system manually or doing heavy optimisations), C#/Java/Haskell/AS3/Lua/Python are not as fast as C++/C/pascal - this is for more reasons than just GC, but still refutes your statement.
A little dated (2005), but here's a study of exactly this issue: http://www-cs.canisius.edu/~hertzm/gcmalloc-oopsla-2005.pdf
Here's an executive summary for you: "the runtime performance of the best-performing garbage collector is competitive with explicit memory management when given enough memory. In particular, when garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management. However, garbage collection's performance degrades substantially when it must use smaller heaps. With three times as much memory, it runs 17% slower on average, and with twice as much memory, it runs 70% slower. Garbage collection also is more susceptible to paging when physical memory is scarce. In such conditions, all of the garbage collectors we examine here suffer order-of-magnitude performance penalties relative to explicit memory management."
Basically, sufficiently advanced garbage collectors can meet or exceed the performance of explicit management if you're operating under good conditions: the GC needs extra space to do its copying, and performance will also tend to degrade if your working set is much larger than the amount of RAM present. The key operative words are, of course, "sufficiently advanced". The JVM and CLR have highly performant GCs; Mono's has historically been much weaker. And anything based on the Boehm GC will almost certainly give you... undesirable performance results.

10 | Developer / Technical / Re: The grumpy old programmer room | on: February 21, 2011, 12:54:18 AM
Have you tried declaring firstBlock as volatile? That should scare the optimiser away.
Will
Right! Thank you; forgot about that one. I found a way around it but waiting is better.
You need to be very, very careful with this. Volatile will prevent the compiler from removing the loop, but there are much more serious issues that can crop up when you try to rely on the ordering of memory reads/writes across threads. Even with the compiler scared off, it's entirely possible to get situations like this:
T1: Write A, Write B
T2: Wait for A, Read B
where the writes in T1 get reordered and T2 ends up reading the wrong value for B. While x86 is pretty sane in regards to the reorderings it can do, ARM and PowerPC are both much more lax in the guarantees they give you. To avoid this kind of possibility, you need to use some form of atomics/memory barriers (GCC's are at http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html). To see more about how this kind of issue can come up, see Hans Boehm's "Threads Cannot Be Implemented As A Library": http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.2412&rep=rep1&type=pdf
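As a sketch of the safe version of that pattern, using the GCC __sync builtins linked above (the variable names are made up; on a newer toolchain you'd use C++11 std::atomic instead):

#include <pthread.h>

static int data = 0;   // "B": the payload
static int ready = 0;  // "A": the flag T2 waits on

void* producer(void*) {
    data = 42;                 // write B
    __sync_synchronize();      // full barrier: B must be visible before the flag
    ready = 1;                 // write A
    return 0;
}

void* consumer(void*) {
    while (!ready) {
        __sync_synchronize();  // compiler + CPU barrier, so 'ready' is re-read each pass
    }
    __sync_synchronize();      // order the read of data after the flag is observed
    int value = data;          // read B: sees 42
    (void)value;
    return 0;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, 0, producer, 0);
    pthread_create(&t2, 0, consumer, 0);
    pthread_join(t1, 0);
    pthread_join(t2, 0);
    return 0;
}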

11 | Developer / Technical / Re: STL Vectors : deleting a pointer out of a vector | on: December 05, 2010, 05:00:28 PM
The only two reasons to use std::vector are:
- If you need random access into the list
- If you need all list members to be stored sequentially in memory (which is necessary sometimes, but definitely not in your case)
If this structure is performance critical, it's likely that neither is the correct answer. std::vector has linear-time interior insertions/deletions, but MUCH better constant factors (mostly because it only allocates memory in large chunks). std::list has constant-time arbitrary insertions and deletions, but massively higher constant factors because it allocates memory on every insertion.
A good rule of thumb is that, unless your usage pattern is strongly biased in some specific way you can exploit, you want a dense hashtable (not std::map/std::set, which have both logarithmic lookup AND terrible memory allocation constant factors). It sounds like you basically just want a set implemented on top of a hashtable. If you're using hash_map, it'd be something like:
hash_map<Coin*, bool> coins;
coins.insert(std::make_pair(foo, true));
coins.erase(foo);
This gets you random access, arbitrary insertion/deletion, and allocation performance more like std::vector. The only downside is that iteration will be slower, but IMO that's typically covered up by the other savings.
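For instance, a tiny wrapper along those lines (assuming a C++11 compiler with std::unordered_set; pre-C++11, std::tr1::unordered_set or SGI's hash_set play the same role as the hash_map above - Coin is the type from this thread, the rest of the names are invented):

#include <unordered_set>

struct Coin;  // the Coin type from the thread; its definition isn't shown here

// A set implemented on top of a hashtable, keyed directly on the pointer value.
class CoinSet {
public:
    void add(Coin* c)            { coins_.insert(c); }        // amortized O(1) insert
    void remove(Coin* c)         { coins_.erase(c); }         // O(1) average erase
    bool contains(Coin* c) const { return coins_.count(c) != 0; }

private:
    std::unordered_set<Coin*> coins_;
};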

16 | Developer / Technical / Re: Quick Socket Question (C++) | on: January 14, 2010, 12:58:31 AM
That's not portable at all, for two important reasons:
1) Different operating systems, and even different compilers on the same operating system, can and do choose different in-memory representations of an object. Some will reorder fields, others will insert more or less padding. In short, it's a mess.
2) Different computers can have different endiannesses. This is basically the order of the bytes of a multi-byte value in memory. On x86 PCs it's little-endian (least-significant byte first), while PowerPC, as in all the current-gen consoles, is big-endian.
The solution to #1 is to create a defined serialization format. You could serialize to strings, or to a binary format, but it needs to be something independent of how the compiler chooses to lay out your objects. The solution to #2 is to pick one canonical endianness for your protocol, and perform input byteswapping on platforms that don't match it.
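A small sketch of both fixes (the PlayerUpdate struct and field names are made up): define the wire format byte-by-byte in one canonical endianness, so neither the compiler's struct layout nor the host's byte order leaks into the protocol.

#include <stdint.h>
#include <vector>

// Hypothetical message; its in-memory layout is irrelevant to the wire format.
struct PlayerUpdate {
    uint32_t id;
    int16_t  x;
    int16_t  y;
};

// Append a 32-bit value least-significant-byte first, regardless of host endianness.
static void put_u32_le(std::vector<unsigned char>& out, uint32_t v) {
    out.push_back(static_cast<unsigned char>(v         & 0xFF));
    out.push_back(static_cast<unsigned char>((v >> 8)  & 0xFF));
    out.push_back(static_cast<unsigned char>((v >> 16) & 0xFF));
    out.push_back(static_cast<unsigned char>((v >> 24) & 0xFF));
}

static void put_u16_le(std::vector<unsigned char>& out, uint16_t v) {
    out.push_back(static_cast<unsigned char>(v        & 0xFF));
    out.push_back(static_cast<unsigned char>((v >> 8) & 0xFF));
}

// The wire format is defined by this function, not by the compiler's layout choices.
std::vector<unsigned char> serialize(const PlayerUpdate& p) {
    std::vector<unsigned char> out;
    put_u32_le(out, p.id);
    put_u16_le(out, static_cast<uint16_t>(p.x));
    put_u16_le(out, static_cast<uint16_t>(p.y));
    return out;  // always 8 bytes, in the same order, on every platform
}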

17 | Developer / Technical / Re: Speed up your iPhone game - disable "Thumb"! | on: January 12, 2010, 12:22:28 AM
I guess maybe the idea is that you get a net performance gain using Thumb for apps with little or no floating-point code? I don't know. I'm sure there's someone somewhere Thumb is right for, but I'm really having trouble justifying Apple making that setting the default.
The idea is that you get a negligible performance loss (on the order of a few percent) on non-FP code, and significant power savings, because parts of the chip can be turned off in Thumb mode, in addition to other power savings from less bus traffic, etc. Given that Apple wants to encourage developers to conserve space and power, it makes sense that the compiler optimizes for that unless you tell it otherwise.
Hm, so I've got the Google Code vfp library, but the vfpmath library itself is a bit cryptic and doesn't seem to come with any documentation. Is there a guide to using vfpmathlibrary you would recommend? Alternately, is there any reading you'd recommend for someone interested in going ahead and interacting with the VFP via assembly? I've just in general had trouble finding documentation on this stuff. Thanks...
Not really, unless you're up for reading the ARM reference manual.

18 | Developer / Technical / Re: Speed up your iPhone game - disable "Thumb"! | on: January 11, 2010, 10:03:27 AM
(though not as much as disabling the accursed Thumb did)
It's not evil. In fact, it serves the very important purpose of generating much smaller, denser code, at the cost of performance (particularly floating point). For things that aren't performance critical, it's a really good thing.
You know what I very badly want to know but still have no idea, is how the hell do you use the VFP / vector unit.
http://code.google.com/p/vfpmathlibrary/ or be prepared to learn some inline assembly. ;-)

19 | Developer / Technical / Re: The grumpy old programmer room | on: December 17, 2009, 01:25:32 PM
A fun fact that I discovered when writing my physics engine last year: the STL containers that Visual Studio uses are by default offensively slow.
The STL has a number of performance issues in all implementations, since it's quite highly specified. The biggest one, and the one that ended up being the single biggest performance issue in a project I worked on, is that most of its data structures (list, set, map) are very malloc-intensive. Each calls malloc (via new) at _least_ once per insertion, which ends up being quite expensive if you're inserting a lot. Plus, map and set use tree-based structures rather than hash tables, which just adds to the slowness. If you care about performance, a simple probed hashtable will blow map/set out of the water. Of course, map/set offer some nice guarantees (most iterators stay valid while inserting/deleting), but IME those usually aren't necessary.
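To make "a simple probed hashtable" concrete, here's a rough linear-probing integer set under assumptions I'm adding myself (power-of-two capacity, 0 reserved as the empty marker, no erase); it does one allocation for the whole backing array instead of one per insert:

#include <cstddef>
#include <vector>

// Open-addressing (linear probing) set of non-zero ints. Erase is omitted;
// it needs tombstones and is left out to keep the idea clear.
class ProbedIntSet {
public:
    explicit ProbedIntSet(std::size_t capacity_pow2 = 64)
        : slots_(capacity_pow2, 0), count_(0) {}   // 0 marks an empty slot

    void insert(int key) {
        if (key == 0 || contains(key)) return;     // 0 is reserved as the empty marker
        if ((count_ + 1) * 2 > slots_.size()) grow();  // keep load factor under 50%
        insert_into(slots_, key);
        ++count_;
    }

    bool contains(int key) const {
        std::size_t mask = slots_.size() - 1;
        for (std::size_t i = hash(key) & mask; ; i = (i + 1) & mask) {
            if (slots_[i] == key) return true;
            if (slots_[i] == 0) return false;       // hit an empty slot: not present
        }
    }

private:
    static std::size_t hash(int key) {
        std::size_t h = static_cast<std::size_t>(key);
        h ^= h >> 16;
        h *= 0x45d9f3b;                             // cheap integer mixing
        h ^= h >> 16;
        return h;
    }

    static void insert_into(std::vector<int>& slots, int key) {
        std::size_t mask = slots.size() - 1;
        std::size_t i = hash(key) & mask;
        while (slots[i] != 0 && slots[i] != key) i = (i + 1) & mask;  // linear probe
        slots[i] = key;
    }

    void grow() {
        std::vector<int> bigger(slots_.size() * 2, 0);   // one big reallocation
        for (std::size_t i = 0; i < slots_.size(); ++i)
            if (slots_[i] != 0) insert_into(bigger, slots_[i]);
        slots_.swap(bigger);
    }

    std::vector<int> slots_;
    std::size_t count_;
};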

20 | Developer / Technical / Re: Chipmunk Physics 5.0.0 released. | on: December 07, 2009, 03:05:57 PM
It looks like you accidentally left a file out of the zip. CMake is complaining about "cpBreakableJoint.c" not being present. Even if I hack the CMakeLists.txt to ignore it, one of the demos doesn't link without it.