You want to write a standalone program that does the same thing every time and prints the benchmark results, similar to writing unit tests.
Here are some classic pitfalls, and how to avoid them:
- using a poor-resolution timer: look up the OS-specific high-resolution timers meant for benchmarking; they are often different from the general-purpose timers
- micro benchmarking: ensure you are testing a workload that resembles real data, or else you will optimize the wrong thing
- too little work: the larger a piece of work is, the more (relative) accuracy you get from your timers. If your test is too quick, insert a loop to run it many times (but avoid micro benchmarks, above)
- sampling: each time you run a benchmark, the results will randomly vary. You should run the same benchmark several times and look at the median to get more consistent results. Looking at the 95th percentile can also be useful, depending on what your performance goals are.
- averaging: if you have several benchmark tests covering different aspects of your program, use the geometric mean to combine them into an overall number. This avoids problems where the slowest test becomes the dominant factor in the overall measure.
- hot vs cold: there are many, many caches in a computer, and benchmarks run faster when those caches are hot. You should always run a benchmark a few times and discard those results before recording anything, to ensure a consistently hot measurement. I have yet to figure out how to consistently benchmark cold results, even though it's often the more interesting number.
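To make the timer and looping points concrete, here is a minimal sketch in Python. The `work()` function is a hypothetical stand-in for whatever you are benchmarking; `time.perf_counter()` is Python's high-resolution monotonic timer, and the loop amortizes timer overhead across many iterations:

```python
import time

def work():
    # hypothetical workload standing in for the code under test
    return sum(i * i for i in range(10_000))

def time_workload(reps=1_000):
    """Time `reps` iterations with a high-resolution monotonic timer
    and report the average per-iteration cost."""
    start = time.perf_counter()  # monotonic and high-resolution; not time.time()
    for _ in range(reps):
        work()
    elapsed = time.perf_counter() - start
    return elapsed / reps

per_call = time_workload()
print(f"{per_call * 1e6:.2f} us per call")
```

Note the division by `reps` at the end: you time the whole loop once, rather than timing each iteration, so a coarse timer tick is spread over the full batch.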
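The warm-up and sampling advice can be combined into one small harness. This is a sketch, again with a hypothetical `work()` function: a few discarded warm-up runs heat the caches, then the timed samples are summarized by their median and 95th percentile:

```python
import time
import statistics

def work():
    # hypothetical stand-in for the code under test
    return sum(i * i for i in range(5_000))

def benchmark(fn, samples=30, warmup=3):
    """Run `fn` repeatedly: discarded warm-up runs first, then timed
    samples summarized as (median, 95th percentile)."""
    for _ in range(warmup):
        fn()  # warm caches and branch predictors; results discarded
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    median = statistics.median(times)
    p95 = statistics.quantiles(times, n=20)[-1]  # last of 19 cut points = 95th percentile
    return median, p95

median, p95 = benchmark(work)
print(f"median {median * 1e3:.3f} ms, p95 {p95 * 1e3:.3f} ms")
```

The median is robust against the occasional sample ruined by a context switch; the p95 tells you how bad the slow runs get, which matters if your goal is a latency target rather than average throughput.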
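A quick sketch of why the geometric mean is the right way to combine several benchmarks (the test names and timings here are made up for illustration):

```python
from statistics import geometric_mean, mean

# hypothetical per-test median times, in seconds
times = {"parse": 0.012, "layout": 0.095, "render": 1.450}

arith = mean(times.values())          # dominated by the slowest test
geo = geometric_mean(times.values())  # weights relative changes equally

print(f"arithmetic mean: {arith:.4f} s")
print(f"geometric mean:  {geo:.4f} s")
```

With the arithmetic mean, halving the 12 ms "parse" test barely moves the overall number, while halving the 1.45 s "render" test dominates it. With the geometric mean, halving any one test improves the combined figure by the same factor, so no single test drowns out the others.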