While reading "Pitfalls of Object Oriented Programming" I discovered a new term: "branch prediction". Branch prediction is an optimization the CPU uses when it encounters a "branch" in the code. A branch is basically an if-statement: depending on some condition, execution will either continue or jump to another location. Because of instruction pipelining, the CPU is working on multiple instructions at once. However, when it reaches a branch, it can't keep fetching more instructions because the outcome of the condition the branch depends on hasn't been determined yet. This results in a stall while the CPU chugs through the condition. To help avoid the stall, branch prediction makes a guess (Spock would not approve) at the outcome of the condition and starts processing instructions based on that outcome. If the CPU guesses incorrectly, though, the pipeline needs to be flushed out, and processing continues at the correct location. This flush lands us back where we started, and possibly a little worse off (I assume some time is spent performing the flush), so it is a good idea to avoid branch mispredictions as much as you can. Different CPUs perform branch prediction differently, so "predicting" how one will handle a branch may be difficult. It may be a good idea to simply avoid branching in your critical code altogether.
I later encountered another term while reading this slideshow (warning: contains some swearing): "Load-Hit-Store" (LHS). I'm a bit sketchy on the details, but this seems to be the basic idea: it takes some time to write to memory, so while the data is making its way over there the CPU continues executing instructions. This works perfectly fine for the most part, but if you have worked with parallel programming before you might know where this is headed: if an instruction attempts to read from that location in memory before the write has finished, the CPU stalls until the write operation is complete. "Common Performance Issues in Game Programming" over at Gamasutra discusses this problem and presents some solutions. One solution is to keep the value in a register rather than a memory location if you need to read it back right after writing it. Another good article on LHS is "Load-hit-stores and the __restrict keyword".
I've known for quite a while that casting floats to ints is a bad idea performance-wise. What I didn't know is that floating-point registers and integer registers are completely separate, and you could think of the operations on each as happening in separate processing units (which may literally be the case, but I do not know for sure). There isn't any way to move a value directly between an integer register and a floating-point register, so the value needs to be stored to memory and then read back into the destination register. Then of course you have the problem of actually converting the float into an integer, which involves answering the question "how should we round this?". The rounding mode is set via a special register, but this should be done rarely because changing it causes a pipeline flush. You could simply set it once and forget it, but unfortunately libraries and other code outside your control could change it, causing some very annoying bugs in the process if your program depends on the rounding mode staying constant. One way the compiler can handle this is by storing the current rounding mode, setting the mode it wants, performing the cast, and then restoring the rounding mode to what it was beforehand.
I'd like to end this with a couple of quotes from "Pitfalls of Object Oriented Programming" that caught my eye:
"Don't test for exceptions - sort by them."
"Design for specifics, not generics (generally)."