Saturday, April 30, 2011

Branch prediction, LHS, and casting with floats

While reading "Pitfalls of Object Oriented Programming" I discovered a new term: "Branch Prediction".  Branch prediction is an optimization used by the CPU when it encounters a "branch" in the code.  A branch is basically an if-statement; depending on some condition, execution will either continue or jump to another location.  Because of instruction pipelining, the CPU is working on multiple instructions at once.  However, when it reaches a branch in the code, it can't keep fetching instructions because the outcome of the condition the branch depends on hasn't been determined yet.  This results in a stall while the CPU chugs through the condition.  To help solve this, branch prediction is used to make a guess (Spock would not approve) at the outcome of the condition and start processing instructions based on that outcome.  If the CPU guesses incorrectly though, the pipeline needs to be flushed out, and processing continues at the correct location.  This flush lands us back where we started, and possibly a little worse off (I am assuming that some time is spent performing the flush), so it is a good idea to avoid branch mispredictions as much as you can.  Different CPUs perform branch prediction differently, so "predicting" how one will handle a branch may be difficult.  It may be a good idea to simply avoid branching in your critical code altogether.
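To make this concrete, here's a small sketch of removing a branch from hot code.  The function names are my own, and the bit trick assumes 32-bit ints with an arithmetic right shift on negative values (true on mainstream compilers, but technically implementation-defined):

```cpp
#include <cassert>

// Branchy version: the compiler may emit a conditional branch here.
int max_branchy(int a, int b) { return (a > b) ? a : b; }

// Branchless version: nothing to mispredict.
int max_branchless(int a, int b) {
    int diff = a - b;          // note: can overflow for extreme inputs
    int mask = diff >> 31;     // 0 if a >= b, -1 (all bits set) if a < b
    return a - (diff & mask);  // a when a >= b, otherwise a - (a - b) = b
}
```

In practice, compilers often emit conditional-move instructions for the ternary version anyway, so it's worth checking the generated assembly before hand-rolling tricks like this.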

I later encountered another term while reading this slideshow (warning: contains some swearing): "Load-Hit-Store" (LHS).  I'm a bit sketchy on the details, but this seems to be the basic idea:  It takes some time to write to memory, and so while the data is making its way over there the CPU continues executing instructions.  This works perfectly fine for the most part, but if you have worked with parallel coding before you might know where this is headed: If an instruction attempts to read from that location in memory, the CPU stalls until the write operation is complete.  "Common Performance Issues in Game Programming" over at Gamasutra discusses this problem and presents some solutions.  One solution to the problem is to use a register rather than a memory location if you need to read back something right after writing to it.  Another good article on LHS is "Load-hit-stores and the __restrict keyword".
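Here's a hedged sketch of the pattern (my own toy example, not taken from either article): the first version writes to memory and reads it right back every iteration, which is exactly the store-then-load sequence that can stall; the second keeps the running total in a local variable so it can live in a register:

```cpp
#include <cassert>

// Store-then-reload each iteration: reading *out right after writing it is
// the pattern that can trigger a load-hit-store stall.
void sum_slow(const float* a, int n, float* out) {
    *out = 0.0f;
    for (int i = 0; i < n; ++i)
        *out += a[i];  // writes to memory, then reads it back next iteration
}

// Accumulate in a local so the running total can stay in a register;
// memory is only touched once at the end.
void sum_fast(const float* a, int n, float* out) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += a[i];
    *out = acc;
}
```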

I've known this for quite a while, but casting floats to ints is a bad idea.  What I didn't know is that floating-point registers and integer registers are completely separate, and you can think of the operations on them as happening in separate processing units (which may literally be the case, but I do not know for sure).  There isn't any way to move a value directly between the integer and floating-point register files, so the value needs to be stored in memory, and then read into the destination register.  Then of course you have the problem of actually converting the float into an integer, which involves answering the question "how should we round this?".  The rounding mode is set via a special register, but this should be done rarely because it causes a pipeline flush.  You could simply set it once and forget it, but unfortunately libraries and other code which is out of your control could change it, causing some very annoying bugs in the process if your program depends on the rounding mode being constant.  One way the compiler can handle this is by storing the current rounding mode, setting the mode you want, performing the cast, and then resetting the rounding mode back to what it was beforehand.
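As an illustration of that save/set/restore dance, here's a sketch using the standard C++ &lt;cfenv&gt; facilities (this is my own example, not necessarily what any particular compiler emits; the volatile is just there to keep the compiler from folding the conversion at compile time):

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Rounds toward zero regardless of the ambient rounding mode, by saving,
// switching, and restoring the mode around the conversion.
double round_toward_zero(double x) {
    int old = std::fegetround();     // store the current rounding mode
    std::fesetround(FE_TOWARDZERO);  // set the mode we want (the expensive part)
    volatile double v = x;           // defeat compile-time constant folding
    double r = std::nearbyint(v);    // rounds using the current mode
    std::fesetround(old);            // restore the previous mode
    return r;
}
```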

I'd like to end this with a couple quotes from Pitfalls of Object Oriented Programming which caught my eye:
"Don't test for exceptions - sort by them."
"Design for specifics, not generics (generally)."

Friday, April 29, 2011

Sys reqs, storage caches, animation and optimization

While reading Beyond3D's "The Technology of a 3D Engine" series, I had an idea: Create a PC game and optimize it for a specific set of hardware.  This would be similar to writing a game for a console, and could be useful for setting the system requirements for a game.  This idea is more of a learning tool though; creating a game and optimizing it for one set of hardware is probably easier than doing the same for a general set of hardware.  You could use this to gain experience with game development, and test it on other hardware to see how you could expand it to work with a wider range of systems.

The same series brought up a rather interesting way of looking at memory:  Treat RAM as a cache for the storage drive.  When data needs to be accessed, RAM is checked first.  If the data isn't in RAM, it is fetched from the storage drive and placed in the least accessed spot in RAM.  I'm not entirely sure how you would perform this check without causing significant overhead though...  The article probably mentioned (or implied) the solution, but I am not familiar enough with memory managers to pick it out.
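My guess at how this could work is a least-recently-used cache, sketched below (the class and its interface are entirely my own invention; the loader callback stands in for actually reading from the storage drive):

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <string>
#include <unordered_map>

// Treats RAM as an LRU cache in front of "disk" (the loader callback).
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : cap_(capacity) {}

    // Returns cached data, loading it (and evicting the least recently
    // used entry) on a miss.
    const std::string& get(int key, const std::function<std::string(int)>& load) {
        auto it = map_.find(key);
        if (it != map_.end()) {     // hit: move key to the front (most recent)
            order_.splice(order_.begin(), order_, it->second.second);
            return it->second.first;
        }
        if (map_.size() == cap_) {  // full: evict the least recently used key
            map_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(key);
        auto res = map_.emplace(key, std::make_pair(load(key), order_.begin()));
        return res.first->second.first;
    }

    bool cached(int key) const { return map_.count(key) != 0; }

private:
    std::size_t cap_;
    std::list<int> order_;  // front = most recently used
    std::unordered_map<int, std::pair<std::string, std::list<int>::iterator>> map_;
};
```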

While reading through the Solid Angle blog (this post in particular) I started wondering if it is possible to have all procedural animation be physics-based.  One advantage to this is that it would look smoother with less effort (assuming the physics simulator is working fine), but I can see that you would run into problems.  Tile-based movement could become tricky for example, rather than being a very simple way of handling movement.  I'll investigate this when I get into the "experience" phase of the research I am doing.

One last thing: I am planning to make more use of profiling than I have in the past.  I've tested out the Very Sleepy profiler, but I haven't done any serious optimizing with it.  In addition to the profiling software, I will probably use timers to check how fast particular parts of the code are executing.  Keeping everything within its own time budget should be interesting.
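A simple way to do the timer part is a scoped timer that reports how long a block of code took when it goes out of scope; here's a minimal sketch (the names are my own):

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Prints elapsed milliseconds for the enclosing scope on destruction.
struct ScopedTimer {
    const char* label;
    std::chrono::steady_clock::time_point start;
    explicit ScopedTimer(const char* l)
        : label(l), start(std::chrono::steady_clock::now()) {}
    double elapsed_ms() const {
        return std::chrono::duration<double, std::milli>(
            std::chrono::steady_clock::now() - start).count();
    }
    ~ScopedTimer() { std::printf("%s: %.3f ms\n", label, elapsed_ms()); }
};
```

Usage is just `{ ScopedTimer t("physics step"); /* ... */ }`, which prints the elapsed time when the block ends.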

Thursday, April 28, 2011

Constraint solver types

I spent most of the day reading about constraint solvers (as well as error correction).  It seems like you can categorize constraint solvers as either sequential or simultaneous.

Sequential solvers solve each constraint individually.  Because solving one constraint could invalidate another, a sequential solver is normally iterative: all of the constraints are solved repeatedly until either a maximum number of iterations is reached or every constraint is satisfied.  These solvers are generally easier to both understand and implement.
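A sequential solver's outer loop might look something like this sketch, using simple distance constraints between 2D points as the example (the structure names and the Gauss-Seidel-style projection are my own illustration, not from the thesis):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct P { double x, y; };
struct DistanceConstraint { int a, b; double rest; };

// Projects each constraint in turn, re-solving the whole set until every
// constraint is within tolerance or the iteration cap is reached.
bool solve_sequential(std::vector<P>& p,
                      const std::vector<DistanceConstraint>& cs,
                      int maxIter = 16, double tol = 1e-6) {
    for (int iter = 0; iter < maxIter; ++iter) {
        double worst = 0.0;
        for (const auto& c : cs) {
            double dx = p[c.b].x - p[c.a].x;
            double dy = p[c.b].y - p[c.a].y;
            double d = std::sqrt(dx * dx + dy * dy);
            double err = d - c.rest;
            worst = std::max(worst, std::fabs(err));
            if (d < 1e-12) continue;          // avoid dividing by zero
            double corr = 0.5 * err / d;      // split the correction between both points
            p[c.a].x += corr * dx; p[c.a].y += corr * dy;
            p[c.b].x -= corr * dx; p[c.b].y -= corr * dy;
        }
        if (worst < tol) return true;         // every constraint satisfied
    }
    return false;                             // iteration cap reached
}
```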

Simultaneous solvers solve all of the constraints at the same time, removing the need to iterate.  This normally involves solving an LCP (linear complementarity problem), which appears to be a difficult task.

The solvers can then be split up into force-based, impulse-based, and finally position-based methods.

Force-based solvers solve the constraint by applying a force.  For example, a constraint to keep two objects within a certain distance would apply a force pulling the objects together if they become separated further than that maximum distance.

Impulse-based solvers apply an impulse to the constraint anchor points if the constraint is not met.  For example, a collision constraint would apply an impulse in order to push the two objects away from each other.

Position-based solvers change the position itself to meet the constraint.  This appears to be mostly used in error correction, such as ensuring that two objects do not interpenetrate.  I'm not sure if this is a standard way of going about it or not, but I thought it was a cool idea: don't change the position directly.  Instead, apply an impulse and perform explicit Euler integration.
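Here's my attempt at sketching that idea in 1D: turn the desired position change into an impulse, integrate one explicit Euler step, then take the impulse back out (the removal step is my own assumption, so that the correction moves the position without leaving extra velocity behind):

```cpp
#include <cassert>
#include <cmath>

struct Body { double x, v, invMass; };

// Pushes the body out of penetration by pushOut units via an impulse plus
// one explicit Euler step, rather than teleporting the position.
void correct_position(Body& b, double pushOut, double dt) {
    double impulse = (pushOut / dt) / b.invMass;  // impulse whose one-step effect is pushOut
    b.v += impulse * b.invMass;                   // apply the impulse
    b.x += b.v * dt;                              // explicit Euler integration
    b.v -= impulse * b.invMass;                   // assumption: remove the corrective velocity
}
```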

I'm done reading that rigid body dynamics thesis for now.  I'll be reading some articles on 3D engine architecture next, including a rather long one on scene graphs.  That is for another blog post though.

Wednesday, April 27, 2011

Basics and not so basic. Also, names are good.

I've decided to skip the additional research I have been doing while reading through A Unified Framework for Rigid Body Dynamics.  After reading about a certain subject, I would read about the same subject in the various programming books I own (Game Physics, Real-Time Collision Detection, etc.).  While this allowed me to gain a firmer understanding of the subject, I found that I would forget too much of the information by the time I got back to the thesis (or after learning the next subject).

Half of the day was spent in review.  I was reading through chapter 2, which covers essential areas of physics.  My experience with physics has been in 2D up till now, so there was still plenty that was new to me.  In particular, finding a useful definition of "tensor" was surprisingly difficult.  I eventually worked out that it is essentially another way of saying array:  a "2nd-order tensor" is a 2D array like A[4][7], a "1st-order tensor" is a 1D array like A[20], and a "0th-order tensor" is just a normal scalar.  It continues on like this with 3rd-order and so on.

Another useful tidbit I learned was that the "cross-product matrix" is a skew-symmetric matrix.  It might just be me, but I had some difficulty figuring out what a "cross-product matrix" was.  Performing a quick Google search now that I know this, the answer is more obvious...  Hindsight apparently is 20/20.

Chapter 2 also introduced the linear complementarity problem.  While I'm familiar with the name (particularly as "LCP"), I don't have much knowledge of what it actually is.  The introduction was very brief, but I came away from it a little less scared of the ominous "LCP" which inevitably pops up when reading about physics simulation.

Chapter 3 was where things became really interesting for me.  I wish I had known about this book last year when I was researching physics simulation!  It's exactly what I was looking for: It discusses the different "modules" of a physics simulation, gives them names, and discusses some popular ways of using them (and gives them names as well).  The "Explicit Time Step Method" is pretty similar to what I have been thinking of using, so I will likely use that one.

Tuesday, April 26, 2011

Newton's laws, but not much else

Unfortunately, I wasn't able to get much work done today.  I reviewed Newton's three laws of motion and started reviewing particle kinematics.

I don't want to end the post there, so I'm going to tack on the three laws:

Law I
An object at rest stays at rest, and an object in motion stays in motion in the same direction and with the same speed as long as there is no net external force acting on the object.

Law II
The rate of change of the velocity of an object is directly proportional to the net external force acting on the object and inversely proportional to the mass of the object.  a = F/m

Law III
For every force acting on an object there is an equal and opposite reactive force.

Monday, April 25, 2011

Orientation representations

The orientation of objects can be represented in multiple ways: Euler angles, matrices, angle and axis, and quaternions.

Euler angles
The orientation is represented as three rotations about the basis vectors.  Rotation about the x-axis is called "pitch" (also "attitude").  Rotation about the y-axis is called "yaw" (I think this is also called "heading", but that might only be me).  Rotation about the z-axis is called "roll".  This representation requires the least amount of memory; you only need to store three angles.  It is also very easy to look at (such as in a debugger or in an object editor) and understand the orientation.  However, the order in which you apply these rotations matters, so you want to pick an order and stick with it (pitch, then yaw, then roll, for example).  This representation is prone to problems as well, such as gimbal lock, and it is a nightmare to interpolate.

Matrices
This is just a normal rotation matrix.  This representation requires the most memory (9 variables in 3D), is difficult to interpolate, and is pretty much impossible to look at and understand the orientation based solely on the variables.  However, this representation can be used directly to transform vectors and points; the other representations need to be converted to matrix form before they can be used to perform transformations.  One interesting thing I learned about these is that you can get the inverse of a rotation matrix by simply transposing it.
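A quick way to convince yourself of the transpose trick is to multiply a rotation matrix by its transpose and check that you get the identity; here's a small sketch (all names are my own):

```cpp
#include <cassert>
#include <cmath>

struct Mat3 { double m[3][3]; };

// Rotation about the z-axis by the given angle (radians).
Mat3 rotationZ(double angle) {
    double c = std::cos(angle), s = std::sin(angle);
    return {{{c, -s, 0}, {s, c, 0}, {0, 0, 1}}};
}

Mat3 transpose(const Mat3& a) {
    Mat3 t{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            t.m[i][j] = a.m[j][i];
    return t;
}

Mat3 mul(const Mat3& a, const Mat3& b) {
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

bool isIdentity(const Mat3& a, double eps = 1e-9) {
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            if (std::fabs(a.m[i][j] - (i == j ? 1.0 : 0.0)) > eps) return false;
    return true;
}
```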

Angle and axis
The orientation is represented as a rotation by some angle about a certain axis.  I have the least experience with this representation, so I can't comment on it much.  It seems like it would be a half-way point between Euler angles and matrices.  I'm not sure how interpolation would work, but from what I understand this representation does not suffer from gimbal lock.  It also requires less memory than a matrix, but slightly more than Euler angles.

Quaternions
I have only read about these, but they seem to be a way of combining the angle and axis representation into one variable rather than two.  I'm guessing that bunching the angle and axis into one "variable" makes performing math on it easier, but as I said I have no actual experience with using them yet.  This representation is the easiest to interpolate, and it doesn't suffer from the same problems that Euler angles do.  Due to floating-point rounding error though, these can become invalid, and quickly; quaternions, when used to represent orientation, must have a magnitude of 1.
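Keeping a quaternion valid just means renormalizing it now and then; a minimal sketch (my own names and layout):

```cpp
#include <cassert>
#include <cmath>

struct Quat { double w, x, y, z; };

double magnitude(const Quat& q) {
    return std::sqrt(q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z);
}

// Rescales the quaternion so it is a valid orientation again (magnitude 1).
Quat normalize(const Quat& q) {
    double m = magnitude(q);
    return {q.w / m, q.x / m, q.y / m, q.z / m};
}
```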

2D games normally use Euler angles to represent orientation, when orientation is even needed.  Some of the more advanced ones will use matrices, but angle and axis/quaternions are only really worth it in 3D because the axis is always the z axis in 2D.

Sunday, April 24, 2011

Methods of animation

There are various methods to do animation in a game.  The methods I have encountered are: Frame-based, Key-frame, Procedural, and Physics-based.

Frame-based
Frame-based animation displays a set of pre-rendered frames in succession.  Finding one specific name for this method of animation is difficult; other equally appropriate names include cel, sprite, and flipbook.  Cel and flipbook are terms from traditional animation: cel animation was performed by drawing the moving parts of a frame onto transparent sheets and aligning those sheets on top of each other, over a background layer, while a flipbook is a book of pictures which appear to animate as you flip through the pages.  This kind of animation is frequently used to animate sprites in 2D games, so "sprite animation" is also a valid term for it.

Key-frame
The positions and orientations of the various parts of an animated entity are stored with a timestamp.  Each one of these states is called a "key-frame".  The game engine interpolates between these states based on the current time, resulting in a smooth animation rendered in real-time.  Storing animations in this format reduces memory costs while increasing performance costs.  The reduced memory cost (and potentially reduced animator work time) may make this the preferred method over frame-based animation.  If the entity is too complicated to render in real time, however, frame-based animation is a better idea.
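The interpolation step can be sketched as a simple linear blend between the two key-frames bracketing the current time (positions only, and the function is my own toy example):

```cpp
#include <cassert>
#include <vector>

struct Keyframe { double time, x; };  // 1D position only, for brevity

// Linearly interpolates between the two key-frames bracketing t,
// clamping outside the animated range.  Assumes keys are sorted by time.
double sample(const std::vector<Keyframe>& keys, double t) {
    if (t <= keys.front().time) return keys.front().x;
    if (t >= keys.back().time) return keys.back().x;
    for (std::size_t i = 1; i < keys.size(); ++i) {
        if (t <= keys[i].time) {
            const Keyframe& a = keys[i - 1];
            const Keyframe& b = keys[i];
            double u = (t - a.time) / (b.time - a.time);  // blend factor in [0, 1]
            return a.x + u * (b.x - a.x);
        }
    }
    return keys.back().x;
}
```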

Procedural
This method positions and orients entities directly from code, whether as part of the engine or in a script.  For example:
player->pos.x -= 5; // This is procedural animation, although using a loose definition of "animation".
Some may not agree with me on this, but I classify any position/orientation update code as procedural animation.  Collision detection is part of this method.

Physics-based
This is technically a sub-category of procedural animation.  In physics-based animation, the positions and orientations of entities are controlled by the laws of physics (although those laws may be bent or broken in places in order to get the effect you are going for).  The major advantage to this method is that you do not need animators to create animations for you.  However, getting a physics simulation to work well enough for a completely physics-based game would be difficult.

Usually, a 2D sprite-based game will employ frame-based and procedural animation.  Sometimes physics-based is added in there (Angry Birds for the iPhone is an example of such a game).  The physics aren't always obvious though.  For example, using a velocity to adjust the position of the player's character could be considered physics-based animation, although I personally would put that in the grey-zone between procedural and physics-based.

Learning about the differences between key-frame animation and procedural animation gave me the somewhat funny idea of generating the key-frame animation on the fly.  This would technically be key-frame animation, but it would be powered by procedural animation.  Not very useful in practice, but thinking along those lines helped me clarify the differences between the two.

Forward and inverse kinematics

In forward kinematics, a hierarchy of matrices is attached to the various parts of a model, with child nodes positioned and oriented relative to their parent node.  For example, let's say that you have a model of an arm where the forearm is a child of the upper arm.  If the upper arm is oriented upwards, the forearm stays attached and thus moves along with it, staying in the same orientation relative to the upper arm.  In other words, you don't need to manually reposition the forearm when the upper arm is moved.

"Forward" kinematics is called that because changes are propagated down the hierarchy from where they were applied.  So if you wanted to move the hand to a specific spot, you would have to orient the upper arm, then the forearm, and finally the hand.  You could possibly make this easier by orienting the upper arm in such a way that the forearm doesn't need to be oriented, but this might not look very natural.
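The arm example can be sketched in 2D with unit-length bones, where each bone's angle is stored relative to its parent and the world positions fall out of walking down the chain (entirely my own illustration):

```cpp
#include <cassert>
#include <cmath>

struct Bone { double localAngle; };  // radians, relative to the parent bone

// Walks down the hierarchy accumulating angles; outX/outY receive the
// world-space tip position of each bone (all bones are unit length).
void forwardKinematics(const Bone* bones, int count,
                       double* outX, double* outY) {
    double angle = 0.0, x = 0.0, y = 0.0;
    for (int i = 0; i < count; ++i) {
        angle += bones[i].localAngle;  // child is oriented relative to its parent
        x += std::cos(angle);          // advance one unit along the bone
        y += std::sin(angle);
        outX[i] = x;
        outY[i] = y;
    }
}
```

Reorienting the "upper arm" (bone 0) moves every bone after it, with no manual repositioning, which is exactly the behavior described above.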

The obvious limitation with forward kinematics is that you cannot directly move the hand from our example.  A much better method would be to move the hand, and have the rest of the arm follow along.  This is called "inverse kinematics", because it is the exact reverse of forward kinematics:  The changes propagate back up the hierarchy rather than down it.  This is achieved by using invisible bones, but I'm not quite clear on the specifics.  I'll post more details as I learn them.

Saturday, April 23, 2011

Nonpenetration requirement = troublesome

I skimmed through the collision detection sections of "Game Physics" to try and figure out what kind of physics engine the book has in mind.  From what I could gather, it depends on non-interpenetration.  Non-interpenetration is achieved by finding the time of collision and stopping the simulation at that point in order to handle the collision.  The most referenced method seemed to be bisection.  Bisection involves halving the delta time every time a collision is found and restarting the step from the initial time.  You eventually end up with a delta time which advances the simulation exactly up to the first collision.
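Here's a toy sketch of the bisection idea for a single point moving toward a wall at constant velocity (the setup is my own, and much simpler than anything "Game Physics" deals with):

```cpp
#include <cassert>
#include <cmath>

// Bisects the interval [0, dt] until the time at which the point
// x(t) = x0 + v * t reaches the wall at wallX is bracketed tightly.
double collisionTime(double x0, double v, double wallX, double dt,
                     int maxIter = 32) {
    double lo = 0.0, hi = dt;
    for (int i = 0; i < maxIter; ++i) {
        double mid = 0.5 * (lo + hi);
        if (x0 + v * mid < wallX)
            lo = mid;  // not yet at the wall: move the lower bound up
        else
            hi = mid;  // at or past the wall: move the upper bound down
    }
    return lo;         // largest penetration-free time found
}
```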

One problem with making sure that there are no interpenetrations by stopping the simulation at the time of collision is that the delta time can become very, very small.  When this happens, you end up simulating many times per step in order to get through the whole step delta time.  If the computer has the power to manage this within the time allotted to the physics step then you don't have a problem, but this is rarely the case.

One example of where this could happen is when a ball is bouncing with a high velocity between two immovable blocks.  This could potentially be "solved" by putting a cap on velocities.

Another example is when a block is resting on the ground.  The block will have a normal velocity towards the ground due to gravity, and so the game detects a collision right off the bat and sets the delta time to zero (the collision occurred at the initial time).  The offending velocity is corrected and the simulation attempts to finish the step.  However, the velocity will return each time due to the constant effect of gravity, and the delta time will remain zero.  Presto!  An infinite loop.

Now, I should note that my only experience with this problem has been with a velocity-based constraint solver.  From what I understand, an acceleration-based constraint solver would negate gravity and the normal velocity would not point towards the ground.

One idea I have for solving this problem under a velocity-based solver is to detect when an object is at rest and suspend simulation of that object until something it is touching is no longer suspended (the ground is always suspended in this setup).  This would also be a rather good optimization, and I suspect it is used in several physics engines.  Deciding whether or not an object is at rest would likely involve checking if its velocity is lower than some minimum.  That minimum should be lower than the amount of velocity gained in one step due to gravity though, otherwise objects will become suspended (literally) right after entering free-fall.
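The at-rest test could be sketched like this (the 0.5 safety factor is my own arbitrary choice; the important part is that the threshold stays below one step's worth of gravity):

```cpp
#include <cassert>
#include <cmath>

// True if the object's speed is below the at-rest threshold.  The threshold
// is kept under gravity * dt so a freshly free-falling object (which has
// gained exactly one step of gravity) is never put to sleep.
bool isAtRest(double vx, double vy, double gravity, double dt) {
    double minSpeed = std::fabs(gravity) * dt * 0.5;  // 0.5 safety factor (assumption)
    return std::sqrt(vx * vx + vy * vy) < minSpeed;
}
```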

I'd like to mention a few things I learned about animation today, but they will have to wait till tomorrow.

Friday, April 22, 2011

Analytic vs. numerical and determinants again

I was able to find a definition of the difference between an analytic solution and a numerical method.  Solving something analytically means working through the math to arrive at an exact, closed-form solution.  Solving something numerically usually means iterating a formula to converge on an approximate solution.  Numerical methods are used when the problem in question is too complicated to solve analytically.

I spent most of the day reading up on matrix determinants.  A few interesting things to note: only square matrices have determinants.  The determinant of a matrix is the signed area of the parallelogram (2D) or signed volume of the parallelepiped (3D) defined by the row vectors of the matrix.  The sign of the determinant can be used to perform basic collision tests, such as checking which side of a line a point is on.  The determinant also tells you whether or not the matrix can be inverted (if the determinant is 0, the matrix does not have an inverse).
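The side-of-line test is just the sign of a 2x2 determinant built from the two edge vectors; a minimal sketch:

```cpp
#include <cassert>

// Sign of the determinant | bx-ax  cx-ax |
//                         | by-ay  cy-ay |
// tells which side of the line through A and B the point C lies on:
// > 0 means left of AB, < 0 means right, == 0 means on the line.
double side(double ax, double ay, double bx, double by,
            double cx, double cy) {
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
}
```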

Thursday, April 21, 2011

A free rigid body simulation book

I started off the day reading "Constraint-based collision and contact handling using impulses".  Unfortunately, the method discussed depends upon finding the time of collision and performing collision handling at that point in time, rather than after there is already penetration.  This can become rather costly due to the time step becoming very small, and the actual calculation of the collision time can be expensive.  I have tried out methods like this in the past, but I was unable to get them to run at acceptable real-time speeds.

I then moved on to read a set of slides from this year's GDC.  The slides showed various physics "artifacts", such as vibrating stacks of boxes, and presented some methods of solving or hiding them.  As is usual with slides, they were pretty light on text, and you need to know about the problems in advance to get much out of them.  I recognized contact point caching from Erin Catto's original Box2D GDC presentation, which was helpful in that it was the first time I had actually seen a diagram of how caching could help.  It was pretty vague though.

After getting through the slides, I started reading the free book "A Unified Framework for Rigid Body Dynamics".  At 153 pages, it is a heavy read compared to most of the other papers and articles I have read on rigid body simulation.  However, it should be very helpful, since one of its goals is to teach how the various parts of physics simulation fit together, which just happens to be the part I am stumbling on.  It also gives what seems to be a very thorough definition of terms, which helps greatly if you are teaching yourself this stuff and need to look up further information or discuss it with other people.

As part of going through the second chapter of the previously mentioned book, I went back and re-learned what an orthogonal matrix is, as well as how to calculate the determinant and inverse of a matrix.  I'm running out of time, so I can't give a full description, but an orthogonal matrix is basically a matrix which doesn't scale or skew.  The only thing I'll say about the other two at the moment is that calculating them is complicated.