Refactoring Box2D

Have you ever wondered whether your code is performing as well as it can? What are your benchmarks, anyhow? How do you determine what “as well as it can” means? These are all questions of heavy existential weight, but usually we have a reasonable idea of what we’d like to improve and why. Are you looking to turn a mess of unreadable and unmaintainable code into something more pleasurable to behold? Are you tasked with putting a processor pig on a diet? Usually you’ll have a sense of what you really need.

The real question is “How?”

I’m going to examine a concept called “Refactoring” and I’m going to give you a case study in it as I refactor parts of the Box2DAS3 (version 2.0.1) engine with the goal of improving runtime performance and memory consumption.

To refactor code is to change its inner workings without destructively changing its interfaces or altering the output of its functions and expressions in any way. You may wonder “How can this possibly help? Doesn’t this mean that the code is just doing what it always did?”

No, not in the slightest.

I had an itch to see if I could manage to squeeze some more juice out of the ol’ physics engine for the sake of the banner up above. If it’s running more smoothly on your machine now (and it should be; it certainly is on mine), this is the reason why.

Before we begin, there is one thing to keep in mind when you decide to refactor a third-party library: their updates can break code you’re relying on! Your refactored code may not be compatible with the vision the creators originally conceived when they wrote it. They might have code they were intending to implement but simply haven’t, and their ideas might be better than yours. In this case you’re stuck with a lot of time sunk into your own custom branch of the project, which might fall far behind the official branch. I will do my best to point out the danger zones as I encounter them.

I will admit to you that what follows is not the most formal method of refactoring. You can use a debugger to step through your code, or you can simply do as I have done here and take the initiative to “run through” yourself, from function to function and “execute” the code in your head. We’ll take this latter approach for now, as I find it tells an interesting story.

A reasonable place to look when identifying performance bottlenecks is anything that happens in a loop, or on an interval. The most obvious interval in the world of Box2D (and the one that is called 30 times per second in my banner) is b2World.Step(). Let’s examine this function. I’ll call out some places in the code via comments:

	public function Step(dt:Number, iterations:int) : void{
 
		m_lock = true;
 
		// *** An instantiation happens 30 times per second *** //
		var step:b2TimeStep = new b2TimeStep();
 
		step.dt = dt;
		step.maxIterations	= iterations;
		if (dt > 0.0)
		{
			step.inv_dt = 1.0 / dt;
		}
		else
		{
			step.inv_dt = 0.0;
		}
 
		step.dtRatio = m_inv_dt0 * dt;
 
		step.positionCorrection = m_positionCorrection;
		step.warmStarting = m_warmStarting;
 
		// Update contacts.
		m_contactManager.Collide();
 
		// Integrate velocities, solve velocity constraints, and integrate positions.
		if (step.dt > 0.0)
		{
			Solve(step);
		}
 
		// Handle TOI events.
		if (m_continuousPhysics && step.dt > 0.0)
		{
			SolveTOI(step);
		}
 
		// Draw debug information.
 
		//*** Regardless of whether it's useful, or whether we're debugging,
		//    this function is called 30 times per second ***//
		DrawDebugData();
 
		m_inv_dt0 = step.inv_dt;
		m_lock = false;
	}

First, note the instantiation of a b2TimeStep each time we call the Step function. What is a b2TimeStep? It’s this:

package Box2D.Dynamics{
 
 
public class b2TimeStep
{
	public var dt:Number;			// time step
	public var inv_dt:Number;		// inverse time step (0 if dt == 0).
	public var dtRatio:Number;		// dt * inv_dt0
	public var maxIterations:int;
	public var warmStarting:Boolean;
	public var positionCorrection:Boolean;
};
 
 
}

It’s simply a data structure. There are no functions at all. Now, in case you did not know, instantiation is expensive: every “new” costs allocation time up front and gives the garbage collector more work later. It is not something you want to do willy-nilly if you can possibly help it, and from the look of the code above, every one of these variables is overwritten immediately after instantiation within the scope of the Step function, so nothing stale can carry over from one call to the next. So instead of instantiating from scratch, let’s make a single class-level member variable to hold our Step function’s b2TimeStep object, and do some refactoring:

 
	private var m_stepScopeTimeStep:b2TimeStep = new b2TimeStep();
 
	public function Step(dt:Number, iterations:int) : void{
 
		m_lock = true;
 
		//var step:b2TimeStep = new b2TimeStep();
		var step:b2TimeStep = m_stepScopeTimeStep;

“Why did he choose to simply assign the object referenced by m_stepScopeTimeStep to the old step variable? Why not just find/replace all instances of the word ‘step’ with ‘m_stepScopeTimeStep’?”

When refactoring, it is critical to take small steps. We know that the code works as it was written, so the goal, at least for now, is to modify it as little as possible while still making it better. Yes, we are still declaring an unnecessary local variable at the start of the Step function, but what’s more important right now is to stop instantiating a needless object every time Step is called.

We now dutifully confirm that our change has not broken the code. It is best to do this with a debugger, but for our purposes we’ll simply execute the code and ensure that it still runs correctly.
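For anyone who wants something a little more rigorous than eyeballing the banner, here is a minimal sketch of a regression check. The idea is to record the same quantity (say, one body’s y position after each Step) from the original build and again from the refactored build, then compare the two recordings. StepRegressionCheck and sameTrajectory are hypothetical names of my own, not part of Box2DAS3, and the sample numbers are placeholders rather than real simulation output.

```actionscript
package {
	import flash.display.Sprite;

	// Hypothetical harness; nothing here belongs to Box2DAS3.
	public class StepRegressionCheck extends Sprite {

		public function StepRegressionCheck() {
			// Placeholder data. In practice each array would be filled by running
			// the same scene for the same number of Step() calls, once per build.
			var original:Array   = [0.0, 0.5, 1.5];
			var refactored:Array = [0.0, 0.5, 1.5];
			trace("recordings match:", sameTrajectory(original, refactored, 0));
		}

		/** True when both recordings have the same length and every sample agrees within eps. */
		public static function sameTrajectory(a:Array, b:Array, eps:Number):Boolean {
			if (a.length != b.length) return false;
			for (var i:int = 0, n:int = a.length; i < n; ++i) {
				if (Math.abs(a[i] - b[i]) > eps) return false;
			}
			return true;
		}
	}
}
```

Because a true refactoring leaves the arithmetic untouched, the two recordings should match exactly, which is why the tolerance above is zero. A small non-zero tolerance only makes sense if a change reorders floating-point operations.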

This same sort of redundant instantiation happens in other places in the library as well. Notably, a great many b2Island objects are instantiated every frame when only a single one is ever needed. It can simply be re-initialized and reused.

Cutting down on instantiations of b2Islands and b2TimeSteps alone helps save several MB of memory over time, which can be better spent rendering the awesomeness of physics.
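The reset-and-reuse pattern can be sketched as follows. ScratchIsland is a hypothetical stand-in of my own for a per-frame work object; it is not the real b2Island, whose fields and methods differ. The point is only the shape of the change: allocate once, then re-initialize at the top of each frame.

```actionscript
package {
	import flash.display.Sprite;

	public class ReuseDemo extends Sprite {
		// One scratch object, allocated once for the lifetime of its owner.
		private var m_scratch:ScratchIsland = new ScratchIsland(64);

		public function ReuseDemo() {
			// Each pass of this loop stands in for one frame of simulation.
			for (var frame:int = 0; frame < 3; ++frame) {
				m_scratch.Clear();        // re-initialize instead of calling "new"
				m_scratch.Add("bodyA");
				m_scratch.Add("bodyB");
				trace("frame", frame, "holds", m_scratch.count, "items");
			}
		}
	}
}

// Hypothetical stand-in for a per-frame work object such as b2Island.
class ScratchIsland {
	public var items:Array;
	public var count:int = 0;

	public function ScratchIsland(capacity:int) {
		items = new Array(capacity);   // storage is allocated exactly once
	}

	// Only the count is reset; old entries are overwritten by later Add() calls.
	public function Clear():void { count = 0; }

	public function Add(item:Object):void { items[count++] = item; }
}
```

One thing to watch with this approach: Clear() leaves stale references in the array until they are overwritten, so an object that should be garbage collected can linger for a while. For short-lived scratch data that is usually an acceptable trade.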

The next thing we’ll do in the Step function is examine the call to the DrawDebugData function. Here’s what’s going on inside of it:

public function DrawDebugData() : void{
 
		if (m_debugDraw == null)
		{
			return;
		}
 
		// snipped ...

One thing to remember about ActionScript is that function calls are relatively expensive to perform. You shouldn’t make one without a good reason, and particularly not inside a loop. So what we’ll do instead is this:

//DrawDebugData();
if(m_debugDraw) DrawDebugData();

In the event that you’re not debugging the application, you’ll save 30 function calls per second here just by confirming that there’s even a reason to call the function in the first place. Again, we’re not going to change anything inside the DrawDebugData function. It’s not quite in the scope of the refactor… I really couldn’t care less right now about how well it runs in debug mode, as I’m only displaying content in production mode.
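If you would rather measure the cost of the call than take my word for it, a micro-benchmark along these lines is one way to do it. FakeWorld is a hypothetical class of my own that only imitates the early return shown above; it is not b2World. The timings depend heavily on the player version and the machine, so treat them as relative, not absolute.

```actionscript
package {
	import flash.display.Sprite;
	import flash.utils.getTimer;

	public class GuardBenchmark extends Sprite {
		public function GuardBenchmark() {
			var world:FakeWorld = new FakeWorld();   // m_debugDraw stays null
			var n:int = 1000000;
			var i:int;

			var t0:int = getTimer();
			for (i = 0; i < n; ++i) {
				world.DrawDebugData();               // always pays for the call
			}
			var t1:int = getTimer();
			for (i = 0; i < n; ++i) {
				if (world.m_debugDraw) world.DrawDebugData();   // call is skipped
			}
			var t2:int = getTimer();

			trace("unguarded:", t1 - t0, "ms, guarded:", t2 - t1, "ms, calls made:", world.calls);
		}
	}
}

// Hypothetical stand-in that mimics only the early return in DrawDebugData.
class FakeWorld {
	public var m_debugDraw:Object = null;
	public var calls:int = 0;

	public function DrawDebugData():void {
		++calls;   // counts every call, including those that return right away
		if (m_debugDraw == null) {
			return;
		}
		// drawing would happen here
	}
}
```

At 30 calls per second the saving from this one guard is small. The measurement matters more for calls that sit inside per-body or per-contact loops.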

So, that’s all well and good. Where should we go next? Let’s scan our way down and see what functions are being called in Step:

 
// I wonder what's happening in this function call?
m_contactManager.Collide();

We’ve seen our first call here in this line. m_contactManager is an instance of b2ContactManager, so let’s open it up and see what this function does:

	public function Collide() : void
	{
		// Update awake contacts.
		for (var c:b2Contact = m_world.m_contactList; c; c = c.m_next)
		{
			var body1:b2Body = c.m_shape1.m_body;
			var body2:b2Body = c.m_shape2.m_body;
			if (body1.IsSleeping() && body2.IsSleeping())
			{
				continue;
			}
 
			c.Update(m_world.m_contactListener);
		}
	}

Doesn’t look like anything suspicious is happening here, does it?

Wait!

    // getter functions for "IsSleeping"?  Is this just a boolean we can retrieve for ourselves?
    if (body1.IsSleeping() && body2.IsSleeping())

Let’s open up b2Body and have a look-see:

/// A rigid body.
public class b2Body
{
	/// Creates a shape and attach it to this body.
	/// @param shapeDef the shape definition.
	/// @warning This function is locked during callbacks.
	public function CreateShape(def:b2ShapeDef) : b2Shape{
 
           /*** snip ***/
 
	/// Is this body sleeping (not simulating).
	public function IsSleeping() : Boolean{
		return (m_flags & e_sleepFlag) == e_sleepFlag;
	}
 
           /*** snip ***/
 
	public var m_flags:uint;
 
           /*** snip ***/
 
	// m_flags
	//enum
	//{
		static public var e_frozenFlag:uint			= 0x0002;
		static public var e_islandFlag:uint			= 0x0004;
		static public var e_sleepFlag:uint			= 0x0008;  // this is the one!
		static public var e_allowSleepFlag:uint		= 0x0010;
		static public var e_bulletFlag:uint			= 0x0020;
		static public var e_fixedRotationFlag:uint	= 0x0040;
	//};

So the b2Body not only has a function call for IsSleeping, but it does a bitwise operation based on its current flags uint to determine whether or not it counts as “asleep”. Changing the internal guts of how Box2D determines whether an object sleeps might be worthwhile, but that would require some benchmarking. For now what we’ll do is take advantage of the fact that the variables are all public and perform the comparison without calling the function:

	public function Collide() : void
	{
		// Update awake contacts.
		for (var c:b2Contact = m_world.m_contactList; c; c = c.m_next)
		{
			var body1:b2Body = c.m_shape1.m_body;
			var body2:b2Body = c.m_shape2.m_body;
			//if (body1.IsSleeping() && body2.IsSleeping())
			if ((body1.m_flags & b2Body.e_sleepFlag) == b2Body.e_sleepFlag && 
				(body2.m_flags & b2Body.e_sleepFlag) == b2Body.e_sleepFlag)
			{
				continue;
			}
 
			c.Update(m_world.m_contactListener);
		}
	}

We’ve now eliminated up to two needless function calls per contact, on every call to Step. Testing this code shows that the application continues to behave properly. But before we go on…

Danger: While we have done nothing to change the actual results of the code on execution, we have definitely “painted ourselves into a corner” in a sense. The code becomes more efficient when we remove needless get/set function calls, but by breaking encapsulation we’re now at the mercy of fate. By no longer relying on the IsSleeping() function, we’ve lost out on the potential that the function itself may be made more efficient… or, more disastrously, if a future version of the library changes the way “sleep” is determined, we’re no longer protected by a function that abstracts that change away from us. It’s a potentially future-breaking change.

In this particular case it’s not likely that there would be a change, but that’s not true of other parts of the library. In particular, a number of b2Vec2 functions simply return new “cloned” instances of the b2Vec2 with certain transformations or maths applied to them. These functions, called repeatedly, are quite inefficient, as each one costs not only a function call but also an instantiation. We *could* simply instantiate our own new b2Vec2 instances and apply the simple math ourselves, which would save us a function call. A potentially better solution, though, would be to make the function call itself less wasteful. If the authors (or we) create an object pooling scheme that generates a limited number of b2Vec2 instances and recycles them, that would justify keeping the function around. At times like those, it’s completely up to your own judgment and what risks you’re willing to live with.

In the case of Box2D, the library is overly encapsulated in a number of ways that hamper performance. More frustratingly, there are several function calls that happen repeatedly where the function body has nothing inside other than a //TODO. While I’m sure there is something to be done in those stub functions in the future, in the meantime those function calls should be commented out. An empty function call iterated for every single “body” in the simulation, 30 times every second, can quickly degrade your performance. Do you ever wonder how much reality there is in those contrived 100,000 iteration for-loops? This is one of those cases where it actually happens.
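The object pooling idea mentioned above could look something like the following. This is only a sketch under my own assumptions: Vec2 and Vec2Pool are hypothetical classes, not b2Vec2 or anything that ships with Box2DAS3, and the acquire/release names are mine.

```actionscript
package {
	import flash.display.Sprite;

	public class Vec2PoolDemo extends Sprite {
		public function Vec2PoolDemo() {
			var pool:Vec2Pool = new Vec2Pool();

			var a:Vec2 = pool.acquire(1, 2);
			var b:Vec2 = pool.acquire(3, 4);

			// In-place addition: a is mutated, no result object is allocated.
			a.x += b.x;
			a.y += b.y;
			trace("sum:", a.x, a.y);

			// Hand both vectors back once the calculation is finished.
			pool.release(b);
			pool.release(a);

			// The next request is served from the free list instead of "new".
			var c:Vec2 = pool.acquire(0, 0);
			trace("recycled:", c === a);
		}
	}
}

// Hypothetical two-component vector standing in for b2Vec2.
class Vec2 {
	public var x:Number;
	public var y:Number;

	public function Vec2(x:Number = 0, y:Number = 0) {
		this.x = x;
		this.y = y;
	}
}

// Hypothetical pool: a free list that grows only when it runs dry.
class Vec2Pool {
	private var m_free:Array = [];

	public function acquire(x:Number, y:Number):Vec2 {
		var v:Vec2 = (m_free.length > 0) ? (m_free.pop() as Vec2) : new Vec2();
		v.x = x;
		v.y = y;
		return v;
	}

	public function release(v:Vec2):void {
		m_free.push(v);
	}
}
```

The catch is that the caller must never touch a vector after releasing it, because the pool may already have handed it to someone else. Note too that acquire and release are function calls themselves, so a pool like this only pays off where it replaces real allocations; it is worth benchmarking before committing to it.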

And there you have a very basic rundown of what a refactoring is. These changes seem small, but they add up very quickly, and the more of them you implement in performance-critical areas, the more juice you get from your code. They can be combined with micro-optimizations, such as more efficient for-loop declarations, or factoring out the use of b2Math convenience functions such as min/max and the simple vector arithmetic functions. Another place for improvement is to use array-literal style instantiations where a pre-determined array length is unimportant. Truly though, I got my biggest gains from eliminating needless instantiation and from accessing properties directly instead of going through get functions that return the property without transforming it in any way. I’d say that one of the biggest potential roads for improvement would be to implement object pools for commonly instantiated temporary objects that do not persist, and that serve little purpose other than as fodder for calculation.
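As a rough illustration of the micro-optimizations listed above, here is a small sketch that uses an array literal, a for-loop with the length cached in its declaration, and an inlined comparison in place of a max-style helper call. MicroOptDemo is a made-up class for demonstration only and none of it is taken from the Box2DAS3 source.

```actionscript
package {
	import flash.display.Sprite;

	public class MicroOptDemo extends Sprite {
		public function MicroOptDemo() {
			// Array literal instead of "new Array()" followed by pushes.
			var values:Array = [3, 9, 4];
			var best:Number = values[0];

			// The length is read once in the loop declaration instead of on every pass.
			for (var i:int = 1, n:int = values.length; i < n; ++i) {
				var v:Number = values[i];
				// Inlined equivalent of a max(a, b) helper: a comparison, no function call.
				best = (v > best) ? v : best;
			}

			trace("largest value:", best);
		}
	}
}
```

Each of these saves very little on its own. They only become worth the loss in readability inside code that runs for every body or contact on every Step.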

In this particular refactoring you’ll note that I’m tearing down certain OOP constructs for the sake of performance. This should not be taken to mean that well-designed systems and object orientation are performance hogs by default. It’s generally better to start with a system that is well-designed and possibly slower, and to selectively break encapsulation to gain boosts, than to start with a mess and try to organize it later.


  1. #1 written by James June 11th, 2011 at 15:25

    This is a fascinating post. I would love to hear if you’ve gotten any further in optimizing Box2D for Flash. I’m working on an Android / iOS game that needs as much tweaking as possible, and this has been the best resource I’ve found so far. Thanks!

  2. #2 written by The Horseman June 11th, 2011 at 20:57

    @ James
    I think the single biggest improvement I gained was when I decided to add a “killswitch” that turned off all collision reporting for given b2Shapes. The way box2d implements messaging for collisions / contacts / persisted contacts is one of the most expensive operations I ran into for large numbers of objects and large numbers of persisted collisions. None of these are triggered when no listener object exists, but when one does exist there’s no way to filter the events you want to hear from the events that you don’t want to hear, and so all of them trigger notifications. The only built-in way I found to avoid sending all those notifications was to use masking and grouping to put objects in non-mutually-collidable groups, but that often meant that objects simply passed through each other, which wasn’t the desired behavior. So I hacked the engine to allow me to more granularly say “yes, you should collide objects A, B, C, X, Y, Z all with each other, but please don’t dispatch any notifications about adding/persisting/removing contact points among all these objects… but do allow them to register contacts with object F!”

    Hopefully this makes some sense.

    By the way, I’d be interested to know how your company’s game is coming along. Would you be so kind as to ping me back when you’ve released it to Android and iOS?

  3. #3 written by James June 12th, 2011 at 12:25

    I’d be more than happy to keep you up to date on Beard & Glory.

    This makes some sense. I’m still poring over the code to get a grasp on how Box2D actually works under the covers.

  4. #4 written by The Horseman June 12th, 2011 at 15:32

    @ James
    It’s not a small engine. I’m no expert on it, but I definitely have more insight than the average person… at least with version 2.0.1. If I recall correctly, one of the big changes they made in 2.1 was to the way listeners work, so it might not be very instructive to post my solution… but if it’s like it was before, then I’d suggest looking for all areas in the box2d source where collision add/persist and such are dispatched and adding your logic there. This obviously is only applicable if you want selective collision *reporting* between individual objects, or classes of objects. (E.g., it would make no sense in my banner because I do care about all manner of reporting there, and between all kinds of objects, but in the game I made a few months ago it mattered a great deal.)