Tuesday, March 31, 2009

Speed with a Catch

A while back, I wrote a post about surface normals in OpenGL ES. Yesterday on Twitter, there was some discussion about using the inverse square root function from Quake 3 to speed up iPhone OpenGL ES applications. Here is what that method looks like (converted to use GL and iPhone data types):

static inline GLfloat InvSqrt(GLfloat x)
{
    GLfloat xhalf = 0.5f * x;
    int i = *(int*)&x; // store floating-point bits in integer
    i = 0x5f3759df - (i >> 1); // initial guess for Newton's method
    x = *(GLfloat*)&i; // convert new bits into float
    x = x*(1.5f - xhalf*x*x); // One round of Newton's method
    return x;
}

The inverse square root can be used in several ways. Noel Llopis of Snappy Touch pointed out two uses for it on Twitter yesterday: calculating normals and doing spherical UV texture mapping. I'm still trying to wrap my head around the UV texture mapping, but I understand normals pretty well at this point, so I thought I'd see what kind of performance gains I could get using this old optimization. There are all sorts of arguments around the intertubes about whether this function still gives performance gains, but there's an easy way to find out: use it and measure with Shark.

I used my Wavefront OBJ Loader as a test, and profiled the loading of the most complex of the three objects - the airplane. The first run was using my original code, which stupidly[1] used sqrt(). I then re-ran it using sqrtf(), and then again using the Quake 3 InvSqrt() function above.

The results were impressive, and you definitely do get a performance increase from using this decade-old function on the iPhone. Using InvSqrt() gave a 15% decrease in time spent calculating surface normals over using sqrtf() and a 40% decrease over calculating with sqrt(). That's not an amount to be sneezed at, especially in situations where you need to calculate normals on the fly many times a second.

Now, if you remember, this was how we calculated normals using the square root function from math.h:

static inline GLfloat Vector3DMagnitude(Vector3D vector)
{
    return sqrt((vector.x * vector.x) + (vector.y * vector.y) + (vector.z * vector.z));
}

static inline void Vector3DNormalize(Vector3D *vector)
{
    GLfloat vecMag = Vector3DMagnitude(*vector);
    if ( vecMag == 0.0 )
    {
        vector->x = 1.0;
        vector->y = 0.0;
        vector->z = 0.0;
        return; // nothing left to do, and we must not divide by zero below
    }

    vector->x /= vecMag;
    vector->y /= vecMag;
    vector->z /= vecMag;
}

So... how can we tweak this to use inverse square root? Well, the inverse square root of a number is simply 1 divided by the square root of that number. In Vector3DNormalize(), we divide each of the components of the vector (x, y, and z) by the magnitude of the vector, which is calculated using a square root. Since dividing a value by a number is the same as multiplying it by 1 divided by that number, we can just multiply each component by the inverse magnitude instead, like so:

static inline GLfloat Vector3DFastInverseMagnitude(Vector3D vector)
{
    return InvSqrt((vector.x * vector.x) + (vector.y * vector.y) + (vector.z * vector.z));
}

static inline void Vector3DFastNormalize(Vector3D *vector)
{
    // InvSqrt(0) comes back as a huge finite number, not 0, so we have to
    // test the vector itself for zero length rather than the inverse magnitude
    if (vector->x == 0.0f && vector->y == 0.0f && vector->z == 0.0f)
    {
        vector->x = 1.0f;
        vector->y = 0.0f;
        vector->z = 0.0f;
        return;
    }

    GLfloat vecInverseMag = Vector3DFastInverseMagnitude(*vector);
    vector->x *= vecInverseMag;
    vector->y *= vecInverseMag;
    vector->z *= vecInverseMag;
}


Sweet, right? If we now use Vector3DFastNormalize() instead of Vector3DNormalize(), each call will be about 15% faster on current generations of the iPhone and iPod Touch compared to using the built-in square root function.

But… there's a catch. Actually, two catches.

The Catches


The first catch is that this optimization doesn't work faster on all hardware. In fact, on some hardware, it is measurably slower than using sqrtf(). That means you're gambling that future hardware will also benefit from this same optimization. Not a huge deal and very possibly a safe bet, but you should be aware of it, and be prepared to back it out quickly should Apple release a new generation of iPhones and iPod Touches that use a different processor.

The second, and far more important, catch is the possible legal ramifications of using this code. You see, Id released Quake 3's source code under the GNU General Public License (GPL), which is a viral license. If you use source code from a GPL project, you have to open source your entire project under the GPL as well. Now, that's an oversimplification, and there are ways around the GPL, but as a general rule, if you use GPL'd code, you have to make your code GPL also.

But, the waters are a little murky. John Carmack has admitted that he didn't write that function, and doesn't think the other programmers at Id did either. The actual author of the code is unknown. Some of the contributors to the function have been found, but not the original author. That means the code MIGHT be in the public domain. If that's the case, its inclusion in a GPL application doesn't take it out of the public domain.

So, bottom line: is it safe to use? Probably. This function is widely known and widely used and there's been no indication that any possible rights owner has any interest in chasing down every use of this function. Are there any guarantees? Nope.

My recommendation is to use it, but make sure that everywhere you use it, you have a backup method you can fall back on if you need to. If you want some assurance, you could try contacting Id legal and getting a waiver to use that function. I don't know if they'll respond, or if they'll grant it, but the folks at Id have always struck me as good people, so it might be worth an inquiry if you're risk averse.
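One way to keep that backup method close at hand is to route every call through a single wrapper and pick the implementation at compile time. This is just a sketch of the idea: the USE_FAST_INVSQRT flag and the InverseSquareRoot name are hypothetical, and the GLfloat typedef is a stand-in for the one in the OpenGL ES headers.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

typedef float GLfloat; // assumption: stand-in for the OpenGL ES typedef

// Hypothetical switch: build with -DUSE_FAST_INVSQRT=0 to back the trick out.
#ifndef USE_FAST_INVSQRT
#define USE_FAST_INVSQRT 1
#endif

static inline GLfloat InverseSquareRoot(GLfloat x)
{
#if USE_FAST_INVSQRT
    // Bit-trick approximation with one round of Newton's method
    GLfloat xhalf = 0.5f * x;
    uint32_t i;
    memcpy(&i, &x, sizeof i);
    i = 0x5f3759dfu - (i >> 1);
    memcpy(&x, &i, sizeof x);
    return x * (1.5f - xhalf * x * x);
#else
    // Fallback: plain single-precision library call
    return 1.0f / sqrtf(x);
#endif
}
```

If the rest of the code only ever calls the wrapper, swapping implementations for a new processor, or for legal reasons, is a one-line change to the build settings.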

1 - sqrt() is a double-precision function. OpenGL ES doesn't support the GLdouble datatype, which means I was doing the calculation on twice as many bits as needed, and converting from single to double precision and back again.
