Standardizing and Optimizing the Game Library

All this talk about libraries should have you excited about doing some thinking about coding your confectionary masterpiece of a game (I hope). But before we move on to more exciting code, let's see if we can do some optimizations with the code we have first. Granted, I agree that optimizing code and cleaning up code should take place at the time of actually writing the code. However, there are times when you may discover a bug in your already compiled library code or you may find a better way of writing a particular function that you want to include in your own functions of your library. This section gives you the techniques that you can use to make changes to your current code and update your library with it. You can later apply these techniques to other code that you add to any library that you create yourself or get from other sources.

Okay, let's begin with the code found in `VMODE.C`

and `VMODE.H`

.
We are going to make two minor changes to these files.

Okay, first let's look at the contents of `VMODE.H`

. For the sake
of simplicity, I won't show the entire file (you've already seen it before), instead
I will show the lines that we're interested in. In fact, there's only one line
at the moment, the line that defines the **SetPixel** macro. Right now we have
it defined as so:

#define SetPixel(X, Y, C) videoMem[Y * 320 + X] = C

First off, I will say that the idea for this code did come from Andre LaMothe's book. This in my opinion is the simplest and most straight forward approach to setting pixels in the chosen video mode. Beleive me, I've look at code for diong this from various sources and this one is the best I've seen. Of course, I did make it a little bit better by making it a macro which, as explained earlier, will enhance its execution speed by eliminating the need for a function call. Let's enhance this even further by adding the optimization technique also described in Andre's book. The one that uses shifting instead of multiplying.

#define SetPixel(X, Y, C) videoMem[(Y<<8) + (Y<<6) + X] = C

The code will be screaming fast now, this function works shifting and adding are exactly how microprocessor's work to multiply two integer numbers (floating points are a whole new ball game). Well, if you think about it, it's one of two ways that this operation can be done by a microprocessor. The first technique is a simple one. Add the multiplicand repeatedly for the number of times specified by the value of the multiplier. If the multiplier is zero, let the product be also zero. Here's some pseudocode that illustrates this process:

LET PRODUCT = ZERO WHILE MULTIPLIER NOT = ZERO ADD MULTIPLICAND TO PRODUCT SUBTRACT ONE FROM MULTIPLIER END WHILE

Of course, this obviously works best on positive integers. No guarantees are ever made when done on signed integers, but it seems to work well on 2's compliment integers when the product is the same bit width as the multiplicand and multpilier. This technique has its obvious disadvantage when the multiplier is a large value (the larger the value, the longer it takes to get the product). Of course, one optimization technique would be to test the two numbers and make the multiplier the smaller of the two numbers. This would, of course, reduce the time it takes to actually multiply the numbers. Here's how the process would appear in a C function:

int IntMult_OldStyle(int multiplicand, int multiplier) { int product = 0, tempVal; if ((unsigned)multiplicand < (unsigned)multiplier) { tempVal = multiplicand; multiplicand = multiplier; multiplier = tempVal; } while (multiplier != 0) { product += multiplicand; multiplier--; } return product; }

Since then, newer and better microprocessors emerged and better techniques for multiplying two integers were also discovered. This new technique involves shifting, testing and adding. Even though this technique may seem somewhat advanced, it really isn't that much different than what you or I do in our own multiplication. To see what I mean, here's a sample problem:

125 x 8 = ?

If I were to ask someone to work that out, they'd probably do it like so:

125 x 8 ---- 40 16 8 ---- 1000

Little do they know it, but this technique involved shifting, multiplying and finally adding. Not convinced? Then let's look at this problem again more closely, as we work it out.

First, start off with our two values for multiplying: 125 x 8 ---- Then multiply the 8 by the 5 in 125. 5 x 8 = 40 Shift the 8 to the left one decimal place (to make it 80). Multiply the 80 by the 2 in 125. 2 x 80 = 160 Shift the 80 again one place to the left (to make it 800). Multiply the 800 by the 1 in 125. 1 x 800 = 800 Now add all the products for the final product. 40 160 + 800 ----- 1000 = 125 x 8

Of course, if the microprocessor looked at numbers the way we look at numbers, it would probably be more efficient to just use an add loop like we did before and skip the shifting and adding technique. Of course, we had to multiply along the way because our numbers are in base 10 (decimal) and therefore we have 10 different combinations (0 - 9) for every digit. Since we had to multiply along the way also, it defeated the purpose of shifting and adding. However, a microprocessor looks at numbers in a base 2 (binary) format and each digit only has two different combinations (0 and 1) for each digit. We all know that multiplying by 0 gives us 0 and multiplying by one gives us the original number. So in this case, we need not even do anything if the binary digit (more commonly known as bit) is 0 and if it is one, just simply add what we've got (no multiplying needed). Let's illustrate:

Let's pick two numbers, keep them small for simplicity, how about: 11 x 3 --- 33 A microprocessor sees them like this: 1011_{2}(This is the 11 from above) x 11_{2}(This is the 3 from above) ------ 100001_{2}(This is the 33 from above) Now let's analyze the steps: First, set our product variable to zero. Next, test the last bit (least significant bit) of the 1011_{2}. 1011_{2}^ -- Test this bit. If it is a 1, add the 11_{2}to the product variable. It is a 1, so now our product variable contains 11_{2}. Then shift the 11_{2}one place to the left (to make it 110_{2}). Next, test the next bit to the left of the number 1011_{2}1011_{2}^ --- Test this bit. If it is a 1, add the 110_{2}to the product variable. It is a 1, so now our product variable contains 1001_{2}(110_{2}+ 11_{2}). Then shift the 110_{2}one place to the left (to make it 1100_{2}). Next, test the next bit to the left of the number 1011_{2}1011_{2}^ ---- Test this bit. If it is a 1, add the 1100_{2}to the product variable. It is a 0, so 1100_{2}is not added to the product variable. Then shift the 1100_{2}one place to the left (to make it 11000_{2}). Next, test the next bit to the left of the number 1011_{2}1011_{2}^ ----- Test this bit (this is the most significant bit). If it is a 1, add the 11000_{2}to the product variable. It is a 1, so now our product variable contains 100001_{2}(11000_{2}+ 110_{2}+ 11_{2}). Our final product now contains 100001_{2}. Actually, if integers are 16 bits in length, the product actually contains: 0000000000100001_{2}-- Must account for all bits. We also would have to test all 16 bits, regardless of the fact that only four bits of 1011_{2}were significant.

That's it. Here's a C function that kind of simulates it in code:

int IntMul(int multiplicand, int multiplier) { int testBit = 1; /* Our roving test bit mask */ int product = 0; /* Our final product */ while (testBit) { /* Do until all bits are tested */ if (multiplicand & testBit) /* If test bit is a one */ product += multiplier; /* add to the product */ multiplier <<= 1; /* Shift to the left one digit */ testBit <<= 1; /* Ready next bit for the test */ } return product; }

So how does this apply to our **SetPixel** macro? You see, in the DOS
16 bit based world, all integers are 16 bits, the microprocessor has to test all
16 bits of the muliplicand regardless of how many of them is a one or a zero.
In the case of multiplying by 320, if we were to look at 320 in binary form, we'd
see that only the 6th and the 8th binary digit (from the left) is a one. So
the microprocessor only adds at the 6th and 8th digit, after shifting the multiplier
(in our case, Y) that many places over each time. But, the remaining digits still
have to be tested (a microprocessor can't determine ahead of time how many digits are actually 1).
This waste of time is avoided with the new macro.

In DOS (16 bit) an integer is 16 bits long. So the constant 320 looks like this to the CPU: 0000000100100000_{2}All 16 bits have to be tested, even though only bits 6 and 8 are significant. By shifting the multiplier (Y) to the sixth and eighth place and adding: product = (Y << 8) + (Y << 6) We skip the other 14 tests and save time. Time is of the essence in game code.

Be careful, don't get in the mindset that shifting and adding should replace *all*
multiplication. It certainly cannot be done if both the multiplier and the multiplicand
are a variable (you could not determine beforehand which bits are a one). Also, if
the multiplicand is a constant that contains many many binary ones (like 255 it has 8), then
using the shifting and adding technique produces bulky code which will probably run just as slow
as it would if you just did straight multiplication. If, on the other hand, you are lucky
enough to find yourself multiplying with an integer constant that has only one binary one (all
integers that are a power of two are like this) then you can simply shift and skip the adding.

The constant integer 255 looks like this in binary: 0000000011111111_{2}To shift and add with this number, it would look like this: product = (multiplier << 7) + (multiplier << 6) + (multiplier << 5) + (multiplier << 4) + (multiplier << 3) + (multiplier << 2) + (multiplier << 1) + (multiplier); That's bulky, forget it, just multiply. product = multiplier * 255; A power of 2 has only one binary one, so just shift. product = multiplier * 2; /* 2 is 2^{1}so... */ product = multiplier << 1; /* ...this is equivalent. */ product = multiplier * 16; /* 16 is 2^{4}so... */ product = multiplier << 4; /* ...this is equivalent. */ Here's another tip, you can divide by powers of two by shifting to the right. quotient = dividend / 2; /* 2 is 2^{1}so... */ quotient = dividend >> 1; /* ...this is equivalent. */ quotient = dividend / 16; /* 16 is 2^{4}so... */ quotient = dividend >> 4; /* ...this is equivalent. */

Send your questions, comments, or ideas to:
*garyneal@geocities.com*