Standardizing and Optimizing the Game Library

Okay, let's begin with the code found in `VMODE.C` and `VMODE.H`.  We are going to make two minor changes to these files.

Okay, first let's look at the contents of `VMODE.H`.  For the sake of simplicity, I won't show the entire file (you've already seen it before), instead I will show the lines that we're interested in.  In fact, there's only one line at the moment, the line that defines the SetPixel macro.  Right now we have it defined as so:

```     #define SetPixel(X, Y, C) videoMem[Y * 320 + X] = C
```

First off, I will say that the idea for this code did come from Andre LaMothe's book.  This in my opinion is the simplest and most straight forward approach to setting pixels in the chosen video mode.  Beleive me, I've look at code for diong this from various sources and this one is the best I've seen.  Of course, I did make it a little bit better by making it a macro which, as explained earlier, will enhance its execution speed by eliminating the need for a function call.  Let's enhance this even further by adding the optimization technique also described in Andre's book.  The one that uses shifting instead of multiplying.

```     #define SetPixel(X, Y, C) videoMem[(Y<<8) + (Y<<6) + X] = C
```

The code will be screaming fast now, this function works shifting and adding are exactly how microprocessor's work to multiply two integer numbers (floating points are a whole new ball game).  Well, if you think about it, it's one of two ways that this operation can be done by a microprocessor.  The first technique is a simple one.  Add the multiplicand repeatedly for the number of times specified by the value of the multiplier.  If the multiplier is zero, let the product be also zero.  Here's some pseudocode that illustrates this process:

```     LET PRODUCT = ZERO

WHILE MULTIPLIER NOT = ZERO
SUBTRACT ONE FROM MULTIPLIER
END WHILE
```

Of course, this obviously works best on positive integers.  No guarantees are ever made when done on signed integers, but it seems to work well on 2's compliment integers when the product is the same bit width as the multiplicand and multpilier.  This technique has its obvious disadvantage when the multiplier is a large value (the larger the value, the longer it takes to get the product).  Of course, one optimization technique would be to test the two numbers and make the multiplier the smaller of the two numbers.  This would, of course, reduce the time it takes to actually multiply the numbers.  Here's how the process would appear in a C function:

```     int IntMult_OldStyle(int multiplicand, int multiplier)
{
int product = 0, tempVal;

if ((unsigned)multiplicand < (unsigned)multiplier) {
tempVal = multiplicand;
multiplicand = multiplier;
multiplier = tempVal;
}

while (multiplier != 0) {
product += multiplicand;
multiplier--;
}

return product;
}
```

Since then, newer and better microprocessors emerged and better techniques for multiplying two integers were also discovered.  This new technique involves shifting, testing and adding.  Even though this technique may seem somewhat advanced, it really isn't that much different than what you or I do in our own multiplication.  To see what I mean, here's a sample problem:

```     125 x 8 = ?
```

If I were to ask someone to work that out, they'd probably do it like so:

```      125
x  8
----
40
16
8
----
1000
```

Little do they know it, but this technique involved shifting, multiplying and finally adding.  Not convinced?  Then let's look at this problem again more closely, as we work it out.

```     First, start off with our two values for multiplying:

125
x  8
----

Then multiply the 8 by the 5 in 125.

5 x 8 = 40

Shift the 8 to the left one decimal place (to make it 80).
Multiply the 80 by the 2 in 125.

2 x 80 = 160

Shift the 80 again one place to the left (to make it 800).
Multiply the 800 by the 1 in 125.

1 x 800 = 800

Now add all the products for the final product.

40
160
+ 800
-----
1000 = 125 x 8
```

Of course, if the microprocessor looked at numbers the way we look at numbers, it would probably be more efficient to just use an add loop like we did before and skip the shifting and adding technique.  Of course, we had to multiply along the way because our numbers are in base 10 (decimal) and therefore we have 10 different combinations (0 - 9) for every digit.  Since we had to multiply along the way also, it defeated the purpose of shifting and adding.  However, a microprocessor looks at numbers in a base 2 (binary) format and each digit only has two different combinations (0 and 1) for each digit.  We all know that multiplying by 0 gives us 0 and multiplying by one gives us the original number.  So in this case, we need not even do anything if the binary digit (more commonly known as bit) is 0 and if it is one, just simply add what we've got (no multiplying needed).  Let's illustrate:

```     Let's pick two numbers, keep them small for simplicity, how about:

11
x 3
---
33

A microprocessor sees them like this:

10112 (This is the 11 from above)
x   112 (This is the  3 from above)
------
1000012 (This is the 33 from above)

Now let's analyze the steps:

First, set our product variable to zero.

Next, test the last bit (least significant bit) of the 10112.

10112
^ -- Test this bit.
If it is a 1, add the 112 to the product variable.
It is a 1, so now our product variable contains 112.

Then shift the 112 one place to the left (to make it 1102).

Next, test the next bit to the left of the number 10112

10112
^ --- Test this bit.
If it is a 1, add the 1102 to the product variable.
It is a 1, so now our product variable contains 10012 (1102 + 112).

Then shift the 1102 one place to the left (to make it 11002).

Next, test the next bit to the left of the number 10112

10112
^ ---- Test this bit.
If it is a 1, add the 11002 to the product variable.
It is a 0, so 11002 is not added to the product variable.

Then shift the 11002 one place to the left (to make it 110002).

Next, test the next bit to the left of the number 10112

10112
^ ----- Test this bit (this is the most significant bit).
If it is a 1, add the 110002 to the product variable.
It is a 1, so now our product variable contains 1000012 (110002 + 1102 + 112).

Our final product now contains 1000012.

Actually, if integers are 16 bits in length, the product actually contains:

00000000001000012 -- Must account for all bits.

We also would have to test all 16 bits, regardless of the fact that only four
bits of 10112 were significant.
```

That's it.  Here's a C function that kind of simulates it in code:

```     int IntMul(int multiplicand, int multiplier)
{
int testBit = 1;  /* Our roving test bit mask */
int product = 0;  /* Our final product */

while (testBit) { /* Do until all bits are tested */
if (multiplicand & testBit)  /* If test bit is a one */
product += multiplier;   /* add to the product */

multiplier <<= 1;            /* Shift to the left one digit */
testBit    <<= 1;            /* Ready next bit for the test */
}
return product;
}
```

So how does this apply to our SetPixel macro?  You see, in the DOS 16 bit based world, all integers are 16 bits, the microprocessor has to test all 16 bits of the muliplicand regardless of how many of them is a one or a zero.  In the case of multiplying by 320, if we were to look at 320 in binary form, we'd see that only the 6th and the 8th binary digit (from the left) is a one.  So the microprocessor only adds at the 6th and 8th digit, after shifting the multiplier (in our case, Y) that many places over each time.  But, the remaining digits still have to be tested (a microprocessor can't determine ahead of time how many digits are actually 1).  This waste of time is avoided with the new macro.

```     In DOS (16 bit) an integer is 16 bits long.

So the constant 320 looks like this to the CPU:

00000001001000002

All 16 bits have to be tested, even though only bits 6 and 8 are significant.

By shifting the multiplier (Y) to the sixth and eighth place and adding:

product = (Y << 8) + (Y << 6)

We skip the other 14 tests and save time.  Time is of the essence in game code.
```

Be careful, don't get in the mindset that shifting and adding should replace all multiplication.  It certainly cannot be done if both the multiplier and the multiplicand are a variable (you could not determine beforehand which bits are a one).  Also, if the multiplicand is a constant that contains many many binary ones (like 255 it has 8), then using the shifting and adding technique produces bulky code which will probably run just as slow as it would if you just did straight multiplication.  If, on the other hand, you are lucky enough to find yourself multiplying with an integer constant that has only one binary one (all integers that are a power of two are like this) then you can simply shift and skip the adding.

```     The constant integer 255 looks like this in binary:

00000000111111112

To shift and add with this number, it would look like this:

product = (multiplier << 7) + (multiplier << 6) +
(multiplier << 5) + (multiplier << 4) +
(multiplier << 3) + (multiplier << 2) +
(multiplier << 1) + (multiplier);

That's bulky, forget it, just multiply.

product = multiplier * 255;

A power of 2 has only one binary one, so just shift.

product = multiplier * 2;  /* 2 is 21 so... */
product = multiplier << 1;  /* ...this is equivalent. */

product = multiplier * 16;  /* 16 is 24 so... */
product = multiplier << 4;  /* ...this is equivalent. */

Here's another tip, you can divide by powers of two by shifting to the right.

quotient = dividend / 2;  /* 2 is 21 so... */
quotient = dividend >> 1;  /* ...this is equivalent. */

quotient = dividend / 16;  /* 16 is 24 so... */
quotient = dividend >> 4;  /* ...this is equivalent. */
```