What I Do

Here is a simple introduction to what I do as a high-performance software engineer.

My work generally falls into two categories: high-performance software and numerical software.

1. High Performance

I design software to execute as quickly as possible on a computer. When software is faster:
  1. The computer is more responsive to the user and can play audio and video without skipping.
  2. Less heat is generated.
  3. The battery charge lasts longer.

1.1 Scheduling Instructions

There are many ways to make software faster. One way is to fit program instructions together like a jigsaw puzzle to make the best use of the processor. Typical computer processors can do several things at the same time, such as load data from memory into registers, add or multiply numbers, compare data, and rearrange bytes in registers. By studying how a processor behaves, we can arrange a program's instructions so that more of them execute at the same time.
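Here is a small sketch in C of one such arrangement (the function names and the choice of four are mine, for illustration; the best arrangement depends on the particular processor). Summing an array with a single variable forces each addition to wait for the previous one. Splitting the sum across four independent variables gives the processor several additions it can work on at once:

    #include <stddef.h>

    /* One accumulator: each addition must wait for the previous one. */
    float sum_simple(const float *x, size_t n)
    {
        float s = 0;
        for (size_t i = 0; i < n; ++i)
            s += x[i];
        return s;
    }

    /* Four accumulators: the processor can overlap four chains of additions.
       (Regrouping floating-point additions can change the rounding slightly.) */
    float sum_scheduled(const float *x, size_t n)
    {
        float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        size_t i = 0;
        for (; i + 4 <= n; i += 4)
        {
            s0 += x[i];
            s1 += x[i + 1];
            s2 += x[i + 2];
            s3 += x[i + 3];
        }
        for (; i < n; ++i)  /* leftover elements */
            s0 += x[i];
        return (s0 + s1) + (s2 + s3);
    }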

1.2 Single Instruction Multiple Data

Another technique we use is Single Instruction Multiple Data (SIMD). Modern computers may have dozens or hundreds of instructions executing at the same time. Instructions take different amounts of time to execute, especially when some load data from memory outside the processor while others do simple arithmetic inside the processor. However, the processor must produce the same results as if each instruction were completed before the next starts. So it has to keep track of what all the instructions are doing and ensure the results are kept in order. This requires the processor to do a lot of bookkeeping.

That bookkeeping is so complicated it is actually easier for the processor to do the calculations than to do the bookkeeping. So processor designers have created instructions that do four calculations on four sets of data. One load instruction will load four things, or one add instruction will do four adds. We get four times as much work done for the same amount of bookkeeping.

Writing programs for SIMD is complicated. Most programming languages and most algorithms are designed for single-instruction single-data programming. Writing for SIMD often requires rewriting programs. It also requires writing a lot of software for special cases. For example, if you have 103 numbers to process, you could do 100 of them with 25 SIMD instructions, but then you need different instructions to handle the last three.
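Here is a sketch of what that looks like in C, using Intel's SSE instructions through intrinsics (this assumes an x86 processor; other processors have similar features, and the function name is mine). The main loop handles four numbers per instruction, and a second loop handles the leftover special cases:

    #include <stddef.h>
    #include <xmmintrin.h> /* SSE intrinsics */

    /* Add the elements of b to the elements of a, four at a time. */
    void add_arrays(float *a, const float *b, size_t n)
    {
        size_t i = 0;
        for (; i + 4 <= n; i += 4)
        {
            __m128 va = _mm_loadu_ps(a + i); /* one load brings in four floats */
            __m128 vb = _mm_loadu_ps(b + i);
            __m128 vs = _mm_add_ps(va, vb);  /* one add does four adds */
            _mm_storeu_ps(a + i, vs);
        }
        for (; i < n; ++i)  /* the last zero to three numbers, done one by one */
            a[i] += b[i];
    }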

2. Numerical Software

2.1 Floating-Point Arithmetic

Computers typically represent numbers either as integers or as floating-point numbers. Most arithmetic with integers is exact. When a computer adds, subtracts, or multiplies 7 and 3, it gets an exact answer (10, 4, or 21) with no error. (Division is different; there is an error because the remainder is discarded.) As long as you do not exceed the size limits of the integers in the computer, there is no error.
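A few lines of C show this. Every operation below is exact except the division, which discards the remainder:

    #include <stdio.h>

    int main(void)
    {
        printf("%d %d %d\n", 7 + 3, 7 - 3, 7 * 3); /* prints 10 4 21, all exact */
        printf("%d\n", 7 / 3); /* prints 2; the remainder, 1, is discarded */
        return 0;
    }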

Floating-point arithmetic is different. With floating-point arithmetic, the computer represents a number with a limited number of bits and a scale factor. The scale factor allows the computer to represent very big or very small numbers. For example, integers can represent 3, 30, 300, or 3,000,000,000, but bigger numbers would cause an overflow and not work. With floating-point numbers, these could be represented as 3×10⁰, 3×10¹, 3×10², and 3×10⁹.

An advantage of floating-point numbers is that they go further than integers. They can also represent 3×10¹⁰, 3×10²⁰, or 3×10³⁰. Or they can represent very small numbers, like 3×10⁻¹⁰, 3×10⁻²⁰, or 3×10⁻³⁰. A disadvantage is that some of the bits are used for the scale factor, leaving fewer bits for the digits of the number, so we cannot always have exact answers. Instead, floating-point numbers are rounded. When we add 1.2345 to 10, there might not be room for the .0005, so the computer has to round and deliver 11.235 as a result instead of the exact answer 11.2345.

Actually, most computers use binary rather than the decimal shown here. However, the consequences are the same: Floating-point arithmetic is inexact.
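This small C program makes the point directly. In binary, neither 0.1 nor 0.2 can be represented exactly, so their rounded sum is not the same number as the rounded 0.3:

    #include <stdio.h>

    int main(void)
    {
        double sum = 0.1 + 0.2;
        printf("%.17g\n", sum);     /* prints 0.30000000000000004 */
        printf("%d\n", sum == 0.3); /* prints 0; the two values differ */
        return 0;
    }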

Because there are errors in floating-point arithmetic, good programmers must prove or estimate how large the errors are. They should show that the errors do not prevent their programs from getting results that are close enough to the exact results. This requires detailed understanding of floating-point arithmetic and some skill in writing mathematical proofs.

2.2 Approximating Functions

Generally, computers can only add, subtract, multiply, and divide. However, programmers want functions like square root, sine, cosine, and logarithm. Fortunately, these functions can be approximated using just addition and multiplication.

Any well-behaved function, like sine(x), can be approximated with a polynomial like c₃x³ + c₂x² + c₁x + c₀. The polynomial only approximates the function for a fixed interval, say for x from -π to +π, and the values for the coefficients, c₃, c₂, c₁, and c₀, must be carefully chosen.

Computers can calculate polynomials easily, because they use only addition and multiplication. However, there is an art to finding good polynomials. There is an algorithm (the Remez algorithm) that gives the best coefficients for a function, but you have to choose the number of terms in the polynomial and the interval where you make the polynomial fit the function. Using more terms makes the approximation better, but then the polynomial takes longer to compute. Sometimes it is better to break up the interval and use different polynomials for different parts. That is, you would use several polynomials with a few terms instead of one polynomial with many terms.
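Here is a sketch of such a polynomial in C, arranged so each step is one multiplication and one addition (a pattern called Horner's method). For illustration I use the well-known Taylor-series coefficients for sine, which are only good near zero; a real library would instead use coefficients fitted to the whole interval:

    #include <math.h>
    #include <stdio.h>

    /* Approximate sine(x) for x near zero using
       x - x³/6 + x⁵/120 - x⁷/5040 (Taylor coefficients, for illustration). */
    static double sine_poly(double x)
    {
        double xx = x * x;
        return x * (1 + xx * (-1./6 + xx * (1./120 + xx * (-1./5040))));
    }

    int main(void)
    {
        printf("%.9f\n", sine_poly(0.5)); /* prints 0.479425533 */
        printf("%.9f\n", sin(0.5));       /* prints 0.479425539, the library's answer */
        return 0;
    }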

In the case of sine(x), we can approximate sine with a polynomial for the interval from -π to +π, but the sine function should return an answer for any value of x, not just those in that interval. It would be too difficult to design polynomials for all of the numbers that floating-point numbers can represent. Fortunately, sine(x) is a periodic function. That is, it repeats itself. For any value of x, there is some value in -π to +π that has the same sine. So, to implement a sine function in the computer, we need two things: A polynomial for the interval from -π to +π, and a way to change any x to a value in that interval.

In principle, all we need to do is to subtract multiples of 2π (the period of sine) from x until we have a value in the interval. However, π is an irrational number, so it cannot be represented exactly with floating-point arithmetic. If we try to subtract multiples of 2π, we will introduce errors into the calculation, and those errors are bigger than we allow. To get the right answer, we design special operations that allow us to subtract these multiples with more precision than usual, and we use some mathematical proofs to figure out how much precision we need to keep the error under certain limits.
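A simplified sketch of the idea in C: represent 2π as the sum of two floating-point numbers, a big part and the small piece that did not fit, and subtract each part separately. This version only shows the shape of the technique, and it assumes x is not enormous; real libraries split π into more pieces, choose the pieces so certain multiplications are exact, and prove bounds on the remaining error:

    #include <math.h>

    /* 2π split into a double plus a correction for the bits that were lost. */
    static const double TWO_PI_HI = 6.2831853071795862320;  /* 2π rounded to a double */
    static const double TWO_PI_LO = 2.4492935982947064e-16; /* the part that was lost */

    /* Move x into roughly [-π, π] by subtracting a whole number of periods.
       Subtracting the two parts separately keeps more of the precision. */
    double reduce(double x)
    {
        double k = nearbyint(x / TWO_PI_HI); /* how many periods to remove */
        return (x - k * TWO_PI_HI) - k * TWO_PI_LO;
    }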

© Copyright 2010 by Eric Postpischil.