Bug in calibrating the giant laser that shoots at satellites

Summary

Satellites that detect nuclear explosions were calibrated by shooting a giant laser at them from a known location. The code that calculated the correction factor had a bug that no one had been able to fix for years, so the team ran a separate program by hand to get the right answer. I traced the problem to an erroneous coordinate transformation matrix and fixed the bug.

Details

This is the story of how I fixed a long-standing bug in one of the U.S.’s satellite systems for detecting nuclear explosions, the ARDU (Advanced Radiation detection capability Data Unit). The ARDU system processed data from sensors on 24 GPS satellites that detected various particles (photons, protons, etc.) and decided whether they came from a nuclear explosion. (Note that the Sun is a continuous thermonuclear explosion, so the ARDU had a lot of code to track where the Sun and its reflection off the ocean were, to avoid accidentally starting World War III several times a day.)

The software for the ARDU ground data-processing computers was written in Ada 83 by scientists in the early ’90s. Some of the code looked like assembly, some like a series of logic truth tables, and much of it had zero comments. Everyone who had originally written the software had moved on, leaving the maintenance team a collection of videotaped lectures and a binder of handwritten (!!!) design documents. The ARDU had been running for five years when I joined the software maintenance team in 1999, straight out of university with a double B.S. in computer science and, fortuitously, mathematics.

So what was the bug? One of the satellite sensors detected the location of bright lights on the surface of the earth. It needed to be calibrated by being shot at by a giant laser (!!!) in the mountains near Albuquerque. The sensor on the satellite would detect the laser light and send raw sensor data back to the computer on the ground, which calculated where the satellite thought the location of the laser was and compared that to its known location. Then the difference between the two locations would be used to correct all the other output from that sensor.

But there was a bug in the code that calculated the location of the giant laser. The code always returned a location that was way, way off the real location of the giant laser… but in a predictable way. No one could figure out where the bug was in the ARDU code. So one of the programmers wrote a separate FORTRAN program that calculated the location correctly from the raw sensor data. It had to be run by hand and was super annoying when I was helping with Y2K testing. I decided I was going to fix this weird calibration bug as soon as I got my clearance.

First, how did the light sensor work? To get the coordinates of the giant laser shooting at it, the satellite recorded where the light hit a spinning disc inside the satellite, which was turned into an [x,y] coordinate pair. The [x,y] coordinates, the position of the spinning disc, and the satellite’s position and orientation defined a vector which was intersected with the surface of the earth to get the latitude and longitude of the giant laser.
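The last step, intersecting a vector with the surface of the earth, can be sketched roughly as below. This is only an illustration, not the ARDU code: it assumes a perfectly spherical Earth and an Earth-centered, Earth-fixed frame, and the name `ray_to_lat_lon` and the constant `EARTH_RADIUS_KM` are made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical Earth is a simplifying assumption


def ray_to_lat_lon(origin, direction, radius=EARTH_RADIUS_KM):
    """Intersect a ray with a sphere centered at the coordinate origin.

    origin and direction are (x, y, z) tuples in an Earth-centered,
    Earth-fixed frame, in km. Returns (lat_deg, lon_deg) of the nearest
    intersection, or None if the ray misses the sphere.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    dx, dy, dz = dx / norm, dy / norm, dz / norm

    # Solve |origin + t * direction|^2 = radius^2 for t.
    # With a unit-length direction this is t^2 + b*t + c = 0.
    b = 2.0 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0:
        return None  # the ray never touches the sphere
    t = (-b - math.sqrt(disc)) / 2.0  # smaller root is the near side
    if t < 0:
        return None  # the sphere is behind the ray's starting point

    px, py, pz = ox + t * dx, oy + t * dy, oz + t * dz
    sin_lat = max(-1.0, min(1.0, pz / radius))  # guard against rounding
    lat = math.degrees(math.asin(sin_lat))
    lon = math.degrees(math.atan2(py, px))
    return lat, lon
```

For example, a satellite sitting on the x-axis at roughly GPS orbital distance, `(26560.0, 0.0, 0.0)`, looking straight down along `(-1.0, 0.0, 0.0)`, should land very close to latitude 0, longitude 0. Every transformation matrix applied before this step changes `direction`, so one wrong matrix moves the final latitude and longitude.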

I noticed that the output of the buggy code looked as if, at some point, it had flipped an [x,y] pair across a diagonal line instead of rotating the pair around the origin. This kind of flip could happen if the code accidentally applied a reflection matrix instead of a rotation matrix; the two look very similar.
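To show how similar the two really are, here is a small sketch using the standard textbook 2D forms (not anything taken from the ARDU code; the function names are just for illustration). The two matrices differ only in the signs of the second column, yet one has determinant +1 and the other -1.

```python
import math


def rotation(theta):
    """Rotate counterclockwise by theta radians about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]


def reflection(theta):
    """Reflect across the line through the origin at angle theta / 2."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s],
            [s, -c]]


def apply(m, p):
    """Multiply a 2x2 matrix by an (x, y) point."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)


def det(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


theta = math.pi / 2  # 90 degrees
point = (2.0, 1.0)
rotated = apply(rotation(theta), point)      # close to (-1, 2)
reflected = apply(reflection(theta), point)  # close to (1, 2): x and y trade places
```

At 90 degrees the reflection is exactly a flip across the diagonal line y = x, which is the kind of symptom described above. The determinant is a quick way to tell the two apart: it is +1 for every rotation and -1 for every reflection.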

Rotation and reflection matrices are kinds of transformation matrices, which are how you convert between different coordinate systems. To get from the spinning disc to the earth’s surface, there were, like, eleventy transformation matrices in the code, and they all looked the same to me. So I wrote out all the transformation matrices from the ARDU code on the whiteboard in the classified vault and laboriously worked my way through what each one did by hand.

Eventually I found the bug! One of the transformation matrices had sin() and cos() swapped. The result was some wonky kind of reflection instead of what it should have been, a rotation.
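I did all of this by hand, but the same check is easy to automate. The sketch below is hypothetical: the story does not record exactly which entries had sin() and cos() swapped, so `buggy` is just one made-up mix-up that happens to produce a reflection, and `is_proper_rotation` is a name invented for this example.

```python
import math


def is_proper_rotation(m, tol=1e-9):
    """True if the 2x2 matrix m is orthogonal with determinant +1."""
    (a, b), (c, d) = m
    orthonormal = (
        abs(a * a + c * c - 1.0) < tol      # first column has unit length
        and abs(b * b + d * d - 1.0) < tol  # second column has unit length
        and abs(a * b + c * d) < tol        # columns are perpendicular
    )
    return orthonormal and abs((a * d - b * c) - 1.0) < tol


def intended(theta):
    """The rotation the code was supposed to apply."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]


def buggy(theta):
    """Hypothetical mix-up of sin and cos that turns out to be a reflection."""
    c, s = math.cos(theta), math.sin(theta)
    return [[s, c],
            [c, -s]]


angles = [0.0, 0.3, 1.0, 2.5]
good_results = [is_proper_rotation(intended(t)) for t in angles]  # all True
bad_results = [is_proper_rotation(buggy(t)) for t in angles]      # all False
```

The buggy matrix still has unit-length, perpendicular columns, so it preserves distances and its output looks plausible. Only the determinant, -1 instead of +1, gives it away as a reflection.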

I fixed the transformation matrix code so it did the rotation correctly, and that fixed the bug! No more running the raw sensor data by hand through a random FORTRAN program.
