Ancient castles from a time long past rise from green pastures, spiked turrets piercing the bellies of fat, dark clouds that swirl overhead. Around these castles, the mountains rise and fall like uneven breaths. The country of Romania often looks like something out of a fairytale, a picturesque backdrop printed onto jigsaw puzzles.
In 1902, in what is now Romania, a baby was born, the grandson of a rabbi and the son of a kosher baker. This baby, Abraham Wald, grew up educated and knowledgeable, only to be turned away from opportunities in what was then Austria-Hungary because of his Jewish heritage. He moved to New York City as World War II began, working with the Statistical Research Group, a unit that supported American troops and artillery on the warfront. Essentially, his job was to help protect American warplanes.
But of course, each plane could carry only so much armor: too much armor made the planes too heavy to fly efficiently, while too little left them vulnerable to enemy fire. Wald would have to figure out where on the plane the armor would do the most good.
Drawing on his statistical skills, Wald analyzed the data that researchers gave him. According to the data collected from planes that returned, the most frequently hit part of the plane was the fuselage; compared to the rest of the body, the engine was the least damaged part.
And so, after Wald had mulled over the data, they asked him: where should we armor our planes? Wald answered: the engines.
At first, there was confusion. Wald had chosen to armor the part of the plane that, statistically speaking, got hit the least. He had decided to do the opposite of what the data seemed to suggest.
But Wald, unlike the generals and officers and pilots who fought on the warfront, was a mathematician. As the mathematician Jordan Ellenberg describes it, “A mathematician is always asking, ‘What assumptions are you making? And are they justified?’”
Wald essentially said: well, yes, given the data and the research provided to us, the most obvious place to armor the plane would be the fuselage, because that is where it is hit most often. However, this data can be interpreted in one of two ways: either German bullets were somehow consistently missing the plane engines, or the data showed only part of a larger picture. Wald asked, where are the rest of the bullets?
An analogy: when you walk into a hospital, you see bullet holes in the wounded: holes in the arms, the legs, maybe the sides. But very rarely do you see bullet holes in the head or in the chest. That is not because people are rarely shot there, but because those who are rarely survive long enough to reach the hospital. The phenomenon Wald pointed to would eventually be called survivorship bias.
If you use only the data from the planes that come back, you assume that those planes offer a random sample of all the planes that were shot. The sample that Wald received, however, was not a random sample. The planes that the researchers showed to Wald were only the planes that returned. The planes that were destroyed, and so never counted, were the ones shot in the engines. By taking this into account, Wald concluded that the most logical place to armor the planes would be the engines.
Neat, huh? In addition to the story of Abraham Wald and his missing bullet holes told here, Ellenberg describes many other mathematical discoveries and applications in his book How Not to Be Wrong: The Power of Mathematical Thinking.
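For readers who want to see the sampling argument in action, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the three sections, the per-hit loss probabilities, the number of planes, and the function name simulate are hypothetical and are not Wald's data or method. It assumes bullets land on every section equally often, but that a hit to the engine is far more likely to bring the plane down.

```python
import random

# Hypothetical sections and per-hit loss probabilities.
# These numbers are illustrative assumptions, not historical data.
SECTIONS = ["engine", "fuselage", "other"]
LOSS_PROB_PER_HIT = {"engine": 0.6, "fuselage": 0.05, "other": 0.05}


def simulate(n_planes=10000, hits_per_plane=3, seed=0):
    """Count hits per section on all planes and on returning planes only."""
    rng = random.Random(seed)
    all_hits = {s: 0 for s in SECTIONS}
    observed_hits = {s: 0 for s in SECTIONS}
    returned = 0

    for _ in range(n_planes):
        # Each bullet lands on a section chosen uniformly at random.
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        # The plane returns only if it survives every single hit.
        survived = all(rng.random() >= LOSS_PROB_PER_HIT[s] for s in hits)

        for s in hits:
            all_hits[s] += 1
        if survived:
            returned += 1
            # Only returning planes are ever inspected on the ground.
            for s in hits:
                observed_hits[s] += 1

    return all_hits, observed_hits, returned


if __name__ == "__main__":
    all_hits, observed_hits, returned = simulate()
    print("planes returned:", returned)
    for s in SECTIONS:
        print(s, "| hits on all planes:", all_hits[s],
              "| hits seen on returned planes:", observed_hits[s])
```

Under these assumptions, the engine is hit about as often as any other section across all planes, yet it shows up far less often among the planes that return. That gap is the set of missing bullet holes.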