The mystery of Tufte’s stem-and-leaf plot
In his books The Visual Display of Quantitative Information (page 140) and Envisioning Information (page 46) Edward Tufte shows a stem-and-leaf plot of volcanos’ heights:
As Tufte says, this plot “constructs the distribution of a variable with numbers themselves”. Numbers to the left of the stem represent thousands of feet. The digits to the right give the hundreds, one digit per volcano. For example, 17 | 92 represents two volcanos with heights of 17 900 and 17 200 feet (rounded).
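To make the construction concrete, here is a minimal Python sketch of how such a plot can be built. The function name `stem_and_leaf`, the `sort_leaves` flag, and the four sample heights are my own assumptions for illustration, not data from the almanac. The sketch assumes rounding to the nearest hundred feet, and by default it keeps the leaves in the order the values arrive, which is what someone writing the plot by hand from a list would do.

```python
from collections import defaultdict


def stem_and_leaf(heights_ft, sort_leaves=False):
    """Build stem-and-leaf rows: stems are thousands of feet, leaves are hundreds."""
    rows = defaultdict(list)
    for h in heights_ft:
        hundreds = (h + 50) // 100          # round to the nearest hundred feet
        stem, leaf = divmod(hundreds, 10)   # e.g. 179 -> stem 17, leaf 9
        rows[stem].append(leaf)             # keep input order within each stem
    lines = []
    for stem in sorted(rows):
        leaves = sorted(rows[stem]) if sort_leaves else rows[stem]
        lines.append(f"{stem:>2} | {''.join(map(str, leaves))}")
    return lines


# Hypothetical heights, chosen only so that one row reads "17 | 92".
sample = [17920, 17160, 13435, 13680]
print("\n".join(stem_and_leaf(sample)))
```

Passing `sort_leaves=True` gives the conventional ordered leaves instead, so the row for stem 17 would read `17 | 29`.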
This diagram has confused a lot of designers.
Usually, in a stem-and-leaf plot the values to the right of the stem are ordered. In Tufte’s example, however, they are not:
13 | 47830
On his own website, Tufte answers a question about this oddity, emphasis mine:
In the late 1960s John Tukey made the stem-and-leaf graphic by hand from an almanac that showed the volcano heights listed probably alphabetically. And so after finding the range of the data, and settling on intervals, John simply wrote down the next significant digit on the leaves. Nowadays we would ask the computer to sort the leaves in each bin in order.
However, this answer is doubtful. The order of the digits does not look completely random, as it would if the heights had been read from an alphabetical list. Notice these descending ranges:
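The same observation can be checked mechanically. The sketch below is only an illustration: the helper name `descending_runs` is hypothetical, and it treats equal neighbouring digits as part of the same run, which is my assumption. It splits a row of leaves into maximal runs in which the digits never increase.

```python
def descending_runs(leaves):
    """Split a string of leaf digits into maximal non-increasing runs."""
    runs = []
    for digit in leaves:
        if runs and int(digit) <= int(runs[-1][-1]):
            runs[-1] += digit    # digit continues the current descending run
        else:
            runs.append(digit)   # digit is larger, so a new run starts here
    return runs


print(descending_runs("47830"))  # ['4', '7', '830']
```

For the row 13 | 47830 this yields three runs: 4, 7, and 830.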
This diagram is also shown in another book, William S. Cleveland’s “The Collected Works of John W. Tukey”, as found by my colleague Vadim Yumadilov:
Here we see the full name of the almanac from which Tukey took his data: The World Almanac, 1966. Even the page numbers are given. Unfortunately, I wasn’t able to find it online. It’s available for sale on Amazon, but I did not want to buy it, pay for shipping to Russia, and wait a month for it to arrive.
Luckily, I found a couple of these almanacs on sale on eBay and AbeBooks. So I wrote to the sellers and asked them to send me pictures of pages 282 and 283. Soon, I got a response from Barbie Berquist:
I have attached 2 pictures of the pages on the volcanoes. I hope they help.
Yes, they do.
So in the almanac the volcanos are not, in fact, ordered alphabetically. They are grouped by region and, within each region, ordered by height. Hence the descending ranges in Tukey’s diagram.
Here, for instance, are the volcanos corresponding to the line 13 | 47830, one in Cameroon, one in Hawaii, and three in Guatemala:
Mystery solved. But I am still confused by Tufte’s lack of curiosity. How could he ignore the oddity twice in his books and then content himself with “probably alphabetically” on the forum?