Stem and Leaf Plotting

A stem and leaf plot is a special kind of histogram that (1) can be drawn quickly by hand without graph paper and (2) displays not just the shape of a distribution but the values that comprise it. Values are shown to two or three significant figures, a precision appropriate for general use. Hand-drawn stem and leaf plots are suited to displaying distributions of up to somewhere between 50 and 100 values.

Consider the numbers 3.39, 2.22, 2.95, 4.88, 5.11, 4.90, 6.77, 5.41, 4.24, 6.25. Rounding to one place after the decimal, the digits to the left of the decimal point become the "stem" and the digit after the decimal point (after rounding) the "leaf," thus:

        2|2 
        3|04 
        4|299
        5|14 
        6|28

The name "stem and leaf" comes from the appearance of the display, which looks like half a leaf, with the stem on the left. Observe that (unlike the familiar histogram) the values may be read off immediately from the display: 2.2, 3.0, 3.4, 4.2, and so on. Observe also that they are displayed in ascending order; the construction of the display has, incidentally, sorted them.
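The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name stem_and_leaf is hypothetical, values are assumed positive, and round-half-to-even is assumed as the rounding rule because it reproduces the display above (2.95 becomes 3.0, 6.25 becomes 6.2).

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_EVEN


def stem_and_leaf(values):
    """Return the rows of a one-digit-leaf stem and leaf plot as strings."""
    rows = defaultdict(list)
    for v in values:
        # Round to one decimal place. Half-even rounding is an assumption,
        # chosen because it matches the display (2.95 -> 3.0, 6.25 -> 6.2).
        d = Decimal(str(v)).quantize(Decimal("0.1"), rounding=ROUND_HALF_EVEN)
        tenths = int(d * 10)
        stem, leaf = divmod(tenths, 10)  # assumes positive values
        rows[stem].append(leaf)
    lines = []
    # Walk every stem from smallest to largest so empty stems still appear.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in sorted(rows[stem]))
        lines.append(f"{stem}|{leaves}")
    return lines


rates = [3.39, 2.22, 2.95, 4.88, 5.11, 4.90, 6.77, 5.41, 4.24, 6.25]
print("\n".join(stem_and_leaf(rates)))
```

Sorting the leaves within each row is what makes the values read off in ascending order; when drawing by hand, the leaves are usually written in the order encountered and sorted afterwards if desired.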

The numbers shown in this display are the total fertility rates for the first 10 (of 40) provinces of Vietnam estimated from the 1989 census, as reported in Table 9.1, page 154, of Vietnam Population Census -1989, Detailed Analysis of Sample Results, General Statistical Office, Hanoi, 1991. Adding the values for the next 10 provinces results in the following stem and leaf for the first 20 provinces:

        2|2
        3|00469
        4|012225699
        5|114
        6|28

While this is a reasonable picture, more detail may be displayed by "stretching" the stem, that is, by reducing the stem unit from one to one half a child per woman:

        2|2
        *|
        3|004 
        *|69 
        4|01222 
        *|5699 
        5|114 
        *| 
        6|2
        *|8

For this particular batch of numbers, both plots are reasonable representations. For other distributions, however, the unstretched plot will show too little detail and the stretched plot will be clearly superior.
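Stretching by two is easy to mechanize. The sketch below assumes the values are already rounded and expressed as integers in tenths of a child per woman (the 20 values are those read off the display above); the function name stretched_stem_and_leaf is hypothetical.

```python
def stretched_stem_and_leaf(tenths):
    """Two rows per stem: leaves 0-4 on the digit row, 5-9 on the '*' row."""
    lines = []
    for stem in range(min(tenths) // 10, max(tenths) // 10 + 1):
        leaves = sorted(t % 10 for t in tenths if t // 10 == stem)
        low = "".join(str(leaf) for leaf in leaves if leaf < 5)
        high = "".join(str(leaf) for leaf in leaves if leaf >= 5)
        lines.append(f"{stem}|{low}")
        lines.append(f"*|{high}")
    return lines


# Values read off the 20-province display, in tenths of a child per woman.
tenths = [22, 30, 30, 34, 36, 39, 40, 41, 42, 42,
          42, 45, 46, 49, 49, 51, 51, 54, 62, 68]
print("\n".join(stretched_stem_and_leaf(tenths)))
```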

If stretching by a factor of two doesn't give enough detail, stretching by five probably will. Though further stretching clearly isn't called for by this data, it will serve to illustrate the technique. Note the convenience of the English names of numbers, "t" for two and three, "f" for four and five, "s" for six and seven. In the display below, the stem digit itself marks the position for leaves zero and one, and "." the position for eight and nine.

   
        2tfs.3tfs.4tfs.5tfs.6tfs.
        -------------------------
         2   0 469025691 4   2  8 
             0    12  91
                   2

The horizontal format shown here is somewhat less convenient to write and type than the vertical format, but is typographically useful when, as in this case, the vertical format is unduly long. Fixed-width fonts (with which stem and leaf plots must be made if typing) are roughly twice as high as they are wide, so that shifting from vertical to horizontal format changes the aspect ratio substantially.
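The horizontal, five-way stretched layout can be generated column by column. The following is a rough sketch, assuming values given as integers in tenths and the labeling used in the display above (stem digit for leaves 0 and 1, "t" for 2 and 3, "f" for 4 and 5, "s" for 6 and 7, "." for 8 and 9); the function name horizontal_plot is hypothetical.

```python
def horizontal_plot(tenths):
    """Return header, rule, and leaf rows of a horizontal five-way plot."""
    header = ""
    columns = []
    for stem in range(min(tenths) // 10, max(tenths) // 10 + 1):
        # Five columns per stem; column k holds leaves 2k and 2k + 1.
        for k, label in enumerate([str(stem), "t", "f", "s", "."]):
            header += label
            columns.append(sorted(
                t % 10 for t in tenths
                if t // 10 == stem and (t % 10) // 2 == k
            ))
    depth = max(len(col) for col in columns)
    rows = []
    for r in range(depth):
        # Row r shows the r-th leaf of every column, blank where none exists.
        row = "".join(str(col[r]) if r < len(col) else " " for col in columns)
        rows.append(row.rstrip())
    return [header, "-" * len(header)] + rows


# Values read off the 20-province display, in tenths of a child per woman.
tenths = [22, 30, 30, 34, 36, 39, 40, 41, 42, 42,
          42, 45, 46, 49, 49, 51, 51, 54, 62, 68]
lines = horizontal_plot(tenths)
print("\n".join(lines))
```

Each leaf sits directly beneath the header character for its interval, so the third 4.2 falls under the "t" that follows the 4.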

There are a number of variations in stem and leaf for distributions with a few extreme outliers or with ranges extending over multiple powers of ten. We won't encounter such distributions and so won't discuss these variations here.

Stem and leaf plots should generally be checked by counting the number of leaf digits. When plotting more than 20 or so values, it is useful to write the number of leaf digits for each row to the right or left of the row. On checks in general, note Tukey's suggestion that "We want some checks, but not too many, else we spend all our time getting the errors out of the checks."
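The leaf-count check is simple enough to automate. A minimal sketch, assuming the plot is held as a list of "stem|leaves" strings; the function name rows_with_counts and the column width of 16 are arbitrary choices for illustration.

```python
def rows_with_counts(lines):
    """Append each row's leaf count; return annotated rows and the total."""
    annotated, total = [], 0
    for line in lines:
        _stem, leaves = line.split("|")
        n = len(leaves.strip())
        total += n
        # Pad the row to a fixed width so the counts line up on the right.
        annotated.append(f"{line.rstrip():<16}({n})")
    return annotated, total


plot = ["2|2", "3|00469", "4|012225699", "5|114", "6|28"]
annotated, total = rows_with_counts(plot)
print("\n".join(annotated))
print("total leaves:", total)  # should equal the number of values plotted
```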

The median and upper and lower quartiles of a distribution are readily determined from its stem and leaf plot by counting leaf digits in from either end. It is customary to note the median of the distribution with the notation "m42" together with the plot.
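Counting in from either end can be expressed in terms of "depth." The sketch below is one possible convention, not necessarily the one intended here: it assumes Tukey's hinges as the quartiles, with median depth (n + 1)/2 and hinge depth (floor of median depth + 1)/2, and averages the two neighboring values when a depth ends in one half. Function names are hypothetical and values are in tenths.

```python
def value_at_depth(ordered, depth):
    """Value at a 1-based depth; a half-integer depth averages two neighbors."""
    whole = int(depth)
    if depth == whole:
        return float(ordered[whole - 1])
    return (ordered[whole - 1] + ordered[whole]) / 2


def median_and_hinges(tenths):
    """Return (median, lower hinge, upper hinge) by counting in from the ends."""
    ascending = sorted(tenths)
    descending = ascending[::-1]
    n = len(ascending)
    median_depth = (n + 1) / 2
    hinge_depth = (int(median_depth) + 1) / 2  # assumed hinge convention
    return (
        value_at_depth(ascending, median_depth),
        value_at_depth(ascending, hinge_depth),   # counted from the low end
        value_at_depth(descending, hinge_depth),  # counted from the high end
    )


# Values read off the 20-province display, in tenths of a child per woman.
tenths = [22, 30, 30, 34, 36, 39, 40, 41, 42, 42,
          42, 45, 46, 49, 49, 51, 51, 54, 62, 68]
median, lower, upper = median_and_hinges(tenths)
print(median, lower, upper)
```

For the 20 provinces the median depth is 10.5, and the tenth and eleventh leaves are both 4.2, so the median is 4.2.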

Stem and leaf is an exceptionally quick, flexible, and effective way of following Tukey's injunction

NEVER FAIL TO PLOT AND LOOK!

These notes focus on the mechanics of making stem and leaf plots, but the point of making them is to learn about the distribution. Is it unimodal? Is it skewed? How skewed? Are there outliers? How many, how pronounced, how extreme? Are there clusters? Are there gaps? Having made a stem and leaf, we should study it closely, asking first, "What do we see?" and then "What does it (or might it) mean?"

<gfeeney@gfeeney.com>