Notes on the Modified Brass Variant of Generalized Growth Balance

Griffith Feeney
1999-11-09

"Generalized" growth balance refers to growth balance that takes account of, by estimating, differential completeness of enumeration in the two censuses involved. Brass originally presented a generalized growth balance method in the Asian and Pacific Census Forum in 1979. Trussell and Menken pointed out some problems, and Hill later published a variant of generalized growth balance that overcame these problems. The original Brass formulation is much simpler than the Hill formulation, however, and the problems pointed out by Trussell and Menken turn out to be both minor and easily overcome. The original Brass formulation does have the disadvantage of requiring censuses exactly 10 years apart. This requirement is easily dropped, and dropping it is the modification referred to in the title of this note. The file ggbb.thailand.female.xls implements the method, which is described briefly and informally here.

Brass' original formulation involves a bit of sleight of hand. The death rates from death registration for age x and over are calculated directly from intercensal deaths over age x and average intercensal population over age x. The census-derived death rates are calculated by subtracting population over age x+5 at the second census from population over age x-5 at the first census and, the censuses being (very nearly, for Thailand) 10 years apart, calling this an estimate of intercensal deaths over age x. This of course restricts the method to censuses ten years apart.

Brass also skips over the rationale for the statement that delta_y, the error in the census-derived death rates, can be assumed constant if there is a constant differential enumeration (see "Exact proportionality would give a constant absolute error in the calculated death rate." and the text preceding it).
This isn't as obvious as he makes it sound, and Trussell and Menken got very exercised over it in their note on this method.

To make the proposition plausible, suppose for the moment that the census death rate were calculated as [P_x - P'_(x+n)]/P_x, where ' denotes the second census and these are true rather than observed numbers. If the first census needs to be adjusted by a multiplicative factor k, what we will calculate is not the preceding, but [kP_x - P'_(x+n)]/kP_x. The difference between these two expressions simplifies to (1-k). The same is true if we put the second census number in the denominator and apply the k factor to the second census numbers. This gives the constant absolute error Brass assumes, but neither of these expressions is quite what we want, and the Trussell-Menken note details how things go slightly wrong when we calculate what we want. And of course we want something directly comparable to the death registration calculation.

Despite this glitch, I prefer Bill's (Brass's) method because the logic is transparent, if apparently slightly flawed, whereas Ken's (Hill's) method is wholly obscure and requires tortuous algebra. Moreover, I had the idea when last looking at this (post UN manual work, when mulling write-up of Zimbabwe work for UNAIDS following Reference group meeting in June 1999) that one could apply the method once, estimate differential census underenumeration, correct for it, and then rerun the method on the corrected age distributions, iterating this process if necessary until convergence occurs. Proving convergence might be difficult (or impossible), but a single iteration suffices on the Zimbabwe data, which suggests to me that this modification of Bill's method will generally work fine in practice. It would be nice to have a sense of why it works.
What ought to be the case, it seems to me, is that the error incurred by the deviation from exact proportionality when census-derived deaths are calculated the way we want should be small when differential underenumeration is small, so that if we get fairly close, we can make it home free. No time to explore this mathematically now, however.

The ggbb spreadsheet begins with two columns for the two input census age distributions. These are followed by two columns for adjusted age distributions, with a place for entering adjustment factors in the column heading. Column (5) is for the input intercensal deaths. Columns (7), (8) and (10) cumulate the *adjusted* input age distributions and deaths to give numbers over age x. Column (9) estimates numbers of persons reaching exact age x during the intercensal period from the input age distributions (using Ken Hill's approximation). This is used, together with the numbers over age x, to estimate deaths from the age distributions in column (11). Column (12) gives intercensal PYL over age x computed from the *adjusted* cumulative age distributions. Death rates from census are given in column (12) and are deaths divided by PYL. This calculation of intercensal death rates differs from Bill's, which breaks down if censuses are not 10 years apart. It complicates the formulas for error in the census-derived death rates when there is differential census enumeration completeness, but in practice we use an iterative procedure that makes this largely irrelevant.

The straight line fitting method is detailed in the adult mortality estimation manual, but since I had forgotten some details, and have slightly revised the spreadsheet, I'll go through it here. It begins with the Tukey device of calculating the medians of the x and y values of the lower third of points and of the upper third of points. The estimate of slope is calculated from these two median points.
To estimate the intercept we calculate one intercept for each data point, namely y-bx, where b is the slope. The estimated intercept is the median of these values. To get an error indicator for the estimated intercept we compute the interquartile range of the y-bx values (this would be zero if the points lined up exactly). To get an error indicator for the estimated slope we compute one slope for each data point as (y-a)/x, where a is the estimated intercept. The median of these values will be very close to (but not necessarily equal to) the estimated slope. We use the interquartile range of these values as an error indicator for the slope.

To use the spreadsheet to implement the method we enter the country, sex and dates, the two census age distributions, and intercensal deaths. We then enter 1 for the adjustment factors for both age distributions and look at the estimated intercept (and also at the plots!). A positive intercept means that the census death rates are too high, and this means that the second census is overenumerated relative to the first. We subtract the intercept (which will be positive) from one to get an adjustment factor (which will be less than one) to apply to the second input age distribution. We enter this factor into the appropriate position above the adjusted age distribution column. The spreadsheet automatically recalculates, and we will (should) see a new intercept very close to zero. If it isn't zero to four places after the decimal, we fiddle until we get it there (this seems always to be possible in practice).
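
For readers who want to check the straight line fit outside the spreadsheet, the steps described above (slope from the medians of the lower and upper thirds, intercept as the median of y-bx, interquartile ranges as error indicators) can be sketched in Python roughly as follows. This is only a sketch under assumptions, not a transcription of the spreadsheet formulas: the function names are made up for illustration, the "thirds" are taken as the first and last n // 3 points after sorting on x, and the quartiles use the "inclusive" rule, which may or may not match what the spreadsheet does.

```python
from statistics import median, quantiles


def interquartile_range(values):
    """Q3 - Q1. The 'inclusive' quartile rule is an assumption,
    not something stated in the note."""
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def fit_line(xs, ys):
    """Median-based straight line fit (hypothetical helper).

    Returns the slope, the intercept, and an interquartile-range
    error indicator for each.
    """
    points = sorted(zip(xs, ys))      # order the points by x
    third = len(points) // 3          # assumed definition of a "third"
    if third < 1:
        raise ValueError("need at least three points")
    lower, upper = points[:third], points[-third:]

    # Medians of x and y in the lower and upper thirds.
    x_lo = median([x for x, _ in lower])
    y_lo = median([y for _, y in lower])
    x_hi = median([x for x, _ in upper])
    y_hi = median([y for _, y in upper])

    # Slope of the line through the two median points.
    slope = (y_hi - y_lo) / (x_hi - x_lo)

    # One intercept per data point, y - b*x; the estimate is their median.
    intercepts = [y - slope * x for x, y in points]
    intercept = median(intercepts)

    # One slope per data point, (y - a)/x; points with x == 0 are skipped.
    slopes = [(y - intercept) / x for x, y in points if x != 0]

    return {
        "slope": slope,
        "intercept": intercept,
        "intercept_iqr": interquartile_range(intercepts),
        "slope_iqr": interquartile_range(slopes),
    }


if __name__ == "__main__":
    # Synthetic points lying exactly on y = 0.02 + 1.1 x (not real data).
    xs = [0.1 * i for i in range(1, 10)]
    ys = [0.02 + 1.1 * x for x in xs]
    print(fit_line(xs, ys))
```

With points lying exactly on a line, both interquartile ranges come out at zero up to rounding error, which matches the remark above that the intercept indicator would be zero if the points lined up exactly.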