Census Household Data Matching Exercise
Griffith Feeney
Objective To develop an appreciation of the potential of the population census for providing data on households as opposed to individuals. To teach the idea of using information on individuals in a household to infer relationships between household members, beyond what is directly given by the data. To suggest the many possible ways of using individual level information to define variables for households. To teach the importance of scrutinizing suitably formatted listings of individual level census data.
Exercise 1 For the most recent census in your (or another) country, identify all tables (don't worry about 'all' if there are many) in which the unit of tabulation is 'household' rather than 'person'. If you find no such tables for the country you are using, find another country that does have one or more household tabulations. Then consider what additional tables might be incorporated in your 2000 round census report. One possible focus for this exercise might be economic characteristics of households.
Exercise 2 Browse the household data listing provided, hhlist.txt. Note that this is a large file, many more records than you will want to examine individually, so that each participant can find different cases. How many different kinds of households can you find? How frequent are complete nuclear households (husband plus wife plus one or more children)? How frequent are single person households? How frequent are two generation households? (These questions are meant to be answered impressionistically; you are not expected to tabulate
Exercise 3 Find half a dozen relatively complex and interesting households in the listing and see how much you can infer about family relationships in these households from the information shown. Write the households up case by case, indicating the inferences and noting which are certain (barring errors in the data) and which are only probable; for the latter, suggest the level of probability.
Exercise 4 Read the notes provided on Matching Children to Mothers in Household Data and identify all (mother, child) pairs in half a dozen households, including at least one household with three or more generations.
Note Scrutinizing lists of actual census records is a very useful exercise in many ways. It seems not to be done as often as it should be, even where unit record data are available, as they are on a sample basis for many countries.
Of course one can examine only a tiny fraction of the available records, but this does not matter if the purpose is to obtain ideas that will later develop into analyses in which all records are processed. One can also produce highly selective lists, e.g. only households containing at least one person with a relation to head code other than 'head', 'spouse', or 'child'. Random samples are appropriate, though by no means essential.
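As one illustration of such a selective list, here is a minimal Python sketch that keeps only households with at least one member whose relation to head is something other than 'head', 'spouse', or 'child'. The record layout, the relation labels, and the function name `select_extended_households` are assumptions made for this example; they are not taken from hhlist.txt.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical person records: (household id, relation to head, sex, age).
# The layout and labels are assumptions, not the format of hhlist.txt.
records = [
    (1, "head", "M", 45), (1, "spouse", "F", 42), (1, "child", "F", 12),
    (2, "head", "F", 60), (2, "child", "M", 31), (2, "grandchild", "F", 4),
    (3, "head", "M", 28),
    (4, "head", "M", 50), (4, "spouse", "F", 47), (4, "parent", "F", 75),
]

CORE_RELATIONS = {"head", "spouse", "child"}

def select_extended_households(records):
    """Return {household id: members} for households with at least one
    member whose relation to head is not head, spouse, or child."""
    selected = {}
    # groupby only groups adjacent items, so sort by household id first.
    by_household = sorted(records, key=itemgetter(0))
    for hh_id, members in groupby(by_household, key=itemgetter(0)):
        members = list(members)
        if any(rel not in CORE_RELATIONS for _, rel, _, _ in members):
            selected[hh_id] = members
    return selected

selected = select_extended_households(records)
print(sorted(selected))  # household ids that passed the filter
```

With the sample data above, households 2 and 4 are kept because they contain a grandchild and a parent of the head, respectively. The same grouping step could feed a random sample of households instead of a filter.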
It is essential, however, to insist that programmers produce readable listings and not force you to look at a headache-inducing jumble of numerical codes, which is likely to be the 'default' behavior because it is simplest to program.
Numerical codes should be translated into simple mnemonics (see those used in the supplied listing for marital status, for example), listings should provide only a selection of the information on the records (use different selections for different purposes), and should be formatted with line breaks between households and spaces between information from individual records.
Such listings are very easy to program with appropriate tools, of which the programming language Perl is surely one of the best. Perl is also freely available for just about every computing platform; see the links in the Syllabus.
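To give a sense of how little code a readable listing requires, here is a minimal sketch. It is written in Python rather than Perl purely for illustration; the same few lines translate directly. The code tables, the mnemonics, the record layout, and the function name `format_listing` are all hypothetical and would be replaced by those of the actual census codebook. The sketch assumes the records are already sorted by household.

```python
# Hypothetical code tables; real codes come from the census codebook.
RELATION = {1: "head", 2: "spouse", 3: "child", 4: "other"}
SEX = {1: "M", 2: "F"}
MARITAL = {1: "nm", 2: "mar", 3: "wid", 4: "div"}

# Hypothetical raw records, sorted by household:
# (household, person number, relation, sex, age, marital status)
raw = [
    (101, 1, 1, 1, 45, 2),
    (101, 2, 2, 2, 42, 2),
    (101, 3, 3, 2, 12, 1),
    (102, 1, 1, 2, 67, 3),
]

def format_listing(raw):
    """Translate numeric codes to mnemonics, one person per line,
    with a blank line between households."""
    lines = []
    previous_hh = None
    for hh, person, rel, sex, age, mar in raw:
        if previous_hh is not None and hh != previous_hh:
            lines.append("")  # line break between households
        previous_hh = hh
        # Unknown codes are shown as '?' rather than silently dropped.
        lines.append(
            f"{hh:>5} {person:>2}  {RELATION.get(rel, '?'):<7} "
            f"{SEX.get(sex, '?'):<2} {age:>3}  {MARITAL.get(mar, '?')}"
        )
    return "\n".join(lines)

listing = format_listing(raw)
print(listing)
```

Only a selection of fields is printed, and a different selection can be produced for a different purpose by changing the format line.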
<gfeeney@gfeeney.com>