Small Area Data Scenario Exercise
Griffith Feeney

Objective To calculate the amount of storage space in megabytes required to store all population tables published from the most recent census of your (or another) country for every enumeration district. To estimate the cost of purchasing disk storage to store this quantity of data. To consider briefly the question of how this very large number of tables should be organized for convenient access. Most fundamentally, to learn the practicability of producing and archiving quantities of data very much larger than has been the practice of most countries in the past.

Exercise 1 Select an impressionistically 'typical' page of a relevant census publication and scan it into the computer with the PaperPort scanner. Run OCR (optical character recognition) on it and save the result to a text file (not to Excel or Word). Note the size of the file in bytes (use an MS-DOS window and the 'dir' command). Count or otherwise estimate the total number of pages in the national level census publication and multiply by the bytes per page to find how many bytes are required for one set of tables. Finally, multiply this by the number of enumeration districts and divide by one million to obtain the number of megabytes required to produce one set of tables for every enumeration district (block, area). How big a hard disk would be required to store all of this data? How many CD-ROMs would be required? How many 100 megabyte 'Zip' disks? How many 1.44 megabyte diskettes?
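
The arithmetic in Exercise 1 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the input figures (3,000 bytes per page, 500 pages, 20,000 enumeration districts) are made-up placeholders to be replaced with your own measurements, and the 650 megabyte CD-ROM capacity is an assumption, since the exercise gives capacities only for Zip disks and diskettes.

```python
import math

def storage_megabytes(bytes_per_page, pages_per_set, districts):
    """Megabytes needed for one set of tables per enumeration district."""
    bytes_per_set = bytes_per_page * pages_per_set   # one national-level set
    total_bytes = bytes_per_set * districts          # one set per district
    return total_bytes / 1_000_000                   # bytes -> megabytes

def media_needed(megabytes, capacity_mb):
    """Number of disks of the given capacity, rounded up to a whole disk."""
    return math.ceil(megabytes / capacity_mb)

# Placeholder inputs (assumptions, not real census figures).
total_mb = storage_megabytes(bytes_per_page=3000, pages_per_set=500, districts=20000)

# Capacities in megabytes; the CD-ROM figure is an assumed typical value.
capacities = {"CD-ROM": 650, "Zip disk": 100, "diskette": 1.44}

print("Total megabytes:", total_mb)
for medium, capacity in capacities.items():
    print(medium, media_needed(total_mb, capacity))
```

With these placeholder inputs the total comes to 30,000 megabytes, which would need 47 CD-ROMs, 300 Zip disks, or 20,834 diskettes.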

Exercise 2 Consider how these files might be organized for convenient access, e.g. using administrative unit codes and table numbers embedded into file names. If you can, sketch an organization and access procedure.
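
One possible organization, offered only as a sketch, embeds the administrative unit codes and the table number in each file name and groups the files into one directory per province and district. The three-level hierarchy (province, district, enumeration district), the code widths, and the function names below are all assumptions made for illustration; substitute your country's actual administrative codes.

```python
from pathlib import PurePosixPath

def table_path(province, district, ed, table):
    """Build a relative path such as P03/D007/P03D007E0123_T012.txt.

    The code widths (2, 3, 4 and 3 digits) are hypothetical.
    """
    name = f"P{province:02d}D{district:03d}E{ed:04d}_T{table:03d}.txt"
    return PurePosixPath(f"P{province:02d}") / f"D{district:03d}" / name

def parse_table_name(name):
    """Recover the codes from a file name produced by table_path."""
    stem = PurePosixPath(name).stem          # e.g. "P03D007E0123_T012"
    area, table = stem.split("_T")
    return {
        "province": int(area[1:3]),
        "district": int(area[4:7]),
        "ed": int(area[8:12]),
        "table": int(table),
    }

# Example: table 12 for enumeration district 123 in district 7 of province 3.
path = table_path(3, 7, 123, 12)
print(path)
print(parse_table_name(path.name))
```

Because every name is fixed-width, a plain alphabetical directory listing sorts the files by area and then by table number, and any single table can be located directly from its codes without an index.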

<gfeeney@gfeeney.com>