Digital Archiving and Publication
India Case Study

Griffith Feeney
Draft 1998-06-15

The Registrar-General's Office of India made data from the 1991 census available on a set of diskettes. This case study examines the content and format of this data and the various difficulties encountered in using it.

At the time of the 1991 census India consisted, for census purposes, of 30 (?) states and about 600 (?) districts. Some tables are available for the whole country only (?), others for every state, others for every district. (Are district tables also available at the state level, without summing?) A large volume of data is provided, nearly 100 megabytes in uncompressed form.

The format of the data is roughly as follows. Every table is provided as a separate .wk1 file. The state or district represented is encoded in the file name and is also contained in the file. Groups of files are compressed, generally by state, in zip format.
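As a rough illustration of working with this layout, the sketch below unpacks every zip archive in a directory into its own subdirectory and lists the .wk1 files it finds. It is written in Python and is only a sketch: the function name `unpack_archives` and both directory arguments are hypothetical, and nothing here reflects the actual archive or file names on the diskettes.

```python
import zipfile
from pathlib import Path


def unpack_archives(archive_dir, out_dir):
    """Extract every zip archive in archive_dir into out_dir/<archive stem>/.

    Returns a sorted list of paths to the extracted .wk1 files.
    Both directory names are supplied by the caller (hypothetical here).
    """
    archive_dir = Path(archive_dir)
    out_dir = Path(out_dir)
    found = []
    for archive in sorted(archive_dir.iterdir()):
        # Compare extensions case-insensitively, since files from DOS
        # diskettes often carry upper-case names.
        if not (archive.is_file() and archive.suffix.lower() == ".zip"):
            continue
        target = out_dir / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        found.extend(
            p for p in target.rglob("*")
            if p.is_file() and p.suffix.lower() == ".wk1"
        )
    return sorted(found)
```

Keeping one subdirectory per archive preserves the grouping (generally by state) that the archives already provide.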

This format is convenient if one wants to view individual census tables. It is not convenient for looking at particular information for all states or districts because every individual file must be opened in a spreadsheet program and the relevant portions of each file copied to a new spreadsheet file. This is tedious for states and very tedious indeed for districts.

These problems can be overcome with two kinds of computer programs: one to open the .wk1 files and save them in space-delimited text format (generally .prn in Microsoft applications), and a class of programs that read through all text files of a given type in a given directory, extract specified data, and write it to an output file in which rows are states or districts and columns are the desired data items from the source tables.
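The second kind of program might look something like the following minimal sketch. It is written in Python rather than the Perl mentioned under Tools, and it rests on assumptions that the source tables may not satisfy: that the desired figures sit on a single line beginning with a known row label, and that the fields after the label are separated by whitespace. The function names, the label, and the column positions are all hypothetical.

```python
import csv
from pathlib import Path


def extract_row(path, label, columns):
    """Return selected fields from the first line that starts with label.

    Fields are the whitespace-separated tokens following the label;
    columns holds zero-based positions among those tokens. Returns None
    if no line in the file starts with the label.
    """
    with open(path, encoding="ascii", errors="replace") as fh:
        for line in fh:
            text = line.strip()
            if text.startswith(label):
                fields = text[len(label):].split()
                return [fields[i] for i in columns]
    return None


def collect(directory, label, columns, out_path, suffix=".prn"):
    """Write one output row per input file: file stem, then extracted fields."""
    files = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in files:
            values = extract_row(path, label, columns)
            if values is not None:
                # The file stem stands in for the state or district code.
                writer.writerow([path.stem] + values)
```

A call such as `collect("prn_files", "Total", [0, 2], "summary.csv")` would then produce one comma-separated row per input file. Files with no matching line are skipped, and a matching line with fewer fields than requested raises an IndexError, which is probably the safer behavior while the table layouts are still being checked.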

Tools

Perl is available for download at www.perl.com, more specifically at www.perl.com/latest.html. You can choose either the ActiveState version or Gurusamy Sarathy's version. The GNUwin32 project provides DOS versions of standard Unix tools, available from www.cygnus.com, more specifically at www.cygnus.com/misc/gnu-win32/.

<gfeeney@gfeeney.com>