FSD neWWWs


Number 7 (1/2002)

Handling of quantitative data

Ari Eronen

The Finnish Social Science Data Archive receives various kinds of quantitative research data. Before a dataset can be archived and disseminated for secondary use, it must be checked and, where necessary, supplemented and corrected.

Checking the data

The Archive first makes a copy of the original data. The original data file is archived as it is, without any modifications. The copy is used to produce a version fit for secondary use. The aim of all checks and corrections is to make the content of this version correspond as closely as possible to that of the original study. Documents received from the depositor, such as the questionnaire and the original observation matrix, play a key role in this work.

Where necessary, the copy is checked, corrected and supplemented. Variables are renamed, variable labels and value labels are constructed on the basis of the questionnaire, and any mistakes found are corrected. All these procedures follow a consistent practice.
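The renaming and labelling step can be illustrated with a minimal Python sketch. It is an illustration only, not the Archive's own procedure: the variable names, the labels and the `apply_codebook` helper are hypothetical, and the codebook stands in for information taken from a questionnaire.

```python
# Hypothetical codebook derived from a questionnaire:
# original name -> (new name, variable label, value labels)
CODEBOOK = {
    "V1": ("q1", "Gender of respondent", {1: "Male", 2: "Female"}),
    "V2": ("q2", "Satisfaction with municipal services",
           {1: "Satisfied", 2: "Dissatisfied"}),
}


def apply_codebook(records, codebook):
    """Rename variables and collect variable and value labels.

    Variables that are not listed in the codebook keep their original name.
    """
    renamed = []
    for row in records:
        new_row = {}
        for old_name, value in row.items():
            new_name = codebook[old_name][0] if old_name in codebook else old_name
            new_row[new_name] = value
        renamed.append(new_row)
    variable_labels = {new: label for new, label, _ in codebook.values()}
    value_labels = {new: labels for new, _, labels in codebook.values()}
    return renamed, variable_labels, value_labels


# The working copy of the data; the original file is left untouched.
working_copy = [{"V1": 1, "V2": 2}, {"V1": 2, "V2": 1}]
data, variable_labels, value_labels = apply_codebook(working_copy, CODEBOOK)
```

Keeping the labels in separate dictionaries mirrors the distinction between variable labels and value labels made above.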

Dropping, keeping and adding variables

The objective of data handling is to make sure that the content of the archived data corresponds as closely as possible to that of the original. To achieve this, variables may have to be dropped or added. All alterations are carefully documented in SPSS syntax, the command language of SPSS, the statistical software used to check and edit the data.

A variable is dropped if it cannot be identified or if data security so requires. Constructed variables, such as combined variables and sum variables, are also usually dropped. However, constructed variables that are integral to the usability of the data, especially weight variables, are kept, provided that the documentation delivered by the depositor is explicit enough. New variables are added only if usability so requires.
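Dropping variables while documenting every alteration could look roughly like the following Python sketch. The Archive records its alterations in SPSS syntax; this plain-Python version only illustrates the idea, and the variable names, the reasons and the `drop_variables` helper are all hypothetical.

```python
def drop_variables(records, to_drop, log):
    """Remove the named variables from every record.

    `to_drop` maps a variable name to the reason for dropping it;
    each removal is written to `log` so the alteration is documented.
    """
    for name, reason in to_drop.items():
        log.append(f"DROPPED {name}: {reason}")
    return [
        {name: value for name, value in row.items() if name not in to_drop}
        for row in records
    ]


# Hypothetical working copy with a sum variable, an unrecognised
# variable and a weight variable.
working_copy = [
    {"q1": 3, "q2": 4, "sum_q1_q2": 7, "x17": 9, "weight": 1.2},
    {"q1": 1, "q2": 2, "sum_q1_q2": 3, "x17": 0, "weight": 0.8},
]
to_drop = {
    "sum_q1_q2": "constructed sum variable",
    "x17": "variable could not be recognised",
}

change_log = []
cleaned = drop_variables(working_copy, to_drop, change_log)
# The weight variable is deliberately not listed in to_drop, so it is kept.
```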

Filter variables

Questionnaires often include questions that are directed only at respondents who meet certain requirements. The Archive checks these filter conditions. If the data include answers from respondents who do not belong to the specified target group, those answers are coded as missing data.
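A filter check of this kind can be sketched in Python as follows. The example is hypothetical: it assumes a filter question `q10` (employed: 1 = yes, 2 = no) and a follow-up question `q11` (weekly working hours) asked only of employed respondents, and it uses `None` as a stand-in for a missing-data code.

```python
MISSING = None  # stand-in for a missing-data code (an assumption of this sketch)


def apply_filter(records, filter_var, allowed_values, dependent_vars):
    """Code dependent answers as missing for respondents outside the target group."""
    checked = []
    for row in records:
        row = dict(row)  # work on a copy so the input records stay unchanged
        if row.get(filter_var) not in allowed_values:
            for var in dependent_vars:
                row[var] = MISSING
        checked.append(row)
    return checked


working_copy = [
    {"id": 1, "q10": 1, "q11": 38},       # employed, answer is valid
    {"id": 2, "q10": 2, "q11": 40},       # not employed, yet answered q11
    {"id": 3, "q10": 2, "q11": MISSING},  # not employed, correctly skipped
]

checked = apply_filter(working_copy, "q10", {1}, ["q11"])
```

In this sketch the second respondent's answer to `q11` becomes missing, while the first respondent's answer is left as it was.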

Variables and data protection

Data protection requires that personal data be deleted. It is recommended that the depositor delete such data (names, addresses, dates of birth etc.) before delivering the material to the Archive.

Variables indicating place of residence or place of business are also problematic. On the one hand, there is always a risk that an individual respondent might be identified. On the other hand, deleting these variables prevents secondary users from making regional comparisons, especially if the data contain no other regional variables.

These variables are usually deleted, but they can be restored if necessary. Variables for larger regional units (provinces, districts) are kept.
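The treatment of identifying variables described above can be sketched in Python. The variable names and the two sets below are hypothetical; which variables count as identifying is a judgement made for each dataset, not something this sketch decides.

```python
# Hypothetical variable names used only for this illustration.
DIRECT_IDENTIFIERS = {"name", "address", "birthdate"}
FINE_REGIONAL = {"municipality", "workplace_municipality"}


def remove_identifying(records):
    """Drop direct identifiers and fine-grained regional variables.

    Variables for larger regional units, such as province, are not
    listed in either set and are therefore kept.
    """
    to_remove = DIRECT_IDENTIFIERS | FINE_REGIONAL
    return [
        {name: value for name, value in row.items() if name not in to_remove}
        for row in records
    ]


working_copy = [
    {"name": "A. Example", "birthdate": "1970-01-01",
     "municipality": 91, "province": 1, "q1": 3},
]

protected = remove_identifying(working_copy)
```

Because the function builds new records and leaves `working_copy` unchanged, the dropped variables remain available in the input if they need to be restored.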

The secondary user is responsible for data security. By signing the Agreement on material use conditions, the user commits to protecting the privacy of the respondents.

Finally the dataset is described, and the description is transferred to FSD's databases.
