Records Management and Archives Formation Plan of FSD: FSD Operational Guidelines
This part of the Plan first describes how current legislation is taken into account in data management and what the Finnish Social Science Data Archive's (FSD) data acquisition and selection principles are. Next, it reviews FSD's work process as well as the documents and document series that are processed and produced at different stages of this process. Finally, it describes the Archive's main information system (Tiipii) and its data security and privacy measures and practices.
- 1. Compliance with legislation
- 2. Data acquisition and selection criteria
- 3. FSD's work process
- 4. Website
- 5. Information system and data security
- 6. Collection thus far and anticipated accumulation
- 7. Continuity plan
1. Compliance with legislation
Universities Act (558/2009), Archives Act (831/1994) and the Act on the Openness of Government Activities (621/1999)
FSD is organisationally part of Tampere University but has a national-level service function. The Universities Act (558/2009) guarantees the autonomy of Finnish universities. According to the Act, universities themselves, and not the administrative authorities of the State, have the right to make decisions in matters belonging to the internal administration of the university. In the Government Proposal on the Universities Act, it was stated that appropriate preservation of research data should be safeguarded (Government Proposal 7/2009). The most important activity of FSD is to document, catalogue, maintain the usability of, and preserve digital research data that are used as sources for scientific research. The National Archives of Finland does not have the authority to issue orders to universities on the cataloguing and registering of research data. However, by creating a Records Management and Archives Formation Plan, FSD aims to follow good practices in data management in accordance with the Archives Act and the Act on the Openness of Government Activities. FSD strives to manage records effectively in all its activities, and its Advisory Board has a representative from the National Archives.
To succeed in its activities, the Data Archive must carefully plan all stages of archiving research data. The Archives Formation Plan is a key document for the archival work. The Plan is updated annually and is published on the FSD website. The structure of the Plan is based on archival processes and work flows.
Copyright Act (404/1961)
The original creators of data retain their ownership, copyright and associated intellectual property rights in all deposited material. From the original digital dataset deposited at the Data Archive, FSD produces a processed version for long-term storage and disseminates this version for reuse according to the access conditions set in the deposit agreement. As stipulated in deposit agreements, the Data Archive has the right to process archived data according to established data protection, data security and long-term preservation practices.
The moral rights of the author to research data are honoured in accordance with normal scientific citation practices. By agreeing to the Terms and Conditions for Data Use, reusers of data commit themselves to citing the data used and specifying the author(s) of such data in any texts based on the data.
The Finnish Copyright Act does not recognise the so-called copyright exception for research. An exception for research would allow archiving and disseminating material collected by researchers for research purposes but created by others for other purposes without a separate licence agreement or permission from the author. FSD and the Finnish copyright society Kopiosto signed an agreement in 2015 that allows digital works collected for research, in the fields represented by Kopiosto member organisations, to be archived at FSD and disseminated for research purposes (for instance, magazine articles, photographs, illustrations and comics). Audiovisual works and musical compositions are not covered by the agreement.
When research data contains material created by research participants which is subject to copyright (for instance, photos taken by participants), the researcher should agree on the transfer of rights with the participants before depositing the data for archiving.
Personal Data Act (523/1999)
FSD's right to process research data containing identifiers is primarily based on the assignment given by a researcher or research group to FSD to anonymise the data (Personal Data Act Section 8, subsection 7).
In some rare cases, a special provision of the Personal Data Act may be applied to archived research data. This provision covers purposes of journalism or artistic or literary expression (Personal Data Act Section 2, subsection 5). The special provision is primarily applicable to data that are also subject to copyright. When the provision is applied, personal information in the data is minimised but not fully anonymised. Personal details of research participants and other identifiers not necessary for understanding the contents of the data are removed. In addition, information about third parties may be removed, even though this means altering the original work.
FSD removes all direct identifiers from data processed for reuse. Indirect identifiers are removed or changed wherever necessary. Direct identifiers are deleted after the data have been anonymised and the reusability and functionality of the processed dataset have been checked (Personal Data Act, Section 34). Anonymisation measures are described in more detail in chapters 3.2.1 and 3.2.2 of this document.
If anonymisation is considered to significantly reduce the usability of the data, researchers are instructed either to dispose of the data or to apply for permission from the National Archives to archive such material (Personal Data Act, Section 35, subsection 2). FSD provides help with the application if needed.
Research data are processed at the Data Archive lawfully and carefully, in compliance with the Personal Data Act, Section 5. Data are stored in a secure manner on the servers of FSD, and access to data that contain personal information is strictly restricted (Personal Data Act, Section 32). FSD staff who process data are bound by a non-disclosure obligation (Personal Data Act, Section 33).
Confidentiality is also emphasised in the Terms and Conditions for the Use of Data, which data reusers must commit themselves to before gaining access. Reusers must pledge not to endanger the privacy of individuals or organisations connected to the data. Moreover, reusers must comply with good research ethics in privacy and data protection issues, and erase the data as soon as the purpose of use has ended.
2. Data acquisition and selection criteria
FSD increases its collection actively and selectively: the Data Archive acquires datasets actively but accepts data for archiving selectively. The data to be archived at FSD must comply with certain qualitative, technical and legislative criteria.
2.1. Qualitative criteria
The dataset must fulfil at least one of the following criteria:
- It can be used for temporal or content-related comparative study
- It can be used to complement other data
- It has thus far only been partly analysed
- It can be used in a manner that differs from its original use (for example, it enables new hypotheses or methodological focuses)
- It can be used for studying or teaching research methods
- It is scientifically and/or culturally unique
2.2. Technical criteria
Both of the following criteria must be fulfilled:
- The data are in a reasonable technical state, meaning that they can be copied/processed/converted for reuse at a reasonable cost.
- The information content of the data is sufficiently clearly organised and its supplementary, contextualising materials are sufficient to allow metadata creation and the processing of the data.
Recommended file formats are listed in a separate table.
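As a rough illustration, the rule that at least one qualitative criterion and both technical criteria must be met can be sketched in Python. The class and field names below are hypothetical and are not part of any FSD system; the legislative criteria call for case-by-case judgement and are deliberately left out of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical summary of a dataset offered for archiving."""
    # Labels of the qualitative criteria (2.1) that the dataset fulfils.
    qualitative_criteria_met: set = field(default_factory=set)
    # First technical criterion (2.2): reasonable technical state.
    technically_processable: bool = False
    # Second technical criterion (2.2): clearly organised and documented.
    sufficiently_documented: bool = False


def meets_qualitative_and_technical(candidate: Candidate) -> bool:
    """At least one qualitative criterion AND both technical criteria."""
    return (
        len(candidate.qualitative_criteria_met) >= 1
        and candidate.technically_processable
        and candidate.sufficiently_documented
    )
```

A dataset that is unique but poorly documented would fail this check, because both technical criteria are mandatory.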
2.3. Legislative criteria
The data have to be processed for research purposes in compliance with current legislation:
- Ownership and copyright of the data are sufficiently clear
- The data are anonymous or can be anonymised on assignment without significantly compromising their usability
- Permission has been granted by the National Archives to archive the data with personal information intact
- Data containing material subject to copyright fall under the scope of the agreement between FSD and Finnish copyright society Kopiosto, or an agreement on the transfer of rights has been made with the authors of the material (research participants)
- Permission to access data received from authorities (e.g. register data) stipulates that data be deposited at FSD without identifiers
- Research subjects have given their explicit consent for archiving and disseminating the data that are archived on the basis of Section 2, subsection 5 of the Personal Data Act
- Lawfulness of processing anonymous research data is not determined by the consent given by research subjects. For research ethical reasons, however, it is recommended that research subjects are either informed of plans to archive quantitative data or asked for consent to archive qualitative data without identifiers.
2.4. Other criteria
- FSD does not archive digital data that have been used in research and are archived in the National Library or National Archives. However, FSD may archive materials that are archived as hard copies in the National Archives but have been digitised by the researcher for research purposes, provided that there is sufficient metadata for citation.
- FSD can archive newspaper and magazine material as well as photographs, cartoons and illustrations in books that have been collected by researchers for their study but have been created by someone else. According to an agreement between FSD and the Finnish copyright society Kopiosto, the Archive can archive and disseminate such digitised or digital material for research purposes. The agreement does not apply to audiovisual material or compositions.
- Hard copies (paper materials), such as newspaper or magazine articles, other texts, or paper photographs, are converted into digital files as part of the dissemination package of the dataset, if these materials have been used as a research instrument (e.g. as stimulation material for discussion, interview or survey).
- FSD does not archive audiovisual files. Such files are archived by the Language Bank of Finland, operating under the University of Helsinki and specialising in long-term preservation and reuse management of audiovisual material.
3. FSD's work process
This chapter gives an overview of the three key work processes at FSD: depositing, processing and delivering data for reuse. Documents and/or document series pertaining to each process are listed at the end of each overview. All documents and related document management measures are described in more detail in Appendix 1 (tables with archiving instructions).
3.1. Depositing data
The depositing of data is a process whereby a researcher, research group or research unit submits research data for archiving at the Finnish Social Science Data Archive. The person submitting the data (or the representative of the submitting body) provides FSD with a Submission Information Package (SIP), which includes a completed and signed Deposition Agreement, Dataset Description, a digital copy of the data itself and any other contextual material. The digital copy of the actual research data (for instance data matrix or written answers) included in the SIP is called 'original data' by FSD.
The original data in electronic format are usually submitted to FSD in ASCII format, in the format of statistical software, or as text files. Supplementary materials are submitted either in digital format or as hard copies. Delivery can take place by, for example, e-mail attachment, FTP or a memory stick. The original data and their supplementary materials are destroyed after a certain period of time once a dissemination information package of the data has been generated and verified as usable for research purposes.
The Deposition Agreement is a contractual document that will be stored permanently. The Dataset Description form provides information on the original researchers/researching body, data content and the collection methodology used. The Dataset Description will be stored until the Archive has produced the necessary metadata for the dataset.
FSD assigns each dataset a persistent identifier, that is, a study number.
Data deposition includes the following documents/document series:
- Deposition Agreement (L series: Appendix 1, Table 1)
- Dataset Description (KL series: Appendix 1, Table 2)
- Original data (ORIG/DA-series: Appendix 1, Table 3)
- Other material that describes and contextualises how the research data were produced (ORIG/OT-series: Appendix 1, Table 4)
3.2. Data processing
FSD processes the Submission Information Package (SIP), turning it into an Archival Information Package (AIP) for long-term storage. The AIP is used for producing a Dissemination Information Package (DIP) that makes the data suitable for reuse. In the majority of cases, the AIP is the same as the DIP. The AIP and DIP include data, metadata and any other material related to the data.
The aim of the processing is to ensure that (1) the data are accessible in the long term, both in terms of technical format and content information, and that (2) the research subjects' privacy is protected. This is achieved, for example, by choosing appropriate technical formats, creating detailed metadata and anonymising the data. The aims of data processing are the same for different data types, but different types of data are processed in different ways. The key features of qualitative and quantitative data processing are described below.
Data processing produces the following document series:
- Archival information data (AR series: Appendix 1, Table 5)
- Dissemination information data (DA series: Appendix 1, Table 6)
- Metadata (ME series: Appendix 1, Table 7)
- Digitalised material that describes/contextualises the production of the research data (OT series: Appendix 1, Table 8)
- Data processing files (SY series: Appendix 1, Table 9)
3.2.1. Quantitative data
The original data deposited at FSD can be in many different formats (data in SPSS, Excel or ASCII formats, for example, and supplementary materials in Word, Excel or text files or as hard copies). The objective is to produce a well-documented data file whose contents and structure correspond as closely as possible to the collection instrument (e.g. a questionnaire). This is why the DIP does not normally include variables construed by the researchers from other variables in the data.
The Archive uses SPSS statistical software for checking and processing data, and for adding variable-level metadata. Data processors enter detailed information on how the AIP and DIP were produced. Variable information, any amendments made to the data and other observations are noted in the SPSS syntax file. The international DDI2 documentation standard is used for describing and storing metadata relating to the content and methodology of the data. Archival and packaging information is stored in the Data Archive's internal databases (see chapter 5).
The Data Archive hopes that researchers anonymise their quantitative data before submission to the Archive (more information on anonymising quantitative data in Data Management Guidelines). FSD reviews the anonymisation and makes additional changes if necessary. In some cases, FSD does the anonymisation on the assignment of the researcher (Personal Data Act, Section 8, subsection 7). If necessary, the measures taken are itemised in the anonymisation agreement between the data depositor and the Archive.
Anonymisation is planned for each dataset on a case-by-case basis. The following strategies are used to remove identifiers:
- Any supplementary material containing direct identifiers (such as personal identification numbers, names, addresses, telephone numbers, or email addresses) is deleted.
- Variables containing direct identifiers are deleted from the data.
- All strong indirect identifiers are deleted. These include student numbers and other administrative indirect identifiers, vehicle registration documents, bibliographic citations to publications of research subjects etc.
- Date of birth is generalised into year of birth and, if necessary, the years of birth are categorised into age groups.
- Direct and strong indirect identifiers are also removed from responses entered by research subjects in open-ended variables.
- If an open-ended variable contains a considerable amount of identifying information, the variable will be deleted altogether.
- Variables containing indirect identifiers are deleted.
- Response categories are recoded and aggregated by combining adjoining categories.
- New variables are created based on indirect personal identifiers (for example, job titles of respondents are removed and the information is recoded into an occupational status variable).
- Variables containing regional information are deleted or recoded into new variables (for example, a variable containing information on municipalities of residence is replaced with two new variables: region of residence and type of municipality).
- Outliers are excluded or hidden (this will prevent, for instance, the identification of individuals with exceptionally high income).
- Two or more variables are aggregated into a new variable.
- Anonymising pseudonymised data: In addition to the strategies listed above, units of observation in pseudonymised data are always randomly assigned new id numbers and the data are rearranged according to these new id numbers. Original id numbers are erased. After the data have been anonymised in this way, it is no longer possible to add new information about the research participants to the data.
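Some of the strategies above lend themselves to a short illustration. The following Python sketch is not FSD's actual processing (the Archive works in SPSS); the function names, the ten-year age band and the cap value used in the example are assumptions chosen for illustration. It shows generalising a date of birth, hiding outliers by capping, and replacing the id numbers of pseudonymised data with randomly assigned ones.

```python
import random
from datetime import date
from typing import Optional


def birth_year(date_of_birth: date) -> int:
    """Generalise a date of birth into a year of birth."""
    return date_of_birth.year


def age_group(year_of_birth: int, reference_year: int, width: int = 10) -> str:
    """Categorise an age into a band such as '30-39' (band width is an assumption)."""
    age = reference_year - year_of_birth
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def top_code(value: float, cap: float) -> float:
    """Hide outliers by capping values at a chosen threshold."""
    return min(value, cap)


def reassign_ids(records: list, id_field: str = "id",
                 seed: Optional[int] = None) -> list:
    """Assign new random ids, drop the original ids, and sort by the new ids."""
    rng = random.Random(seed)
    new_ids = list(range(1, len(records) + 1))
    rng.shuffle(new_ids)
    renumbered = []
    for record, new_id in zip(records, new_ids):
        # Copy every field except the original id, then attach the new id.
        cleaned = {key: val for key, val in record.items() if key != id_field}
        cleaned[id_field] = new_id
        renumbered.append(cleaned)
    return sorted(renumbered, key=lambda rec: rec[id_field])
```

Because the original id numbers are not carried over into the returned records, the output can no longer be linked back to a key file, which mirrors the point made in the last item of the list.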
If anonymisation would prevent sensible use of the dataset and the data are of high scientific value if personal data are not removed, the Archive recommends that the researcher(s) apply for permission for archiving from the National Archives (Personal Data Act Section 35, subsection 2).
3.2.2. Qualitative data
The Archive accepts qualitative, or non-numerical, research data for archiving in many formats. Archived qualitative data are mainly textual data, originating from interviews or different types of interactions, or self-administered writings (e.g. biographies, diaries or thematic texts).
Digital images are archived only if the researcher has obtained permission to archive the images from the authors (so-called transfer of rights). Audiovisual material is archived only in exceptional cases. For example, expert interviews of people well known in their field can be archived if the individuals featured in the material have given explicit consent for the archiving and reuse and if Section 2, subsection 5 of the Personal Data Act is applicable to the material. Most interview data do not fall under this special provision.
When processed, digital text files of qualitative data are converted to RTF or TXT format. Image files are stored in TIFF format and audiovisual data in MPEG format. Hard copies are converted into PDF, RTF or TIFF format, whichever is considered best. Consistency of the dataset's internal metadata (such as file names and descriptive background data) is also verified. Measures and actions taken to produce the Dissemination Information Package (DIP) of the data are noted in detail in the data-specific text file. The international DDI2 documentation standard is used for describing and storing metadata relating to the content and methodology of the data. Textual data DIPs contain an HTML index facilitating data use. Archival and packaging information is stored in FSD's internal databases (see chapter 5).
Removal of identifiers
FSD hopes that researchers remove identifiers from their qualitative data before submitting them to the Archive (more information on anonymising qualitative data in the Data Management Guidelines). In some cases, the Archive does the anonymisation on the assignment of the researcher (Personal Data Act, Section 8, subsection 7).
A plan for the removal of identifiers is made for each dataset on a case-by-case basis. Identifiers are removed from all personal information, including information on both research participants and third parties. The following strategies are used to remove identifiers:
- Additional data files containing direct identifiers (such as personal identification numbers, names, addresses, telephone numbers, or email addresses) are deleted.
- Person names (both of research participants and of third parties mentioned by them) are replaced by pseudonyms (Elisabeth -> [Ann]), or the names are removed (Elisabeth -> [wife]).
- Indirect identifiers mentioned within the text (schools, workplaces etc.) are categorised (Hennes & Mauritz -> [clothing store]).
- Background information of participants (e.g. age, municipality of residence, education, occupation, household composition, nationality or ethnicity) is categorised.
- Parts of data that contain significant numbers of identifiers are removed.
- Exceptions: Data archived in accordance with the Copyright Act 404/1961 and Section 2, subsection 5 of the Personal Data Act are only minimised and not fully cleared of identifiers, provided that there is explicit consent from the research participants to retain names in the data.
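The replacement conventions above (Elisabeth -> [Ann], Hennes & Mauritz -> [clothing store]) can be sketched as a simple find-and-replace in Python. This is a minimal illustration, assuming the data processor has already compiled a list of identifiers by reading the data; the function name is hypothetical, and automated replacement of this kind does not substitute for manual review of the text.

```python
import re


def replace_identifiers(text: str, replacements: dict) -> str:
    """Replace each listed identifier with its bracketed substitute.

    Keys are assumed to start and end with a letter or digit, so that
    word boundaries prevent matches inside longer words.
    """
    if not replacements:
        return text
    # Longest identifiers first, so a long name wins over a shorter overlap.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(r"\b" + re.escape(key) + r"\b" for key in keys))
    # A single pass avoids re-replacing text inside inserted brackets.
    return pattern.sub(lambda match: "[" + replacements[match.group(0)] + "]", text)
```

For instance, passing the sentence "Elisabeth works at Hennes & Mauritz." with the two mappings above yields "[Ann] works at [clothing store]."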
If participants cannot be reached and anonymisation would significantly reduce the usability of the data, and the data are of high scientific value with personal information included, the Archive recommends that the researcher(s) apply for permission for archiving from the National Archives (Personal Data Act Section 35, subsection 2).
3.3. Dissemination of data for reuse
The datasets archived at FSD are disseminated according to the access conditions set in the data deposit agreements. A small part of the datasets is available for all users.
Most datasets are available only for registered users. Students and members of staff of Finnish universities, polytechnics and research institutes register themselves using the Haka identity federation. Other users are required to complete a short registration form to get a user account. After FSD has checked the personal details supplied in the form, the applicant will be sent a username to the email address she or he has given.
Data are available for users according to the access conditions set out for the dataset:
- available for all users
- available for research, teaching and study (requires registration)
- available for research only (requires registration)
- available only by permission from the depositor (requires registration)
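The four access conditions can be read as a small decision rule. The Python sketch below is only an illustration of that reading; the names `Access` and `may_access`, the purpose labels, and the way purpose and depositor permission are passed in are assumptions, not a description of how Aila Data Service is implemented.

```python
from enum import Enum


class Access(Enum):
    ALL_USERS = "available for all users"
    RESEARCH_TEACHING_STUDY = "available for research, teaching and study"
    RESEARCH_ONLY = "available for research only"
    DEPOSITOR_PERMISSION = "available only by permission from the depositor"


# Purposes accepted under the two purpose-bound conditions (labels are assumptions).
ALLOWED_PURPOSES = {
    Access.RESEARCH_TEACHING_STUDY: {"research", "teaching", "study"},
    Access.RESEARCH_ONLY: {"research"},
}


def may_access(condition: Access, registered: bool, purpose: str,
               depositor_permission: bool = False) -> bool:
    """Return True if a user may obtain a dataset under the given condition."""
    if condition is Access.ALL_USERS:
        return True
    if not registered:
        # Every other condition requires registration.
        return False
    if condition is Access.DEPOSITOR_PERMISSION:
        return depositor_permission
    return purpose in ALLOWED_PURPOSES[condition]
```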
The following documents and files are related to the delivery of data to users:
- Data download information, from 2014 onwards (AL series: Appendix 1, Table 12)
- Access applications (hard copies), until 2014 (P1 series: Appendix 1, Table 10)
- Terms and Conditions of Use (hard copies), until 2014 (P2 series: Appendix 1, Table 11)
4. Website
One of the key tasks of FSD is to disseminate information about archived research data. Detailed descriptions of all datasets are freely available on Aila Data Service. The catalogue is constantly updated.
On its website, the Archive also provides guidelines for researchers on data management. The guidelines cover information given to research participants, anonymisation, file formats, metadata and data security. There are also instructions on research methods in Finnish. All issues of the Archive's newsletter FSD Bulletin are published on the website.
The website draws its information from FSD's founding documents, archived data files, data archiving literature and press releases, and the research literature. A permanently archived copy of the website is extracted annually and whenever major changes are made to it. Modifications to the website are documented using a version management system. The website can be found at https://www.fsd.uta.fi/.
5. Information systems and data security
There are two important information systems in the Archive: Aila Data Service and the Tiipii operational database. Aila Data Service includes an online data catalogue, a client register of data users, a user registration and sign-on system, and an online data ordering and download system; see Information system description (Aila). The Tiipii operational database is an internal recording system for all archiving work; see Information system description (Tiipii).
FSD also has an intranet accessible only by in-house staff using their user IDs and passwords. The intranet includes an extensive and detailed manual of archiving practices.
Digital data are stored on the Data Archive's server located in the Archive's own server room, which is locked and under access control. Only appointed FSD employees with server administrative tasks as well as property maintenance personnel can access the room. Any other maintenance personnel are accompanied to the room.
Files on FSD's disk server can be accessed by authorised FSD staff members through FSD's local network. Authorised staff may also access the drives remotely by using the official remote access service offered by Tampere University. Other remote access connections or systems are not provided or supported. Access rights are provided to the staff based on their roles and responsibilities as well as their memberships in teams within FSD. The role of each staff member is determined by his/her tasks, and the role and responsibilities are confirmed by the supervisor. FSD's IT administration provides the appropriate access rights to each employee based on the role.
To ensure data security, for instance in case of physical damage through hard disk failure or fire, the data on the Data Archive's internal server are copied onto Tampere University's IT services server in the area limited to FSD use (so-called 'mirror'). The mirror server is located in the University IT services server room and is managed by the University IT services. The geographical distance between the University server room and the FSD server room is over two kilometres. The network services provided by the IT services are jointly managed by Tampere Universities (Tampere University and Tampere University of Applied Sciences). The mirror data can only be accessed by the members of the Tampere Universities maintenance team and appointed FSD staff members responsible for system administration and server maintenance. These authorised personnel can also access the data remotely. The University IT services back the data up on tape in accordance with their own backup strategy.
One of FSD's internal servers (so-called auxiliary server) is used to make the mirror copies. The internal server will also keep copies of the data. The auxiliary server is located in the same locked server room as the Archive's own disk server. The contents of the auxiliary server are copied to tape. At the moment, the tape type used is LTO-5. Tape copies are produced by the server's own backup tools, creating multiple tape copies of the data over time. All tapes are stored in a data cabinet (fire safety category S 120 DIS) physically located in the Data Archive's server room.
The condition of the tapes is monitored and any damaged tapes are replaced. Discarded tapes are sent for disposal in accordance with the Tampere University regulations relating to the destruction of storage media containing personal data. All other faulty media are destroyed in the same manner. The backup tapes of the University IT services are also disposed of in accordance with the University regulations. Faulty disks of the University IT services are delivered to the organisation responsible for the maintenance of the system, which erases or destroys them in accordance with the maintenance agreement.
Tampere University requires data systems administrators to sign a non-disclosure agreement (an undertaking of confidentiality). The same procedure applies both to the Data Archive's technical service staff and to the administrative and maintenance staff of the Tampere Universities IT services.
6. Collection thus far and anticipated accumulation
Hard copies of deposit agreement documents: the collection from 1999 until February 2017 encompasses a total of 5 archiving folders (0.4 shelf metres). Anticipated future accumulation: 1 archiving folder (0.08 shelf metres) in three years.
Hard copies of access applications and Terms and Conditions of Use documents: the collection from 1999 until April 2017 encompasses a total of 20 archiving folders (1.6 shelf metres). Anticipated future accumulation: none. The document series ended with the launch of the online Aila Data Service in April 2014.
Permanently stored hard copies for contextualising research material: the collection from 1999 until February 2017 encompasses a total of 41 archiving folders (3.28 shelf metres). Anticipated future accumulation: 2 archiving folders (0.16 shelf metres) per year.
Digital research data and other connected digital material: the collection from 1999 until February 2017 totals 12 GB containing 26,000 files. Anticipated future accumulation: approximately 0.7 GB per annum.
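The figures above imply 0.08 shelf metres per archiving folder (5 folders = 0.4 m, 20 folders = 1.6 m, 41 folders = 3.28 m). The short Python sketch below reproduces that arithmetic and extrapolates the digital collection; the function names are hypothetical, and the assumption that accumulation stays linear at the stated annual rate is an illustrative simplification, not an FSD forecast.

```python
# Shelf space per archiving folder, as implied by the figures in this chapter.
SHELF_METRES_PER_FOLDER = 0.08


def shelf_metres(folders: int) -> float:
    """Convert a number of archiving folders into shelf metres."""
    return round(folders * SHELF_METRES_PER_FOLDER, 2)


def projected_gb(current_gb: float, gb_per_year: float, years: int) -> float:
    """Linear projection of digital holdings (linearity is an assumption)."""
    return round(current_gb + gb_per_year * years, 2)
```

Under these assumptions, `projected_gb(12, 0.7, 5)` gives the size of the digital collection five years after the February 2017 figure.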
Hard copies of FSD's administrative documents are archived at the long-term archive of Tampere University according to the archives formation plan of the University.
7. Continuity plan
The base funding of FSD is sufficient to maintain the basic activities (data archiving and dissemination, information service). In the unlikely event that the Archive's funding and continuity of operations are at risk, the director of FSD appoints a task group to map out the functions required for a controlled transfer of the data to another institution. The task group is to take administrative and technical aspects into account when planning such a transfer. Representatives of funders, members of FSD's National Advisory Board and archiving experts should be included in the task group.