Appendix 6. Digital File Formats Used in the Finnish Social Science Data Archive for Different Types of Data

Released 5 October 2015, updated 21 March 2016

Preservation (FSD) refers to the digital file formats the Archive uses for long-term preservation.
Delivery to users refers to the file formats in which data are delivered through the Aila Data Service, which allows online data download.

The classification of file formats by data type should not be interpreted strictly. Recommended formats can be used for all data types.

Data Types and Digital File Formats
Type of data | Ingest formats accepted | Preservation (FSD) | Delivery to users | Notes
Data matrix
Ingest formats accepted: No restrictions. Recommended: SPSS Portable or other statistical software formats (e.g. SAS, Stata, Excel, CSV).
Preservation (FSD): por (SPSS Portable), CSV, (ods)
Delivery to users: por (SPSS Portable), (CSV, ods)
Notes: In exceptional circumstances, the Archive can deliver data matrices in other formats (e.g. sav, csv).
OpenDocument format (.od*) may be used as a preservation format if the layout needs to be preserved.
The Archive is aware of the restrictions imposed by SPSS and therefore keeps track of developments in statistical software and of the format choices made by data repositories in other countries.
Increased archiving of health data may require re-evaluation of matrix formats.

Textual data, e.g. interview transcripts or responses to open-ended questions
Ingest formats accepted: No restrictions. Recommended: plain text or widely used office formats (e.g. docx).
Preservation (FSD): UTF-8 encoded TXT or CSV, xml, html/xhtml, odt
Delivery to users: UTF-8 encoded TXT or CSV, xml, html/xhtml, odt
Notes: Field separators other than commas (e.g. tabs) may be used in CSV files.
If the layout of the archived material or images embedded in the document need to be preserved for the material to be understood, PDF/A or OpenDocument format is recommended.
Increased archiving of humanities data may require re-evaluation of file formats.

Internal data processing documentation
Ingest formats accepted: -
Preservation (FSD): UTF-8 encoded TXT or CSV, PDF/A, odt
Delivery to users: -
Notes: The program used in the processing determines the file extension (e.g. SPS for SPSS syntax, PY for Python source code).
PDF/A or OpenDocument format (odt) may be used as a preservation format if the layout needs to be preserved.

Image
Ingest formats accepted: No restrictions. Recommended: JPEG, PNG, TIFF.
Preservation (FSD): JPEG, PNG, TIFF, (DNG)
Delivery to users: JPEG, PNG
Notes: In exceptional circumstances, DNG can be considered for long-term preservation.
When the Archive digitises images, the adopted long-term preservation format is TIFF or DNG. The Archive takes into account the digitisation guidelines maintained by the National Archives.
Animated GIF files can be converted either into video or into a series of PNG images.

Audio
Ingest formats accepted: No restrictions. Recommended: FLAC, WAV.
Preservation (FSD): FLAC, (MP3)
Delivery to users: FLAC, MP3
Notes: The Archive keeps track of audio format recommendations and changes formats if needed.
MP3 is accepted as a long-term preservation format only if the original material was in this format.

Video
Ingest formats accepted: No restrictions. Recommended: MPEG-4 H.264.
Preservation (FSD): MPEG-4 H.264
Delivery to users: MPEG-4 H.264
Notes: The formats recommended or accepted for video may change. They will be reviewed as soon as the need arises.
Compression level is decided on a case-by-case basis. The archival and dissemination information packages (AIP and DIP) of a dataset may differ in compression level.
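
As an illustration only, the preservation column of the table above could be expressed as a simple lookup. The sketch below is not an Archive tool: the function name, the status labels and the mapping from format names to file extensions (e.g. ".mp4" for MPEG-4 H.264, ".tif" for TIFF) are assumptions made for the example. Formats shown in parentheses in the table are treated as conditional.

```python
from pathlib import Path

# Preservation (FSD) formats per data type, transcribed from the table.
# The file extensions are illustrative assumptions, not Archive policy.
# First set: regular preservation formats.
# Second set: formats given in parentheses (accepted only in some cases).
PRESERVATION = {
    "data matrix": ({".por", ".csv"}, {".ods"}),
    "textual data": ({".txt", ".csv", ".xml", ".html", ".xhtml", ".odt"}, set()),
    "image": ({".jpg", ".jpeg", ".png", ".tif", ".tiff"}, {".dng"}),
    "audio": ({".flac"}, {".mp3"}),
    # An .mp4 extension does not by itself guarantee H.264 video.
    "video": ({".mp4"}, set()),
}


def preservation_status(data_type: str, filename: str) -> str:
    """Classify a file name as 'primary', 'conditional' or 'not listed'."""
    primary, conditional = PRESERVATION[data_type.lower()]
    suffix = Path(filename).suffix.lower()
    if suffix in primary:
        return "primary"
    if suffix in conditional:
        return "conditional"
    return "not listed"


if __name__ == "__main__":
    for data_type, name in [
        ("Data matrix", "survey.por"),
        ("Audio", "interview.mp3"),
        ("Image", "animation.gif"),
    ]:
        print(data_type, name, preservation_status(data_type, name))
```

A check of this kind looks only at the file name. Confirming the actual encoding (for example UTF-8 text or the video codec) would require inspecting the file contents.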
updated 2016-04-12