Records Management and Archives Formation Plan of FSD
Appendix 6. Digital File Formats Used at the Finnish Social Science Data Archive for Different Types of Data

Initially released 5 October 2015, latest update 23 November 2022.

  • Ingest formats accepted refers to the digital file formats the Archive accepts for processing.
  • Preservation (FSD) refers to the digital file formats the Archive uses for long-term preservation.
  • Delivery to users refers to the digital file formats in which data are delivered to users through the Aila Data Service.

Classification of file formats according to data types should not be strictly interpreted. Recommended formats can be used for all data types.

The Archive recommends converting files to the recommended ingest formats before deposit. Data files submitted in formats other than the preservation or delivery formats are converted to accepted file formats during the archiving process. Original files in other formats are not retained once archiving has been completed.
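This document does not specify how the conversion step is carried out. As a rough illustration only, the sketch below shows one way a deposited text file in a legacy encoding could be re-encoded as UTF-8, which is among the preservation formats listed for textual data. The function name `reencode_to_utf8`, the file names, and the assumption that the source encoding is known in advance (Latin-1 here) are hypothetical and do not describe the Archive's actual tooling.

```python
import tempfile
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="latin-1"):
    """Read src with a known source encoding and write the same text to dst as UTF-8.

    newline="" disables newline translation, so line endings are kept as they are.
    The default source encoding is an assumption for this example only.
    """
    with open(src, "r", encoding=source_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)


if __name__ == "__main__":
    # Hypothetical file names, created in a temporary directory for the demo.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "transcript_latin1.txt"
        dst = Path(tmp) / "transcript_utf8.txt"
        src.write_bytes("Haastattelu: Hämeenlinna\r\n".encode("latin-1"))
        reencode_to_utf8(src, dst)
        print(dst.read_bytes().decode("utf-8"))
```

In practice the source encoding would have to be confirmed with the depositor or detected separately; decoding with the wrong encoding silently produces wrong characters rather than an error.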

Data Types and Digital File Formats

For each type of data, the entries below give, in order: ingest formats accepted, preservation (FSD) formats, delivery to users formats, and notes.
Data matrix
  • Ingest formats accepted: No restrictions. Recommended: SPSS SAV or other statistical software formats (e.g. SAS, Stata, Excel), CSV or SPSS Portable POR.
  • Preservation (FSD): CSV, SPSS Portable, (ODS)
  • Delivery to users: SPSS SAV, CSV, (SPSS Portable POR, ODS)
  • Notes: Recently archived data files are delivered in SAV and CSV formats. Older SPSS Portable files will be converted to SAV format in the future. OpenDocument format (.od*) may be used as a preservation format if the layout needs to be preserved. The Archive is aware of the restrictions imposed by SPSS, and therefore keeps track of developments in statistical software and of the format choices made in data repositories in other countries.
Textual data, e.g. interview transcripts or responses to open-ended questions
  • Ingest formats accepted: No restrictions. Recommended: plain text or widely used office formats (e.g. docx).
  • Preservation (FSD): UTF-8 encoded TXT or CSV, xml, html/xhtml, odt
  • Delivery to users: UTF-8 encoded TXT or CSV, xml, html/xhtml, odt
  • Notes: Field separators other than commas (e.g. tabs) may be used in CSV files. If the layout of the archived material or images embedded in the document need to be preserved for the material to be understood, using PDF/A or OpenDocument format is recommended. Increased archiving of humanities data may require re-evaluation of file formats.
Internal data processing documentation
  • Ingest formats accepted: -
  • Preservation (FSD): UTF-8 encoded TXT or CSV, PDF/A, odt
  • Delivery to users: -
  • Notes: The program used in the processing determines the file extension (e.g. SPS for SPSS syntax, PY for Python source code). PDF/A or OpenDocument format (odt) may be used as a preservation format if the layout needs to be preserved.
Image
  • Ingest formats accepted: No restrictions. Recommended: JPEG, PNG, TIFF, SVG.
  • Preservation (FSD): JPEG, PNG, TIFF, (SVG, DNG)
  • Delivery to users: JPEG, PNG, (SVG)
  • Notes: In exceptional circumstances, DNG can be considered for long-term preservation. When the Archive digitises images, the adopted long-term preservation format is TIFF or DNG. The Archive takes up-to-date digitisation guidelines into account. The SVG format is used for vector images. Animated GIF files can be converted either into video or into a series of PNG images. Camera RAW formats are generally not accepted. Note: The Archive is able to ingest all Adobe file formats, but recommends open formats instead.
Audio
  • Ingest formats accepted: No restrictions. Recommended: FLAC, WAV.
  • Preservation (FSD): FLAC, (MP3)
  • Delivery to users: FLAC, MP3
  • Notes: The Archive keeps track of audio format recommendations and changes formats if needed. MP3 is accepted as a long-term preservation format only if the original material was in this format.
Video
  • Ingest formats accepted: No restrictions. Recommended: MPEG-4 H.264.
  • Preservation (FSD): MPEG-4 H.264, (JPEG 2000)
  • Delivery to users: MPEG-4 H.264
  • Notes: The formats recommended or accepted for video may change. They will be reviewed when the need arises. A JPEG 2000 sequence is preserved as is and not converted to another format. Compression level is decided on a case-by-case basis. Archival and dissemination information packages (AIP and DIP) of a dataset may differ in compression level, resolution and format.
Geographic information
  • Ingest formats accepted: Dealt with case-by-case
  • Preservation (FSD): Dealt with case-by-case
  • Delivery to users: Same as the preservation format
  • Notes: Any geospatial information related to the data is dealt with on a case-by-case basis, taking into account the specifications provided by the national digital long-term preservation solution. GeoTIFF files can be deposited and preserved as TIFF files.
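The table above can also be read as a lookup from data type to format lists. The sketch below encodes the preservation column for three of the rows (data matrix, image, audio) as a Python dictionary and checks, by file extension, whether a deposited file is already in a listed preservation format. The names `PRESERVATION_FORMATS` and `classify`, the file names, and the mapping from format names to file extensions are assumptions made for illustration. Extension matching is only a heuristic, and this document does not describe how the Archive actually identifies formats.

```python
from pathlib import PurePath

# Preservation (FSD) formats for three rows of the table.
# "conditional" holds the formats the table shows in parentheses.
# The extension spellings are an assumption, not part of the table.
PRESERVATION_FORMATS = {
    "data matrix": {
        "primary": {".csv", ".por"},
        "conditional": {".ods"},
    },
    "image": {
        "primary": {".jpg", ".jpeg", ".png", ".tif", ".tiff"},
        "conditional": {".svg", ".dng"},
    },
    "audio": {
        "primary": {".flac"},
        "conditional": {".mp3"},
    },
}


def classify(data_type, filename):
    """Return 'primary', 'conditional' or 'convert' for a file name.

    'convert' means the extension is not among the listed preservation
    formats, so the file would be a candidate for conversion.
    """
    suffix = PurePath(filename).suffix.lower()
    formats = PRESERVATION_FORMATS[data_type]
    if suffix in formats["primary"]:
        return "primary"
    if suffix in formats["conditional"]:
        return "conditional"
    return "convert"


if __name__ == "__main__":
    for data_type, name in [
        ("audio", "interview01.WAV"),
        ("audio", "interview01.mp3"),
        ("data matrix", "survey.sav"),
        ("image", "scan.tiff"),
    ]:
        print(data_type, name, classify(data_type, name))
```

A "conditional" result still needs a human decision. For audio, for example, the notes state that MP3 is accepted for long-term preservation only if the original material was in that format, which a file extension alone cannot establish.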