This is the "How to Package Your Data" page of the "Scholarly Communication" guide.

Scholarly Communication

Last Updated: Feb 9, 2017

How to Package Your Data

Tips for Packaging Data

When packaging your data, think long-term and long-distance.  Your goal as an author is to make your data usable by as many people as possible, whether they are far away geographically or living in the future, without extra input or instructions.  Here are some tips to consider when formatting and preparing your data for preservation:
  • Think long-term!  Package your data in open, non-proprietary formats that will remain accessible as technology evolves, rather than in proprietary formats whose use is restricted to particular programs or people.

    • Some proprietary formats are more ubiquitous than others; it's unlikely that Microsoft products will disappear anytime soon.  Still, it's best to package data in formats that are not tied to a single program and are more universally accessible.

    • Consider:
      • CSV (rather than Excel spreadsheets) for raw data.
      • PDF for text files.  TXT, RTF, and HTML files are also acceptable.
      • TIFF for images.  Most web browsers will not display TIFF files, but the format is well suited to preserving image data without loss.
      • XML instead of databases.
  • Recognize that in some cases, you may need to save your data in two different formats:  one you can manipulate for your work, and one for long-term preservation.

  • Create good metadata for your research data!  Identify any relevant standards for data and metadata content and format, and follow them to make sure the data can be used by others.

  • Know whether any laws or regulations will affect how you package your data.  For example, HIPAA regulations and NIH policies will affect how you treat personally identifiable information while packaging data that is meant to be widely shared.
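
As one illustration of the format tips above, the following Python sketch exports a database table to CSV using only the standard library. The in-memory SQLite database, the table name `measurements`, and its columns are hypothetical stand-ins for your own data; treat this as a starting point under those assumptions rather than a complete export tool.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table, out):
    """Write every row of `table` to `out` as CSV, with a header row."""
    # The table name is inserted as a quoted identifier, so only pass
    # table names that you trust.
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid')
    writer = csv.writer(out, lineterminator="\n")
    # cursor.description holds one tuple per column; item 0 is the name.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Hypothetical example data standing in for a real research database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (site TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("A", 12.5), ("B", 14.0)],
)

buffer = io.StringIO()
export_table_to_csv(conn, "measurements", buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("measurements.csv", "w", newline="", encoding="utf-8")`. Keeping the original database for day-to-day work and the CSV export for preservation matches the two-format approach described above.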

Describing Your Data (Metadata)

Metadata is, at its simplest, data about data.  It usually includes information about the content, context, provenance, and/or accessibility of a data set.  Descriptive metadata may be a required consideration in data management plans and is vital in making published data sets more findable, accessible, reproducible, and universally usable. 

Metadata can exist in multiple formats, including as a separate text or HTML document that accompanies a data set, an XML document linked to the data files, or as information embedded in an XML data file.  (XML is often used for metadata records because it can be easily integrated into many different systems.)
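
A minimal sketch of the second option, an XML metadata document linked to a data file, might look like the following. It assumes Dublin Core element names purely for illustration; the `record` wrapper element, the function name, and all sample values are hypothetical, and the right standard for your data may differ.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace; the choice of standard is an assumption.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)


def build_metadata_record(title, creator, data_file):
    """Return an XML string describing one data file."""
    # "record" is a hypothetical wrapper element, not part of Dublin Core.
    record = ET.Element("record")
    ET.SubElement(record, f"{{{DC}}}title").text = title
    ET.SubElement(record, f"{{{DC}}}creator").text = creator
    # The identifier links this record to the data file it describes.
    ET.SubElement(record, f"{{{DC}}}identifier").text = data_file
    return ET.tostring(record, encoding="unicode")


xml_text = build_metadata_record(
    "Stream temperature readings",  # hypothetical title
    "Doe, Jane",                    # hypothetical creator
    "measurements.csv",             # hypothetical file name
)
print(xml_text)
```

Saving `xml_text` next to the data file (for example as `measurements.xml`) keeps the description and the data together without changing the data file itself.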

Various metadata standards specify what pieces of information to include and how to express them when describing a data set. Each metadata standard is composed of elements, or fields: individual pieces of information that, by using shared terminology and structure, make it easier to search across similar items.  There are three main types of metadata elements:

  • Descriptive: describes the content and context of an object, e.g., title, author/creator, subject
  • Technical/structural: describes the format, process, and interrelatedness of an object, e.g., file format, size, dimensions (for images), set (if part of a series)
  • Administrative: describes the information needed to manage or use an object, e.g., permissions, creation date, required software, provenance
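
To make the three element types concrete, here is a rough Python sketch that groups the metadata for a single file under those headings. The function name, dictionary keys, and sample values are all hypothetical and do not follow any particular standard; the technical values are read from the file system.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path, title, creator, permissions):
    """Group metadata for one file into the three element types."""
    stat = os.stat(path)
    return {
        "descriptive": {"title": title, "creator": creator},
        "technical": {
            "file_format": os.path.splitext(path)[1].lstrip(".").lower(),
            "size_bytes": stat.st_size,
        },
        "administrative": {
            "permissions": permissions,
            # Last-modified time, used here as a stand-in for creation date.
            "modified": datetime.fromtimestamp(
                stat.st_mtime, tz=timezone.utc
            ).isoformat(),
        },
    }


# Create a small throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "measurements.csv")
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write("site,temp_c\nA,12.5\n")
    record = describe_file(
        path, "Stream temperatures", "Doe, Jane", "CC BY 4.0"
    )

print(record["technical"])
```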

Some examples of metadata standards are linked below, along with a description of whether they are best used within a specific discipline or across many subject areas.

Best Practices for Metadata Record Creation

Preparing Yourself

  1. Prepare yourself!  Creating good metadata begins with preparation and organization.  Gather all of your information together, especially if it is distributed among multiple people.  Then you can plan what you need to do.
  2. Use existing information whenever possible.  The information will often already be written by the time you need it for a metadata record.  Reuse text from your funding proposals, such as the abstract, purpose, location, etc.  You can also create a data dictionary during the data collection and analysis stages of your research and reference that in your metadata.
  3. Choose keywords and other descriptive tags wisely.  Consider all the interpretations of your vocabulary choices, and use a thesaurus to come up with alternate terms you may not otherwise have thought of. 
  4. Review your metadata to make sure it is complete and accurate.  Include as many details as possible so users know what to expect from your data before they begin going through it.  Make sure your descriptions are clear and do not omit any important information.
  5. If possible, include unique identifiers like an ORCID (Open Researcher and Contributor ID) for the authors of and contributors to the research.
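
When recording ORCID iDs in a metadata record, it can help to catch typing errors early. An ORCID iD is 16 characters in four hyphenated groups, and its last character is a check character computed with the ISO 7064 MOD 11-2 algorithm. The sketch below checks only that the iD is well formed; it does not confirm that the iD is registered or belongs to a particular person. The function names are hypothetical.

```python
import re

# Four groups of four; the final character may be a digit or "X".
ORCID_PATTERN = re.compile(r"(?:\d{4}-){3}\d{3}[\dX]", re.ASCII)


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """Return True if `orcid` has the right shape and check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


print(is_well_formed_orcid("0000-0002-1825-0097"))  # a widely used sample iD
print(is_well_formed_orcid("0000-0002-1825-0098"))  # last digit altered
```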


Preparing Your Metadata

At the dataset level, good metadata includes information about:

  • The context of the data: project history, objectives, hypotheses.
  • Data collection methods: protocols, sampling, instruments, data scale, resolution, temporal coverage, geographic coverage, hardware/software and other equipment.
  • Structure of the data: data files, relationships between files.
  • Sources of data used
  • Data checking, validation, proofing
  • Modifications to the data since their creation
  • Identification of different versions of the data
  • Information on access and use conditions, confidentiality, etc., where necessary

This information may be contained in a separate document that accompanies the data files.
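
One way to review a dataset-level record against the list above is a simple completeness check. In this sketch the field names are hypothetical shorthand for the items in the list, not names from any metadata standard, and the draft record is invented for illustration.

```python
# Hypothetical short names for the dataset-level items listed above.
DATASET_LEVEL_FIELDS = [
    "context",
    "collection_methods",
    "structure",
    "sources",
    "validation",
    "modifications",
    "versions",
    "access_conditions",
]


def missing_fields(metadata):
    """Return the listed fields that are absent or blank in `metadata`."""
    return [
        name
        for name in DATASET_LEVEL_FIELDS
        if not str(metadata.get(name) or "").strip()
    ]


# An invented, deliberately incomplete draft record.
draft = {
    "context": "Pilot study of stream temperatures, 2016 field season.",
    "collection_methods": "Hourly readings from submerged loggers.",
    "structure": "   ",  # whitespace only, so it counts as missing
}

print(missing_fields(draft))
```

Some items, such as access and use conditions, apply only where necessary, so a reported gap is a prompt to check rather than an error.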

At the individual data level, good metadata includes information about:

  • Names and labels for variables, descriptions, records, and values
  • Explanation of codes and classification schema
  • Explanation and reasons for missing values
  • Data derived after collection, with information on how they were created (code, algorithms, or command files)

This information may be embedded within a dataset itself or contained within a separate document that accompanies the data files.
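
A separate document of this kind is often a data dictionary. The following sketch drafts one from a CSV file by listing each variable, counting missing values, and collecting the distinct codes that still need explaining. The missing-value codes, the column names, and the sample data are assumptions made for the example; the labels themselves still have to be written by someone who knows the data.

```python
import csv
import io

# Assumed missing-value codes; replace these with the codes your data uses.
MISSING_CODES = {"", "NA", "-999"}


def draft_data_dictionary(csv_text, labels):
    """Return one dictionary entry per column of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        # Short rows yield None for absent cells; treat those as blank.
        values = [row[name] or "" for row in rows]
        entries.append(
            {
                "variable": name,
                "label": labels.get(name, "TODO: describe"),
                "missing": sum(value in MISSING_CODES for value in values),
                "distinct_values": sorted(set(values) - MISSING_CODES),
            }
        )
    return entries


# Invented sample data: habitat is coded 1 or 2, temp_c has two gaps.
sample = "site,habitat,temp_c\nA,1,12.5\nB,2,NA\nC,1,-999\n"
labels = {"habitat": "Habitat type (1 = forest, 2 = meadow)"}

dictionary = draft_data_dictionary(sample, labels)
for entry in dictionary:
    print(entry)
```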

