Format & Preservation
I. File Formats
A. Accepted formats:
1. Text: Comma-separated values (.csv), Plain Text (.txt), XML (.xml), XHTML (.html), Rich Text (.rtf), HTML (.htm, .html), Microsoft Word (.doc), Microsoft Excel (.xls), Microsoft PowerPoint (.ppt), Acrobat PDF (.pdf), Postscript (.ps)
2. Images: TIFF (.tif, .tiff), JPEG 2000 (.jp2), GIF (.gif), PNG (.png), JPEG (.jpg), RAW (.crw, .dng, .sr2, etc.)
3. Audio: FLAC (.flac), Wave (.wav), MP3 (.mp3)
4. Video: AVI (.avi), Motion JPEG 2000 (.mj2, .mjp2), MPEG-2 Video (.mp2), MPEG-4 Video (.mp4), Windows Media Video (.wmv), Quicktime (.mov)
5. Data: SAS syntax/program & SAS data set or index files (.sas, .sd2, .sv2, .si2), SPSS syntax & system files (.sps, .sav), R syntax & binary files (.r, .R, .Rdata, .rdata)
B. Zipped and/or tarred files are discouraged but may be used when a dataset is too large or contains many individual files that should be distributed together as a bundle.
C. Researchers who would like to deposit files in formats other than those listed in I.A. should contact the repository manager.
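As an illustration only, a depositor could pre-check file names against the accepted-formats list before uploading. The sketch below is not part of the repository's tooling; the function name and the category labels are hypothetical, and the extension sets are an illustrative subset of the list in A, not the complete list.

```python
from pathlib import PurePath

# Illustrative subset of the accepted extensions listed in A (lowercased).
# This mapping is an assumption made for the example, not an official list.
ACCEPTED_EXTENSIONS = {
    "text": {".csv", ".txt", ".xml", ".rtf", ".htm", ".html", ".doc", ".xls", ".ppt", ".pdf", ".ps"},
    "images": {".tiff", ".jp2", ".gif", ".png", ".jpg"},
    "audio": {".wav", ".mp3"},
    "video": {".avi", ".mj2", ".mjp2", ".mp4", ".wmv", ".mov"},
    "data": {".sas", ".sps", ".sav", ".r", ".rdata"},
}


def accepted_category(filename):
    """Return the category of an accepted extension, or None if not listed."""
    # Compare case-insensitively so "Survey.CSV" and "model.Rdata" both match.
    ext = PurePath(filename).suffix.lower()
    for category, extensions in ACCEPTED_EXTENSIONS.items():
        if ext in extensions:
            return category
    return None
```

A result of None does not mean a file will be rejected; per C, it means the depositor should contact the repository manager first.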
- All deposited content is covered under the Libraries’ Digital Preservation Policy.
- Metadata for all deposited content will be preserved.
- All deposited content will receive bit-level preservation, i.e., will be preserved in the file format in which it was deposited.
- The University Libraries will follow library and IT best practices to make as much of the repository’s content accessible into the future as possible.
In general, the more openly documented, non-proprietary and lossless the file, the more robust and successful the Libraries’ preservation efforts will be. These efforts will range from “full preservation” to “basic preservation” as follows:
- Level 1 or full preservation - Digital materials are preserved at this level through the use of non-proprietary and openly documented formats and enhanced standardized metadata schema. These materials may be normalized before deposit into the system, either through normalization of the metadata or normalization of the format. Depending on the case, master files (which may be uncompressed or in proprietary formats) may be preserved alongside non-proprietary, compressed, access copies. For example, RAW image format may be preserved alongside an open format like TIFF.
- Level 2 or limited preservation - The repository will make limited efforts to maintain the usability of the file in addition to preserving it at the bit level. The materials preserved at this level may have ample metadata but be encoded in a proprietary or undocumented format, or they may be encoded in open, non-proprietary formats but lack adequate metadata to ensure long-term preservation and access. The format of content at this level will be monitored and may be transformed when significant risk to access is imminent, but it will be difficult to predict or control the consequences of any transformation or migration on content, structure or functionality. These files may also be transformed to a more preservable format to ensure that the content is not lost, even if some structure and functionality are sacrificed. This level of support will generally be applied to proprietary formats that are widely used and for which there is substantial commercial interest in maintaining access (e.g., Microsoft Word), so that tools are likely to be available to migrate them to successor formats.
- Level 3 or basic and not guaranteed preservation - The repository will provide preservation at the bit level for content at this level and will preserve its associated metadata as is. It will not monitor the format and associated risks or normalize, transform or migrate the file to another format. Files at this level may be readable by future applications, but there is no guarantee that the content, structure, or functionality will be preserved. This service level will usually apply to files written in highly specialized, proprietary formats (often usable only in a single software environment), formats no longer widely utilized, and/or formats about which little information is publicly available (e.g., PhotoCD). Any format not yet reviewed and evaluated by UA Scholars Archive will also receive Level 3 preservation on deposit. A higher level may be assigned after format review takes place.
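The three levels above can be read as a rough decision rule. The sketch below is one possible reading, not the Libraries' actual review procedure: the FormatProfile fields and the function name are hypothetical, and treating every widely used proprietary format as Level 2 regardless of its metadata is an assumption the policy does not state.

```python
from dataclasses import dataclass


@dataclass
class FormatProfile:
    # Hypothetical attributes that a format review might record.
    reviewed: bool             # has the format been reviewed and evaluated?
    open_and_documented: bool  # non-proprietary and openly documented?
    adequate_metadata: bool    # enough metadata for long-term access?
    widely_used: bool          # widely used, with commercial interest in access?


def preservation_level(profile):
    """Return 1 (full), 2 (limited) or 3 (basic) under the assumptions above."""
    # Formats not yet reviewed receive Level 3 on deposit.
    if not profile.reviewed:
        return 3
    # Open, documented formats with adequate metadata get full preservation.
    if profile.open_and_documented and profile.adequate_metadata:
        return 1
    # Open formats lacking metadata, or widely used proprietary formats
    # (assumption: regardless of metadata), get limited preservation.
    if profile.open_and_documented or profile.widely_used:
        return 2
    # Specialized, obsolete or poorly documented formats get basic preservation.
    return 3
```

Under this reading, a reviewed Microsoft Word file with ample metadata would land at Level 2, while a PhotoCD file would land at Level 3.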
- Scholars Archive is a Digital Commons institutional repository hosted by bepress.
- Content within the Scholars Archive community is organized by sub-communities and collections that closely follow the organization of the university.
- The sub-communities for schools and colleges, research institutes, research centers, the University Archives, Special Collections and University Libraries are at the top hierarchical level. Collections are located within the sub-communities and correspond to their departments and units.
- Sub-communities and collections are established at the point at which content to be located in them is deposited.
- Requests for additional sub-communities and/or collections should be made to the repository manager.