File Format Specification

I. Introduction

I.A. This Document

I.B. Changes for HDF5 1.12

I.C. Changes for HDF5 1.10

II. Disk Format: Level 0 - File Metadata

II.A. Disk Format: Level 0A - Format Signature and Superblock

The superblock may begin at certain predefined offsets within the HDF5 file, which leaves a block of unspecified content where users can place additional information at the beginning (and end) of the HDF5 file without limiting the HDF5 Library’s ability to manage the objects within the file itself. This feature was designed to accommodate wrapping an HDF5 file in another file format, or adding descriptive information to an HDF5 file, without modifying the file’s actual content. The superblock is located by searching for the HDF5 format signature at byte offset 0, then at byte offset 512, and then at successive offsets that each double the previous one; in other words, at byte offsets 0, 512, 1024, 2048, and so on.
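The search described above can be sketched in a few lines of Python. This is a minimal illustration rather than the HDF5 Library's implementation; the names `candidate_offsets` and `find_superblock` are hypothetical and do not come from the specification.

```python
import io

# The 8-byte HDF5 format signature: 137 72 68 70 13 10 26 10.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def candidate_offsets(file_size):
    """Yield 0, 512, 1024, 2048, ... while a full signature still fits in the file."""
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= file_size:
        yield offset
        # After offset 0 comes 512; every later offset doubles the previous one.
        offset = 512 if offset == 0 else offset * 2


def find_superblock(stream):
    """Return the byte offset of the format signature, or None if it is not found."""
    stream.seek(0, io.SEEK_END)
    file_size = stream.tell()
    for offset in candidate_offsets(file_size):
        stream.seek(offset)
        if stream.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
            return offset
    return None


# Usage: a 1024-byte user block followed by the start of an HDF5 file.
wrapped = io.BytesIO(b"\0" * 1024 + HDF5_SIGNATURE + b"\0" * 64)
print(find_superblock(wrapped))  # 1024
```

Because only a handful of offsets are probed, the cost of the search grows with the logarithm of the file size rather than with the file size itself.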

The superblock is composed of the format signature, followed by a superblock version number and information that is specific to each version of the superblock.

Currently, there are four versions of the superblock format:

  • Version 0 is the default format.
  • Version 1 is the same as version 0 but with the "Indexed Storage Internal Node K" field for storing a non-default B-tree 'K' value.
  • Version 2 eliminates some fields from superblock format versions 0 and 1 and compresses others. It adds checksum support and a superblock extension for storing additional superblock metadata.
  • Version 3 is the same as version 2 except that the field "File Consistency Flags" is used for file locking. This format version will enable support for the latest version.

Versions 0 and 1 of the superblock are described below:

byte | byte | byte | byte
Format Signature (8 bytes)
Version # of Superblock | Version # of File’s Free Space Storage | Version # of Root Group Symbol Table Entry | Reserved (zero)
Version Number of Shared Header Message Format | Size of Offsets | Size of Lengths | Reserved (zero)
Group Leaf Node K (2 bytes) | Group Internal Node K (2 bytes)
File Consistency Flags (4 bytes)
Indexed Storage Internal Node K [1] (2 bytes) | Reserved (zero) [1] (2 bytes)
Base Address [O]
Address of File Free space Info [O]
End of File Address [O]
Driver Information Block Address [O]
Root Group Symbol Table Entry

(Items marked with [1] in the above table are new in version 1 of the superblock.)
(Items marked with [O] in the above table are of the size specified in the Size of Offsets field in the superblock.)
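As an illustration of how the layout above maps onto bytes, the following sketch decodes the fixed part of a version 0 or version 1 superblock. It assumes little-endian integers, 2-byte 'K' fields, and a 4-byte File Consistency Flags field, as implied by the four-byte rows of the table. The function name and dictionary keys are hypothetical, and the Root Group Symbol Table Entry is not decoded; only its starting offset is reported.

```python
import struct

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Bytes 8..23: eight 1-byte fields, two 2-byte K values, 4-byte consistency flags.
FIXED_PART = "<8BHHI"


def parse_superblock_v0_v1(buf):
    """Decode a version 0/1 superblock that starts at the beginning of buf."""
    if buf[:8] != HDF5_SIGNATURE:
        raise ValueError("missing HDF5 format signature")
    (version, free_space_version, root_entry_version, _reserved_a,
     shared_header_version, size_of_offsets, size_of_lengths, _reserved_b,
     group_leaf_k, group_internal_k, flags) = struct.unpack_from(FIXED_PART, buf, 8)
    if version not in (0, 1):
        raise ValueError("this sketch handles superblock versions 0 and 1 only")

    fields = {
        "superblock_version": version,
        "free_space_storage_version": free_space_version,
        "root_group_symbol_table_entry_version": root_entry_version,
        "shared_header_message_format_version": shared_header_version,
        "size_of_offsets": size_of_offsets,
        "size_of_lengths": size_of_lengths,
        "group_leaf_node_k": group_leaf_k,
        "group_internal_node_k": group_internal_k,
        "file_consistency_flags": flags,
    }
    pos = 8 + struct.calcsize(FIXED_PART)

    if version == 1:
        # Version 1 adds a 2-byte K value followed by 2 reserved bytes.
        fields["indexed_storage_internal_node_k"], _ = struct.unpack_from("<HH", buf, pos)
        pos += 4

    if len(buf) < pos + 4 * size_of_offsets:
        raise ValueError("buffer too short for the four address fields")
    # The four addresses are each Size of Offsets bytes wide.
    for name in ("base_address", "free_space_info_address",
                 "end_of_file_address", "driver_information_block_address"):
        fields[name] = int.from_bytes(buf[pos:pos + size_of_offsets], "little")
        pos += size_of_offsets

    fields["root_group_symbol_table_entry_offset"] = pos
    return fields
```

Reading the address fields with `int.from_bytes` rather than a fixed `struct` code keeps the sketch independent of the Size of Offsets value stored in the file.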

Field Name | Description

Format Signature | This field contains a constant value and can be used to quickly identify a file as being an HDF5 file. The constant value is designed to allow easy identification of an HDF5 file and to allow certain types of data corruption to be detected. The file signature of an HDF5 file always contains the following values:

Decimal:          137 72 68 70 13 10 26 10
Hexadecimal:      89 48 44 46 0d 0a 1a 0a
ASCII C Notation: \211 H D F \r \n \032 \n

This signature both identifies the file as an HDF5 file and provides for immediate detection of common file-transfer problems. The first two bytes distinguish HDF5 files on systems that expect the first two bytes to identify the file type uniquely. The first byte is chosen as a non-ASCII value to reduce the probability that a text file may be misrecognized as an HDF5 file; also, it catches bad file transfers that clear bit 7. Bytes two through four name the format. The CR-LF sequence catches bad file transfers that alter newline sequences. The control-Z character stops file display under MS-DOS. The final line feed checks for the inverse of the CR-LF translation problem. (This is a direct descendant of the PNG file signature.)

This field is present in version 0+ of the superblock.
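The transfer problems that the signature is designed to expose can be told apart by looking at how the leading bytes differ from the expected value. The sketch below is illustrative only: the function name and the returned labels are hypothetical, and it covers just the failure modes named in the description above.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def diagnose_signature(head):
    """Classify the leading bytes of a file (pass at least the first 10 bytes)."""
    if head[:8] == HDF5_SIGNATURE:
        return "ok"
    # 0x89 with bit 7 cleared becomes 0x09.
    if head[:4] == b"\x09HDF":
        return "bit 7 cleared"
    if head[:4] != b"\x89HDF":
        return "not HDF5"
    # CR-LF rewritten as LF: the CR after "HDF" has disappeared.
    if head[4:7] == b"\n\x1a\n":
        return "CR-LF converted to LF"
    # LF rewritten as CR-LF: the final line feed gained a CR. A naive
    # converter may also double the CR in the existing CR-LF pair.
    if head[4:9] == b"\r\n\x1a\r\n" or head[4:10] == b"\r\r\n\x1a\r\n":
        return "LF converted to CR-LF"
    return "corrupted signature"


print(diagnose_signature(b"\x89HDF\n\x1a\n"))  # CR-LF converted to LF
```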

Version Number of the Superblock | This value determines the format of the information in the superblock. Whenever that format changes, the version number is incremented to the next integer, so a reader can use it to decide how to interpret the rest of the superblock.

Values of 0, 1, 2, and 3 are defined for this field (the formats of versions 2 and 3 are described below, not here).

This field is present in version 0+ of the superblock.
