HDF4

This section details a set of additional conventions that are specific to the HDF4 file format.

The following table shows the mapping between HARP data types and HDF4 data types. HDF4 data types not covered in this table are not supported by HARP.

HARP data type    HDF4 data type
--------------    --------------
int8              DFNT_INT8
int16             DFNT_INT16
int32             DFNT_INT32
float             DFNT_FLOAT32
double            DFNT_FLOAT64
string            DFNT_CHAR
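The type mapping above can be sketched as a simple lookup. This is only an illustration: the type names are taken from the table, but the dictionary and the helper `harp_type_for` are hypothetical and not part of the HARP API.

```python
# Type names are copied from the table above; everything else is an
# illustrative assumption, not HARP's actual implementation.
HARP_TO_HDF4_TYPE = {
    "int8": "DFNT_INT8",
    "int16": "DFNT_INT16",
    "int32": "DFNT_INT32",
    "float": "DFNT_FLOAT32",
    "double": "DFNT_FLOAT64",
    "string": "DFNT_CHAR",
}

# Reverse lookup, used when reading an HDF4 dataset.
HDF4_TO_HARP_TYPE = {hdf4: harp for harp, hdf4 in HARP_TO_HDF4_TYPE.items()}


def harp_type_for(hdf4_type_name):
    """Return the HARP data type for an HDF4 type name, or raise if unsupported."""
    try:
        return HDF4_TO_HARP_TYPE[hdf4_type_name]
    except KeyError:
        raise ValueError(
            f"HDF4 data type {hdf4_type_name} is not supported by HARP"
        ) from None
```

Any HDF4 type name that is not in the table, such as an unsigned integer type, is rejected by this sketch.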

In the HDF4 data model there is no concept of shared dimensions (unlike netCDF). The shape of an HDF4 dataset is specified as a list of dimension lengths.

When a HARP variable is stored as an HDF4 dataset, dimension lengths are preserved, but dimension types are lost. A dataset attribute named ‘dims’ is therefore used to store the type of each dimension of the associated dataset as a comma-separated list of dimension type names. The number of dimension types equals the number of dimensions of the HDF4 dataset. This number is equal to the number of dimensions of the HARP variable, except for scalar variables and variables of type string (see below).

HDF4 does not support scalars (datasets with zero dimensions). The HDF4 backend stores a scalar HARP variable as an HDF4 dataset with a single dimension of length 1. To differentiate between scalars and proper 1-D variables, both of which are stored as 1-D HDF4 datasets, the introduced dimension is included in the dimension type list using the dimension type name ‘scalar’.
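As a minimal sketch of the rules described so far, the following helper derives the HDF4 dataset shape and the value of the ‘dims’ attribute for a non-string variable. The function name `hdf4_layout` and its arguments are hypothetical; the sketch only performs the bookkeeping and does not call any HDF4 library.

```python
def hdf4_layout(dim_types, dim_lengths):
    """Return (shape, dims_attribute) for a non-string HARP variable.

    dim_types   -- list of dimension type names, e.g. ["time", "vertical"]
    dim_lengths -- list of dimension lengths, one per dimension
    """
    if len(dim_types) != len(dim_lengths):
        raise ValueError("one dimension type is required per dimension")
    if not dim_types:
        # HDF4 has no zero-dimensional datasets: introduce a single
        # dimension of length 1 and label it 'scalar'.
        return (1,), "scalar"
    return tuple(dim_lengths), ",".join(dim_types)


scalar_layout = hdf4_layout([], [])
one_d_layout = hdf4_layout(["time"], [1])
two_d_layout = hdf4_layout(["time", "vertical"], [10, 40])
```

Here `scalar_layout` and `one_d_layout` have the same shape, `(1,)`, and can only be told apart by the attribute value (‘scalar’ versus ‘time’). This ambiguity is why the introduced dimension is recorded in the dimension type list.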

HDF4 does not support strings. The HDF4 backend stores an N-dimensional HARP variable of type string as an (N+1)-dimensional HDF4 dataset of type DFNT_CHAR. The length of the introduced dimension equals the length of the longest string, or 1 if the length of the longest string is zero. Shorter strings are padded with null-termination characters. The introduced dimension is included in the dimension type list using the dimension type name ‘string’.

Thus, a scalar HARP variable of type string would be represented in HDF4 by a 2-dimensional dataset of type DFNT_CHAR. The length of the outer dimension would be 1, and the length of the inner dimension would equal the length of the string stored in the HARP variable (or 1 if that string is empty). The ‘dims’ attribute associated with this HDF4 dataset would contain the string ‘scalar,string’.
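The string handling can be illustrated with a small sketch that pads a flat list of strings and derives the resulting shape and ‘dims’ value. The helper `encode_string_variable` is hypothetical, and the ASCII encoding is an assumption made for the example. No HDF4 library is involved.

```python
def encode_string_variable(values, dim_types, dim_lengths):
    """Return (shape, dims_attribute, padded_values) for a string variable.

    values      -- flat list of Python strings (all elements of the variable)
    dim_types   -- dimension type names of the HARP variable (empty if scalar)
    dim_lengths -- matching dimension lengths
    """
    # Assumption for this sketch: strings are plain ASCII.
    encoded = [value.encode("ascii") for value in values]
    # Length of the longest string, or 1 if the longest string is empty.
    width = max((len(item) for item in encoded), default=0) or 1
    # Shorter strings are padded with null characters.
    padded = [item.ljust(width, b"\0") for item in encoded]
    if dim_types:
        shape = tuple(dim_lengths) + (width,)
        names = list(dim_types) + ["string"]
    else:
        # Scalar string: both a 'scalar' and a 'string' dimension are introduced.
        shape = (1, width)
        names = ["scalar", "string"]
    return shape, ",".join(names), padded


shape, dims, data = encode_string_variable(["ab", "", "abcd"], ["time"], [3])
```

In this usage example the 1-D variable of three strings becomes a dataset of shape `(3, 4)` with a ‘dims’ value of ‘time,string’, and the first two entries are padded to four bytes.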

To summarize, HARP dimension types are mapped to dimension type names as follows:

HARP dimension type    dimension type name
-------------------    -------------------
time                   time
latitude               latitude
longitude              longitude
vertical               vertical
spectral               spectral
independent            independent
N/A                    scalar
N/A                    string
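Going in the other direction, a reader has to turn the ‘dims’ attribute back into HARP dimension types. The sketch below shows one way this could be done. The function `parse_dims` is hypothetical, and it assumes that ‘string’ can only appear as the last name and that ‘scalar’ only appears in place of an empty dimension list, as in the examples above.

```python
HARP_DIMENSION_TYPES = {
    "time", "latitude", "longitude", "vertical", "spectral", "independent",
}


def parse_dims(dims_attribute):
    """Return (harp_dimension_types, is_string) for a 'dims' attribute value."""
    names = dims_attribute.split(",")
    # Assumption: the introduced string dimension is always the last one.
    is_string = names[-1] == "string"
    if is_string:
        names = names[:-1]
    # A lone 'scalar' entry marks the length-1 dimension added for scalars.
    if names == ["scalar"]:
        names = []
    for name in names:
        if name not in HARP_DIMENSION_TYPES:
            raise ValueError(f"unknown dimension type name: {name!r}")
    return names, is_string
```

With this sketch, ‘scalar,string’ maps back to a string variable with no dimensions, while ‘time,vertical’ maps to a two-dimensional non-string variable.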

HARP uses empty strings to represent the unit of dimensionless quantities (to distinguish them from non-quantities, which will lack a unit attribute). However, HDF4 cannot store string attributes with length zero. For this reason, an empty unit string is written as a units attribute with value "1" when writing data to HDF4. When reading from HDF4, a unit string value of "1" is converted back to an empty unit string.
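The unit convention amounts to a pair of small conversions, sketched below. The function names are hypothetical, and `None` is used here only as a stand-in for "no unit attribute".

```python
def unit_to_hdf4(unit):
    """Map a HARP unit to the value written to HDF4 (None means no attribute)."""
    if unit is None:
        return None  # non-quantity: no units attribute is written
    return "1" if unit == "" else unit


def unit_from_hdf4(value):
    """Map an HDF4 units attribute value back to a HARP unit."""
    if value is None:
        return None  # attribute absent: the variable has no unit
    return "" if value == "1" else value


round_trip = unit_from_hdf4(unit_to_hdf4(""))
```

One consequence of this scheme, as sketched, is that a unit written literally as "1" is indistinguishable from an empty unit after a round trip. Both come back as the empty string.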

Note that even though the time dimension is conceptually considered appendable, this dimension is not stored as an actual appendable dimension in HDF4. Products are read from and written to files in full and are only modified in memory. The appendable aspect is only relevant for tools, such as plotting routines, that combine the data from a series of HARP products in order to provide plots or statistics for a whole dataset (and thus need to concatenate data from different files).