CODA HDF4 Mapping Description
CODA provides access to HDF4 by creating a view on the HDF files using the CODA data types. Below we will describe how CODA maps the HDF4 product structure to one that is based on the CODA data types.
HDF products are self describing products. This means that when an HDF file is opened one can retrieve the product structure from the file itself. For this reason, CODA will not require fixed product format descriptions for HDF files. What CODA will do is use the underlying HDF4 library to retrieve the product format dynamically once an HDF file is opened. Based on this format CODA will create, on the fly, a mapping of the HDF product structure to one that is based on the CODA data types. Because of this approach CODA is able to provide you access to all HDF files, no matter whether the HDF file contains atmospheric data or not. In order to read data from HDF files you can use the same CODA reading routines as you would have used for all other products (underneath CODA will redirect these calls to the appropriate HDF4 routines for you).
HDF4 files contain three different data types for storing data: Vdata, SDS, and GRImage. The GRImage is used for 2D image data, the SDS for multi-dimensional arrays of scientific data, and Vdata for tables of data. The data types can be grouped into a hierarchical structure by means of Vgroups. Each of the data types in HDF4 can contain attributes; there are data type specific attributes, but there are also general annotations (labels and descriptions) that can be assigned to any data object. The HDF4 file itself can also contain various attributes. Below we will discuss the mapping for each of the different data types and attributes.
HDF4 names for objects and attributes can contain any kind of string and do not have to be unique. Since fieldnames in CODA have to be unique and have to be formatted as an identifier (i.e. no spaces and special characters may be used), CODA translates all object names it finds in the HDF4 file to unique identifiers. It is important to keep this in mind, because it means that the fieldnames you will see in CODA do not necessarily have to be the same names that you will see when you open an HDF4 file with e.g. an HDF browser. The translation from an HDF4 name to a CODA identifier is a very straightforward translation. First, all characters that are not alphanumerical characters (0-9, a-z, A-Z) are converted to an underscore. Then all underscores at the begin of the name are stripped. If the name is now an empty string, then it is replaced with the value 'unnamed'. Finally CODA will look for other occurrences of the name within the same CODA record. If it finds one, CODA will add a '_1' postfix to the new name and increase the number until the name is a unique name.
Basic Data Types
Each of the HDF4 data types (including attributes) contain a data type property that describes the basic data type of the object. Each basic data type is represented in CODA by one of the CODA basic data types. The following table provides a mapping of the HDF data types to the CODA versions.
HDF4 data type | CODA type class | CODA read type |
---|---|---|
DFNT_CHAR8 | text | char |
DFNT_CHAR | text | char |
DFNT_UCHAR8 | integer | uint8 |
DFNT_UCHAR | integer | uint8 |
DFNT_INT8 | integer | int8 |
DFNT_UINT8 | integer | uint8 |
DFNT_INT16 | integer | int16 |
DFNT_UINT16 | integer | uint16 |
DFNT_INT32 | integer | int32 |
DFNT_UINT32 | integer | uint32 |
DFNT_INT64 | integer | int64 |
DFNT_UINT64 | integer | uint64 |
DFNT_FLOAT32 | real | float |
DFNT_FLOAT64 | real | double |
The string and bytes read types are not used in CODA for HDF4 files.
Root of the product
The root of the product will be presented as a record and will contain all 'lone' HDF4 GRImage, SDS, Vdata, and Vgroup objects (i.e. objects that are not included in a Vgroup). The field names will be taken from the names of the objects.
GRImage
A GRImage is presented in CODA as a multi-dimensional array of the base type of the array. The array dimensions of the GRImage are swapped so the second dimension becomes the fastest running (in HDF the first dimension of a GRImage is the fastest running). Each pixel in a GRImage object can consist of more than one component. If the number of components equals one the CODA array will be 2 dimensional, otherwise it will be 3 dimensional and the third dimension will represent the number of components for each pixel.
The base type of the CODA array will be a scalar data type (see above) that is derived from the data_type property of the GRImage.
SDS
A Scientific Data Set is a multi-dimensional array of a basic type and will also be presented as such in CODA.
Vdata
The Vdata object is a table-like object in which the columns are fields and the rows represent a numbered list (i.e. the Vdata is a list of records). The data for single field within a record can contain one or more basic data type elements (this number is called the 'order' of a field in HDF4). The number of elements in a field is the same for all records in the Vdata object.
CODA will map the Vdata object to a single record in which each field corresponds to a Vdata field. A field will be a one or two-dimensional array of a basic type. The first dimension corresponds to the record-list and if the number of elements in the field is not equal to 1 there will be a second dimension that is used map the list of elements within a field.
Vgroup
A Vgroup will be mapped to a record in CODA. The field names for each of the Vgroup elements are the names of the objects (and converted to identifiers as mentioned above). Mind that, because the name conversion can apply a postfix to make a name unique, if an object occurs in more than one Vgroup it could end up having different names in the CODA records for these Vgroups.
Attributes
For HDF4 data CODA won't provide attributes such as 'description' or 'unit'. However, CODA will provide an attribute record for each data object containing all attributes for this object as they are stored in the HDF4 file (and these attributes can provide description or unit information).
For the root of the product, the attributes in the attribute record are a concatenation of the SD and GR file attributes and the file labels and file descriptions from the Annotation interface.
The attribute record for the CODA array that describes a GRImage or SDS contains the GR/SD attributes and Annotation label and description for this object.
Vdata and Vgroup attributes are currently not yet retrieved by CODA because of an issue with the HDF4 library. As soon as this issue is resolved, retrieval of Vdata/Vgroup attributes (including attributes of Vdata fields) will be included in CODA.
If an HDF4 object has a scale_factor and/or add_offset attribute CODA will use this to automatically convert any values that are read from the object (value_returned = value_read * scale_factor + add_offset). The application of this conversion can be enabled/disabled using the same global option that is used to turn on/off conversions for products that are described by CODA format definition files (this option is turned on by default).
If a variable has a _FillValue attribute CODA will use this to automatically set the value to NaN (potentially upgrading the type to a double). The application of this conversion can be enabled/disabled as described above.