CODA Product Format Definition Description Format

Below we will explain the format that CODA uses to describe the Product Format Definitions for data products.

Product classes and product types

The data files that are supported by CODA are often referred to as products. They are called products because most data files are produced by data processing software (e.g. software that corrects for instrument calibration parameters, retrieves physical quantities from raw measurement data, etc.). When data files are generated by the same software application (using the same output options) these files will be identical with regard to the data format; the way data is stored inside a file (its structure) will be the same, but the actual content of the files can (and of course often will) be different. When a group of data files share the exact same file format, we say that they are of the same product type. Note that there is a difference between the main file formats that CODA supports (ascii, binary, xml, netcdf, hdf4, and hdf5) and the format associated with a product type. Two files can both be e.g. HDF5 files but still have a different product type (because they contain different kinds of information).

For some software applications the format of the generated data files is described in a product format specification document. Such a document explains in full detail how data is stored inside the product files (e.g. what data elements are stored and in which order, how numerical values are stored, etc.).

Sometimes a software application gets modified along the way. This can result in a change to the format of the output files (and if there is a product format specification document, then this document will also be updated). In order to deal with data from both the old and new version of the processor, CODA supports the concept of multiple versions of a product type. Each version gets its own version number (normally starting with zero and increasing by one for each new version, but it can be any integer value). This way, multiple versions of the same product type can be handled simultaneously (without the need to reconfigure CODA).

Product files are often part of a larger set of data files. For instance, a single satellite mission can produce many types of files. For these files there is usually some standardisation with regard to the product format that is used. In order to make use of this commonality or standardisation in product formats, CODA categorizes product types into product classes. Within a product class, product types can share common format definitions for parts of the product (for instance: common ascii headers that are at the start of a product file). Product classes are also a way to prevent name clashes between different product types. This means that names within a product class should be unique (i.e. no two data types or product types may have the same name) but when two types are located in different product classes then they are allowed to have the same name.

Product Variables

For data with dynamic positions or lengths in raw ascii/binary data products, the calculation of a certain offset/size value can sometimes be quite time consuming, or it just can't be expressed easily in a single expression. For these situations the CODA product format definition system has been extended with a mechanism called 'product variables'. Product variables can be seen as a caching mechanism for expression results (for expressions that are used to interpret the contents of an ascii/binary product file). A product variable is a named scalar (a 64 bit signed integer) or a one-dimensional array thereof that is attached to the root of a product. Each product variable has an expression that allows the product variable to be initialized. Other expressions can make use of the value of (an array element of) a product variable as long as the product definition in which the expression is located has a product variable with that name. A product variable is initialized the first time a value is requested from it (it will thus not impact the performance of opening a product file).
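The caching behaviour described above can be illustrated with a minimal Python sketch. This is not CODA code: the class `ProductVariable`, the helper `scan_record_offsets`, and the record sizes are hypothetical names and values chosen for illustration only. The sketch shows a named integer array whose initialization expression runs only on first access.

```python
from typing import Callable, List, Optional


class ProductVariable:
    """Hypothetical stand-in for a product variable: a lazily initialized integer array."""

    def __init__(self, name: str, init: Callable[[], List[int]]):
        self.name = name
        self._init = init                      # plays the role of the initialization expression
        self._values: Optional[List[int]] = None
        self.init_count = 0                    # only here to make the caching visible

    def get(self, index: int = 0) -> int:
        # The values are computed the first time any element is requested.
        if self._values is None:
            self._values = self._init()
            self.init_count += 1
        return self._values[index]


def scan_record_offsets() -> List[int]:
    # Pretend this is an expensive walk over variable-sized records (sizes are made up).
    sizes = [12, 40, 8]
    offsets, position = [], 0
    for size in sizes:
        offsets.append(position)
        position += size
    return offsets


# Creating the variable does no work; nothing is computed until get() is called.
record_offset = ProductVariable("record_offset", scan_record_offsets)
```

Calling `record_offset.get(2)` triggers the scan once; later calls reuse the stored array.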

Data Types

CODA uses a set of compound and basic types to dynamically describe each product. The compound types provide information about the overall structure of a file (e.g. the order and amount of data elements) and the basic types provide information about the data storage for each of the individual data items (e.g. the storage type of an integer value, real value, string, etc.). In order to allow descriptions of almost any raw (i.e. uncompressed) binary or ascii data file the data types in CODA are chosen to be very generic. The result of this is that the way a file is described in the CODA product format definitions may sometimes deviate from the way it is described in a product format specification document.

The CODA types are categorized into 7 different classes: record, array, integer, real, text, raw, and special. The record and array types are the compound types, and the integer, real, text, and raw types are the basic types. The special types are, as the name implies, special and we will come back to those later.

The basic types can further be differentiated according to the native type that best represents the data in memory. For integer numbers these are int8, uint8, int16, uint16, int32, uint32, int64, and uint64, for real numbers these are float (IEEE754 32-bit floating point) and double (IEEE754 64-bit floating point), for text we have char (a single, 1-byte, character) and string (a 0-terminated character array), and for raw there is the bytes type (a series of uninterpreted bytes).

If a type is used more than once within the scope of a product class (such as e.g. generic product headers), CODA is able to store this type only once. In order for other compound types to link to this single instance of the data type description, the data type will have a unique (within the scope of the product class) name associated to it. Within the CODA Product Format Definition documentation each named data type will also receive its own separate page where the data type is described.

Below we will discuss each of the data type classes.

record

A record is a compound data type that consists of a list of fields. The fields have a fixed order and each field has a unique (within the scope of the record) name. The data in a field can again be of any type (including records and arrays).

Fields in a record can have several dynamic properties. First of all, record fields can be dynamically available. This means that, depending on the outcome of a certain expression (this expression is included as attribute information in the CODA product format definition documentation), the field can either contain data or be empty. If a record field is not available, the field will not disappear from the list of record fields (the order and number of fields in a record always stay the same). However, if you try to access an unavailable field in a record you will get a data type with type class special and special type no data (see below for a description of this special data type). Each of the CODA interfaces also provides a function that allows you to check whether a record field is available and thus has data or not. If a record field does not have an available expression as attribute it is always available and you will thus not have to check explicitly for availability.
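As an illustration of dynamic availability, the following Python sketch models a record whose field list never changes while one field is only available when another field says so. It does not use the CODA API; `NO_DATA`, `read_field`, and the field names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Stand-in for a value of type class 'special' with special type 'no data'.
NO_DATA = object()

# A field is a name plus an optional availability predicate (None means always available).
Field = Tuple[str, Optional[Callable[[Dict[str, Any]], bool]]]

FIELDS: List[Field] = [
    ("has_quality", None),
    ("quality", lambda data: data["has_quality"] == 1),
]


def read_field(fields: List[Field], data: Dict[str, Any], name: str) -> Any:
    """Return the field value, or NO_DATA when the field exists but is unavailable."""
    for field_name, available in fields:
        if field_name == name:
            if available is not None and not available(data):
                return NO_DATA
            return data[name]
    raise KeyError(name)
```

With `{"has_quality": 0}` the record still has two fields, but reading `quality` yields `NO_DATA`.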

Another dynamic property is the bit offset, which is only applicable for data in ascii or binary format. Within CODA, record fields always have the same order, but in a product file this order may sometimes differ. For instance, if a product has a number of data sets, but the data sets can be stored in any order in the product, then within CODA this product will be mapped to a record with each record field corresponding to a data set. The order in which the data sets are enumerated in the CODA record is fixed, but, since the data sets can appear in any order in the product, CODA will assign a dynamic bit offset to each record field. If record fields have no bit offset property, CODA will treat all record fields as being stored in consecutive order in the product file, otherwise the bit offset expression will be used to calculate the relative bit offset from the start of the record.

Another field property, which is more a static than a dynamic property, is the hidden property. In ASCII sections of data files you will often find additional data, such as comments, keywords, whitespace, separation characters, newline characters, etc. CODA will also include descriptions of these data elements in the CODA product format definition, but this kind of data is usually not very interesting for a user to access. For this reason, in CODA, all fields that describe such annotation/filler data are tagged hidden. The result is that if you try to ingest a complete record using one of the high level CODA interfaces (e.g. the CODA fetch functionality), the hidden fields will automatically be filtered out. There is also a global CODA option that can be set that will allow you to include the hidden fields in the CODA ingestions again.

In CODA, bitfields are also often described using record types. A bitfield is a set of one or more bytes in which data values are stored that are not a multiple of 8 bits (i.e. not a whole number of bytes). Bitfields are often used for status flags (i.e. each bit in a byte depicts a different true/false status). Normally a user would have to read the bitfield as an integer and then extract the individual bit sections manually, but since CODA is able to describe data elements down to the bit level, in CODA a bitfield will often be described using a record where each of the fields describes a different bit section of the bitfield.
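The manual extraction that such a record description replaces could look like the Python sketch below. The status byte layout (`STATUS_LAYOUT`) is invented for illustration, and the sketch assumes that bit offsets are counted from the most significant bit.

```python
from typing import Dict, List, Tuple


def extract_bits(value: int, total_bits: int, bit_offset: int, bit_length: int) -> int:
    """Return bit_length bits that start bit_offset bits after the most significant bit."""
    shift = total_bits - bit_offset - bit_length
    if bit_offset < 0 or bit_length <= 0 or shift < 0:
        raise ValueError("bit range falls outside the bitfield")
    return (value >> shift) & ((1 << bit_length) - 1)


# Hypothetical layout of a one-byte status bitfield: (field name, size in bits).
STATUS_LAYOUT: List[Tuple[str, int]] = [
    ("valid", 1),
    ("degraded", 1),
    ("mode", 3),
    ("spare", 3),
]


def decode_status(byte: int) -> Dict[str, int]:
    """Split the byte into named bit sections, the way a record of bit-sized fields would."""
    result, offset = {}, 0
    for name, length in STATUS_LAYOUT:
        result[name] = extract_bits(byte, 8, offset, length)
        offset += length
    return result
```

For the byte `0b10101000` this gives `valid` 1, `degraded` 0, `mode` 5, and `spare` 0.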

One of the special versions of the record type is the union. The union data type is a record that can only have one field available at any time. All fields in a union are thus per definition dynamically available. Instead of having an expression per field to determine whether each field is available individually, the union only has a single expression at the record level that immediately gives the index of the available field. Just as for normal records you can still access unavailable union fields, but this will give a data type with type class special and special type no data.

It is possible to encounter records with zero fields in CODA. Although you will probably not find such records in the product format definitions, you can for instance retrieve a record without fields when you try to retrieve the attributes for a data element that has no attributes.

array

In CODA it is possible to have multi-dimensional arrays of any data type, including compound data types (i.e. 'arrays of arrays' and 'arrays of records'). An array in CODA is not a property of the underlying data type (as is sometimes the case in other applications) but a data type in itself. This means that arrays can have their own attributes such as descriptions, independent of the data type that is used to describe the array elements.

The most important properties of an array are its number of dimensions and the size for each of the dimensions. The number of dimensions can be 0 or higher (zero-dimensional arrays will often not be found in product format definitions, but you can encounter them when you access data for self describing data formats such as HDF). The size for each of the dimensions can be either fixed or dynamic. If it is dynamic there will be an expression that describes how to calculate the dimension size at runtime.

The number of elements of an array is simply the product of its dimension sizes (or 1 in case of a zero-dimensional array). Be aware that it is possible for one of the dimension sizes to be zero, in which case the number of elements of the array will also be zero.
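The element count rule can be written as a one-line Python function; `num_elements` is a hypothetical helper name, not a CODA function.

```python
from math import prod
from typing import Sequence


def num_elements(dims: Sequence[int]) -> int:
    """Product of the dimension sizes; an empty dimension list (zero dimensions) gives 1."""
    return prod(dims)
```

A zero-dimensional array therefore holds exactly one element, while any dimension of size zero makes the whole array empty.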

integer

CODA supports both integers that are stored in binary form and integers that are stored in ascii form. When reading integers the result will be stored in one of the platform specific data types for integers: int8, uint8, int16, uint16, int32, uint32, int64, or uint64. However, CODA also allows reading of integer data as float or double data.

When an integer is stored in ascii format, the size of an ascii integer is always a rounded number of bytes. An ascii integer can, however, also have a size attribute that is variable. This means that CODA will keep reading characters until it finds a character that is no longer a digit (and thus not part of the ascii integer).
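A minimal Python sketch of the variable-size rule follows. It only implements what is stated above (read until a non-digit is found); handling of signs or leading spaces is deliberately left out, and `read_ascii_integer` is a hypothetical helper name.

```python
from typing import Tuple


def read_ascii_integer(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Read digits starting at offset; return (value, number of bytes consumed)."""
    end = offset
    # Keep reading while the current byte is an ASCII digit ('0'..'9').
    while end < len(data) and 0x30 <= data[end] <= 0x39:
        end += 1
    if end == offset:
        raise ValueError("no digits found at offset %d" % offset)
    return int(data[offset:end].decode("ascii")), end - offset
```

The returned size is always a whole number of bytes, in line with the fixed-size case.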

For binary integers, CODA supports integers that are stored in the platform specific 8, 16, 32, or 64 bit data types (either big or little endian, and signed or unsigned) as well as integers that are stored using a different number of bits. By default CODA assumes that integer data is stored in big endian format in the file. If the integer is 8, 16, 32, or 64 bits and the data type has the little endian property set, then CODA will read the integer data in little endian form. After reading the integer data, CODA will automatically translate the integer to the endianness of the platform that you are working on.
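The effect of the endianness property and of non-standard bit sizes can be sketched in Python as follows. The helper names are hypothetical, and the bit-level reader assumes big endian, most-significant-bit-first ordering; this is a sketch of the idea, not of how CODA implements it.

```python
def read_int(data: bytes, little_endian: bool = False, signed: bool = False) -> int:
    """Interpret a whole number of bytes as an integer; big endian unless told otherwise."""
    byteorder = "little" if little_endian else "big"
    return int.from_bytes(data, byteorder, signed=signed)


def read_unsigned_bits(data: bytes, bit_offset: int, bit_length: int) -> int:
    """Read an unsigned integer of arbitrary bit length from a big endian byte buffer."""
    total_bits = len(data) * 8
    shift = total_bits - bit_offset - bit_length
    if bit_offset < 0 or bit_length <= 0 or shift < 0:
        raise ValueError("bit range falls outside the buffer")
    return (int.from_bytes(data, "big") >> shift) & ((1 << bit_length) - 1)
```

For example, the two bytes `01 02` read as 258 in big endian and as 513 in little endian, and a 12-bit integer starting 4 bits into `AB CD` has the value 0xBCD.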

Integers can have a unit property in the CODA product format definition that describes the unit of the data.

Sometimes integers are stored in a data file using a scaled value. This is done to allow storage of the value using fewer bits/bytes. In such cases the data type in the CODA product format definition may have a conversion attribute associated with it. This conversion is a fractional value (consisting of a numerator and a denominator) with which the data value is multiplied after it is read. The converted value will always be returned as a double value. A conversion also has a unit attribute that represents the unit of the value after the conversion has been applied. There is a global CODA switch that determines whether conversion attributes are applied or not (by default conversions are enabled). If conversions are enabled and you want to retrieve the unit attribute of an integer data type that has a conversion attribute, the unit that is attached to the conversion will be returned.
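The following Python sketch illustrates the conversion rule. The names `Conversion`, `read_value`, and `read_unit` are hypothetical, and returning the integer's own unit when conversions are disabled is an assumption that the text above does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Conversion:
    numerator: float
    denominator: float
    unit: str  # unit of the value after the conversion has been applied


def read_value(raw: int, conversion: Optional[Conversion],
               conversions_enabled: bool = True) -> Union[int, float]:
    """Multiply the stored integer by numerator/denominator when conversions are enabled."""
    if conversion is None or not conversions_enabled:
        return raw
    return float(raw) * conversion.numerator / conversion.denominator


def read_unit(raw_unit: str, conversion: Optional[Conversion],
              conversions_enabled: bool = True) -> str:
    """Return the conversion unit when a conversion is applied (assumed: raw unit otherwise)."""
    if conversion is not None and conversions_enabled:
        return conversion.unit
    return raw_unit
```

A stored value of 2500 with a conversion of 1/100 is thus returned as the floating point value 25.0.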

real

Real data types represent floating point numbers. CODA supports real data stored in both ASCII and binary format. For ASCII storage the same remarks as were made for integers apply. For binary real data, CODA only supports IEEE754 32-bit float data and 64-bit double data (both big-endian and little-endian format).

Just as for integer data, real data can also have unit and conversion attributes.

text

Text data can be a single character or a series of characters. CODA does not provide any interpretation of the textual data. This means that CODA neither recognizes nor translates between text-encodings and there is no explicit support for wide characters (e.g. 16-bit unicode characters). CODA just reads string data as is. Note that the CODA C interface does not allow text data with ascii code 0 characters, so in CODA such data will be described using the raw type class.

The most important property of text data is its size. The size is always a rounded number of bytes. The size can also be variable, in which case there will be an expression that CODA uses to calculate the bit-size at runtime. In theory the length of a string can be zero.

There are three special types of text data that have a variable size, but have no expression associated with them. These types are only used in CODA format definitions when a data file is very loosely defined and it is not possible to pre-calculate the string lengths. The three types are the line, line separator, and white space types. The size of line is determined by reading characters from a data file until an end-of-line (\r, \n, or \r\n) or end-of-file is encountered. The line data type thus describes a full ascii line without the end-of-line sequence. The line separator is one or two bytes in size and contains either \r, \n, or \r\n. Finally, the white space data type describes a series of spaces (and spaces only, so no tab characters). The white space data ends when a character is encountered that is no longer a space character (or the end of the file is encountered).
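The size rules for these three types can be sketched in Python as shown below. The function names are hypothetical; each function returns the size in bytes of the element that starts at the given offset.

```python
def line_length(data: bytes, offset: int) -> int:
    """Bytes up to, but not including, the next \\r or \\n (or the end of the data)."""
    end = offset
    while end < len(data) and data[end] not in (0x0D, 0x0A):
        end += 1
    return end - offset


def line_separator_length(data: bytes, offset: int) -> int:
    """2 for \\r\\n, 1 for a lone \\r or \\n."""
    if data[offset:offset + 2] == b"\r\n":
        return 2
    if data[offset:offset + 1] in (b"\r", b"\n"):
        return 1
    raise ValueError("no line separator at offset %d" % offset)


def white_space_length(data: bytes, offset: int) -> int:
    """Number of consecutive space characters; tabs do not count as white space here."""
    end = offset
    while end < len(data) and data[end] == 0x20:
        end += 1
    return end - offset
```

Applied to `b"KEY   = 1\r\nnext"`, the line starting at offset 0 is 9 bytes long and is followed by a 2-byte line separator.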

Sometimes, as in the case of hidden record fields in ASCII headers, string data will always have the same value (e.g. a keyword always has the same keyword name). For such cases the fixed value attribute can be assigned to a text data type that contains the value that the string should always have. This attribute can also be used by verification tools to validate the actual value of the string data in the product file.

raw

The raw data type is used when a block of data cannot be described by any of the other data types. It is often used for data sections that are stored in a compressed format or for which the data definition is not known. It is sometimes also used for sections in a data file that are spare (i.e. a series of bits/bytes that is reserved for future extensions of the product or is left empty in order to align data to a specific byte boundary).

Raw data always has the bytes read type associated with it. The length of the data block can be either static or variable (in which case it is determined by a CODA expression). When a raw data type is used to describe spare blocks of data, sometimes the product format descriptions prescribe a fixed filling mechanism (e.g. all bits must be zero for binary data, or all bytes must be spaces for ascii data). For such cases the CODA product format definition allows the assignment of a fixed value attribute to raw data. This fixed value attribute is similar to the one used for text data (except that for raw data it is also allowed to use ascii code 0 characters in the fixed value and the length of the fixed value does not have to be a whole number of bytes).

special

The special data types are abstractions that are put on top of other non-special data types. The special types were introduced to make it easier for the user to read certain types of information. Below we will describe each of the special types that are supported by CODA.

Each of the special types has a base type attribute. This attribute contains a description of the data in terms of non-special type classes. In the CODA product format definition documentation for each special type a description of the base type is included.

no data

The no data special type will only be available at runtime and you should never see this in the descriptions in the CODA product format definition documentation. You will encounter a no data data type when you try to access a record field that is not available. The base type of a no data data type is a raw data type with a bit size of 0.

VSF integer

The Variable Scale Factor integer represents a compound data type containing a binary integer scale factor and a binary integer value. CODA is able to return you the value for this compound as a double with the scale factor already applied. CODA applies the scale factor using the following formula: double_value = integer_value * 10^(-scale_factor).
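The formula above translates directly into Python; `vsf_to_double` is a hypothetical helper name used only for this illustration.

```python
def vsf_to_double(scale_factor: int, integer_value: int) -> float:
    """Apply double_value = integer_value * 10^(-scale_factor)."""
    return integer_value * 10.0 ** (-scale_factor)
```

A positive scale factor shrinks the stored integer (scale factor 2 turns 12345 into approximately 123.45), while a negative scale factor enlarges it (scale factor -3 turns 7 into 7000.0).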

time

The time data type was introduced to provide a common way to represent time values in CODA. The standard representation in CODA for time values is a double value containing the number of seconds elapsed since 01-Jan-2000 00:00:00 UTC. Time values from before this date are represented by negative values.

CODA has a built-in set of translations for some of the time representations it knows about. When reading one of these time values, CODA will automatically convert it to a double value containing the number of seconds since 01-Jan-2000 00:00:00 UTC.

The time value conversions in CODA do not take into account any leap second correction. The number of seconds in a year will thus simply be the number of days times 86400.
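The representation can be reproduced with Python's standard library, whose naive `datetime` arithmetic also ignores leap seconds. The function name `to_seconds_since_2000` is hypothetical and this sketch is not one of CODA's built-in translations.

```python
from datetime import datetime

# Reference epoch used for time values: 01-Jan-2000 00:00:00.
EPOCH = datetime(2000, 1, 1)


def to_seconds_since_2000(moment: datetime) -> float:
    """Seconds since the epoch, counting every day as exactly 86400 seconds."""
    return (moment - EPOCH).total_seconds()
```

The year 2000 has 366 days, so 01-Jan-2001 maps to 366 * 86400 seconds, and any moment before the epoch maps to a negative value.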

complex

The complex special type is simply a sequence of two similarly typed integers or reals, with the first value representing the real valued part and the second representing the imaginary part. The parts of a complex type will always be returned as double values.
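As a small illustration, a pair of stored values can be mapped to a Python complex number with both parts converted to floating point. The helper name `to_complex` is hypothetical.

```python
from typing import Sequence, Union

Number = Union[int, float]


def to_complex(pair: Sequence[Number]) -> complex:
    """First element is the real part, second the imaginary part; both become floats."""
    real, imag = pair
    return complex(float(real), float(imag))
```

The integer pair (3, -4) thus becomes the complex value 3.0 - 4.0j.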