CODA Expressions
CODA comes with an advanced expression language that is able to perform calculations on product data. The expression language is primarily used in the definitions of the product formats to formally describe how variable sizes, offsets, etc. are calculated. However, the expression language is also used within e.g. the codafind and codaeval tools to formulate queries on data products. The expression language provides a means to specify locations in a file, read values from a file and apply calculations on these values.
The expression language is a basic language in ASCII format. In this document we will describe the language and provide a formal definition using the ISO/IEC 14977 EBNF (Extended Backus-Naur Form) format.
The expression language knows 5 different types: integer, float, string, boolean, and node. integer and float represent integer and floating point values, string represents text data, boolean represents logical values (true/false), and node represents locations in a data file.
When we talk about the 'type of an expression' we refer to the type of the value that will result from evaluating the expression. A float expression will thus result in a floating point value and a boolean expression in a true/false value.
Expressions can range from being very simple (specifying just a constant value) to very complex (using various functions/operations to calculate the resulting value). An expression of one type can make use of sub-expressions of another type. For instance, a boolean expression can return the result of a comparison of two string expressions.
In addition to expressions for each of the five data types there is also a sixth category of expressions, those that do not return any value. These expressions are referred to as void expressions (i.e. statements).
The CODA expressions can thus be categorized in the following six groups:
- void expressions: statements
- boolean expressions: expressions that return either True or False
- integer expressions: expressions that return a numerical value
- float expressions: expressions that return a floating point value
- string expressions: expressions that return a character string
- node expressions: expressions that refer to data in the product file
It should be noted that the expression language allows arbitrary whitespace (in the form of space characters) between the components of an expression.
void expressions
void expressions are statements. They won't return a value and can only be used at the topmost level of an expression.
operators
There are two void operators. One is the sequence operator ';'. This operator is used to instruct the sequential execution of two or more statements. The other operator is the assignment operator '=', which is used to assign values to product variables.
Example:
$count[0] = 100; $count[1] = 200; $count[2] = 300
This will set the first three elements of the 1-dimensional product variable 'count' to 100, 200, and 300 respectively.
Product variables can be either scalars or 1-dimensional arrays. The array subscript '[]' is only used for product variables that are arrays. For scalar product variables one should use e.g. '$count = 100'.
functions
for indexvar = integer to integer (step integer) do void
The 'for' expression executes the void expression after 'do' in a loop. The loop will terminate as soon as the index 'indexvar' exceeds the value of the integer expression after 'to'. The index 'indexvar' can be any of three variables that can be used for keeping indices: i, j, or k. The step value is optional. The step may be negative, in which case the loop index will decrease with each step. The current value of the loop index can be used inside the void expression using the index integer expression 'i', 'j', or 'k'.
Example:
for i = 0 to 2 do $count[i] = 100 * (i + 1)
This expression has the same effect as the example provided in the previous section.
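The loop bounds described above can be modelled outside CODA. The following Python sketch is only an illustration of the inclusive 'to' bound and the optional step; the helper name `for_loop_indices` is hypothetical, and both the handling of a zero step and the stop condition for a negative step are assumptions, not documented CODA behavior.

```python
def for_loop_indices(start, stop, step=1):
    """Yield the index values visited by a loop with an inclusive 'to' bound."""
    if step == 0:
        # Assumption: a zero step is rejected (not stated in the text above).
        raise ValueError("step must be non-zero")
    index = start
    # Assumption: with a negative step the loop ends once the index
    # drops below the 'to' value.
    while (index <= stop) if step > 0 else (index >= stop):
        yield index
        index += step


print(list(for_loop_indices(0, 2)))      # indices visited by 'for i = 0 to 2'
print(list(for_loop_indices(4, 0, -2)))  # indices visited with a step of -2
```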
boolean expressions
A boolean expression is an expression that results in a true or false value.
constant values
There are two possible constant values: 'true' and 'false'.
operators
There are three logical operators that operate on boolean values: AND, OR, and NOT ('&&', '||', and '!'). Boolean expressions can be put between parentheses ('(' and ')') in order to guide the evaluation order.
The AND and OR expressions are evaluated lazily (short-circuit evaluation). This means that the second argument to AND will not be evaluated if the first argument already evaluated to false, and similarly the second argument to OR will not be evaluated if the first argument already evaluated to true.
Integer, float, and string expressions can be compared using the following relations: equality ('=='), inequality ('!='), less than ('<'), less than or equal ('<='), greater than ('>'), or greater than or equal ('>='). The relative ordering of strings (to determine whether one string is larger than the other) will be based on the (unsigned) integer value of each byte character.
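As an illustration of ordering by unsigned byte value, the Python sketch below relies on the fact that Python compares `bytes` objects lexicographically on values 0 to 255. The helper name `byte_order_less_than` is hypothetical, and how CODA treats strings of different length that share a prefix is not stated above, so the sketch does not cover that case.

```python
def byte_order_less_than(left: bytes, right: bytes) -> bool:
    """Compare two byte strings by the unsigned value of each byte."""
    # Python compares bytes objects position by position on values 0..255.
    return left < right


# Upper case letters have lower byte values than lower case letters.
print(byte_order_less_than(b"Zebra", b"apple"))
# Byte 0xE9 (233) sorts after 'z' (122) because bytes are treated as unsigned.
print(byte_order_less_than(b"z", bytes([0xE9])))
```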
Integers cannot be compared directly to floats. To compare a float value with an integer value, first convert the integer value to a float using the 'float()' function.
functions
isnan(float)
Returns true if (and only if) the floating point value is NaN (the special 'Not a Number' value).
isinf(float)
Returns true if (and only if) the floating point value is +Inf or -Inf (plus or minus infinity).
ismininf(float)
Returns true if (and only if) the floating point value is -Inf (minus infinity).
isplusinf(float)
Returns true if (and only if) the floating point value is +Inf (plus infinity).
regex(string, string)
Matches a string against a regular expression pattern. The first parameter is the pattern and the second parameter is the string that it should match against.
The function will return true if the pattern matches (and false otherwise).
Note that if you want to use '\' characters in your pattern you should either provide the pattern as a 'raw string' (using the 'r' prefix) or use double escaping. For example, if you want to match for a "word" character (\w) you should either use 'r"\w"' or '"\\w"'.
The regular expression engine that is used is PCRE. Please refer to the manual of this software package for an overview of the syntax and options for constructing a pattern. Note that the version of PCRE that ships with CODA has been built using default options. The pattern is compiled with the PCRE_DOTALL and PCRE_DOLLAR_ENDONLY options.
exists(node)
With the 'exists' function it is possible to check whether a path exists inside a product. This can be used to check the availability of optional record fields or just to check whether a certain path exists at all. The function returns true if it is possible to traverse the path and false otherwise.
Example:
if(exists(/calibration), int(/calibration/num_values), 0)
This will return the value of the num_values parameter if the calibration data set exists and 0 otherwise.
exists(node, boolean)
This function only works if node points to an array in the product. It will then walk the elements of the array (in ascending order) until it finds an element for which the boolean expression returns true, or, if none of the array elements match, will return false.
Example:
exists(/calibration/value, float(.) > 10.0)
This will return true if any of the elements '/calibration/value[]' has a value > 10, and false otherwise.
all(node, boolean)
This function works just as the previous function, but will return false as soon as it finds an element for which the boolean expression returns false, or, if all elements match, will return true.
Example:
all(/calibration/value, float(.) > 10.0)
This will return true if all of the elements '/calibration/value[]' have a value > 10, and false otherwise.
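The element-wise behavior of 'exists(node, boolean)' and 'all(node, boolean)' can be sketched in Python as two early-exit loops. This is only a model of the description above: the helper names and the sample values are hypothetical, and the array is represented as a plain Python list rather than a CODA node.

```python
def exists_match(elements, predicate):
    """Return True as soon as one element satisfies the predicate."""
    for element in elements:  # walk in ascending order
        if predicate(element):
            return True
    return False


def all_match(elements, predicate):
    """Return False as soon as one element fails the predicate."""
    for element in elements:
        if not predicate(element):
            return False
    return True


# Hypothetical contents of '/calibration/value'.
calibration_values = [3.5, 12.0, 7.25]
print(exists_match(calibration_values, lambda value: value > 10.0))
print(all_match(calibration_values, lambda value: value > 10.0))
```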
if(boolean, boolean, boolean)
If the first boolean expression argument evaluates to true the second argument is evaluated and its result returned, otherwise the third argument is evaluated and its result returned.
at(node, boolean)
Evaluate the boolean expression as provided in the second argument with the current node position moved according to the expression from the first argument. This function is recommended if evaluating the boolean expression requires multiple navigations to the same common node position.
with(indexvar = integer, boolean)
Evaluate the boolean expression as provided in the second argument while using the given indexvar (which can be either i, j, or k) as precalculated/cached value. This is useful if e.g. an expression needs to use an integer value from a product several times and you only want to read the value once.
Example:
with(k = int(/data/intvalue), k > 10 || k < 0)
This will be faster than using 'int(/data/intvalue) > 10 || int(/data/intvalue) < 0'.
integer expressions
An integer expression is an expression that results in an integer value.
All integer expressions are evaluated using a signed 64 bit integer.
constant values
Constants can be any integer number. However, it should be possible to represent the number by a 64 bit signed integer.
variables
There are two kinds of variables that can be used as integer expression: product variables and index variables. CODA only supports integer variables (i.e. variables can not contain boolean, floating point, or string data).
Product variables can be seen as a caching mechanism for expression results. A product variable is a named scalar or a one-dimensional array that is attached to an open product. Note that product variables can only be referred to if they have been defined in the CODA product format definition for the open product file. The initialisation expression for a product variable is fixed and defined as part of the product format definition. A product variable is initialized the first time a value is requested from it (so it does not impact the performance of opening a product file).
Product variables are referenced using a '$' character followed by the name of the product variable. If the product variable is an array then an additional (zero-based) array subscript should also be provided using '[]'. For example '$foo' will return the value of the scalar product variable 'foo' and '$bar[10]' will return the 11th array element from the 1-dimensional product variable 'bar'.
Inside a for loop or 'with' function the current value of the loop/with index can be retrieved by using the corresponding 'i', 'j', or 'k' expression. These are called index variables.
operators
The following operations are provided: addition ('+'), subtraction ('-'), multiplication ('*'), division ('/'), modulo ('%'), bitwise and ('&'), and bitwise or ('|').
A unary '-' before an integer expression can be used to negate the integer value.
Integer expressions can be put between parentheses ('(' and ')') in order to guide the evaluation order.
functions
int(bool)
Convert the boolean value to an integer value.
This is equivalent to calling 'if(bool, 1, 0)'.
int(node)
Reads the value at the node as an integer. If the node does not point to data that represents an integer value this will result in an error.
When the integer is stored as an unsigned 64 bit integer the value is returned as a signed 64 bit integer by converting all values >= 2^63 into negative values (e.g. 2^64 - 1 becomes -1). Integer values >= 2^64 are not supported.
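The unsigned-to-signed reinterpretation described above is ordinary two's complement wrapping. The Python sketch below shows the arithmetic; the helper name `as_signed_64` is hypothetical, and raising an error for out-of-range input is an assumption made for the illustration.

```python
def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64 bit value as a signed 64 bit value."""
    if not 0 <= value < 2**64:
        # Assumption: values outside the unsigned 64 bit range are rejected.
        raise ValueError("value does not fit in an unsigned 64 bit integer")
    # Values >= 2^63 wrap around to the negative range.
    return value - 2**64 if value >= 2**63 else value


print(as_signed_64(2**64 - 1))  # largest unsigned value
print(as_signed_64(2**63))      # smallest value that becomes negative
print(as_signed_64(5))          # small values are unchanged
```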
int(string)
Convert the string to an integer value.
The rules for conversion are the same as for specifying a constant integer value in the expression language.
abs(integer)
Returns the absolute value of an integer as an integer.
max(integer, integer)
Returns the maximum of both integers.
max(node, integer)
This function only works if node points to an array in the product. It will then walk the elements of the array, evaluate the integer expression for each element and return the maximum of those results.
min(integer, integer)
Returns the minimum of both integers.
min(node, integer)
This function only works if node points to an array in the product. It will then walk the elements of the array, evaluate the integer expression for each element and return the minimum of those results.
dim(node, integer)
Will return the size of a specific dimension of the array at the given node.
Example:
numelements(/calibration/value)==dim(/calibration/value, 0) * dim(/calibration/value, 1)
This comparison should return true if '/calibration/value' is a two dimensional array.
Note that the only CODA expression functions that deal with unflattened array dimensions are 'dim' and 'numdims'; all other operations treat arrays as flat arrays.
numdims(node)
Will return the number of dimensions of the array at the given node.
Note that the only CODA expression functions that deal with unflattened array dimensions are 'dim' and 'numdims'; all other operations treat arrays as flat arrays.
numelements(node)
Will return the number of fields if the node points to a record or the number of array elements if the node points to an array. If the node points to a scalar the function will return 1.
Example:
numelements(/calibration/value)==int(/calibration/num_values)
This comparison should return true if the product is consistent.
count(node, boolean)
This function is similar to the exists(node, boolean) function, but will return the total number of array elements for which the boolean expression evaluates to true.
Example:
100.0 * float(count(/calibration/value, float(.) > 10.0)) / float(numelements(/calibration/value))
This calculates the percentage of values for which the value is larger than 10.
add(node, integer)
This function only works if node points to an array in the product. It will then walk the elements of the array, evaluate the integer expression for each element and return the sum of those results.
Example:
float(add(/calibration/value, int(.))) / float(numelements(/calibration/value))
This will calculate the average of the integer values in the value array.
length(string)
This function will return the length (number of characters) of the string argument.
Example:
length("A String")==8
This should always be true.
length(node)
This function only works if node points to string data. It will return the length (number of characters) of the string pointed to by the node.
Example:
length(/some/data)==length(str(/some/data))
This should always be true.
bitsize(node)
This function will return the bit size of the data item pointed to by 'node'. If the bit size is not available (e.g. for HDF products) this function will return -1.
bytesize(node)
This function will return the byte size (rounded up) of the data item pointed to by 'node'. If the byte size is not available (e.g. for HDF products) this function will return -1.
productversion()
This function will return the version number of the product format for the file. The version number is a CODA specific version number (the number can be found in the CODA product format definition documentation for the data file).
filesize()
This function will return the size of the file as a number of bytes.
bitoffset(node)
This function will return the bit offset of the data item pointed to by 'node' relative to the start of the file. If the bit offset is not available (e.g. for HDF products) this function will return -1.
byteoffset(node)
This function will return the byte offset (rounded down) of the data item pointed to by 'node' relative to the start of the file. If the byte offset is not available (e.g. for HDF products) this function will return -1.
index(node)
This function will return the 0-based index (field number or array element index) that was used to get from the parent node to the current node.
Example:
index(/calibration/value[6])==6
This should always be true.
index(node, boolean)
This function is similar to the exists(node, boolean) function, but will return the 0-based array index of the first array element for which the boolean expression evaluates to true. If none of the array elements match, the function will return -1.
if(boolean, integer, integer)
If the boolean expression evaluates to true the second argument is evaluated and its result returned, otherwise the third argument is evaluated and its result returned.
Example:
if(float(/calibration/value[0]) < 0.0, -1, 1)
This will return the sign of the first array element.
unboundindex(node, boolean)
This function has the same behavior as the index(node, boolean) function, but it will not check for the array size. This function only has use for binary/ascii data where data outside the boundary can be reinterpreted as array elements this way. Since arrays are treated as unlimited arrays, you should be careful that the expression will not try to read beyond the boundary of the overall ascii/binary block of data (e.g. the size of the file). You can do this by providing an explicit termination condition (see unboundindex(node, boolean, boolean)) or by including a check of e.g. the byteoffset against the filesize in the boolean expression of the second argument.
You should integrate the termination condition in the boolean expression of the second argument (and not provide an explicit termination condition as third argument) if you want the function to return the number of the last array element when the termination condition is reached.
unboundindex(node, boolean, boolean)
This function is similar to unboundindex(node, boolean), but has an additional argument at the end to explicitly provide a termination condition.
You should provide an explicit termination condition if you want the function to return -1 if no elements were found matching the boolean expression of the second argument.
The function will, for each array element, always first evaluate the termination condition. Only if that evaluates to false will the boolean expression of the second argument be evaluated (otherwise the function will return -1).
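The evaluation order of the two 'unboundindex' variants can be modelled with a short Python sketch. It is an illustration under stated assumptions, not the CODA implementation: the helper `unbound_index`, the `read_element` callback, and the sample data are all hypothetical, and reading past the end of the sample list (a Python IndexError) stands in for reading beyond the file boundary.

```python
def unbound_index(read_element, matches, terminate=None):
    """Return the index of the first matching element, ignoring any array size."""
    index = 0
    while True:
        element = read_element(index)  # may fail when reading past the data
        # The termination condition is always evaluated first.
        if terminate is not None and terminate(element):
            return -1
        if matches(element):
            return index
        index += 1


data = [5, 7, 0, 9]  # hypothetical block of values; 0 acts as an end marker
read = lambda index: data[index]

# Match found before the end marker.
print(unbound_index(read, lambda v: v == 7))
# Explicit termination condition: the end marker is reached first, so -1.
print(unbound_index(read, lambda v: v == 9, lambda v: v == 0))
# Termination folded into the match condition: index of the end marker.
print(unbound_index(read, lambda v: v == 9 or v == 0))
```

The last two calls show the trade-off described above: an explicit third argument yields -1 when nothing matched, while folding the condition into the second argument yields the index at which the termination condition was reached.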
at(node, integer)
Evaluate the integer expression as provided in the second argument with the current node position moved according to the expression from the first argument. This function is recommended if evaluating the integer expression requires multiple navigations to the same common node position.
with(indexvar = integer, integer)
Evaluate the integer expression as provided in the second argument while using the given indexvar (which can be either i, j, or k) as precalculated/cached value. This is useful if e.g. an expression needs to use an integer value from a product several times and you only want to read the value once.
Example:
with(k = int(/data/intvalue), if(k > 10, k + 10, k - 5))
This will be faster than using 'if(int(/data/intvalue) > 10, int(/data/intvalue) + 10, int(/data/intvalue) - 5)', since the value /data/intvalue will only be read once instead of 3 times.
float expressions
A float expression is an expression that results in a floating point value.
All float expressions are evaluated using an IEEE 754 double precision floating point value.
Any expression that expects a float expression as input will also accept an integer expression. In these cases the resulting integer value will be silently cast to a floating point value (e.g. as if the float(integer) function was used).
constant values
Floating point constant values should have a '.' and/or an exponent. The exponent should start with either a 'D' (Fortran style) or 'E' character (case insensitive). The special values 'nan' and 'inf' can be used to represent the IEEE 754 special cases not-a-number and infinity.
Examples:
1.0
.1
-1.
1.0E-20
-.09e99
.133000D+03
1e-6
nan
inf
-inf
+inf
operators
The following operations are provided: addition ('+'), subtraction ('-'), multiplication ('*'), division ('/'), modulo ('%'), and power ('^').
A unary '-' before a float expression can be used to negate the value.
Float expressions can be put between parentheses ('(' and ')') in order to guide the evaluation order.
functions
float(node)
Reads the value at the node as a floating point value. If the node does not point to data that represents a floating point or integer value this will result in an error.
float(integer)
Convert the integer to a floating point value.
For large integer values this can result in a loss of precision.
float(string)
Convert the string to a floating point value.
The rules for conversion are the same as for specifying a constant floating point value in the expression language.
abs(float)
Returns the absolute value of a floating point value as a floating point value.
ceil(float)
Returns the smallest integral value not less than the argument as a floating point value.
floor(float)
Returns the largest integral value not greater than the argument as a floating point value.
round(float)
Returns the floating point value, rounded to nearest integer (halfway away from zero).
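Rounding halfway cases away from zero differs from the default in some languages; Python's built-in round(), for example, rounds halfway cases to the nearest even integer. The Python sketch below shows the rule as described above. The helper name `round_half_away` is hypothetical, and passing NaN and infinities through unchanged is an assumption.

```python
import math


def round_half_away(value: float) -> float:
    """Round to the nearest integer, with halfway cases away from zero."""
    if math.isnan(value) or math.isinf(value):
        return value  # assumption: special values are returned unchanged
    magnitude = abs(value)
    whole = math.floor(magnitude)
    # The fractional part of a double is exactly representable,
    # so this comparison is not affected by rounding error.
    if magnitude - whole >= 0.5:
        whole += 1
    return math.copysign(whole, value)


print(round_half_away(2.5), round_half_away(-2.5))
print(round(2.5))  # Python's built-in rounds halfway cases to even
```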
max(float, float)
Returns the maximum of both floating point values.
max(node, float)
This function only works if node points to an array in the product. It will then walk the elements of the array, evaluate the floating point expression for each element and return the maximum of those results.
min(float, float)
Returns the minimum of both floating point values.
min(node, float)
This function only works if node points to an array in the product. It will then walk the elements of the array, evaluate the floating point expression for each element and return the minimum of those results.
time(string, string)
Returns a double value giving the number of seconds since 2000-01-01T00:00:00.000000 for the date/time string value that is provided as the first argument and using the date/time format pattern that is provided by the second argument.
An overview of the time format patterns is provided in the section date/time format patterns.
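To make the epoch concrete, the following Python sketch computes seconds since 2000-01-01T00:00:00 for a date/time string. It is only an analogy: the helper name `seconds_since_2000` is hypothetical, the format argument uses Python's strptime directives rather than the CODA date/time format patterns, and the calculation assumes no leap second handling.

```python
from datetime import datetime

EPOCH_2000 = datetime(2000, 1, 1)  # 2000-01-01T00:00:00.000000


def seconds_since_2000(text: str, strptime_format: str) -> float:
    """Parse a date/time string and return seconds since the 2000-01-01 epoch."""
    # Assumption: plain calendar arithmetic without leap seconds.
    return (datetime.strptime(text, strptime_format) - EPOCH_2000).total_seconds()


print(seconds_since_2000("2000-01-02T00:00:00", "%Y-%m-%dT%H:%M:%S"))
print(seconds_since_2000("2000-01-01T00:00:00.500000", "%Y-%m-%dT%H:%M:%S.%f"))
```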
add(node, float)
This function only works if node points to an array in the product. It will then walk the elements of the array, evaluate the float expression for each element and return the sum of those results.
Example:
float(add(/calibration/value, float(.))) / float(numelements(/calibration/value))
This will calculate the average of the floating point values in the value array.
if(boolean, float, float)
If the boolean expression evaluates to true the second argument is evaluated and its result returned, otherwise the third argument is evaluated and its result returned.
at(node, float)
Evaluate the float expression as provided in the second argument with the current node position moved according to the expression from the first argument. This function is recommended if evaluating the float expression requires multiple navigations to the same common node position.
with(indexvar = integer, float)
Evaluate the float expression as provided in the second argument while using the given indexvar (which can be either i, j, or k) as precalculated/cached value. This is useful if e.g. an expression needs to use an integer value from a product several times and you only want to read the value once.
Example:
with(k = int(/data/intvalue), if(k <= 0, 0.0, if(k >= 100, 1.0, k / 100.0)))
This will be faster than using 'if(int(/data/intvalue) <= 0, 0.0, if(int(/data/intvalue) >= 100, 1.0, int(/data/intvalue) / 100.0))'.
string expressions
A string expression is an expression that results in a character string.
A string value is simply a sequence of bytes. No special character encoding interpretation is used and all byte values (0-255) are allowed for a character (e.g. no special string termination character is applied).
constant values
A string constant is provided by a sequence of printable ASCII characters between double quote characters. The double quote character itself or any characters that are not ASCII printable characters can be included in string constants by using an escape sequence. An escape sequence is a '\' followed by either a character or by a 3 digit octal number of the byte value (for instance, '\101' is equal to 'A'). The following table lists the possible escape sequences that are allowed:
Escape sequence | ASCII Character | Decimal code |
---|---|---|
\a | Bell | 7 |
\b | Backspace | 8 |
\t | Tab | 9 |
\n | Linefeed | 10 |
\v | Vertical tab | 11 |
\f | Formfeed | 12 |
\r | Carriage return | 13 |
\" | " | 34 |
\' | ' | 39 |
\\ | \ | 92 |
\nnn | A byte value equal to the octal number 'nnn' | |
Examples:
"Hello World!"
""
"Line 1\nLine 2\n"
"A string with a \000 character"
"How to quote a '\"'?"
By default all escaped characters are interpreted and turned into their single character equivalents. However, if you precede a string constant by the letter 'r' this translation will not be performed and you will get the raw string value. For instance, '"abc \\ \" "' will be translated to 'abc \ " ', whereas 'r"abc \\ \" "' will remain as 'abc \\ \" '.
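Python string literals follow the same convention for this particular case, so the difference between an interpreted and a raw string constant can be checked directly. The sketch below is an analogy in Python, not CODA syntax; the variable names are hypothetical.

```python
# Interpreted literal: '\\' becomes one backslash and '\"' becomes a quote.
cooked = "abc \\ \" "
# Raw literal: the backslashes are kept exactly as written.
raw = r"abc \\ \" "

print(len(cooked))  # 'abc', space, backslash, space, quote, space
print(len(raw))     # the two escape sequences stay two characters each
print(cooked)
print(raw)
```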
operators
Strings can be concatenated using the '+' operator.
functions
str(integer)
Convert the integer value into a string value (using decimal notation).
str(node)
Reads the string value for the data element pointed to by the node parameter.
str(node, integer)
Similar to the previous function, but now limit the read to a maximum number of bytes as indicated by the second argument. Note that the length of the returned string may be less than the provided maximum if the length of the data in the product is less than the indicated maximum.
bytes(node)
Reads the data element pointed to by the node parameter as a raw byte array. This function differs from str(node) because it is not restricted to reading text data, but can be used independent of the type of data at the node position, as long as the data is stored in an 'ascii' or 'binary' formatted product (e.g. it can be used to read a binary record).
bytes(node, integer)
Similar to the previous function, but now explicitly set the number of bytes that should be read (as indicated by the second argument). The function will always read the given number of bytes, even if this exceeds the length of the data item at the node position. If the number of bytes from the node position till the end of the file is less than the provided maximum, CODA will return an error.
bytes(node, integer, integer)
Similar to the previous function, but now explicitly set both the byte offset relative to the node (as second argument) and the number of bytes that should be read (as third argument). Both the offset and number of bytes may exceed the range of the data at the given node. This means that the offset can be negative and/or the offset+length may exceed that of the data item at the node position. If the range exceeds the boundaries of the file (e.g. global offset is negative or offset+length exceeds the file length), CODA will return an error.
substr(integer, integer, string)
Return a substring of the third argument, where the first argument indicates the 0-based offset and the second argument the length. The values for both the offset and length arguments should be greater than or equal to 0 and the sum of the offset and length parameters should not exceed the string length of the third argument.
Example:
"bcd" == substr(1, 3, "abcdef")
This should always be true.
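The argument order (offset, length, string) and the range constraints of 'substr' can be sketched in Python as follows. The helper is a hypothetical model; raising an error when the constraints are violated is an assumption, since the text above only states what the arguments should satisfy.

```python
def substr(offset: int, length: int, text: str) -> str:
    """Return 'length' characters of 'text' starting at 0-based 'offset'."""
    if offset < 0 or length < 0:
        # Assumption: negative arguments are treated as an error.
        raise ValueError("offset and length must be >= 0")
    if offset + length > len(text):
        # Assumption: a range beyond the end of the string is an error.
        raise ValueError("offset + length exceeds the string length")
    return text[offset:offset + length]


print(substr(1, 3, "abcdef"))
```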
ltrim(string)
Return the string with all white space characters removed from the left (=start of string).
The following characters are considered white space: space, tab, newline, carriage return.
rtrim(string)
Return the string with all white space characters removed from the right (=end of string).
The following characters are considered white space: space, tab, newline, carriage return.
trim(string)
Return the string with all white space characters removed from the left and right (=beginning and end of string).
The following characters are considered white space: space, tab, newline, carriage return.
add(node, string)
This function only works if node points to an array in the product. It will then walk the elements of the array (in ascending order), evaluate the string expression for each element and return the concatenated string of the results.
max(string, string)
Returns the maximum of both string values.
max(node, string)
This function only works if node points to an array in the product. It will then walk the elements of the array, evaluate the string expression for each element and return the maximum of those results.
min(string, string)
Returns the minimum of both string values.
min(node, string)
This function only works if node points to an array in the product. It will then walk the elements of the array, evaluate the string expression for each element and return the minimum of those results.
regex(string, string, integer)
Matches a string against a regular expression pattern. The first parameter is the pattern, the second parameter is the string that it should match against and the third parameter is the index of the substring whose value should be returned.
Note that an index value of 0 will give the full match of the pattern, and indices 1..n are the indices for the substrings.
If no match for the specified substring was found an empty string will be returned.
The regular expression engine that is used is PCRE. Please refer to the manual of this software package for an overview of the syntax and options for constructing a pattern and for an explanation of the concept of 'substrings'. Note that the version of PCRE that ships with CODA has been built using default options. The pattern is compiled with the PCRE_DOTALL and PCRE_DOLLAR_ENDONLY options.
For example, regex(r"a+(\d+)", "aaa1234aaa", 0) will return "aaa1234" and regex(r"a+(\d+)", "aaa1234aaa", 1) will return "1234".
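The substring indices can be tried out with Python's `re` module, which is used here only as a stand-in for PCRE. Assumptions in this sketch: an unanchored search is used, `re.DOTALL` approximates PCRE_DOTALL, `\Z` is used to mimic a '$' that matches only at the very end of the string (the effect of PCRE_DOLLAR_ENDONLY), and named groups use Python's `(?P<name>...)` spelling instead of the `(?'name'...)` form.

```python
import re

text = "aaa1234aaa"

match = re.search(r"a+(\d+)", text, re.DOTALL)
full_match = match.group(0)   # index 0: the full match of the pattern
first_group = match.group(1)  # index 1: the first substring

# Python spells named groups as (?P<name>...).
named_group = re.search(r"a+(?P<foo>\d+)", text).group("foo")

# '\Z' only matches at the very end, so a trailing newline prevents a match.
end_only = re.search(r"abc\Z", "abc\n")

print(full_match, first_group, named_group, end_only)
```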
regex(string, string, string)
Matches a string against a regular expression pattern. The first parameter is the pattern, the second parameter is the string that it should match against and the third parameter is the name of the substring whose value should be returned.
If no match for the specified substring was found an empty string will be returned.
The regular expression engine that is used is PCRE. Please refer to the manual of this software package for an overview of the syntax and options for constructing a pattern and for an explanation of the concept of 'substrings'. Note that the version of PCRE that ships with CODA has been built using default options. The pattern is compiled with the PCRE_DOTALL and PCRE_DOLLAR_ENDONLY options.
For example, regex(r"a+(?'foo'\d+)", "aaa1234aaa", "foo") will return "1234".
productformat()
Returns the name of the underlying format (e.g. 'binary', 'xml', 'hdf5') of the file. A file that can be opened by CODA will always have an associated product format.
productclass()
Returns the name of the productclass of the file. If the file does not have a productclass an empty string will be returned.
producttype()
Returns the name of the producttype of the file. If the file does not have a producttype an empty string will be returned.
filename()
Returns the filename (not including directory path components, but including the file extension) as a string.
strtime(float)
Returns a string representation of the time value, interpreted as seconds since 2000-01-01T00:00:00.000000, using the date/time format "yyyy-MM-dd'T'HH:mm:ss.SSSSSS".
strtime(float, string)
Returns a string representation of the time value, interpreted as seconds since 2000-01-01T00:00:00.000000, using the date/time format as provided by the second argument.
An overview of the time format patterns is provided in the section date/time format patterns.
if(boolean, string, string)
If the boolean expression evaluates to true the second argument is evaluated and its result returned, otherwise the third argument is evaluated and its result returned.
at(node, string)
Evaluate the string expression as provided in the second argument with the current node position moved according to the expression from the first argument. This function is recommended if evaluating the string expression requires multiple navigations to the same common node position.
Example:
at(/some/data, with(k = length(.), substr(k / 2, k - (k / 2), str(.))))
This will be faster than using 'with(k = length(/some/data), substr(k / 2, k - (k / 2), str(/some/data)))'.
with(indexvar = integer, string)
Evaluate the string expression as provided in the second argument while using the given indexvar (which can be either i, j, or k) as precalculated/cached value. This is useful if e.g. an expression needs to use an integer value from a product several times and you only want to read the value once.
Example:
with(k = length(/some/data), substr(k / 2, k - (k / 2), str(/some/data)))
This will be faster than using 'substr(length(/some/data) / 2, length(/some/data) - (length(/some/data) / 2), str(/some/data))'.
node expressions
A node describes a path into a product file similar to XPath for XML.
How to construct a path (i.e. which identifiers to use) depends entirely on the format of the data product. The basis is that a product is composed of compound elements that can be traversed using identifiers (i.e. fields in a record) or using indices (i.e. elements of an array).
Within CODA the format of field names is restricted to identifiers, which means that the first character should be a-z or A-Z and all subsequent characters should be a-z, A-Z, 0-9, or _. Instead of specifying a field name, it is also possible to provide a zero-based index of the field between '{' and '}'. So '/first_field' and '/{0}' are equivalent.
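The identifier rule for field names can be checked with a short Python sketch. The function name `is_valid_field_name` is hypothetical; the regular expression simply restates the rule given above.

```python
import re

# First character a-z or A-Z, all following characters a-z, A-Z, 0-9 or '_'.
FIELD_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_field_name(name):
    """Return True if the whole string is a valid field name identifier."""
    return FIELD_NAME.fullmatch(name) is not None


print(is_valid_field_name("first_field"))  # True
print(is_valid_field_name("1st_field"))    # False
print(is_valid_field_name("_hidden"))      # False
```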
Array indices are 0-based indices on the flattened view of an array. This means that if an array is defined as having more than one dimension the index as used in a node expression should be between 0 and the total number of array elements (exclusive). For example, for a [10,8] array, the index should be >= 0 and <= 79. The CODA product format definition document will always show the dimension order in such a way that the last dimension is the fastest running dimension.
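The mapping from per-dimension subscripts to the flattened index can be sketched in Python as follows. The function name `flat_index` is hypothetical; the calculation assumes only what is stated above, namely that the last dimension is the fastest running one.

```python
def flat_index(dims, subscripts):
    """Return the 0-based flattened index for the given subscripts.

    The last dimension runs fastest, so each step multiplies the running
    index by the size of the next dimension before adding its subscript.
    """
    if len(dims) != len(subscripts):
        raise ValueError("one subscript per dimension is required")
    index = 0
    for size, sub in zip(dims, subscripts):
        if not 0 <= sub < size:
            raise IndexError("subscript out of range")
        index = index * size + sub
    return index


print(flat_index((10, 8), (0, 0)))  # 0
print(flat_index((10, 8), (1, 0)))  # 8
print(flat_index((10, 8), (9, 7)))  # 79
```

For the [10,8] array, subscripts (9, 7) give the last valid index 79, which would be written as '[79]' in a node expression.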
A node can be specified as a relative or absolute path. An absolute path starts at the root of the product and the expression starts with either '/', '[' (start of a top-level index reference), or '@' (start of a top-level attribute reference). Relative paths start with either '.', '..', or ':'. When a node starts with '..' this is just a shorthand for './..', which refers to the parent node of the current node. The relative paths '.' and ':' both refer to the current node position but with a slightly different meaning. The ':' node will always refer to the node with which the evaluation was started, whereas the location of '.' will depend on where the expression is used (for instance, if '.' is used within the evaluation of the second argument of a 'count(node, boolean)' expression it will refer to the array element in 'node' that is being evaluated). The node position with which the evaluation is started is either equal to the node to which the expression is attached or, if the expression is not attached to a specific product parameter, it is the root of the product.
There is a special node expression 'asciiline' that can only be used in expressions for product format definitions for ASCII products. The 'asciiline' expression can only be used as the starting point of a node expression and will map the view of a file to an array of strings where each string corresponds with a single line (including end of line character(s)) of the ASCII file. For example, the expression bitoffset(asciiline[index(asciiline, str(., 1) != "#")]) will give the bitoffset in the file of the first line that does not start with a '#' character.
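What the asciiline example computes can be mimicked in Python on an in-memory byte string. This is a sketch of the idea only: the function name `first_data_line_bitoffset` is hypothetical, and returning -1 when every line starts with '#' is an assumption of this sketch, not documented CODA behaviour.

```python
def first_data_line_bitoffset(content):
    """Return the bit offset of the first line not starting with '#'.

    Lines keep their end-of-line characters, like the strings in the
    asciiline view, so the byte offsets add up to positions in the file.
    """
    byte_offset = 0
    for line in content.splitlines(keepends=True):
        if not line.startswith(b"#"):
            return byte_offset * 8  # 8 bits per byte
        byte_offset += len(line)
    return -1  # assumption: no such line found


sample = b"# header\n# more\r\ndata 1\n"
print(first_data_line_bitoffset(sample))  # 136
```

The two comment lines occupy 9 and 8 bytes including their line endings, so the first data line starts at byte 17, which is bit offset 136.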
date/time format patterns
For parsing and printing date/time values in textual format, CODA uses a pattern description that is largely based on the Unicode Technical Standard #35 regarding Date Format Patterns. Note that only a very specific subset of patterns is supported and that the pattern for the abbreviated month deviates from the standard.
In a date/time format pattern certain character series have special meaning and correspond to the various components of a date/time value. Characters that need to be matched literally can be included between two single quote characters ('). Any alphabetical character that is to be treated literally has to be included between quotes. The same holds for the special characters '|' and '*'. Other characters may be included between quotes, but this is not required. Two single quotes represent a literal single quote, either inside or outside single quotes.
To provide a list of multiple patterns that are tried in sequence until one succeeds, provide the list of patterns separated by '|' (vertical bar) characters in a single format string. When converting a time value to a string, the first pattern of a list will be used for the conversion.
A pattern for a date/time component can have a '*' appended to the format to indicate that it should use leading spaces instead of leading zeros (both for parsing and for printing).
The table below provides an overview of the patterns that are supported in CODA for date/time formats:
Pattern | Description |
---|---|
yyyy | 4-digit year value [0001..9999] |
MM | 2-digit month value [01..12] |
MMM | 3-character abbreviation of the month: "JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL", "AUG", "SEP", "OCT", "NOV", "DEC". Lower-case versions of these strings are also accepted when parsing. |
dd | 2-digit day of month value [01..31] |
DDD | 3-digit day of year value [001..366] |
HH | 2-digit hour of day value [00..23] |
mm | 2-digit minute of hour value [00..59] |
ss | 2-digit second of minute value [00..60]. Note that a possible leap second value is supported in a string value, but it is not taken into account in a conversion (i.e. it is treated as '00' of the next minute) unless explicitly specified otherwise. |
S SS SSS ... | n-digit fraction of a second. A 3-digit fraction would represent milliseconds and a 6-digit fraction would represent microseconds. Values will be printed truncated (i.e. 12.159 seconds with format ss.SS will be printed as 12.15). Any digits beyond the 6th digit will be ignored when parsing (i.e. values will always be truncated at microsecond resolution). |
Several examples of formats and their string values for the date 4 July 2012 19:32:56.123456:
- "yyyy-MM-dd" : "2012-07-04"
- "yyyy MM* dd*" : "2012  7  4" (each '*' component is padded to its full width with a leading space instead of a leading zero)
- "yyyy-MM-dd'T'HH:mm:ss" : "2012-07-04T19:32:56"
- "dd-MMM-yyyy HH:mm:ss.SSSSSS" : "04-JUL-2012 19:32:56.123456"
- "yyyy DDD" : "2012 186"
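The printing side of these patterns can be sketched in Python. This is a minimal illustration and not CODA's implementation: the names `format_time`, `TOKEN`, and `MONTHS` are hypothetical, lists of patterns separated by '|' are not handled, and unquoted alphabetical characters are not rejected.

```python
import re
from datetime import datetime

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Alternatives, in order: a doubled quote, a quoted literal, a date/time
# component with an optional '*', or any other single character.
TOKEN = re.compile(
    r"''|'((?:[^']|'')+)'|(yyyy|MMM|MM|dd|DDD|HH|mm|ss|S+)(\*?)|(.)",
    re.DOTALL,
)


def format_time(value, pattern):
    """Print a datetime using the subset of patterns from the table above."""
    numeric = {
        "yyyy": (value.year, 4),
        "MM": (value.month, 2),
        "dd": (value.day, 2),
        "DDD": (value.timetuple().tm_yday, 3),
        "HH": (value.hour, 2),
        "mm": (value.minute, 2),
        "ss": (value.second, 2),
    }
    out = []
    for match in TOKEN.finditer(pattern):
        literal, name, star, other = match.groups()
        if match.group(0) == "''":
            out.append("'")  # two single quotes: one literal quote
        elif literal is not None:
            out.append(literal.replace("''", "'"))
        elif name == "MMM":
            out.append(MONTHS[value.month - 1])
        elif name is not None and name.startswith("S"):
            # Fraction of a second: truncate (never round) to n digits.
            digits = f"{value.microsecond:06d}".ljust(len(name), "0")
            out.append(digits[:len(name)])
        elif name is not None:
            number, width = numeric[name]
            # A trailing '*' selects leading spaces instead of zeros.
            out.append(str(number).rjust(width, " " if star else "0"))
        else:
            out.append(other)
    return "".join(out)


when = datetime(2012, 7, 4, 19, 32, 56, 123456)
print(format_time(when, "yyyy-MM-dd'T'HH:mm:ss"))        # 2012-07-04T19:32:56
print(format_time(when, "dd-MMM-yyyy HH:mm:ss.SSSSSS"))  # 04-JUL-2012 19:32:56.123456
print(format_time(when, "yyyy DDD"))                     # 2012 186
print(format_time(when, "yyyy MM* dd*"))
```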
formal definition
    alpha =
        'a'|'b'|'c'|'d'|'e'|'f'|'g'|'h'|'i'|'j'|'k'|'l'|'m'|
        'n'|'o'|'p'|'q'|'r'|'s'|'t'|'u'|'v'|'w'|'x'|'y'|'z'|
        'A'|'B'|'C'|'D'|'E'|'F'|'G'|'H'|'I'|'J'|'K'|'L'|'M'|
        'N'|'O'|'P'|'Q'|'R'|'S'|'T'|'U'|'V'|'W'|'X'|'Y'|'Z' ;

    character = alpha | digit |
        ' '|'!'|'"'|'#'|'$'|'%'|'&'|"'"|'('|')'|'*'|'+'|','|
        '-'|'.'|'/'|':'|';'|'<'|'='|'>'|'?'|'@'|'['|'\'|']'|
        '^'|'_'|'`'|'{'|'|'|'}'|'~' ;

    identifier = alpha, [{alpha | digit | '_'}] ;

    indexvar = 'i'|'j'|'k' ;

    digit = '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9' ;

    sign = '+'|'-' ;

    intvalue = {digit} ;

    floatvalue =
        (intvalue, '.', [intvalue] | '.', intvalue),
        [('E' | 'e' | 'D' | 'd'), [sign], intvalue] ;

    stringvalue = '"', [{character-('\', '"') | '\', character}], '"' ;

    voidexpr =
        '$', identifier, '=', intexpr |
        '$', identifier, '[', intexpr, ']', '=', intexpr |
        voidexpr, ';', voidexpr |
        'for', indexvar, '=', intexpr, 'to', intexpr, 'do', voidexpr |
        'for', indexvar, '=', intexpr, 'to', intexpr, 'step', intexpr, 'do', voidexpr |
        'goto', '(', node, ')' |
        'with', '(', indexvar, '=', intexpr, ',', voidexpr, ')' ;

    boolexpr =
        'true' |
        'false' |
        boolexpr, '&&', boolexpr |
        boolexpr, '||', boolexpr |
        '!', boolexpr |
        intexpr, '==', intexpr |
        intexpr, '!=', intexpr |
        intexpr, '>', intexpr |
        intexpr, '>=', intexpr |
        intexpr, '<', intexpr |
        intexpr, '<=', intexpr |
        floatexpr, '==', floatexpr |
        floatexpr, '==', intexpr |
        intexpr, '==', floatexpr |
        floatexpr, '!=', floatexpr |
        floatexpr, '!=', intexpr |
        intexpr, '!=', floatexpr |
        floatexpr, '>', floatexpr |
        floatexpr, '>', intexpr |
        intexpr, '>', floatexpr |
        floatexpr, '>=', floatexpr |
        floatexpr, '>=', intexpr |
        intexpr, '>=', floatexpr |
        floatexpr, '<', floatexpr |
        floatexpr, '<', intexpr |
        intexpr, '<', floatexpr |
        floatexpr, '<=', floatexpr |
        floatexpr, '<=', intexpr |
        intexpr, '<=', floatexpr |
        stringexpr, '==', stringexpr |
        stringexpr, '!=', stringexpr |
        stringexpr, '>', stringexpr |
        stringexpr, '>=', stringexpr |
        stringexpr, '<', stringexpr |
        stringexpr, '<=', stringexpr |
        '(', boolexpr, ')' |
        'regex', '(', stringexpr, ',', stringexpr, ')' |
        'exists', '(', node, ')' |
        'exists', '(', node, ',', boolexpr, ')' |
        'all', '(', node, ',', boolexpr, ')' |
        'if', '(', boolexpr, ',', boolexpr, ',', boolexpr, ')' |
        'with', '(', indexvar, '=', intexpr, ',', boolexpr, ')' ;

    intexpr =
        intvalue |
        'int', '(', intexpr, ')' |
        'int', '(', boolexpr, ')' |
        'int', '(', node, ')' |
        'int', '(', stringexpr, ')' |
        '$', identifier |
        '$', identifier, '[', intexpr, ']' |
        indexvar |
        '-', intexpr |
        '+', intexpr |
        intexpr, '+', intexpr |
        intexpr, '-', intexpr |
        intexpr, '*', intexpr |
        intexpr, '/', intexpr |
        intexpr, '%', intexpr |
        intexpr, '&', intexpr |
        intexpr, '|', intexpr |
        '(', intexpr, ')' |
        'abs', '(', intexpr, ')' |
        'max', '(', node, ',', intexpr, ')' |
        'max', '(', intexpr, ',', intexpr, ')' |
        'min', '(', node, ',', intexpr, ')' |
        'min', '(', intexpr, ',', intexpr, ')' |
        'dim', '(', node, ',', intexpr, ')' |
        'numdims', '(', node, ')' |
        'numelements', '(', node, ')' |
        'count', '(', node, ',', boolexpr, ')' |
        'add', '(', node, ',', intexpr, ')' |
        'length', '(', stringexpr, ')' |
        'length', '(', node, ')' |
        'bitsize', '(', node, ')' |
        'bytesize', '(', node, ')' |
        'productversion', '(', ')' |
        'filesize', '(', ')' |
        'bitoffset', '(', node, ')' |
        'byteoffset', '(', node, ')' |
        'index', '(', node, ')' |
        'index', '(', node, ',', boolexpr, ')' |
        'if', '(', boolexpr, ',', intexpr, ',', intexpr, ')' |
        'unboundindex', '(', node, ',', boolexpr, ')' |
        'unboundindex', '(', node, ',', boolexpr, ',', boolexpr, ')' |
        'at', '(', node, ',', intexpr, ')' |
        'with', '(', indexvar, '=', intexpr, ',', intexpr, ')' ;

    floatexpr =
        floatvalue |
        'nan' |
        'inf' |
        'float', '(', floatexpr, ')' |
        'float', '(', node, ')' |
        'float', '(', intexpr, ')' |
        'float', '(', stringexpr, ')' |
        '-', floatexpr |
        '+', floatexpr |
        floatexpr, '+', floatexpr |
        floatexpr, '+', intexpr |
        intexpr, '+', floatexpr |
        floatexpr, '-', floatexpr |
        floatexpr, '-', intexpr |
        intexpr, '-', floatexpr |
        floatexpr, '*', floatexpr |
        floatexpr, '*', intexpr |
        intexpr, '*', floatexpr |
        floatexpr, '/', floatexpr |
        floatexpr, '/', intexpr |
        intexpr, '/', floatexpr |
        floatexpr, '%', floatexpr |
        floatexpr, '%', intexpr |
        intexpr, '%', floatexpr |
        floatexpr, '^', floatexpr |
        floatexpr, '^', intexpr |
        intexpr, '^', floatexpr |
        intexpr, '^', intexpr |
        '(', floatexpr, ')' |
        'abs', '(', floatexpr, ')' |
        'ceil', '(', floatexpr, ')' |
        'floor', '(', floatexpr, ')' |
        'round', '(', floatexpr, ')' |
        'max', '(', node, ',', floatexpr, ')' |
        'max', '(', floatexpr, ',', floatexpr, ')' |
        'max', '(', floatexpr, ',', intexpr, ')' |
        'max', '(', intexpr, ',', floatexpr, ')' |
        'min', '(', node, ',', floatexpr, ')' |
        'min', '(', floatexpr, ',', floatexpr, ')' |
        'min', '(', floatexpr, ',', intexpr, ')' |
        'min', '(', intexpr, ',', floatexpr, ')' |
        'time', '(', stringexpr, ',', stringexpr, ')' |
        'add', '(', node, ',', floatexpr, ')' |
        'if', '(', boolexpr, ',', floatexpr, ',', floatexpr, ')' |
        'if', '(', boolexpr, ',', floatexpr, ',', intexpr, ')' |
        'if', '(', boolexpr, ',', intexpr, ',', floatexpr, ')' |
        'with', '(', indexvar, '=', intexpr, ',', floatexpr, ')' ;

    stringexpr =
        stringvalue |
        'r', stringvalue |
        'str', '(', stringexpr, ')' |
        'str', '(', node, ')' |
        'str', '(', node, ',', intexpr, ')' |
        'bytes', '(', node, ')' |
        'bytes', '(', node, ',', intexpr, ')' |
        'bytes', '(', node, ',', intexpr, ',', intexpr, ')' |
        stringexpr, '+', stringexpr |
        'substr', '(', intexpr, ',', intexpr, ',', stringexpr, ')' |
        'ltrim', '(', stringexpr, ')' |
        'rtrim', '(', stringexpr, ')' |
        'trim', '(', stringexpr, ')' |
        'add', '(', node, ',', stringexpr, ')' |
        'max', '(', node, ',', stringexpr, ')' |
        'max', '(', stringexpr, ',', stringexpr, ')' |
        'min', '(', node, ',', stringexpr, ')' |
        'min', '(', stringexpr, ',', stringexpr, ')' |
        'regex', '(', stringexpr, ',', stringexpr, ',', intexpr, ')' |
        'regex', '(', stringexpr, ',', stringexpr, ',', stringexpr, ')' |
        'productformat', '(', ')' |
        'productclass', '(', ')' |
        'producttype', '(', ')' |
        'filename', '(', ')' |
        'strtime', '(', intexpr, ')' |
        'strtime', '(', floatexpr, ')' |
        'strtime', '(', intexpr, ',', stringexpr, ')' |
        'strtime', '(', floatexpr, ',', stringexpr, ')' |
        'if', '(', boolexpr, ',', stringexpr, ',', stringexpr, ')' |
        'with', '(', indexvar, '=', intexpr, ',', stringexpr, ')' ;

    rootnode = '/' ;

    nonrootnode =
        '.' |
        ':' |
        '..' |
        'asciiline' |
        rootnode, identifier |
        rootnode, '{', intexpr, '}' |
        nonrootnode, '/..' |
        nonrootnode, '/', identifier |
        nonrootnode, '/', '{', intexpr, '}' |
        '[', intexpr, ']' |
        node, '[', intexpr, ']' |
        '@', identifier |
        '@', '{', intexpr, '}' |
        node, '@', identifier |
        node, '@', '{', intexpr, '}' ;

    node = rootnode | nonrootnode ;