Python API
Table of Contents
- muninn
- muninn.archive
- Archive
- register_namespace
- namespace_schema
- namespaces
- register_product_type
- product_type_plugin
- product_types
- register_remote_backend
- remote_backend
- remote_backends
- register_hook_extension
- hook_extension
- hook_extensions
- attach
- auth_file
- cleanup_derived_products
- close
- count
- create_properties
- delete_properties
- derived_products
- destroy
- destroy_catalogue
- export
- export_formats
- generate_uuid
- identify
- ingest
- link
- prepare
- prepare_catalogue
- product_path
- pull
- rebuild_properties
- rebuild_pull_properties
- remove
- retrieve
- retrieve_properties
- root
- search
- source_products
- strip
- summary
- tag
- tags
- unlink
- untag
- update_properties
- verify_hash
muninn
config_path
def config_path()
Return the value of the MUNINN_CONFIG_PATH environment variable.
open
def open(id=None, **kwargs)
Open an archive for the given archive id, by searching for the corresponding configuration file in the locations found in the MUNINN_CONFIG_PATH environment variable.
Alternatively, the id can be a path/url that points directly to a muninn archive configuration file (Python 3 only). This path/url should still end with a .cfg extension.
Arguments:
id
- Archive id (name of configuration file) or path/url to a configuration file
Returns:
An instance of muninn.archive.Archive
list_archives
def list_archives()
Search locations found in the ‘MUNINN_CONFIG_PATH’ environment variable and return names of all found archives.
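As a rough illustration of the lookup described for open() and list_archives(), the sketch below scans a MUNINN_CONFIG_PATH-style string for configuration files. It assumes that entries are separated by os.pathsep, that each entry is either a directory or a path to a .cfg file, and that the archive name is the file name without the .cfg extension. These details are assumptions rather than documented behavior, and find_archive_names is a hypothetical helper that is not part of muninn.

```python
import os


def find_archive_names(config_path):
    """Return sorted archive names found in a MUNINN_CONFIG_PATH-style string.

    Assumption: entries are separated by os.pathsep and each entry is either
    a directory containing .cfg files or a path to a single .cfg file.
    """
    names = set()
    for entry in filter(None, config_path.split(os.pathsep)):
        if os.path.isdir(entry):
            candidates = [os.path.join(entry, name) for name in os.listdir(entry)]
        else:
            candidates = [entry]
        for candidate in candidates:
            base, ext = os.path.splitext(os.path.basename(candidate))
            # Only regular files with a .cfg extension count as archives.
            if ext == ".cfg" and os.path.isfile(candidate):
                names.add(base)
    return sorted(names)
```

It could be called as find_archive_names(os.environ.get("MUNINN_CONFIG_PATH", "")) to approximate what list_archives() reports.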
muninn.archive
Archive Objects
class Archive(object)
Archive class
The Archive class is used to represent and interact with Muninn archives. It provides functionality such as querying existing or ingesting new products. While at the core of the Muninn command-line tools, it can also be used directly.
It is typically instantiated and used as follows:
import muninn

with muninn.open(archive_name) as archive:
    product = archive.ingest(file_path)
Please see the Muninn documentation for details on how to configure a Muninn archive and how to set the environment variable that lets Muninn find its configuration file.
register_namespace
def register_namespace(namespace, schema)
Register a namespace. A valid namespace identifier starts with a lowercase character, and can contain any number of additional lowercase characters, underscores, or digits.
Arguments:
namespace
- Namespace identifier.
schema
- Schema definition of the namespace.
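The identifier rule stated for register_namespace can be expressed as a regular expression. This is a minimal sketch, assuming that "lowercase character" means an ASCII letter a-z; is_valid_namespace is a hypothetical helper, not a muninn function.

```python
import re

# Lowercase letter first, then lowercase letters, digits or underscores.
# Assumption: "lowercase character" means ASCII a-z.
_NAMESPACE_RE = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_namespace(name):
    """Return True if name matches the documented identifier rule."""
    return _NAMESPACE_RE.fullmatch(name) is not None
```

Under this reading, names such as core or my_ns2 are accepted, while 2ns, Core and _x are rejected.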
namespace_schema
def namespace_schema(namespace)
Return the schema definition of the specified namespace.
namespaces
def namespaces()
Return a list containing all registered namespaces.
register_product_type
def register_product_type(product_type, plugin)
Register a product type.
Arguments:
product_type
- Product type name
plugin
- Reference to an object that implements the product type plugin API and as such takes care of the details of extracting product properties from products of the specified product type.
product_type_plugin
def product_type_plugin(product_type)
Return a reference to the specified product type plugin.
Arguments:
product_type
- Product type name
product_types
def product_types()
Return a list of registered product types.
register_remote_backend
def register_remote_backend(remote_backend, plugin)
Register a remote backend.
Arguments:
remote_backend
- Remote backend name.
plugin
- Reference to an object that implements the remote backend plugin API and as such takes care of the details of extracting product properties from products of the specified remote backend.
remote_backend
def remote_backend(remote_backend)
Return a reference to the specified remote backend plugin.
Arguments:
remote_backend
- Remote backend name.
remote_backends
def remote_backends()
Return a list of supported remote backends.
register_hook_extension
def register_hook_extension(hook_extension, plugin)
Register a hook extension.
Arguments:
hook_extension
- Hook extension name
plugin
- Reference to an object that implements the hook extension plugin API
hook_extension
def hook_extension(hook_extension)
Return the hook extension with the specified name.
Arguments:
hook_extension
- Hook extension name
hook_extensions
def hook_extensions()
Return a list of supported hook extensions.
attach
def attach(paths, product_type=None, use_symlinks=None, verify_hash=False, verify_hash_before=False, use_current_path=False, force=False)
Add a product to the archive using an existing metadata record in the database.
This function acts as the inverse of a strip(). A metadata record for this product should already exist in the database and no product should exist for it in the archive.
The existing metadata record is found by performing a search based on product_type and physical_name.
Arguments:
paths
- List of paths pointing to product files.
product_type
- Product type of the product to ingest. If left unspecified (the default), an attempt will be made to determine the product type automatically.
use_symlinks
- If set to True, symbolic links to the original product will be stored in the archive instead of a copy of the original product. If set to None (the default), the value of the corresponding archive-wide configuration option will be used. This option is ignored if use_current_path=True.
verify_hash
- If set to True then, after the ingestion, the product in the archive will be matched against the hash from the metadata (only if the metadata contained a hash).
verify_hash_before
- If set to True then, before the product is attached to the archive, it will be matched against the metadata hash (if it exists).
use_current_path
- Ingest the product by keeping the file(s) at the current path (which must be inside the root directory of the archive). This option is ignored if ingest_product is False.
force
- If set to True, then skip the default size check between the product and the existing metadata.
Returns:
The attached product.
auth_file
def auth_file()
Return the path of the authentication file used when downloading from remote locations.
cleanup_derived_products
def cleanup_derived_products()
Clean up all derived products for which the source products no longer exist, as specified by the cascade rule configured in the respective product type plugins.
Please see the Muninn documentation for more information on how to configure cascade rules.
close
def close()
Close the archive immediately instead of when (and if) the archive instance is collected.
Using the archive after calling this function results in undefined behavior.
count
def count(where="", parameters={})
Return the number of products matching the specified search expression.
Arguments:
where
- Search expression.
parameters
- Parameters referenced in the search expression (if any).
create_properties
def create_properties(properties, disable_hooks=False)
Create a record for the given product (as defined by the provided dictionary of properties) in the product catalogue. This operation will fail if:
- The core.uuid is not unique within the product catalogue.
- The combination of core.archive_path and core.physical_name is not unique within the product catalogue.
Arguments:
properties
- The product properties.
disable_hooks
- Do not execute any hooks.
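The two uniqueness conditions listed for create_properties can be illustrated with an in-memory check. This is only a sketch: the catalogue itself presumably enforces these constraints, and both the flat dict layout (keys uuid, archive_path, physical_name standing in for the core namespace) and the violates_uniqueness helper are hypothetical.

```python
def violates_uniqueness(existing, new):
    """Return a reason string if `new` would clash with `existing`, else None.

    Records are plain dicts holding core properties; this layout is an
    illustrative stand-in for real product properties.
    """
    new_location = (new["archive_path"], new["physical_name"])
    for record in existing:
        if record["uuid"] == new["uuid"]:
            return "duplicate core.uuid"
        if (record["archive_path"], record["physical_name"]) == new_location:
            return "duplicate core.archive_path + core.physical_name"
    return None
```

A record with a fresh UUID and a location not yet in use passes both checks and yields None.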
delete_properties
def delete_properties(where="", parameters={})
Remove properties for one or more products from the catalogue.
This function will not remove any product files from storage and will not trigger any of the specific cascade rules.
Arguments:
where
- Search expression or one or more product uuid(s) or properties.
parameters
- Parameters referenced in the search expression.
Returns:
The number of updated products
derived_products
def derived_products(uuid)
Return the UUIDs of the products that are linked to the given product as derived products.
Arguments:
uuid
- Product UUID.
destroy
def destroy()
Completely remove the archive, including both the products and the product catalogue.
Using the archive after calling this function results in undefined behavior. The prepare() function can be used to bring the archive back into a useable state.
destroy_catalogue
def destroy_catalogue()
Completely remove the catalogue database, but leave the datastore in storage untouched.
Using the archive after calling this function results in undefined behavior. Calling prepare_catalogue() and then ingesting all products again can bring the archive back into a useable state.
export
def export(where="", parameters={}, target_path=os.path.curdir, format=None)
Export one or more products from the archive.
By default, a copy of the original product will be retrieved from the archive. This default behavior can be customized by the product type plugin. For example, the custom implementation for a certain product type might retrieve one or more derived products and bundle them together with the product itself.
Arguments:
where
- Search expression or one or more product uuid(s) or properties.
parameters
- Parameters referenced in the search expression.
target_path
- Directory in which the retrieved products will be stored.
format
- Format in which the products will be exported.
Returns:
Either a list containing the export paths for the exported products (when a search expression or multiple properties/UUIDs were passed), or a single export path.
export_formats
def export_formats()
Return a list of supported alternative export formats.
generate_uuid
@staticmethod
def generate_uuid()
Return a newly generated UUID that can be used as the UUID of a product metadata record.
identify
def identify(paths)
Determine the product type of the product (specified as a single path, or a list of paths if it is a multi-part product).
Arguments:
paths
- List of paths pointing to product files.
Returns:
The determined product type.
ingest
def ingest(paths, product_type=None, properties=None, ingest_product=True, use_symlinks=None, verify_hash=False, use_current_path=False, force=False)
Ingest a product into the archive. Multiple paths can be specified, but the set of files and/or directories these paths refer to is always ingested as a single logical product.
Product ingestion consists of two steps. First, product properties are extracted from the product and are used to create an entry for the product in the product catalogue. Second, the product itself is ingested, either by copying the product or by creating symbolic links to the product.
If the product to be ingested is already located at the target location within the archive (and there was not already another catalogue entry pointing to it), muninn will leave the product at its location as-is, and won’t try to copy/symlink it.
Arguments:
paths
- List of paths pointing to product files.
product_type
- Product type of the product to ingest. If left unspecified (the default), an attempt will be made to determine the product type automatically.
properties
- Used as product properties if specified. No properties will be extracted from the product in this case.
ingest_product
- If set to False, the product itself will not be ingested into the archive, only its properties. By default, the product will be ingested.
use_symlinks
- If set to True, symbolic links to the original product will be stored in the archive instead of a copy of the original product. If set to None (the default), the value of the corresponding archive-wide configuration option will be used. This option is ignored if use_current_path is True.
verify_hash
- If set to True then, after the ingestion, the product in the archive will be matched against the hash from the metadata (only if the metadata contained a hash).
use_current_path
- Ingest the product by keeping the file(s) at the current path (which must be inside the root directory of the archive). This option is ignored if ingest_product is False.
force
- If set to True then any existing product with the same type and name (unique constraint) will be removed before ingestion, including partially ingested products. NB. Depending on product type specific cascade rules, removing a product can result in one or more derived products being removed (or stripped) along with it.
Returns:
The ingested product.
link
def link(uuid, source_uuids)
Link a product to one or more source products.
Arguments:
uuid
- Product UUID.
source_uuids
- Source UUIDs.
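The relationship between link(), source_products() and derived_products() can be pictured with a toy in-memory model: linking a product to its sources makes that product a derived product of each source. The LinkIndex class is a hypothetical illustration and says nothing about how muninn stores links in its catalogue.

```python
from collections import defaultdict


class LinkIndex:
    """Toy in-memory model of product links (not muninn's implementation)."""

    def __init__(self):
        self._sources = defaultdict(set)  # product uuid -> source uuids

    def link(self, uuid, source_uuids):
        """Record that the product `uuid` was derived from `source_uuids`."""
        self._sources[uuid].update(source_uuids)

    def source_products(self, uuid):
        """UUIDs that `uuid` is linked to as sources."""
        return sorted(self._sources.get(uuid, ()))

    def derived_products(self, uuid):
        """UUIDs of products that list `uuid` among their sources."""
        return sorted(u for u, srcs in self._sources.items() if uuid in srcs)
```

In this model the two query methods are inverses of each other: if B appears in source_products(A), then A appears in derived_products(B).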
prepare
def prepare(force=False)
Prepare the archive for (first) use.
The root path will be created and the product catalogue will be initialized such that the archive is ready for use.
Arguments:
force
- If set to True then any existing products and / or product catalogue will be removed.
prepare_catalogue
def prepare_catalogue(dry_run=False)
Prepare the catalogue of the archive for (first) use.
Arguments:
dry_run
- Do not actually execute the preparation commands.
Returns:
The list of SQL commands that (would) have been executed by this function.
product_path
def product_path(uuid_or_properties)
Return the path in storage to the specified product.
Arguments:
uuid_or_properties
- UUID or dictionary of product properties.
pull
def pull(where="", parameters={}, verify_hash=False, verify_hash_download=False)
Pull one or more remote products into the archive.
Products should have a valid remote_url core metadata field and they should not yet exist in the local archive (i.e. the archive_path core metadata field should not be set).
Arguments:
where
- Search expression or one or more product uuid(s) or properties.
parameters
- Parameters referenced in the search expression (if any).
verify_hash
- If set to True then, after the pull, the product in the archive will be matched against the hash from the metadata (only if the metadata contained a hash).
verify_hash_download
- If set to True then, before the product is stored in the archive, the pulled product will be matched against the metadata hash (if it exists).
Returns:
The number of pulled products.
rebuild_properties
def rebuild_properties(uuid, disable_hooks=False, use_current_path=False)
Rebuild product properties by re-extracting these properties (using product type plugins) from the products stored in the archive. Only properties and tags that are returned by the product type plugin will be updated. Other properties or tags will remain as they were.
Arguments:
uuid
- Product UUID
disable_hooks
- Disable product type hooks (not meant for routine operation).
use_current_path
- Do not attempt to relocate the product to the location specified in the product type plugin. Useful for read-only archives.
rebuild_pull_properties
def rebuild_pull_properties(uuid, verify_hash=False, disable_hooks=False, use_current_path=False)
Refresh products by re-running the pull, but using the existing products stored in the archive.
Arguments:
uuid
- Product UUID
verify_hash
- If set to True then the product in the archive will be matched against the hash from the metadata (only if the metadata contained a hash).
disable_hooks
- Disable product type hooks (not meant for routine operation).
use_current_path
- Do not attempt to relocate the product to the location specified in the product type plugin. Useful for read-only archives.
remove
def remove(where="", parameters={}, force=False, cascade=True)
Remove one or more products from the archive, both from storage as well as from the product catalogue. Return the number of products removed.
NB. Depending on product type specific cascade rules, removing a product can result in one or more derived products being removed (or stripped) along with it. Such products are not included in the returned count.
Arguments:
where
- Search expression or one or more product uuid(s) or properties.
parameters
- Parameters referenced in the search expression.
force
- If set to True, also remove partially ingested products. This affects products for which a failure occurred during ingestion, as well as products in the process of being ingested. Use this option with care.
cascade
- Apply cascade rules to strip/remove dependent products.
Returns:
The number of removed products.
retrieve
def retrieve(where="", parameters={}, target_path=os.path.curdir, use_symlinks=False, verify_hash=False)
Retrieve one or more products from the archive.
Arguments:
where
- Search expression or one or more product uuid(s) or properties.parameters
- Parameters referenced in the search expression.target_path
- Directory under which the retrieved products will be stored.use_symlinks
- If set to True, products will be retrieved as symbolic links to the original products kept in the archive. If set to False, products will retrieved as copies of the original products. By default, products will be retrieved as copies.verify_hash
- If set to True then, after the retrieval, the product will be matched against the hash from the metadata (only if the metadata contained a hash).
Returns:
Either a list containing the target paths for the retrieved products (when a search expression or multiple properties/uuids were passed), or a single target path.
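Because retrieve() and export() are documented to return either a single path or a list of paths, calling code may want to normalize the result before iterating over it. The helper below is a small sketch of that idea; as_path_list is a hypothetical name and is not part of muninn.

```python
def as_path_list(result):
    """Normalize a retrieve()/export() style result to a list of paths."""
    if isinstance(result, (list, tuple)):
        return list(result)
    # A single path was returned; wrap it so callers can always iterate.
    return [result]
```

With this in place, a loop such as `for path in as_path_list(result): ...` works regardless of which form was returned.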
retrieve_properties
def retrieve_properties(uuid, namespaces=[], property_names=[])
Return properties for the specified product.
Arguments:
uuid
- Product UUID
namespaces
- List of namespaces of which the properties should be retrieved. By default, only properties defined in the core namespace will be retrieved.
root
def root()
Return the archive root path.
search
def search(where="", order_by=[], limit=None, parameters={}, namespaces=[], property_names=[])
Search the product catalogue for products matching the specified search expression.
Arguments:
where
- Search expression.
order_by
- A list of property names that determines the ordering of the results. If the list is empty, the order of the results is undetermined and can vary between calls to this function. Each property name in this list can be provided with a '+' or '-' prefix, or without a prefix. A '+' prefix, or no prefix, denotes ascending sort order; a '-' prefix denotes descending sort order.
limit
- Limit the maximum number of results to the specified number.
parameters
- Parameters referenced in the search expression.
namespaces
- List of namespaces of which the properties should be retrieved. By default, only properties defined in the core namespace will be retrieved.
property_names
- List of property names that should be returned. By default all properties of the core namespace and those of the namespaces in the namespaces argument are included. If this parameter is a non-empty list then only the referenced properties will be returned. Properties are specified as <namespace>.<identifier> (the namespace can be omitted for the core namespace). If the property_names parameter is provided then the namespaces parameter is ignored.
Returns:
A list of matching products.
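The conventions for order_by prefixes and for property_names entries can be made concrete with two small parsing helpers. This is a sketch of the documented conventions only; parse_order_by and split_property_name are hypothetical names, and muninn's own parsing may differ in detail.

```python
def parse_order_by(order_by):
    """Turn ['+a', '-b', 'c'] into [('a', True), ('b', False), ('c', True)].

    The boolean is True for ascending and False for descending order.
    """
    result = []
    for item in order_by:
        if item.startswith("-"):
            result.append((item[1:], False))
        elif item.startswith("+"):
            result.append((item[1:], True))
        else:
            result.append((item, True))
    return result


def split_property_name(name):
    """Split '<namespace>.<identifier>'; a bare identifier belongs to core."""
    namespace, sep, identifier = name.rpartition(".")
    return (namespace, identifier) if sep else ("core", identifier)
```

For example, an order_by entry of "-size" means descending by size, and a property name given without a namespace is treated as belonging to the core namespace.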
source_products
def source_products(uuid)
Return the UUIDs of the products that are linked to the given product as source products.
Arguments:
uuid
- Product UUID
strip
def strip(where="", parameters={}, force=False, cascade=True)
Remove one or more products from storage only (not from the product catalogue).
NB. Depending on product type specific cascade rules, stripping a product can result in one or more derived products being stripped (or removed) along with it.
Arguments:
where
- Search expression or one or more product uuid(s) or properties.
parameters
- Parameters referenced in the search expression.
force
- If set to True, also strip partially ingested products. This affects products for which a failure occurred during ingestion, as well as products in the process of being ingested. Use this option with care.
cascade
- Apply cascade rules to strip/purge dependent products.
Returns:
The number of stripped products.
summary
def summary(where="", parameters=None, aggregates=None, group_by=None, group_by_tag=False, having=None, order_by=None)
Return a summary of the products matching the specified search expression.
Arguments:
where
- Search expression.
parameters
- Parameters referenced in the search expression.
aggregates
- A list of property aggregates defined as <property_name>.<reduce_fn>. Properties need to be of type long, integer, real, text or timestamp. The reduce function can be min, max, sum, or avg. sum and avg are not possible for text and timestamp properties. A special property validity_duration (defined as validity_stop - validity_start) can also be used.
group_by
- A list of property names whose values are used for grouping the aggregation results. There will be a separate result row for each combination of group_by property values. Properties need to be of type long, integer, boolean, text or timestamp. Timestamps require a binning subscript which can be year, month, yearmonth, or date (e.g. validity_start.yearmonth).
group_by_tag
- If set to True, results will also be grouped by available tag values. Note that products will be counted multiple times if they have multiple tags.
having
- A list of property aggregates defined as <property_name>.<reduce_fn>; properties need to be of type long, integer, real, text or timestamp; the reduce function can be min, max, sum, or avg; sum and avg are not possible for text and timestamp properties; a special property validity_duration (defined as validity_stop - validity_start) can also be used.
order_by
- A list of result column names that determines the ordering of the results. If the list is empty, the results are ordered by the group_by specification. Each name in the list can have a + (ascending) or - (descending) prefix, or no prefix (ascending).
Returns:
A list of row tuples matching the search expression created from the arguments.
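The rules for aggregate specifications (the <property_name>.<reduce_fn> form, the allowed property types, and the restriction on sum and avg) can be checked with a small validator. This is a sketch under stated assumptions: parse_aggregate is a hypothetical helper, the property_types mapping stands in for whatever schema lookup the library performs, and the special validity_duration property is not modelled.

```python
REDUCE_FUNCTIONS = {"min", "max", "sum", "avg"}
AGGREGATE_TYPES = {"long", "integer", "real", "text", "timestamp"}
NO_SUM_AVG_TYPES = {"text", "timestamp"}


def parse_aggregate(spec, property_types):
    """Split '<property_name>.<reduce_fn>' and check it against the rules.

    property_types maps property names to type names; it is a hypothetical
    stand-in for the schema lookup a real implementation would do.
    """
    property_name, sep, reduce_fn = spec.rpartition(".")
    if not sep or reduce_fn not in REDUCE_FUNCTIONS:
        raise ValueError("invalid aggregate: %r" % spec)
    property_type = property_types.get(property_name)
    if property_type not in AGGREGATE_TYPES:
        raise ValueError("cannot aggregate property: %r" % property_name)
    if property_type in NO_SUM_AVG_TYPES and reduce_fn in {"sum", "avg"}:
        raise ValueError("%s is not possible for %s properties"
                         % (reduce_fn, property_type))
    return property_name, reduce_fn
```

Splitting on the last dot means a namespaced property such as mynamespace.value.max is read as property mynamespace.value with reduce function max.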
tag
def tag(where=None, tags=None, parameters={})
Set one or more tags on one or more product(s).
Arguments:
where
- Search expression or one or more product uuid(s) or properties.
tags
- One or more tags.
parameters
- Parameters referenced in the search expression.
tags
def tags(uuid)
Return the tags of a product.
Arguments:
uuid
- Product UUID.
unlink
def unlink(uuid, source_uuids=None)
Remove the link between a product and one or more of its source products.
Arguments:
uuid
- Product UUID
source_uuids
- Source product UUIDs
untag
def untag(where=None, tags=None, parameters={})
Remove one or more tags from one or more product(s).
Arguments:
where
- Search expression or one or more product uuid(s) or properties.
tags
- One or more tags (default: None, meaning all existing tags)
parameters
- Parameters referenced in the search expression.
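The semantics of tag(), untag() and tags(), including the tags=None default of untag(), can be modelled with a toy in-memory store. Note that this sketch takes a single product UUID where the real functions accept a search expression, UUIDs or properties; TagStore is a hypothetical class used only for illustration.

```python
class TagStore:
    """Toy in-memory tag store keyed by product UUID (illustration only)."""

    def __init__(self):
        self._tags = {}  # product uuid -> set of tags

    def tag(self, uuid, tags):
        """Add one or more tags to a product."""
        self._tags.setdefault(uuid, set()).update(tags)

    def untag(self, uuid, tags=None):
        """Remove the given tags, or all tags when tags is None."""
        if tags is None:
            self._tags.pop(uuid, None)
        else:
            self._tags.get(uuid, set()).difference_update(tags)

    def tags(self, uuid):
        """Return the tags of a product as a sorted list."""
        return sorted(self._tags.get(uuid, ()))
```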
update_properties
def update_properties(properties, uuid=None, create_namespaces=False)
Update product properties in the product catalogue. The UUID of the product to update will be taken from the core.uuid property if it is present in the specified properties. Otherwise, the UUID should be provided separately.
This function allows any property to be changed with the exception of the product UUID, and therefore needs to be used with care. The recommended way to update product properties is to first retrieve them using either retrieve_properties() or search(), change the properties, and then use this function to update the product catalogue.
Arguments:
properties
- Product properties
uuid
- UUID of the product to update. By default, the UUID will be taken from the core.uuid property.
create_namespaces
- Test if all namespaces are already defined for the product, and create them if needed.
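The rule for choosing which product update_properties() acts on can be sketched as a small function: prefer core.uuid from the properties, otherwise fall back to the separately supplied UUID. The nested-mapping layout of properties and the resolve_uuid helper are assumptions made for illustration, and the case where both sources are given but disagree is not modelled.

```python
def resolve_uuid(properties, uuid=None):
    """Pick the UUID of the product to update.

    Assumption: properties is a nested mapping of the form
    {"core": {"uuid": ...}, ...}; real property objects may differ.
    """
    core_uuid = properties.get("core", {}).get("uuid")
    if core_uuid is not None:
        return core_uuid
    if uuid is None:
        raise ValueError("no core.uuid in properties and no uuid argument given")
    return uuid
```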
verify_hash
def verify_hash(where="", parameters={})
Verify the hash for one or more products in the archive.
Products that are not active or are not in the archive will be skipped. If there is no hash available in the metadata for a product then an error will be raised.
Arguments:
where
- Search expression or one or more product uuid(s) or properties.
parameters
- Parameters referenced in the search expression (if any).
Returns:
A list of UUIDs of products for which the verification failed.
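The general idea of verifying a stored hash, and of reporting only the failures, can be sketched with hashlib. How muninn stores hashes and which algorithms it supports is not stated here, so the (path, algorithm, digest) layout and both helper names are hypothetical.

```python
import hashlib


def file_hash(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of the file at path, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_verifications(expected):
    """Return the keys of `expected` whose file no longer matches its digest.

    expected maps an identifier to (path, algorithm, hex digest); this layout
    is hypothetical and only mirrors the "list of failures" return value.
    """
    return [key for key, (path, algorithm, digest) in expected.items()
            if file_hash(path, algorithm) != digest]
```

An empty result then means that every checked file still matches its recorded digest.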