Datastores

Warning

This article covers internal technical details of datastores. The implementation of a datastore may change at any time without warning.

Artifacts

An Artifact is a pickled and gzipped Python object and is written to <datastore-root>/<flow>/artifacts/<hexdigest>.gz. The <hexdigest> used to identify the Artifact is the SHA256 hexdigest of the underlying pickled Python object.

Archives

When an artifact is written in laminar, it is written in two parts. Once as an Archive and once as an Artifact. laminar uses content addressable storage to automatically deduplicate artifacts with the same value across multiple executions.

The Archive schema is

artifacts:
  - hexdigest: str

and is written to <datastore-root>/<flow>/archives/<execution>/<layer>/<index>/<artifact>.json

Because each Layer is assigned a different index, multiple archives can exist for a single Artifact. Archives can also be linked to one or more artifacts, and each Artifact is referenced via a SHA256 hexdigest that makes up the name of each stored Artifact.

Layer.shard() creates one archive with multiple linked artifacts. layers.ForEach creates multiple archives with one or many linked artifacts.

When the a flow’s datastore reads an Archive it knows to create an Accessor if it has more than one Artifact hexdigest. A Layer also knows when multiple archives exist for an Artifact and will create an Accessor across all of them.

Cache

The Datastore also maintains a cache under <datastore-root>/<flow>/.cache. The files in the cache are intended to speed up slow operations.

Archives

When an Artifact with multiple archives is read, it will speed up future accesses by creating a combined Archive at <datastore-root>/<flow>/.cache/<execution>/<layer>/<artifact>.json

Record

After each Layer is executed, it leaves behind a Record. Records detail the execution metadata of a Layer.

The Record schema is:

flow:
  name: str
layer:
  name: str
execution:
  splits: int

and is written to <datastore-root>/<flow>/.cache/<execution>/<layer>/.record.json.