Concepts
A core concept of the Ingestr Framework is its opinionated view of how data should be organised and described.
The framework defines the following core entities, related as described below:
Collection - A grouping of 1 or more related Descriptors
Descriptor - A template describing the required attributes and behaviours of similar Data Sets
Data Set - A uniquely identifiable set of data, conforming to its Descriptor, that will be targeted for processing
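The relationships above can be modelled as plain data structures. This is a minimal sketch assuming a Python representation; the class and field names (Descriptor, DataSet, Collection, key_names) are hypothetical and are not the Ingestr Framework's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative model only: these class and field names are hypothetical,
# not the Ingestr Framework's actual API.


@dataclass
class Descriptor:
    name: str
    key_names: tuple[str, ...]  # Keys that every conforming Data Set must define


@dataclass
class DataSet:
    descriptor: Descriptor   # described by exactly one Descriptor
    keys: dict[str, object]  # concrete Key values, e.g. {"Customer ID": 190}


@dataclass
class Collection:
    name: str
    descriptors: list[Descriptor]

    def __post_init__(self) -> None:
        # A Collection groups 1 or more related Descriptors
        if not self.descriptors:
            raise ValueError("a Collection needs at least one Descriptor")


customers = Descriptor("customers", ("Customer ID",))
crm = Collection("crm", [customers])
customer_190 = DataSet(customers, {"Customer ID": 190})
```

Each Data Set points at exactly one Descriptor, while a Collection only groups Descriptors; the Key values themselves live on the Data Set.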
Descriptor
The Descriptor is a template for how Data Sets must be defined, and it specifies the behaviours that are executed when their data is processed.
The Descriptor defines which fields a Data Set must have, along with the shared behaviours, but the actual values of these fields are set on each Data Set.
For example, a Descriptor may enforce that all related Data Sets contain a Key called Customer ID, while a particular Data Set holds the value Customer ID = 190.
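The split between required fields (on the Descriptor) and concrete values (on the Data Set) can be sketched as a validation step run when a Data Set is created. This assumes a Python representation; required_keys and validate are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Descriptor:
    name: str
    required_keys: frozenset[str]  # hypothetical: Key names every Data Set must supply

    def validate(self, keys: dict[str, object]) -> None:
        """Raise if a Data Set's Key values do not satisfy this Descriptor."""
        missing = self.required_keys.difference(keys)
        if missing:
            raise ValueError(f"missing required keys: {sorted(missing)}")


@dataclass
class DataSet:
    descriptor: Descriptor
    keys: dict[str, object]  # the actual values, defined per Data Set

    def __post_init__(self) -> None:
        # Enforce the Descriptor's field requirements as soon as the Data Set exists
        self.descriptor.validate(self.keys)


customer_data = Descriptor("customer_data", frozenset({"Customer ID"}))
customer_190 = DataSet(customer_data, {"Customer ID": 190})  # satisfies the Descriptor
```

A Data Set that omits Customer ID would raise a ValueError at construction time, which mirrors the Descriptor "enforcing" the Key while leaving its value to the Data Set.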
Data Set
A Data Set is a body of data that shares the same structure (schema) and business context. Data Sets have the following attributes:
Uniquely identifiable by 1 or more Keys (e.g. a Primary Key, or even the name of a Database Table)
Is described by exactly 1 Descriptor (see above)
Is targeted by 1 or more Ingestions (see below)
Contains an Offset (see below) that is updated as each Ingestion executes
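Because a Data Set is identified by its Keys, every Ingestion that targets the same Keys should resolve to the same Data Set and therefore share a single Offset. The registry below is a hypothetical sketch of that idea: DataSetRegistry and get_or_create are illustrative names, and scoping identity by Descriptor name as well as Key values is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DataSet:
    descriptor_name: str
    keys: dict[str, Any]          # Key values; assumed hashable
    offset: Optional[Any] = None  # advanced as Ingestions execute

    @property
    def identity(self) -> tuple:
        # Assumption: identity is the Descriptor name plus the Key values,
        # sorted by Key name so that insertion order does not matter.
        return (self.descriptor_name, tuple(sorted(self.keys.items())))


class DataSetRegistry:
    """Hypothetical lookup so every Ingestion targeting the same Keys shares one Data Set."""

    def __init__(self) -> None:
        self._data_sets: dict[tuple, DataSet] = {}

    def get_or_create(self, descriptor_name: str, keys: dict[str, Any]) -> DataSet:
        candidate = DataSet(descriptor_name, dict(keys))
        # setdefault returns the already-registered Data Set if this identity exists
        return self._data_sets.setdefault(candidate.identity, candidate)


registry = DataSetRegistry()
first_ingestion_target = registry.get_or_create("customer_data", {"Customer ID": 190})
second_ingestion_target = registry.get_or_create("customer_data", {"Customer ID": 190})
```

Both lookups return the same object, so an Offset update made by one Ingestion is visible to the other.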
Offset
The Offset represents an arbitrary point within a Data Partition that marks how far data processing has progressed. An Offset is:
Identifiable by 1 or more Offset Keys (e.g. a Timestamp)
Adjusted only if the new value is newer than the previous Offset (i.e. it can only move forward, never backwards)
Typically the Offset is a Timestamp such as "Last Modified" or "Updated At", but any set of values can be used as the Offset Keys.
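The forward-only rule can be sketched as a compare-then-update step. This is a minimal illustration, not the framework's implementation: Offset and advance are hypothetical names, and it assumes every Offset Key value is orderable and consistently typed. Comparing tuples lexicographically lets several Offset Keys act as one composite Offset.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class Offset:
    # Names of the Offset Keys, in comparison order, e.g. ("Updated At",).
    keys: tuple[str, ...]
    # Current position; None until the first Ingestion has run.
    value: Optional[tuple[Any, ...]] = None

    def advance(self, candidate: tuple[Any, ...]) -> bool:
        """Move to candidate only if it is strictly newer; return True if the Offset moved."""
        if len(candidate) != len(self.keys):
            raise ValueError("candidate must provide one value per Offset Key")
        # Tuples compare lexicographically, so multiple Offset Keys form one composite Offset.
        if self.value is None or candidate > self.value:
            self.value = candidate
            return True
        return False  # equal or older: the Offset never moves backwards


updated_at = Offset(keys=("Updated At",))
jan_1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
jan_2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
updated_at.advance((jan_1,))  # moves: first value
updated_at.advance((jan_2,))  # moves: newer
updated_at.advance((jan_1,))  # ignored: older than the current Offset
```

Adding a tie-breaking Offset Key, for example ("Updated At", "Customer ID"), would let records that share a timestamp still advance the Offset, under the same orderability assumption.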