Concepts

The Ingestr Framework is opinionated about how data should be organised and described.

It defines a small set of core entities, which relate to each other as described below:

Core Entities and relationships in the Ingestr Framework
  • Collection - A grouping of 1 or more related Descriptors

  • Descriptor - A template describing the required attributes and behaviours of similar Data Sets

  • Data Set - Represents a uniquely identifiable set of data that will be targeted and that conforms to the Descriptor

Conceptual View of Ingestr Entities and Relationships

Descriptor

The Descriptor defines how a future Data Set must be structured, and which behaviours are executed when its data is processed.

The Descriptor defines what fields a Data Set should have, and the shared behaviours, but the actual values for these fields are defined on the Data Set.

For example, a Descriptor enforces that all related Data Sets contain a Key called Customer ID, while an individual Data Set holds the value Customer ID = 190.
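The split between a Descriptor (which Keys must exist) and a Data Set (what values those Keys hold) can be sketched as follows. This is only an illustration of the concept, not the Ingestr API: the class names, the `required_keys` field, and the `missing_keys` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Template: names the Keys every conforming Data Set must supply (hypothetical model)."""
    name: str
    required_keys: frozenset

    def missing_keys(self, keys: dict) -> set:
        # Keys the Descriptor requires but the Data Set did not supply.
        return {k for k in self.required_keys if k not in keys}


@dataclass
class DataSet:
    """Concrete instance: holds the actual values for the Descriptor's Keys (hypothetical model)."""
    descriptor: Descriptor
    keys: dict

    def __post_init__(self) -> None:
        # Reject a Data Set that does not conform to its Descriptor.
        missing = self.descriptor.missing_keys(self.keys)
        if missing:
            raise ValueError(f"missing required keys: {sorted(missing)}")


# The Descriptor only says that "Customer ID" must exist ...
customer_descriptor = Descriptor("customer", frozenset({"Customer ID"}))
# ... while the Data Set supplies the concrete value.
customer_190 = DataSet(customer_descriptor, {"Customer ID": 190})
```

In this sketch, constructing a `DataSet` without a `Customer ID` entry raises a `ValueError`, which mirrors the idea that the Descriptor enforces the shape while the Data Set carries the values.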

Data Set

A Data Set is a collection of data that shares the same structure (schema) and business context. Data Sets have the following attributes:

  • Uniquely identifiable by 1 or more Keys (e.g. a Primary Key, or even the name of a Database Table)

  • Is described by exactly 1 Descriptor (see above)

  • Is targeted by 1 or more Ingestions (see below)

  • Contains an Offset (see below) which adjusts according to the execution of each Ingestion
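As a rough sketch, the attributes listed above could be modelled as a single record. The class name `DataSetRecord`, its field names, and the example values (`sql_table`, `nightly_load`, and so on) are assumptions made for illustration and are not taken from the Ingestr Framework itself.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DataSetRecord:
    """Hypothetical model of the Data Set attributes described above."""
    descriptor_name: str                      # exactly one Descriptor
    keys: dict                                # one or more identifying Keys
    ingestions: list = field(default_factory=list)  # names of Ingestions targeting this Data Set
    offset: Optional[Any] = None              # None until an Ingestion has run

    def __post_init__(self) -> None:
        if not self.keys:
            raise ValueError("a Data Set needs at least one Key")

    @property
    def identity(self) -> tuple:
        # Sort by Key name so identity does not depend on insertion order.
        return tuple(sorted(self.keys.items()))


# Here the Keys are a database name and a table name.
orders = DataSetRecord(
    "sql_table",
    {"database": "sales", "table": "orders"},
    ingestions=["nightly_load"],
)
same_orders = DataSetRecord("sql_table", {"table": "orders", "database": "sales"})
```

Two records with the same Keys produce the same `identity`, regardless of the order in which the Keys were supplied, which is one simple way to express "uniquely identifiable by 1 or more Keys".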

Offset

The Offset marks an arbitrary point within a Data Partition up to which data has been processed:

  • Identifiable by 1 or more Offset Keys (e.g. a Timestamp)

  • The Offset is adjusted only if the new Offset is newer than the previous one (i.e. it can only move forward, never backwards)

Typically, the Offset is a Timestamp such as "Last Modified" or "Updated At", but any set of values can be used as the Offset Key.
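The forward-only rule can be illustrated with a small helper. This is a minimal sketch that assumes an Offset is represented as a tuple of Offset Key values compared in order; the function name `advance_offset` and that representation are hypothetical, not part of the framework.

```python
from datetime import datetime, timezone
from typing import Optional


def advance_offset(current: Optional[tuple], candidate: tuple) -> tuple:
    """Return the Offset to keep after an Ingestion run (hypothetical helper).

    Tuples compare element by element, so the first Offset Key is the
    most significant and later Keys break ties.
    """
    if current is None or candidate > current:
        return candidate  # first run, or the candidate moves the Offset forward
    return current        # never move the Offset backwards


jan_1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
jan_2 = datetime(2024, 1, 2, tzinfo=timezone.utc)

offset = advance_offset(None, (jan_1,))    # first run sets the Offset
offset = advance_offset(offset, (jan_2,))  # a newer Timestamp moves it forward
offset = advance_offset(offset, (jan_1,))  # an older Timestamp is ignored
```

After these three calls, `offset` still holds the `jan_2` value. A composite Offset such as `(timestamp, row_id)` works the same way under this assumption, with `row_id` breaking ties between equal timestamps.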
