Ingestion

An Ingestion describes which Data Partitions are ingested and how they should be ingested. It includes parameters such as:

  • Ingestion Job - The implementation of the Loader as a Streaming or Batch Job

  • Schedules - A list of Schedules defined as cron expressions to be used as a catalyst for execution

  • Triggers - A list of Trigger functions to be used as a catalyst for execution

Definition

The code fragment below is an example of an Ingestion definition that ingests hypothetical Crypto data based on the following requirements:

I want to ingest data from a Crypto data provider using 2 schedules: 1 for high-frequency updates during the day and 1 to reconcile all of the data at end of day

I want to ingest data from the Crypto data provider as soon as there are changes to any of the 'BTC' data

//1.
.ingestion(Ingestion.batch("crypto-ingestion", "Crypto - Ingestion", CryptoBatchJob::new)
   .addSchedule( //2.
      newSchedule("intraday-hourly", "Intraday", "0 5 * ? * *", "UTC")
         .setParameter(newParameterValue("window", "1"))
   )
   .addSchedule( //3.
      newSchedule("eod-weekly", "End of Day", "0 5 0 ? * *", "UTC")
         .setParameter(newParameterValue("window", "24"))
   )
   .addTrigger( //4.
      newTrigger("trigger-btc", CryptoTrigger::new).partitionFilterByTag("btc")
   )
)

The above fragment defines an Ingestion with the following behaviours:

  1. Job Type - Creates a new Ingestion Job called "crypto-ingestion" which:

    • Uses the Batch Job Type behaviours

    • Supplies the CryptoBatchJob class containing the implementation required to perform the Batch Job

  2. Schedule - Adds the Intraday schedule, which invokes the CryptoBatchJob and:

    • Executes at 5 minutes past each hour in UTC (according to the cron expression)

    • Utilises a parameter called window with a defaultValue of 1

  3. Schedule - Adds the End of Day schedule, which invokes the CryptoBatchJob and:

    • Executes at 5 minutes past midnight every day in UTC

    • Utilises a parameter called window with a defaultValue of 24

  4. Trigger - Configures a Trigger that:

    • Utilises the CryptoTrigger function to determine when the CryptoBatchJob gets invoked

    • Applies a Partition Filter that restricts the eligible Data Partitions to those containing the tag btc
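
The Partition Filter in item 4 can be pictured as a simple tag match. The following is a minimal sketch only: the DataPartition record, its fields, and the filterByTag helper are hypothetical names introduced for illustration and are not part of the Ingestr API, whose own filtering may work differently.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class TagFilterSketch {

    // Hypothetical stand-in for a Data Partition (assumption, not the Ingestr type).
    record DataPartition(String id, Set<String> tags) {}

    // Keeps only the partitions that carry the given tag, preserving input order.
    static List<DataPartition> filterByTag(List<DataPartition> partitions, String tag) {
        return partitions.stream()
                .filter(p -> p.tags().contains(tag))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DataPartition> all = List.of(
                new DataPartition("btc-usd", Set.of("btc", "usd")),
                new DataPartition("eth-usd", Set.of("eth", "usd")),
                new DataPartition("btc-eur", Set.of("btc", "eur")));

        // Only the two partitions tagged "btc" remain eligible for the trigger.
        for (DataPartition p : filterByTag(all, "btc")) {
            System.out.println(p.id());
        }
    }
}
```

In this sketch a change to the "eth-usd" partition would never reach the trigger, because that partition is filtered out before the trigger function is consulted.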

Note: The window parameter has no special meaning to the Ingestr Framework, which simply passes the appropriate value to the implementing class. The meaning of the value '1' or '24' is decided entirely within the CryptoBatchJob, which must interpret it as appropriate; in this case as hours (1 hour vs 24 hours).
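
As an illustration of that interpretation, the sketch below shows one way a job could turn the raw window value into a look-back period in hours. It is a standalone example under stated assumptions: WindowSketch, windowToDuration and windowStart are hypothetical helpers, and how the Ingestr Framework actually hands the parameter to the CryptoBatchJob is not shown here.

```java
import java.time.Duration;
import java.time.Instant;

final class WindowSketch {

    // Interprets the raw "window" value as a whole number of hours (assumed convention).
    static Duration windowToDuration(String rawWindow) {
        long hours = Long.parseLong(rawWindow.trim());
        if (hours <= 0) {
            throw new IllegalArgumentException(
                    "window must be a positive number of hours: " + rawWindow);
        }
        return Duration.ofHours(hours);
    }

    // Returns the start of the look-back window that ends at the given instant.
    static Instant windowStart(Instant end, String rawWindow) {
        return end.minus(windowToDuration(rawWindow));
    }

    public static void main(String[] args) {
        // A fixed end time keeps the example deterministic.
        Instant end = Instant.parse("2024-01-02T00:05:00Z");
        System.out.println(windowStart(end, "1"));   // intraday: look back 1 hour
        System.out.println(windowStart(end, "24"));  // end of day: look back 24 hours
    }
}
```

Keeping the conversion in one small method means the unit (hours here) is stated in a single place, so the same job class can serve both schedules without knowing which one invoked it.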
