Data Pipeline Management
A Data Pipeline is a set of data processing steps that can be run sequentially or in parallel.
On the Bristlecone NEO® Platform, Data Pipelines provide configurable workflows to ingest, process, transform, and store data in the data lake. A Data Pipeline can be configured to ingest and process data in batch/scheduled mode or in real time.
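To illustrate the batch versus real-time distinction, the hypothetical configurations below sketch how the two modes might be expressed. The field names (`mode`, `schedule`, `source`, `steps`) are assumptions made for illustration only and do not reflect the NEO Platform's actual configuration schema.

```python
# Hypothetical pipeline configurations -- illustrative only; the field names
# and values are assumptions, not the NEO Platform's actual schema.

# Batch/scheduled mode: the pipeline runs on a fixed schedule and processes
# whatever has accumulated at the source since the last run.
batch_pipeline = {
    "name": "supplier-invoices-batch",
    "mode": "batch",
    "schedule": "0 2 * * *",  # run daily at 02:00 (cron syntax)
    "source": {"type": "sftp", "path": "/incoming/invoices"},
    "steps": ["ingest", "pre_process", "quality_check", "transform", "post_process"],
}

# Real-time mode: the pipeline processes records as they arrive,
# for example from a message stream, instead of on a schedule.
realtime_pipeline = {
    "name": "shipment-events-stream",
    "mode": "real_time",
    "source": {"type": "stream", "topic": "shipment-events"},
    "steps": ["ingest", "quality_check", "transform"],
}
```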
A Data Pipeline can be configured to include any or all of the following steps:
- Data Ingestion: The process of ingesting data from various source systems in an enterprise.
- Data Pre-Processing: This stage of Data Pipeline Management includes data pre-processing events such as the following (a sketch of these events appears after this list):
  - Converting data present in PDF files to text format
  - Uncompressing zipped files
  - Decrypting encrypted files
- Data Quality Assessment: This step consists of the following (a validation sketch appears after this list):
  - File Validation: Validates a structured file against a set of configured rules
  - Dictionary Validation: Validates a structured file against a set of configured dictionaries
- Data Processing: This step consists of Data Transformation and Custom Jobs (execution of Data Analytics Models)
- Data Post-Processing: This step consists of the following events in Data Pipeline Management:
  - Converting data as per external (third-party) requirements
  - Converting output data to .zip files
  - Data Ingestion
  - Custom Jobs (executing Data Analytics Models)
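The following is a minimal sketch of the pre-processing events listed above, written in plain Python with commonly used libraries: pypdf for PDF text extraction, the standard-library zipfile module, and cryptography's Fernet for decryption. The NEO Platform performs these steps through its own configured pipeline stages, so these libraries and function names are stand-ins for illustration, not the platform's implementation.

```python
import zipfile
from pathlib import Path

from pypdf import PdfReader              # stands in for whatever PDF tooling the platform uses
from cryptography.fernet import Fernet   # symmetric decryption, used here purely as an example


def pdf_to_text(pdf_path: Path) -> str:
    """Convert the data present in a PDF file to plain text."""
    reader = PdfReader(str(pdf_path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def uncompress_zip(zip_path: Path, target_dir: Path) -> list[Path]:
    """Uncompress a zipped file into target_dir and return the extracted paths."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return [target_dir / name for name in archive.namelist()]


def decrypt_file(encrypted_path: Path, key: bytes, output_path: Path) -> Path:
    """Decrypt an encrypted file (Fernet-encrypted in this example) to output_path."""
    token = encrypted_path.read_bytes()
    output_path.write_bytes(Fernet(key).decrypt(token))
    return output_path
```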
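The Data Quality Assessment step validates a structured file against configured rules and dictionaries. The sketch below shows one generic way such checks might look for a CSV file; the rule format, dictionary format, and function name are assumptions for illustration and do not represent the platform's validation engine.

```python
import csv
from pathlib import Path

# Example configured rules: required columns and simple per-column checks.
# The rule format here is an assumption made for illustration.
RULES = {
    "required_columns": ["order_id", "quantity", "country"],
    "numeric_columns": ["quantity"],
}

# Example configured dictionary: allowed values per column.
DICTIONARY = {
    "country": {"US", "DE", "IN", "SG"},
}


def validate_file(csv_path: Path) -> list[str]:
    """File Validation: check a structured (CSV) file against the configured rules."""
    errors: list[str] = []
    with csv_path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        missing = [c for c in RULES["required_columns"] if c not in (reader.fieldnames or [])]
        if missing:
            return [f"missing required columns: {missing}"]
        for line_no, row in enumerate(reader, start=2):
            # Rule check: a simple non-negative integer test stands in for richer type rules.
            for column in RULES["numeric_columns"]:
                if not row[column].strip().isdigit():
                    errors.append(f"line {line_no}: {column!r} is not numeric: {row[column]!r}")
            # Dictionary Validation: values must appear in the configured dictionary.
            for column, allowed in DICTIONARY.items():
                if row[column] not in allowed:
                    errors.append(f"line {line_no}: {column!r} value {row[column]!r} not in dictionary")
    return errors
```

An empty list of errors means the file passed both checks; in a real pipeline the result would typically determine whether the file proceeds to the Data Processing step.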
This feature enables the Provider Data Engineer to:
- Create a Data Pipeline
- Perform Custom Jobs
Data Pipeline Management in the user interface is divided into two major sections:
- Pipeline Management
- Pipeline Summary