Data Pipeline Management
A Data Pipeline is a set of data processing steps that can be run sequentially or in parallel.
On the Bristlecone NEO® Platform, Data Pipelines provide configurable workflows to ingest, process, transform, and store data in the data lake. A Data Pipeline can be configured to ingest and process data in batch/scheduled mode or in real time.
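To illustrate the batch versus real-time distinction, the hypothetical configurations below sketch how the two modes might be expressed. The field names (`mode`, `schedule`, `source`, `steps`) are assumptions made for illustration only and do not reflect the NEO Platform's actual configuration schema.

```python
# Hypothetical pipeline configurations -- illustrative only; the field names
# and values are assumptions, not the NEO Platform's actual schema.

# Batch/scheduled mode: the pipeline runs on a fixed schedule and processes
# whatever has accumulated at the source since the last run.
batch_pipeline = {
    "name": "supplier-invoices-batch",
    "mode": "batch",
    "schedule": "0 2 * * *",  # run daily at 02:00 (cron syntax)
    "source": {"type": "sftp", "path": "/incoming/invoices"},
    "steps": ["ingest", "pre_process", "quality_check", "transform", "post_process"],
}

# Real-time mode: the pipeline processes records as they arrive,
# for example from a message stream, instead of on a schedule.
realtime_pipeline = {
    "name": "shipment-events-stream",
    "mode": "real_time",
    "source": {"type": "stream", "topic": "shipment-events"},
    "steps": ["ingest", "quality_check", "transform"],
}
```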
A Data Pipeline can be configured to include any or all of the following steps:
- Data Ingestion: The process of ingesting data from various source systems in an enterprise.
- Data Pre-Processing: This stage of Data Pipeline Management includes data pre-processing events such as the following (a sketch of these events appears after this list):
  - Converting data present in PDF files to text format
  - Uncompressing zipped files
  - Decrypting encrypted files
- Data Quality Assessment: This step consists of the following (a validation sketch appears after this list):
  - File Validation: Validates a structured file against a set of configured rules
  - Dictionary Validation: Validates a structured file against a set of configured dictionaries
- Data Processing: This step consists of Data Transformation and Custom Jobs (execution of Data Analytics Models)
- Data Post-Processing: This step consists of the following events in Data Pipeline Management:
  - Converting data as per external (third-party) requirements
  - Converting output data to .zip files
  - Data Ingestion
  - Custom Jobs (executing Data Analytics Models)
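The following is a minimal sketch of the pre-processing events listed above, written in plain Python with commonly used libraries: pypdf for PDF text extraction, the standard-library zipfile module, and cryptography's Fernet for decryption. The NEO Platform performs these steps through its own configured pipeline stages, so these libraries and function names are stand-ins for illustration, not the platform's implementation.

```python
import zipfile
from pathlib import Path

from pypdf import PdfReader              # stands in for whatever PDF tooling the platform uses
from cryptography.fernet import Fernet   # symmetric decryption, used here purely as an example


def pdf_to_text(pdf_path: Path) -> str:
    """Convert the data present in a PDF file to plain text."""
    reader = PdfReader(str(pdf_path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def uncompress_zip(zip_path: Path, target_dir: Path) -> list[Path]:
    """Uncompress a zipped file into target_dir and return the extracted paths."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return [target_dir / name for name in archive.namelist()]


def decrypt_file(encrypted_path: Path, key: bytes, output_path: Path) -> Path:
    """Decrypt an encrypted file (Fernet-encrypted in this example) to output_path."""
    token = encrypted_path.read_bytes()
    output_path.write_bytes(Fernet(key).decrypt(token))
    return output_path
```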
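The Data Quality Assessment step validates a structured file against configured rules and dictionaries. The sketch below shows one generic way such checks might look for a CSV file; the rule format, dictionary format, and function name are assumptions for illustration and do not represent the platform's validation engine.

```python
import csv
from pathlib import Path

# Example configured rules: required columns and simple per-column checks.
# The rule format here is an assumption made for illustration.
RULES = {
    "required_columns": ["order_id", "quantity", "country"],
    "numeric_columns": ["quantity"],
}

# Example configured dictionary: allowed values per column.
DICTIONARY = {
    "country": {"US", "DE", "IN", "SG"},
}


def validate_file(csv_path: Path) -> list[str]:
    """File Validation: check a structured (CSV) file against the configured rules."""
    errors: list[str] = []
    with csv_path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        missing = [c for c in RULES["required_columns"] if c not in (reader.fieldnames or [])]
        if missing:
            return [f"missing required columns: {missing}"]
        for line_no, row in enumerate(reader, start=2):
            # Rule check: a simple non-negative integer test stands in for richer type rules.
            for column in RULES["numeric_columns"]:
                if not row[column].strip().isdigit():
                    errors.append(f"line {line_no}: {column!r} is not numeric: {row[column]!r}")
            # Dictionary Validation: values must appear in the configured dictionary.
            for column, allowed in DICTIONARY.items():
                if row[column] not in allowed:
                    errors.append(f"line {line_no}: {column!r} value {row[column]!r} not in dictionary")
    return errors
```

An empty list of errors means the file passed both checks; in a real pipeline the result would typically determine whether the file proceeds to the Data Processing step.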
This feature enables the Provider Data Engineer to:
- Create a Data Pipeline
- Perform Custom Jobs
Data Pipeline Management in the user interface is divided into two major sections:
- Pipeline Management
- Pipeline Summary