A Data Pipeline is a set of data processing steps that can run sequentially or in parallel.

On the Bristlecone NEO® Platform, Data Pipelines provide configurable workflows to ingest, process, transform, and store data in the data lake. A Data Pipeline can be configured to ingest and process data in batch/scheduled mode or in real time.

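As an illustration only, a pipeline of this kind can be modeled as an ordered chain of steps plus a trigger mode that selects batch/scheduled or real-time execution. The `PipelineConfig` structure and step functions below are hypothetical sketches, not the NEO Platform's actual configuration API:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical model of a Data Pipeline: an ordered chain of steps plus a
# trigger mode. Names here are illustrative, not the NEO Platform API.
@dataclass
class PipelineConfig:
    name: str
    mode: str                       # "batch" (scheduled) or "realtime"
    schedule: Optional[str] = None  # cron expression; used only in batch mode
    steps: List[Callable[[bytes], bytes]] = field(default_factory=list)

def run_pipeline(config: PipelineConfig, payload: bytes) -> bytes:
    """Apply each configured step to the payload, in order."""
    for step in config.steps:
        payload = step(payload)
    return payload

# A batch pipeline that runs every night at 01:00 and applies two steps.
nightly = PipelineConfig(
    name="sales-ingest",
    mode="batch",
    schedule="0 1 * * *",
    steps=[bytes.strip, bytes.upper],
)
print(run_pipeline(nightly, b"  raw sales feed  "))  # b'RAW SALES FEED'
```
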
A Data Pipeline can be configured to include any or all of the following steps (the code sketches after this list illustrate a few of them):

  • Data Ingestion: The process of ingesting data from the various source systems in an enterprise.
  • Data Pre-Processing: This stage of Data Pipeline Management includes pre-processing events such as:
    • Converting data in PDF files to text format
    • Uncompressing zipped files
    • Decrypting encrypted files
  • Data Quality Assessment: This step consists of the following:
    • File Validation: Validates a structured file against a set of configured rules
    • Dictionary Validation: Validates a structured file against a set of configured dictionaries
  • Data Processing: This step consists of Data Transformation and Custom Jobs (execution of Data Analytics Models).
  • Data Post-Processing: This step consists of the following events in Data Pipeline Management:
    • Converting data to meet external (third-party) requirements
    • Converting output data to .zip files
    • Data Ingestion
    • Custom Jobs (executing Data Analytics Models)
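
To make the pre-processing events more concrete, here is a minimal Python sketch of two of them: uncompressing a zipped file with the standard-library `zipfile` module and decrypting a file with `cryptography.Fernet`. The library choices are illustrative assumptions; the platform's own pre-processing implementation is not documented here:

```python
import zipfile
from pathlib import Path
from typing import List

from cryptography.fernet import Fernet  # pip install cryptography

def uncompress(zip_path: str, dest_dir: str) -> List[str]:
    """Extract every member of a zip archive into dest_dir."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()

def decrypt_file(encrypted_path: str, key: bytes) -> bytes:
    """Decrypt a Fernet-encrypted file and return the plaintext bytes."""
    token = Path(encrypted_path).read_bytes()
    return Fernet(key).decrypt(token)
```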

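Similarly, the File Validation step checks a structured file against a set of configured rules. The sketch below validates a CSV file against two hypothetical rules, required columns and a non-empty value check; the rule schema and column names are assumptions for illustration, not the platform's actual rule format:

```python
import csv
from typing import List

# Hypothetical rule configuration; the platform's real rule schema may differ.
RULES = {
    "required_columns": ["order_id", "amount", "currency"],
    "non_empty": ["order_id"],
}

def validate_csv(path: str, rules: dict) -> List[str]:
    """Return a list of rule violations found in the file (empty if valid)."""
    errors: List[str] = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        missing = set(rules["required_columns"]) - set(reader.fieldnames or [])
        if missing:
            errors.append(f"missing columns: {sorted(missing)}")
            return errors
        for line_no, row in enumerate(reader, start=2):
            for column in rules["non_empty"]:
                if not (row[column] or "").strip():
                    errors.append(f"line {line_no}: empty value in '{column}'")
    return errors
```
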
This feature enables the Provider Data Engineer to:

  • Create a Data Pipeline
  • Perform Custom Jobs

The Data Pipeline Management user interface is divided into two major sections:

  • Pipeline Management
  • Pipeline Summary