Data Organization offers a structured approach to storing and cataloging data in a data lake. Without it, a data lake can degrade into a data swamp over time.

Data Lake Zones

Within a data lake, zones provide logical and physical separation of data, which keeps the environment secure, organized, and agile. Depending on its stage of processing, each file on the Bristlecone NEO® Platform resides in one of the following zones.

Raw Zone: A file is in the Raw zone when it is ingested from a customer system or external data source into the Bristlecone NEO® Data Lake. Every file begins in the Raw zone.

Staged Zone: During data processing in a Data Pipeline, intermediate results are stored in the bucket called "Staged Zone".

Processed Zone: A file that has undergone all data processing steps configured in the data pipeline, and is therefore assumed ready for downstream applications to consume, is stored in the bucket called "Processed Zone".
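To make the zone ordering concrete, here is a minimal sketch that models a file's progression from Raw through Staged to Processed. The `Zone`, `DataLakeFile`, and `move_to` names and the `<zone>/<file name>` storage layout are hypothetical illustrations, not part of the Bristlecone NEO® Platform API. The sketch also simplifies by treating a file as moving between zones, whereas a pipeline typically writes new outputs into the later zones.

```python
from enum import Enum


class Zone(Enum):
    """Hypothetical model of the three zones, ordered by processing stage."""
    RAW = 1        # file as ingested from a customer system or external source
    STAGED = 2     # intermediate results written during a data pipeline run
    PROCESSED = 3  # output of all configured steps, ready for downstream use


class DataLakeFile:
    """Illustrative file tracker; names and layout are assumptions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.zone = Zone.RAW  # every file begins in the Raw zone

    def move_to(self, target: Zone) -> None:
        # Zones are ordered by processing stage, so a file only moves forward.
        # Skipping Staged (e.g. a single-step pipeline) is allowed here.
        if target.value <= self.zone.value:
            raise ValueError(
                f"cannot move {self.name} from {self.zone.name} to {target.name}"
            )
        self.zone = target

    def storage_path(self) -> str:
        # Hypothetical layout: one bucket/prefix per zone.
        return f"{self.zone.name.lower()}/{self.name}"


if __name__ == "__main__":
    f = DataLakeFile("orders.csv")
    print(f.storage_path())  # raw/orders.csv
    f.move_to(Zone.STAGED)
    f.move_to(Zone.PROCESSED)
    print(f.storage_path())  # processed/orders.csv
```

Rejecting backward moves reflects the idea that each zone holds data at a later processing stage than the one before it; whether the actual platform enforces such a rule is not stated here.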

Data Organization Process Flow

Data Organization Workflow

The Data Organization dashboard is shown below.