Data Dictionary

This feature enables a Provider Data Scientist, Business Analyst, or Data Engineer to create, add, attach, or view a Data Dictionary in the Bristlecone NEO® Platform.

A Data Dictionary is a set of metadata used to validate the data ingested into the Data Lake as a step in the pipeline.

The Data Dictionary is also used when exporting data to a relational database.

A Data Dictionary entry is mandatory for every column.

The Data Dictionary feature in the Bristlecone NEO® Platform supports the following validation rules:

DATA_TYPE: Data type of the column
MAX_LENGTH: Maximum length allowed for string values
MAX_PRECISION: Maximum precision allowed for double values
MAX_SCALE: Maximum scale allowed for double values
MAX_VAL: Maximum allowed value
MIN_LENGTH: Minimum length allowed for string values
MIN_PRECISION: Minimum precision allowed for double values
MIN_SCALE: Minimum scale allowed for double values
MIN_VAL: Minimum allowed value
NULLABLE: Whether nulls are allowed in the column; set it to true if a null check is required
UNIQUE: Set it to true if the column should only contain unique values
VALUESET: The set of values allowed in the column
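
As an illustration of how rules of this kind can be applied to a column, here is a minimal Python sketch. It is not the platform's implementation. The function names validate_column and precision_and_scale, the dictionary layout of the rules, and the reading of NULLABLE as "nulls are permitted when true" are assumptions made for this example only.

```python
from decimal import Decimal


def precision_and_scale(value):
    """Return (total digits, digits after the decimal point) for a finite number."""
    _sign, digits, exponent = Decimal(str(value)).as_tuple()
    scale = -exponent if exponent < 0 else 0
    return max(len(digits), scale), scale


def validate_column(values, rules):
    """Return a list of (rule name, offending value) pairs for one column.

    Column-level violations (NULLABLE, UNIQUE) report None as the value.
    The rule semantics are assumptions, not the platform's documented behavior.
    """
    errors = []
    present = [v for v in values if v is not None]

    # Assumption: NULLABLE = True means nulls are permitted.
    if not rules.get("NULLABLE", True) and len(present) < len(values):
        errors.append(("NULLABLE", None))
    if rules.get("UNIQUE", False) and len(set(present)) < len(present):
        errors.append(("UNIQUE", None))

    for v in present:
        if "VALUESET" in rules and v not in rules["VALUESET"]:
            errors.append(("VALUESET", v))
        if isinstance(v, str):
            # Length rules apply to string values.
            if len(v) > rules.get("MAX_LENGTH", len(v)):
                errors.append(("MAX_LENGTH", v))
            if len(v) < rules.get("MIN_LENGTH", 0):
                errors.append(("MIN_LENGTH", v))
        else:
            # Range, precision, and scale rules apply to numeric values.
            if "MAX_VAL" in rules and v > rules["MAX_VAL"]:
                errors.append(("MAX_VAL", v))
            if "MIN_VAL" in rules and v < rules["MIN_VAL"]:
                errors.append(("MIN_VAL", v))
            precision, scale = precision_and_scale(v)
            if precision > rules.get("MAX_PRECISION", precision):
                errors.append(("MAX_PRECISION", v))
            if scale > rules.get("MAX_SCALE", scale):
                errors.append(("MAX_SCALE", v))
    return errors
```

For example, calling validate_column([12.5, 3.25], {"MAX_SCALE": 1}) reports only 3.25, because it has two digits after the decimal point.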

The Bristlecone NEO® Platform supports two ways of creating a Data Dictionary.

Creating a Dictionary using the User Interface

Refer to the user interface and create a schema using the following variables:

COLUMN_NAME: A mandatory name for the row
COLUMN_DESCRIPTION: An optional field that describes the row
DATA_TYPE: A mandatory field that describes the data type of the row
DATA_FORMAT: A field that describes the data format of the row
LENGTH: A mandatory field when the data type of a row is string, character, or integer
PRECISION: A mandatory field when the data type of a row is either float or decimal
SCALE: A mandatory field when the data type of a row is either float or decimal
NULLABLE: An optional field which signifies if the row can have null values
SEQUENCE: An optional field which describes the hierarchy of a row within the schema file
PRIMARYKEY: An optional field that signifies a row as a primary key in the schema file
UNIQUEKEY: An optional field which signifies whether the row is duplicated in the specific schema file
ALLOWEDVALUESET: An optional field that denotes the type of values that can be entered in the specific row

Example: If the row has currency defined in EUR/USD as the Allowed Value Set, then the row is expected to have numeric values that correspond to currency values.

TAGS: An optional field that indicates the relation of the Column Name with other rows present in the schema file
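
The mandatory and conditional fields listed above can be checked before a schema is submitted. The following Python sketch shows one way to do that. The function name missing_fields, the representation of a schema row as a dictionary, and the lowercase data type spellings ("string", "character", "integer", "float", "decimal") are assumptions for illustration. The platform may accept different literals.

```python
# Data type spellings below are assumed; the platform may use other literals.
LENGTH_TYPES = {"string", "character", "integer"}
PRECISION_TYPES = {"float", "decimal"}


def is_blank(value):
    """True when a field is absent or contains only whitespace."""
    return value is None or str(value).strip() == ""


def missing_fields(row):
    """Return the mandatory field names that are absent or blank in one schema row."""
    required = ["COLUMN_NAME", "DATA_TYPE"]
    data_type = str(row.get("DATA_TYPE") or "").strip().lower()
    if data_type in LENGTH_TYPES:
        # LENGTH is mandatory for string, character, and integer rows.
        required.append("LENGTH")
    elif data_type in PRECISION_TYPES:
        # PRECISION and SCALE are mandatory for float and decimal rows.
        required += ["PRECISION", "SCALE"]
    return [name for name in required if is_blank(row.get(name))]
```

A row such as {"COLUMN_NAME": "amount", "DATA_TYPE": "decimal", "PRECISION": 10} would be reported as missing SCALE, while optional fields such as TAGS or COLUMN_DESCRIPTION are never reported.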

Creating a Dictionary using a schema file

The schema file should contain all the mandatory parameters listed below. A sample schema file in .csv format is depicted below.