Data Dictionary
This feature enables a Provider Data Scientist, Business Analyst, or Data Engineer to create, add, attach, or view a Data Dictionary in the Bristlecone NEO® Platform.
A Data Dictionary is a set of metadata used to validate the data ingested into the Data Lake as a step in the pipeline.
The Data Dictionary is also used when exporting data to a relational database.
A Data Dictionary entry is mandatory for every column.
The Data Dictionary feature in the Bristlecone NEO® Platform supports the following validation rules:
- DATA_TYPE: Data type of the column
- MAX_LENGTH: Maximum length allowed for string values
- MAX_PRECISION: Maximum precision allowed for double values
- MAX_SCALE: Maximum scale allowed for double values
- MAX_VAL: Maximum allowed value
- MIN_LENGTH: Minimum length allowed for string values
- MIN_PRECISION: Minimum precision allowed for double values
- MIN_SCALE: Minimum scale allowed for double values
- MIN_VAL: Minimum allowed value
- NULLABLE: Whether nulls are allowed in the column; set it to true if a null check is required
- UNIQUE: Set it to true if the column should only contain unique values
- VALUESET: The set of all values allowed in the column
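As an illustration of how rules like these might be applied to a single column, here is a minimal Python sketch. It is not the platform's implementation: the `validate_column` and `_check_bounds` helpers, the dictionary layout of the rules, and the reading of NULLABLE as "nulls are allowed when true" are all assumptions made for this example. Precision is taken as the total number of digits and scale as the number of digits after the decimal point.

```python
from decimal import Decimal


def _check_bounds(i, name, measured, rules, violations):
    """Record MAX_<name> / MIN_<name> violations for one measured quantity."""
    if f"MAX_{name}" in rules and measured > rules[f"MAX_{name}"]:
        violations.append((i, f"MAX_{name}"))
    if f"MIN_{name}" in rules and measured < rules[f"MIN_{name}"]:
        violations.append((i, f"MIN_{name}"))


def validate_column(values, rules):
    """Check one column against a dict of rules (hypothetical helper).

    Returns a list of (row_index, rule_name) tuples, one per violation.
    """
    violations = []
    seen = set()
    for i, value in enumerate(values):
        if value is None:
            # Assumption: NULLABLE=True (the default here) means nulls are allowed.
            if not rules.get("NULLABLE", True):
                violations.append((i, "NULLABLE"))
            continue
        if isinstance(value, str):
            _check_bounds(i, "LENGTH", len(value), rules, violations)
        else:
            # Finite numeric values only; precision and scale come from
            # the decimal representation of the number.
            _check_bounds(i, "VAL", value, rules, violations)
            _, digits, exponent = Decimal(str(value)).as_tuple()
            scale = max(0, -exponent)
            precision = max(len(digits), scale)
            _check_bounds(i, "PRECISION", precision, rules, violations)
            _check_bounds(i, "SCALE", scale, rules, violations)
        if "VALUESET" in rules and value not in rules["VALUESET"]:
            violations.append((i, "VALUESET"))
        if rules.get("UNIQUE") and value in seen:
            violations.append((i, "UNIQUE"))
        seen.add(value)
    return violations


if __name__ == "__main__":
    currency_rules = {
        "DATA_TYPE": "string",
        "MIN_LENGTH": 3,
        "MAX_LENGTH": 3,
        "NULLABLE": False,
        "VALUESET": {"EUR", "USD"},
    }
    for row, rule in validate_column(["EUR", None, "euro"], currency_rules):
        print(f"row {row}: violates {rule}")
```

Returning (row index, rule name) pairs rather than stopping at the first failure makes it possible to report every problem in a column in one pass.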
The Bristlecone NEO® Platform supports two ways of creating a Data Dictionary.
- Creating a Data Dictionary using the user interface
Refer to the user interface and create a schema using the following fields:
- COLUMN_NAME: A mandatory field that names the row
- COLUMN_DESCRIPTION: An optional field that describes the row
- DATA_TYPE: A mandatory field that describes the data type of the row
- DATA_FORMAT: A field that describes the data format of the row
- LENGTH: A mandatory field when the data type of a row is string, character, or integer
- PRECISION: A mandatory field when the data type of a row is either float or decimal
- SCALE: A mandatory field when the data type of a row is either float or decimal
- NULLABLE: An optional field that signifies whether the row can have null values
- SEQUENCE: An optional field that describes the hierarchy of a row within the schema file
- PRIMARYKEY: An optional field that signifies a row as a primary key in the schema file
- UNIQUEKEY: An optional field that signifies whether the row is duplicated in the specific schema file
- ALLOWEDVALUESET: An optional field that denotes the type of values that can be entered in the specific row
Example: If the row has currency defined in EUR/USD as the Allowed Value Set, then the row is expected to have numeric values that correspond to currency values.
- TAGS: An optional field that indicates the relation of the Column Name with other rows present in the schema file
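The mandatory and conditionally mandatory fields described above can be checked mechanically before a schema is submitted. The following Python sketch shows one way to do that. It is only an illustration under stated assumptions: the `missing_fields` helper, the representation of a schema entry as a plain dictionary, and the lowercase data type names are hypothetical and are not part of the platform.

```python
# Data types for which LENGTH or PRECISION/SCALE become mandatory,
# following the field descriptions above. The lowercase spellings are assumed.
LENGTH_TYPES = {"string", "character", "integer"}
PRECISION_SCALE_TYPES = {"float", "decimal"}


def _is_blank(value):
    """Treat a missing value or an empty/whitespace-only string as blank."""
    return value is None or str(value).strip() == ""


def missing_fields(entry):
    """Return the mandatory field names that are absent or blank in one entry."""
    required = ["COLUMN_NAME", "DATA_TYPE"]
    data_type = str(entry.get("DATA_TYPE") or "").strip().lower()
    if data_type in LENGTH_TYPES:
        required.append("LENGTH")
    if data_type in PRECISION_SCALE_TYPES:
        required += ["PRECISION", "SCALE"]
    return [name for name in required if _is_blank(entry.get(name))]


if __name__ == "__main__":
    # Hypothetical entry: a decimal column defined without a SCALE.
    entry = {"COLUMN_NAME": "amount", "DATA_TYPE": "decimal", "PRECISION": 10}
    print(missing_fields(entry))
```

Optional fields such as COLUMN_DESCRIPTION, NULLABLE, or TAGS are deliberately ignored by this check, since an entry is still valid without them.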
- Creating a Data Dictionary using a schema file
The schema file should contain all the mandatory parameters listed below. A sample schema file in .csv format is depicted below.