This feature enables a Provider Data Scientist, Business Analyst, or Data Engineer to create, add, attach, or view a Data Dictionary in the Bristlecone NEO® Platform.

A Data Dictionary is a set of metadata used to validate the data ingested into the Data Lake as a step in the pipeline.

The Data Dictionary is also used when exporting data to a relational database.

A Data Dictionary entry must be provided for every column.

The following validation rules are supported by the Data Dictionary feature in the Bristlecone NEO® Platform:

  • DATA_TYPE: Data type of the column
  • MAX_LENGTH: Maximum length allowed for string values
  • MAX_PRECISION: Maximum precision allowed for double values
  • MAX_SCALE: Maximum scale allowed for double values
  • MAX_VAL: Maximum allowed value
  • MIN_LENGTH: Minimum length allowed for string values
  • MIN_PRECISION: Minimum precision allowed for double values
  • MIN_SCALE: Minimum scale allowed for double values
  • MIN_VAL: Minimum allowed value
  • NULLABLE: Indicates whether nulls are allowed in the column. Set it to true if a null check is required
  • UNIQUE: Set it to true if the column should only contain unique values
  • VALUESET: The set of all values allowed in the column
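
To illustrate how rules of this kind can be applied to a column, the following is a minimal Python sketch. It is not the platform's implementation: the function name `validate_column` and the shape of the `rules` dictionary are hypothetical, and the sketch assumes that NULLABLE set to false means nulls are rejected. DATA_TYPE, precision, and scale checks are omitted for brevity.

```python
from typing import Any, Dict, List


def validate_column(values: List[Any], rules: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations for one column (hypothetical helper)."""
    errors: List[str] = []
    non_null = [v for v in values if v is not None]

    # NULLABLE: assumed here to mean "nulls are allowed" when true.
    if not rules.get("NULLABLE", True) and len(non_null) < len(values):
        errors.append("NULLABLE: null value found")

    # UNIQUE: every non-null value may appear only once.
    if rules.get("UNIQUE") and len(set(non_null)) != len(non_null):
        errors.append("UNIQUE: duplicate values found")

    for v in non_null:
        # MIN_LENGTH / MAX_LENGTH apply to string values only.
        if isinstance(v, str):
            if "MAX_LENGTH" in rules and len(v) > rules["MAX_LENGTH"]:
                errors.append(f"MAX_LENGTH: {v!r} is too long")
            if "MIN_LENGTH" in rules and len(v) < rules["MIN_LENGTH"]:
                errors.append(f"MIN_LENGTH: {v!r} is too short")

        # MIN_VAL / MAX_VAL are applied to numeric values only in this sketch.
        if isinstance(v, (int, float)) and not isinstance(v, bool):
            if "MAX_VAL" in rules and v > rules["MAX_VAL"]:
                errors.append(f"MAX_VAL: {v!r} is too large")
            if "MIN_VAL" in rules and v < rules["MIN_VAL"]:
                errors.append(f"MIN_VAL: {v!r} is too small")

        # VALUESET: the value must be one of the listed values.
        if "VALUESET" in rules and v not in rules["VALUESET"]:
            errors.append(f"VALUESET: {v!r} is not an allowed value")

    return errors
```

An empty list means that the column passed every rule present in `rules`. Rules that are absent from the dictionary are simply skipped.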

The Bristlecone NEO® platform supports two ways of creating a Data Dictionary.

  • Creating a Data Dictionary using the user interface

Refer to the user interface and create a schema using the following fields:

    • COLUMN_NAME: A mandatory field that names the row
    • COLUMN_DESCRIPTION: An optional field that describes the row
    • DATA_TYPE: A mandatory field that describes the data type of the row
    • DATA_FORMAT: It describes the data format of the row
    • LENGTH: A mandatory field when the data type of a row is string, character, or integer
    • PRECISION: A mandatory field when the data type of a row is either float or decimal
    • SCALE: A mandatory field when the data type of a row is either float or decimal
    • NULLABLE: An optional field which signifies if the row can have null values
    • SEQUENCE: An optional field which describes the hierarchy of a row within the schema file
    • PRIMARYKEY: An optional field that signifies a row as a primary key in the schema file
    • UNIQUEKEY: An optional field that signifies whether the row is duplicated in the specific schema file
    • ALLOWEDVALUESET: An optional field that denotes the type of values that can be entered in the specific row

Example: If the row has currency defined in EUR/USD as the Allowed Value Set, then the row is expected to have numeric values that correspond to currency values.

    • TAGS: An optional field that indicates the relation of the Column Name with other rows present in the schema file
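
The mandatory and conditionally mandatory fields described above can be checked with a short Python sketch like the one below. It is an illustration only: the helper name `missing_fields`, the dictionary representation of an entry, and the exact data type spellings (`string`, `character`, `integer`, `float`, `decimal`) are assumptions, not part of the platform.

```python
from typing import Any, Dict, List

# Fields that are always mandatory, per the field list above.
MANDATORY = ("COLUMN_NAME", "DATA_TYPE")
# Assumed spellings of the data types that trigger conditional fields.
NEEDS_LENGTH = {"string", "character", "integer"}
NEEDS_PRECISION_SCALE = {"float", "decimal"}


def _is_blank(value: Any) -> bool:
    """Treat None and empty or whitespace-only text as missing."""
    return value is None or str(value).strip() == ""


def missing_fields(entry: Dict[str, Any]) -> List[str]:
    """Return the required fields that are absent or blank in one entry."""
    required = list(MANDATORY)
    data_type = str(entry.get("DATA_TYPE") or "").strip().lower()

    if data_type in NEEDS_LENGTH:
        required.append("LENGTH")
    elif data_type in NEEDS_PRECISION_SCALE:
        required += ["PRECISION", "SCALE"]

    return [name for name in required if _is_blank(entry.get(name))]
```

For example, an entry with a `decimal` data type and a PRECISION but no SCALE would be reported as missing `SCALE`, while optional fields such as COLUMN_DESCRIPTION or TAGS are never reported.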
  • Creating a Data Dictionary using a schema file

The schema file should contain all the mandatory parameters listed below. A sample schema file in .csv format is depicted below.