Data contracts

“APIs for data”

What’s a data contract?

…as an “API for data” it provides interfaces to the properties of a dataset

  • Fundamentals
  • Schema
  • Data quality
  • Support & communication channels
  • Pricing
  • Team
  • Roles
  • Service-level agreement
  • Infrastructures & servers
  • Custom & other properties
    • Data protection

Why have an API for data?


having such an interface helps a data practice organise knowledge
  • data consumers <~> data producers
  • governance function <~> data catalogue
  • interested parties <~> interested parties
data contracts “shift-left” focus on data protection and quality control
governance is understood early in data product development (safety first principle)

What is “shift left”

A Quality Assurance practice, in which protection and quality have an earlier focus in the development of data products (influenced by “Test Driven Development”)

comparison of approaches to quality

comparison of approaches to quality

Data contracts shift protection and quality focus closer to source

How to draw up a contract

  1. Propose a new/changed dataset
    1. Get the schema
      1. Get a contract spec
        1. Apply schema to contract
          1. Add protection rules
            1. Add quality rules
              1. Consult!

1/7 Propose a new/changed dataset🎄

A festive example: LettersToSanta

[
  {
    "DATE": "2024-12-24",
    "NAME": "Ebeneezer Scrooge",
    "UPRN": "6733852",
    "ADDRESS": "TOWER HAMLETS TOWN HALL, 160, WHITECHAPEL ROAD, LONDON, TOWER HAMLETS, E1 1BJ",
    "D.O.B": "1812-01-01",
    "INTERESTS": ["The paranormal", "Festivities", "Society"],
    "MERITS": ["Helping Tiny Tim with his studies", "Organising parties"],
    "DEMERITS": ["None this year"],
    "LETTER": "Dear Santa, Would you recommend a gift Tiny Tim for Christmas, I'm a bit rusty in the gift department but was thinking about a sleigh and wondered if you might recommend one? Many thanks, Ebeneezer."
  }
]

2/7 Get the schema

CREATE TABLE "letters_to_santa" (
   [DATE] TEXT,
   [NAME] TEXT,
   [UPRN] INTEGER,
   [ADDRESS] TEXT,
   [D.O.B] TEXT,
   [INTERESTS/0] TEXT,
   [INTERESTS/1] TEXT,
   [INTERESTS/2] TEXT,
   [MERITS/0] TEXT,
   [DEMERITS/0] TEXT,
   [LETTER] TEXT
);

3/7 Get a contract spec

title: Open Data Contract Standard (ODCS)
description: An open data contract specification to establish agreement between
  data producers and consumers.
type: object
properties:
  version:
    type: string
    description: Current version of the data contract.
  kind:
    type: string
    default: DataContract
    description: The kind of file this is. Valid value is `DataContract`.
    enum:
      - DataContract
  apiVersion:
    type: string
    default: v3.0.0
    description:
      Version of the standard used to build data contract. Default value
      is v3.0.0.
    enum:
      - v3.0.0
      - v2.2.2
      - v2.2.1
      - v2.2.0
  id:
    type: string
    description: A unique identifier used to reduce the risk of dataset name
      collisions, such as a UUID.
  name:
    type: string
    description: Name of the data contract.
  tenant:
    type: string
    description:
      Indicates the property the data is primarily associated with. Value
      is case insensitive.
  tags:
    $ref: "#/$defs/Tags"
  status:
    type: string
    description: Current status of the dataset. Valid values are `production`,
      `test`, or `development`.
    examples:
      - production
      - test
      - development
  servers:
    type: array
    description: List of servers where the datasets reside.
    items:
      $ref: "#/$defs/Server"
  dataProduct:
    type: string
    description: The name of the data product.
  description:
    type: object
    description: High level description of the dataset.
    properties:
      usage:
        type: string
        description: Intended usage of the dataset.
      purpose:
        type: string
        description: Purpose of the dataset.
      limitations:
        type: string
        description: Limitations of the dataset.
  domain:
    type: string
    description: Name of the logical data domain.
    examples:
      - imdb_ds_aggregate
      - receiver_profile_out
      - transaction_profile_out
  schema:
    type: array
    description: A list of elements within the schema to be cataloged.
    items:
      $ref: "#/$defs/SchemaObject"
  support:
    $ref: "#/$defs/Support"
  price:
    $ref: "#/$defs/Pricing"
  team:
    type: array
    items:
      $ref: "#/$defs/Team"
  roles:
    type: array
    description: A list of roles that will provide user access to the dataset.
    items:
      $ref: "#/$defs/Role"
  slaDefaultElement:
    type: string
    description: Element (using the element path notation) to do the checks on.
  slaProperties:
    type: array
    description:
      A list of key/value pairs for SLA specific properties. There is no
      limit on the type of properties (more details to come).
    items:
      $ref: "#/$defs/ServiceLevelAgreementProperty"
  customProperties:
    $ref: "#/$defs/CustomProperties"
  contractCreatedTs:
    type: string
    format: date-time
    description: Timestamp in UTC of when the data contract was created.
required:
  - version
  - apiVersion
  - kind
  - id
  - status
additionalProperties: false
$defs:
  Server:
    type: object
    description: Data source details of where data is physically stored.
    properties:
      server:
        type: string
        description: Identifier of the server.
      type:
        type: string
        description: Type of the server.
        enum:
          - api
          - athena
          - azure
          - bigquery
          - clickhouse
          - databricks
          - denodo
          - dremio
          - duckdb
          - glue
          - cloudsql
          - db2
          - informix
          - kafka
          - kinesis
          - local
          - mysql
          - oracle
          - postgresql
          - postgres
          - presto
          - pubsub
          - redshift
          - s3
          - sftp
          - snowflake
          - sqlserver
          - synapse
          - trino
          - vertica
          - custom
      description:
        type: string
        description: Description of the server.
      environment:
        type: string
        description: Environment of the server.
        examples:
          - prod
          - preprod
          - dev
          - uat
      roles:
        type: array
        description: List of roles that have access to the server.
        items:
          $ref: "#/$defs/Role"
      customProperties:
        $ref: "#/$defs/CustomProperties"
    allOf:
      - if:
          properties:
            type:
              const: api
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/ApiServer"
      - if:
          properties:
            type:
              const: athena
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/AthenaServer"
      - if:
          properties:
            type:
              const: azure
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/AzureServer"
      - if:
          properties:
            type:
              const: bigquery
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/BigQueryServer"
      - if:
          properties:
            type:
              const: clickhouse
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/ClickHouseServer"
      - if:
          properties:
            type:
              const: databricks
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/DatabricksServer"
      - if:
          properties:
            type:
              const: denodo
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/DenodoServer"
      - if:
          properties:
            type:
              const: dremio
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/DremioServer"
      - if:
          properties:
            type:
              const: duckdb
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/DuckdbServer"
      - if:
          properties:
            type:
              const: glue
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/GlueServer"
      - if:
          properties:
            type:
              const: cloudsql
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/GoogleCloudSqlServer"
      - if:
          properties:
            type:
              const: db2
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/IBMDB2Server"
      - if:
          properties:
            type:
              const: informix
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/InformixServer"
      - if:
          properties:
            type:
              const: custom
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/CustomServer"
      - if:
          properties:
            type:
              const: kafka
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/KafkaServer"
      - if:
          properties:
            type:
              const: kinesis
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/KinesisServer"
      - if:
          properties:
            type:
              const: local
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/LocalServer"
      - if:
          properties:
            type:
              const: mysql
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/MySqlServer"
      - if:
          properties:
            type:
              const: oracle
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/OracleServer"
      - if:
          properties:
            type:
              const: postgresql
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/PostgresServer"
      - if:
          properties:
            type:
              const: postgres
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/PostgresServer"
      - if:
          properties:
            type:
              const: presto
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/PrestoServer"
      - if:
          properties:
            type:
              const: pubsub
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/PubSubServer"
      - if:
          properties:
            type:
              const: redshift
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/RedshiftServer"
      - if:
          properties:
            type:
              const: s3
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/S3Server"
      - if:
          properties:
            type:
              const: sftp
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/SftpServer"
      - if:
          properties:
            type:
              const: snowflake
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/SnowflakeServer"
      - if:
          properties:
            type:
              const: sqlserver
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/SqlserverServer"
      - if:
          properties:
            type:
              const: synapse
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/SynapseServer"
      - if:
          properties:
            type:
              const: trino
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/TrinoServer"
      - if:
          properties:
            type:
              const: vertica
          required:
            - type
        then:
          $ref: "#/$defs/ServerSource/VerticaServer"
    required:
      - server
      - type
  ServerSource:
    ApiServer:
      type: object
      title: AthenaServer
      properties:
        location:
          type: string
          format: uri
          description: The url to the API.
          examples:
            - https://api.example.com/v1
      required:
        - location
    AthenaServer:
      type: object
      title: AthenaServer
      properties:
        stagingDir:
          type: string
          format: uri
          description:
            Amazon Athena automatically stores query results and metadata
            information for each query that runs in a query result location that
            you can specify in Amazon S3.
          examples:
            - s3://my_storage_account_name/my_container/path
        schema:
          type: string
          description: Identify the schema in the data source in which your tables exist.
        catalog:
          type: string
          description: Identify the name of the Data Source, also referred to as a Catalog.
          default: awsdatacatalog
        regionName:
          type: string
          description: The region your AWS account uses.
          examples:
            - eu-west-1
      required:
        - staging_dir
        - schema
    AzureServer:
      type: object
      title: AzureServer
      properties:
        location:
          type: string
          format: uri
          description:
            Fully qualified path to Azure Blob Storage or Azure Data Lake
            Storage (ADLS), supports globs.
          examples:
            - az://my_storage_account_name.blob.core.windows.net/my_container/path/*.parquet
            - abfss://my_storage_account_name.dfs.core.windows.net/my_container_name/path/*.parquet
        format:
          type: string
          enum:
            - parquet
            - delta
            - json
            - csv
          description: File format.
        delimiter:
          type: string
          enum:
            - new_line
            - array
          description:
            Only for format = json. How multiple json documents are delimited
            within one file
      required:
        - location
        - format
    BigQueryServer:
      type: object
      title: BigQueryServer
      properties:
        project:
          type: string
          description: The GCP project name.
        dataset:
          type: string
          description: The GCP dataset name.
      required:
        - project
        - dataset
    ClickHouseServer:
      type: object
      title: ClickHouseServer
      properties:
        host:
          type: string
          description: The host of the ClickHouse server.
        port:
          type: integer
          description: The port to the ClickHouse server.
        database:
          type: string
          description: The name of the database.
      required:
        - host
        - port
        - database
    DatabricksServer:
      type: object
      title: DatabricksServer
      properties:
        host:
          type: string
          description: The Databricks host
          examples:
            - dbc-abcdefgh-1234.cloud.databricks.com
        catalog:
          type: string
          description: The name of the Hive or Unity catalog
        schema:
          type: string
          description: The schema name in the catalog
      required:
        - catalog
        - schema
    DenodoServer:
      type: object
      title: DenodoServer
      properties:
        host:
          type: string
          description: The host of the Denodo server.
        port:
          type: integer
          description: The port of the Denodo server.
        database:
          type: string
          description: The name of the database.
      required:
        - host
        - port
    DremioServer:
      type: object
      title: DremioServer
      properties:
        host:
          type: string
          description: The host of the Dremio server.
        port:
          type: integer
          description: The port of the Dremio server.
        schema:
          type: string
          description: The name of the schema.
      required:
        - host
        - port
    DuckdbServer:
      type: object
      title: DuckdbServer
      properties:
        database:
          type: string
          description: Path to duckdb database file.
        schema:
          type: integer
          description: The name of the schema.
      required:
        - database
    GlueServer:
      type: object
      title: GlueServer
      properties:
        account:
          type: string
          description: The AWS Glue account
          examples:
            - 1234-5678-9012
        database:
          type: string
          description: The AWS Glue database name
          examples:
            - my_database
        location:
          type: string
          format: uri
          description: The AWS S3 path. Must be in the form of a URL.
          examples:
            - s3://datacontract-example-orders-latest/data/{model}
        format:
          type: string
          description: The format of the files
          examples:
            - parquet
            - csv
            - json
            - delta
      required:
        - account
        - database
    GoogleCloudSqlServer:
      type: object
      title: GoogleCloudSqlServer
      properties:
        host:
          type: string
          description: The host of the Google Cloud Sql server.
        port:
          type: integer
          description: The port of the Google Cloud Sql server.
        database:
          type: string
          description: The name of the database.
        schema:
          type: string
          description: The name of the schema.
      required:
        - host
        - port
        - database
        - schema
    IBMDB2Server:
      type: object
      title: IBMDB2Server
      properties:
        host:
          type: string
          description: The host of the IBM DB2 server.
        port:
          type: integer
          description: The port of the IBM DB2 server.
        database:
          type: string
          description: The name of the database.
        schema:
          type: string
          description: The name of the schema.
      required:
        - host
        - port
        - database
    InformixServer:
      type: object
      title: InformixServer
      properties:
        host:
          type: string
          description: "The host to the Informix server. "
        port:
          type: integer
          description: The port to the Informix server. Defaults to 9088.
        database:
          type: string
          description: The name of the database.
      required:
        - host
        - database
    CustomServer:
      type: object
      title: CustomServer
      properties:
        account:
          type: string
          description: Account used by the server.
        catalog:
          type: string
          description: Name of the catalog.
        database:
          type: string
          description: Name of the database.
        dataset:
          type: string
          description: Name of the dataset.
        delimiter:
          type: string
          description: Delimiter.
        endpointUrl:
          type: string
          description: Server endpoint.
          format: uri
        format:
          type: string
          description: File format.
        host:
          type: string
          description: Host name or IP address.
        location:
          type: string
          description: A URL to a location.
          format: uri
        path:
          type: string
          description: Relative or absolute path to the data file(s).
        port:
          type: integer
          description: Port to the server. No default value is assumed for custom servers.
        project:
          type: string
          description: Project name.
        region:
          type: string
          description: Cloud region.
        regionName:
          type: string
          description: Region name.
        schema:
          type: string
          description: Name of the schema.
        serviceName:
          type: string
          description: Name of the service.
        stagingDir:
          type: string
          description: Staging directory.
        warehouse:
          type: string
          description: Name of the cluster or warehouse.
    KafkaServer:
      type: object
      title: KafkaServer
      description: Kafka Server
      properties:
        host:
          type: string
          description: The bootstrap server of the kafka cluster.
        format:
          type: string
          description: The format of the messages.
          examples:
            - json
            - avro
            - protobuf
            - xml
          default: json
      required:
        - host
    KinesisServer:
      type: object
      title: KinesisDataStreamsServer
      description: Kinesis Data Streams Server
      properties:
        region:
          type: string
          description: AWS region.
          examples:
            - eu-west-1
        format:
          type: string
          description: The format of the record
          examples:
            - json
            - avro
            - protobuf
    LocalServer:
      type: object
      title: LocalServer
      properties:
        path:
          type: string
          description: The relative or absolute path to the data file(s).
          examples:
            - ./folder/data.parquet
            - ./folder/*.parquet
        format:
          type: string
          description: The format of the file(s)
          examples:
            - json
            - parquet
            - delta
            - csv
      required:
        - path
        - format
    MySqlServer:
      type: object
      title: MySqlServer
      properties:
        host:
          type: string
          description: The host of the MySql server.
        port:
          type: integer
          description: The port of the MySql server.
        database:
          type: string
          description: The name of the database.
      required:
        - host
        - port
        - database
    OracleServer:
      type: object
      title: OracleServer
      properties:
        host:
          type: string
          description: The host to the oracle server
          examples:
            - localhost
        port:
          type: integer
          description: The port to the oracle server.
          examples:
            - 1523
        serviceName:
          type: string
          description: The name of the service.
          examples:
            - service
      required:
        - host
        - port
        - serviceName
    PostgresServer:
      type: object
      title: PostgresServer
      properties:
        host:
          type: string
          description: The host to the Postgres server
        port:
          type: integer
          description: The port to the Postgres server.
        database:
          type: string
          description: The name of the database.
        schema:
          type: string
          description: The name of the schema in the database.
      required:
        - host
        - port
        - database
        - schema
    PrestoServer:
      type: object
      title: PrestoServer
      properties:
        host:
          type: string
          description: The host to the Presto server
          examples:
            - localhost:8080
        catalog:
          type: string
          description: The name of the catalog.
          examples:
            - postgres
        schema:
          type: string
          description: The name of the schema.
          examples:
            - public
      required:
        - host
    PubSubServer:
      type: object
      title: PubSubServer
      properties:
        project:
          type: string
          description: The GCP project name.
      required:
        - project
    RedshiftServer:
      type: object
      title: RedshiftServer
      properties:
        host:
          type: string
          description: An optional string describing the server.
        database:
          type: string
          description: The name of the database.
        schema:
          type: string
          description: The name of the schema.
        region:
          type: string
          description: AWS region of Redshift server.
          examples:
            - us-east-1
        account:
          type: string
          description: The account used by the server.
      required:
        - database
        - schema
    S3Server:
      type: object
      title: S3Server
      properties:
        location:
          type: string
          format: uri
          description: S3 URL, starting with `s3://`
          examples:
            - s3://datacontract-example-orders-latest/data/{model}/*.json
        endpointUrl:
          type: string
          format: uri
          description: The server endpoint for S3-compatible servers.
          examples:
            - https://minio.example.com
        format:
          type: string
          enum:
            - parquet
            - delta
            - json
            - csv
          description: File format.
        delimiter:
          type: string
          enum:
            - new_line
            - array
          description:
            Only for format = json. How multiple json documents are delimited
            within one file
      required:
        - location
    SftpServer:
      type: object
      title: SftpServer
      properties:
        location:
          type: string
          format: uri
          pattern: ^sftp://.*
          description: SFTP URL, starting with `sftp://`
          examples:
            - sftp://123.123.12.123/{model}/*.json
        format:
          type: string
          enum:
            - parquet
            - delta
            - json
            - csv
          description: File format.
        delimiter:
          type: string
          enum:
            - new_line
            - array
          description:
            Only for format = json. How multiple json documents are delimited
            within one file
      required:
        - location
    SnowflakeServer:
      type: object
      title: SnowflakeServer
      properties:
        host:
          type: string
          description: The host to the Snowflake server
        port:
          type: integer
          description: The port to the Snowflake server.
        account:
          type: string
          description: The Snowflake account used by the server.
        database:
          type: string
          description: The name of the database.
        schema:
          type: string
          description: The name of the schema.
        warehouse:
          type: string
          description:
            The name of the cluster of resources that is a Snowflake virtual
            warehouse.
      required:
        - account
        - database
        - schema
    SqlserverServer:
      type: object
      title: SqlserverServer
      properties:
        host:
          type: string
          description: The host to the database server
          examples:
            - localhost
        port:
          type: integer
          description: The port to the database server.
          default: 1433
          examples:
            - 1433
        database:
          type: string
          description: The name of the database.
          examples:
            - database
        schema:
          type: string
          description: The name of the schema in the database.
          examples:
            - dbo
      required:
        - host
        - database
        - schema
    SynapseServer:
      type: object
      title: SynapseServer
      properties:
        host:
          type: string
          description: The host of the Synapse server.
        port:
          type: integer
          description: The port of the Synapse server.
        database:
          type: string
          description: The name of the database.
      required:
        - host
        - port
        - database
    TrinoServer:
      type: object
      title: TrinoServer
      properties:
        host:
          type: string
          description: The Trino host URL.
          examples:
            - localhost
        port:
          type: integer
          description: The Trino port.
        catalog:
          type: string
          description: The name of the catalog.
          examples:
            - hive
        schema:
          type: string
          description: The name of the schema in the database.
          examples:
            - my_schema
      required:
        - host
        - port
        - catalog
        - schema
    VerticaServer:
      type: object
      title: VerticaServer
      properties:
        host:
          type: string
          description: The host of the Vertica server.
        port:
          type: integer
          description: The port of the Vertica server.
        database:
          type: string
          description: The name of the database.
        schema:
          type: string
          description: The name of the schema.
      required:
        - host
        - port
        - database
        - schema
  SchemaElement:
    type: object
    properties:
      name:
        type: string
        description: Name of the element.
      physicalType:
        type: string
        description: The physical element data type in the data source.
        examples:
          - table
          - view
          - topic
          - file
      description:
        type: string
        description: Description of the element.
      businessName:
        type: string
        description: The business name of the element.
      authoritativeDefinitions:
        $ref: "#/$defs/AuthoritativeDefinitions"
      tags:
        $ref: "#/$defs/Tags"
      customProperties:
        $ref: "#/$defs/CustomProperties"
  SchemaObject:
    type: object
    properties:
      logicalType:
        type: string
        description: The logical element data type.
        enum:
          - object
      physicalName:
        type: string
        description: Physical name.
        examples:
          - table_1_2_0
      dataGranularityDescription:
        type: string
        description: Granular level of the data in the object.
        examples:
          - Aggregation by country
      properties:
        type: array
        description: A list of properties for the object.
        items:
          $ref: "#/$defs/SchemaProperty"
      quality:
        $ref: "#/$defs/DataQualityChecks"
    allOf:
      - $ref: "#/$defs/SchemaElement"
    required:
      - name
    unevaluatedProperties: false
  SchemaBaseProperty:
    type: object
    properties:
      primaryKey:
        type: boolean
        description:
          Boolean value specifying whether the element is primary or not.
          Default is false.
      primaryKeyPosition:
        type: integer
        default: -1
        description:
          If element is a primary key, the position of the primary key
          element. Starts from 1. Example of `account_id, name` being primary
          key columns, `account_id` has primaryKeyPosition 1 and `name`
          primaryKeyPosition 2. Default to -1.
      logicalType:
        type: string
        description: The logical element data type.
        enum:
          - string
          - date
          - number
          - integer
          - object
          - array
          - boolean
      logicalTypeOptions:
        type: object
        description: Additional optional metadata to describe the logical type.
      physicalType:
        type: string
        description:
          The physical element data type in the data source. For example,
          VARCHAR(2), DOUBLE, INT.
      required:
        type: boolean
        default: false
        description:
          Indicates if the element may contain Null values; possible values
          are true and false. Default is false.
      unique:
        type: boolean
        default: false
        description:
          Indicates if the element contains unique values; possible values
          are true and false. Default is false.
      partitioned:
        type: boolean
        default: false
        description:
          Indicates if the element is partitioned; possible values are true
          and false.
      partitionKeyPosition:
        type: integer
        default: -1
        description:
          If element is used for partitioning, the position of the partition
          element. Starts from 1. Example of `country, year` being partition
          columns, `country` has partitionKeyPosition 1 and `year`
          partitionKeyPosition 2. Default to -1.
      classification:
        type: string
        description:
          Can be anything, like confidential, restricted, and public to more
          advanced categorization. Some companies like PayPal, use data
          classification indicating the class of data in the element; expected
          values are 1, 2, 3, 4, or 5.
        examples:
          - confidential
          - restricted
          - public
      encryptedName:
        type: string
        description:
          The element name within the dataset that contains the encrypted
          element value. For example, unencrypted element `email_address` might
          have an encryptedName of `email_address_encrypt`.
      transformSourceObjects:
        type: array
        description: List of objects in the data source used in the transformation.
        items:
          type: string
      transformLogic:
        type: string
        description: Logic used in the element transformation.
      transformDescription:
        type: string
        description: Describes the transform logic in very simple terms.
      examples:
        type: array
        description: List of sample element values.
        items:
          $ref: "#/$defs/AnyType"
      criticalDataElement:
        type: boolean
        default: false
        description:
          True or false indicator; If element is considered a critical data
          element (CDE) then true else false.
      quality:
        $ref: "#/$defs/DataQualityChecks"
    allOf:
      - $ref: "#/$defs/SchemaElement"
      - if:
          properties:
            logicalType:
              const: string
        then:
          properties:
            logicalTypeOptions:
              type: object
              properties:
                minLength:
                  type: integer
                  minimum: 0
                  description: Minimum length of the string.
                maxLength:
                  type: integer
                  minimum: 0
                  description: Maximum length of the string.
                pattern:
                  type: string
                  description:
                    Regular expression pattern to define valid value. Follows regular
                    expression syntax from ECMA-262
                    (https://262.ecma-international.org/5.1/#sec-15.10.1).
                format:
                  type: string
                  examples:
                    - password
                    - byte
                    - binary
                    - email
                    - uuid
                    - uri
                    - hostname
                    - ipv4
                    - ipv6
                  description: Provides extra context about what format the string follows.
              additionalProperties: false
      - if:
          properties:
            logicalType:
              const: date
        then:
          properties:
            logicalTypeOptions:
              type: object
              properties:
                format:
                  type: string
                  examples:
                    - yyyy-MM-dd
                    - yyyy-MM-dd HH:mm:ss
                    - HH:mm:ss
                  description:
                    Format of the date. Follows the format as prescribed by [JDK
                    DateTimeFormatter](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html).
                    For example, format 'yyyy-MM-dd'.
                exclusiveMaximum:
                  type: boolean
                  default: false
                  description:
                    If set to true, all values are strictly less than the maximum value
                    (values < maximum). Otherwise, less than or equal to the
                    maximum value (values <= maximum).
                maximum:
                  type: string
                  description:
                    All date values are less than or equal to this value (values <=
                    maximum).
                exclusiveMinimum:
                  type: boolean
                  default: false
                  description:
                    If set to true, all values are strictly greater than the minimum
                    value (values > minimum). Otherwise, greater than or equal
                    to the minimum value (values >= minimum).
                minimum:
                  type: string
                  description:
                    All date values are greater than or equal to this value (values >=
                    minimum).
              additionalProperties: false
      - if:
          anyOf:
            - properties:
                logicalType:
                  const: integer
        then:
          properties:
            logicalTypeOptions:
              type: object
              properties:
                multipleOf:
                  type: number
                  exclusiveMinimum: 0
                  description:
                    Values must be multiples of this number. For example, multiple of 5
                    has valid values 0, 5, 10, -5.
                maximum:
                  type: number
                  description:
                    All values are less than or equal to this value (values <=
                    maximum).
                exclusiveMaximum:
                  type: boolean
                  default: false
                  description:
                    If set to true, all values are strictly less than the maximum value
                    (values < maximum). Otherwise, less than or equal to the
                    maximum value (values <= maximum).
                minimum:
                  type: number
                  description:
                    All values are greater than or equal to this value (values >=
                    minimum).
                exclusiveMinimum:
                  type: boolean
                  default: false
                  description:
                    If set to true, all values are strictly greater than the minimum
                    value (values > minimum). Otherwise, greater than or equal
                    to the minimum value (values >= minimum).
                format:
                  type: string
                  default: i32
                  description:
                    Format of the value in terms of how many bits of space it can use
                    and whether it is signed or unsigned (follows the Rust
                    integer types).
                  enum:
                    - i8
                    - i16
                    - i32
                    - i64
                    - i128
                    - u8
                    - u16
                    - u32
                    - u64
                    - u128
              additionalProperties: false
      - if:
          anyOf:
            - properties:
                logicalType:
                  const: number
        then:
          properties:
            logicalTypeOptions:
              type: object
              properties:
                multipleOf:
                  type: number
                  exclusiveMinimum: 0
                  description:
                    Values must be multiples of this number. For example, multiple of 5
                    has valid values 0, 5, 10, -5.
                maximum:
                  type: number
                  description:
                    All values are less than or equal to this value (values <=
                    maximum).
                exclusiveMaximum:
                  type: boolean
                  default: false
                  description:
                    If set to true, all values are strictly less than the maximum value
                    (values < maximum). Otherwise, less than or equal to the
                    maximum value (values <= maximum).
                minimum:
                  type: number
                  description:
                    All values are greater than or equal to this value (values >=
                    minimum).
                exclusiveMinimum:
                  type: boolean
                  default: false
                  description:
                    If set to true, all values are strictly greater than the minimum
                    value (values > minimum). Otherwise, greater than or equal
                    to the minimum value (values >= minimum).
                format:
                  type: string
                  default: i32
                  description:
                    Format of the value in terms of how many bits of space it can use
                    (follows the Rust float types).
                  enum:
                    - f32
                    - f64
              additionalProperties: false
      - if:
          properties:
            logicalType:
              const: object
        then:
          properties:
            logicalTypeOptions:
              type: object
              properties:
                maxProperties:
                  type: integer
                  minimum: 0
                  description: Maximum number of properties.
                minProperties:
                  type: integer
                  minimum: 0
                  default: 0
                  description: Minimum number of properties.
                required:
                  type: array
                  items:
                    type: string
                  minItems: 1
                  uniqueItems: true
                  description: Property names that are required to exist in the object.
              additionalProperties: false
            properties:
              type: array
              description: A list of properties for the object.
              items:
                $ref: "#/$defs/SchemaProperty"
      - if:
          properties:
            logicalType:
              const: array
        then:
          properties:
            logicalTypeOptions:
              type: object
              properties:
                maxItems:
                  type: integer
                  minimum: 0
                  description: Maximum number of items.
                minItems:
                  type: integer
                  minimum: 0
                  default: 0
                  description: Minimum number of items
                uniqueItems:
                  type: boolean
                  default: false
                  description: If set to true, all items in the array are unique.
              additionalProperties: false
            items:
              $ref: "#/$defs/SchemaItemProperty"
              description:
                "List of items in an array (only applicable when `logicalType:
                array`)."
  SchemaProperty:
    type: object
    $ref: "#/$defs/SchemaBaseProperty"
    required:
      - name
    unevaluatedProperties: false
  SchemaItemProperty:
    type: object
    $ref: "#/$defs/SchemaBaseProperty"
    properties:
      properties:
        type: array
        description: A list of properties for the object.
        items:
          $ref: "#/$defs/SchemaProperty"
    unevaluatedProperties: false
  Tags:
    type: array
    description: A list of tags that may be assigned to the elements (object or
      property); the tags keyword may appear at any level.
    items:
      type: string
  DataQuality:
    type: object
    properties:
      authoritativeDefinitions:
        $ref: "#/$defs/AuthoritativeDefinitions"
      businessImpact:
        type: string
        description: Consequences of the rule failure.
        examples:
          - operational
          - regulatory
      customProperties:
        type: array
        description: Additional properties required for rule execution.
        items:
          $ref: "#/$defs/CustomProperty"
      description:
        type: string
        description: Describe the quality check to be completed.
      dimension:
        type: string
        description: The key performance indicator (KPI) or dimension for data quality.
        enum:
          - accuracy
          - completeness
          - conformity
          - consistency
          - coverage
          - timeliness
          - uniqueness
      method:
        type: string
        examples:
          - reconciliation
      name:
        type: string
        description: Name of the data quality check.
      schedule:
        type: string
        description: Rule execution schedule details.
        examples:
          - 0 20 * * *
      scheduler:
        type: string
        description: The name or type of scheduler used to start the data quality check.
        examples:
          - cron
      severity:
        type: string
        description: The severance of the quality rule.
        examples:
          - info
          - warning
          - error
      tags:
        $ref: "#/$defs/Tags"
      type:
        type: string
        description:
          The type of quality check. 'text' is human-readable text that
          describes the quality of the data. 'library' is a set of maintained
          predefined quality attributes such as row count or unique. 'sql' is an
          individual SQL query that returns a value that can be compared.
          'custom' is quality attributes that are vendor-specific, such as Soda
          or Great Expectations.
        enum:
          - text
          - library
          - sql
          - custom
        default: library
      unit:
        type: string
        description:
          Unit the rule is using, popular values are `rows` or `percent`, but
          any value is allowed.
        examples:
          - rows
          - percent
    allOf:
      - if:
          properties:
            type:
              const: library
        then:
          $ref: "#/$defs/DataQualityLibrary"
      - if:
          properties:
            type:
              const: sql
          required:
            - type
        then:
          $ref: "#/$defs/DataQualitySql"
      - if:
          properties:
            type:
              const: custom
          required:
            - type
        then:
          $ref: "#/$defs/DataQualityCustom"
  DataQualityChecks:
    type: array
    description:
      Data quality rules with all the relevant information for rule setup
      and execution.
    items:
      $ref: "#/$defs/DataQuality"
  DataQualityLibrary:
    type: object
    properties:
      rule:
        type: string
        description: Define a data quality check based on the predefined rules as per ODCS.
        examples:
          - duplicateCount
          - validValues
          - rowCount
      mustBe:
        description:
          Must be equal to the value to be valid. When using numbers, it is
          equivalent to '='.
      mustNotBe:
        description:
          Must not be equal to the value to be valid. When using numbers, it
          is equivalent to '!='.
      mustBeGreaterThan:
        type: number
        description: Must be greater than the value to be valid. It is equivalent to '>'.
      mustBeGreaterOrEqualTo:
        type: number
        description:
          Must be greater than or equal to the value to be valid. It is
          equivalent to '>='.
      mustBeLessThan:
        type: number
        description: Must be less than the value to be valid. It is equivalent to '<'.
      mustBeLessOrEqualTo:
        type: number
        description: Must be less than or equal to the value to be valid. It is
          equivalent to '<='.
      mustBeBetween:
        type: array
        description:
          Must be between the two numbers to be valid. Smallest number first
          in the array.
        minItems: 2
        maxItems: 2
        uniqueItems: true
        items:
          type: number
      mustNotBeBetween:
        type: array
        description:
          Must not be between the two numbers to be valid. Smallest number
          first in the array.
        minItems: 2
        maxItems: 2
        uniqueItems: true
        items:
          type: number
    required:
      - rule
  DataQualitySql:
    type: object
    properties:
      query:
        type: string
        description: Query string that adheres to the dialect of the provided server.
        examples:
          - SELECT COUNT(*) FROM ${table} WHERE ${column} IS NOT NULL
    required:
      - query
  DataQualityCustom:
    type: object
    properties:
      engine:
        type: string
        description: Name of the engine which executes the data quality checks.
        examples:
          - soda
          - great-expectations
          - monte-carlo
          - dbt
      implementation:
        oneOf:
          - type: string
          - type: object
    required:
      - engine
      - implementation
  AuthoritativeDefinitions:
    type: array
    description:
      List of links to sources that provide more details on the dataset;
      examples would be a link to an external definition, a training video, a
      GitHub repo, Collibra, or another tool. Authoritative definitions follow
      the same structure in the standard.
    items:
      type: object
      properties:
        url:
          type: string
          description: URL to the authority.
        type:
          type: string
          description:
            "Type of definition for authority: v2.3 adds standard values:
            `businessDefinition`, `transformationImplementation`,
            `videoTutorial`, `tutorial`, and `implementation`."
          examples:
            - businessDefinition
            - transformationImplementation
            - videoTutorial
            - tutorial
            - implementation
      required:
        - url
        - type
  Support:
    type: array
    description: Top level for support channels.
    items:
      $ref: "#/$defs/SupportItem"
  SupportItem:
    type: object
    properties:
      channel:
        type: string
        description: Channel name or identifier.
      url:
        type: string
        description: Access URL using normal [URL
          scheme](https://en.wikipedia.org/wiki/URL#Syntax) (https, mailto,
          etc.).
      description:
        type: string
        description: Description of the channel, free text.
      tool:
        type: string
        description: Name of the tool, value can be `email`, `slack`, `teams`,
          `discord`, `ticket`, or `other`.
        examples:
          - email
          - slack
          - teams
          - discord
          - ticket
          - other
      scope:
        type: string
        description: "Scope can be: `interactive`, `announcements`, `issues`."
        examples:
          - interactive
          - announcements
          - issues
      invitationUrl:
        type: string
        description:
          Some tools uses invitation URL for requesting or subscribing.
          Follows the [URL scheme](https://en.wikipedia.org/wiki/URL#Syntax).
    required:
      - channel
      - url
  Pricing:
    type: object
    properties:
      priceAmount:
        type: number
        description: Subscription price per unit of measure in `priceUnit`.
      priceCurrency:
        type: string
        description: Currency of the subscription price in `price.priceAmount`.
      priceUnit:
        type: string
        description: The unit of measure for calculating cost. Examples megabyte, gigabyte.
  Team:
    type: object
    properties:
      username:
        type: string
        description: The user's username or email.
      role:
        type: string
        description:
          The user's job role; Examples might be owner, data steward. There
          is no limit on the role.
      dateIn:
        type: string
        format: date
        description: The date when the user joined the team.
      dateOut:
        type: string
        format: date
        description: The date when the user ceased to be part of the team.
      replacedByUsername:
        type: string
        description: The username of the user who replaced the previous user.
  Role:
    type: object
    properties:
      role:
        type: string
        description: Name of the IAM role that provides access to the dataset.
      description:
        type: string
        description: Description of the IAM role and its permissions.
      access:
        type: string
        description: The type of access provided by the IAM role.
      firstLevelApprovers:
        type: string
        description: The name(s) of the first-level approver(s) of the role.
      secondLevelApprovers:
        type: string
        description: The name(s) of the second-level approver(s) of the role.
    required:
      - role
  ServiceLevelAgreementProperty:
    type: object
    properties:
      property:
        type: string
        description:
          Specific property in SLA, check the periodic table. May requires
          units (more details to come).
      value:
        anyOf:
          - type: string
          - type: number
          - type: integer
          - type: boolean
          - type: "null"
        description: Agreement value. The label will change based on the property itself.
      valueExt:
        $ref: "#/$defs/AnyNonCollectionType"
        description:
          Extended agreement value. The label will change based on the
          property itself.
      unit:
        type: string
        description:
          "**d**, day, days for days; **y**, yr, years for years, etc. Units
          use the ISO standard."
      element:
        type: string
        description:
          Element(s) to check on. Multiple elements should be extremely rare
          and, if so, separated by commas.
      driver:
        type: string
        description: "Describes the importance of the SLA from the list of:
          `regulatory`, `analytics`, or `operational`."
        examples:
          - regulatory
          - analytics
          - operational
    required:
      - property
      - value
  CustomProperties:
    type: array
    description: A list of key/value pairs for custom properties.
    items:
      $ref: "#/$defs/CustomProperty"
  CustomProperty:
    type: object
    properties:
      property:
        type: string
        description:
          The name of the key. Names should be in camel case–the same as if
          they were permanent properties in the contract.
      value:
        $ref: "#/$defs/AnyType"
        description: The value of the key.
  AnyType:
    anyOf:
      - type: string
      - type: number
      - type: integer
      - type: boolean
      - type: "null"
      - type: array
      - type: object
  AnyNonCollectionType:
    anyOf:
      - type: string
      - type: number
      - type: integer
      - type: boolean
      - type: "null"

Open Data Contract Standard (ODCS) schema as YAML for readability (see bitol 2024a; JSON schema bitol 2024b)

4/7 Apply schema to contract

# Current version of the data contract.
version: 0.1.0
#  The kind of file this is. Valid value is `DataContract`.
kind: DataContract
# Version of the standard used to build data contract. Default value
apiVersion: v3.0.0
#  A unique identifier used to reduce the risk of dataset name collisions, such as a UUID.
id: DC-0001
# Name of the data contract.
name: Letters to Santa
# Indicates the property the data is primarily associated with. Value is case insensitive.
tenant: North pole
tags:
  - santa
  - elf
# Current status of the dataset. Valid values are `production`, `test`, or `development`.
status: production
# List of servers where the datasets reside.
servers:
  # Data source details of where data is physically stored.
  - server: 'letter-logger'
    type: 'api'
    description: 'a data feed of all letters to santa'
    environment: 'prod'
    location: 'https://api.letters-to-santa.org'
# Top level for support channels.
support:
  - channel: Ticketing
    description: Fault escalation
    url: https://lbth.ivanticloud.com/
# For calculating costs.
price:
  priceAmount: 0.14
  priceCurrency: GBP
  priceUnit: DBU
team:
  - username: elf.foo
    # The user's job role; Examples might be owner, data steward. There is no limit on the role.
    role: Data processor
    dateIn: "2022-12-01"
    dateOut: "2022-12-26"
    replacedByUsername: elf.bar
  - username: elf.bar
    role: Data processor
    dateIn: "2022-10-01"
# A list of roles that will provide user access to the dataset.
roles:
  # Name of the IAM role that provides access to the dataset.
  - role: IAM_LETTERS_TO_SANTA_RED
    # The type of access provided by the IAM role.
    access: read
    # The name(s) of the first-level approver(s) of the role.
    firstLevelApprovers: Reporting Manager
    # The name(s) of the second-level approver(s) of the role.
    secondLevelApprovers: Service Head
slaDefaultElement: letter_to_santa.DATE
slaProperties:
  # Specific property in SLA
  - property: latency
    # Agreement value. The label will change based on the property itself.
    value: 24
    unit: hr
    element: letter_to_santa.DATE
    # "Describes the importance of the SLA from the list of: `regulatory`, `analytics`, or `operational`."
    driver: operational
# RFC 339 formatted timestamp
contractCreatedTs: "2024-12-04T13:30:00.00Z"
# A list of elements within the schema to be cataloged.
schema:
  - name: letters_to_santa 
    physicalName: santa_service_requests
    physicalType: table
    description: annual collection of santa service requests
    tags: [santa, naughty list]
    # Granular level of the data in the object.
    dataGranularityDescription: agregation by date, location.
    properties: 
      - name: DATE
        primaryKey: false
        # The physical element data type in the data source. For example, VARCHAR(2), DOUBLE, INT.
        physicalType:  INTEGER
        # Indicates if the element may contain Null values; possible values are true and false. Default is false.
        required: true
        # True or false indicator; If element is considered a critical data element (CDE) then true else false.
        criticalDataElement: true
        # Can be anything, like confidential, restricted, and public to more advanced categorization. Some companies like PayPal, use data classification indicating the class of data in the element; expected values are 1, 2, 3, 4, or 5.
        classification: public
        # Additional properties required for rule execution.
        customProperties:
          - property: anonymisationStrategy
            value: OpenPseudonymiser
        # Data quality rules with all the relevant information for rule setup and execution.
        quality: 
          - rule: nullCheck
            description: column should not contain null values
            dimension: completeness
            type: library
            severity: error
            # Consequences of the rule failure: operational, regulatory
            businessImpact: operational
          - type: sql 
            query: |
              SELECT COUNT(*) FROM ${table} WHERE ${column} IS NOT NULL
            mustBeGreaterThan: 0
            mustBeLessThan: 6,500,000,000
      - name: NAME
        primaryKey: false
        physicalType:  INTEGER
        required: true
        criticalDataElement: true
        classification: confidential
        customProperties:
          - property: anonymisationStrategy
            value: OpenPseudonymiser
        quality: 
          - rule: nullCheck
            description: column should not contain null values
            dimension: completeness
            type: library
            severity: error
            businessImpact: operational
      - name: UPRN
        primaryKey: false
        physicalType:  INTEGER
        required: true
        criticalDataElement: true
        classification: confidential
        customProperties:
          - property: anonymisationStrategy
            value: OpenPseudonymiser
        quality: 
          - rule: nullCheck
            description: column should not contain null values
            dimension: completeness
            type: library
            severity: error
            businessImpact: operational
      - name: ADDRESS
        primaryKey: false
        physicalType:  INTEGER
        required: true
        criticalDataElement: true
        classification: confidential
        customProperties:
          - property: anonymisationStrategy
            value: OpenPseudonymiser
        quality: 
          - rule: nullCheck
            description: column should not contain null values
            dimension: completeness
            type: library
            severity: error
            businessImpact: operational
      - name: D.O.B
        primaryKey: false
        physicalType:  INTEGER
        required: true
        criticalDataElement: true
        classification: confidential
        customProperties:
          - property: anonymisationStrategy
            value: OpenPseudonymiser
        quality: 
          - rule: nullCheck
            description: column should not contain null values
            dimension: completeness
            type: library
            severity: error
            businessImpact: operational
      - name: INTERESTS/0
        primaryKey: false
        physicalType:  INTEGER
        required: true
        criticalDataElement: true
        classification: restricted
        customProperties:
          - property: anonymisationStrategy
            value: OpenPseudonymiser
        quality: 
          - rule: nullCheck
            description: column should not contain null values
            dimension: completeness
            type: library
            severity: error
            businessImpact: operational
      - name: INTERESTS/1
        primaryKey: false
        physicalType:  INTEGER
        required: true
        criticalDataElement: true
        classification: restricted
        customProperties:
          - property: anonymisationStrategy
            value: OpenPseudonymiser
        quality: 
          - rule: nullCheck
            description: column should not contain null values
            dimension: completeness
            type: library
            severity: error
            businessImpact: operational
      - name: INTERESTS/2
        primaryKey: false
        physicalType:  INTEGER
        required: true
        criticalDataElement: true
        classification: restricted
        customProperties:
          - property: anonymisationStrategy
            value: OpenPseudonymiser
        quality: 
          - rule: nullCheck
            description: column should not contain null values
            dimension: completeness
            type: library
            severity: error
            businessImpact: operational
      - name: MERITS/0
        primaryKey: false
        physicalType:  INTEGER
        required: true
        criticalDataElement: true
        classification: restricted
        customProperties:
          - property: anonymisationStrategy
            value: OpenPseudonymiser
        quality: 
          - rule: nullCheck
            description: column should not contain null values
            dimension: completeness
            type: library
            severity: error
            businessImpact: operational
      - name: DEMERITS/0
        primaryKey: false
        physicalType:  INTEGER
        required: true
        criticalDataElement: true
        classification: restricted
        customProperties:
          - property: anonymisationStrategy
            value: OpenPseudonymiser
        quality: 
          - rule: nullCheck
            description: column should not contain null values
            dimension: completeness
            type: library
            severity: error
            businessImpact: operational
      - name: LETTER
        primaryKey: false
        physicalType:  INTEGER
        required: true
        criticalDataElement: true
        classification: restricted
        customProperties:
          - property: anonymisationStrategy
            value: OpenPseudonymiser
        quality: 
          - rule: nullCheck
            description: column should not contain null values
            dimension: completeness
            type: library
            severity: error
            businessImpact: operational

5/7 Add protection rules

    # FIELD
    properties:
      - name: UPRN

        classification: confidential
        # DATA PROTECTION
        customProperties:
          - property: classificationDetail
            value: UPRNs can be shared across systems,
             but not with other identifying information 
             which can only be shared if UPRNs are 
             subsequently removed from a dataset, or 
             anonymised.
          - property: anonymisationStrategy
            # ENCRYPTION
            value: OpenPseudonymiser*

* see OpenPseudonymiser (2011) and Joe Leach (2024b)

6/7 Add quality rules

    properties:

        quality:
          - rule: nullCheck
            description: column should not contain null values
            dimension: completeness
            type: library
            severity: error
            businessImpact: operational
          - type: sql 
            query: |
              SELECT COUNT(*) FROM ${object} WHERE ${property} IS NOT NULL
            mustBeGreaterThan: 0
            mustBeLessThan: 6,500,000,000

Quality dimensions listed in the ODCS v3 spec

  DataQuality:
    properties:
      dimension:
        type: string
        description: The key performance indicator (KPI) or dimension for data quality.
        enum:
          - accuracy
          - completeness
          - conformity
          - consistency
          - coverage
          - timeliness
          - uniqueness

see Askham et al. (2013) and Joe Leach (2024a)

7/7 Consult!

Design: A data governance network setting showing a data contract (in green) surrounded by six systems: [datasource, pipeline, data lake, transformer, data catalogue, analytics] and four roles: [service head, architect, practitioner, engineer]

Design: A data governance network setting showing a data contract (in green) surrounded by six systems: [datasource, pipeline, data lake, transformer, data catalogue, analytics] and four roles: [service head, architect, practitioner, engineer]

Key: Folder(data lake); paper(data contract); ui(transformer and catalogue); person(staff); datastore(datasource); ui(analytics)

Key: Folder(data lake); paper(data contract); ui(transformer and catalogue); person(staff); datastore(datasource); ui(analytics)

The End…

References

Askham, Nicola, Denise Cook, Martin Doyle, Helen Fereday, Mike Gibson, Ulrich Landbeck, Rob Lee, Chris Maynard, Gary Palmer, and Julian Schwarzenbach. 2013. “The Six Primary Dimensions for Data Quality Assessment.” DAMA UK Working Group, 432–35.
bitol. 2024a. “Open Data Contract Standard — Bitol-Io.github.io.” https://bitol-io.github.io/open-data-contract-standard.
———. 2024b. “Open-Data-Contract-Standard/Schema/Odcs-Json-Schema-Latest.json at Main · Bitol-Io/Open-Data-Contract-Standard — Github.com.” https://github.com/bitol-io/open-data-contract-standard/blob/main/schema/odcs-json-schema-latest.json.
Joe Leach. 2024a. Catch the Quality Line to Quality St. — Joelbth.codeberg.page.” https://joelbth.codeberg.page/data-quality.
———. 2024b. Data Anonymisation — Joelbth.codeberg.page.” https://joelbth.codeberg.page/data-anonymisation.
OpenPseudonymiser. 2011. OpenPseudononymiser — Openpseudonymiser.org.” https://www.openpseudonymiser.org.