Defining a config schema

We will start off by defining how we want our users to configure our extractor. cogex has already created a file with a config class based on the BaseConfig class from extractor-utils. We need to extend this with a list of files the extractor should read from.

Config schemas are defined using data classes and type hints. For each file we want to read, we need to know:

  • The path to the file. We can use str for this.

  • Which column in the CSV file to use as the key in RAW. We can use str for this too.

  • Which RAW database and table to write to. For this, we will use the pre-built RawDestinationConfig from cognite.extractorutils.configtools.

Our final config class for CSV files looks like the following:

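from dataclasses import dataclass
from cognite.extractorutils.configtools import RawDestinationConfig

@dataclass
class FileConfig:
    path: str
    key_column: str
    destination: RawDestinationConfig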
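To make the role of key_column concrete, here is a minimal sketch of how one column of a CSV file could be pulled out as the row key while the remaining columns become the row's content. It is an illustration only and not the extractor's actual implementation: the helper keyed_rows and the sample data are hypothetical, and only the standard library is used.

```python
import csv
import io
from typing import Dict, Iterator, TextIO, Tuple


def keyed_rows(csv_file: TextIO, key_column: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (key, columns) pairs, using the value in `key_column` as the key.

    Hypothetical helper for illustration; not part of extractor-utils.
    """
    reader = csv.DictReader(csv_file)
    for row in reader:
        # Remove the key column from the row; raises KeyError if it is missing
        key = row.pop(key_column)
        yield key, row


# Hypothetical sample data standing in for a file such as "pumps.csv"
sample = io.StringIO("serial_number,model\nA1,P100\nB2,P200\n")
rows = list(keyed_rows(sample, key_column="serial_number"))
print(rows)
```

Each pair holds the key taken from the configured column and a dict with the remaining columns, which is the shape a key/columns row store such as RAW expects.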

We now need to update the auto-generated Config class with a list of files to extract from:

from typing import List

@dataclass
class Config(BaseConfig):
    files: List[FileConfig]  # required fields must precede defaulted ones
    extractor: ExtractorConfig = ExtractorConfig()

This means that our users can now configure the extractor like so:

# This comes from BaseConfig:
cognite:
  project: publicdata
  api-key: ${COGNITE_API_KEY}

logger:
  console:
    level: INFO
  file:
    path: "debug.log"
    level: DEBUG

# This is from our additions:
files:
  - path: "pumps.csv"
    key-column: serial_number
    destination:
      database: csv_assets
      table: pumps

  - path: "valves.csv"
    key-column: id
    destination:
      database: csv_assets
      table: valves
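
Note that the YAML uses hyphenated keys such as key-column, while the data class field is named key_column. The sketch below shows one way such a mapping can work. It uses stand-in classes and a plain dict in place of the parsed YAML, so it runs without extractor-utils. The names RawDestination, FilesSection and from_dict are hypothetical, and the hyphen-to-underscore conversion is an assumption about the loader's behaviour rather than a description of its code.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict, List, get_args, get_origin


@dataclass
class RawDestination:  # hypothetical stand-in for RawDestinationConfig
    database: str
    table: str


@dataclass
class FileConfig:
    path: str
    key_column: str
    destination: RawDestination


@dataclass
class FilesSection:  # hypothetical: only the `files` part of Config
    files: List[FileConfig]


def from_dict(cls: type, raw: Dict[str, Any]) -> Any:
    """Build the data class `cls` from a dict, mapping 'key-column' to 'key_column'."""
    normalized = {key.replace("-", "_"): value for key, value in raw.items()}
    kwargs = {}
    for f in fields(cls):
        if f.name not in normalized:
            raise KeyError(f"missing required key: {f.name.replace('_', '-')}")
        value = normalized[f.name]
        if isinstance(f.type, type) and is_dataclass(f.type):
            # Nested section, e.g. `destination`
            value = from_dict(f.type, value)
        elif get_origin(f.type) is list and is_dataclass(get_args(f.type)[0]):
            # List of sections, e.g. `files`
            value = [from_dict(get_args(f.type)[0], item) for item in value]
        kwargs[f.name] = value
    return cls(**kwargs)


# The `files` section from the example config, as a parsed-YAML-style dict
raw = {
    "files": [
        {"path": "pumps.csv", "key-column": "serial_number",
         "destination": {"database": "csv_assets", "table": "pumps"}},
        {"path": "valves.csv", "key-column": "id",
         "destination": {"database": "csv_assets", "table": "valves"}},
    ]
}
config = from_dict(FilesSection, raw)
print(config.files[0].key_column, config.files[1].destination.table)
```

A missing key, for example a file entry without key-column, raises a KeyError in this sketch, which mirrors the idea that the schema is what validates the user's configuration.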