Defining a config schema

We start by defining how users will configure the extractor. The cogex tool has already created a config.py file containing a Config class based on the BaseConfig class from extractor-utils. We need to extend this class with a list of files the extractor should read from.

Config schemas are defined using data classes and type hints. For each file we want to read, we need to know:

  • The path to the file. We can use str for this.

  • Which column in the CSV file to use as the key in RAW. We can use str for this too.

  • Which RAW database and table to write to. For this, we will use the pre-built RawDestinationConfig from cognite.extractorutils.configtools.

Our final config class for CSV files looks like the following:

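from dataclasses import dataclass

from cognite.extractorutils.configtools import RawDestinationConfig

@dataclass
class FileConfig:
    path: str  # path to the CSV file
    key_column: str  # CSV column used as the row key in RAW
    destination: RawDestinationConfig  # RAW database and table to write to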
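To make the role of key_column concrete, here is a minimal sketch of how an extractor might use it when reading a file. The rows_by_key helper and the sample data are hypothetical and not part of extractor-utils. The sketch also assumes the key column is removed from the stored columns, which is one possible choice rather than a requirement.

```python
import csv
import io
from typing import Dict


def rows_by_key(csv_text: str, key_column: str) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: index CSV rows by the value found in key_column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows: Dict[str, Dict[str, str]] = {}
    for row in reader:
        key = row.pop(key_column)  # value of the key column becomes the row key
        rows[key] = row  # remaining columns become the row's columns
    return rows


# Made-up sample data standing in for the contents of a CSV file.
sample = "serial_number,model,site\nA-100,P1,north\nB-200,P2,south\n"
pumps = rows_by_key(sample, key_column="serial_number")
```

Each entry in pumps then corresponds to one row that could be written to the RAW table named in destination.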

We now need to update the auto-generated Config class with a list of files to extract from:

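from dataclasses import dataclass, field
from typing import List

@dataclass
class Config(BaseConfig):
    # Fields without a default must come before fields with one.
    files: List[FileConfig]
    extractor: ExtractorConfig = field(default_factory=ExtractorConfig)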

This means that our users can now configure the extractor like so:

# This comes from BaseConfig:
cognite:
  project: publicdata

  idp-authentication:
    token-url: ${COGNITE_TOKEN_URL}

    client-id: ${COGNITE_CLIENT_ID}
    secret: ${COGNITE_CLIENT_SECRET}
    scopes:
      - ${COGNITE_BASE_URL}/.default

  connection:
    disable-ssl: False

logger:
  console:
    level: INFO
  file:
    path: "debug.log"
    level: DEBUG

# This is from our additions:
files:
  - path: "pumps.csv"
    key-column: serial_number
    destination:
      database: csv_assets
      table: pumps

  - path: "valves.csv"
    key-column: id
    destination:
      database: csv_assets
      table: valves
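Note that the YAML keys are hyphenated (key-column) while the data class fields use underscores (key_column). The sketch below illustrates that mapping for the files section using only the standard library. It is not how extractor-utils loads configuration internally. The build helper is hypothetical, and RawDestination is a local stand-in for RawDestinationConfig so that the example runs without the library installed.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict, List


# Local stand-in for RawDestinationConfig (assumed to hold a database and a table).
@dataclass
class RawDestination:
    database: str
    table: str


@dataclass
class FileConfig:
    path: str
    key_column: str
    destination: RawDestination


def build(cls: type, raw: Dict[str, Any]) -> Any:
    """Hypothetical helper: turn a hyphen-keyed mapping into a dataclass instance."""
    # Convert hyphenated keys (key-column) to field names (key_column).
    values = {key.replace("-", "_"): value for key, value in raw.items()}
    kwargs = {}
    for f in fields(cls):
        value = values[f.name]  # raises KeyError if a required key is missing
        # Recurse when the field's type is itself a dataclass.
        if is_dataclass(f.type) and isinstance(value, dict):
            value = build(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


# The structure a YAML parser would produce for the `files` section.
raw_files = [
    {"path": "pumps.csv", "key-column": "serial_number",
     "destination": {"database": "csv_assets", "table": "pumps"}},
    {"path": "valves.csv", "key-column": "id",
     "destination": {"database": "csv_assets", "table": "valves"}},
]
files: List[FileConfig] = [build(FileConfig, item) for item in raw_files]
```

After this, files[0].key_column holds the column name for pumps.csv, and files[1].destination.table holds the target table for valves.csv.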