Defining a config schema
We will start off by defining how we want our users to configure our extractor. cogex has already created a `config.py` file with a config class based on the `BaseConfig` class from extractor-utils. We need to extend this with a list of files the extractor should read from.
Config schemas are defined using data classes and type hints. For each file we want to read, we need to know:

- The path to the file. We can use `str` for this.
- Which column in the CSV file to use as the key in RAW. We can use `str` for this too.
- Which RAW database and table to write to. For this, we will use the pre-built `RawDestinationConfig` from `cognite.extractorutils.configtools`.
Our final config class for CSV files looks like the following:
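```python
from dataclasses import dataclass
from cognite.extractorutils.configtools import RawDestinationConfig

@dataclass
class FileConfig:
    path: str  # path to the CSV file
    key_column: str  # CSV column used as the row key in RAW
    destination: RawDestinationConfig  # RAW database and table to write to
```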
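In the YAML file, users write these fields in hyphenated form (`key-column` rather than `key_column`), and extractor-utils maps them onto the data class when it loads the config. The following hand-rolled sketch makes that mapping concrete; it is not how the library does it internally. It uses a hypothetical stand-in for `RawDestinationConfig` so it runs without extractor-utils installed, and `to_snake_case` and `file_config_from_dict` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Any, Dict


# Hypothetical stand-in for RawDestinationConfig so the sketch runs
# without cognite-extractor-utils installed.
@dataclass
class RawDestinationConfig:
    database: str
    table: str


@dataclass
class FileConfig:
    path: str
    key_column: str
    destination: RawDestinationConfig


def to_snake_case(data: Dict[str, Any]) -> Dict[str, Any]:
    # Rename hyphenated YAML keys (key-column) to Python field names (key_column)
    return {key.replace("-", "_"): value for key, value in data.items()}


def file_config_from_dict(raw: Dict[str, Any]) -> FileConfig:
    # Hypothetical helper: build a FileConfig from one parsed YAML list entry
    values = to_snake_case(raw)
    destination = RawDestinationConfig(**to_snake_case(values.pop("destination")))
    return FileConfig(destination=destination, **values)


# One entry of the `files` list, as a YAML parser would return it
entry = {
    "path": "pumps.csv",
    "key-column": "serial_number",
    "destination": {"database": "csv_assets", "table": "pumps"},
}
config = file_config_from_dict(entry)
print(config.key_column, config.destination.table)  # serial_number pumps
```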
We now need to update the auto-generated `Config` class with a list of files to extract from:
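```python
from dataclasses import field
from typing import List

@dataclass
class Config(BaseConfig):
    files: List[FileConfig]  # no default, so it must be declared before extractor
    extractor: ExtractorConfig = field(default_factory=ExtractorConfig)
```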
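Field order in `Config` matters because of how Python data classes build their `__init__`. A field without a default may not follow one that has a default, or the class definition fails with a `TypeError`. Also, on Python 3.11 and later, a data class instance such as `ExtractorConfig()` is rejected as a plain default value, and `field(default_factory=...)` works around this. The sketch below shows the ordering error and the `default_factory` fix. It uses a hypothetical stand-in `ExtractorConfig` (not the class cogex generates) and plain strings in place of `FileConfig`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExtractorConfig:
    """Hypothetical stand-in for the cogex-generated ExtractorConfig."""


# A non-default field after a defaulted one fails when the class is created.
error_message = ""
try:
    @dataclass
    class BrokenConfig:
        extractor: ExtractorConfig = field(default_factory=ExtractorConfig)
        files: List[str]
except TypeError as err:
    error_message = str(err)
print(error_message)


# Declaring the field without a default first works.
@dataclass
class WorkingConfig:
    files: List[str]
    extractor: ExtractorConfig = field(default_factory=ExtractorConfig)


first = WorkingConfig(files=["pumps.csv"])
second = WorkingConfig(files=["valves.csv"])
# default_factory calls ExtractorConfig() once per instance, so nothing is shared
print(first.extractor is second.extractor)  # False
```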
This means that our users can now configure the extractor like so:
```yaml
# This comes from BaseConfig:
cognite:
  project: publicdata
  idp-authentication:
    token-url: ${COGNITE_TOKEN_URL}
    client-id: ${COGNITE_CLIENT_ID}
    secret: ${COGNITE_CLIENT_SECRET}
    scopes:
      - ${COGNITE_BASE_URL}/.default
  connection:
    disable-ssl: False

logger:
  console:
    level: INFO
  file:
    path: "debug.log"
    level: DEBUG

# This is from our additions:
files:
  - path: "pumps.csv"
    key-column: serial_number
    destination:
      database: csv_assets
      table: pumps
  - path: "valves.csv"
    key-column: id
    destination:
      database: csv_assets
      table: valves
```
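The `${...}` placeholders in the `cognite` section are replaced with the values of the named environment variables when extractor-utils loads the file, so credentials never have to be written into the config itself. As a rough illustration only (not the library's implementation), Python's standard `string.Template` understands the same `${NAME}` syntax. The client ID and secret values below are made-up placeholders.

```python
import os
from string import Template

raw_yaml = """\
cognite:
  project: publicdata
  idp-authentication:
    client-id: ${COGNITE_CLIENT_ID}
    secret: ${COGNITE_CLIENT_SECRET}
"""

# Made-up example values; in practice these come from the shell environment.
os.environ["COGNITE_CLIENT_ID"] = "my-client-id"
os.environ["COGNITE_CLIENT_SECRET"] = "my-secret"

# substitute() raises KeyError if a referenced variable is not set,
# which surfaces a missing secret before any parsing happens.
expanded = Template(raw_yaml).substitute(os.environ)
print(expanded)
```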