Declarative Pipelines (YAML)
Declarative pipelines allow you to define your ETL logic in a simple YAML configuration file. This is the preferred way to use Data-Genie for non-developers, CI/CD pipelines, or when you want to keep your ETL logic separate from your application code.
Quick Start (Minimal Example)
The simplest pipeline converts one format to another:
# pipeline.yaml
pipeline:
  read: { type: csv, path: input.csv }
  write: { type: json, path: output.json }

Run it via the CLI:

data-genie run pipeline.yaml

Note: Ensure you have installed the CLI globally via npm i -g @pujansrt/data-genie
Interactive Pipeline Builder
Use the interactive builder to visually assemble your pipeline and copy the generated YAML, for example:
job:
  name: "Daily Sales Sync"
  showProgress: true
pipeline:
  read:
    type: csv
    path: input.csv
  write:
    type: json
    path: output.json
Schema Overview
The following illustrates the full pipeline schema. Mandatory fields are marked with # *.
version: "1.0"            # Optional
job:                      # Optional
  name: "Daily Sales Sync"
  showProgress: true
pipeline:                 # *
  read:                   # *
    type: csv             # * (options: csv, json, ndjson)
    path: input.csv       # *
    options:              # Optional
      delimiter: ","
      hasHeader: true
  transform:              # Optional (list of steps)
    - type: filter
      expression: "amount > 100"
    - type: rename
      mapping:
        old_name: new_name
  write:                  # *
    type: json            # * (options: csv, json, ndjson, console)
    path: out.json        # * (except for the console writer)

Readers
The read section defines the source of your data.
CSV Reader
Reads data from a CSV file.
read:
  type: csv
  path: data.csv
  options:
    delimiter: ","
    hasHeader: true

JSON / NDJSON Reader
Reads data from a JSON or NDJSON file; use type: ndjson for newline-delimited JSON.
read:
  type: json
  path: data.json

Transformers
The transform section is an optional list of steps.
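Since transform is a list, multiple steps can be combined in a single pipeline; they are presumably applied in the order listed. A sketch chaining the documented step types (field names are illustrative):

```yaml
transform:
  - type: filter                # keep only high-value records
    expression: "amount > 100"
  - type: rename                # normalize field names
    mapping:
      firstName: first_name
  - type: type-convert          # coerce string values to integers
    fields: ["amount"]
    to: int
  - type: select                # keep only the fields we need
    fields: ["first_name", "amount"]
```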
Filter
Filters records based on a logical expression.
- type: filter
  expression: "age >= 18 && status == 'active'"

Rename
Renames one or more fields.
- type: rename
  mapping:
    firstName: first_name
    lastName: last_name

Select / Remove
Keeps only the listed fields.
- type: select
  fields: ["id", "email"]

Type Convert
Converts field values to a specific type.
- type: type-convert
  fields: ["age", "count"]
  to: int

PII Masking
Masks sensitive information.
- type: pii-masking
  masks:
    email: redact   # Options: redact, hash, partial, null

Writers
The write section defines where the processed data should be saved.
File Writers (CSV, JSON, NDJSON)
write:
  type: csv
  path: output.csv

Console Writer
Outputs the records directly to the terminal.
write:
  type: console

Current Limitations
While declarative pipelines are convenient, they currently have some limitations compared to the Programmatic API:
- Unsupported Destinations: You cannot yet write to AWS S3 or SQL databases via YAML. These require the TypeScript/Node.js API.
- Custom Logic: Complex transformations that require custom JavaScript functions (e.g., calling an external API mid-stream) are not supported in YAML.
- Complex Sinks: Advanced writer patterns like MultiWriter (fan-out) or RetryingWriter are not yet exposed via the declarative schema.
If you need these features, please refer to the Programmatic Pipelines Guide.
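For reference, here is a sketch of a complete pipeline combining the readers, transformers, and writers documented above. File paths and field names are illustrative, and it assumes transform steps run in list order:

```yaml
version: "1.0"
job:
  name: "Daily Sales Sync"
  showProgress: true
pipeline:
  read:
    type: csv
    path: sales.csv
    options:
      delimiter: ","
      hasHeader: true
  transform:
    - type: filter                  # drop low-value records
      expression: "amount > 100"
    - type: rename                  # normalize field names
      mapping:
        customerEmail: customer_email
    - type: pii-masking             # mask the renamed field
      masks:
        customer_email: hash
  write:
    type: ndjson
    path: sales_clean.ndjson
```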