Getting Started
Installation
As a Library
Install Data-Genie for use in your TypeScript/JavaScript project:
```bash
npm install @pujansrt/data-genie
```

As a CLI Tool
Install Data-Genie globally to use the declarative pipeline runner:
```bash
npm install -g @pujansrt/data-genie
```

Choose Your Path
Data-Genie supports two ways of building ETL pipelines:
- Declarative (CLI): Define pipelines in YAML. Fast to write, no code needed.
- Programmatic (API): Full control via TypeScript. Best for complex logic.
1. Declarative: YAML Pipelines
Create a pipeline.yaml file:
```yaml
pipeline:
  read:
    type: csv
    path: input.csv
  transform:
    ...
  write:
    type: json
    path: output.json
```

Run it instantly:

```bash
data-genie run pipeline.yaml
```

See the Declarative Pipelines Guide for full documentation on the YAML schema and all supported options.
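The transform step above is elided. Purely as an illustration of the shape such a step could take — these key names are assumptions, not Data-Genie's documented schema; consult the Declarative Pipelines Guide for the real options — a transform entry might look like:

```yaml
# Hypothetical transform steps; key names are illustrative only,
# not Data-Genie's actual YAML schema.
transform:
  - type: filter        # keep only rows matching a condition
    field: status
    equals: active
  - type: rename        # rename a column before writing
    from: user_id
    to: id
```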
Interactive Code Generator
Use the interactive generator below to build your TypeScript pipeline:
```typescript
import { CSVReader, Job, JsonWriter } from '@pujansrt/data-genie';

const reader = new CSVReader('input.csv');
const writer = new JsonWriter('output.json');

async function run() {
  const metrics = await Job.run(reader, writer);
  console.log('ETL Finished:', metrics);
}

run().catch(console.error);
```

2. Programmatic: CSV to JSON
Some features require additional peer dependencies. Only install them if you need that specific functionality:
- Zod: `npm install zod` (for Schema Validation)
- AWS S3: `npm install @aws-sdk/client-s3 @aws-sdk/lib-storage` (for S3 transport)
- Excel: `npm install exceljs` (for XlsxReader/Writer)
- Parquet: `npm install parquetjs-lite` (for ParquetReader/Writer)
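Because these peer dependencies are optional, a common Node.js pattern is to probe for them at runtime and surface a clear message only when the feature is actually used. A minimal sketch of that pattern (this helper is not part of the Data-Genie API):

```typescript
// Try to load an optional peer dependency; return null if it is not installed.
function tryRequire(name: string): unknown | null {
  try {
    return require(name);
  } catch {
    return null;
  }
}

// Example: only enable Excel support when exceljs is present.
const exceljs = tryRequire('exceljs');
if (exceljs === null) {
  console.log('exceljs not installed; XlsxReader/Writer unavailable');
}
```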
Quick Start: CSV to JSON
Convert a CSV file to JSON in just a few lines of code.
```typescript
import { CSVReader, JsonWriter, Job } from '@pujansrt/data-genie';

const reader = new CSVReader('users.csv');
const writer = new JsonWriter('output.json');

async function main() {
  // This will process any file size with minimal RAM
  const metrics = await Job.run(reader, writer);
  console.log('--- Job Completed ---');
  console.log(`Processed: ${metrics.recordCount} records`);
  console.log(`Duration: ${(metrics.durationMs / 1000).toFixed(2)}s`);
}

main().catch(console.error);
```

Pro Tip: For large jobs, you can listen to real-time events by instantiating the job: `new Job(reader, writer).on('progress', (m) => ...)`. See the Observability Guide for more.
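The `on('progress', ...)` hook follows Node's familiar EventEmitter pattern. As a self-contained illustration of that pattern — using Node's built-in `EventEmitter` as a stand-in for the actual `Job` class, with an assumed payload shape:

```typescript
import { EventEmitter } from 'events';

// Stand-in for a long-running job that reports progress; a real Job
// instance would emit 'progress' events the same way.
const job = new EventEmitter();

const seen: number[] = [];
job.on('progress', (m: { recordCount: number }) => {
  seen.push(m.recordCount);
});

// Simulate three progress ticks
for (const n of [100, 200, 300]) {
  job.emit('progress', { recordCount: n });
}

console.log(seen); // [ 100, 200, 300 ]
```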
Previewing Data
Before running a full job, you can preview the first few records to ensure your readers and transformers are working correctly.
```typescript
import { CSVReader, Job } from '@pujansrt/data-genie';

const reader = new CSVReader('large_data.csv');

// Displays a beautiful table in the console
await Job.preview(reader, { limit: 5 });
```

Generating Schemas Instantly
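If you just want to eyeball a handful of parsed records yourself, Node's built-in `console.table` produces a similar tabular view. A generic sketch of the idea — not `Job.preview`'s implementation:

```typescript
// Generic preview: print the first `limit` records as a console table.
function preview<T>(records: T[], limit = 5): T[] {
  const head = records.slice(0, limit);
  console.table(head);
  return head;
}

const rows = Array.from({ length: 10 }, (_, i) => ({ id: i, name: `user${i}` }));
const shown = preview(rows, 5);
```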
Don't waste time writing schemas for 100-column CSV files. Let Data-Genie infer them for you.
```typescript
import { CSVReader, Job } from '@pujansrt/data-genie';

const reader = new CSVReader('complex_data.csv');

// Sample the first 1000 records and generate schemas
const schema = await Job.inferSchema(reader);

console.log(schema.typescript); // Ready-to-use interface
console.log(schema.zod);        // Ready-to-use validation
console.log(schema.sql);        // Ready-to-use CREATE TABLE
```
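To make the idea concrete, here is a tiny, self-contained sketch of how type inference over sampled records can produce a TypeScript interface. This illustrates the technique only; it is not Data-Genie's actual implementation:

```typescript
type Row = Record<string, unknown>;

// Infer a field's type from the values observed in the sample.
function inferType(values: unknown[]): string {
  const types = new Set(values.map((v) => (v === null ? 'null' : typeof v)));
  types.delete('null');
  return types.size === 1 ? Array.from(types)[0] : 'unknown';
}

// Build a TypeScript interface declaration from sampled rows.
function toInterface(name: string, rows: Row[]): string {
  const fields = Object.keys(rows[0] ?? {});
  const lines = fields.map(
    (f) => `  ${f}: ${inferType(rows.map((r) => r[f]))};`
  );
  return `interface ${name} {\n${lines.join('\n')}\n}`;
}

const sample: Row[] = [
  { id: 1, name: 'Ada', active: true },
  { id: 2, name: 'Linus', active: false },
];
console.log(toInterface('UserRow', sample));
```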