Overview

ETLBox provides Streaming Connectors that enable efficient data movement between files, web services, and cloud storage. These connectors allow you to read, process, and write various data formats, supporting both local file storage and remote HTTP or cloud-based endpoints.

Supported Formats

ETLBox offers streaming connectors for the following formats:

  • CSV – Structured text with delimiters
  • JSON – Flexible structured data
  • XML – Hierarchical structured data
  • Excel – Spreadsheet-based tabular data
  • Text – Unstructured or custom-formatted text
  • Parquet – Optimized columnar storage

Each format has a corresponding source (for reading) and destination (for writing), allowing seamless integration into ETL workflows.

Installation

Each format has a dedicated connector package that must be installed alongside the core ETLBox package. For example, to work with JSON, install the ETLBox.Json package.

Connector-Specific Highlights

Format  | Package        | Special Features
CSV     | ETLBox.Csv     | Uses CsvHelper, supports delimiter configuration
JSON    | ETLBox.Json    | Supports nested structures, JSONPath queries
XML     | ETLBox.Xml     | Handles attributes & elements, namespace support
Excel   | ETLBox.Excel   | Reads sheets, column mapping, limited row capacity
Text    | ETLBox.Text    | Custom parsing via ParseLineAction
Parquet | ETLBox.Parquet | Optimized for big data, column-based storage

If you need to read binary data, consider using CustomSource and CustomDestination. These connectors allow you to implement custom parsing logic for non-standard data formats.

Key Features

Unified Streaming Model

All connectors share a common architecture, meaning you can use the same API structure to read or write different formats.

For example, switching from a local CSV file to a JSON-based web API requires minimal code changes:

CsvSource source = new CsvSource("data.csv");
source.ResourceType = ResourceType.File;

// Switching to JSON API
JsonSource apiSource = new JsonSource("https://api.example.com/data");
apiSource.ResourceType = ResourceType.Http;
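
The sources above only become useful once they are linked to a destination and the flow is executed. The following is a minimal sketch of such a flow, not a definitive implementation. It assumes a recent ETLBox version in which the `ETLBox`, `ETLBox.Csv`, and `ETLBox.DataFlow` namespaces, `MemoryDestination`, and `Network.Execute` are available, and that a file named data.csv exists in the working directory. Check the namespaces and the execution call against the version you have installed.

```csharp
using System;
using ETLBox;            // assumed namespace for ResourceType
using ETLBox.Csv;        // assumed namespace for CsvSource
using ETLBox.DataFlow;   // assumed namespace for MemoryDestination and Network

public static class CsvToMemorySketch
{
    public static void Main()
    {
        // Read rows from a local CSV file (same source as in the snippet above).
        var source = new CsvSource("data.csv");
        source.ResourceType = ResourceType.File;

        // Collect the rows in memory so they can be inspected afterwards.
        var destination = new MemoryDestination();

        // Connect source and destination, then run the whole flow.
        source.LinkTo(destination);
        Network.Execute(source);

        Console.WriteLine($"Rows read: {destination.Data.Count}");
    }
}
```

Replacing the CsvSource with a JsonSource that points at an HTTP endpoint should leave the linking and execution lines unchanged, which is the point of the shared model.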

Flexible Resource Types

Each connector can work with multiple resource types:

  • File-based: Local storage or network shares (e.g., "C:/data/file.csv")
  • HTTP-based: REST APIs (GET/POST) (e.g., "https://api.example.com/data")
  • Azure Blob Storage: Cloud storage integration

Example: Reading from Azure Blob Storage

CsvSource source = new CsvSource("dataset.csv");
source.ResourceType = ResourceType.AzureBlob;
source.AzureBlobStorage.ConnectionString = "<your_connection_string>";
source.AzureBlobStorage.ContainerName = "data-container";

Streaming & Buffering

  • Data is streamed record-by-record instead of loading the entire dataset into memory.
  • The buffer size is adjustable via MaxBufferSize, which lets the system hold processed rows in a buffer for faster throughput.
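
To illustrate why a bounded buffer keeps memory use flat while rows are streamed, here is a small sketch in plain .NET. It is not ETLBox's implementation. The class name, the `Run` method, and the `maxBufferSize` parameter are hypothetical and only mirror the idea behind MaxBufferSize: the producer has to wait whenever the buffer is full, so at most `maxBufferSize` rows are held in memory at any time.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class BoundedBufferSketch
{
    // Streams totalRows items through a buffer that never holds
    // more than maxBufferSize items, and returns how many were consumed.
    public static int Run(int totalRows, int maxBufferSize)
    {
        using var buffer = new BlockingCollection<int>(boundedCapacity: maxBufferSize);

        // Producer: Add blocks while the buffer is full.
        var producer = Task.Run(() =>
        {
            for (int row = 0; row < totalRows; row++)
            {
                buffer.Add(row);
            }
            buffer.CompleteAdding(); // signal that no more rows will arrive
        });

        // Consumer: handles rows one at a time as they become available.
        int consumed = 0;
        foreach (int row in buffer.GetConsumingEnumerable())
        {
            consumed++;
        }

        producer.Wait();
        return consumed;
    }

    public static void Main()
    {
        int consumed = Run(totalRows: 10_000, maxBufferSize: 100);
        Console.WriteLine($"Consumed {consumed} rows");
    }
}
```

A larger buffer lets a fast producer run further ahead of a slow consumer at the cost of more memory. A smaller buffer does the opposite.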

Paging Support for API Requests

For web-based data sources, ETLBox supports pagination:

JsonSource<MyRow> source = new JsonSource<MyRow>();
int page = 1;
source.GetNextUri = streamMetaData => $"https://api.example.com/data?page={page++}";
source.HasNextUri = streamMetaData => streamMetaData.ProcessedRows > 0;

Schema Flexibility

ETLBox supports different ways to map data from streaming sources:

POCO Mapping (Plain Old C# Objects)

When reading structured data, you can map it directly to a C# class:

public class MyDataRow {
    public int Id { get; set; }
    public string Name { get; set; }
}

CsvSource<MyDataRow> source = new CsvSource<MyDataRow>("data.csv");

Dynamic Objects (ExpandoObject)

For unknown schemas, use ExpandoObject to handle flexible structures:

CsvSource<dynamic> source = new CsvSource<dynamic>("data.csv");
source.RowModificationAction = (row, meta) => {
    dynamic r = row;
    Console.WriteLine($"ID: {r.Id}, Name: {r.Name}");
};
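
When the column names are not known at compile time, accessing r.Id directly is not possible. The sketch below uses only the .NET base library, not ETLBox, and shows that an ExpandoObject can also be treated as a dictionary so that all columns can be enumerated. The row values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

public static class ExpandoSketch
{
    public static void Main()
    {
        // Build a row the way a dynamic source might: properties are added at runtime.
        dynamic row = new ExpandoObject();
        row.Id = 1;
        row.Name = "Alice";

        // Known column names can be accessed like regular properties.
        Console.WriteLine($"ID: {row.Id}, Name: {row.Name}");

        // Unknown column names can be discovered through the dictionary view.
        var columns = (IDictionary<string, object>)row;
        foreach (KeyValuePair<string, object> column in columns)
        {
            Console.WriteLine($"{column.Key} = {column.Value}");
        }
    }
}
```

The same dictionary cast can be applied inside a row callback such as the one above when the set of columns varies between files.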

Attribute-Based Mapping

This example demonstrates how CsvHelper’s attribute-based mapping can be used to rename and reorder columns. Since ETLBox uses third-party libraries internally, attribute configurations vary by connector.

public class MyDataRow {
    [CsvHelper.Configuration.Attributes.Name("Identifier")]
    public int Id { get; set; }

    [CsvHelper.Configuration.Attributes.Index(1)]
    public string FullName { get; set; }
}

CsvSource<MyDataRow> source = new CsvSource<MyDataRow>("data.csv");