Database Connectors

ETLBox provides components for reading, writing, and merging data in relational databases. These components process data efficiently, using streaming reads and bulk operations where possible.

General Database Compatibility

ETLBox supports the following databases:

| Database           | Native support | ODBC support | Remarks                   |
|--------------------|----------------|--------------|---------------------------|
| SQL Server         | ✓              |              | + OleDb support           |
| PostgreSQL         | ✓              |              |                           |
| SQLite             | ✓              |              |                           |
| Oracle             | ✓              |              |                           |
| MySQL              | ✓              |              |                           |
| MariaDB            | ✓              |              |                           |
| Db2                | ✓              | ✓            | LUW + z/OS + Cloud        |
| SAP ASE (Sybase)   | ✓              |              |                           |
| SAP HANA           | ✓              |              |                           |
| Snowflake          | ✓              |              |                           |
| Microsoft Access   |                | ✓            |                           |
| Apache Spark (SQL) |                | ✓            |                           |
| Other databases    |                | ✓*           | Generic ODBC support      |
| Custom connector   | **             | **           | Using own implementation  |

* Support for other databases is available through the generic ODBC or OleDb connection managers; these connections come with some limitations.

** A CustomSource or CustomDestination can be implemented to create a database connector with custom logic.
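
As a minimal sketch of the custom-connector approach, a CustomSource can wrap any data access logic. The in-memory list below is illustrative and would be replaced by calls to a proprietary driver or API:

// A minimal sketch: wrapping an in-memory list as a source.
var rows = new List<string> { "Alice", "Bob" };
int index = 0;
var customSource = new CustomSource<string>(
    () => rows[index++],        // produces the next row
    () => index >= rows.Count   // returns true once all rows are read
);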

Database Package Requirements

To use ETLBox with a specific database, you need to install the corresponding ETLBox database package. The core package ETLBox is always required, but the database-specific package ensures compatibility and optimized performance.

| Database           | Required Package |
|--------------------|------------------|
| SQL Server         | ETLBox.SqlServer |
| PostgreSQL         | ETLBox.Postgres  |
| SQLite             | ETLBox.SQLite    |
| Oracle             | ETLBox.Oracle    |
| MySQL              | ETLBox.MySql     |
| MariaDB            | ETLBox.MariaDb   |
| Db2                | ETLBox.Db2       |
| SAP ASE (Sybase)   | ETLBox.Sap.Ase   |
| SAP HANA           | ETLBox.Sap.Hana  |
| Snowflake          | ETLBox.Snowflake |
| Microsoft Access   | ETLBox.Odbc      |
| Apache Spark (SQL) | ETLBox.Odbc      |

For ODBC-based connections, the generic ETLBox.Odbc package is available, but database-specific packages are recommended for best performance and feature support. For OleDb connections, reference the ETLBox.OleDb package.
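
A sketch of creating these generic connection managers (the connection strings are illustrative and depend on the installed driver or provider):

// Generic ODBC connection, using the ETLBox.Odbc package
var odbcConn = new OdbcConnectionManager("Dsn=MyDsn;Uid=myUser;Pwd=myPass;");

// OleDb connection, using the ETLBox.OleDb package
var oleDbConn = new OleDbConnectionManager("Provider=SQLOLEDB;Data Source=myServer;Initial Catalog=myDB;User Id=myUser;Password=myPass;");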

Installing the Required Package

Use the following command to install the package for your database:

dotnet add package ETLBox.SqlServer

Replace ETLBox.SqlServer with the appropriate package name for your database. Once installed, the package provides the necessary connection manager and database-specific optimizations.

Managing Database Connections

To interact with a database, ETLBox uses connection managers, which handle communication and abstract away database-specific differences. Each database type has a corresponding connection manager. Here is an example using the SqlConnectionManager for SQL Server:

var conn = new SqlConnectionManager("Data Source=myServer;Initial Catalog=myDB;User ID=myUser;Password=myPass;");

For PostgreSQL, the equivalent connection manager would be:

var conn = new PostgresConnectionManager("Host=myServer;Database=myDB;Username=myUser;Password=myPass;");

This connection manager can then be used in any database component.

Reading Data with DbSource

DbSource streams data from a database into the data flow. It does not load all data into memory but processes rows as they are read.

  • Reads data as a continuous stream, reducing memory usage
  • Supports full table reads and custom SQL queries
  • Handles data type conversion automatically
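
The snippets below use a strongly typed object whose properties are matched to the source columns. A minimal sketch of such a class (the property names are assumed to match MyTable):

public class MyData {
    public int Id { get; set; }       // mapped to the Id column
    public string Name { get; set; }  // mapped to the Name column
}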

Code Snippet: Reading from a Table

var conn = new SqlConnectionManager("Data Source=myServer;Initial Catalog=myDB;User ID=myUser;Password=myPass;");
var source = new DbSource<MyData>(conn, "MyTable");

Code Snippet: Using a SQL Query

var conn = new PostgresConnectionManager("Host=myServer;Database=myDB;Username=myUser;Password=myPass;");
var source = new DbSource<MyData>(conn, "SELECT Id, Name FROM MyTable WHERE Active = 1");

If column names are unknown at design time, ExpandoObject can be used instead:

var conn = new MySqlConnectionManager("Server=myServer;Database=myDB;User=myUser;Password=myPass;");
var source = new DbSource(conn, "MyTable"); // Uses ExpandoObject by default
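
Rows read as ExpandoObject can be processed dynamically in downstream components. Here is a minimal sketch using a RowTransformation (the column names Id and Name are assumed):

var transform = new RowTransformation(row => {
    dynamic r = row;                         // ExpandoObject rows support dynamic access
    Console.WriteLine($"{r.Id}: {r.Name}");
    return row;
});
source.LinkTo(transform);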

Writing Data with DbDestination

DbDestination writes data into a database table using bulk inserts, sending batches of rows in a single statement.

  • Uses database-specific bulk operations for high performance
  • Handles identity columns and column mappings automatically

Code Snippet: Writing to a Table

var conn = new SQLiteConnectionManager("Data Source=mydatabase.db;");
var destination = new DbDestination<MyData>(conn, "TargetTable");

When possible, DbDestination uses native bulk operations (BULK INSERT for SQL Server, COPY for PostgreSQL). It also handles identity inserts and allows custom column mappings. The default batch size for bulk inserts is 1000 rows, but it can be adjusted: if set to 1, each row is inserted individually, which decreases performance due to the increased number of database calls, while larger batch sizes can improve throughput at the cost of more memory.
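
A sketch of adjusting the batch size via the BatchSize property:

var destination = new DbDestination<MyData>(conn, "TargetTable") {
    BatchSize = 5000  // larger batches mean fewer database calls, but more memory
};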

Merging Data with DbMerge

DbMerge combines insert, update, and delete operations into a single step. If a record already exists in the target, it is updated; otherwise, a new row is inserted. It also supports deleting records that no longer exist in the source, which makes it useful for full and delta loads.

  • Uses database-specific merge commands or bulk operations where available
  • Can define custom merge keys to determine how records are matched (see the sketch after this list)
  • Supports full and delta mode for synchronization scenarios
  • Allows configuring lookup caching strategies for better performance
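
Merge keys are defined on the data object via attributes. A minimal sketch using IdColumn and CompareColumn (the same pattern appears in the full example further below):

// For merging, the data object derives from MergeableRow and
// marks its key and comparison properties
public class MyData : MergeableRow {
    [IdColumn]        // records are matched on this property
    public int Id { get; set; }

    [CompareColumn]   // used to detect whether an existing record changed
    public string Name { get; set; }
}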

Code Snippet: Merging Data

var conn = new Db2ConnectionManager("Server=myServer;Database=myDB;UID=myUser;PWD=myPass;");
var merge = new DbMerge<MyData>(conn, "TargetTable");

By default, DbMerge performs batched lookups: for each incoming batch, it loads the matching records from the destination table into memory to check whether a record already exists. It also supports different caching modes, such as preloading the entire destination table into memory for faster lookups when processing large datasets.
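
For instance, preloading the whole destination table could look like this (CacheMode values as used in the full example below):

var merge = new DbMerge<MyData>(conn, "TargetTable") {
    CacheMode = CacheMode.Full  // load the entire destination table once for lookups
};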

Example: Reading, Writing, and Merging

This example demonstrates how to read data from a database using DbSource, insert it into an archive table using DbDestination, and merge it into another table using DbMerge. A Multicast component is used to duplicate the data stream, ensuring that the same records are processed independently by both destinations.

[Diagram: orders table → DbSource → Multicast → DbDestination (order_archive) and DbMerge (merged_orders)]

Setting Up the Database

The source table orders contains customer transactions with an OrderId, CustomerName, and OrderAmount. The order_archive table stores a historical record of all processed orders, while merged_orders is kept synchronized with the source, ensuring that outdated records are removed and existing ones are updated.

Before execution, order_archive contains one existing entry, and merged_orders holds one record that should be updated and another that should be deleted. This setup allows us to demonstrate the behavior of DbDestination and DbMerge in a single ETL process.

Table Contents Before Execution

orders (source table)

| OrderId | CustomerName | OrderAmount |
|---------|--------------|-------------|
| 1       | Alice        | 100.50      |
| 2       | Bob          | 200.75      |
| 3       | Charlie      | 150.00      |

order_archive (insert destination)

| OrderId | CustomerName | OrderAmount |
|---------|--------------|-------------|
| 0       | Archived     | 100.00      |

merged_orders (merge destination)

| OrderId | CustomerName | OrderAmount |
|---------|--------------|-------------|
| 0       | ToDelete     | 100.00      |
| 1       | ToUpdate     | 100.00      |

Defining the Data Model

public class OrderData : MergeableRow {
    [DbColumnMap("OrderId")]
    [IdColumn]
    public int Id { get; set; }

    [DbColumnMap("CustomerName")]
    [CompareColumn]
    public string Name { get; set; }

    [DbColumnMap("OrderAmount")]
    [CompareColumn]
    public decimal Amount { get; set; }
}

ETL Process

// Create the connection manager (a SQL Server connection is assumed here;
// any supported database works the same way)
var conn = new SqlConnectionManager("Data Source=myServer;Initial Catalog=myDB;User ID=myUser;Password=myPass;");

// Read data from the source table
var source = new DbSource<OrderData>(conn, "orders");

// Create multicast component to duplicate data
var multicast = new Multicast<OrderData>();

// Define first destination: simple insert (archive orders)
var archiveDest = new DbDestination<OrderData>(conn, "order_archive");

// Define second destination: full merge (synchronize orders)
var mergeDest = new DbMerge<OrderData>(conn, "merged_orders") {
    MergeMode = MergeMode.Full,
    CacheMode = CacheMode.Partial
};

// Link components
source.LinkTo(multicast);
multicast.LinkTo(archiveDest);
multicast.LinkTo(mergeDest);

// Execute the network
Network.Execute(source);

Table Contents After Execution

order_archive (insert destination)

| OrderId | CustomerName | OrderAmount |
|---------|--------------|-------------|
| 0       | Archived     | 100.00      |
| 1       | Alice        | 100.50      |
| 2       | Bob          | 200.75      |
| 3       | Charlie      | 150.00      |

merged_orders (merge destination)

| OrderId | CustomerName | OrderAmount |
|---------|--------------|-------------|
| 1       | Alice        | 100.50      |
| 2       | Bob          | 200.75      |
| 3       | Charlie      | 150.00      |

Explanation and Full Example

The DbSource component reads data from orders and sends it into the pipeline. The Multicast component duplicates the data, allowing two separate operations to take place. One copy is written directly to order_archive using DbDestination, ensuring all records are stored as a historical log. The other copy is processed by DbMerge, which updates existing records in merged_orders, removes obsolete ones, and inserts new data as needed.

Full Example on GitHub

The complete code for this example, including setup scripts and execution details, is available on GitHub, along with other ETLBox examples and demos.

Next Steps

The following sections provide more details on each component:

  • Connection Management – How to set up and configure connection managers
  • DbSource – How to extract data efficiently
  • DbDestination – How bulk inserts improve performance
  • DbMerge – How to synchronize datasets efficiently