Database Connectors
ETLBox provides components for reading, writing, and merging data in relational databases. These components ensure efficient data processing by utilizing streaming reads and bulk operations where possible.
General Database Compatibility
ETLBox supports the following databases:
Database | Native support | ODBC support | Remarks |
---|---|---|---|
SQL Server | ✓ | ✓ | + OleDb support |
PostgreSQL | ✓ | ✓ | |
SQLite | ✓ | ✓ | |
Oracle | ✓ | ✓ | |
MySQL | ✓ | ✓ | |
MariaDB | ✓ | ✓ | |
Db2 | ✓ | X | LUW + z/OS + Cloud |
SAP ASE (Sybase) | ✓ | ✓ | |
SAP HANA | ✓ | ✓ | |
Snowflake | ✓ | ✓ | |
Microsoft Access | X | ✓ | |
Apache Spark (SQL) | X | ✓ | |
Other databases | X | * | Generic ODBC support |
Custom connector | ** | ** | Using own implementation |
* Limited support for other databases is available through the generic ODBC or OleDb connection managers. However, these connections come with some limitations.
** A `CustomSource` or `CustomDestination` can be implemented to create a database connector with custom logic.
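As a rough illustration of the custom-connector route, the sketch below wires a `CustomSource` to arbitrary row-producing logic. The constructor shape (a row-producing function plus a completion check) and the `MyData` type are assumptions; the exact signature may differ between ETLBox versions.

```csharp
// Sketch only: a CustomSource fed by arbitrary custom logic.
// The two-delegate constructor (produce a row / signal completion)
// is an assumption - check the CustomSource docs for your version.
int readCount = 0;
var source = new CustomSource<MyData>(
    () => new MyData { Id = ++readCount, Name = $"Row {readCount}" },
    () => readCount >= 10 // stop after 10 rows
);
```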
Database Package Requirements
To use ETLBox with a specific database, you need to install the corresponding ETLBox database package. The core package `ETLBox` is always required, but the database-specific package ensures compatibility and optimized performance.
Database | Required Package |
---|---|
SQL Server | ETLBox.SqlServer |
PostgreSQL | ETLBox.Postgres |
SQLite | ETLBox.SQLite |
Oracle | ETLBox.Oracle |
MySQL | ETLBox.MySql |
MariaDB | ETLBox.MariaDb |
Db2 | ETLBox.Db2 |
SAP ASE (Sybase) | ETLBox.Sap.Ase |
SAP HANA | ETLBox.Sap.Hana |
Snowflake | ETLBox.Snowflake |
Microsoft Access | ETLBox.Odbc |
Apache Spark (SQL) | ETLBox.Odbc |
For ODBC-based connections, the generic `ETLBox.Odbc` package is available, but database-specific packages are recommended for best performance and feature support. For OleDb connections, reference the `ETLBox.OleDb` package.
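For orientation, a generic ODBC connection could look like the following sketch; the driver name and connection string are placeholders for whatever ODBC driver is installed on your system.

```csharp
// Sketch: generic ODBC connection (driver name and connection string are placeholders)
var odbcConn = new OdbcConnectionManager(
    "Driver={PostgreSQL Unicode};Server=myServer;Database=myDB;Uid=myUser;Pwd=myPass;");
var source = new DbSource<MyData>(odbcConn, "MyTable");
```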
Installing the Required Package
Use the following command to install the package for your database:
```bash
dotnet add package ETLBox.SqlServer
```
Replace `ETLBox.SqlServer` with the appropriate package name for your database. Once installed, the package provides the necessary connection manager and database-specific optimizations.
Managing Database Connections
To interact with a database, ETLBox uses connection managers, which handle communication and abstract database-specific differences. Each database type has its corresponding connection manager. Here is an example for `SqlConnectionManager`:

```csharp
var conn = new SqlConnectionManager("Data Source=myServer;Initial Catalog=myDB;User ID=myUser;Password=myPass;");
```
For PostgreSQL, the equivalent connection manager would be:
```csharp
var conn = new PostgresConnectionManager("Host=myServer;Database=myDB;Username=myUser;Password=myPass;");
```
This connection manager can then be used in any database component.
Reading Data with DbSource
`DbSource` streams data from a database into the data flow. It does not load all data into memory but processes rows as they are read.
- Reads data as a continuous stream, reducing memory usage
- Supports full table reads and custom SQL queries
- Handles data type conversion automatically
Code Snippet: Reading from a Table
```csharp
var conn = new SqlConnectionManager("Data Source=myServer;Initial Catalog=myDB;User ID=myUser;Password=myPass;");
var source = new DbSource<MyData>(conn, "MyTable");
```
Code Snippet: Using a SQL Query
```csharp
var conn = new PostgresConnectionManager("Host=myServer;Database=myDB;Username=myUser;Password=myPass;");
var source = new DbSource<MyData>(conn, "SELECT Id, Name FROM MyTable WHERE Active = 1");
```
If column names are unknown at design time, `ExpandoObject` can be used instead:
```csharp
var conn = new MySqlConnectionManager("Server=myServer;Database=myDB;User=myUser;Password=myPass;");
var source = new DbSource(conn, "MyTable"); // Uses ExpandoObject by default
```
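Downstream components can then access the columns dynamically. As a minimal sketch, assuming the table exposes `Id` and `Name` columns, a `RowTransformation` could read them like this:

```csharp
// Cast each ExpandoObject row to dynamic to access columns by name
// (assumes the table has Id and Name columns)
var transform = new RowTransformation<ExpandoObject>(row => {
    dynamic r = row;
    Console.WriteLine($"{r.Id}: {r.Name}");
    return row;
});
source.LinkTo(transform);
```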
Writing Data with DbDestination
`DbDestination` inserts data into a database using bulk insert statements that write batches of rows at once.
- Uses database-specific bulk operations for high performance
- Handles identity columns and column mappings automatically
Code Snippet: Writing to a Table
```csharp
var conn = new SQLiteConnectionManager("Data Source=mydatabase.db;");
var destination = new DbDestination<MyData>(conn, "TargetTable");
```
When possible, `DbDestination` uses native bulk operations (`BULK INSERT` for SQL Server, `COPY` for PostgreSQL). It also handles identity inserts and allows custom column mappings. The default batch size for bulk inserts is 1000 rows, but it can be adjusted: if set to 1, each row is inserted individually, which decreases performance due to the increased number of database calls. Larger batch sizes can improve efficiency but may require more memory.
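For example, assuming the batch size is exposed through a `BatchSize` property on the destination (a sketch; check the `DbDestination` reference for your version):

```csharp
var destination = new DbDestination<MyData>(conn, "TargetTable") {
    BatchSize = 5000 // fewer round trips per row, at the cost of more memory per batch
};
```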
Merging Data with DbMerge
`DbMerge` combines insert, update, and delete operations into a single step. It checks whether a record already exists and updates it; otherwise, it inserts a new row. It also supports deleting records, making it useful for full and delta loads.
- Uses database-specific merge commands or bulk operations where available
- Can define custom merge keys to determine how records are matched
- Supports full and delta mode for synchronization scenarios
- Allows configuring lookup caching strategies for better performance
Code Snippet: Merging Data
```csharp
var conn = new Db2ConnectionManager("Server=myServer;Database=myDB;UID=myUser;PWD=myPass;");
var merge = new DbMerge<MyData>(conn, "TargetTable");
```
By default, `DbMerge` performs batched lookups, loading source data into memory in bulk for each incoming batch to check whether a record exists in the target. It also supports different caching modes, such as preloading the entire target table into memory for faster lookups when processing large datasets.
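The full example below configures `CacheMode.Partial`; preloading the whole target table would presumably use a full cache mode instead. A sketch, assuming `CacheMode.Full` is the corresponding setting:

```csharp
var merge = new DbMerge<MyData>(conn, "TargetTable") {
    CacheMode = CacheMode.Full // assumption: preload the entire target table for lookups
};
```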
Example: Reading, Writing, and Merging
This example demonstrates how to read data from a database using `DbSource`, insert it into an archive table using `DbDestination`, and merge it into another table using `DbMerge`. A `Multicast` component is used to duplicate the data stream, ensuring that the same records are processed independently by both destinations.
Setting Up the Database
The source table `orders` contains customer transactions with an `OrderId`, `CustomerName`, and `OrderAmount`. The `order_archive` table stores a historical record of all processed orders, while `merged_orders` synchronizes data, ensuring that outdated records are removed and existing ones are updated.

Before execution, `order_archive` contains one existing entry, and `merged_orders` holds one record that should be updated and another that should be deleted. This setup allows us to demonstrate the behavior of `DbDestination` and `DbMerge` in a single ETL process.
Table Contents Before Execution
`orders` (source table)
OrderId | CustomerName | OrderAmount |
---|---|---|
1 | Alice | 100.50 |
2 | Bob | 200.75 |
3 | Charlie | 150.00 |
`order_archive` (insert destination)
OrderId | CustomerName | OrderAmount |
---|---|---|
0 | Archived | 100.00 |
`merged_orders` (merge destination)
OrderId | CustomerName | OrderAmount |
---|---|---|
0 | ToDelete | 100.00 |
1 | ToUpdate | 100.00 |
Defining the Data Model
```csharp
public class OrderData : MergeableRow {
    // IdColumn marks the property used to match source and target records
    [DbColumnMap("OrderId")]
    [IdColumn]
    public int Id { get; set; }

    // CompareColumn properties decide whether an existing record needs an update
    [DbColumnMap("CustomerName")]
    [CompareColumn]
    public string Name { get; set; }

    [DbColumnMap("OrderAmount")]
    [CompareColumn]
    public decimal Amount { get; set; }
}
```
ETL Process
```csharp
// Read data from source table
var source = new DbSource<OrderData>(conn, "orders");

// Create multicast component to duplicate data
var multicast = new Multicast<OrderData>();

// Define first destination: simple insert (archive orders)
var archiveDest = new DbDestination<OrderData>(conn, "order_archive");

// Define second destination: full merge (synchronize orders)
var mergeDest = new DbMerge<OrderData>(conn, "merged_orders") {
    MergeMode = MergeMode.Full,
    CacheMode = CacheMode.Partial
};

// Link components
source.LinkTo(multicast);
multicast.LinkTo(archiveDest);
multicast.LinkTo(mergeDest);

// Execute the network
Network.Execute(source);
```
Table Contents After Execution
`order_archive` (insert destination)
OrderId | CustomerName | OrderAmount |
---|---|---|
0 | Archived | 100.00 |
1 | Alice | 100.50 |
2 | Bob | 200.75 |
3 | Charlie | 150.00 |
`merged_orders` (merge destination)
OrderId | CustomerName | OrderAmount |
---|---|---|
1 | Alice | 100.50 |
2 | Bob | 200.75 |
3 | Charlie | 150.00 |
Explanation and Full Example
The `DbSource` component reads data from `orders` and sends it into the pipeline. The `Multicast` component duplicates the data, allowing two separate operations to take place. One copy is written directly to `order_archive` using `DbDestination`, ensuring all records are stored as a historical log. The other copy is processed by `DbMerge`, which updates existing records in `merged_orders`, removes obsolete ones, and inserts new data as needed.
Full Example on GitHub
The complete code for this example, including setup scripts and execution details, is available on GitHub. You can find it along with other ETLBox examples and demos.
Next Steps
The following sections provide more details on each component:
- Connection Management – How to set up and configure connection managers
- DbSource – How to extract data efficiently
- DbDestination – How bulk inserts improve performance
- DbMerge – How to synchronize datasets efficiently