Relational databases
A detailed overview of the DbSource and DbDestination connector.
On this page
If you want to start with example code right away, you will find it in the recipes section for the DbSource and DbDestination. The components are also used frequently in other examples.
DbSource
The DbSource gives you access to any database table or with your database. Or you can directly pass an sql statement that is used as source. The use of the DbSource is very straight forward. You simply pass a connection manager (the right one for your database) and a table name.
Let’s assume that SourceTable
has two columns and is defined like this:
Now we can create our strongly typed object (POCO) to match with with this definition:
Let’s assume we create a POCO (Plain old component object) MySimpleRow
that looks like this:
Now we can set up our data flow like this:
ETLBox will automatically extract missing metadata during runtime from the source table and the involved types. In our example, the source table has columns with exact same name as the object - ETLBox will check this and write data from the Id column into the Id property, and data from the column Value into the Value property. Each record in the source will create a new object that is then passed to the connected components.
In case you want to test your data flow, you could connect your database source to a memory destination and check the data that you retrieved:
Working with SQL statements
For the DbSource
, you can also specify some SQL code to retrieve your data:
Please note that providing a TableName
is not necessary when you use a SQL statement.
There is no particular limitation if you are using custom sql code to retrieve data. For example, you could use a complex query like this:
You could also call a CTE or a stored procedure / function in your SQL.
Note
SELECT * FROM table
could lead to problems, as there are no column names in the SQL that ETLBox could use to map with a property in your object. It is recommended to change this SELECT Id, Value FROM table
or to provide a TableDefinition
object containing the meta data.Query parameters in sql statements
Instead of providing a static string, you can also use a parameterized query. A parameterized query contains variable placeholders that normally start with @
(exception: Oracle uses the :
). Each occurrence of a parameter is then replaced with the actual provided value in a parameter list.
Here is an example:
Using query parameters can improve performance (e.g., your database can reuse an already calculated plan when running the statement again with different parameters). Also, this reduces the risk of sql injections - any special characters in your parameter are escaped before they are inserted in the sql string.
Using dynamic object
The default implementation of DbSource will use an ExpandoObject. This dynamic object will then have properties with the same names as the columns in your source.
No object is needed when using this. Make sure that all other components also either use the default implementation, or alternatively you cast the ExpandoObject into an object or array of your choice. This can be done e.g. with a RowTransformation
Using string arrays
Also you can use the DbSource to read your data directly into an array. This could be a string array. The order of the columns of your table or your SQL code then matches the order in your array. Also, you don’t need any other object definition then.
Automatic meta data retrieval
ETLBox will try to automatically collect the correct meta data information from the provided information.
If you are passing a table name, a SQL query is sent to the database to check if the table exists and then retrieve additional information about the table (including column names, data types, identity columns, keys, default values, and computed values). This will happen when reading from a table and also before writing into a table.
In the simple case where you provide the TableName
property, this table is queried for the meta data information.
If you are providing a SQL statement, ETLBox will try to parse the provided SQL statement to extract the column names from the query. This will only works if you are explicitly define the column names in a query. Whenever there are columns names in the query, e.g. SELECT a.Col1, CAST(b.Test2 AS INT) AS Col2) ...
, the corresponding column names Test2
and Col2
can be extracted from the sql. Unfortunately, this won’t work for stored procedures or the *
operation: EXEC TestProcedure
or SELECT * FROM table
does not provide any information about the column names.
Nevertheless, ETLBox has 3 fallback mechanism in this case:
- If a
TableDefinition
is provided, this information will be used instead. Please note that if you provide your ownTableDefinition
, this will always be used primarily. - If the
TableName
was set, ETLBox will read and use the meta data of this table instead. - The underlying ADO.NET connector may also have information about the column names, which is then used.
DbDestination
The DbDestination will write that data from your flow into the a table. Like the DbSource, you need to pass a connection manager and the destination table name. For any property in your object, the data will be written into the table if the column names match the property name.
If your table has the columns Id and/or Value, the data of your flow will be written into this columns.
Using dynamic objects
Of course you can also use the default implementation of the DbDestination to write data into a table.
As with an object, the properties of the ExpandoObject are used to map the values to the right columns. Only if the ExpandoObject object has a property with the same name as the column in the destination table, data is written into this column. Unfortunately, the Column mapping attributes are not working here.
Using arrays
You can also use the DbDestination with array.
The data is written into the columns in the same order as it is stored in the array. E.g., if your string array has three values, these values are stored into the first, second and third column of your destination table. If your destination table has more columns, these will be ignored. Identity columns (or auto increment / serial values) are ignored.
Batch Size
By default, the DbDestination will create batches of data that are then inserted in whole into the database. This is faster than creating a single insert for each incoming row. So, the DbDestination is a little bit different from the other destinations: It will always wait until it has received the full number of rows needed for a batch and then perform the insert. The default batch size is 1000. You can play around with the batch size to gain higher performance. 1000 rows per batch is a solid value for most operations. If you encounter the issue that inserted the data into the destinations takes too long, try reducing the batch size significantly.
BatchSize in Odbc & OleDb
If you leave the default value for batch size set, it will be changed to 100 rows for Odbc and OleDb connections. As the connection here is much slower than “native” connections, and bulk inserts need to be translated into “INSERT INTO” statements, 100 rows per batch leads to a much better performance than 1000 rows.
Allowing identity inserts
By default, identity columns (also called auto-increment or serial columns) are left untouched when writing into a destination table. Even if you have a property with the same name or a column mapping set up for an identity column, by default no value is written from your data into the column - the identity column will auto generate a unique (and ascending) key.
Sometimes you want to overwrite the auto-generated generated values with the values that you provide with your data. To force the DbDestination to insert values for an identity column, you can set the property AllowIdentityInsert
to true. If set to true, the DbDestination will try to overwrite the identity column with the provided value in your data. (Make sure there is a corresponding property in your data object mapped to the identity column.
Column Mapping
Of course, the property names in the object and the column names can differ - ETLBox will only load columns from a source where it can find a matching property with the same name. If the data type is different, ETLBox will try to automatically convert the data. If the names are different, you can use the attribute DbColumnMap to define the matching columns name for a property.
Let’s reconsider our example at the beginning. We create a table like this:
Now we could define our POCO using the DbColumnMap
attribute. In this example, we replace the property Id with the property Key. In order to still be able to read data from the Id column, we add the DbColumnMap attribute above it. Please note that the data type was changed to string as well - ETLBox will automatically try to convert the integer values into a string if data types do not match.
The setup of our data flow would be left untouched:
Column Mapping with ExpandoObject
If you use the default implementation of DbSource/DbDestination, then the ExpanoObject will be used internally. This dynamic object doesn’t allow you to set attributes as decorators for property. Instead you can pass the attributes manually to the ColumnMapping
property.
Column Converters
Both DbSource and DbDestination allows the use of column converters. These are special converter function that are executed for every record in a particular column. The idea of this converters is to do special data conversion directly in the source/destination which are not supported by ETLBox. E.g. for SQLServer, it is quite common to use a date format like “20200101”. This date format is not supported by the C# DateTime object, so you need to define your own column conversion function if you want to write this date format directly from a string value into a DbDestination.
Here is an example how to write the string value “20200101” into a database column “DateCol”.
Table Definitions
If you pass a table name to a DBsource
or DbDestination
or a SQL statement to a DbSource
, the meta data
of the table is automatically derived from that table or sql statement by ETLBox. For table or views this is done via a SQL statement that queries
system information, and for the SQL statement this is done via parsing the statement.
If you don’t want ETLBox to read this information, or if you want to provide your own meta information,
you can pass a TableDefinition
instead.
This could look like this:
ETLBox will use this meta data instead to get the right column names.
Bulk operations
The DbSource
by default will read data from a table or database query result without filtering or modifying the query, and the DbDestination
by default will simply insert the arriving data into the table. But the DbSource
also offers a BulkSelect
operation mode, and the DbDestination
offers a BulkUpdate
and BulkDelete
. They were introduced mainly for supporting the DbMerge
, but can also be used standalone.
BulkSelect
The BulkSelect requires an IdColumn
(set either as attribute or passed into the IdColumns
property of the DbSource
) to identify the rows by which the source data is filtered. The SelectColumn
(can also be passed to the SelectColumns
property) defines the properties in which data should be read - all other properties will be ignored.
The BulkSelect will only work with a TableNam, and not with custom sql code. Most of the times it will be easier if you pass your own custom sql with a WHERE clause, but sometimes this way of querying your database table may be handy.