Distinct<TInput>
Class Distinct<TInput>
The Distinct class in ETLBox filters duplicate records out of a data flow. It generates a hash value for each record from the specified properties and uses that value to identify and exclude duplicates. You can restrict the comparison to selected properties with DistinctColumns, send the detected duplicates to another transformation or destination with LinkDuplicatesTo, or supply a custom function for unique key generation with GetUniqueKeyFunc.
Implements
Inherited Members
Namespace: ETLBox.DataFlow
Assembly: ETLBox.dll
Syntax
Type Parameters
Name | Description |
---|---|
TInput |
Constructors
Distinct()
Declaration
Properties
DistinctColumns
The collection of property names used to determine whether an object is unique. Only the properties listed here are considered when checking for duplicates.
Declaration
Property Value
Type | Description |
---|---|
ICollection<DistinctColumn> |
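The idea of comparing rows on a subset of their properties can be sketched without ETLBox. The following is a minimal illustration, not the library's implementation: the sample rows, the `distinctColumns` array and the `KeyFor` helper are all hypothetical, and a joined string stands in for the hash value the component computes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample rows; anonymous objects stand in for TInput.
var rows = new[]
{
    new { Id = 1, Name = "Ann", City = "Oslo" },
    new { Id = 2, Name = "Ann", City = "Oslo" },
    new { Id = 3, Name = "Bob", City = "Rome" },
};

// Only these properties take part in the uniqueness check (assumed names).
string[] distinctColumns = { "Name", "City" };

// Build a composite key from the selected property values via reflection.
static string KeyFor(object row, IEnumerable<string> columns) =>
    string.Join("|", columns.Select(c => row.GetType().GetProperty(c)?.GetValue(row)));

var seen = new HashSet<string>();
var uniqueIds = new List<int>();
foreach (var row in rows)
{
    // HashSet.Add returns false when the key is already present.
    if (seen.Add(KeyFor(row, distinctColumns)))
        uniqueIds.Add(row.Id);
}

Console.WriteLine(string.Join(",", uniqueIds)); // prints "1,3"
```

Row 2 differs from row 1 only in `Id`, which is not among the compared properties, so it counts as a duplicate in this sketch.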
DuplicatesSourceBlock
Declaration
Property Value
Type | Description |
---|---|
ISourceBlock<TInput> |
GetUniqueKeyFunc
An optional custom function that returns the unique identifier for each row. It can be used as an alternative to defining distinct properties. Note: when this property is set, it overrides any DistinctColumns settings.
Declaration
Property Value
Type | Description |
---|---|
Func<TInput, object> |
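A key function of the shape `Func<TInput, object>` might look like the sketch below. It runs without ETLBox and only illustrates how a custom key changes what counts as a duplicate. The tuple row type, the `getUniqueKey` variable and the case-insensitive e-mail rule are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical rows as tuples (Id, Email).
var rows = new List<(int Id, string Email)>
{
    (1, "ann@example.com"),
    (2, "ANN@example.com"),
    (3, "bob@example.com"),
};

// Custom key: two rows are the same if their e-mail matches ignoring case.
Func<(int Id, string Email), object> getUniqueKey = r => r.Email.ToLowerInvariant();

var seenKeys = new HashSet<object>();
var kept = new List<int>();
foreach (var row in rows)
{
    // String keys compare by value, so equal e-mails collide in the set.
    if (seenKeys.Add(getUniqueKey(row)))
        kept.Add(row.Id);
}

Console.WriteLine(string.Join(",", kept)); // prints "1,3"
```

Returning a normalized value from the key function is one way to treat rows as equal even when their raw property values differ.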
SourceBlock
The SourceBlock from the underlying TPL.Dataflow, used as the output buffer for the component.
Declaration
Property Value
Type | Description |
---|---|
ISourceBlock<TInput> |
Overrides
TargetBlock
The TargetBlock from the underlying TPL.Dataflow, used as the input buffer for the component.
Declaration
Property Value
Type | Description |
---|---|
ITargetBlock<TInput> |
Overrides
Methods
CheckParameter()
Declaration
Overrides
CleanUpOnFaulted(Exception)
Declaration
Parameters
Type | Name | Description |
---|---|---|
Exception | e |
Overrides
CleanUpOnSuccess()
Declaration
Overrides
InitCheckedParameter()
Declaration
Overrides
InitComponent()
Declaration
Overrides
LinkDuplicatesTo(IDataFlowDestination<TInput>)
Links the current block to another transformation or destination. The linked component receives only the duplicates detected by the current block.
Declaration
Parameters
Type | Name | Description |
---|---|---|
IDataFlowDestination<TInput> | target | The transformation or destination to which this block is linked. |
Returns
Type | Description |
---|---|
IDataFlowSource<TInput> | The component to which the duplicates are linked. |
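The split between the regular output and the duplicates output can be pictured with plain collections. This sketch does not use ETLBox. It assumes that the first occurrence of a key passes through and every later occurrence is treated as a duplicate, and the list names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical input stream; each string is its own key.
var input = new[] { "A", "B", "A", "C", "B", "A" };

var seen = new HashSet<string>();
var mainOutput = new List<string>();  // stands in for the regular link target
var duplicates = new List<string>();  // stands in for the duplicates link target

foreach (var item in input)
{
    // First occurrence goes to the main output, repeats go to the duplicates list.
    if (seen.Add(item)) mainOutput.Add(item);
    else duplicates.Add(item);
}

Console.WriteLine(string.Join(",", mainOutput)); // prints "A,B,C"
Console.WriteLine(string.Join(",", duplicates)); // prints "A,B,A"
```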
LinkDuplicatesTo(IDataFlowDestination<TInput>, Predicate<TInput>)
Links the current block to another transformation or destination. The linked component receives only those duplicates detected by the current block that satisfy the rowsToKeep predicate.
Declaration
Parameters
Type | Name | Description |
---|---|---|
IDataFlowDestination<TInput> | target | The transformation or destination to which this block is linked. |
Predicate<TInput> | rowsToKeep | A predicate to determine which rows to send to the connected target. Rows that satisfy this predicate (evaluate to true) are forwarded. |
Returns
Type | Description |
---|---|
IDataFlowSource<TInput> | The component to which the duplicates are linked. |
LinkDuplicatesTo(IDataFlowDestination<TInput>, Predicate<TInput>, Predicate<TInput>)
Links the current block to another transformation or destination. The linked component receives only those duplicates detected by the current block that satisfy the rowsToKeep predicate. Duplicates that satisfy the rowsIntoVoid predicate are discarded.
Declaration
Parameters
Type | Name | Description |
---|---|---|
IDataFlowDestination<TInput> | target | The transformation or destination to which this block is linked. |
Predicate<TInput> | rowsToKeep | A predicate to determine which rows to send to the connected target. Rows that satisfy this predicate (evaluate to true) are forwarded. |
Predicate<TInput> | rowsIntoVoid | A predicate to filter out rows. Rows that satisfy this predicate (evaluate to true) are discarded. |
Returns
Type | Description |
---|---|
IDataFlowSource<TInput> | The component to which the duplicates are linked. |
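The effect of the two predicates on the duplicates can be sketched as follows. This example is independent of ETLBox. It assumes that rowsToKeep is evaluated before rowsIntoVoid, and that a row matching neither predicate is left for some other link. The integer rows and the list names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical duplicates that were already detected upstream.
var duplicates = new[] { 5, 12, -3, 40, 0 };

Predicate<int> rowsToKeep = n => n > 10;   // forwarded to the linked target
Predicate<int> rowsIntoVoid = n => n < 0;  // discarded

var forwarded = new List<int>();
var discarded = new List<int>();
var unrouted = new List<int>();  // matched neither predicate

foreach (var d in duplicates)
{
    // Assumed order of evaluation: keep first, then void.
    if (rowsToKeep(d)) forwarded.Add(d);
    else if (rowsIntoVoid(d)) discarded.Add(d);
    else unrouted.Add(d);
}

Console.WriteLine(string.Join(",", forwarded)); // prints "12,40"
Console.WriteLine(string.Join(",", discarded)); // prints "-3"
Console.WriteLine(string.Join(",", unrouted));  // prints "5,0"
```

In this sketch the rows 5 and 0 match neither predicate, which is why pairing rowsToKeep with a complementary rowsIntoVoid is a common way to make sure every duplicate ends up somewhere.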