Distinct<TInput>

Class Distinct<TInput>

The Distinct class in ETLBox removes duplicate records from a data flow. For each incoming record it computes a hash value from the specified properties and uses that value to recognize records it has already seen, so only the first occurrence is passed on. You can restrict the comparison to selected properties with DistinctColumns, supply a custom key function with GetUniqueKeyFunc, and route the detected duplicates to another transformation or destination with LinkDuplicatesTo. Use it wherever a data flow must guarantee unique, clean rows.
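The filtering idea can be modelled in a few lines of plain C#. This is a conceptual sketch, not ETLBox's implementation: the helper SplitDistinct is hypothetical, and a HashSet<object> of keys stands in for the component's internal hash lookup. The first row for each key is kept and every later row with an equal key is treated as a duplicate.

```csharp
using System;
using System.Collections.Generic;

// Sample rows; anonymous types compare by value, which suits a demo.
var rows = new[]
{
    new { Id = 1, Name = "Alice" },
    new { Id = 2, Name = "Bob" },
    new { Id = 1, Name = "Alice" },   // repeat of the first row
    new { Id = 3, Name = "Carol" },
};

// Key on both properties; the value tuple is boxed to object for the HashSet.
var (unique, duplicates) = SplitDistinct(rows, r => (object)(r.Id, r.Name));

Console.WriteLine($"unique: {unique.Count}, duplicates: {duplicates.Count}"); // unique: 3, duplicates: 1

// Hypothetical helper (not an ETLBox API): the first row per key is kept,
// every later row with an equal key is collected as a duplicate.
static (List<T> Unique, List<T> Duplicates) SplitDistinct<T>(
    IEnumerable<T> source, Func<T, object> keyOf)
{
    var seenKeys = new HashSet<object>();
    var firstRows = new List<T>();
    var repeatedRows = new List<T>();
    foreach (var row in source)
    {
        // HashSet.Add returns false when an equal key is already present.
        if (seenKeys.Add(keyOf(row))) firstRows.Add(row);
        else repeatedRows.Add(row);
    }
    return (firstRows, repeatedRows);
}
```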

Namespace: ETLBox.DataFlow
Assembly: ETLBox.dll
Syntax
    public class Distinct<TInput> : DataFlowTransformation<TInput, TInput>, IDataFlowLogging, IDataFlowTransformation<TInput, TInput>, IDataFlowSource<TInput>, IDataFlowSource, IDataFlowDestination<TInput>, IDataFlowDestination, IDataFlowComponent, ILoggableTask
Type Parameters
Name | Description
TInput | The type of the data rows. The component receives and emits rows of this same type.

Constructors

Distinct()

Declaration
    public Distinct()

Properties

DistinctColumns

Specifies which properties of a row determine its uniqueness. Each DistinctColumn in the collection identifies one property by name; only these properties are compared when deciding whether a row is a duplicate.

Declaration
    public ICollection<DistinctColumn> DistinctColumns { get; set; }
Property Value
Type | Description
ICollection<DistinctColumn> | The properties used to evaluate uniqueness.
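To illustrate what restricting uniqueness to selected properties means, the sketch below builds a composite key from a list of property names via reflection. KeyFromColumns is a hypothetical helper, not part of ETLBox, and plain strings stand in for DistinctColumn objects. The sample rows differ only in a LoadedAt property that is deliberately left out of the key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var rows = new[]
{
    new { Id = 1, Name = "Alice", LoadedAt = "08:00" },
    new { Id = 1, Name = "Alice", LoadedAt = "09:30" }, // differs only in LoadedAt
    new { Id = 2, Name = "Bob",   LoadedAt = "08:00" },
};

// Plain property names stand in for DistinctColumn entries here.
var columns = new[] { "Id", "Name" };

var seenKeys = new HashSet<string>();
// HashSet.Add is true only for a new key, so Where keeps first occurrences.
var unique = rows.Where(r => seenKeys.Add(KeyFromColumns(r, columns))).ToList();

Console.WriteLine(unique.Count); // 2

// Hypothetical helper: reads each named property via reflection and joins the
// values with a separator. A production version would need a collision-safe encoding.
static string KeyFromColumns<T>(T row, IEnumerable<string> propertyNames) =>
    string.Join("\u001F", propertyNames.Select(name =>
        typeof(T).GetProperty(name)?.GetValue(row)?.ToString() ?? "<null>"));
```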

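DuplicatesSourceBlock

The source block from the underlying TPL Dataflow library that outputs the rows detected as duplicates.
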
Declaration
    public ISourceBlock<TInput> DuplicatesSourceBlock { get; }
Property Value
Type | Description
ISourceBlock<TInput> | The output buffer for duplicate rows.

GetUniqueKeyFunc

Optional. A custom function that returns a unique key for each row; rows that produce equal keys are treated as duplicates. Use it as an alternative to DistinctColumns. When this property is set, it takes precedence over any DistinctColumns settings.

Declaration
    public Func<TInput, object> GetUniqueKeyFunc { get; set; }
Property Value
Type | Description
Func<TInput, object> | A function that maps a row to its unique key.
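The precedence rule can be sketched as follows. Dedup is a hypothetical helper written for this illustration, not ETLBox code: it uses the key function when one is supplied and otherwise falls back to a key built from the listed property names. Here the custom key compares e-mail addresses case-insensitively and ignores Name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var rows = new[]
{
    new { Email = "Ann@Example.com", Name = "Ann" },
    new { Email = "ann@example.com", Name = "Ann B." },
    new { Email = "bob@example.com", Name = "Bob" },
};
string[] distinctColumns = { "Email", "Name" };

// No key function: uniqueness is decided by Email and Name together.
var byColumns = Dedup(rows, distinctColumns, null);
// Key function set: it overrides distinctColumns entirely.
var byKeyFunc = Dedup(rows, distinctColumns, r => r.Email.ToLowerInvariant());

Console.WriteLine($"{byColumns.Count} vs {byKeyFunc.Count}"); // 3 vs 2

// Hypothetical helper: keyFunc, when given, wins over the column list.
static List<T> Dedup<T>(IEnumerable<T> source, IEnumerable<string> columns,
                        Func<T, object>? keyFunc)
{
    Func<T, object> keyOf = keyFunc
        ?? (row => string.Join("|", columns.Select(c => typeof(T).GetProperty(c)?.GetValue(row))));
    var seenKeys = new HashSet<object>();
    return source.Where(r => seenKeys.Add(keyOf(r))).ToList();
}
```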

SourceBlock

The source block from the underlying TPL Dataflow library, used as the output buffer for the component.

Declaration
    public override ISourceBlock<TInput> SourceBlock { get; }
Property Value
Type | Description
ISourceBlock<TInput> | The output buffer for the component.

TargetBlock

The target block from the underlying TPL Dataflow library, used as the input buffer for the component.

Declaration
    public override ITargetBlock<TInput> TargetBlock { get; }
Property Value
Type | Description
ITargetBlock<TInput> | The input buffer for the component.

Methods

CheckParameter()

Declaration
    protected override void CheckParameter()

CleanUpOnFaulted(Exception)

Declaration
    protected override void CleanUpOnFaulted(Exception e)
Parameters
Type | Name | Description
Exception | e | The exception that caused the component to fail.

CleanUpOnSuccess()

Declaration
    protected override void CleanUpOnSuccess()

InitCheckedParameter()

Declaration
    protected override void InitCheckedParameter()

InitComponent()

Declaration
    protected override void InitComponent()

LinkDuplicatesTo(IDataFlowDestination<TInput>)

Links the duplicates output of this component to another transformation or destination. The linked component receives only the rows that this component detects as duplicates.

Declaration
    public IDataFlowSource<TInput> LinkDuplicatesTo(IDataFlowDestination<TInput> target)
Parameters
Type | Name | Description
IDataFlowDestination<TInput> | target | The transformation or destination to which this block is linked.

Returns
Type | Description
IDataFlowSource<TInput> | The component to which the duplicates are linked.
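Conceptually, the component then has two outputs. The sketch below models them with plain lists and a hypothetical Route helper; it is not how ETLBox wires its TPL Dataflow blocks, only an illustration of which rows end up where when duplicates are linked to a separate target.

```csharp
using System;
using System.Collections.Generic;

var rows = new[] { "A", "B", "A", "C", "B", "A" };

// Plain lists stand in for the regular target and the duplicates target.
var mainOutput = new List<string>();
var duplicatesOutput = new List<string>();

Route(rows, r => r, mainOutput.Add, duplicatesOutput.Add);

Console.WriteLine(string.Join(",", mainOutput));       // A,B,C
Console.WriteLine(string.Join(",", duplicatesOutput)); // A,B,A

// Hypothetical helper: first occurrences go to the regular output,
// repeats go to the duplicates output.
static void Route<T>(IEnumerable<T> source, Func<T, object> keyOf,
                     Action<T> toMain, Action<T> toDuplicates)
{
    var seenKeys = new HashSet<object>();
    foreach (var row in source)
    {
        if (seenKeys.Add(keyOf(row))) toMain(row);  // key not seen before
        else toDuplicates(row);                     // key seen before: duplicate
    }
}
```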

LinkDuplicatesTo(IDataFlowDestination<TInput>, Predicate<TInput>)

Links the duplicates output of this component to another transformation or destination. The linked component receives only the detected duplicates that satisfy rowsToKeep.

Declaration
    public virtual IDataFlowSource<TInput> LinkDuplicatesTo(IDataFlowDestination<TInput> target, Predicate<TInput> rowsToKeep)
Parameters
Type | Name | Description
IDataFlowDestination<TInput> | target | The transformation or destination to which this block is linked.
Predicate<TInput> | rowsToKeep | A predicate that selects which duplicate rows are sent to target. Rows for which it returns true are forwarded.

Returns
Type | Description
IDataFlowSource<TInput> | The component to which the duplicates are linked.
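The effect of rowsToKeep can be illustrated with a small plain-C# model. It is a sketch, not ETLBox code: duplicates are detected by a hypothetical OrderId key, and a Predicate standing in for rowsToKeep decides which of them reach the target. What the component does with duplicates that fail the predicate is not described in this reference, so the sketch simply leaves them out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var orders = new[]
{
    (OrderId: 1, Amount: 50m),
    (OrderId: 2, Amount: 500m),
    (OrderId: 1, Amount: 50m),   // duplicate, small amount
    (OrderId: 2, Amount: 500m),  // duplicate, large amount
};

// Plays the role of the rowsToKeep argument.
Predicate<(int OrderId, decimal Amount)> rowsToKeep = o => o.Amount >= 100m;

// Duplicates are detected by OrderId: every occurrence after the first.
var seenIds = new HashSet<int>();
var duplicates = orders.Where(o => !seenIds.Add(o.OrderId)).ToList();

// Only duplicates for which rowsToKeep returns true reach the target.
var forwarded = duplicates.Where(o => rowsToKeep(o)).ToList();

Console.WriteLine(forwarded.Count); // 1
```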

LinkDuplicatesTo(IDataFlowDestination<TInput>, Predicate<TInput>, Predicate<TInput>)

Links the duplicates output of this component to another transformation or destination. Detected duplicates that satisfy rowsToKeep are forwarded to the linked component; duplicates that satisfy rowsIntoVoid are discarded.

Declaration
    public virtual IDataFlowSource<TInput> LinkDuplicatesTo(IDataFlowDestination<TInput> target, Predicate<TInput> rowsToKeep, Predicate<TInput> rowsIntoVoid)
Parameters
Type | Name | Description
IDataFlowDestination<TInput> | target | The transformation or destination to which this block is linked.
Predicate<TInput> | rowsToKeep | A predicate that selects which duplicate rows are sent to target. Rows for which it returns true are forwarded.
Predicate<TInput> | rowsIntoVoid | A predicate that selects which duplicate rows are discarded. Rows for which it returns true are dropped.

Returns
Type | Description
IDataFlowSource<TInput> | The component to which the duplicates are linked.
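With two predicates, each duplicate falls into one of three groups. The sketch below is a plain-C# model under stated assumptions, not the library's implementation: the input rows are assumed to be already-detected duplicates, rowsToKeep is assumed to be checked before rowsIntoVoid, and rows matching neither predicate are collected in a separate list only to make them visible, since this reference does not describe their fate.

```csharp
using System;
using System.Collections.Generic;

// Assumed input: rows the component has already classified as duplicates.
var duplicates = new[] { "ok-1", "junk-1", "ok-2", "other-1" };

// Stand-ins for the rowsToKeep and rowsIntoVoid arguments.
Predicate<string> rowsToKeep = s => s.StartsWith("ok", StringComparison.Ordinal);
Predicate<string> rowsIntoVoid = s => s.StartsWith("junk", StringComparison.Ordinal);

var target = new List<string>();    // forwarded to the linked component
var discarded = new List<string>(); // dropped
var unmatched = new List<string>(); // neither predicate matched (shown for visibility)

foreach (var row in duplicates)
{
    if (rowsToKeep(row)) target.Add(row);            // assumption: checked first
    else if (rowsIntoVoid(row)) discarded.Add(row);
    else unmatched.Add(row);
}

Console.WriteLine($"{target.Count} forwarded, {discarded.Count} discarded, {unmatched.Count} unmatched");
```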

PrepareParameterForCheck()

Declaration
    protected override void PrepareParameterForCheck()

Reset()

Declaration
    protected override void Reset()
