Row normaliser

If a table has several columns that should be transposed into rows, you can use the Row Normaliser step:


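Type field: the name of the output field that holds the type; it can be anything

Fieldname: enter the column header names here

Type: the value written to the type field for that column; it can be anything

New field: the field that receives the values of the transposed columns.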

Result:

Note that fields which are not listed in the Row Normaliser step are simply passed through to the output without normalisation.
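The behaviour described above can be sketched outside of Pentaho in a few lines of Python. This is only an illustration of the idea, not the step's actual implementation. The function name `normalise_rows` and the sample field names (`id`, `phone_home`, `phone_mobile`) are made up for this example.

```python
def normalise_rows(rows, type_field, new_field, mapping):
    """Turn the columns listed in `mapping` into separate rows.

    rows       : list of dicts, one dict per input row
    type_field : name of the output field holding the type value
    new_field  : name of the output field holding the transposed value
    mapping    : {source column name: type value}
    """
    result = []
    for row in rows:
        # Fields not listed in the mapping are copied to every output row as-is.
        passthrough = {k: v for k, v in row.items() if k not in mapping}
        for column, type_value in mapping.items():
            result.append({**passthrough,
                           type_field: type_value,
                           new_field: row[column]})
    return result


# Hypothetical sample data: one input row with two phone columns.
rows = [{"id": 1, "phone_home": "111", "phone_mobile": "222"}]
for out_row in normalise_rows(rows, "phone_type", "phone",
                              {"phone_home": "home", "phone_mobile": "mobile"}):
    print(out_row)
```

Each input row produces one output row per mapped column, and the `id` field is repeated unchanged on each of them.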


Concat Strings

Pentaho Data Integration offers several ways to combine multiple fields into one string:

a) Step: Concat fields

This step is very straightforward if you want to concatenate fields using the same separator:

Concatenate last and first name into the field “person_p3”, using the separator “, ”
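For comparison, here is a minimal Python sketch of what such a concatenation does to a single row. The helper `concat_fields`, the input field names `last_name` and `first_name`, and the sample values are assumptions made for illustration; only the target field “person_p3” and the separator come from the example above.

```python
def concat_fields(row, fields, target, separator):
    """Return a copy of `row` with `fields` joined into `target`."""
    joined = separator.join(str(row[name]) for name in fields)
    return {**row, target: joined}


# Hypothetical input row
row = {"last_name": "Doe", "first_name": "Jane"}
print(concat_fields(row, ["last_name", "first_name"], "person_p3", ", "))
```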

b) Step: Formula

Similar to Excel, you can build a new string from your fields and ad hoc strings in the Formula step:

c) Step: User Defined Java Expression

Similarity of person names

If you want to compare strings using fuzzy logic, you can either use the “Fuzzy match” step or calculate the similarity in the “Calculator” step.

Calculator Step: Testing various algorithms

For comparing person names, I found that the “JaroWinkler similitude” algorithm with a score > 0.75 gives acceptable results:

Results after calculation of similarities (sorted by Jaro Winkler)

Note: In this example, “Grams, C. M” is obviously similar to “Grams, Christian Michael Warnfried”. With the Levenshtein distance, this similarity would not have been found, because the large difference in length alone requires many edits.

To filter out false positives, you can additionally run the similarity check on the last name only.
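The two-stage check can be sketched in Python as follows. This is a textbook Jaro-Winkler implementation, so the Calculator step in Pentaho may return slightly different scores. The 0.75 threshold for the full name is the one mentioned above. The last-name threshold of 0.9, the function names, and the name “Grimm, C. M” are assumptions chosen for illustration. Names are expected in the form “Last, First”.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings (0.0 to 1.0)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, scaling=0.1):
    """Jaro similarity boosted for a common prefix of up to 4 characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * scaling * (1 - score)


def last_name(name):
    """Part before the first comma, assuming the form 'Last, First'."""
    return name.split(",")[0].strip()


def likely_same_person(a, b, full_threshold=0.75, last_threshold=0.9):
    """Full-name check first, then a stricter check on the last name only.

    last_threshold=0.9 is an assumed value, not taken from the text above.
    """
    if jaro_winkler(a, b) <= full_threshold:
        return False
    return jaro_winkler(last_name(a), last_name(b)) >= last_threshold


print(likely_same_person("Grams, C. M", "Grams, Christian Michael Warnfried"))
print(likely_same_person("Grams, C. M", "Grimm, C. M"))
```

The second pair passes the full-name threshold because the initials are identical, but it is rejected by the last-name check, which is exactly the kind of false positive the extra step is meant to catch.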