If you want to compare strings using fuzzy logic, you either can use the step “Fuzzy match” or calculate the similarity within the “Calculator” step.
To comparing person names I found the “JaroWinkler similitude” algorithm with a score > 0.75 providing acceptable results:
Note: In this example “Grams, C. M” is obviously similar to “Grams, Christian Michael Warnfried”. With the Levenshtein distance, this similarity would not have been found.
In order to filter out false positives, you can run additionally the similarity check also on just the last name only.
Pentaho Data Integration (PDI) offers the input step “JSON-Input” to read out data from a JSON file or stream. Often I use this step after a REST-API-Query, so I would have the JSON-Input as a field from a previous step.
In order to test the field-extraction, it’s helpful to save some local samples of the possible responses. In the tab “File” you first can do your tests with the local file and switch later to “Source is from a previous step”.
Since recently Pentaho offers an “internal helper” to select the fields. However it unfortunately does not to work for most of my use-cases. Instead I found http://jsonpathfinder.com/ very useful.
Then add the fields with the corresponding path in PDI: