Split first and last name

To split a name into first and last name, I found that the following simple regex works in most cases:
(.*?)([^\s]*)$

Note that you still have to check the results manually if you have names with more than 3 words.
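
Outside of Pentaho, the behaviour of this regex can be checked with a few lines of plain Java. This is only a minimal sketch: the class name `NameSplitter` and the method `split` are made up for illustration, and the extra `trim()` calls are my assumption, added because group 1 keeps the space that separates the two parts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class NameSplitter {
    // The regex from the text: group 1 = everything before the last word,
    // group 2 = the last word of the line.
    private static final Pattern NAME = Pattern.compile("(.*?)([^\\s]*)$");

    // Returns {firstName, lastName}. Group 1 still ends with the separating
    // space, so it is trimmed here (an addition, not part of the regex).
    static String[] split(String fullName) {
        Matcher m = NAME.matcher(fullName.trim());
        if (!m.find()) {
            return new String[] {"", ""};
        }
        return new String[] {m.group(1).trim(), m.group(2)};
    }

    public static void main(String[] args) {
        String[] simple = split("John Smith");
        System.out.println(simple[0] + " | " + simple[1]);   // John | Smith

        // Only the last word ends up in the last name field.
        String[] longer = split("Ludwig van Beethoven");
        System.out.println(longer[0] + " | " + longer[1]);   // Ludwig van | Beethoven
    }
}
```

The second call shows why longer names need a manual check: "van" lands in the first name field.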

LASTNAME in capitals + no separation

In another case I came across a text line where the last name was in capitals, but the first name was not easily separable from the words that followed.

I found that the following regex (which already includes the exceptions from my existing data) works in most of my cases:

^([A-ZÀÂÄÆÁÃÅĀÈÉÊËĘĖĒÎÏĪĮÍÌÔŌØÕÓÒÖŒÙÛÜŪÚŸÇĆČŃÑ\-'de]{2,20}\s(?:[A-ZÀÂÄÆÁÃÅĀÈÉÊËĘĖĒÎÏĪĮÍÌÔŌØÕÓÒÖŒÙÛÜŪÚŸÇĆČŃÑ\-']{2,15})?)\s*?([^\s]+\s(?:Huy|Christine|Flora|Deborah|Gösta)?)(.*)
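
To see what the three capture groups return, the pattern can be tried in plain Java. The sketch below is a simplified variant, not the exact expression above: I assume that `\p{Lu}` (any uppercase letter) can stand in for the explicit list of accented capitals, which matches more characters than the original list. The class name `CapitalLastNameParser`, the method `parse` and the sample lines are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class CapitalLastNameParser {
    // Assumption: \p{Lu} replaces the explicit list of (accented) capitals.
    private static final String CAPS = "\\p{Lu}\\-'";

    private static final Pattern LINE = Pattern.compile(
        // Group 1: last name in capitals, optionally a second capitalised word.
        "^([" + CAPS + "de]{2,20}\\s(?:[" + CAPS + "]{2,15})?)"
        // Group 2: first name, plus one of the known second first names.
        + "\\s*?([^\\s]+\\s(?:Huy|Christine|Flora|Deborah|G\u00f6sta)?)"
        // Group 3: the rest of the line.
        + "(.*)");

    // Returns {lastName, firstName, rest}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1).trim(), m.group(2).trim(), m.group(3).trim()};
    }

    public static void main(String[] args) {
        String[] p = parse("DE SMET Anna Brussels");
        System.out.println(p[0] + " | " + p[1] + " | " + p[2]);   // DE SMET | Anna | Brussels
    }
}
```

Note that the pattern expects whitespace after the first name, so a line that ends directly after the first name does not match in this sketch.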

Concat Strings

There are several ways in Pentaho Data Integration to combine several fields into one:

a) Step: Concat fields

This step is very straightforward if you want to concatenate fields using the same separator:

Concatenate last and first name into a field “person_p3”, with the separator “, ”

b) Step: Formula

Similar to Excel, you can build a new string from your fields and ad hoc defined strings in the Formula step.

c) Step: User Defined Java Expression
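
In this step the new field is defined by a Java expression in which the incoming fields are referenced by name. As a minimal sketch, assuming two String input fields called `lastname` and `firstname` (hypothetical names), the expression for the “person_p3” field from above could be `lastname + ", " + firstname`. The small class below only mirrors that expression in plain Java so that it can be run outside of Pentaho; the class and method names are made up.

```java
// Hypothetical stand-alone mirror of the expression, for illustration only.
class ConcatExpressionSketch {
    // Same expression one might type into the step: lastname + ", " + firstname
    static String personP3(String lastname, String firstname) {
        return lastname + ", " + firstname;
    }

    public static void main(String[] args) {
        System.out.println(personP3("Smith", "John"));   // Smith, John
    }
}
```

Keep in mind that plain Java string concatenation renders a null value as the text "null", so an explicit null check may be needed if one of the fields can be empty.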