Regex Evaluation – arXiv ID

In Pentaho, the “Regex Evaluation” step can extract the part of a field that matches a pattern and add it to the stream as an additional field.

Example: to extract an arXiv ID (four digits, a dot, then four or five digits), use this regex: .*(\d{4}\.\d{4,5}).*

The text captured by the group in parentheses is added to the stream as a new field.
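
To check the pattern outside Pentaho, here is a minimal Python sketch. It assumes the step matches the pattern against the whole field value, which is why the regex starts and ends with .*; Python's re.fullmatch is used to mimic that. The function name extract_arxiv_id and the sample strings are illustrative only and are not part of Pentaho.

```python
import re

# Same pattern as above; the parentheses form capture group 1.
ARXIV_ID = re.compile(r".*(\d{4}\.\d{4,5}).*")

def extract_arxiv_id(text):
    """Return the text captured by group 1, or None if the pattern does not match."""
    # fullmatch requires the whole string to match (assumed to mirror the step).
    match = ARXIV_ID.fullmatch(text)
    return match.group(1) if match else None

# Hypothetical sample inputs
for sample in ("https://arxiv.org/abs/1501.00001v2",
               "arXiv:0704.0001 [hep-th]",
               "no identifier here"):
    print(sample, "->", extract_arxiv_id(sample))
```

Because the leading .* is greedy, a value containing several candidate IDs yields the last one; a lazy .*? at the start would return the first instead.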
