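With Pentaho, you can extract a pattern into an additional field using the “Regex evaluation” step.
Example: to extract the arXiv ID, use this regex: .*(\d{4}\.\d{4,5}).*
The parentheses form a capture group that holds the ID; the surrounding .* absorbs whatever text comes before and after it.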
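To try the pattern outside of Pentaho, here is a minimal sketch in Python. It assumes that Python's `re` module treats this particular pattern the same way as the Java regex engine that Pentaho runs on. The function name `extract_arxiv_id` and the sample values are made up for illustration and are not part of Pentaho.

```python
import re

# Same pattern as in the text; the capture group holds the arXiv ID.
ARXIV_ID_PATTERN = re.compile(r".*(\d{4}\.\d{4,5}).*")


def extract_arxiv_id(value):
    """Return the arXiv ID found in value, or None if there is none."""
    # fullmatch requires the pattern to cover the whole string,
    # which is why the pattern is wrapped in .* on both sides.
    match = ARXIV_ID_PATTERN.fullmatch(value)
    return match.group(1) if match else None


# Hypothetical sample values, made up for illustration.
samples = [
    "https://arxiv.org/abs/2101.12345v2",
    "arXiv:0704.0001 [hep-th]",
    "no identifier here",
]
for sample in samples:
    print(sample, "->", extract_arxiv_id(sample))
```

The pattern only picks up IDs in the numeric form of four digits, a dot, and four or five digits. A version suffix such as `v2` is not part of the captured value.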
Notes about using Pentaho Data Integration and Apache Hop as Non-Programmer