With the “Modified JavaScript value” step you can call JavaScript functions such as Math.max().
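As a minimal sketch of such a call: inside the step, the fields of the incoming row can be used as variables. The field names price_a and price_b and their values are made up for illustration and are set by hand here so the snippet also runs outside PDI.

```javascript
// Hypothetical input fields; inside the PDI step they would come from the incoming row.
var price_a = 12.5;
var price_b = 9.75;

// Math.max() returns the largest of its arguments.
var max_price = Math.max(price_a, price_b);
```

In the step itself, max_price would then be added as a new output field in the step's field list.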
Month: May 2020
Extract email from website
How to extract the email of the corresponding author of a publication, like: https://doi.org/10.1039/C7CS00709D with Pentaho Data integration?
- Get the HTML of the publication via the “REST client” step and store it in one field.
- Extract the email via the “Regex evaluation” step using the regex
.*mailto:([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+).*
with the step options:
- Enable dotall mode
- Enable multiline mode
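The email address captured by the group is put into the field email. Note that the leading .* is greedy, so with dotall mode enabled the regex captures the last mailto: address in the HTML. To capture the first one instead, start the regex with .*? (lazy).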
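To see which address such a regex captures, here is a small sketch in plain JavaScript, outside PDI. The HTML snippet and the example.org addresses are made up, and the JavaScript flags s (dotall) and m (multiline) are assumed to stand in for the two step options. It compares the regex as written (greedy .* at the start) with a variant that starts with a lazy .*?.

```javascript
// Hypothetical HTML standing in for the field filled by the REST step.
const html = [
  '<p>Corresponding author:',
  '<a href="mailto:first.author@example.org">A</a></p>',
  '<p>Contact: <a href="mailto:second.author@example.org">B</a></p>',
].join("\n");

// Regex from the post; flags: s = dotall, m = multiline.
const greedy = /.*mailto:([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+).*/sm;

// Variant with a lazy quantifier at the start.
const lazy = /.*?mailto:([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+).*/sm;

// Group 1 holds the captured address.
const greedyEmail = greedy.exec(html)[1]; // last mailto: address in the text
const lazyEmail = lazy.exec(html)[1];     // first mailto: address in the text

console.log(greedyEmail, lazyEmail);
```

With the greedy start the capture group ends up on the last mailto: link, with the lazy start on the first one, so the choice matters whenever a page lists more than one address.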
Alternatively, the online service https://www.convertcsv.com/email-extractor.htm offers a convenient way to extract emails from several websites.
Add a sub-transformation with mapping steps
In the last post I created a sub-transformation with a “Transformation executor” step. It works, but I had to look up the results from the sub-transformation in a later step. Pentaho Data Integration (PDI), however, offers a more elegant way to add a sub-transformation.
I will use the same example as previously.
a) Sub-Transformation
In your sub-transformation, insert a “Mapping input specification” step at the beginning and define in it which input fields you expect. At the end, add a “Mapping output specification” step, where you don’t have to specify anything.
b) Parent/Main-transformation
In the main transformation you can then add the step “Simple mapping (sub-transformation)”.
In this step you can map the fields of the parent transformation to the expected fields that you have defined in the input step of the sub-transformation. If you use the same field names, PDI provides a nice auto-mapping feature in the step options: “Mapping…” -> “Guess…”
It is not necessary to fill in the “Output” tab, because in this case all fields created in the sub-transformation become available in the following steps of the parent/main transformation.
The advantage here is that the fields that you have not passed on to the sub-transformation are also directly available in the following steps of the parent/main transformation.
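The data flow described above can be pictured with a small analogy in plain JavaScript. This is not PDI code: a row is modelled as an object and the sub-transformation as a function, and all field names (customer, amount, rate, total) are made up. It assumes the parent and the sub-transformation use the same field names, as in the auto-mapping case.

```javascript
// Analogy only: a row is a plain object, a sub-transformation is a function.
// All field names are hypothetical.

// Fields the sub-transformation expects (role of "Mapping input specification").
const expectedInputs = ["amount", "rate"];

function subTransformation(input) {
  // Returns its input plus every field it creates
  // (role of "Mapping output specification").
  return { ...input, total: input.amount * (1 + input.rate) };
}

function simpleMapping(row) {
  // Pass only the expected fields to the sub-transformation.
  const input = {};
  for (const name of expectedInputs) {
    input[name] = row[name];
  }
  const output = subTransformation(input);
  // Fields that were not passed on stay on the row; new fields are added.
  return { ...row, ...output };
}

const parentRow = { customer: "ACME", amount: 100, rate: 0.5 };
const result = simpleMapping(parentRow);
console.log(result);
```

Here customer is never handed to the sub-transformation but is still present in result, next to the new field total.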