Pentaho Data Integration (PDI) provides the “JSON Input” step for reading data from a JSON file or stream. I often use this step after a REST API query, so the JSON arrives as a field from a previous step.

To test the field extraction, it helps to save a few local samples of the possible responses. On the “File” tab you can first test against a local file and later switch to “Source is from a previous step”.

Pentaho recently added an “internal helper” for selecting the fields. Unfortunately, it does not work for most of my use cases. Instead, I found http://jsonpathfinder.com/ very useful.

Get the JSON path to the data you want to extract via http://jsonpathfinder.com/

Then add the fields with the corresponding path in PDI:

Extracting various fields from the JSON response of the Crossref REST API (e.g. http://api.crossref.org/works/10.1002/2016gl068428)
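
To check paths against a saved sample outside of PDI, a small script can help. The following is only a rough sketch in Python: the `resolve` helper, the `FIELDS` mapping, and the sample document are all made up for illustration. The sample is a heavily shortened stand-in with placeholder values, not real Crossref output, and the helper understands only plain `$.name` and `[index]` segments, which is a small subset of JSONPath and not necessarily how PDI evaluates paths.

```python
import json
import re

# Hypothetical, heavily shortened stand-in for a saved API response.
# The values are placeholders, not real Crossref output.
SAMPLE = """
{
  "status": "ok",
  "message": {
    "DOI": "10.1002/2016gl068428",
    "title": ["Example title"],
    "author": [
      {"given": "Ada", "family": "Example"},
      {"given": "Bob", "family": "Sample"}
    ]
  }
}
"""

# Matches either a ".name" segment or an "[index]" segment.
_TOKEN = re.compile(r"\.([A-Za-z_]\w*)|\[(\d+)\]")


def resolve(doc, path):
    """Resolve a very simple JSONPath (only $.a.b[0].c style) against parsed JSON."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    pos = 1
    current = doc
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at position {pos}")
        name, index = match.groups()
        # Step into an object by key, or into an array by position.
        current = current[name] if name is not None else current[int(index)]
        pos = match.end()
    return current


# Output field name -> path, similar in spirit to the field list in the PDI step.
FIELDS = {
    "doi": "$.message.DOI",
    "title": "$.message.title[0]",
    "first_author_family": "$.message.author[0].family",
}

doc = json.loads(SAMPLE)
row = {name: resolve(doc, path) for name, path in FIELDS.items()}
print(row)
```

Keys that contain a hyphen or other special characters are not handled by this sketch; for those, or for wildcards and filters, a full JSONPath implementation is the better choice.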