jq is amazing. It is a unique combination of a JavaScript-like expression language and the Linux shell, which makes it an immensely powerful tool for working with JSON files (This post gives an introduction to the JSON format). It plays really well with the existing shell tools and has quickly become one of the most used tools in my data analysis and processing pipeline.
jq is like sed (the stream editor). It takes an input stream, applies the expression to it and returns an output stream. It does not modify files directly. The syntax is,
input stream | jq 'expression' | output stream
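As a minimal sketch of that shape, here is the smallest pipeline I can think of. The JSON object is made up for illustration, and the output goes into a shell variable rather than a file.

```shell
# Input stream:  a JSON object printed by echo (made up for this sketch).
# Expression:    .name selects the "name" field.
# Output stream: captured in a shell variable instead of a file.
name=$(echo '{"name": "jq", "kind": "tool"}' | jq '.name')
echo "$name"
```

The result is still JSON, so the string is printed with its double quotes.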
Input and Outputs
The input and output streams are just plain text streams. They can be a file, a program, an HTTP request and so on. For example, consider the following commands,
curl "https://jsonplaceholder.typicode.com/posts/" | jq '.[0:5]' > posts.json
cat posts.json | jq '.[].id' > post_ids.json
cat post_ids.json | jq '.' | curl -X POST -d "$(</dev/stdin)" "http://ptsv2.com/t/5jo6w-1522072388/post"
The first one gets JSON data from the URL, keeps the first 5 elements and puts them in the posts.json file. The second one takes this posts.json file, extracts just the id of each element and puts the ids in the post_ids.json file. The third one takes this post_ids.json file and sends all of it to an HTTP API as a POST request (the results are here). In all these examples, jq does nothing but transform the input text stream and send the result to the output text stream. This makes it extremely efficient and versatile.
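Since jq only moves text from one stream to another, changing a file "in place" takes an extra step. The following is a sketch of one common approach, assuming a throwaway file called items.json (a hypothetical name, created just for this example). Redirecting jq's output straight back into the file it is reading would truncate the file before jq gets to read it, so the result goes to a temporary file first.

```shell
# Hypothetical working file, created only for this sketch.
workdir=$(mktemp -d)
echo '[{"id": 1}, {"id": 2}, {"id": 3}]' > "$workdir/items.json"

# jq reads items.json and writes to a separate temporary file;
# the original is replaced only if jq exits successfully.
jq 'map(.id)' "$workdir/items.json" > "$workdir/items.json.tmp" \
  && mv "$workdir/items.json.tmp" "$workdir/items.json"

# -c prints compact, single-line output.
ids=$(jq -c '.' "$workdir/items.json")
echo "$ids"
```

After this, items.json holds just the array of ids.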
Expressions
The expression part of jq is essentially a tiny language of its own, with JavaScript-like syntax for accessing fields and array elements, which is used to manipulate the JSON. This is really powerful. A full list of things that can be done is available in the manual. I’ll just outline some basic selection and filtering expressions.
selection expressions
. - shows the original object
.keyname - selects the specific field in the object
.[] - selects all elements (if the object is an array)
.[index] - selects the element at the specified index of the array
.[start:end] - selects the elements from index start up to, but not including, index end
function expressions (in addition to basic arithmetic)
length - returns the length of the array
keys - returns the fields in an object
map - applies an expression to all the elements in an array
del - deletes the given field or element
select - returns an element only if the condition is met
test - regular expression pattern matching
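To get a feel for the selection expressions, here is a small sketch on a made-up array of letters and a made-up object (neither is from the data used later in this post). The -c flag only asks jq for compact, single-line output.

```shell
letters='["a", "b", "c", "d"]'

# Index 0 is the first element.
first=$(echo "$letters" | jq '.[0]')

# A slice runs from the start index up to, but not including, the end index.
slice=$(echo "$letters" | jq -c '.[1:3]')

# keys returns the field names of an object, sorted alphabetically.
fields=$(echo '{"title": "x", "id": 7}' | jq -c 'keys')

echo "$first"
echo "$slice"
echo "$fields"
```

Note that the slice contains two elements, not three, because the end index is excluded.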
All these can be combined, nested and piped to each other (yes, these are pipes within pipes) indefinitely to manipulate JSON. For example consider the following JSON file named data.json
[
  { "id": 1, "title": "sunt aut facere", "body": "quia et t architecto" },
  { "id": 2, "title": "qui est esse", "body": "est rerum tempore" },
  { "id": 3, "title": "ea molestias quasi", "body": "et iusto sed quo" },
  { "id": 4, "title": "eum et est occaecati", "body": "ullam et saepe" },
  { "id": 5, "title": "nesciunt quas odio", "body": "repudiandae veniam quaerat" }
]
This can be filtered in the following ways,
'.' - all the data
'.[0]' - first element of the data
'.[1:3]' - two elements, from index 1 up to but not including index 3 (i.e., the second and third elements)
'.[0].title' - title of the first element
'.[].id' - ids of all elements, one per line (1 2 3 4 5)
'[.[].id]' - ids of all elements as an array ([1,2,3,4,5])
'. | length' - number of elements (5)
'.[] | length' - number of fields in each object of the array, one per line (3 3 3 3 3)
'.[0] | keys' - the fields/keys in the first element
'.[] | select(.id==3)' - the element with id 3
'. | del(.[2])' - everything but the third element
'. | del((.[] | select(.id==3)))' - everything but the element with id 3
'. | map(.id = .id+1)' - increase the id of all elements by 1
'. | map(del(.id))' - remove the field id from all elements
'.[] | select(.body | test("et"))' - elements with 'et' in the body field
Combining all these, we can easily explore and process JSON files right from the Linux terminal. Finally, the data can be organised in an array and exported as CSV using the @csv format. For example,
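Here is a sketch of how a few of these filters chain together on the same data. To keep it self-contained it writes the sample to a temporary file first (the variable name data is just my choice for this example).

```shell
# Recreate the sample data in a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
[
  { "id": 1, "title": "sunt aut facere", "body": "quia et t architecto" },
  { "id": 2, "title": "qui est esse", "body": "est rerum tempore" },
  { "id": 3, "title": "ea molestias quasi", "body": "et iusto sed quo" },
  { "id": 4, "title": "eum et est occaecati", "body": "ullam et saepe" },
  { "id": 5, "title": "nesciunt quas odio", "body": "repudiandae veniam quaerat" }
]
EOF

# Keep the elements whose body matches "et", take their ids,
# and wrap the results back into an array (-c prints it on one line).
matching_ids=$(jq -c '[.[] | select(.body | test("et")) | .id]' "$data")
echo "$matching_ids"

# Same selection, but count the matches instead.
count=$(jq '[.[] | select(.body | test("et"))] | length' "$data")
echo "$count"
```

Only the elements with ids 1, 3 and 4 have "et" in their body, so the count is 3. Wrapping the stream in [ ] before length matters here, otherwise length would run once per matching element.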
cat data.json | jq -r '.[] | [.id, .title, .body] | @csv' > data.csv
The -r is important, since it makes jq output raw CSV text instead of a JSON-encoded string.
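To see what -r changes, here is a small sketch that runs the same @csv expression with and without it, on a single made-up row.

```shell
row='[{"id": 1, "title": "qui est esse"}]'

# Without -r the CSV line is itself printed as a JSON string,
# so it is wrapped in quotes and its inner quotes are escaped.
quoted=$(echo "$row" | jq '.[] | [.id, .title] | @csv')

# With -r the string is written as is, giving a usable CSV line.
raw=$(echo "$row" | jq -r '.[] | [.id, .title] | @csv')

echo "$quoted"
echo "$raw"
```

The first line comes out as "1,\"qui est esse\"" and the second as 1,"qui est esse", which is what a spreadsheet or CSV parser expects.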