
How to transform valid JSON to CSV?

By Emma Johnson

I'm trying to extract information from a JSON file and write some of the contents to a CSV file.

Here is an example of my text:

"data":{"headers":{"sender":""
"to":""
"subject":"Help with this project"
"x-received-time":"14144273245408"
"received":"from abc.com ()\r\n by mail.mail.com with SMTP (Postfix)\r\n for ;\r\n Mon
"from":"\"Help with this project\" <>"
"date":"Mon, 27 Oct 2014 09:03:14 -0500"
"id":"1414427328-2345855-frank"
"to":""
"time":14144273245408
"subject":"Help with this project"
"fromfull":""

I want to grab the contents of to, fromfull, id, subject, and date and write them to a CSV file, where to is column A, fromfull is column B, and so forth.

Can anyone offer any assistance? This is a JSON response.


6 Answers

You can convert this JSON to CSV in a single line with jq.

jq '.data.headers | [.sender, .to, .subject, ."x-received-time",
.received, .from, .date, .id, .to, .subject, .fromfull]
+ [(.time | tostring)] | join(", ")'

Breakdown:

  • .data.headers - Emit headers as an object
    • If data contained an array of headers it would be .data[].headers
  • […string keys list…] - Emit string values as an array
  • + [(.time | tostring)] - Emit time as a string and add to the array
  • join(", ") - Join the array values using a comma and a space
    • Substitute your favorite delimiter here
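One caveat with join(", "): fields that themselves contain commas (the date here) are not quoted, so the output isn't strictly valid CSV. As a sketch of the same extraction with proper quoting, here is a Python take; the completed sample object below is an assumption filling in the question's truncated snippet:

```python
import csv
import io
import json

FIELDS = ["to", "fromfull", "id", "subject", "date"]

def headers_to_row(doc):
    """Pick the requested header fields out of one parsed JSON document."""
    headers = doc["data"]["headers"]
    return [headers.get(field, "") for field in FIELDS]

# A small completed sample in the shape of the question's snippet:
sample = json.loads('''
{"data": {"headers": {
    "to": "",
    "subject": "Help with this project",
    "date": "Mon, 27 Oct 2014 09:03:14 -0500",
    "id": "1414427328-2345855-frank",
    "fromfull": ""
}}}
''')

buf = io.StringIO()
writer = csv.writer(buf)   # quotes the date, which contains commas
writer.writerow(FIELDS)    # header row: to is column A, and so forth
writer.writerow(headers_to_row(sample))
print(buf.getvalue())
```

Because csv.writer quotes any field containing the delimiter, the comma inside the date survives a round trip.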

You can use the following Perl command to create the CSV output. Open a terminal and type:

perl -n0e '@a= $_ =~ /"date":(".*?").*?"id":(".*?").*?"to":"(.*?)".*?".*?"subject":(".*?").*?"fromfull":"(.*?)"/gs; while (my @next_n = splice @a, 0, 5) { print join(q{,}, @next_n)."\n"}' inputfile.txt

It will work even if you have multiple headers in your input file.

Note that only the last "to": field is taken into account (it seems that your headers provide the info twice).

The command output:

"Mon, 27 Oct 2014 09:03:14 -0500","1414427328-2345855-frank",,"Help with this project",

Since you are working with JSON files, why not parse them as such? Install nodejs-legacy and create a Node.js script such as:

#!/usr/bin/env node
// parseline.js - process lines one by one
'use strict';
var readline = require('readline');
var rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
    terminal: false
});
rl.on('line', function (line) {
    var obj = JSON.parse(line);
    // add the fields which you want to extract here:
    var fields = [
        obj.data.headers.to,
        obj.data.headers.subject
        // etc.
    ];
    // print the fields, joined by a comma (CSV, duh.)
    // No escaping is done, so if the subject contains ',',
    // then you need additional post-processing.
    console.log(fields.join(','));
});

Assuming you have a valid JSON string on each line of a file:

node parseline.js < some.txt

Or if you really want to read a single file and parse fields from that:

#!/usr/bin/env node
// parsefile.js - fully read file and parse some data out of it
'use strict';
var filename = process.argv[2]; // first script argument (argv[0] is node, argv[1] the script path)
var fs = require('fs');
var text = fs.readFileSync(filename).toString();
var obj = JSON.parse(text);
// add the fields which you want to extract here:
var fields = [
    obj.data.headers.to,
    obj.data.headers.subject
    // etc.
];
// print the fields, joined by a comma (CSV, duh.)
// No escaping is done, so if the subject contains ',',
// then you need additional post-processing.
console.log(fields.join(','));

Then run it with:

node parsefile.js yourfile.json > yourfile.csv
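If you'd rather not pull in Node, the same line-by-line idea can be sketched in Python, with the csv module doing the quoting that parseline.js skips. The sample line and field list below are hypothetical:

```python
# A Python analogue of parseline.js: one complete JSON document per
# input line, CSV out with real quoting.
import csv
import io
import json

def line_to_row(line, fields=("to", "fromfull", "id", "subject", "date")):
    """Parse one JSON line and pick out the header fields of interest."""
    headers = json.loads(line)["data"]["headers"]
    return [headers.get(f, "") for f in fields]

# Hypothetical one-line input; missing fields come out empty:
sample_line = '{"data":{"headers":{"to":"a@b","subject":"hi, there","id":"42"}}}'
buf = io.StringIO()
csv.writer(buf).writerow(line_to_row(sample_line))  # quotes "hi, there"
print(buf.getvalue())
```

To process a whole file, loop over sys.stdin and call line_to_row on each non-empty line.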

You can use jsonv from GitHub

And then the following command:

cat YOUR_JSON_FILE | jsonv to,fromfull,id,subject,date > output.csv

Here's a gawk script I just whipped up for you!

#!/usr/bin/gawk -f
BEGIN {
    FS = "\""
    output = ""
    nodata = 1
}
/^"data"/ {
    if (!nodata) {
        gsub(/\|$/, "", output)
        print output
    }
    nodata = 0
    output = ""
}
/^"[^d][^a][^t][^a]/ {
    if ($2 == "to" || $2 == "fromfull" || $2 == "id" || $2 == "subject" || $2 == "date")
        output = output $4 "|"
}
END {
    gsub(/\|$/, "", output)
    print output
}

It should work on a file with a bunch of like entries. If you want to add other items to the list, just add them in the if statement. I did find one problem with your data set though: the dates. They contain commas so it can't be a true CSV. Instead I just separated it with another character.
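For what it's worth, the comma-in-dates problem goes away if the writer quotes such fields; a minimal Python illustration, using values from the question's data:

```python
# Fields containing the delimiter just need to be quoted; Python's
# csv module does that automatically, so dates with commas are fine.
import csv
import io

buf = io.StringIO()
csv.writer(buf).writerow(["1414427328-2345855-frank",
                          "Mon, 27 Oct 2014 09:03:14 -0500"])
print(buf.getvalue())  # the date is quoted, so its commas survive
```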

Here is an awk implementation:

 awk -F ":" '{gsub("\"","",$1); key=$1; sub(key " ",""); gsub(/\\/,"",$0); value[key]=$0; if ("fromfull" == key) print value["from"] ";" value["to"] ";" value["fromfull"] ";" value["id"] ";" value["subject"] ";" value["date"]}' jsonFile > csvFile

This script reads lines until it finds the "fromfull" line, then prints a CSV line, so it should also work with multiple sequences.

This is the result:

 ""Help with this project" <>";"";"";"1414427328-2345855-frank";"Help with this project";"Mon, 27 Oct 2014 09 03 14 -0500"
