How to transform valid JSON to CSV?
I'm trying to extract information from a JSON file and write some of the contents to a CSV file.
Here is an example of my data:

"data":{"headers":{"sender":"",
"to":"",
"subject":"Help with this project",
"x-received-time":"14144273245408",
"received":"from abc.com ()\r\n by mail.mail.com with SMTP (Postfix)\r\n for ;\r\n Mon",
"from":"\"Help with this project\" <>",
"date":"Mon, 27 Oct 2014 09:03:14 -0500",
"id":"1414427328-2345855-frank",
"to":"",
"time":14144273245408,
"subject":"Help with this project",
"fromfull":""}}

I want to grab the contents of to, fromfull, id, subject, and date and write them to a CSV file, where to is column A, fromfull is column B, and so forth.
Can anyone offer any assistance? This is a JSON response.
Answers
You can convert this JSON to CSV in a single line with jq.
jq '.data.headers | [.sender, .to, .subject, ."x-received-time",
    .received, .from, .date, .id, .to, .subject, .fromfull]
    + [(.time | tostring)] | join(", ")'

Breakdown:

.data.headers - emit the headers as an object (if data contained an array of headers it would be .data[].headers instead)
[…string keys list…] - emit the string values as an array
+ [(.time | tostring)] - emit time as a string and append it to the array
join(", ") - join the array values using a comma and a space (substitute your favorite delimiter here)
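If jq is not available, the same extraction can be sketched with Python's standard json module. This is only an illustrative sketch: the field list mirrors the question, and the input string below is a cut-down stand-in for the real file.

```python
import json

# Cut-down stand-in for the question's JSON (assumed valid).
text = ('{"data": {"headers": {"to": "", "fromfull": "", '
        '"id": "1414427328-2345855-frank", '
        '"subject": "Help with this project", '
        '"date": "Mon, 27 Oct 2014 09:03:14 -0500", '
        '"time": 14144273245408}}}')

headers = json.loads(text)["data"]["headers"]
fields = ["to", "fromfull", "id", "subject", "date"]
# str() plays the role of jq's tostring for non-string values like "time".
row = ", ".join(str(headers.get(name, "")) for name in fields)
print(row)
```

As with the jq filter, join() here does no quoting, so a delimiter inside a value (such as the comma in the date) will split that field.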
You can use the following perl command to create the CSV output. Open a terminal and type:
perl -n0e '@a= $_ =~ /"date":(".*?").*?"id":(".*?").*?"to":"(.*?)".*?".*?"subject":(".*?").*?"fromfull":"(.*?)"/gs; while (my @next_n = splice @a, 0, 5) { print join(q{,}, @next_n)."\n"}' inputfile.txt

It will work even if you have multiple headers in your input file.
Note that only the last "to": field is taken into account (it seems that your headers provide the info twice).
The command output:

"Mon, 27 Oct 2014 09:03:14 -0500","1414427328-2345855-frank",,"Help with this project",

Since you are working with JSON files, why not parse them as such? Install nodejs-legacy and create a Node.js script such as:
#!/usr/bin/env node
// parseline.js process lines one by one
'use strict';
var readline = require('readline');
var rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
    terminal: false
});

rl.on('line', function (line) {
    var obj = JSON.parse(line);
    // add the fields which you want to extract here:
    var fields = [
        obj.data.headers.to,
        obj.data.headers.subject,
        // etc.
    ];
    // print the fields, joined by a comma (CSV, duh.)
    // No escaping is done, so if the subject contains ',',
    // then you need additional post-processing.
    console.log(fields.join(','));
});

Assuming you have a valid JSON string on each line of a file:

node parseline.js < some.txt

Or if you really want to read a single file and parse fields from that:
#!/usr/bin/env node
// parsefile.js - fully read file and parse some data out of it
'use strict';
var filename = process.argv[2]; // first command-line argument (argv[0] is node, argv[1] is the script)
var fs = require('fs');
var text = fs.readFileSync(filename).toString();
var obj = JSON.parse(text);
// add the fields which you want to extract here:
var fields = [
    obj.data.headers.to,
    obj.data.headers.subject,
    // etc.
];
// print the fields, joined by a comma (CSV, duh.)
// No escaping is done, so if the subject contains ',',
// then you need additional post-processing.
console.log(fields.join(','));

Then run it with:

node parsefile.js yourfile.json > yourfile.csv

You can use jsonv from GitHub.
And then the following command:
cat YOUR_JSON_FILEname | jsonv to,fromfull,id,subject,date > output.csv

Here's a gawk script I just whipped up for you!
#!/usr/bin/gawk -f
BEGIN {
    FS = "\""
    output = ""
    nodata = 1
}
/^"data"/ {
    # a new record starts: print the previous one, if any
    if (!nodata) {
        gsub(/\|$/, "", output)   # strip the trailing delimiter
        print output
    }
    nodata = 0
    output = ""
}
/^"[^d][^a][^t][^a]/ {
    if ($2 == "to" || $2 == "fromfull" || $2 == "id" || $2 == "subject" || $2 == "date")
        output = output $4 "|"
}
END {
    gsub(/\|$/, "", output)
    print output
}

It should work on a file with a bunch of entries like this one. If you want to add other items to the list, just add them in the if statement. I did find one problem with your data set, though: the dates contain commas, so the output can't be a plain comma-separated file. Instead I separated the fields with another character.
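Switching to another delimiter works, but standard CSV can also keep commas inside a field by quoting it. A sketch with Python's csv module, using values copied from the question's example:

```python
import csv
import io

# Values from the question's example; the date contains commas.
row = ["", "", "1414427328-2345855-frank",
       "Help with this project", "Mon, 27 Oct 2014 09:03:14 -0500"]

out = io.StringIO()
writer = csv.writer(out)  # QUOTE_MINIMAL by default
writer.writerow(row)
# Only the comma-containing date gets quoted, so the line is still
# a single five-field CSV record.
print(out.getvalue(), end="")
```

Spreadsheet programs and most CSV parsers understand this quoting, so no special delimiter is needed.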
Here is an awk implementation:
awk -F ":" '{gsub("\"","",$1);key=$1;sub(key " ","");gsub("\\","",$0);value[key]=$0; if ("fromfull"== key) print value["from"] ";" value["to"] ";" value["fromfull"] ";" value["id"] ";" value["subject"] ";" value["date"] ;}' jsonFile > csvFileThis script read line until found "fromfull" line, than print csv line, so it should works also with multiple sequences.
This is the result:
""Help with this project" <>";"";"";"1414427328-2345855-frank";"Help with this project";"Mon, 27 Oct 2014 09 03 14 -0500" 2 More in general
"Zoraya ter Beek, age 29, just died by assisted suicide in the Netherlands. She was physically healthy, but psychologically depressed. It's an abomination that an entire society would actively facilitate, even encourage, someone ending their own life because they had no hope. Th…"