Sunday, July 24, 2011

Viewing CSV Files

I find CSV (or, more generally, DSV, i.e. delimiter-separated values) files difficult to read on Unix because you can't tell which column a value belongs to. I always end up importing them into a spreadsheet, which is a pain. Here is an example of a small pipe-delimited file containing book data:
Title|Author|CoAuthor|Year|ISBN
Carrie|Stephen King||1974|978-0385086950
The Human Web|William McNeill|Robert McNeill|2003|978-0393051797
It would be a lot easier to read if I could convert each row into a dictionary of key:value pairs that shows which column each value belongs to, like this:
Title:Carrie
Author:Stephen King
CoAuthor:
Year:1974
ISBN:978-0385086950

Title:The Human Web
Author:William McNeill
CoAuthor:Robert McNeill
Year:2003
ISBN:978-0393051797
So I wrote the following Bash script to convert a delimiter-separated file into a collection of dictionaries. It uses awk to read the first row, which contains the column names, split it on the delimiter, and store the names in an array. For each remaining row, it prints every value next to its column name, which is looked up from the array by field position, followed by a blank line.
#! /bin/bash
# CSV Viewer
# Usage: csv [-d delim] filename
# default delimiter is pipe.
#
# Input file:
# h1|h2|h3
# v1|v2|v3
# w1|w2|w3
#
# Output:
# h1:v1
# h2:v2
# h3:v3
#
# h1:w1
# h2:w2
# h3:w3

delim='|'
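while getopts "d:" OPTION
do
   case $OPTION in
     d) delim=$OPTARG ;;
     *) echo "Usage: csv [-d delim] filename" >&2; exit 1 ;;
   esac
done
# drop the parsed options so that $1 is the filename
shift $((OPTIND-1))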

if [ $# -eq 0 ]
then
    echo "Usage: csv [-d delim] filename" >&2
    exit 1
fi
awk -F "$delim" 'NR==1{split($0,arr);next}{for(i=1;i<=NF;i++)print arr[i]":"$i;print ""}' "$1"
Running the script:
sharfah@starship:~> csv.sh -d '|' file
Title:Carrie
Author:Stephen King
CoAuthor:
Year:1974
ISBN:978-0385086950

Title:The Human Web
Author:William McNeill
CoAuthor:Robert McNeill
Year:2003
ISBN:978-0393051797
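
For anyone who finds the awk one-liner hard to follow, here is a rough sketch of the same logic spelled out one step per line. The function name dsv_records is just something I made up for illustration, and the sample data is written to a temporary file so the snippet can be pasted into a shell as-is. It assumes a single-character delimiter and does not handle quoted fields.

```shell
#!/bin/bash
# Sketch only: dsv_records is a hypothetical helper name, not part of csv.sh.
# Usage: dsv_records DELIM FILE
dsv_records() {
    local delim=$1 file=$2
    awk -F "$delim" '
        NR == 1 {                 # header row: remember the column names
            split($0, names)      # split() uses the same separator as -F
            next                  # nothing to print for the header itself
        }
        {                         # data row: one name:value line per field
            for (i = 1; i <= NF; i++)
                print names[i] ":" $i
            print ""              # blank line between records
        }
    ' "$file"
}

# Demo using the book data from this post.
sample=$(mktemp)
printf '%s\n' \
    'Title|Author|CoAuthor|Year|ISBN' \
    'Carrie|Stephen King||1974|978-0385086950' \
    'The Human Web|William McNeill|Robert McNeill|2003|978-0393051797' > "$sample"

dsv_records '|' "$sample"
rm -f "$sample"
```

The header row is handled in its own block that ends with next, so awk skips straight to the following line and the blank separator is only printed after data rows.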
