From isql to bcp

I am currently working on migrating a bunch of shell/perl reporting scripts from Solaris to Linux. Clearly, a trivial task since it’s all POSIX … until it’s not!

Some of the reports do things which involve dumping data from one database, importing it into a temp table in another database and then doing further processing of the data there. Normally, this would be done using Sybase’s bcp utility; you would bcp out from the source database & then bcp in at the target database. However, bcp out works only on full tables or views. You can’t provide a select query and get the output of that query in a bcp in compatible format.

The scripts I was working on were relying on some Solaris executable to dump data from a select query in a format that was compatible for import using a bcp in command. Since I didn’t have the source code of this binary (typical!), here’s how I reproduced its functionality.

First, consider the isql query:

$ isql -U$USER -P$PASSWORD -S$SERVER -I$INTERFACES -o$OUT_FILE << EOF
select * from SUPERHEROES where WEAKNESS like '%Kryptonite%'
EOF

Which results in:

$ cat $OUT_FILE
ID   NAME          ALIAS          POWER                 WEAKNESS
---  -----------  -------------  --------------------  --------------
12   Superman      Kal-El         Flight,X-Ray Vision   Kryptonite
13   Supergirl     Kara Zor-El    Flight,X-Ray Vision   Kryptonite

(2 records selected)

But in order for the output to be importable by bcp in, it has to look like this:

12|Superman|Kal-El|Flight,X-Ray Vision|Kryptonite
13|Supergirl|Kara Zor-El|Flight,X-Ray Vision|Kryptonite

In effect, we need to:

  1. Lose the header rows
  2. Use a better delimiter instead of whitespace
  3. Trim the field values of extra whitespace
  4. Lose the footer rows

For requirements 1 & 2, we can modify the isql statement as follows:

$ isql -b -s"|" -w9999 -U$USER -P$PASSWORD -S$SERVER -I$INTERFACES -o$OUT_FILE << EOF
select * from SUPERHEROES where WEAKNESS like '%Kryptonite%'
EOF

The -b flag removes the header rows and the -s flag asks isql to use the pipe character as the field delimiter in the output. The -w flag is added for good measure: it specifies the width of each output line. The default value of 80 characters is likely to cause each output row to be split over multiple output lines.

The output now looks something like this:

|12 |Superman     |Kal-El       |Flight,X-Ray Vision  |Kryptonite     |
|13 |Supergirl    |Kara Zor-El  |Flight,X-Ray Vision  |Kryptonite     |

(2 records selected)

The footer can be removed by:

$ head -n -2 $OUT_FILE > ${OUT_FILE}.new
$ mv ${OUT_FILE}.new $OUT_FILE

Update (29/Nov/2012) : Just use the set nocount on option in your SQL query which will skip printing of the footer.

And finally, we are left with the task of removing all the extra whitespace around field values. Here’s a quick & dirty Python script that does the job:

#!/usr/bin/env python
"""
A simple utility that takes isql generated output
and converts it into a format suitable for import
via 'bcp in'.

The isql output should ideally be generated using
the following flags:
  isql -b -w9999 -s"|"

The delimiter is especially important since this
script assumes pipe ("|") as the field delimiter.

__WARN: The input file will be overwritten!__

Aside: This whole file could probably be replaced
by a one line sed/awk incantation, if I knew how.
"""
import shutil
import sys

DELIM = "|"
NL = "\n"

if len(sys.argv) != 2:
    print >> sys.stderr, """Usage: 
    isql2bcp.py <input_file>"""
    sys.exit(1)

infile = sys.argv[1]
outfile = infile + ".new"

in_fp = open(infile,'r')
out_fp = open(outfile,'w')

for line in in_fp:
    # Strip each line of leading & trailing DELIM
    # as well as trailing newline
    line = line.strip(DELIM+NL)
    tokens = line.split(DELIM)
    # Remove the extra whitespace surrounding each token
    tokens =  [t.strip() for t in tokens]
    line = DELIM.join(tokens)
    out_fp.write(line + NL)

in_fp.close()
out_fp.close()

shutil.move(outfile, infile)

The script can then be invoked as:

$ isql2bcp.py $OUT_FILE

Resulting in output that can be imported using bcp as:

$ bcp $TEMPDB..$TABLE in $OUT_FILE -c -t"|" -I$INTERFACES -S$SERVER -U$USER -P$PASSWORD

That’s it. I hope this will be useful to someone!