Parsing a CSV file within a ZIP archive with Python

I recently needed to parse a large (100 MB) CSV file with Python and was pleased with how speedy it was. However, the test machine I was using was strapped for disk space, and a 100 MB file was, shall we say, less than welcome. Zipping the file compressed it down to 5 MB, which was far more reasonable.

Parsing CSV in Python is straightforward, and so is reading file contents from a ZIP archive. Unless you’re using Python 2.6 or higher, though, getting the file contents out of the ZIP archive in a form csv.reader() likes (a file-like object) is not as simple as calling ZipFile.open(). All you get in Python 2.5 is ZipFile.read(), which returns the file contents as a string. Since disk space was at a premium, unarchiving the CSV even temporarily was out of the question. Using StringIO, it is possible to take the string contents of the CSV within the ZIP archive and wrap them in a file-like object that csv.reader() will be happy with. The decompressed data then lives in memory rather than on disk.

Assume that you have a ZIP archive, test.csv.zip, with a single file in it, test.csv. Here’s the code (Python 2):

import csv
import StringIO
import zipfile

dataFile = 'test.csv'
archive = '.'.join([dataFile, 'zip'])  # 'test.csv.zip'

filehandle = open(archive, 'rb')
zfile = zipfile.ZipFile(filehandle)
# ZipFile.read() returns the member's contents as a string;
# StringIO wraps that string in a file-like object for csv.reader().
data = StringIO.StringIO(zfile.read(dataFile))
reader = csv.reader(data)

for row in reader:
    print row

zfile.close()
filehandle.close()
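
For anyone on Python 3, where the StringIO module and the print statement no longer exist, here is a rough sketch of the ZipFile.open() route mentioned earlier. The function name iter_csv_rows and the UTF-8 default are my own assumptions, not part of the original script. ZipFile.open() hands back a binary file-like object, so it is wrapped in io.TextIOWrapper to give csv.reader() the text it expects.

```python
import csv
import io
import zipfile


def iter_csv_rows(archive_path, member_name, encoding="utf-8"):
    """Yield rows from a CSV file stored inside a ZIP archive.

    Hypothetical helper for illustration; the encoding default is an
    assumption and should match how the CSV was actually written.
    """
    with zipfile.ZipFile(archive_path) as zfile:
        # ZipFile.open() returns a binary file-like object that
        # decompresses as it is read, so nothing is extracted to disk.
        with zfile.open(member_name) as raw:
            # csv.reader() wants text in Python 3; newline="" lets the
            # csv module handle line endings itself.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            for row in csv.reader(text):
                yield row
```

With the archive from the example above, looping over iter_csv_rows('test.csv.zip', 'test.csv') and printing each row should behave like the original script. Because rows are read as the member is decompressed, the whole CSV does not have to be held in memory as one string.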

Comments

One comment so far.
  1. Steve,

    Thanks,

    This gave me the piece of the puzzle I’m solving.

    Steve
