I recently had need to parse a large (100 MB) CSV file with Python and was pleased with how speedy it was. However, the test machine I was using was strapped for disk space and a 100 MB file was, shall we say, less than welcome. Zipping the file compressed it down to 5 MB, which was far more reasonable.
Parsing CSV in Python is straightforward, and so is reading file contents from a ZIP archive. Unless you're using Python 2.6 or higher, though, getting the file contents out of the ZIP archive in a form csv.reader() likes (a file-like object) is not as simple as calling ZipFile.open(). All you get in Python 2.5 is ZipFile.read(), which returns the file contents as a string. Since disk space was at a premium, unarchiving the CSV even temporarily was out of the question. Using StringIO, it is possible to take the string contents of the CSV within the ZIP archive and wrap them in a file-like object that csv.reader() will be happy with.
Assume that you have a ZIP archive, test.csv.zip, with a single file in it, test.csv. Here’s the code:
import csv
import StringIO
import zipfile

dataFile = 'test.csv'
archive = '.'.join([dataFile, 'zip'])

filehandle = open(archive, 'rb')
zfile = zipfile.ZipFile(filehandle)
data = StringIO.StringIO(zfile.read(dataFile))
reader = csv.reader(data)

for row in reader:
    print row

zfile.close()
filehandle.close()
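The code above targets Python 2.5. For anyone on Python 3, here is a rough sketch of the same idea. It is not part of the original solution, and the function names and the UTF-8 default encoding are my own assumptions. The first function mirrors the read-into-a-string approach using io.StringIO; the second uses ZipFile.open(), the simpler route mentioned earlier, so the whole file never has to sit in memory at once.

```python
import csv
import io
import zipfile


def rows_from_zipped_csv(archive_path, member_name, encoding="utf-8"):
    """Read a CSV inside a ZIP archive fully into memory and return its rows.

    Hypothetical helper: the name and the default encoding are assumptions.
    """
    with zipfile.ZipFile(archive_path) as zfile:
        # In Python 3, ZipFile.read() returns bytes, so decode to text first.
        text = zfile.read(member_name).decode(encoding)
        # io.StringIO wraps the text in a file-like object for csv.reader().
        data = io.StringIO(text, newline="")
        return list(csv.reader(data))


def iter_rows_from_zipped_csv(archive_path, member_name, encoding="utf-8"):
    """Yield rows one at a time by streaming the archive member.

    Hypothetical helper: the name and the default encoding are assumptions.
    """
    with zipfile.ZipFile(archive_path) as zfile:
        with zfile.open(member_name) as raw:
            # ZipFile.open() gives a binary stream; wrap it to get text lines.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            for row in csv.reader(text):
                yield row
```

With the test.csv.zip archive from above, calling rows_from_zipped_csv('test.csv.zip', 'test.csv') should return the rows as a list of lists, while iter_rows_from_zipped_csv() can be looped over directly. Neither writes the unzipped CSV to disk.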
Comments
Thanks,
This gave me the piece of the puzzle I’m solving.
Steve