February 4, 2007 tools

.NET: Decompressing zip file entries into memory

I knew that the J# libraries in .NET had zip file support, but I couldn’t find any samples that showed how to decompress the files into memory. The hard part, of course, is that the J# stream objects aren’t the same as the .NET stream objects. If you’re a Java programmer looking for a familiar library, that’s great, but I’m not, so I had to do a little finagling.

The first thing you need to do is add a reference to the vjslib assembly, which brings in .NET classes in Java namespaces, e.g. java.io. The one we care most about is java.util.zip, which includes ZipFile and ZipEntry. We also need java.util for the Enumeration class and java.io for the InputStream class. With these in place, we can enumerate a zip file:

using System;      // Console
using System.Text; // ASCIIEncoding
using java.util;   // all from vjslib assembly
using java.util.zip;
using java.io;

static void Main(string[] args) {
  if( args.Length != 1 ) {
    Console.WriteLine("Usage: dumpzipfileoftextfiles <file>");
    return;
  }

  // we’re assuming a zip file full of ASCII text files here
  string filename = args[0];
  ZipFile zip = new ZipFile(filename);

  try {
    // enumerate entries in the zip file
    // NOTE: can't enum via foreach; Java objects don't support it
    Enumeration entries = zip.entries();
    while( entries.hasMoreElements() ) {
      ZipEntry entry = (ZipEntry)entries.nextElement();

      // read text bytes into an ASCII string
      byte[] bytes = ReadZipBytes(zip, entry);
      string s = ASCIIEncoding.ASCII.GetString(bytes);

      // do something w/ the text
      string entryname = entry.getName();
      Console.WriteLine("{0}:\r\n{1}\r\n", entryname, s);
    }
  }
  finally {
    if( zip != null ) { zip.close(); }
  }
}

Notice the use of the Enumeration object so we can enumerate in the Java style and the use of the ZipFile and ZipEntry types. This is all stuff you could find in readily available online samples (I did). The interesting bit is the ReadZipBytes method:
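If the Java-style loop bothers you, a small iterator can adapt it for foreach. This is only a sketch: AsEnumerable is a hypothetical helper of my own naming, not something in vjslib, and a Queue stands in for the Enumeration so the sample runs without the J# assembly.

```csharp
using System;
using System.Collections.Generic;

static class EnumerationAdapter {
  // Hypothetical helper: pulls items while hasMore() returns true
  // and casts each one to T.
  public static IEnumerable<T> AsEnumerable<T>(Func<bool> hasMore, Func<object> next) {
    while( hasMore() ) {
      yield return (T)next();
    }
  }

  static void Main() {
    // Stand-in for a Java-style Enumeration (assumption: with vjslib you
    // would pass entries.hasMoreElements and entries.nextElement instead)
    Queue<object> names = new Queue<object>(new object[] { "a.txt", "b.txt" });

    foreach( string name in AsEnumerable<string>(() => names.Count > 0, () => names.Dequeue()) ) {
      Console.WriteLine(name);
    }
  }
}
```

Taking two delegates keeps the helper independent of the Java types, at the cost of a slightly wordier call site.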

static byte[] ReadZipBytes(ZipFile zip, ZipEntry entry) {
  // read contents of text stream into bytes
  InputStream instream = zip.getInputStream(entry);
  int size = (int)entry.getSize();
  sbyte[] sbytes = new sbyte[size];

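  // read until the buffer is full or the stream ends
  // (read returns -1 at end of stream; stopping at size avoids
  // asking for zero bytes over and over)
  int offset = 0;
  try {
    while( offset < size ) {
      int read = instream.read(sbytes, offset, size - offset);
      if( read == -1 ) { break; }
      offset += read;
    }
  }
  finally {
    instream.close();
  }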

  // this is the magic method for converting signed bytes
  // into unsigned bytes for use with the rest of .NET, e.g.
  // Encoding.GetString(byte[]) or new MemoryStream(byte[])
  return (byte[])(object)sbytes;
}

For those of you familiar with Java, I’m just reading the zip file entry data into an array of signed bytes. However, most .NET APIs like unsigned bytes, e.g. Encoding.GetString(byte[]) or new MemoryStream(byte[]), which means you’ve got to convert a signed array of bytes in .NET to an unsigned array of bytes. Unfortunately, just casting doesn’t work (the compiler complains). Even more unfortunately, I could find nothing in the Convert or BitConverter classes to perform this feat of magic and the code I wrote was dog slow, so I asked around internally.

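Luckily, James Manning, an MS SDE, had the answer: cast the signed byte array to an object first and then to an unsigned byte array. The C# compiler rejects the direct cast, but the CLR treats sbyte[] and byte[] as compatible at runtime, so going through object gets past the compiler and no bytes are copied. Thank goodness James knew that, because I didn’t find anything on this topic. Hopefully future generations will find this missive.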
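To see the cast on its own, here is a minimal sketch that doesn’t need vjslib at all. The SignedByteDemo class and AsUnsigned helper are names I made up for illustration; the only thing taken from the code above is the double cast.

```csharp
using System;
using System.IO;
using System.Text;

static class SignedByteDemo {
  // Reinterprets the array rather than copying it, so both
  // references point at the same storage.
  public static byte[] AsUnsigned(sbyte[] signedBytes) {
    return (byte[])(object)signedBytes;
  }

  static void Main() {
    sbyte[] signedBytes = { 72, 105, -1 }; // 'H', 'i', 0xFF
    byte[] unsignedBytes = AsUnsigned(signedBytes);

    Console.WriteLine(Encoding.ASCII.GetString(unsignedBytes, 0, 2));      // Hi
    Console.WriteLine(unsignedBytes[2]);                                   // 255
    Console.WriteLine(object.ReferenceEquals(signedBytes, unsignedBytes)); // True

    // the same array can back a .NET stream
    using( MemoryStream stream = new MemoryStream(unsignedBytes) ) {
      Console.WriteLine(stream.Length); // 3
    }
  }
}
```

Because nothing is copied, a write through either reference shows up in the other. If you need an independent copy, Buffer.BlockCopy into a fresh byte[] is one option.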

You can download the sample if you like. Enjoy.