The .NET Framework I/O system offers two methods for compressing data: DEFLATE and GZIP. Both are industry-standard compression algorithms and are very similar. They are implemented in the DeflateStream and GZipStream classes, which support both compression and decompression. In versions of the framework before .NET Framework 4, these classes cannot compress more than 4 GB of uncompressed data.
Which method should I use?
The only real difference between the two methods is that GZip wraps the compressed data in a header and trailer carrying extra information, such as a checksum, that can be helpful when decompressing the file, for example with the common gzip tool. Because of this small overhead, files compressed with Deflate are slightly smaller than those compressed with GZip.
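To make the size difference concrete, here is a minimal sketch that compresses the same bytes in memory with both classes and prints the resulting sizes. The module and function names (FormatComparison, DeflateBytes, GZipBytes) and the sample data are my own choices for illustration, not part of the framework.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module FormatComparison
    ' Hypothetical helper: compresses the bytes in memory with DEFLATE.
    Function DeflateBytes(ByVal data As Byte()) As Byte()
        Using output As New MemoryStream()
            Using comp As New DeflateStream(output, CompressionMode.Compress)
                comp.Write(data, 0, data.Length)
            End Using ' disposing the compression stream flushes the remaining data
            Return output.ToArray() ' ToArray still works after the stream is closed
        End Using
    End Function

    ' Hypothetical helper: same as above, but using the GZip format.
    Function GZipBytes(ByVal data As Byte()) As Byte()
        Using output As New MemoryStream()
            Using comp As New GZipStream(output, CompressionMode.Compress)
                comp.Write(data, 0, data.Length)
            End Using
            Return output.ToArray()
        End Using
    End Function

    Sub Main()
        ' Sample input chosen for illustration: 1000 repeated characters.
        Dim sample As Byte() = Encoding.UTF8.GetBytes(New String("a"c, 1000))
        Console.WriteLine("Original: {0} bytes", sample.Length)
        Console.WriteLine("Deflate:  {0} bytes", DeflateBytes(sample).Length)
        Console.WriteLine("GZip:     {0} bytes", GZipBytes(sample).Length)
    End Sub
End Module
```

The GZip result should come out a few bytes larger than the Deflate one, which is the header and trailer overhead described above.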
How to Compress Data
The compression streams differ from other streams in that, instead of writing to a resource such as a file (FileStream) or memory (MemoryStream), they write the data into another stream. A compression stream accepts data like any other stream, but passes it on to the wrapped stream in compressed form. When decompressing, it reads from the wrapped stream and hands back the decompressed data.
I wrote the following self-explanatory, working piece of code in VB.NET to show you exactly how this happens. In this example I’ll use the Deflate method.
    ' Requires Imports System.IO and Imports System.IO.Compression
    ' at the top of the file.
    Public Sub Compress(ByVal inFile As String, ByVal outFile As String)
        ' Open the file to compress and the file you are going to write to
        Dim sourceFile As FileStream = File.OpenRead(inFile)
        Dim destFile As FileStream = File.Create(outFile)

        ' Wrap the outgoing (destination) stream in the compression stream
        ' using the compression stream constructor.
        Dim compStream As New DeflateStream(destFile, CompressionMode.Compress)

        ' Read the data byte by byte from the source file
        ' and feed it into the compressing stream
        Dim myByte As Integer = sourceFile.ReadByte()
        While myByte <> -1
            compStream.WriteByte(CType(myByte, Byte))
            myByte = sourceFile.ReadByte()
        End While

        ' Close the streams. Closing the compression stream flushes any
        ' buffered compressed data and also closes destFile; closing only
        ' destFile would leave the output incomplete.
        sourceFile.Close()
        compStream.Close()
    End Sub
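Copying one byte at a time keeps the example easy to follow, but it is slow for large files. As a rough sketch of an alternative, the version below copies the data in blocks and uses Using blocks so every stream is closed even if an exception is thrown. The name CompressBuffered and the 4 KB block size are my own assumptions, not something the framework prescribes.

```vbnet
Imports System.IO
Imports System.IO.Compression

Module BufferedCompression
    ' Hypothetical helper: compresses inFile into outFile block by block.
    Public Sub CompressBuffered(ByVal inFile As String, ByVal outFile As String)
        Using sourceFile As FileStream = File.OpenRead(inFile)
            Using destFile As FileStream = File.Create(outFile)
                Using compStream As New DeflateStream(destFile, CompressionMode.Compress)
                    ' 4096-byte block; the size is an arbitrary choice
                    Dim chunk(4095) As Byte
                    Dim bytesRead As Integer = sourceFile.Read(chunk, 0, chunk.Length)
                    While bytesRead > 0
                        ' Write only the bytes that were actually read
                        compStream.Write(chunk, 0, bytesRead)
                        bytesRead = sourceFile.Read(chunk, 0, chunk.Length)
                    End While
                End Using ' flushes the remaining compressed data
            End Using
        End Using
    End Sub
End Module
```

It is called the same way as the Compress routine above, with the path of the file to read and the path of the compressed file to create.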
How to Decompress Data
When decompressing, the flow is the same but the logic is slightly different. In this case, you wrap the source file (the compressed file) in the compression stream, because this is where the data is coming from, and specify CompressionMode.Decompress to show that you are decompressing the wrapped stream. You then read from the compression stream and write directly to the destination file instead of pushing the data into the stream.
The following code does just this.
    ' Requires Imports System.IO and Imports System.IO.Compression
    ' at the top of the file.
    Public Sub Decompress(ByVal inFile As String, ByVal outFile As String)
        ' Open the file to decompress and the file you are going to write to
        Dim sourceFile As FileStream = File.OpenRead(inFile)
        Dim destFile As FileStream = File.Create(outFile)

        ' Wrap the incoming stream (the compressed file) in the compression
        ' stream using the compression stream constructor.
        Dim compStream As New DeflateStream(sourceFile, CompressionMode.Decompress)

        ' Read the data byte by byte from the compression stream
        ' and write it out directly to the destination file
        Dim myByte As Integer = compStream.ReadByte()
        While myByte <> -1
            destFile.WriteByte(CType(myByte, Byte))
            myByte = compStream.ReadByte()
        End While

        ' Close the streams. Closing the compression stream
        ' also closes sourceFile.
        compStream.Close()
        destFile.Close()
    End Sub
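A quick way to convince yourself that the two directions match is a round trip: compress some data, decompress it again, and compare the result with the original. The sketch below does this entirely in memory, wrapping MemoryStream objects instead of files. The names RoundTripDemo, CompressText and DecompressText are made up for this example, and it assumes .NET Framework 4 or later because it uses Stream.CopyTo.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module RoundTripDemo
    ' Hypothetical helper: encodes the string as UTF-8 and compresses it.
    Function CompressText(ByVal message As String) As Byte()
        Dim raw As Byte() = Encoding.UTF8.GetBytes(message)
        Using output As New MemoryStream()
            Using comp As New DeflateStream(output, CompressionMode.Compress)
                comp.Write(raw, 0, raw.Length)
            End Using ' flushes the remaining compressed data into output
            Return output.ToArray()
        End Using
    End Function

    ' Hypothetical helper: decompresses the bytes and decodes them as UTF-8.
    Function DecompressText(ByVal compressed As Byte()) As String
        Using source As New MemoryStream(compressed)
            Using decomp As New DeflateStream(source, CompressionMode.Decompress)
                Using output As New MemoryStream()
                    ' CopyTo reads from the decompressing stream until it is exhausted
                    decomp.CopyTo(output)
                    Return Encoding.UTF8.GetString(output.ToArray())
                End Using
            End Using
        End Using
    End Function

    Sub Main()
        Dim original As String = "The quick brown fox jumps over the lazy dog."
        Dim restored As String = DecompressText(CompressText(original))
        ' True if the round trip preserved the text
        Console.WriteLine(restored = original)
    End Sub
End Module
```

Note that the wrapping follows the same rule as with files: the compression stream always wraps the stream that holds, or will hold, the compressed bytes.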
Conclusion and a few things to note
- Whether you are compressing or decompressing data, the compression streams’ purpose is to wrap the stream that contains or will contain compressed data.
- If you compress already compressed data, the resulting file might be larger than the original.
- If you compress very small files, the compressed file might also be larger than the uncompressed file due to the compression overhead. If you use larger files, the overhead is negligible.
- Do not forget to import the System.IO.Compression namespace.
- Use GZip if the compressed files will be decompressed by a tool like gzip, and use Deflate if you only want to use the files within your own system.
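The points about overhead are easy to check for yourself. The sketch below compresses three inputs in memory and prints the size before and after: a tiny string, a large repetitive string, and a block of pseudo-random bytes. The random bytes are only my stand-in for already compressed data, since both are hard to compress further, and the names OverheadDemo and DeflatedSize are made up for this example.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module OverheadDemo
    ' Hypothetical helper: returns the size in bytes of the data after DEFLATE.
    Function DeflatedSize(ByVal data As Byte()) As Integer
        Using output As New MemoryStream()
            Using comp As New DeflateStream(output, CompressionMode.Compress)
                comp.Write(data, 0, data.Length)
            End Using ' flushes the remaining compressed data
            Return output.ToArray().Length
        End Using
    End Function

    Sub Main()
        ' Very small input: the format overhead outweighs any savings
        Dim tiny As Byte() = Encoding.UTF8.GetBytes("hello")

        ' Large, repetitive input: compresses very well
        Dim large As Byte() = Encoding.UTF8.GetBytes(New String("a"c, 100000))

        ' Pseudo-random bytes, used here as a stand-in for compressed data
        Dim noise(9999) As Byte
        Dim rng As New Random(42)
        rng.NextBytes(noise)

        Console.WriteLine("tiny:  {0} -> {1} bytes", tiny.Length, DeflatedSize(tiny))
        Console.WriteLine("large: {0} -> {1} bytes", large.Length, DeflatedSize(large))
        Console.WriteLine("noise: {0} -> {1} bytes", noise.Length, DeflatedSize(noise))
    End Sub
End Module
```

The tiny input should grow, the large repetitive input should shrink dramatically, and the random bytes should stay roughly the same size or grow slightly, depending on the framework version.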
Hope this was helpful. In case of any issues, please do not hesitate to contact me.
Till next time, yours truly.