Archive | January, 2009

Compressing and Decompressing files using GZip & Deflate streams

29 Jan

The .NET Framework I/O system offers two methods for compressing data: Deflate and GZip. Both are industry-standard compression methods (GZip is built on the Deflate algorithm) and are implemented in the DeflateStream and GZipStream classes. They support both compression and decompression of data and, in the framework versions available at the time of writing, are limited to uncompressed data of up to 4 GB.

Which method should I use?

The main difference between the two methods is that GZip wraps the compressed data in a header and a trailing checksum. This extra information helps other programs, such as the common gzip tool, recognize and verify the file when decompressing it. The small overhead also makes files compressed with Deflate slightly smaller than those compressed with GZip.

How to Compress Data

Compression streams differ from other streams in that, instead of writing to a resource such as a file (FileStream) or memory (MemoryStream), they write the data into another stream. A compression stream takes in data like any other stream, but when it writes, it pushes the data into the wrapped stream in compressed or decompressed form.

I wrote the following self-explanatory, working piece of VB.NET code to show you exactly how this happens. In this example I’ll use the Deflate method.

Public Sub Compress(ByVal inFile As String, ByVal outFile As String)

        'Requires Imports System.IO and Imports System.IO.Compression
        'at the top of the file.

        'Open the file to compress and the file you are going to write to

        Dim sourceFile As FileStream = File.OpenRead(inFile)
        Dim destFile As FileStream = File.Create(outFile)

        'Wrap the outgoing (destination) stream in the compression stream using
        'the compression stream constructor.

        Dim compStream As New DeflateStream(destFile, CompressionMode.Compress)

        'Read the data byte by byte from the source file and
        'feed it into the compression stream

        Dim myByte As Integer = sourceFile.ReadByte()
        While myByte <> -1
            compStream.WriteByte(CType(myByte, Byte))
            myByte = sourceFile.ReadByte()
        End While

        'Close the streams. Close the compression stream first so that it
        'flushes any buffered compressed data into the destination file.

        compStream.Close()
        sourceFile.Close()
        destFile.Close()

    End Sub
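
The byte-by-byte loop above is easy to follow, but it makes one call per byte. As a rough sketch of an alternative, the version below copies the data in blocks and lets Using blocks close the streams in the right order. The name CompressBuffered, the 4 KB buffer size and the temporary files in Main are assumptions made for this illustration, not part of the original example.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression

Module BufferedCompression

    ' Hypothetical variant of Compress that copies the data in 4 KB blocks.
    Public Sub CompressBuffered(ByVal inFile As String, ByVal outFile As String)
        Using sourceFile As FileStream = File.OpenRead(inFile)
            Using destFile As FileStream = File.Create(outFile)
                ' End Using closes compStream first, which flushes the
                ' remaining compressed data into destFile.
                Using compStream As New DeflateStream(destFile, CompressionMode.Compress)
                    Dim buffer(4095) As Byte
                    Dim bytesRead As Integer = sourceFile.Read(buffer, 0, buffer.Length)
                    While bytesRead > 0
                        compStream.Write(buffer, 0, bytesRead)
                        bytesRead = sourceFile.Read(buffer, 0, buffer.Length)
                    End While
                End Using
            End Using
        End Using
    End Sub

    Sub Main()
        ' Temporary files used only for this demonstration.
        Dim inFile As String = Path.GetTempFileName()
        Dim outFile As String = Path.GetTempFileName()
        File.WriteAllText(inFile, New String("x"c, 100000))

        CompressBuffered(inFile, outFile)

        Dim inInfo As New FileInfo(inFile)
        Dim outInfo As New FileInfo(outFile)
        Console.WriteLine("Input size: " & inInfo.Length & " bytes")
        Console.WriteLine("Compressed size: " & outInfo.Length & " bytes")

        File.Delete(inFile)
        File.Delete(outFile)
    End Sub

End Module
```

Because the Using blocks dispose the streams even when an exception is thrown, this form also avoids leaving files locked if something goes wrong halfway through.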

How to Decompress Data

Decompression follows the same flow, but the logic is slightly different. This time you wrap the source file (the compressed file) in the compression stream, because that is where the data is coming from, and specify CompressionMode.Decompress to show that you are decompressing the wrapped stream. You then write directly to the destination file instead of pushing the data into the compression stream.

The following code does just this.

Public Sub Decompress(ByVal inFile As String, ByVal outFile As String)

        'Requires Imports System.IO and Imports System.IO.Compression
        'at the top of the file.

        'Open the file to decompress and the file you are going to write to

        Dim sourceFile As FileStream = File.OpenRead(inFile)
        Dim destFile As FileStream = File.Create(outFile)

        'Wrap the incoming stream (the compressed file) in the compression
        'stream using the compression stream constructor.

        Dim compStream As New DeflateStream(sourceFile, CompressionMode.Decompress)

        'Read the data byte by byte from the compression stream and
        'write it out directly to the destination file

        Dim myByte As Integer = compStream.ReadByte()
        While myByte <> -1
            destFile.WriteByte(CType(myByte, Byte))
            myByte = compStream.ReadByte()
        End While

        'Close the streams

        compStream.Close()
        sourceFile.Close()
        destFile.Close()

    End Sub
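
One way to convince yourself that compression and decompression mirror each other is a round trip in memory. The sketch below is only an illustration: CompressBytes, DecompressBytes and the sample text are hypothetical names and data, and MemoryStream stands in for the files used above.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module RoundTrip

    ' Hypothetical helper: compresses a byte array in memory.
    Function CompressBytes(ByVal data As Byte()) As Byte()
        Using target As New MemoryStream()
            ' The stream that WILL contain compressed data is wrapped.
            ' The third argument leaves target open after comp is closed.
            Using comp As New DeflateStream(target, CompressionMode.Compress, True)
                comp.Write(data, 0, data.Length)
            End Using
            Return target.ToArray()
        End Using
    End Function

    ' Hypothetical helper: decompresses a byte array in memory.
    Function DecompressBytes(ByVal packed As Byte()) As Byte()
        Using source As New MemoryStream(packed)
            ' The stream that CONTAINS compressed data is wrapped.
            Using decomp As New DeflateStream(source, CompressionMode.Decompress)
                Using result As New MemoryStream()
                    Dim buffer(4095) As Byte
                    Dim bytesRead As Integer = decomp.Read(buffer, 0, buffer.Length)
                    While bytesRead > 0
                        result.Write(buffer, 0, bytesRead)
                        bytesRead = decomp.Read(buffer, 0, buffer.Length)
                    End While
                    Return result.ToArray()
                End Using
            End Using
        End Using
    End Function

    Sub Main()
        Dim sample As String = "The quick brown fox jumps over the lazy dog. "
        Dim original As Byte() = Encoding.UTF8.GetBytes(sample & sample & sample)

        Dim restored As Byte() = DecompressBytes(CompressBytes(original))
        Dim same As Boolean = (Encoding.UTF8.GetString(restored) = sample & sample & sample)

        Console.WriteLine("Round trip restored the original data: " & same)
    End Sub

End Module
```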

Conclusion and a few things to note

  • Whether you are compressing or decompressing data, the compression stream’s purpose is to wrap the stream that contains or will contain compressed data.
  • If you compress already compressed data, the resulting file might be larger than the original file.
  • If you compress very small files, the compressed file might also be larger than the uncompressed file because of the compression overhead. With larger files, the overhead is negligible.
  • Do not forget to import the System.IO.Compression namespace.
  • Use GZip if the compressed files will be transferred and decompressed by a tool like gzip, and use Deflate if you only want to use the files within your own system.
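
The points about overhead are easy to check for yourself. Here is a minimal sketch, assuming a hypothetical helper called CompressedSize and two made-up inputs, that prints the compressed size of a tiny and a larger input under both methods. The exact numbers depend on the framework version, so none are quoted here.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module SizeComparison

    ' Hypothetical helper: returns the compressed size of data in bytes.
    Function CompressedSize(ByVal data As Byte(), ByVal useGZip As Boolean) As Long
        Using target As New MemoryStream()
            Dim comp As Stream
            If useGZip Then
                comp = New GZipStream(target, CompressionMode.Compress, True)
            Else
                comp = New DeflateStream(target, CompressionMode.Compress, True)
            End If
            comp.Write(data, 0, data.Length)
            ' Closing flushes the remaining compressed data into target,
            ' which stays open because of the True argument above.
            comp.Close()
            Return target.Length
        End Using
    End Function

    ' Prints the original and compressed sizes for one input.
    Sub Report(ByVal label As String, ByVal data As Byte())
        Console.WriteLine(label & " input: " & data.Length & " bytes")
        Console.WriteLine("  Deflate: " & CompressedSize(data, False) & " bytes")
        Console.WriteLine("  GZip:    " & CompressedSize(data, True) & " bytes")
    End Sub

    Sub Main()
        Dim small As Byte() = Encoding.UTF8.GetBytes("hi")
        Dim large As Byte() = Encoding.UTF8.GetBytes(New String("a"c, 100000))

        Report("Small", small)
        Report("Large", large)
    End Sub

End Module
```

The small input should come out larger than it went in, and the GZip figure should sit slightly above the Deflate figure in both cases because of the extra header and checksum.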

Hope this was helpful. In case of any issues, please do not hesitate to contact me.

Till next time, yours truly.

For vs For Each loops – Performance Issues

26 Jan

I’m sure many of you have reached a point where you wondered whether to use a For loop or a For Each loop, and which one is faster or more effective. Well, a real developer should be thinking, “I can measure this,” which is what I’m going to do here today.

The speed of a For or For Each loop usually depends on factors such as the number of items in the list and the type of list (array, ArrayList, generic List, Dictionary).

I wrote this piece of VB.NET code using an ArrayList and a generic List, both filled with the integers from 0 to 1000, and measured how long it takes to write all the numbers to the console window using For and For Each loops.

Imports System
Imports System.Collections
Imports System.Collections.Generic
Imports System.Diagnostics
Module Sample

    Sub Main()
        'Dim list As New ArrayList
        Dim list As New List(Of Integer)
        For i As Integer = 0 To 1000
            list.Add(i)
        Next

        'Timer starts here
        Dim t As New Stopwatch
        t.Start()

        For Each i As Integer In list
            Console.WriteLine(i.ToString)
        Next

        'Alternative: index-based For loop (comment out the For Each loop above)
        'For i As Integer = 0 To list.Count - 1
        '    Console.WriteLine(list(i).ToString)
        'Next

        t.Stop()
        Console.WriteLine("LOOP time: " & t.ElapsedMilliseconds)

        Console.Read()
    End Sub

End Module

Comment and uncomment the relevant lines to test the other list type or the other loop.

On my machine I got these statistics in milliseconds.

[Image: loops (timing results, not available)]

Now, these results do not necessarily mean that generic lists are slower than ArrayLists or that For Each loops are slower than For loops. That is not the point of this post at all.

The point is simple: you should learn to put your concerns to the test. So I invite you to test your loops and see for yourself which one performs better, though personally I never even think about it. I just use whichever one I think is more appropriate for the situation.
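
If you want to time both loops in a single run, something along these lines might do. Note that this sketch swaps the console output for a simple sum, so that the loop itself rather than Console.WriteLine dominates the measurement, and it uses a larger list. SumForEach, SumFor and the list size are assumptions chosen for the illustration, and the timings will vary from machine to machine.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics

Module LoopTiming

    ' Hypothetical helper: sums the list with a For Each loop.
    Function SumForEach(ByVal list As List(Of Integer)) As Long
        Dim total As Long = 0
        For Each item As Integer In list
            total += item
        Next
        Return total
    End Function

    ' Hypothetical helper: sums the list with an index-based For loop.
    Function SumFor(ByVal list As List(Of Integer)) As Long
        Dim total As Long = 0
        For i As Integer = 0 To list.Count - 1
            total += list(i)
        Next
        Return total
    End Function

    Sub Main()
        Dim list As New List(Of Integer)
        For i As Integer = 0 To 1000000
            list.Add(i)
        Next

        ' Time the For Each loop
        Dim t As Stopwatch = Stopwatch.StartNew()
        Dim total As Long = SumForEach(list)
        t.Stop()
        Console.WriteLine("For Each: " & t.ElapsedMilliseconds & " ms, sum = " & total)

        ' Time the For loop
        t = Stopwatch.StartNew()
        total = SumFor(list)
        t.Stop()
        Console.WriteLine("For: " & t.ElapsedMilliseconds & " ms, sum = " & total)
    End Sub

End Module
```

Printing the sum also confirms that both loops visited the same items, so the comparison is a fair one.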

Till next time.

Yours truly.

Back from Vacation

19 Jan

Yeah. I was away for a month. That explains the silence on the blog. After six months of intoxicating myself with code and work, I had to go on a little code detox and now I am back and fresh for yet another round of serious intoxication.
Cheers.
