jueves, 11 de abril de 2013

Huffman Coding-Generating a Huffman tree

The Huffman coding algorithm its a enconding algorithm used for lossless data compression. It is used to build binary trees with a minimum weighted path lenght from given weights of values or frequencys. Huffman data compression is a simplified version of what some daily use programs like winzip or tar do.


¿How to build the tree?

I will use a tutorial to explain the process of building the tree if you want to see the source tutorial go to Tutorial

1. From a set of numbers or letters we create a table in one side we put the letters or numbers(value) on the other side the frequency or apperance.

Example:




2. We take our table or list and sort it by frequency from the lowest value to the highest value.



3. After we sort the list we will take lowest elements and add them to create a parent node just like this:


as you can see from the previous table we took values a and b which have the lowest frequency values from the table and we join them together to create a parent node with the value ab and frequency 12.

4. After creating the node we remove the values used to create it from the list or table and add the new parent node into the list, so we will end up with this:



we can see that the value and frequency of a and b not longer appear on the table and the new node with the new frequency appears on the list.

5. Repeat the process combining the two lowest elements to get something like this:





new list:





6. Keep doing this until you end up with only one element left in the list:





As you can see we end up with one element 102 so we are done.

7. The last element becomes the root of the huffman tree.


Coding and Decoding


1. To generate a Huffman code from the tree we will tranform all branches into values(starting from the top) left ones take a 0 value and right ones a 1 value.

            
2. So if we want the encoding value of (a) we go thru the tree and the value will be 0010.
and with these values we can see or represent the bits used to calculate the percentage of compression rate each letter has.

Video





If you still dont understand how to do it look at this video:

Code



As you can see the code isn't complete i had some troubles creating the nodes adding the lowest values.

Sources:




1 comentario:

  1. Pues, de lo de análisis de peor caso y caso típico no hay casi nada. Van 3 pts por el código y 2 pts por el reporte.

    ResponderEliminar