HuffmanTree 霍夫曼编码树

Huffman编码:为字母分配代码,代码长度取决于字母的相对使用频率或权重

Huffman树定义:Hf树每个叶子结点对应一个字母,叶子结点的权重就是对应字母的出现频率,它是一颗满树

意义:按照最小外部路径权重建立一棵树.一个叶结点的加权路径长度(WPL)定义为权重乘以深度.具有最小外部路径权重的二叉树就是,对于给定叶结点集合,WPL之和最小的二叉树,权重大的叶结点深度小,权重小的叶结点深度大.


实现

建立n个结点的Hf树.首先创建n个初始的Hf树,每个树只包含单一叶结点,然后取出其中权重最小的两棵树合成一个树,两个叶结点作为新结点的子结点,新结点的权重即为子结点权重和

本例使用基于堆的优先队列创建Hf树

类结构: HuffNode->HuffTree->Heap

Heap类(最小堆)

  1. import edu.princeton.cs.algs4.StdOut;
  2. public class Heap<Key extends Comparable<Key>> {
  3. private int n;//实际大小
  4. private int maxsize;//最大
  5. private Key[] heap;
  6. public Heap(Key[] heap, int n, int maxsize){
  7. this.n = n;
  8. this.maxsize = maxsize;
  9. this.heap = heap;
  10. }
  11. public int size(){ return n;}
  12. private boolean isleaf(int pos){ return pos >= n/2 && pos < n;}
  13. private int lc(int pos){ return 2*pos+1;}
  14. private int rc(int pos){ return 2*pos+2;}
  15. private int parent(int pos){return (pos-1)/2 ;}
  16. //构造堆有序
  17. public void buildHeap(){
  18. for (int i = n/2-1; i >= 0; i--){
  19. sink(i);
  20. }
  21. }
  22. private boolean less(int i, int j){ return heap[i].compareTo(heap[j])<0 ;}
  23. private void swap(int i, int j){
  24. Key temp = heap[i];
  25. heap[i] = heap[j];
  26. heap[j] = temp;
  27. }
  28. private void swim(int pos){
  29. while (pos > 1 && less(pos, parent(pos))){
  30. swap(pos, parent(pos));
  31. pos = parent(pos);
  32. }
  33. }
  34. private void sink(int pos){
  35. while (!isleaf(pos)){
  36. int j = lc(pos), rc = rc(pos);
  37. if (less(rc, j) && rc < n) j = rc;
  38. if (less(pos, j)) return;
  39. swap(j, pos);
  40. pos = j;
  41. }
  42. }
  43. public void insert(Key k){
  44. if (size() == maxsize){
  45. StdOut.println("the heap is full");
  46. return;
  47. }
  48. int curr = n++;
  49. heap[curr] = k;
  50. swim(curr);
  51. }
  52. public Key removefirst(){
  53. Key first = heap[0];
  54. swap(0, --n);
  55. if(n != 0){
  56. sink(0);
  57. heap[n] = null;
  58. }
  59. //若heap[0] = null,最后的hufftree也被清空
  60. return first;
  61. }
  62. }
  • 和algs4中Heap算法不同之处在于此处数组无空头元素,所以在索引和removefirst()等方法有边界变化;且此处buildHeap()方法针对数组直接构造堆有序

HuffNode(abstract)类

  1. public abstract class HuffNode<T> {
  2. public abstract int weight();
  3. public abstract boolean isLeaf();
  4. }

LeafNode叶子结点类

  1. public class LeafNode<T> extends HuffNode<T> {
  2. private T word;//字符
  3. private int weight;//频度
  4. public LeafNode(T word, int weight){
  5. this.word = word;
  6. this.weight = weight;
  7. }
  8. @Override
  9. public int weight(){ return weight;}
  10. @Override
  11. public boolean isLeaf(){return true;}
  12. }

IntlNode内部结点类

  1. public class IntlNode extends HuffNode {
  2. private int weight;
  3. private HuffNode lc;
  4. private HuffNode rc;
  5. public IntlNode(HuffNode lc, HuffNode rc){
  6. weight = lc.weight()+rc.weight();
  7. this.lc = lc;
  8. this.rc = rc;
  9. }
  10. @Override
  11. public boolean isLeaf(){ return false;}
  12. @Override
  13. public int weight(){ return weight;}
  14. public HuffNode lc(){ return lc;}
  15. public void setLc(HuffNode lc){this.lc = lc; }
  16. public HuffNode rc(){ return rc;}
  17. public void setRc(HuffNode rc){this.rc = rc;}
  18. }

HuffTree类

  1. public class HuffTree<T> implements Comparable<HuffTree<T>>{
  2. private HuffNode<T> root;
  3. //内部结点Hf树
  4. public HuffTree(HuffTree<T> lc, HuffTree<T> rc){
  5. root = new IntlNode(lc.root(), rc.root());
  6. }
  7. //叶结点Hf树
  8. public HuffTree(T word, int wgt){
  9. root = new LeafNode<T>(word, wgt);
  10. }
  11. public int weight(){ return root.weight();}
  12. public HuffNode root(){ return root;}
  13. //实现内部比较方法
  14. @Override
  15. public int compareTo(HuffTree<T> another){
  16. if (root.weight() < another.weight()) return -1;
  17. else if (root.weight() > another.weight()) return 1;
  18. else return 0;
  19. }
  20. }
  • implements Comparables接口,实现CompareTo()方法,若泛型HuffTree,则compareTo中参数也必须为HuffTree
  • 实际上比较的就是HuffTree的权重和,所以HuffTree类实现接口和compareTo()方法
  • Comparator和Comparable之间的区别

test类和静态buildHuff()

  1. import edu.princeton.cs.algs4.StdIn;
  2. public class test<T> {
  3. public static void main(String[] args){
  4. HuffTree treeArray[] = new HuffTree[100];
  5. int i = 0;
  6. while (! StdIn.isEmpty()){
  7. treeArray[i++] = new HuffTree(StdIn.readChar(), StdIn.readInt());
  8. StdIn.readChar();
  9. }
  10. buildHuff(treeArray,i);
  11. }
  12. public static HuffTree buildHuff(HuffTree [] treeArray, int count){
  13. Heap forest = new Heap(treeArray, count, count);
  14. forest.buildHeap();
  15. HuffTree temp1, temp2, temp3 = null;
  16. while (forest.size() > 1){
  17. temp1 = (HuffTree) forest.removefirst();
  18. temp2 = (HuffTree) forest.removefirst();
  19. temp3 = new HuffTree(temp1, temp2);
  20. forest.insert(temp3);
  21. }
  22. return temp3;
  23. }
  24. }
  • 第一个readChar()读入word,第二个用readInt()读入weight则中间的空格会被忽略,需要额外一次readChar()读取两个word组中间的空格或回车