缘起

最近阅读<<我的第一本算法书>>(【日】石田保辉;宫崎修一)
本系列笔记拟采用golang练习之

堆排序

  1. 堆排序的特点是利用了数据结构中的堆。
  2. 首先,在堆中存储所有的数据,并按降序来构建堆。
  3. 为了排序,需要再从堆中把数据一个个取出来。
  4. 堆排序一开始需要将n个数据存进堆里,所需时间为O(nlogn)。
  5. 每轮取出最大的数据并重构堆所需要的时间为O(logn)。
  6. 由于总共有n轮,所以重构后排序的时间也是O(nlog n)。
  7. 因此,整体来看堆排序的时间复杂度为O(nlog n)。
  8. 这样来看,堆排序的运行时间比之前讲到的冒泡排序、选择排序、插入排序的时间O(n^2)都要短。
  9. 摘自 <<我的第一本算法书>> 【日】石田保辉;宫崎修一

目标

  • 构造一个堆, 并测试堆排序的效率

设计

  • IHeap: 定义堆的接口
  • tArrayHeap
    • 基于数组的堆的实现.
    • 堆是一种特殊的二叉完全树: 父节点总是小于任意的子节点
    • 堆的父子节点的索引存在线性关系, 以0下标为例
      • parent.index = (node.index - 1) / 2
      • leftChild.index = node.index*2 + 1
      • rightChild.index = leftChild.index + 1
  • ISorter:
    • 定义排序接口
    • 定义值比较函数以兼容任意值类型
    • 通过调整比较函数可实现倒序输出
  • tHeapSort
    • 利用tArrayHeap进行堆排序
    • 先把整个数组push进heap
    • 然后逐个pop出来, 即可

单元测试

heap_sort_test.go, 测试过程与之前的冒泡, 选择, 插入等类似, 但测试数组扩大到10万元素.

  1. package sorting
  2. import (
  3. "fmt"
  4. "learning/gooop/sorting"
  5. "learning/gooop/sorting/heap_sort"
  6. "math/rand"
  7. "testing"
  8. "time"
  9. )
  10. func Test_HeapSort(t *testing.T) {
  11. fnAssertTrue := func(b bool, msg string) {
  12. if !b {
  13. t.Fatal(msg)
  14. }
  15. }
  16. reversed := false
  17. fnCompare := func(a interface{}, b interface{}) sorting.CompareResult {
  18. i1 := a.(int)
  19. i2 := b.(int)
  20. if i1 < i2 {
  21. if reversed {
  22. return sorting.GREATER
  23. } else {
  24. return sorting.LESS
  25. }
  26. } else if i1 == i2 {
  27. return sorting.EQUAL
  28. } else {
  29. if reversed {
  30. return sorting.LESS
  31. } else {
  32. return sorting.GREATER
  33. }
  34. }
  35. }
  36. fnTestSorter := func(sorter sorting.ISorter) {
  37. reversed = false
  38. // test simple array
  39. samples := []interface{} { 2,3,1,5,4,7,6 }
  40. sorter.Sort(samples, fnCompare)
  41. fnAssertTrue(fmt.Sprintf("%v", samples) == "[1 2 3 4 5 6 7]", "expecting 1,2,3,4,5,6,7")
  42. t.Log("pass sorting [2 3 1 5 4 7 6] >> [1 2 3 4 5 6 7]")
  43. // test 10000 items sorting
  44. rnd := rand.New(rand.NewSource(time.Now().UnixNano()))
  45. sampleCount := 100*1000
  46. t.Logf("prepare large array with %v items", sampleCount)
  47. samples = make([]interface{}, sampleCount)
  48. for i := 0;i < sampleCount;i++ {
  49. samples[i] = rnd.Intn(sampleCount*10)
  50. }
  51. t.Logf("sorting large array with %v items", sampleCount)
  52. t0 := time.Now().UnixNano()
  53. sorter.Sort(samples, fnCompare)
  54. cost := time.Now().UnixNano() - t0
  55. for i := 1;i < sampleCount;i++ {
  56. fnAssertTrue(fnCompare(samples[i-1], samples[i]) != sorting.GREATER, "expecting <=")
  57. }
  58. t.Logf("end sorting large array, cost = %v ms", cost / 1000000)
  59. // test 0-20
  60. sampleCount = 20
  61. t.Log("sorting 0-20")
  62. samples = make([]interface{}, sampleCount)
  63. for i := 0;i < sampleCount;i++ {
  64. for {
  65. p := rnd.Intn(sampleCount)
  66. if samples[p] == nil {
  67. samples[p] = i
  68. break
  69. }
  70. }
  71. }
  72. t.Logf("unsort = %v", samples)
  73. sorter.Sort(samples, fnCompare)
  74. t.Logf("sorted = %v", samples)
  75. fnAssertTrue(fmt.Sprintf("%v", samples) == "[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]", "expecting 0-20")
  76. t.Log("pass sorting 0-20")
  77. // test special
  78. samples = []interface{} {}
  79. sorter.Sort(samples, fnCompare)
  80. t.Log("pass sorting []")
  81. samples = []interface{} { 1 }
  82. sorter.Sort(samples, fnCompare)
  83. t.Log("pass sorting [1]")
  84. samples = []interface{} { 3,1 }
  85. sorter.Sort(samples, fnCompare)
  86. fnAssertTrue(fmt.Sprintf("%v", samples) == "[1 3]", "expecting 1,3")
  87. t.Log("pass sorting [1 3]")
  88. reversed = true
  89. samples = []interface{} { 2, 3,1 }
  90. sorter.Sort(samples, fnCompare)
  91. fnAssertTrue(fmt.Sprintf("%v", samples) == "[3 2 1]", "expecting 3,2,1")
  92. t.Log("pass sorting [3 2 1]")
  93. }
  94. t.Log("\ntesting HeapSort")
  95. fnTestSorter(heap_sort.HeapSort)
  96. }

测试输出

10万元素排序只需数十毫秒,
确实比冒泡, 选择, 插入等排序有指数级的提升.
代价是多占用了一个堆的空间.

  1. $ go test -v heap_sort_test.go
  2. === RUN Test_HeapSort
  3. heap_sort_test.go:109:
  4. testing HeapSort
  5. heap_sort_test.go:48: pass sorting [2 3 1 5 4 7 6] >> [1 2 3 4 5 6 7]
  6. heap_sort_test.go:53: prepare large array with 100000 items
  7. heap_sort_test.go:59: sorting large array with 100000 items
  8. heap_sort_test.go:66: end sorting large array, cost = 67 ms
  9. heap_sort_test.go:70: sorting 0-20
  10. heap_sort_test.go:81: unsort = [4 15 1 13 19 14 11 12 9 8 3 7 16 18 2 10 17 6 0 5]
  11. heap_sort_test.go:84: sorted = [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
  12. heap_sort_test.go:86: pass sorting 0-20
  13. heap_sort_test.go:91: pass sorting []
  14. heap_sort_test.go:95: pass sorting [1]
  15. heap_sort_test.go:100: pass sorting [1 3]
  16. heap_sort_test.go:106: pass sorting [3 2 1]
  17. --- PASS: Test_HeapSort (0.07s)
  18. PASS
  19. ok command-line-arguments 0.076s

IHeap.go

定义堆的接口

  1. package heap_sort
  2. type IHeap interface {
  3. Size() int
  4. IsEmpty() bool
  5. IsNotEmpty() bool
  6. Push(value interface{})
  7. Pop() (error, interface{})
  8. }

tArrayHeap

  • 基于数组的堆的实现.
  • 堆是一种特殊的二叉完全树: 父节点总是小于任意的子节点
  • 堆的父子节点的索引存在线性关系, 以0下标为例
    • parent.index = (node.index - 1) / 2
    • leftChild.index = node.index*2 + 1
    • rightChild.index = leftChild.index + 1 ```go package heap_sort

import ( “errors” “learning/gooop/sorting” )

type tArrayHeap struct { comparator sorting.CompareFunction items []interface{} size int version int64 }

func newArrayHeap(comparator sorting.CompareFunction) IHeap { return &tArrayHeap{ comparator: comparator, items: make([]interface{}, 0), size: 0, version: 0, } }

func (me *tArrayHeap) Size() int { return me.size }

func (me *tArrayHeap) IsEmpty() bool { return me.size <= 0 }

func (me *tArrayHeap) IsNotEmpty() bool { return !me.IsEmpty() }

func (me *tArrayHeap) Push(value interface{}) { me.version++

  1. me.ensureSize(me.size + 1)
  2. me.items[me.size] = value
  3. me.size++
  4. me.shiftUp(me.size - 1)
  5. me.version++

}

func (me *tArrayHeap) ensureSize(size int) { for ;len(me.items) < size; { me.items = append(me.items, nil) } }

func (me *tArrayHeap) parentOf(i int) int { return (i - 1) / 2 }

func (me tArrayHeap) leftChildOf(i int) int { return i2 + 1 }

func (me *tArrayHeap) rightChildOf(i int) int { return me.leftChildOf(i) + 1 }

func (me *tArrayHeap) last() (i int, v interface{}) { if me.IsEmpty() { return -1, nil }

  1. i = me.size - 1
  2. v = me.items[i]
  3. return i,v

}

func (me *tArrayHeap) shiftUp(i int) { if i <= 0 { return } v := me.items[i]

  1. pi := me.parentOf(i)
  2. pv := me.items[pi]
  3. if me.comparator(v, pv) == sorting.LESS {
  4. me.items[pi], me.items[i] = v, pv
  5. me.shiftUp(pi)
  6. }

}

func (me *tArrayHeap) Pop() (error, interface{}) { if me.IsEmpty() { return gNoMoreElementsError, nil }

  1. me.version++
  2. top := me.items[0]
  3. li, lv := me.last()
  4. me.items[0] = nil
  5. me.size--
  6. if me.IsEmpty() {
  7. return nil, top
  8. }
  9. me.items[0] = lv
  10. me.items[li] = nil
  11. me.shiftDown(0)
  12. me.version++
  13. return nil, top

}

func (me *tArrayHeap) shiftDown(i int) { pv := me.items[i] ok, ci, cv := me.minChildOf(i) if ok && me.comparator(cv, pv) == sorting.LESS { me.items[i], me.items[ci] = cv, pv me.shiftDown(ci) } }

func (me *tArrayHeap) minChildOf(p int) (ok bool, i int, v interface{}) { li := me.leftChildOf(p) if li >= me.size { return false, 0, nil } lv := me.items[li]

  1. ri := me.rightChildOf(p)
  2. if ri >= me.size {
  3. return true, li, lv
  4. }
  5. rv := me.items[ri]
  6. if me.comparator(lv, rv) == sorting.LESS {
  7. return true, li, lv
  8. } else {
  9. return true, ri, rv
  10. }

}

var gNoMoreElementsError = errors.New(“no more elements”)

  1. <a name="To6Sq"></a>
  2. # ISorter
  3. - 定义排序接口
  4. - 定义值比较函数以兼容任意值类型
  5. - 通过调整比较函数可实现倒序输出
  6. ```go
  7. package sorting
  8. type ISorter interface {
  9. Sort(data []interface{}, comparator CompareFunction) []interface{}
  10. }
  11. type CompareFunction func(a interface{}, b interface{}) CompareResult
  12. type CompareResult int
  13. const LESS CompareResult = -1
  14. const EQUAL CompareResult = 0
  15. const GREATER CompareResult = 1

tHeapSort

  • 堆排序器, 实现ISorter接口
  • 利用tArrayHeap进行堆排序
  • 先把整个数组push进heap
  • 然后逐个pop出来, 即可 ```go package heap_sort

import “learning/gooop/sorting”

type tHeapSort struct { }

func newHeapSort() sorting.ISorter { return &tHeapSort{} }

func (me *tHeapSort) Sort(data []interface{}, comparator sorting.CompareFunction) []interface{} { heap := newArrayHeap(comparator) for _,it := range data { heap.Push(it) }

  1. for i,_ := range data {
  2. e,v := heap.Pop()
  3. if e == nil {
  4. data[i] = v
  5. } else {
  6. panic(e)
  7. }
  8. }
  9. return data

}

var HeapSort = newHeapSort() ```

(end)