1. 基本介绍

  • tensorflow 设备内存管理模块实现了一个 best-fit with coalescing 算法(后文简称 bfc 算法)。
  • bfc 算法是 Doung Lea’s malloc(dlmalloc) 的一个非常简单的版本。
  • 它具有内存分配、释放、碎片管理等基本功能。

2. bfc基本算法思想

1. 数据结构

  • 整个内存空间由一个按基址升序排列的 Chunk 双向链表来表示,它们的直接前趋和后继必须在地址连续的内存空间。Chunk 结构体里含有实际大小、请求大小、是否被占用、基址、直接前趋、直接后继、Bin 索引等信息。

TensorFlow内存管理bfc算法 - 图1

2. 申请

  • 用户申请一个内存块(malloc)。根据 chunk 双链表找到一个合适的内存块,如果该内存块的大小是用户申请的大小的二倍以上,那么就将该内存块切分成两块,这就是 split 操作。
    • 返回其中一块给用户,并将该内存块标识为占用
    • Spilt 操作会新增一个 chunk,所以需要修改 chunk 双链表以维持前驱和后继关系
  • 如果用户申请 512 的空间,正好有一块 1024 的 chunk2 是空闲的,由于 1024/512 =2,所以 chunk2 被 split 为 2 块:chunk2_1 和 chunk2_2。返回 chunk2_1 给用户并将其标志位占用状态。

3. 释放

  • 用户释放一个内存块(free)。先将该块标记为空闲。然后根据 chunk 数据结构中的信息找到其前驱和后继内存块。如果前驱和后继块中有空闲的块,那么将刚释放的块和空闲的块合并成一个更大的 chunk(这就是 merge 操作,合并当前块和其前后的空闲块)。再修改双链表结构以维持前驱后继关系。这就做到了内存碎片的回收。
  • 如果用户要 free chunk3,由于 chunk3 的前驱 chunk2 也是空闲的,所以将 chunk2 和 chunk3 合并得到一个新的 chunk2’,大小为 chunk2 和 chunk3 之和。

3. bins

1. bins 数据结构

  • bfc 算法采取的是被动分块的策略。最开始整个内存是一个 chunk,随着用户申请空间的次数增加,最开始的大 chunk 会被不断的 split 开来,从而产生越来越多的小 chunk。当 chunk 数量很大时,为了寻找一个合适的内存块而遍历双链表无疑是一笔巨大的开销。为了实现对空闲块的高效管理,bfc 算法设计了 bin 这个抽象数据结构。
  • 每个 bin 都有一个 size 属性,一个 bin 是一个拥有 chunk size >= binsize 的空闲 chunk 的集合。集合中的 chunk 按照 chunk size 的升序组织成单链表。bfc 算法维护了一个 bin 的集合:bins。它由多个 bin 以及从属于每个 bin 的 chunks 组成。内存中所有的空闲 chunk 都由 bins 管理。

TensorFlow内存管理bfc算法 - 图2

  • 图中每一列表示一个 bin,列首方格中的数字表示 bin 的 size。bin size 的大小都是 256 的 2^n 的倍。每个 bin 下面挂载了一系列的空闲 chunk,每个 chunk 的 chunk size 都大于等于所属的 bin 的 bin size,按照 chunk size 的升序挂载成单链表。

2. bins 操作

  • bfc 算法针对 bins 这个集合设计了三个操作:search、insert、delete。
    • search
      给定一个 chunk size,从 bins 中找到大于等于该 chunksize 的最小的那个空闲 chunk。Search 操作具体流程如下。如果 bin 以数组的形式组织,那么可以从 index = chunk size /256 >>2 的那个 bin 开始查找。最好的情况是开始查找的那个 bin 的 chunk 链表非空,那么直接返回链表头即可。这种情况时间复杂度是常数级的。最坏的情况是遍历 bins 数组中所有的 bin。对于一般大小的内存来说,bins 数组元素非常少,比如 4G 空间只需要 23 个 bin 就足够了(256 * 2 ^ 23 > 4G),因此也很快能返回结果。总体来说 search 操作是非常高效的。对于固定大小内存来说,查找时间是常数量级的。
    • insert
      将一个空闲的 chunk 插入到一个 bin 所挂载的 chunk 链表中,同时需要维持 chunk 链表的升序关系。具体流程是直接将 chunk 插入到 index = chunk size /256 >>2 的那个 bin 中即可。
    • delete
      将一个空闲的 chunk 从 bins 中移除。

4. 总结

  • 将内存分块管理,按块进行空间分配和释放。
  • 通过 split 操作将大内存块分解成用户需要的小内存块。
  • 通过 merge 操作合并小的内存块,做到内存碎片回收
  • 通过 bin 这个抽象数据结构实现对空闲块高效管理。

5. 代码分析

1. 代码地址

2. 数据结构

  • Chunk
  1. static const int kInvalidChunkHandle = -1;
  2. ...
  3. struct Chunk {
  4. size_t size = 0; // Full size of buffer.
  5. // We sometimes give chunks that are larger than needed to reduce
  6. // fragmentation. requested_size keeps track of what the client
  7. // actually wanted so we can understand whether our splitting
  8. // strategy is efficient.
  9. size_t requested_size = 0;
  10. // allocation_id is set to -1 when the chunk is not in use. It is assigned a
  11. // value greater than zero before the chunk is returned from
  12. // AllocateRaw, and this value is unique among values assigned by
  13. // the parent allocator.
  14. int64 allocation_id = -1;
  15. void* ptr = nullptr; // pointer to granted subbuffer.
  16. // If not kInvalidChunkHandle, the memory referred to by 'prev' is directly
  17. // preceding the memory used by this chunk. E.g., It should start
  18. // at 'ptr - prev->size'
  19. ChunkHandle prev = kInvalidChunkHandle;
  20. // If not kInvalidChunkHandle, the memory referred to by 'next' is directly
  21. // following the memory used by this chunk. E.g., It should be at
  22. // 'ptr + size'
  23. ChunkHandle next = kInvalidChunkHandle;
  24. // What bin are we in?
  25. BinNum bin_num = kInvalidBinNum;
  26. bool in_use() const { return allocation_id != -1; }
  27. };
  • Bin
  1. // A Bin is a collection of similar-sized free chunks.
  2. struct Bin {
  3. // All chunks in this bin have >= bin_size memory.
  4. size_t bin_size = 0;
  5. struct ChunkComparator {
  6. explicit ChunkComparator(BFCAllocator* allocator)
  7. : allocator_(allocator) {}
  8. // Sort first by size and then use pointer address as a tie breaker.
  9. bool operator()(const ChunkHandle ha,
  10. const ChunkHandle hb) const NO_THREAD_SAFETY_ANALYSIS {
  11. const Chunk* a = allocator_->ChunkFromHandle(ha);
  12. const Chunk* b = allocator_->ChunkFromHandle(hb);
  13. if (a->size != b->size) {
  14. return a->size < b->size;
  15. }
  16. return a->ptr < b->ptr;
  17. }
  18. private:
  19. BFCAllocator* allocator_; // The parent allocator
  20. };
  21. typedef std::set<ChunkHandle, ChunkComparator> FreeChunkSet;
  22. // List of free chunks within the bin, sorted by chunk size.
  23. // Chunk * not owned.
  24. FreeChunkSet free_chunks;
  25. Bin(BFCAllocator* allocator, size_t bs)
  26. : bin_size(bs), free_chunks(ChunkComparator(allocator)) {}
  27. };
  • AllocationRegion
    • AllocationRegion 给一个连续的内存区域做指针到 ChunkHandle 的映射。
  • RegionManager
    • RegionManager 聚集了一个或多个 AllocationRegion,并提供一个从指针到基础 ChunkHandle 的间接层,这个间接层可在多个不连续的内存区域进行分配。

3. 分配大小

  • 将每次分配的内存大小调整为 kMinAllocationSize 的 N 倍,这样所有内存地址都是很好地按字节对齐了。
  1. // kMinAllocationSize = 256
  2. static const size_t kMinAllocationBits = 8;
  3. static const size_t kMinAllocationSize = 1 << kMinAllocationBits;
  4. ...
  5. size_t BFCAllocator::RoundedBytes(size_t bytes) {
  6. size_t rounded_bytes =
  7. (kMinAllocationSize *
  8. ((bytes + kMinAllocationSize - 1) / kMinAllocationSize));
  9. DCHECK_EQ(size_t{0}, rounded_bytes % kMinAllocationSize);
  10. return rounded_bytes;
  11. }

4. 初始化 bin

  1. typedef int BinNum;
  2. static const int kInvalidBinNum = -1;
  3. static const int kNumBins = 21;
  4. ...
  5. // 二进制2^8往左移0,1,2位
  6. // (static_cast<size_t>(256) << 0) = 256
  7. // (static_cast<size_t>(256) << 1) = 512
  8. // (static_cast<size_t>(256) << 2) = 1024
  9. size_t BinNumToSize(BinNum index) {
  10. return static_cast<size_t>(256) << index;
  11. }
  12. ...
  13. char bins_space_[sizeof(Bin) * kNumBins];
  14. // Map from bin size to Bin
  15. Bin* BinFromIndex(BinNum index) {
  16. return reinterpret_cast<Bin*>(&(bins_space_[index * sizeof(Bin)]));
  17. }
  18. ...
  19. // We create bins to fit all possible ranges that cover the
  20. // memory_limit_ starting from allocations up to 256 bytes to
  21. // allocations up to (and including) the memory limit.
  22. for (BinNum b = 0; b < kNumBins; b++) {
  23. size_t bin_size = BinNumToSize(b);
  24. VLOG(1) << "Creating bin of max chunk size "
  25. << strings::HumanReadableNumBytes(bin_size);
  26. new (BinFromIndex(b)) Bin(this, bin_size);
  27. CHECK_EQ(BinForSize(bin_size), BinFromIndex(b));
  28. CHECK_EQ(BinForSize(bin_size + 255), BinFromIndex(b));
  29. CHECK_EQ(BinForSize(bin_size * 2 - 1), BinFromIndex(b));
  30. if (b + 1 < kNumBins) {
  31. CHECK_NE(BinForSize(bin_size * 2), BinFromIndex(b));
  32. }
  33. }

5. 查找 bin

  1. // 求属于第几个bin
  2. BinNum BinNumForSize(size_t bytes) {
  3. uint64 v = std::max<size_t>(bytes, 256) >> kMinAllocationBits;
  4. int b = std::min(kNumBins - 1, Log2FloorNonZero(v));
  5. return b;
  6. }
  7. // 最高位非零的二进制位数,eg: 0001 0101B 为5
  8. inline int Log2FloorNonZero(uint64 n) {
  9. #if defined(__GNUC__)
  10. return 63 ^ __builtin_clzll(n);
  11. #elif defined(PLATFORM_WINDOWS)
  12. unsigned long index;
  13. _BitScanReverse64(&index, n);
  14. return index;
  15. #else
  16. int r = 0;
  17. while (n > 0) {
  18. r++;
  19. n >>= 1;
  20. }
  21. return r;
  22. #endif
  23. }

6. 查找 Chunk

  1. // 先加锁
  2. mutex_lock l(lock_);
  3. void* ptr = FindChunkPtr(bin_num, rounded_bytes, num_bytes);
  4. if (ptr != nullptr) {
  5. return ptr;
  6. }
  7. // FindChunkPtr函数内部
  8. void* BFCAllocator::FindChunkPtr(BinNum bin_num, size_t rounded_bytes,
  9. size_t num_bytes) {
  10. // First identify the first bin that could satisfy rounded_bytes.
  11. for (; bin_num < kNumBins; bin_num++) {
  12. // Start searching from the first bin for the smallest chunk that fits
  13. // rounded_bytes.
  14. Bin* b = BinFromIndex(bin_num);
  15. for (auto citer = b->free_chunks.begin(); citer != b->free_chunks.end();
  16. ++citer) {
  17. // 从之前得到的Bin索引开始,查找合适的空闲Chunk:
  18. const BFCAllocator::ChunkHandle h = (*citer);
  19. BFCAllocator::Chunk* chunk = ChunkFromHandle(h);
  20. DCHECK(!chunk->in_use());
  21. if (chunk->size >= rounded_bytes) {
  22. // We found an existing chunk that fits us that wasn't in use, so remove
  23. // it from the free bin structure prior to using.
  24. RemoveFreeChunkIterFromBin(&b->free_chunks, citer);
  25. // If we can break the size of the chunk into two reasonably
  26. // large pieces, do so.
  27. //
  28. // TODO(vrv): What should be the criteria when deciding when
  29. // to split?
  30. // 具体实现后面会分析
  31. if (chunk->size >= rounded_bytes * 2) {
  32. SplitChunk(h, rounded_bytes);
  33. chunk = ChunkFromHandle(h); // Update chunk pointer in case it moved
  34. }
  35. // The requested size of the returned chunk is what the user
  36. // has allocated.
  37. chunk->requested_size = num_bytes;
  38. // Assign a unique id and increment the id counter, marking the
  39. // chunk as being in use.
  40. chunk->allocation_id = next_allocation_id_++;
  41. // Update stats.
  42. ++stats_.num_allocs;
  43. stats_.bytes_in_use += chunk->size;
  44. stats_.max_bytes_in_use =
  45. std::max(stats_.max_bytes_in_use, stats_.bytes_in_use);
  46. stats_.max_alloc_size =
  47. std::max<std::size_t>(stats_.max_alloc_size, chunk->size);
  48. VLOG(4) << "Returning: " << chunk->ptr;
  49. if (VLOG_IS_ON(4)) {
  50. LOG(INFO) << "A: " << RenderOccupancy();
  51. }
  52. return chunk->ptr;
  53. }
  54. }
  55. }
  56. return nullptr;
  57. }

7. 拆分 Chunk

  • 如果 Chunk 的大小大于等于申请内存大小的 2 倍,那么将该 Chunk 拆分成 2 个:第一个 Chunk 的大小等于申请内存大小,第二个 Chunk 作为它的直接后继。
  1. if (chunk->size >= rounded_bytes * 2) {
  2. SplitChunk(h, rounded_bytes);
  3. chunk = ChunkFromHandle(h); // Update chunk pointer in case it moved
  4. }
  5. void BFCAllocator::SplitChunk(BFCAllocator::ChunkHandle h, size_t num_bytes) {
  6. // Allocate the new chunk before we do any ChunkFromHandle
  7. ChunkHandle h_new_chunk = AllocateChunk();
  8. Chunk* c = ChunkFromHandle(h);
  9. CHECK(!c->in_use() && (c->bin_num == kInvalidBinNum));
  10. // Create a new chunk starting num_bytes after c
  11. BFCAllocator::Chunk* new_chunk = ChunkFromHandle(h_new_chunk);
  12. new_chunk->ptr = static_cast<void*>(static_cast<char*>(c->ptr) + num_bytes);
  13. region_manager_.set_handle(new_chunk->ptr, h_new_chunk);
  14. // Set the new sizes of the chunks.
  15. new_chunk->size = c->size - num_bytes;
  16. c->size = num_bytes;
  17. // The new chunk is not in use.
  18. new_chunk->allocation_id = -1;
  19. // Maintain the pointers.
  20. // c <-> c_neighbor becomes
  21. // c <-> new_chunk <-> c_neighbor
  22. BFCAllocator::ChunkHandle h_neighbor = c->next;
  23. new_chunk->prev = h;
  24. new_chunk->next = h_neighbor;
  25. c->next = h_new_chunk;
  26. if (h_neighbor != kInvalidChunkHandle) {
  27. Chunk* c_neighbor = ChunkFromHandle(h_neighbor);
  28. c_neighbor->prev = h_new_chunk;
  29. }
  30. // Add the newly free chunk to the free bin.
  31. InsertFreeChunkIntoBin(h_new_chunk);
  32. }

8. 回收 chunk

  • 加锁,获得 ChunkHandle

    1. mutex_lock l(lock_);
    2. BFCAllocator::ChunkHandle h = region_manager_.get_handle(ptr);
    3. FreeAndMaybeCoalesce(h);
  • FreeAndMaybeCoalesce

  1. void BFCAllocator::FreeAndMaybeCoalesce(BFCAllocator::ChunkHandle h) {
  2. Chunk* c = ChunkFromHandle(h);
  3. CHECK(c->in_use() && (c->bin_num == kInvalidBinNum));
  4. // Mark the chunk as no longer in use
  5. c->allocation_id = -1;
  6. // Updates the stats.
  7. stats_.bytes_in_use -= c->size;
  8. // This chunk is no longer in-use, consider coalescing the chunk
  9. // with adjacent chunks.
  10. ChunkHandle chunk_to_reassign = h;
  11. // If the next chunk is free, coalesce the two
  12. if (c->next != kInvalidChunkHandle) {
  13. Chunk* cnext = ChunkFromHandle(c->next);
  14. if (!cnext->in_use()) {
  15. // VLOG(8) << "Chunk at " << cnext->ptr << " merging with c " <<
  16. // c->ptr;
  17. chunk_to_reassign = h;
  18. // Deletes c->next
  19. RemoveFreeChunkFromBin(c->next);
  20. Merge(h, ChunkFromHandle(h)->next);
  21. }
  22. }
  23. // If the previous chunk is free, coalesce the two
  24. c = ChunkFromHandle(h);
  25. if (c->prev != kInvalidChunkHandle) {
  26. Chunk* cprev = ChunkFromHandle(c->prev);
  27. if (!cprev->in_use()) {
  28. // VLOG(8) << "Chunk at " << c->ptr << " merging into c->prev "
  29. // << cprev->ptr;
  30. chunk_to_reassign = c->prev;
  31. // Deletes c
  32. RemoveFreeChunkFromBin(c->prev);
  33. Merge(ChunkFromHandle(h)->prev, h);
  34. c = ChunkFromHandle(h);
  35. }
  36. }
  37. InsertFreeChunkIntoBin(chunk_to_reassign);
  38. }
  • Merge
  1. // Merges h1 and h2 when Chunk(h1)->next is h2 and Chunk(h2)->prev is c1.
  2. // We merge Chunk(h2) into Chunk(h1).
  3. void BFCAllocator::Merge(BFCAllocator::ChunkHandle h1,
  4. BFCAllocator::ChunkHandle h2) {
  5. Chunk* c1 = ChunkFromHandle(h1);
  6. Chunk* c2 = ChunkFromHandle(h2);
  7. // We can only merge chunks that are not in use.
  8. CHECK(!c1->in_use() && !c2->in_use());
  9. // c1's prev doesn't change, still points to the same ptr, and is
  10. // still not in use.
  11. // Fix up neighbor pointers
  12. //
  13. // c1 <-> c2 <-> c3 should become
  14. // c1 <-> c3
  15. BFCAllocator::ChunkHandle h3 = c2->next;
  16. c1->next = h3;
  17. CHECK(c2->prev == h1);
  18. if (h3 != kInvalidChunkHandle) {
  19. BFCAllocator::Chunk* c3 = ChunkFromHandle(h3);
  20. c3->prev = h1;
  21. }
  22. // Set the new size
  23. c1->size += c2->size;
  24. DeleteChunk(h2);
  25. }

参考资料:

原文:https://blog.csdn.net/qq_33096883/article/details/77479647