WechatIMG44.jpeg

Hash函数

  • 数组是按顺序排列,因此按下标可以快速查找取值,但插入、删除要移动整个数组,非常耗时
  • 链表是按链式存储,插入、删除很方便,但查找具体某一个值很耗时
  • Hash函数则结合了二者的优点,通过取余或者除 等操作快速定位下标,然后出现hash冲突时用链式存取

image.png

LLDB调试

首先建立Person类,获取其Class

  1. LGPerson *p = [LGPerson alloc];
  2. Class pClass = [LGPerson class];

通过LLDB,得到cache_t结构,cache_t是缓存的话,一般我们要找到其缓存的SELIMP

  1. (lldb) p/x pClass
  2. (Class) $0 = 0x0000000100008400 LGPerson
  3. (lldb) p (cache_t *)0x0000000100008410 // 位移16 0x10
  4. (cache_t *) $1 = 0x0000000100008410
  5. (lldb) p *$1
  6. (cache_t) $2 = {
  7. _bucketsAndMaybeMask = {
  8. std::__1::atomic<unsigned long> = {
  9. Value = 4436902816
  10. }
  11. }
  12. = {
  13. = {
  14. _maybeMask = {
  15. std::__1::atomic<unsigned int> = {
  16. Value = 0
  17. }
  18. }
  19. _flags = 32808
  20. _occupied = 0
  21. }
  22. _originalPreoptCache = {
  23. std::__1::atomic<preopt_cache_t *> = {
  24. Value = 0x0000802800000000
  25. }
  26. }
  27. }
  28. }

cache_t源码中,看到了一个bucket_t的结构体,点进去查看其结构,即包含了SELIMP

  1. #if __arm64__
  2. explicit_atomic<uintptr_t> _imp;
  3. explicit_atomic<SEL> _sel;
  4. #else
  5. explicit_atomic<SEL> _sel;
  6. explicit_atomic<uintptr_t> _imp;
  7. #endif

cache_t数据结构

  • cache存储的是cache_t地址指针,即首地址指针

未命名文件-6.jpg

  1. _bucketsAndMaybeMask同时存储了buckets和mask,方便存取
    1. struct cache_t {
    2. private:
    3. explicit_atomic<uintptr_t> _bucketsAndMaybeMask;
    4. union {
    5. struct {
    6. explicit_atomic<mask_t> _maybeMask;
    7. #if __LP64__ //代表Macos或Unix架构 X86_64
    8. uint16_t _flags;
    9. #endif
    10. uint16_t _occupied;
    11. };
    12. explicit_atomic<preopt_cache_t *> _originalPreoptCache;
    13. };
    14. ...
    15. }

接着进行上面的LLDB调试,发现执行p $2._bucketsAndMaybeMask()等语法都无法进行下一步调试,这时候就要去源码里找方法,在源码中有如下方法返回结构体指针

  1. struct bucket_t *buckets() const;

因此,可以p $2.buckets(),此时得到结果Value都是nil,因为没有调用方法。

  1. (lldb) p $2.buckets()
  2. (bucket_t *) $3 = 0x000000010875c3a0
  3. (lldb) p *$3
  4. (bucket_t) $4 = {
  5. _sel = {
  6. std::__1::atomic<objc_selector *> = (null) {
  7. Value = nil
  8. }
  9. }
  10. _imp = {
  11. std::__1::atomic<unsigned long> = {
  12. Value = 0
  13. }
  14. }
  15. }

在调用方法后, cache_t有值,但是buckets没有值,因为bucketshash,无序的,其0位不一定有值,经测试在p $10.buckets()[6]后拿到含值得数据结构

  1. (lldb) p *$8
  2. (cache_t) $10 = {
  3. _bucketsAndMaybeMask = {
  4. std::__1::atomic<unsigned long> = {
  5. Value = 4442067904
  6. }
  7. }
  8. = {
  9. = {
  10. _maybeMask = {
  11. std::__1::atomic<unsigned int> = {
  12. Value = 7
  13. }
  14. }
  15. _flags = 32808
  16. _occupied = 1
  17. }
  18. _originalPreoptCache = {
  19. std::__1::atomic<preopt_cache_t *> = {
  20. Value = 0x0001802800000007
  21. }
  22. }
  23. }
  24. }
  25. (lldb) p $10.buckets()
  26. (bucket_t *) $11 = 0x0000000108c493c0
  27. (lldb) p *$11
  28. (bucket_t) $12 = {
  29. _sel = {
  30. std::__1::atomic<objc_selector *> = (null) {
  31. Value = nil
  32. }
  33. }
  34. _imp = {
  35. std::__1::atomic<unsigned long> = {
  36. Value = 0
  37. }
  38. }
  39. }
  40. (lldb) p $10.buckets()[6]
  41. (bucket_t) $19 = {
  42. _sel = {
  43. std::__1::atomic<objc_selector *> = "" {
  44. Value = ""
  45. }
  46. }
  47. _imp = {
  48. std::__1::atomic<unsigned long> = {
  49. Value = 47248
  50. }
  51. }
  52. }

接着执行源码中的sel()方法,拿到saySomething函数, imp函数得到其实现

  1. (lldb) p $19.sel()
  2. (SEL) $20 = "saySomething"
  3. (lldb) p $19.imp(nil, pClass)
  4. (IMP) $21 = 0x0000000100003c90 (KCObjcBuild`-[LGPerson saySomething])

LLDB在调试过程中出错的话,就要重新来一次,很不方便,我们可以模拟底层自建类_objc_class

  1. typedef uint32_t mask_t; // x86_64 & arm64 asm are less efficient with 16-bits
  2. struct kc_bucket_t {
  3. SEL _sel;
  4. IMP _imp;
  5. };
  6. struct kc_cache_t {
  7. struct kc_bucket_t *_bukets; // 8
  8. mask_t _maybeMask; // 4
  9. uint16_t _flags; // 2
  10. uint16_t _occupied; // 2
  11. };
  12. struct kc_class_data_bits_t {
  13. uintptr_t bits;
  14. };
  15. // cache class
  16. struct kc_objc_class {
  17. Class isa;
  18. Class superclass;
  19. struct kc_cache_t cache; // formerly cache pointer and vtable
  20. struct kc_class_data_bits_t bits;
  21. };

pClass赋值给自定义的kc_objc_class, 并循环打印buckets

  1. LGPerson *p = [LGPerson alloc];
  2. Class pClass = p.class; // objc_clas
  3. [p say1];
  4. [p say2];
  5. [p say3];
  6. [p say4];
  7. [p say1];
  8. [p say2];
  9. [pClass sayHappy];
  10. struct kc_objc_class *kc_class = (__bridge struct kc_objc_class *)(pClass);
  11. NSLog(@"%hu - %u",kc_class->cache._occupied,kc_class->cache._maybeMask);
  12. for (mask_t i = 0; i<kc_class->cache._maybeMask; i++) {
  13. struct kc_bucket_t bucket = kc_class->cache._bukets[i];
  14. NSLog(@"%@ - %pf",NSStringFromSelector(bucket._sel),bucket._imp);
  15. }

打印结果如下,4-7是什么意思?答案就在底层源码

  1. LGPerson say : -[LGPerson say1]
  2. LGPerson say : -[LGPerson say2]
  3. LGPerson say : -[LGPerson say3]
  4. LGPerson say : -[LGPerson say4]
  5. LGPerson say : -[LGPerson say1]
  6. LGPerson say : -[LGPerson say2]
  7. 4 - 7

在函数调用时,底层会调用void insert(SEL sel, IMP imp, id receiver),即对消息接受者插入一个sel和imp

  1. 初始occupied为0,第一次调用时:
    1. occupied值变为1,同时capacity容积为1<<2等于4,然后创建一个桶,将桶的指针存储在cache_t中
  2. occupied每次进来都会+1,当newOccpied+1小于最大容积的3/4时,进行两倍扩容。
    1. 扩容首先开辟2倍内存,比如原来是3(原有内存已经开辟好,不能修改),那么新开辟8大小,数组复制非常消耗性能,通过垃圾袋将3(原来)中的buckets回收,再将新的bucket缓存到新开辟的内存中,再次调用被回收的方法时,会重新hash插入桶子
  3. 通过cache_hash得到下标位置,do while循环找空地址插入桶子

    1. void cache_t::insert(SEL sel, IMP imp, id receiver)
    2. {
    3. runtimeLock.assertLocked();
    4. // Use the cache as-is if until we exceed our expected fill ratio.
    5. mask_t newOccupied = occupied() + 1; // occupied初始为0, 调用时+1
    6. unsigned oldCapacity = capacity(), capacity = oldCapacity;
    7. if (slowpath(isConstantEmptyCache())) { // 第一次调用
    8. // Cache is read-only. Replace it.
    9. if (!capacity) capacity = INIT_CACHE_SIZE; // 1<<2 得到4
    10. reallocate(oldCapacity, capacity, /* freeOld */false); //创建一个桶, 将桶的指针存在cache_t中
    11. }
    12. else if (fastpath(newOccupied + CACHE_END_MARKER <= cache_fill_ratio(capacity))) { //保证缓存表存的方法数量小于等于容量的3/4
    13. // Cache is less than 3/4 or 7/8 full. Use it as-is.
    14. }
    15. #if CACHE_ALLOW_FULL_UTILIZATION
    16. else if (capacity <= FULL_UTILIZATION_CACHE_SIZE && newOccupied + CACHE_END_MARKER <= capacity) {
    17. // Allow 100% cache utilization for small buckets. Use it as-is.
    18. }
    19. #endif
    20. else { // 4 * 2 = 8
    21. capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE; //两倍扩容
    22. if (capacity > MAX_CACHE_SIZE) {
    23. capacity = MAX_CACHE_SIZE;
    24. }
    25. reallocate(oldCapacity, capacity, true);
    26. }
    27. bucket_t *b = buckets();
    28. mask_t m = capacity - 1; // 4 - 1 = 3
    29. mask_t begin = cache_hash(sel, m); // hash得到开始地址
    30. mask_t i = begin;
    31. // Scan for the first unused slot and insert there.
    32. // There is guaranteed to be an empty slot.
    33. do { // 循环插入SEL和IMP
    34. if (fastpath(b[i].sel() == 0)) {
    35. incrementOccupied();
    36. b[i].set<Atomic, Encoded>(b, sel, imp, cls());
    37. return;
    38. }
    39. if (b[i].sel() == sel) {
    40. // The entry was added to the cache by some other thread
    41. // before we grabbed the cacheUpdateLock.
    42. return;
    43. }
    44. } while (fastpath((i = cache_next(i, m)) != begin));
    45. bad_cache(receiver, (SEL)sel);
    46. #endif // !DEBUG_TASK_THREADS
    47. }

    总结

    ```objectivec cache_t 的工作流程

当前查找的 IMP 没有被缓存,调用 reallocate 方法进行创建-分配内存,然后使用 bucket 的 set 方法进行填充缓存。

当前查找的 IMP 已经被缓存了,然后判断缓存容量是否已经达到 3/4 的临界点。

如果已经到了临界点,则需要进行扩容,扩容大小为原来缓存大小的 2 倍。扩容后处于效率的考虑,会清空之前的内容,然后把当前要查找的 IMP 通过 bucket 的 set 方法缓存起来。

如果没有到临界点,那么直接进行缓存。 ``` Cooci 关于Cache_t原理分析图.png