What exactly is a Block? Let's start with the C++ code
Start with the simplest block structure
clang -rewrite-objc main.m -o main.cpp && open main.cpp
To make the generated code easier to read, let's simplify it.
The names are shortened here as well so that the later steps are easier to follow; refer to the simple flow below.
- Combining the intermediate C++ code that clang generates with the diagram above, first sketch in your mind how a block is created:
- Create a two-layer structure:
- the BlockCreate structure
- the Block structure, which is a member of BlockCreate
- The BlockCreate constructor takes the parameters and initializes the BlockCreate member Block::block
- What is finally returned is a pointer to the BlockCreate structure
- From the starting address of the BlockCreate structure we can reach the member Block::block. The starting address of BlockCreate is the same as the starting address of the member block, because block sits at the very beginning of BlockCreate's memory.
Since we have the starting address (the address of the member block), we can also reach the member Desc by applying a memory offset.
- With the address of the member Block::block we can call through its function pointer member FuncPtr, and FuncPtr holds exactly the entry address of the func function that was assigned when Block::block was initialized by the BlockCreate constructor.
Be sure to understand this two-layer structure. Although it is not the real source code, it will help a great deal when we analyze the source later.
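To make the two-layer idea concrete, here is a minimal sketch in plain C++. It is not the clang output: the layouts are trimmed, and `desc_data` and `call_through_first_address` are names invented for this illustration. It only tries to show that a pointer to BlockCreate can be read as a pointer to its first member, that Desc is reachable by offset, and that FuncPtr leads back to func.

```cpp
#include <cassert>
#include <cstddef>

// Inner layer: the part every block shares (simplified names).
struct Block {
    void *isa;      // class pointer in the real layout; unused in this sketch
    int   Flags;
    int   Reserved;
    void *FuncPtr;  // entry address of the function that holds the block body
};

struct Desc {
    unsigned long reserved;
    unsigned long Block_size;
};

// Outer layer: Block comes first, so both share one starting address.
struct BlockCreate {
    Block block;
    Desc *desc;
    BlockCreate(void *fp, Desc *d, int flags = 0) {
        block.isa = nullptr;
        block.Flags = flags;
        block.Reserved = 0;
        block.FuncPtr = fp;
        desc = d;
    }
};

// The block body as a plain function that receives the outer structure.
static int func(BlockCreate *self) {
    (void)self;
    return 42;
}

static Desc desc_data = {0, sizeof(BlockCreate)};

// Hypothetical helper: build the structure, then work only from its address.
int call_through_first_address() {
    BlockCreate create(reinterpret_cast<void *>(&func), &desc_data);
    void *blk = &create;                       // what the caller keeps
    Block *inner = static_cast<Block *>(blk);  // same address as create.block
    assert(static_cast<void *>(inner) == static_cast<void *>(&create.block));

    // Desc is found by offsetting from the starting address.
    char *base = static_cast<char *>(blk);
    Desc *d = *reinterpret_cast<Desc **>(base + offsetof(BlockCreate, desc));
    assert(d == &desc_data);

    // FuncPtr is the entry address of func.
    auto fn = reinterpret_cast<int (*)(BlockCreate *)>(inner->FuncPtr);
    return fn(static_cast<BlockCreate *>(blk));
}
```

Calling `call_through_first_address()` from any `main` returns the value produced by func, reached purely through the starting address.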
The previous example does not use any variables. Let's repeat the same steps with a variable and compare the difference.
Accessing a local variable declared outside the block
You will find that the C++ code generated by clang has changed. Compare it with the instantiation process above:
- The BlockCreate structure has one more member, int a
- The BlockCreate constructor also takes one more parameter
- Inside func, the variable a is obtained as a copy through the self pointer of func(BlockCreate *self)
- The variable a now exists in three places:
- the local variable a inside the main function
- the member variable a inside the BlockCreate structure
- the local variable a inside the func function
These three variables named a are in fact three different variables.
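The three copies can be seen in a small sketch. This is a simplified stand-in for the clang output, not the output itself; `captured_value_after_change` is a hypothetical helper that plays the role of main.

```cpp
#include <cassert>

struct BlockCreate {
    void *FuncPtr;
    int   a;  // second copy: the member, filled in by the constructor
    BlockCreate(void *fp, int _a) : FuncPtr(fp), a(_a) {}
};

static int func(BlockCreate *self) {
    int a = self->a;  // third copy: a local inside func
    return a;
}

// Hypothetical stand-in for main.
int captured_value_after_change() {
    int a = 10;  // first copy: the caller's local variable
    BlockCreate create(reinterpret_cast<void *>(&func), a);
    a = 20;      // changes only the caller's copy
    auto fn = reinterpret_cast<int (*)(BlockCreate *)>(create.FuncPtr);
    return fn(&create);
}
```

Because the member was filled in when the structure was built, the later assignment to the caller's a is not seen inside func, and the helper returns 10.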
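- Change the local variable a to static and look at the clang C++ output again
With a declared static, things look different.
The BlockCreate constructor now receives the address of a. The BlockCreate member a becomes a pointer, and the local variable a inside func becomes a pointer as well: the pointer a reached through self (BlockCreate *self) is assigned to the local pointer a inside func.
So once a is static, the a accessed inside func is still the a of the main function, reached through its address.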
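The same sketch with a static variable, again simplified and with a hypothetical helper `static_value_after_change` standing in for main, shows the pointer being passed instead of the value.

```cpp
#include <cassert>

struct BlockCreate {
    void *FuncPtr;
    int  *a;  // the member is now a pointer
    BlockCreate(void *fp, int *_a) : FuncPtr(fp), a(_a) {}
};

static int func(BlockCreate *self) {
    int *a = self->a;  // local pointer copied from the member
    return *a;         // reads the caller's variable itself
}

// Hypothetical stand-in for main.
int static_value_after_change() {
    static int a = 10;
    a = 10;  // reset so that repeated calls behave the same way
    BlockCreate create(reinterpret_cast<void *>(&func), &a);
    a = 20;  // visible inside func, because func follows the pointer
    auto fn = reinterpret_cast<int (*)(BlockCreate *)>(create.FuncPtr);
    return fn(&create);
}
```

Here the change made after construction is seen inside func, so the helper returns 20.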
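Change the local variable a to __block and look at the clang C++ output again
Hopefully this won't leave you confused; it is a little more complicated this time.
- A new structure appears: __Block_byref_a_0
- The structure of the BlockCreate member Desc gains two functions: copy and dispose
A brief explanation:
- The ordinary local variable a has become a structure, __Block_byref_a_0, and a is a member of it:
- member void *__isa
- member __Block_byref_a_0 *__forwarding;
- member int __flags;
- member int __size
- member int a
For the __block local variable declared in main, the address is assigned to __forwarding and the value is assigned to the member a of the Block_byref structure. Note this arrangement: although the member is also called a, it only receives the value. The key point is that __forwarding holds the address of the original a, which is now the __Block_byref_a_0 structure itself.
First, let's see how the __block variable a is actually accessed.
- __forwarding has type __Block_byref_a_0 *, much like a linked-list node, so it is also a pointer to a __Block_byref_a_0 structure. What it is for remains an open question for now; the source analysis below picks it up again.
Compared side by side it is actually quite clear and not hard to understand.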
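A rough sketch of this access path, under a few assumptions: the wrapper is called `Byref_a` and its fields drop the leading underscores (names starting with a double underscore are reserved in ordinary C++), and `byref_value_after_call` is a hypothetical stand-in for main.

```cpp
#include <cassert>

// Simplified counterpart of __Block_byref_a_0.
struct Byref_a {
    void    *isa;
    Byref_a *forwarding;  // points at the structure that currently owns the value
    int      flags;
    int      size;
    int      a;           // the wrapped value
};

struct BlockCreate {
    void    *FuncPtr;
    Byref_a *a;  // the block keeps a pointer to the wrapper, not an int
    BlockCreate(void *fp, Byref_a *_a) : FuncPtr(fp), a(_a) {}
};

static void func(BlockCreate *self) {
    Byref_a *a = self->a;
    a->forwarding->a += 1;  // inside the block, access goes through forwarding
}

// Hypothetical stand-in for main.
int byref_value_after_call() {
    // The wrapper is initialized with its own address as forwarding.
    Byref_a a = {nullptr, &a, 0, static_cast<int>(sizeof(Byref_a)), 10};
    BlockCreate create(reinterpret_cast<void *>(&func), &a);
    auto fn = reinterpret_cast<void (*)(BlockCreate *)>(create.FuncPtr);
    fn(&create);
    return a.forwarding->a;  // outside the block, access also goes through forwarding
}
```

The increment made inside func is visible to the caller, so the helper returns 11.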
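Reading the block source: libclosure-79
Where is the entry point in the source? Let's look at the assembly first.
Since there is a retainBlock call, the block must allocate memory. Step into it.
Execution continues with a jump: br x16
We have now found the symbol _Block_copy, so let's go into the source and look.
You will see a structure called Block_layout.
Block_layout is the Block::block member of BlockCreate in the two-layer structure we derived from the clang C++ code.
Debugging the __block test code inside the block source
This code was run from within the block source project.
Following the layout of the Block_layout on the stack, this creates a Block_layout structure on the heap.
At the same time, the invoke of the newly allocated Block_layout is copied from the invoke of the original Block_layout on the stack.
Since this Block_layout was allocated on the heap, its isa naturally points to _NSConcreteMallocBlock (a heap block).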
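The three observations above (allocate on the heap, copy invoke, switch isa) can be sketched roughly as follows. This is a heavily simplified model, not the libclosure code: `Layout`, `copy_to_heap`, `demo_copy` and the two placeholder class variables are invented for the illustration, and the flag handling is an assumption modelled on the runtime's refcount scheme.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Placeholders for the runtime's stack and malloc block classes.
static int StackBlockClass;
static int MallocBlockClass;

struct Descriptor { unsigned long reserved; unsigned long size; };

struct Layout {
    void *isa;
    int   flags;
    int   reserved;
    int (*invoke)(Layout *);
    Descriptor *descriptor;
    int   captured;  // one captured variable
};

const int NEEDS_FREE = 1 << 24;  // assumed: marks a heap block

Layout *copy_to_heap(Layout *src) {
    if (src->flags & NEEDS_FREE) {  // already on the heap: only add a reference
        src->flags += 2;
        return src;
    }
    Layout *result = static_cast<Layout *>(std::malloc(src->descriptor->size));
    if (!result) return nullptr;
    std::memmove(result, src, src->descriptor->size);  // bitwise copy of the stack layout
    result->invoke = src->invoke;                      // same entry point as the stack block
    result->flags |= NEEDS_FREE | 2;                   // heap block holding one reference
    result->isa = &MallocBlockClass;                   // it is now a heap block
    return result;
}

static int body(Layout *self) { return self->captured; }

// Hypothetical demo: copy a stack layout and invoke the heap copy.
int demo_copy() {
    Descriptor d = {0, sizeof(Layout)};
    Layout on_stack = {&StackBlockClass, 0, 0, body, &d, 7};
    Layout *on_heap = copy_to_heap(&on_stack);
    assert(on_heap && on_heap != &on_stack);
    assert(on_heap->isa == &MallocBlockClass);
    int result = on_heap->invoke(on_heap);
    std::free(on_heap);
    return result;
}
```

`demo_copy()` returns the captured value 7, read from the heap copy rather than from the stack original.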
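A problem while analyzing the block source
Two parts have still not been located in the source: structures like __Block_byref_a_0 from the clang-generated C++ code, and the BlockCreate construction logic.
So where do we go from here?
I chose the most basic approach: assembly, symbolic breakpoints, and the clang C++ code together.
First stop the program at this point, so that the rest of the dyld flow does not interfere.
Set the symbolic breakpoint, and also add the _Block_copy symbol analyzed earlier, to make the flow easier to follow.
Stepping through, we enter _Block_object_dispose:
Go back to the C++ code that clang generated and take a look.
Since the _Block_object_dispose symbol was hit, add a breakpoint on _Block_object_copy as well and keep debugging.
If that symbol does not exist, try _Block_object_assign. The _Block_object_copy symbol cannot be found because that name is decided by the compiler.
The breakpoint on _Block_object_assign is hit.
With a lead in hand, we naturally go back to the source.
- Look at the comment in the source:
When Blocks or Block_byrefs hold objects then their copy routine helpers use this entry point to do the assignment.
In other words: when a Block (think of it as the structure with the func member from earlier) or a Block_byref holds an object, this entry point is what performs the assignment.
- For __block int a = 10 the type is BLOCK_FIELD_IS_BYREF | BLOCK_FIELD_IS_WEAK or BLOCK_FIELD_IS_BYREF,
so _Block_byref_copy() is executed.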
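The dispatch on the type flags can be modelled with a small classifier. This is a sketch, not the runtime function: `assign_action` is a hypothetical name, it only reports which branch would be taken, and the numeric values are the ones published in the Block.h header with the BLOCK_ prefix dropped.

```cpp
#include <cassert>
#include <string>

enum {
    FIELD_IS_OBJECT = 3,    // an object pointer
    FIELD_IS_BLOCK  = 7,    // another block
    FIELD_IS_BYREF  = 8,    // the wrapper of a __block variable
    FIELD_IS_WEAK   = 16,   // weak modifier, only combined with the others
    BYREF_CALLER    = 128   // set when called from a byref copy/dispose helper
};

const int ALL_FLAGS = FIELD_IS_OBJECT | FIELD_IS_BLOCK | FIELD_IS_BYREF |
                      FIELD_IS_WEAK | BYREF_CALLER;

// Hypothetical helper: name the branch an assign call with these flags would take.
std::string assign_action(int flags) {
    switch (flags & ALL_FLAGS) {
    case FIELD_IS_OBJECT:
        return "retain object";
    case FIELD_IS_BLOCK:
        return "copy block";
    case FIELD_IS_BYREF | FIELD_IS_WEAK:
    case FIELD_IS_BYREF:
        return "copy byref";  // the __block int a case
    case BYREF_CALLER | FIELD_IS_OBJECT:
    case BYREF_CALLER | FIELD_IS_BLOCK:
    case BYREF_CALLER | FIELD_IS_OBJECT | FIELD_IS_WEAK:
    case BYREF_CALLER | FIELD_IS_BLOCK | FIELD_IS_WEAK:
        return "plain assignment";
    default:
        return "ignored";
    }
}
```

For example, `assign_action(FIELD_IS_BYREF)` and `assign_action(FIELD_IS_BYREF | FIELD_IS_WEAK)` both land in the byref branch, which is the path the __block int a test follows.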
_Block_byref_copy
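Before analyzing the _Block_byref_copy flow, we need to understand what Block_byref is.
From the C++ code produced by clang we can see that Block_byref wraps an ordinary variable, and the wrapper also carries the extra members isa and __forwarding.
The source also contains two more structures, Block_byref_2 and Block_byref_3. We set them aside for now and come back to them later.
We can make an assumption. The case under test is a block referencing an outer variable marked __block, which is also how we normally use it. Since the block accesses the outer variable, it must also affect that variable's reference count, and flags is where the reference count is stored.
A walkthrough of _Block_byref_copy
If the source byref structure is already on the heap, no copy is needed and the reference count is incremented by 1.
In the middle there is a piece of memory-offset code that we have not yet worked through. Let's continue.
From the source we can see:
Block_byref_2 *src2 = src + 1
Block_byref_3 *src3 = src2 + 1
So Block_byref, Block_byref_2 and Block_byref_3 form one contiguous memory layout, and flag checks decide whether Block_byref_2 and Block_byref_3 are read.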
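The "+ 1" pointer arithmetic is easier to see with the three parts laid out side by side. The sketch below uses shortened, hypothetical names (`Byref`, `Byref2`, `Byref3`, `Full`), and the two flag values are assumed to match the runtime's BLOCK_BYREF_HAS_COPY_DISPOSE and BLOCK_BYREF_LAYOUT_EXTENDED bits.

```cpp
#include <cassert>

struct Byref  { void *isa; Byref *forwarding; int flags; unsigned size; };
struct Byref2 { void (*keep)(Byref *, Byref *); void (*destroy)(Byref *); };
struct Byref3 { const char *layout; };

const int HAS_COPY_DISPOSE = 1 << 25;  // assumed: helpers follow the header
const int LAYOUT_EXTENDED  = 1 << 28;  // assumed: layout follows the helpers

// One object holding the three parts back to back.
struct Full { Byref head; Byref2 helpers; Byref3 extended; };

// src + 1 steps over exactly one Byref, landing on the helper part.
Byref2 *helpers_of(Byref *src) {
    if (!(src->flags & HAS_COPY_DISPOSE)) return nullptr;
    return reinterpret_cast<Byref2 *>(src + 1);
}

// src2 + 1 steps over exactly one Byref2, landing on the layout part.
Byref3 *layout_of(Byref *src) {
    Byref2 *src2 = helpers_of(src);
    if (!src2 || !(src->flags & LAYOUT_EXTENDED)) return nullptr;
    return reinterpret_cast<Byref3 *>(src2 + 1);
}

bool trailing_parts_line_up() {
    Full f = {};
    f.head.flags = HAS_COPY_DISPOSE | LAYOUT_EXTENDED;
    return helpers_of(&f.head) == &f.helpers && layout_of(&f.head) == &f.extended;
}

bool plain_byref_has_no_helpers() {
    Full f = {};  // flags are zero, so no trailing part is looked at
    return helpers_of(&f.head) == nullptr && layout_of(&f.head) == nullptr;
}
```

The design point is that the header alone never says how long the structure is; only the flags tell the reader whether it is valid to step past it.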
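Open questions
After searching the whole source, structures like __main_block_impl_0 from the clang-generated C++ code are nowhere to be found.
What exactly do byref_keep and byref_destroy do?
We have been testing with an ordinary variable a, so let's switch to an object and look again.
Testing with the variable a replaced by an object
clang C++ code
From the source we learn:
At compile time the flags of the Block_byref structure are set to 1 << 25, which marks that a Block_byref_2 structure is present.
What does 131 mean?
What do the two "+ 40" arguments mean?
According to the compiled logic, byref_keep is the copy of the object-typed variable.
At runtime, however, adjustments are made and the flow differs.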
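Both numbers can be reproduced with a little arithmetic. In the sketch below, 131 comes out as BLOCK_BYREF_CALLER (128) combined with BLOCK_FIELD_IS_OBJECT (3), and 40 as the byte offset of the wrapped object inside the byref structure on a 64-bit target. The struct `Byref_obj` and the helper `sketch_keep` are simplified assumptions modelled on the generated code, not the generated code itself.

```cpp
#include <cassert>
#include <cstddef>

enum { FIELD_IS_OBJECT = 3, BYREF_CALLER = 128 };

// Shape of the wrapper for a __block object (field names simplified).
struct Byref_obj {
    void      *isa;
    Byref_obj *forwarding;
    int        flags;
    int        size;
    void     (*byref_keep)(void *, void *);
    void     (*byref_destroy)(void *);
    void      *obj;  // the wrapped object pointer
};

int keep_flags() { return BYREF_CALLER | FIELD_IS_OBJECT; }

// Four pointers plus two ints precede obj: 40 bytes with 8-byte pointers.
std::size_t object_offset() { return offsetof(Byref_obj, obj); }

// Hypothetical keep helper: with the caller flag set, the runtime side
// reduces to a plain pointer assignment at that offset.
void sketch_keep(void *dst, void *src) {
    char *d = static_cast<char *>(dst) + object_offset();
    char *s = static_cast<char *>(src) + object_offset();
    *reinterpret_cast<void **>(d) = *reinterpret_cast<void **>(s);
}

bool keep_moves_object_pointer() {
    int dummy = 0;
    Byref_obj src = {};
    Byref_obj dst = {};
    src.obj = &dummy;
    sketch_keep(&dst, &src);
    return dst.obj == &dummy;
}
```

This also fits the note that the runtime flow differs from what the word "copy" suggests: with the caller flag in place, the object pointer is only assigned into the destination structure.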
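The same applies to byref_destroy:
That covers the Block_byref logic. Now let's use the clang-generated C++ to look at how Block_layout is handled.
Let's also confirm how a __block object is actually accessed inside the block body.
Summary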
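- Once a variable is marked __block, the compiler builds a stack Block_byref on the stack (containing the pointer to the variable)
- Defining a block can be understood as the compiler generating an intermediate structure, BlockCreate (the name is made up on purpose; it is just a structure, named this way to make it easier to follow)
- At the same time the compiler initializes a stack Block_layout on the stack (containing the func member)
- The BlockCreate constructor is executed
- Offsetting from the starting address of Block_layout gives the address of the Block_copy function; executing Block_copy copies the stack Block_byref to a heap Block_byref
- For the constructor argument, the stack Block_byref: offsetting from the starting address of Block_byref gives the starting address of Block_byref_2 (which contains the byref copy function); executing _Block_byref_copy copies the stack Block_byref to a heap Block_byref
- Continuing from the position of the previous step, a further offset of 8 bytes gives the starting address of the object storage allocated on the heap, which is of course where the object is stored
- One detail to note:
After the stack Block_byref is copied to the heap Block_byref, the heap copy lives in new memory, so are there not now two separate spaces, one on the stack and one on the heap? How is it guaranteed that both access the same memory? I think the solution is that after the copy, the forwarding of both the stack Block_byref and the heap Block_byref points to the heap Block_byref; in other words, the heap forwarding points to itself.
Once a variable is marked __block, every access to it, whether inside the block or outside it, goes through forwarding to the heap space and then reads the variable there, which guarantees that the same memory is accessed.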
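The forwarding idea can be tried out in a few lines. This is a simplified model rather than the runtime function: `Byref`, `byref_copy` and `shared_value_after_copy` are hypothetical names, and the flag arithmetic is an assumption based on the refcount scheme described above.

```cpp
#include <cassert>
#include <cstdlib>

struct Byref {
    void    *isa;
    Byref   *forwarding;
    int      flags;
    unsigned size;
    int      a;
};

const int NEEDS_FREE = 1 << 24;  // assumed: marks a heap byref

Byref *byref_copy(Byref *src) {
    if (src->forwarding->flags & NEEDS_FREE) {
        src->forwarding->flags += 2;  // already on the heap: one more reference
        return src->forwarding;
    }
    Byref *copy = static_cast<Byref *>(std::malloc(src->size));
    if (!copy) return nullptr;
    copy->isa = nullptr;
    copy->flags = src->flags | NEEDS_FREE | 4;  // referenced by the block and by the stack
    copy->forwarding = copy;  // the heap copy points at itself
    src->forwarding = copy;   // the stack original now points at the heap copy
    copy->size = src->size;
    copy->a = src->a;
    return copy;
}

// Hypothetical demo: write through the stack wrapper, read through the heap one.
int shared_value_after_copy() {
    Byref a = {nullptr, &a, 0, static_cast<unsigned>(sizeof(Byref)), 10};
    Byref *heap = byref_copy(&a);
    if (!heap) return -1;
    a.forwarding->a = 20;                     // access from outside the block
    int seen_by_block = heap->forwarding->a;  // access from inside the block
    bool same_target = (a.forwarding == heap->forwarding);
    std::free(heap);
    return same_target ? seen_by_block : -1;
}
```

Both access paths end at the heap copy, so the write made through the stack wrapper is what the block side reads, and the demo returns 20.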
- When the lifetime of the variable held by Block_byref ends, _Block_object_dispose is executed
- The _Block_byref_release function runs: it finds the starting address of Block_byref_2 by offsetting from the starting address of Block_byref, offsets a further 8 bytes to reach byref_destroy, and executes that destructor to reclaim the heap memory
- When the Block_layout goes out of scope or its lifetime ends, _Block_release is executed
- It finds the starting address of Block_descriptor_2 by offsetting from the starting address of Block_layout, offsets a further 8 bytes to reach dispose, and executes that destructor to reclaim the Block_layout memory allocated on the heap
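A rough model of this release path is sketched below. It merges the two descriptor parts into one hypothetical `Descriptor` struct, so dispose sits 8 bytes after copy on a 64-bit target, and `release` and `dispose_calls_after_two_releases` are names invented for the illustration. The flag values are assumptions modelled on the runtime's refcount and helper bits.

```cpp
#include <cassert>
#include <cstdlib>

const int REFCOUNT_MASK    = 0xfffe;   // assumed: refcount bits inside flags
const int NEEDS_FREE       = 1 << 24;  // assumed: heap block
const int HAS_COPY_DISPOSE = 1 << 25;  // assumed: helper functions present

struct Layout;
struct Descriptor {
    unsigned long reserved;
    unsigned long size;
    void (*copy)(Layout *dst, Layout *src);  // start of the second descriptor part
    void (*dispose)(Layout *);               // one pointer further along
};
struct Layout {
    void *isa; int flags; int reserved; void *invoke; Descriptor *descriptor;
};

static int dispose_calls = 0;
static void dispose_helper(Layout *) { ++dispose_calls; }

// Returns true when the block memory was actually freed.
bool release(Layout *block) {
    if (!block || !(block->flags & NEEDS_FREE)) return false;  // not a heap block
    block->flags -= 2;                                         // drop one reference
    if (block->flags & REFCOUNT_MASK) return false;            // still referenced
    if (block->flags & HAS_COPY_DISPOSE) block->descriptor->dispose(block);
    std::free(block);
    return true;
}

// Hypothetical demo: a heap block with two references is released twice.
int dispose_calls_after_two_releases() {
    static Descriptor d = {0, sizeof(Layout), nullptr, dispose_helper};
    Layout *b = static_cast<Layout *>(std::malloc(sizeof(Layout)));
    if (!b) return -1;
    b->isa = nullptr; b->reserved = 0; b->invoke = nullptr; b->descriptor = &d;
    b->flags = NEEDS_FREE | HAS_COPY_DISPOSE | 4;  // two references
    dispose_calls = 0;
    bool first = release(b);   // one reference remains, nothing is freed
    bool second = release(b);  // last reference: dispose runs, memory is freed
    return (!first && second) ? dispose_calls : -1;
}
```

The dispose helper runs exactly once, on the release that drops the last reference, so the demo returns 1.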
- Inspecting registers
Symbolic breakpoint on _Block_copy
Before _Block_copy executes, its argument is passed in a register: under the standard calling conventions that is rdi on x86_64 and x0 on arm64
After it executes and ret returns, the rax register holds the return value (x0 on arm64)
- Change the variable a to __block
Because of __block, the address of the copy function appears in the Block_layout, and it is invoked through copy when _Block_copy runs
Without __block there are no copy and dispose functions, and _Block_copy just takes its default path
The difference comes from the flags passed to the constructor: without __block the value is 0, and with __block the 1 << 25 bit is set
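How a single flag bit decides whether the copy helper runs can be sketched as follows. The names `call_copy_helper` and `helper_calls_for` are hypothetical, the descriptor is simplified into one struct, and 1 << 25 is assumed to be the bit the flags check looks at.

```cpp
#include <cassert>

const int HAS_COPY_DISPOSE = 1 << 25;  // assumed: set when helpers were generated

struct Layout;
struct Descriptor {
    unsigned long reserved;
    unsigned long size;
    void (*copy)(Layout *dst, Layout *src);
    void (*dispose)(Layout *);
};
struct Layout {
    void *isa; int flags; int reserved; void *invoke; Descriptor *descriptor;
};

static int helper_calls = 0;
static void copy_helper(Layout *, Layout *) { ++helper_calls; }

// The helper is only looked up when the flag says it exists.
void call_copy_helper(Layout *dst, Layout *src) {
    if (!(src->flags & HAS_COPY_DISPOSE)) return;  // default path: nothing extra to do
    src->descriptor->copy(dst, src);
}

// Hypothetical demo: count helper calls for a block built with the given flags.
int helper_calls_for(int flags) {
    Descriptor d = {0, sizeof(Layout), copy_helper, nullptr};
    Layout src = {nullptr, flags, 0, nullptr, &d};
    Layout dst = src;
    helper_calls = 0;
    call_copy_helper(&dst, &src);
    return helper_calls;
}
```

With flags of 0 the helper is never touched, and with the 1 << 25 bit set it runs once, which mirrors the two cases compared above.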