Go Scheduler Source Code Scenario Analysis, Part 10: Thread-Local Storage

The following content is reproduced from  https://mp.weixin.qq.com/s/-tiXJpH0IrJw-RH4x5SRdQ

Awa love to write original programs Zhang  source Travels  2019-04-27

This article is the tenth and final section of the preliminary-knowledge chapter (Chapter 1) of the "Go Scheduler Source Code Scenario Analysis" series.

Thread-local storage (Thread Local Storage, or TLS for short) sounds like something big and complicated, but it is really just a global variable that is private to a thread.

Readers with multi-threaded programming experience will know that an ordinary global variable is shared by all threads: when one thread modifies it, every other thread sees the change. A thread-private global variable is different. Each thread has its own copy, and a modification made by one thread affects only that thread's copy, not the copies belonging to other threads.

Let's use an example to illustrate the difference between an ordinary global variable shared by multiple threads and a thread-private global variable, and then briefly analyze how gcc implements thread-local storage.

First, let's look at an ordinary global variable:

#include <stdio.h>
#include <pthread.h>

int g = 0; // 1. define global variable g with initial value 0

void *start(void *arg)
{
    printf("start, g[%p] : %d\n", &g, g); // 4. print the address and value of g in the child thread

    g++; // 5. modify the global variable

    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tid;

    g = 100; // 2. the main thread assigns 100 to the global variable g

    pthread_create(&tid, NULL, start, NULL); // 3. create a child thread that executes start()
    pthread_join(tid, NULL); // 6. wait for the child thread to finish

    printf("main, g[%p] : %d\n", &g, g); // 7. print the address and value of g in the main thread

    return 0;
}

A brief explanation: the program defines a global variable g at comment 1 and initializes it to 0. When the program runs, the main thread first sets g to 100 (comment 2) and then creates a child thread that executes start() (comment 3). start() first prints the value of g (comment 4) to confirm that the child thread can see the main thread's modification, and then modifies g (comment 5) before returning. The main thread waits at comment 6 for the child thread to finish, then prints g at comment 7 to confirm that the child thread's modification is also visible to the main thread.

Compile and run the program:

bobo@ubuntu:~/study/c$ gcc thread.c -o thread -lpthread
bobo@ubuntu:~/study/c$ ./thread
start, g[0x601064] : 100
main, g[0x601064] : 101

The output shows that g has the same address in both threads, and each thread can see the other's modification of g. In other words, the global variable g is shared among the threads.

Having looked at ordinary global variables, let's turn to thread-private global variables implemented with thread-local storage (TLS). This program is almost identical to the previous one; the only difference is the __thread keyword added to the definition of g, which makes g a thread-private global variable.

#include <stdio.h>
#include <pthread.h>

__thread int g = 0; // 1. the __thread keyword makes g a thread-private global variable; each thread has its own g

void *start(void *arg)
{
    printf("start, g[%p] : %d\n", &g, g); // 4. print the address and value of this thread's private g

    g++; // 5. modify this thread's private g

    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tid;

    g = 100; // 2. the main thread assigns 100 to its private g

    pthread_create(&tid, NULL, start, NULL); // 3. create a child thread that executes start()
    pthread_join(tid, NULL); // 6. wait for the child thread to finish

    printf("main, g[%p] : %d\n", &g, g); // 7. print the address and value of the main thread's private g

    return 0;
}

Run the program to see the effect:

bobo@ubuntu:~/study/c$ gcc -g thread.c -o thread -lpthread
bobo@ubuntu:~/study/c$ ./thread
start, g[0x7f0181b046fc] : 0
main, g[0x7f01823076fc] : 100

The output shows two things. First, the address of g differs between the two threads. Second, the value the main thread assigned to g does not affect the value of g in the child thread, and the child thread's modification of g does not affect the value of g in the main thread. This is exactly what we expected: each thread has its own private copy of g.

This looks like magic: both threads use the same global variable name, yet they behave as if they were accessing different variables.

Let's analyze what black magic gcc uses to achieve this. How do we start investigating a feature like this that is implemented by the compiler? The fastest and most direct way is to step through the running program with a debugger; here we use gdb.

bobo@ubuntu:~/study/c$ gdb ./thread

First, set a breakpoint on line 20 of the source file (the line g = 100), then run the program. When it stops at the breakpoint, disassemble the main function:

(gdb) b thread.c:20
Breakpoint 1 at 0x400793: file thread.c, line 20.
(gdb) r
Starting program: /home/bobo/study/c/thread

Breakpoint 1, at thread.c:20
20      g = 100;
(gdb) disass
Dump of assembler code for function main:
   0x0000000000400775 <+0>:     push   %rbp
   0x0000000000400776 <+1>:     mov    %rsp,%rbp
   0x0000000000400779 <+4>:     sub    $0x20,%rsp
   0x000000000040077d <+8>:     mov    %edi,-0x14(%rbp)
   0x0000000000400780 <+11>:    mov    %rsi,-0x20(%rbp)
   0x0000000000400784 <+15>:    mov    %fs:0x28,%rax
   0x000000000040078d <+24>:    mov    %rax,-0x8(%rbp)
   0x0000000000400791 <+28>:    xor    %eax,%eax
=> 0x0000000000400793 <+30>:    movl   $0x64,%fs:0xfffffffffffffffc
   0x000000000040079f <+42>:    lea    -0x10(%rbp),%rax
   0x00000000004007a3 <+46>:    mov    $0x0,%ecx
   0x00000000004007a8 <+51>:    mov    $0x400736,%edx
   0x00000000004007ad <+56>:    mov    $0x0,%esi
   0x00000000004007b2 <+61>:    mov    %rax,%rdi
   0x00000000004007b5 <+64>:    callq  0x4005e0 <pthread_create@plt>
   0x00000000004007ba <+69>:    mov    -0x10(%rbp),%rax
   0x00000000004007be <+73>:    mov    $0x0,%esi
   0x00000000004007c3 <+78>:    mov    %rax,%rdi
   0x00000000004007c6 <+81>:    callq  0x400620 <pthread_join@plt>
   0x00000000004007cb <+86>:    mov    %fs:0xfffffffffffffffc,%eax
   0x00000000004007d3 <+94>:    mov    %eax,%esi
   0x00000000004007d5 <+96>:    mov    $0x4008df,%edi
   0x00000000004007da <+101>:   mov    $0x0,%eax
   0x00000000004007df <+106>:   callq  0x400600 <printf@plt>
  ......

The program has stopped at the line g = 100. Look at the corresponding assembly instruction:

=> 0x0000000000400793 <+30>:    movl   $0x64,%fs:0xfffffffffffffffc

This instruction copies the constant 100 (0x64) to the memory at address %fs:0xfffffffffffffffc, so the address of the global variable g is %fs:0xfffffffffffffffc. Here fs is a segment register, and 0xfffffffffffffffc is -4 when interpreted as a signed 64-bit number. So the address of the global variable g is:

fs segment base address - 4

When we discussed segment registers earlier, we said that the segment base address is the starting address of the segment. To verify that the address of g really is the fs segment base address minus 4, we need to know what that base address is. We can view the value of the fs register in gdb, but fs holds a segment selector rather than the segment's starting address, so we need to add a little code to obtain the base address. The modified code is as follows:

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <asm/prctl.h>
#include <sys/syscall.h>

__thread int g = 0;

void print_fs_base(void)
{
    unsigned long addr;
    // get the fs segment base address; arch_prctl is invoked through syscall()
    // because the glibc headers may not provide a prototype for it
    long ret = syscall(SYS_arch_prctl, ARCH_GET_FS, &addr);
    if (ret < 0) {
        perror("error");
        return;
    }

    printf("fs base addr: %p\n", (void *)addr); // print the fs segment base address
}

void *start(void *arg)
{
    print_fs_base(); // the child thread prints its fs segment base address
    printf("start, g[%p] : %d\n", &g, g);

    g++;

    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tid;

    g = 100;

    pthread_create(&tid, NULL, start, NULL);
    pthread_join(tid, NULL);

    print_fs_base(); // the main thread prints its fs segment base address
    printf("main, g[%p] : %d\n", &g, g);

    return 0;
}

In this code, both the main thread and the child thread call print_fs_base() to print the fs segment base address. Run the program to see the result:

fs base addr: 0x7f36757c8700
start, g[0x7f36757c86fc] : 0
fs base addr: 0x7f3675fcb700
main, g[0x7f3675fcb6fc] : 100

We can see that:

  • In the child thread, the fs segment base address is 0x7f36757c8700 and the address of g is 0x7f36757c86fc, which is exactly the base address minus 4.

  • In the main thread, the fs segment base address is 0x7f3675fcb700 and the address of g is 0x7f3675fcb6fc, which is also the base address minus 4.

We can conclude that gcc (with support from the thread library and the kernel) uses the CPU's fs segment register to implement thread-local storage. The fs segment base address differs from thread to thread, so what appears to be a single global variable has a different memory address in each thread, which is how thread-private global variables are realized.

Here we have briefly analyzed gcc's implementation of thread-local storage on the AMD64 Linux platform. In later chapters we will also see how the Go runtime uses thread-local storage to associate running goroutines with worker threads.

This concludes the preliminary knowledge in the first part. We started with assembly instructions and covered registers, memory, the stack, the function call process, the operating system kernel's scheduling of threads, and thread-local storage. With this groundwork in place, let's lift the veil on the goroutine scheduler together!




Origin blog.csdn.net/pyf09/article/details/115238634