Andrew's Blog / "Working With Ruby Threads" study notes

Introduction

Why care?

  • On a multi-core CPU, code must be written to run concurrently in order to take advantage of the multiple cores and run faster

The promise of multi-threading

  • Multiple processes each get their own copy of memory; multiple threads share memory
  • Threads carry less overhead than processes, so the same resources support more concurrent units of work
  • Multi-threaded code must be thread-safe

Chapter I: You’re Always in a Thread

$ irb
> Thread.main
=> #<Thread:0x007fdc830677c0 run>
> Thread.current == Thread.main
=> true
  • Thread.main always points to the main thread
  • When the main thread exits, all other threads are terminated and the Ruby process exits

Chapter II: Threads of Execution

Shared address space

  • Threads in the same process share an address space (the same variables and objects are visible to all of them)
  • Every Ruby thread is mapped to a native operating system thread
$ top -l1 -pid 8409 -stats pid,th

The command above shows the number of threads in the process with ID 8409

Non-deterministic context switching

To provide fair access, the thread scheduler can "pause" a thread at any time, suspending its current state

The ||= statement is not thread-safe, because a thread can be paused at any moment. If thread A evaluates ||=, sees no value yet, and is paused before assigning, thread B can perform the assignment in the meantime; when A resumes it assigns again, and B's assignment is lost

# This statement
@results ||= Queue.new

# when broken down, becomes something like
if @results.nil? 
 temp = Queue.new 
 @results = temp
end

A race condition involves two threads racing to perform an operation on some shared state

  • Key principle: any time that you have two or more threads trying to modify the same thing at the same time, you’re going to have issues.
  • This is because the thread scheduler can interrupt a thread at any time.

There are two strategies for dealing with this:

1) don’t allow concurrent modification
2) protect concurrent modification
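As a rough sketch of the second strategy, the lazy initialization from earlier can be guarded with a mutex so that the nil check and the assignment happen as one step. The ResultStore class and its method names are made up for illustration; they are not from the book's sample code.

```ruby
# Hypothetical class for illustration only.
class ResultStore
  def initialize
    @lock = Mutex.new   # created eagerly, before any other thread can see the object
    @results = nil
  end

  # The check and the assignment both happen while holding the mutex, so a
  # context switch between them cannot let a second thread assign as well.
  def results
    @lock.synchronize { @results ||= Queue.new }
  end
end

store = ResultStore.new
queues = 10.times.map { Thread.new { store.results } }.map(&:value)
puts queues.map(&:object_id).uniq.size   # every thread got the same Queue
```

The mutex itself is created in the constructor rather than lazily, since lazily creating the lock would bring back the same race.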

Chapter III: Lifecycle of a Thread

Thread.new

Thread.new { ... }
Thread.fork { ... } 
Thread.start(1, 2) { |x, y| x + y }

Thread.fork and Thread.start work like Thread.new; arguments passed to them are yielded to the block

Thread#join

  • Once you’ve spawned a thread, you can use #join to wait for it to finish
  • Without #join, the main thread could exit before the sub-thread has executed its block. Using #join provides a guarantee in this situation.
  • Calling #join on the spawned thread will join the current thread of execution with the spawned one
  • If the spawned thread dies with an unhandled exception, the main thread keeps running; the exception is re-raised later, in whichever thread calls #join on it
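The exception behaviour of #join can be seen in a small sketch. The method name join_and_report is invented for this example, and Thread.report_on_exception is switched off only so that the automatic stderr report from newer Rubies does not clutter the output.

```ruby
# Silence the automatic "terminated with exception" report (Ruby 2.4+).
Thread.report_on_exception = false

# Hypothetical helper: spawns a failing thread and joins it.
def join_and_report
  worker = Thread.new { raise ArgumentError, "boom in worker" }
  worker.join                  # the worker's exception is re-raised here
  "no error"
rescue ArgumentError => e
  "rescued in joining thread: #{e.message}"
end

puts join_and_report
```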

Thread#status

Thread#status returns one of several possible values:

  • run: Threads currently running have this status.
  • sleep: Threads currently sleeping, blocked waiting for a mutex, or waiting on IO, have this status.
  • false: Threads that finished executing their block of code, or were successfully killed, have this status.
  • nil: Threads that raised an unhandled exception have this status. (wwrt 33)
  • aborting: Threads that are currently running, yet dying, have this status.
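A short sketch that observes most of these values directly. The helper name observed_statuses is an assumption made for this example; the aborting status is left out because it is hard to catch reliably.

```ruby
Thread.report_on_exception = false   # keep stderr quiet for the crashing thread

# Hypothetical helper that collects the status of threads in different states.
def observed_statuses
  sleeper  = Thread.new { sleep }            # sleeps until it is woken up
  finished = Thread.new { :done }
  crashed  = Thread.new { raise "oops" }

  finished.join
  crashed.join rescue nil                    # swallow the re-raised exception
  Thread.pass until sleeper.status == "sleep"

  statuses = {
    current:  Thread.current.status,
    sleeper:  sleeper.status,
    finished: finished.status,
    crashed:  crashed.status
  }

  sleeper.wakeup                             # let the sleeping thread finish
  sleeper.join
  statuses
end

p observed_statuses
```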

Thread.stop

  • This method puts the current thread to sleep and tells the thread scheduler to run another thread
  • The thread stays asleep until another thread calls Thread#wakeup on it
require 'thread'

thread = Thread.new do
  Thread.stop
  puts "Hello there"
end

# wait for the thread to reach its Thread.stop
puts "----" until thread.status == 'sleep'

thread.wakeup
thread.join

# Output:
# ----
# ----
...
# ----
# ----
# ----
# Hello there
# [Finished in 1.6s]

Thread.pass

Thread.pass is similar to Thread.stop, but it only asks the thread scheduler to schedule another thread; the current thread is not put to sleep

Avoid Thread#raise

  • This method is not recommended, because it can cause serious problems: the exception can arrive at any point in the target thread, including in the middle of an ensure block

Avoid Thread#kill

  • This method is not recommended either; like Thread#raise, it can cause serious problems

Chapter IV: Concurrent != Parallel

  • concurrent and parallel are not the same thing
    1. Do multiple threads run your code concurrently? Yes.
    2. Do multiple threads run your code in parallel? Maybe.
  • A single-core CPU can run multiple tasks concurrently by switching between them, but that is not necessarily faster than running them in sequence
  • A multi-core CPU can run multiple tasks in parallel, but a task can still be paused partway through while the core is taken over by another thread or process
  • Parallel execution is always concurrent, but concurrent execution is not necessarily parallel

You can’t guarantee anything will be parallel

  • making it execute in parallel is out of your hands. That responsibility is left to the underlying thread scheduler
  • On a multi-core system, a multi-threaded program may still end up running on a single core; the thread scheduler decides
  • Threads are scheduled fairly, so every thread gets a more or less equal share of the available resources, but your code cannot control this

Further reading

  • https://blog.golang.org/concurrency-is-not-parallelism
  • https://blog.engineyard.com/2011/ruby-concurrency-and-you

Chapter V: The GIL and MRI

  • MRI allows concurrent execution of Ruby code, but prevents parallel execution of Ruby code

The global lock

GIL stands for Global Interpreter Lock. It is also called the GVL (Global VM Lock) or simply the Global Lock

  • Each MRI process has exactly one GIL; multiple processes each have their own GIL
  • All threads spawned by a process share that process's GIL
  • With Ruby threads on MRI, only a single thread can hold the GIL at any given time; the other threads must wait for it to be released
  • As a result, MRI cannot run Ruby code in parallel
  • Even in a language without a GIL, such as Java, threads that access and modify the same shared resource need a lock around it, and code serialized by that lock cannot take advantage of multi-core parallelism
  • Using multiple processes is a common way to achieve parallelism in Ruby
The special case: blocking IO

  • Ruby's GIL prevents parallel execution, but a thread that blocks on IO releases the GIL
  • MRI does not let a thread hog the GIL when it hits blocking IO
  • Because of the GIL, MRI gives up the operating system's ability to run threads in parallel, but that does not mean nothing can ever overlap

Code:chapter05/block_io_demo1.rb

require 'open-uri'
3.times.map do 
  Thread.new do
    URI.open('http://zombo.com') # Kernel#open no longer handles URLs on Ruby 3.0+
  end
end.each(&:value)

When the code above runs, assume all three threads have been spawned and each is trying to acquire the GIL so it can execute. Thread A gets the GIL, creates a socket, and tries to open a connection to zombo.com. While thread A waits for the response it releases the GIL; thread B then acquires the GIL and goes through the same steps as thread A
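The overlap described above can be observed without a network connection. In this sketch, fake_request and timed_requests are invented names, and sleep stands in for a blocking network call on the assumption that it releases the GIL in the same way a blocking read does on MRI.

```ruby
require 'benchmark'

# Hypothetical stand-in for a blocking network call.
def fake_request
  sleep 0.2
  :ok
end

# Runs `count` fake requests, one per thread, and returns [results, seconds].
def timed_requests(count)
  results = nil
  elapsed = Benchmark.realtime do
    results = count.times.map { Thread.new { fake_request } }.map(&:value)
  end
  [results, elapsed]
end

results, elapsed = timed_requests(3)
puts "#{results.size} requests finished in #{elapsed.round(2)}s"
```

If the three waits ran one after another the total would be about 0.6 seconds; because the waits overlap, the elapsed time should stay close to a single 0.2 second wait.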

Why?

There are three reasons that the GIL exists:

  1. To protect MRI internals from race conditions. Race conditions cause many problems, and the C code at the core of MRI is just as exposed to them; the easiest way to reduce the risk is to prevent multiple threads from running at the same time
  2. To facilitate the C extension API. Whenever a block of C code uses the extension API, the GIL keeps other code from running. C extensions may not be thread-safe, and the GIL guarantees thread safety for them
  3. To reduce the likelihood of race conditions in your Ruby code

Misconceptions

Myth 1: the GIL guarantees your code will be thread-safe

  • This view is wrong
  • The GIL only prevents parallel execution; it makes race conditions far less likely but does not rule them out, so it does not guarantee thread safety

Code:chapter04/unsafe_counter.rb

@counter = 0
5.times.map do
  Thread.new do
    temp = @counter

    # Adding the line below makes the result wrong: blocking IO makes the thread release the GIL, so two threads can read the same @counter value
    # puts  temp 
    temp = temp + 1
    @counter = temp
  end
end.each(&:join)
puts @counter

  • The code above is the expanded equivalent of @counter += 1
  • Two threads can race on the assignment to @counter, so the result may be less than 5. This happens most readily on JRuby and Rubinius, or on MRI when blocking IO is involved

Myth 2: the GIL prevents concurrency

  • The GIL prevents parallel execution of Ruby code, but it does not prevent concurrency; the myth mixes up the two terms
  • Concurrency happens even on a single-core CPU, where the scheduler gives each thread a share of the processor
  • Important point: the GIL allows multiple threads to be blocked on IO at the same time, so IO-bound code can effectively run in parallel

Chapter VI: Real Parallel Threading with JRuby and Rubinius

  • JRuby and Rubinius don't have a GIL

Proof

See the code in chapter06/prime.rb, which computes primes: MRI is expected to be slower than JRuby and Rubinius on this code

Using version 1.8.7

require 'benchmark'

def prime_sieve_upto(n)
  all_nums = (0..n).to_a
  all_nums[0] = all_nums[1] = nil
  all_nums.each do |p|

    #jump over nils
    next unless p

    #stop if we're too high already
    break if p * p > n

    #kill all multiples of this number
    (p*p).step(n, p){ |m| all_nums[m] = nil }
  end

  #remove unwanted nils
  all_nums.compact
end


primes = 1_000_000
iterations = 10
num_threads = 5
iterations_per_thread = iterations / num_threads

Benchmark.bm(15) do |x|
  x.report('single-threaded') do
    iterations.times do
      prime_sieve_upto(primes)
    end
  end
  x.report('multi-threaded') do
    num_threads.times.map do
      Thread.new do
        iterations_per_thread.times do
          prime_sieve_upto(primes)
        end
      end
    end.each(&:join)
  end
end

ree-1.8.7-2012.02

                     user     system      total        real
single-threaded  5.660000   0.060000   5.720000 (  5.725174)
multi-threaded   6.110000   0.110000   6.220000 (  6.228208)

MRI ruby 1.9.3-p551

                      user     system      total        real
single-threaded   3.450000   0.060000   3.510000 (  3.531772)
multi-threaded    3.660000   0.080000   3.740000 (  3.760532)

MRI ruby 2.0.0-p598

                      user     system      total        real
single-threaded   3.630000   0.080000   3.710000 (  3.726324)
multi-threaded    3.680000   0.090000   3.770000 (  3.808694)

MRI ruby 2.0.0-p648

                      user     system      total        real
single-threaded   3.210000   0.060000   3.270000 (  3.276048)
multi-threaded    3.330000   0.080000   3.410000 (  3.402474)

MRI ruby 2.1.0

                      user     system      total        real
single-threaded   2.360000   0.070000   2.430000 (  2.422242)
multi-threaded    2.390000   0.070000   2.460000 (  2.462325)

MRI ruby 2.2.3:

                      user     system      total        real
single-threaded   2.300000   0.070000   2.370000 (  2.361750)
multi-threaded    2.410000   0.080000   2.490000 (  2.482332)

jruby-9.0.4.0:

                      user     system      total        real
single-threaded   7.740000   0.280000   8.020000 (  2.676519)
multi-threaded   11.760000   0.230000  11.990000 (  3.064823)

MRI Ruby keeps getting faster from version to version. Rubinius was not tested because installing it was very slow (blame the GFW)

So… how many should you use?

A real application is rarely clear-cut: it may be IO-bound in some places and CPU-bound in others, it may be memory-bound instead, or it may not be maxing out any resource at all

Take a Rails application as an example:

  • Talking to the database, talking to the client, and calling external services are mostly IO-bound
  • On the other hand, rendering HTML templates or converting data to JSON uses the CPU

The only way to a surefire answer is to measure: run the code with different numbers of threads and analyze the results. Without measuring, we cannot find the right answer

Chapter VII: How Many Threads Are Too Many?

To benefit from concurrency, we have to split a problem into smaller tasks that can run at the same time. If a significant part of the problem cannot be split up, adding more concurrency will not yield more performance

  • The only guaranteed method is to measure and compare
  • For example: try one thread per CPU core, then try five threads, compare the two results, and keep improving from there
  • Newcomers generally assume that more threads will always finish the work faster
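A minimal sketch of the measure-and-compare approach, in the spirit of the benchmark from the previous chapter. The names simulated_io_job and time_with_threads are assumptions, and the job is a 20 ms sleep standing in for real IO; with real work the numbers will look different, which is the point of measuring.

```ruby
require 'benchmark'

# Hypothetical IO-bound job: 20 ms of simulated waiting.
def simulated_io_job
  sleep 0.02
end

# Splits total_jobs evenly over thread_count threads and returns the
# wall-clock seconds the whole batch took.
def time_with_threads(thread_count, total_jobs = 20)
  jobs_per_thread = total_jobs / thread_count
  Benchmark.realtime do
    thread_count.times.map do
      Thread.new { jobs_per_thread.times { simulated_io_job } }
    end.each(&:join)
  end
end

[1, 2, 5, 10].each do |n|
  puts format("%2d threads: %.3fs", n, time_with_threads(n))
end
```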

ALL the threads

1.upto(10_000) do |i|
  Thread.new { sleep }
  puts i
end

The code above prints:
1
2
...
2043
2044
2045
2046
chapter06/spawning_threads.rb:2:in `initialize': can't create Thread: Resource temporarily unavailable (ThreadError)
  • This is because there is a hard limit on the number of threads a single Ruby process can spawn
  • With ree-1.8.7, and on Linux systems, a process can spawn at least 10,000 threads, though we are unlikely to ever need that many

Context Switching

  • Each thread needs very little memory, but a 4-core CPU can only execute four threads in parallel. With many threads, most are blocked on IO or sitting idle, and the thread scheduler still pays overhead to manage them
  • Despite that overhead, spawning more threads than there are cores still makes sense: IO-bound threads release the GIL and can overlap, CPU-bound threads can run in parallel on non-MRI implementations, and even a single-core CPU can run threads concurrently

IO-bound

  • If the code makes requests to external web services, a faster network connection makes the program faster
  • If the code does a lot of disk reads and writes, a faster disk makes the program faster; the gain comes from upgrading the hardware
  • Both cases are IO-bound: the code spends its time waiting for a response from an IO device, so spawning more threads than there are cores makes sense

Code examples, see: ./chapter06/io_bound.rb

  • If IO latency is relatively high, the sweet spot needs more threads, because each thread spends longer blocked and waiting. If IO latency is low, the sweet spot needs fewer threads, because waits are shorter and threads are freed up sooner

CPU-bound

… to be continued

Chapter VIII: Thread safety

What’s really at stake?

When your code is not thread-safe, the worst that can happen is that your underlying data becomes incorrect

  • If your code is ‘thread-safe,’ that means that you can run your code in a multi-threaded context and your underlying data will be safe.
  • If your code is ‘thread-safe,’ that means that you can run your code in a multi-threaded context and your underlying data remains consistent.
  • If your code is ‘thread-safe,’ that means that you can run your code in a multi-threaded context and the semantics of your program are always correct.

The computer is oblivious

  • The computer is unaware of thread-safety issues.

Is anything thread-safe by default?

  • Any concurrent modifications to the same object are not thread-safe.

Chapter IX: Protecting Data with Mutexes

Mutual exclusion

  • If you wrap some section of your code with a mutex, you guarantee that no two threads can enter that section at the same time.
  • Until the owning thread unlocks the mutex, no other thread can lock it.
# Typical mutex usage
mutex = Mutex.new   # create once and share between threads
shared_array = []

mutex.synchronize do
  shared_array << nil
end
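As a sketch of how this applies to the unsafe counter from the GIL chapter, the read-increment-write sequence can be placed inside one synchronize block. The method name count_with_mutex is invented for this example.

```ruby
# Hypothetical helper: increments a shared counter from several threads,
# with the whole read-modify-write sequence protected by one mutex.
def count_with_mutex(thread_count, increments_per_thread)
  counter = 0
  mutex = Mutex.new

  thread_count.times.map do
    Thread.new do
      increments_per_thread.times do
        mutex.synchronize do
          temp = counter      # read
          temp += 1           # modify
          counter = temp      # write
        end
      end
    end
  end.each(&:join)

  counter
end

puts count_with_mutex(5, 1_000)   # prints 5000
```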

The contract

  • Note: the mutex is shared among all the threads

Making key operations atomic

  • Operations that must happen together have to be made atomic by wrapping them in a single mutex-protected section; otherwise the results will be wrong
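A small sketch of a check-then-act sequence made atomic. The Order class and its methods are hypothetical and only loosely modelled on the kind of payment example such a chapter might use; the point is that the status check and the status change sit inside the same critical section.

```ruby
# Hypothetical class for illustration only.
class Order
  attr_reader :payments_collected

  def initialize
    @status = :pending
    @payments_collected = 0
    @mutex = Mutex.new
  end

  # Checking the status and acting on it happen under one lock, so two
  # threads can never both see :pending and both collect a payment.
  def collect_payment_once
    @mutex.synchronize do
      return false unless @status == :pending
      @payments_collected += 1
      @status = :paid
      true
    end
  end
end

order = Order.new
outcomes = 5.times.map { Thread.new { order.collect_payment_once } }.map(&:value)
puts outcomes.count(true)        # prints 1
puts order.payments_collected    # prints 1
```

If only the increment were locked and the status check were left outside the mutex, several threads could pass the check before any of them marked the order as paid.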

Mutexes and memory visibility

  • Mutexes carry an implicit memory barrier
  • The order in which memory is actually accessed at run time is not necessarily the order written in the program code; this is called out-of-order memory access. A memory barrier forces the CPU or compiler to keep memory accesses in order across it

Mutex performance

  • Mutexes inhibit parallelism
  • The GIL itself acts as a mutex: only one thread executes Ruby code at a time
  • Restrict the critical section to be as small as possible, while still preserving the safety of your data. The smaller the locked section, the more of the remaining code can run in parallel; this is what is meant by a finer-grained mutex

Chapter X: Signaling Threads with Condition Variables

The API by example

  • ConditionVariable#wait unlocks the mutex and puts the thread to sleep
  • ConditionVariable#signal wakes the first waiting thread, which re-acquires the mutex and continues

Code: chapter10/xkcd_printer.rb

Broadcast

  • ConditionVariable#signal

Wakes up one thread that is waiting on the condition variable. The woken thread tries to re-acquire the mutex that was passed to ConditionVariable#wait. The documentation these notes follow says it returns the woken thread if one was waiting, and nil otherwise.

  • ConditionVariable#broadcast

Wakes up all threads that are waiting on the condition variable. Each woken thread tries to re-acquire the mutex that was passed to ConditionVariable#wait
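The wait and signal calls can be seen working together in a minimal sketch. This is not the book's xkcd_printer.rb; the Mailbox class is a made-up example with one producer and one consumer.

```ruby
# Hypothetical class for illustration only.
class Mailbox
  def initialize
    @mutex = Mutex.new
    @arrived = ConditionVariable.new
    @messages = []
  end

  def deliver(message)
    @mutex.synchronize do
      @messages << message
      @arrived.signal            # wake one thread blocked in #take, if any
    end
  end

  def take
    @mutex.synchronize do
      # wait releases the mutex while sleeping and re-acquires it on wakeup;
      # the loop re-checks the condition in case of a spurious wakeup.
      @arrived.wait(@mutex) while @messages.empty?
      @messages.shift
    end
  end
end

mailbox = Mailbox.new
consumer = Thread.new { 3.times.map { mailbox.take } }
%w[a b c].each { |message| mailbox.deliver(message) }
p consumer.value                 # the three messages, in delivery order
```

With several consumers waiting on the same condition, broadcast would wake all of them, and each would re-check the queue after re-acquiring the mutex.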

Chapter XI: Thread-safe Data Structures

  • A blocking queue keeps its shared state behind its own mutex. Instead of exposing a global shared object to every thread, each shared object takes care of making its own concurrent reads and writes correct
  • The book's BlockingQueue uses a ConditionVariable to put a thread to sleep when the queue is empty
  • Queue is the only thread-safe data structure in the Ruby standard library. It is loaded with require 'thread', and it is also a blocking queue
  • Ruby's Array and Hash are not thread-safe. JRuby does not make them thread-safe either, because thread-safe data structures would degrade single-threaded performance, but Java provides thread-safe alternatives
  • To get a thread-safe Array and Hash in Ruby, use ThreadSafe::Array and ThreadSafe::Hash from the thread_safe rubygem
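A brief sketch of Queue used as the only shared state between threads, so that no explicit mutex is needed. The method name square_all and the :stop marker are assumptions made for this example.

```ruby
# Hypothetical helper: squares numbers using a small pool of worker threads
# that communicate only through Queue objects.
def square_all(numbers, worker_count = 3)
  jobs = Queue.new
  results = Queue.new

  numbers.each { |n| jobs << n }
  worker_count.times { jobs << :stop }   # one stop marker per worker

  workers = worker_count.times.map do
    Thread.new do
      # Queue#pop blocks while the queue is empty
      while (n = jobs.pop) != :stop
        results << n * n
      end
    end
  end
  workers.each(&:join)

  # All workers are done, so draining the results queue is safe here.
  Array.new(results.size) { results.pop }.sort
end

p square_all([1, 2, 3, 4, 5])   # prints [1, 4, 9, 16, 25]
```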

Chapter XII: Writing Thread-safe Code

  • Idiomatic Ruby code is most often thread-safe Ruby code
  • Avoid mutating globals: global variables are shared between all threads
    • Anything with only one shared instance is a global. For example: constants, the AST (abstract syntax tree), class variables, class methods
    • modifying the AST at runtime is almost always a bad idea, especially when multiple threads are involved.
    • In other words, it's expected that the AST will be modified at startup time
  • Create more objects, rather than sharing one
    • Thread-locals: giving each thread its own connection works well with a small number of threads, but with high concurrency the overhead is too large and a resource pool is more appropriate
      # Instead of
       $redis = Redis.new
       # use
       Thread.current[:redis] = Redis.new
      
    • Resource pools: a pool opens a number of connections (or other resources that need to be shared) for use by multiple threads. When a thread wants a connection, it asks the pool to check one out; the pool is responsible for checking that a connection is available and for handing it out in a thread-safe way. When the thread is done, it returns the connection to the pool. See the connection_pool rubygem: https://github.com/mperham/connection_pool
    • Avoid lazy loading: autoload defers loading until runtime. It is not thread-safe in MRI but is thread-safe in JRuby. In Rails 3 autoload is not thread-safe and you need to enable config.threadsafe!; in Rails 4 it is thread-safe
    • Prefer data structures over mutexes: mutexes are hard to use well, and they force you to decide a lot of questions:
      • How coarse or fine should the mutex be
      • Which code belongs in the critical section
      • Whether a deadlock is possible
      • Whether you need one global lock or a lock per instance
      Most programmers are not familiar with mutexes, so using thread-safe data structures avoids many of these concerns: you simply don't need to create any mutexes in your code.
    • Finding bugs: even when you follow all the best practices, a baffling bug can still show up, and it can be very hard to track down or reproduce. The best bet is to read the source code. The most common problem is a reference to a global, so try picturing two threads accessing that code at the same time; doing so can make the cause suddenly apparent
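To make the thread-locals idea from this chapter concrete, here is a small sketch. FakeConnection and the connection helper are hypothetical stand-ins for a real client such as Redis.new, so the example runs without any external service.

```ruby
# Hypothetical stand-in for a real client object.
class FakeConnection; end

# Returns the calling thread's own connection, creating it on first use.
# No mutex is needed because the slot is never shared between threads.
def connection
  Thread.current[:connection] ||= FakeConnection.new
end

pairs = 3.times.map do
  Thread.new { [connection, connection] }   # ask twice inside each thread
end.map(&:value)

puts pairs.all? { |first, second| first.equal?(second) }   # same object within a thread
puts pairs.map(&:first).uniq.size                          # one distinct object per thread
```

The trade-off noted in the list still applies: each thread opens its own connection, which is fine for a handful of threads and wasteful for many.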

... unfinished

Original: Big Box  Andrew's Blog / "Working With Ruby Threads" study notes


Origin www.cnblogs.com/petewell/p/11607061.html