docker学习（八）—— docker的系统资源限制及验证（基础篇）

限制容器的资源：

默认情况下，一个容器是没有任何资源限制的，可以几乎耗尽内核可分配给当前容器的所有资源，宿主机的调度器能调度多少资源，容器就可以用多少资源（高负载的情况下）

docker提供了下面的途径：如何限制内存，CPU，磁盘IO等，

内存是非可压缩资源，CPU是可压缩资源，依赖于linux的一些深层知识

memory hogs

oom obj

oom score

非常非常重要的容器在创建时就应该调整它的oom obj

docker run或docker create时限制容器的使用资源

内存资源控制：https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt

默认情况下，每个容器都可以使用宿主机的所有CPU

用户可以设置不同的容器去限制每个容器的CPU

Docker1.13以上，you can also configure the realtime scheduler.

Most user use and configure the default CFS scheduler

进程：CPU密集型，IO密集型

OOME

在linux主机上，当内核探测到宿主机没有足够的内存可用于系统重要的功能，就会抛出一个OOME（Out Of Memory Exception）异常，并开始kill某些进程以释放内存资源

一旦发生OOME，任何进程都有可能被杀死，包括docker daemon在内
为此，Docker特地调整了docker daemon的OOM优先级，以免他被“正法”，但容器的优先级并未被调整

lscpu

[root@node1 ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                1
On-line CPU(s) list:   0
Thread(s) per core:    1
Core(s) per socket:    1
座：                 1
NUMA 节点：         1
厂商 ID：           GenuineIntel
CPU 系列：          6
型号：              42
型号名称：        Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
步进：              7
CPU MHz：             3801.000
BogoMIPS：            7602.00
虚拟化：           VT-x
超管理器厂商：  VMware
虚拟化类型：     完全
L1d 缓存：          32K
L1i 缓存：          32K
L2 缓存：           256K
L3 缓存：           8192K
NUMA 节点0 CPU：    0
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm tpr_shadow vnmi ept vpid tsc_adjust arat

docker run --help

[root@node1 ~]# docker run --help

Usage:	docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Run a command in a new container

Options:
      --add-host list                  Add a custom host-to-IP mapping (host:ip)
  -a, --attach list                    Attach to STDIN, STDOUT or STDERR
      --blkio-weight uint16            Block IO (relative weight), between 10 and 1000, or 0 to disable (default 0)
      --blkio-weight-device list       Block IO weight (relative device weight) (default [])
      --cap-add list                   Add Linux capabilities
      --cap-drop list                  Drop Linux capabilities
      --cgroup-parent string           Optional parent cgroup for the container
      --cidfile string                 Write the container ID to the file
      --cpu-period int                 Limit CPU CFS (Completely Fair Scheduler) period
      --cpu-quota int                  Limit CPU CFS (Completely Fair Scheduler) quota
      --cpu-rt-period int              Limit CPU real-time period in microseconds
      --cpu-rt-runtime int             Limit CPU real-time runtime in microseconds
  -c, --cpu-shares int                 CPU shares (relative weight)
      --cpus decimal                   Number of CPUs
      --cpuset-cpus string             CPUs in which to allow execution (0-3, 0,1)
      --cpuset-mems string             MEMs in which to allow execution (0-3, 0,1)
  -d, --detach                         Run container in background and print container ID
      --detach-keys string             Override the key sequence for detaching a container
      --device list                    Add a host device to the container
      --device-cgroup-rule list        Add a rule to the cgroup allowed devices list
      --device-read-bps list           Limit read rate (bytes per second) from a device (default [])
      --device-read-iops list          Limit read rate (IO per second) from a device (default [])
      --device-write-bps list          Limit write rate (bytes per second) to a device (default [])
      --device-write-iops list         Limit write rate (IO per second) to a device (default [])
      --disable-content-trust          Skip image verification (default true)
      --dns list                       Set custom DNS servers
      --dns-option list                Set DNS options
      --dns-search list                Set custom DNS search domains
      --entrypoint string              Overwrite the default ENTRYPOINT of the image
  -e, --env list                       Set environment variables
      --env-file list                  Read in a file of environment variables
      --expose list                    Expose a port or a range of ports
      --group-add list                 Add additional groups to join
      --health-cmd string              Command to run to check health
      --health-interval duration       Time between running the check (ms|s|m|h) (default 0s)
      --health-retries int             Consecutive failures needed to report unhealthy
      --health-start-period duration   Start period for the container to initialize before starting health-retries countdown
                                       (ms|s|m|h) (default 0s)
      --health-timeout duration        Maximum time to allow one check to run (ms|s|m|h) (default 0s)
      --help                           Print usage
  -h, --hostname string                Container host name
      --init                           Run an init inside the container that forwards signals and reaps processes
  -i, --interactive                    Keep STDIN open even if not attached
      --ip string                      IPv4 address (e.g., 172.30.100.104)
      --ip6 string                     IPv6 address (e.g., 2001:db8::33)
      --ipc string                     IPC mode to use
      --isolation string               Container isolation technology
      --kernel-memory bytes            Kernel memory limit
  -l, --label list                     Set meta data on a container
      --label-file list                Read in a line delimited file of labels
      --link list                      Add link to another container
      --link-local-ip list             Container IPv4/IPv6 link-local addresses
      --log-driver string              Logging driver for the container
      --log-opt list                   Log driver options
      --mac-address string             Container MAC address (e.g., 92:d0:c6:0a:29:33)
  -m, --memory bytes                   Memory limit
      --memory-reservation bytes       Memory soft limit
      --memory-swap bytes              Swap limit equal to memory plus swap: '-1' to enable unlimited swap
      --memory-swappiness int          Tune container memory swappiness (0 to 100) (default -1)
      --mount mount                    Attach a filesystem mount to the container
      --name string                    Assign a name to the container
      --network string                 Connect a container to a network (default "default")
      --network-alias list             Add network-scoped alias for the container
      --no-healthcheck                 Disable any container-specified HEALTHCHECK
      --oom-kill-disable               Disable OOM Killer
      --oom-score-adj int              Tune host's OOM preferences (-1000 to 1000)
      --pid string                     PID namespace to use
      --pids-limit int                 Tune container pids limit (set -1 for unlimited)
      --privileged                     Give extended privileges to this container
  -p, --publish list                   Publish a container's port(s) to the host
  -P, --publish-all                    Publish all exposed ports to random ports
      --read-only                      Mount the container's root filesystem as read only
      --restart string                 Restart policy to apply when a container exits (default "no")
      --rm                             Automatically remove the container when it exits
      --runtime string                 Runtime to use for this container
      --security-opt list              Security Options
      --shm-size bytes                 Size of /dev/shm
      --sig-proxy                      Proxy received signals to the process (default true)
      --stop-signal string             Signal to stop a container (default "SIGTERM")
      --stop-timeout int               Timeout (in seconds) to stop a container
      --storage-opt list               Storage driver options for the container
      --sysctl map                     Sysctl options (default map[])
      --tmpfs list                     Mount a tmpfs directory
  -t, --tty                            Allocate a pseudo-TTY
      --ulimit ulimit                  Ulimit options (default [])
  -u, --user string                    Username or UID (format: <name|uid>[:<group|gid>])
      --userns string                  User namespace to use
      --uts string                     UTS namespace to use
  -v, --volume list                    Bind mount a volume
      --volume-driver string           Optional volume driver for the container
      --volumes-from list              Mount volumes from the specified container(s)
  -w, --workdir string                 Working directory inside the container

压测的镜像：stress-ng

https://hub.docker.com/r/lorel/docker-stress-ng

[root@node1 ~]# docker pull lorel/docker-stress-ng

[root@node1 ~]# docker run --name stress -it --rm lorel/docker-stress-ng:latest stress --help

[root@node1 ~]# docker run --name stress -it --rm lorel/docker-stress-ng:latest stress --help
stress-ng, version 0.03.11

Usage: stress-ng [OPTION [ARG]]
 --h,  --help             show help
       --affinity N       start N workers that rapidly change CPU affinity
       --affinity-ops N   stop when N affinity bogo operations completed
       --affinity-rand    change affinity randomly rather than sequentially
       --aio N            start N workers that issue async I/O requests
       --aio-ops N        stop when N bogo async I/O requests completed
       --aio-requests N   number of async I/O requests per worker
 -a N, --all N            start N workers of each stress test
 -b N, --backoff N        wait of N microseconds before work starts
 -B N, --bigheap N        start N workers that grow the heap using calloc()
       --bigheap-ops N    stop when N bogo bigheap operations completed
       --bigheap-growth N grow heap by N bytes per iteration
       --brk N            start N workers performing rapid brk calls
       --brk-ops N        stop when N brk bogo operations completed
       --brk-notouch      don't touch (page in) new data segment page
       --bsearch          start N workers that exercise a binary search
       --bsearch-ops      stop when N binary search bogo operations completed
       --bsearch-size     number of 32 bit integers to bsearch
 -C N, --cache N          start N CPU cache thrashing workers
       --cache-ops N      stop when N cache bogo operations completed (x86 only)
       --cache-flush      flush cache after every memory write (x86 only)
       --cache-fence      serialize stores
       --class name       specify a class of stressors, use with --sequential
       --chmod N          start N workers thrashing chmod file mode bits 
       --chmod-ops N      stop chmod workers after N bogo operations
 -c N, --cpu N            start N workers spinning on sqrt(rand())
       --cpu-ops N        stop when N cpu bogo operations completed
 -l P, --cpu-load P       load CPU by P %%, 0=sleep, 100=full load (see -c)
       --cpu-method m     specify stress cpu method m, default is all
 -D N, --dentry N         start N dentry thrashing processes
       --dentry-ops N     stop when N dentry bogo operations completed
       --dentry-order O   specify dentry unlink order (reverse, forward, stride)
       --dentries N       create N dentries per iteration
       --dir N            start N directory thrashing processes
       --dir-ops N        stop when N directory bogo operations completed
 -n,   --dry-run          do not run
       --dup N            start N workers exercising dup/close
       --dup-ops N        stop when N dup/close bogo operations completed
       --epoll N          start N workers doing epoll handled socket activity
       --epoll-ops N      stop when N epoll bogo operations completed
       --epoll-port P     use socket ports P upwards
       --epoll-domain D   specify socket domain, default is unix
       --eventfd N        start N workers stressing eventfd read/writes
       --eventfd-ops N    stop eventfd workers after N bogo operations
       --fault N          start N workers producing page faults
       --fault-ops N      stop when N page fault bogo operations completed
       --fifo N           start N workers exercising fifo I/O
       --fifo-ops N       stop when N fifo bogo operations completed
       --fifo-readers N   number of fifo reader processes to start
       --flock N          start N workers locking a single file
       --flock-ops N      stop when N flock bogo operations completed
 -f N, --fork N           start N workers spinning on fork() and exit()
       --fork-ops N       stop when N fork bogo operations completed
       --fork-max P       create P processes per iteration, default is 1
       --fstat N          start N workers exercising fstat on files
       --fstat-ops N      stop when N fstat bogo operations completed
       --fstat-dir path   fstat files in the specified directory
       --futex N          start N workers exercising a fast mutex
       --futex-ops N      stop when N fast mutex bogo operations completed
       --get N            start N workers exercising the get*() system calls
       --get-ops N        stop when N get bogo operations completed
 -d N, --hdd N            start N workers spinning on write()/unlink()
       --hdd-ops N        stop when N hdd bogo operations completed
       --hdd-bytes N      write N bytes per hdd worker (default is 1GB)
       --hdd-direct       minimize cache effects of the I/O
       --hdd-dsync        equivalent to a write followed by fdatasync
       --hdd-noatime      do not update the file last access time
       --hdd-sync         equivalent to a write followed by fsync
       --hdd-write-size N set the default write size to N bytes
       --hsearch          start N workers that exercise a hash table search
       --hsearch-ops      stop when N hash search bogo operations completed
       --hsearch-size     number of integers to insert into hash table
       --inotify N        start N workers exercising inotify events
       --inotify-ops N    stop inotify workers after N bogo operations
 -i N, --io N             start N workers spinning on sync()
       --io-ops N         stop when N io bogo operations completed
       --ionice-class C   specify ionice class (idle, besteffort, realtime)
       --ionice-level L   specify ionice level (0 max, 7 min)
 -k,   --keep-name        keep stress process names to be 'stress-ng'
       --kill N           start N workers killing with SIGUSR1
       --kill-ops N       stop when N kill bogo operations completed
       --lease N          start N workers holding and breaking a lease
       --lease-ops N      stop when N lease bogo operations completed
       --lease-breakers N number of lease breaking processes to start
       --link N           start N workers creating hard links
       --link-ops N       stop when N link bogo operations completed
       --lsearch          start N workers that exercise a linear search
       --lsearch-ops      stop when N linear search bogo operations completed
       --lsearch-size     number of 32 bit integers to lsearch
 -M,   --metrics          print pseudo metrics of activity
       --metrics-brief    enable metrics and only show non-zero results
       --memcpy N         start N workers performing memory copies
       --memcpy-ops N     stop when N memcpy bogo operations completed
       --mmap N           start N workers stressing mmap and munmap
       --mmap-ops N       stop when N mmap bogo operations completed
       --mmap-async       using asynchronous msyncs for file based mmap
       --mmap-bytes N     mmap and munmap N bytes for each stress iteration
       --mmap-file        mmap onto a file using synchronous msyncs
       --mmap-mprotect    enable mmap mprotect stressing
       --msg N            start N workers passing messages using System V messages
       --msg-ops N        stop msg workers after N bogo messages completed
       --mq N             start N workers passing messages using POSIX messages
       --mq-ops N         stop mq workers after N bogo messages completed
       --mq-size N        specify the size of the POSIX message queue
       --nice N           start N workers that randomly re-adjust nice levels
       --nice-ops N       stop when N nice bogo operations completed
       --no-madvise       don't use random madvise options for each mmap
       --null N           start N workers writing to /dev/null
       --null-ops N       stop when N /dev/null bogo write operations completed
 -o,   --open N           start N workers exercising open/close
       --open-ops N       stop when N open/close bogo operations completed
 -p N, --pipe N           start N workers exercising pipe I/O
       --pipe-ops N       stop when N pipe I/O bogo operations completed
 -P N, --poll N           start N workers exercising zero timeout polling
       --poll-ops N       stop when N poll bogo operations completed
       --procfs N         start N workers reading portions of /proc
       --procfs-ops N     stop procfs workers after N bogo read operations
       --pthread N        start N workers that create multiple threads
       --pthread-ops N    stop pthread workers after N bogo threads created
       --pthread-max P    create P threads at a time by each worker
 -Q,   --qsort N          start N workers exercising qsort on 32 bit random integers
       --qsort-ops N      stop when N qsort bogo operations completed
       --qsort-size N     number of 32 bit integers to sort
 -q,   --quiet            quiet output
 -r,   --random N         start N random workers
       --rdrand N         start N workers exercising rdrand instruction (x86 only)
       --rdrand-ops N     stop when N rdrand bogo operations completed
 -R,   --rename N         start N workers exercising file renames
       --rename-ops N     stop when N rename bogo operations completed
       --sched type       set scheduler type
       --sched-prio N     set scheduler priority level N
       --seek N           start N workers performing random seek r/w IO
       --seek-ops N       stop when N seek bogo operations completed
       --seek-size N      length of file to do random I/O upon
       --sem N            start N workers doing semaphore operations
       --sem-ops N        stop when N semaphore bogo operations completed
       --sem-procs N      number of processes to start per worker
       --sendfile N       start N workers exercising sendfile
       --sendfile-ops N   stop after N bogo sendfile operations
       --sendfile-size N  size of data to be sent with sendfile
       --sequential N     run all stressors one by one, invoking N of them
       --sigfd N          start N workers reading signals via signalfd reads 
       --sigfd-ops N      stop when N bogo signalfd reads completed
       --sigfpe N         start N workers generating floating point math faults
       --sigfpe-ops N     stop when N bogo floating point math faults completed
       --sigsegv N        start N workers generating segmentation faults
       --sigsegv-ops N    stop when N bogo segmentation faults completed
 -S N, --sock N           start N workers doing socket activity
       --sock-ops N       stop when N socket bogo operations completed
       --sock-port P      use socket ports P to P + number of workers - 1
       --sock-domain D    specify socket domain, default is ipv4
       --stack N          start N workers generating stack overflows
       --stack-ops N      stop when N bogo stack overflows completed
 -s N, --switch N         start N workers doing rapid context switches
       --switch-ops N     stop when N context switch bogo operations completed
       --symlink N        start N workers creating symbolic links
       --symlink-ops N    stop when N symbolic link bogo operations completed
       --sysinfo N        start N workers reading system information
       --sysinfo-ops N    stop when sysinfo bogo operations completed
 -t N, --timeout N        timeout after N seconds
 -T N, --timer N          start N workers producing timer events
       --timer-ops N      stop when N timer bogo events completed
       --timer-freq F     run timer(s) at F Hz, range 1000 to 1000000000
       --tsearch          start N workers that exercise a tree search
       --tsearch-ops      stop when N tree search bogo operations completed
       --tsearch-size     number of 32 bit integers to tsearch
       --times            show run time summary at end of the run
 -u N, --urandom N        start N workers reading /dev/urandom
       --urandom-ops N    stop when N urandom bogo read operations completed
       --utime N          start N workers updating file timestamps
       --utime-ops N      stop after N utime bogo operations completed
       --utime-fsync      force utime meta data sync to the file system
 -v,   --verbose          verbose output
       --verify           verify results (not available on all tests)
 -V,   --version          show version
 -m N, --vm N             start N workers spinning on anonymous mmap
       --vm-bytes N       allocate N bytes per vm worker (default 256MB)
       --vm-hang N        sleep N seconds before freeing memory
       --vm-keep          redirty memory instead of reallocating
       --vm-ops N         stop when N vm bogo operations completed
       --vm-locked        lock the pages of the mapped region into memory
       --vm-method m      specify stress vm method m, default is all
       --vm-populate      populate (prefault) page tables for a mapping
       --wait N           start N workers waiting on child being stop/resumed
       --wait-ops N       stop when N bogo wait operations completed
       --zero N           start N workers reading /dev/zero
       --zero-ops N       stop when N /dev/zero bogo read operations completed

Example: stress-ng --cpu 8 --io 4 --vm 2 --vm-bytes 128M --fork 4 --timeout 10s

Note: Sizes can be suffixed with B,K,M,G and times with s,m,h,d,y

使用stress镜像压测玩一下：

-m 256m表示该容器最大使用256m内存

--vm 2 表示启动两个进程，每个进程默认256m内存，但容器只给分配了256m

[root@node1 ~]# docker run --name stress -it --rm -m 256m lorel/docker-stress-ng:latest stress  --vm 2 
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 2 vm

docker top stress查看stress容器占了多少资源，有4个子进程

[root@node1 ~]# docker top stress
UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
root                24334               24320               0                   21:49               ?                   00:00:00            /usr/bin/stress-ng stress --vm 2
root                24371               24334               0                   21:49               ?                   00:00:00            /usr/bin/stress-ng stress --vm 2
root                24372               24334               0                   21:49               ?                   00:00:00            /usr/bin/stress-ng stress --vm 2
root                24374               24372               26                  21:49               ?                   00:00:00            /usr/bin/stress-ng stress --vm 2
root                24379               24371               51                  21:49               ?                   00:00:00            /usr/bin/stress-ng stress --vm 2

docker stats监控容器资源消耗

这个是实施刷新的

证实，容器内存资源可以限制：只分配256m，容器需要512m，没有，哈哈

stress压测CPU

--cpus 1限制容器只能最大使用一个核心

--cpu 8表示启动8个进程，但由于限制容器只能使用一个核。所以CPU最大利用率不超过百分之百

[root@docker1 harbor]# docker container run --name stress2 --rm -it --cpus 1 lorel/docker-stress-ng --cpu 8
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 8 cpu

docker stats查看容器资源消耗情况

如果不限制容器的内核，那么两个容器的CPU使用率会是加起来小于百分之二百（两核）

[root@docker1 harbor]# docker container run --name stress2 --rm -it lorel/docker-stress-ng --cpu 8
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 8 cpu

CPU负载很高：

指定容器运行在某个核心上

docker2机器有4个核心，不限制cpu的话单个容器cpu使用率可以高达差不多百分之400

--cpuset-cpus 0,1表示容器在0和1核心上运行，使用率自然约为百分之二百

[root@docker2 ~]# docker container run --name stress2 --rm -it --cpuset-cpus 0,1 lorel/docker-stress-ng --cpu 8
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 8 cpu

--cpu-shares int CPU shares (relative weight)

设置CPU利用率的权重

第一个容器权重设置为100并查看容器资源利用率：

[root@docker2 ~]# docker container run --name stress2 --rm -it --cpu-shares 100 lorel/docker-stress-ng --cpu 8
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 8 cpu

再启动第二个容器，CPU利用率权重为50

[root@docker2 ~]# docker container run --name stress3 --rm -it --cpu-shares 50 lorel/docker-stress-ng --cpu 8
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 8 cpu

CPU使用率基本上是2:1

然后电脑就卡得不要不要的，直到我杀死容器

（未完待续）。。。