1. Introduction to Lmbench
Lmbench is a simple, portable benchmark suite for memory and operating system performance. Its main functions include bandwidth evaluation (reading cached files, copying memory, reading and writing memory, pipes, TCP) and latency evaluation (context switching, network connections, file creation and deletion, process creation, signal handling, system calls, memory read latency), among others.
2. Download and install
Official website address: http://www.bitmover.com/lmbench/
Download link: lmbench-3.0
imaginemiracle:Downloads$ unzip lmbench-3.0-a9.zip
Note that none of the files in lmbench are executable after unpacking, so running make directly produces a series of errors such as Permission denied.
Here you first need to change the permissions of all files:
imaginemiracle:Downloads$ sudo chmod 777 -R lmbench-3.0-a9/
Enter the lmbench directory; its directory structure is as follows.
imaginemiracle:Downloads$ cd lmbench-3.0-a9/
imaginemiracle:lmbench-3.0-a9$ ls
ACKNOWLEDGEMENTS CHANGES COPYING doc Makefile results src
bin ChangeSet COPYING-2 hbench-REBUTTAL README scripts
3. Run the Lmbench tests
Execute make results. After it starts, you will be prompted to set the following options:
- MULTIPLE COPIES: the number of test copies to run in parallel at the same time, corresponding to the scal load item in the results;
- Job placement selection: the job scheduling control method; the default is 1, which allows the scheduler to place jobs;
- Options to control job placement: the default is 1;
- Memory: set it to more than 4 times the cache size; the larger the value, the more accurate the result and the longer the running time;
- SUBSET: the subset of tests to run, one of ALL / HARDWARE / OS / DEVELOPMENT; the default is all;
- FASTMEM, SLOWFS, DISKS, REMOTE and the other options can be kept at their defaults.
After the setup is complete, the test program starts to run. The run takes a long time, so wait patiently, or do something else for about 10 min and then come back.
4. View the results
Execute make see to view the results. If only two lines of commands appear, the results have been written to the summary.out file, and you can view that file directly with cat ./results/summary.out.
You will see the following output:
4.1. Basic system information
The output begins with the basic parameters of the system, where:
- tlb: the number of pages in the translation look-aside buffer (TLB);
- cache line bytes: the size of a cache line in bytes;
- mem par: memory hierarchy parallelism;
- scal load: the number of Lmbench copies executed in parallel.
4.2. Processor Performance
The units of the following output results are all microseconds (us), and the smaller the value, the better the performance.
- null call: the time required to execute getppid;
- null I/O: the time t1 to read a byte from /dev/zero and the time t2 to write a byte to /dev/null; the average of t1 and t2 is the result of this item;
- stat: the time required to stat a file (that is, to get the information of a file);
- open clos: the total time it takes to open a file and then close it (excluding the time for reading directories and inodes);
- slct TCP: the time consumed by select on 100 file descriptors over TCP network connections;
- sig inst: the time spent installing a signal handler;
- sig hndl: the time spent handling a signal;
- fork proc: the total time consumed by forking an identical process and having that process exit;
- exec proc: simulates how a shell works: the time it takes to fork a new process and have it execute a new command;
- sh proc: the time it takes to fork a process and have it ask the system shell to find and run a new program.
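To make the first two items concrete, here is a minimal Python sketch of how a null call and a null I/O measurement could be taken. It is not lmbench's own C implementation: the helper names time_per_call_us and null_io_us are hypothetical, and interpreter overhead means the numbers will be much larger than what lmbench reports. It only illustrates the averaging of t1 and t2 described above.

```python
import os
import time


def time_per_call_us(func, iterations=100_000):
    """Return the average wall-clock time of func() in microseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


def null_io_us(iterations=100_000):
    """Average of t1 (read 1 byte from /dev/zero) and t2 (write 1 byte to /dev/null)."""
    rfd = os.open("/dev/zero", os.O_RDONLY)
    wfd = os.open("/dev/null", os.O_WRONLY)
    try:
        t1 = time_per_call_us(lambda: os.read(rfd, 1), iterations)
        t2 = time_per_call_us(lambda: os.write(wfd, b"x"), iterations)
    finally:
        os.close(rfd)
        os.close(wfd)
    return (t1 + t2) / 2


if __name__ == "__main__":
    # "null call": a trivial system call, here getppid as in the text above.
    print(f"null call: {time_per_call_us(os.getppid):.4f} us")
    print(f"null I/O:  {null_io_us():.4f} us")
```

Running many iterations and dividing by the count is what keeps the timer resolution from dominating such a short operation.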
4.3. Mathematical operations
The units of the following output results are all nanoseconds (ns), and the smaller the value, the better the performance.
(1) Integer calculation
(2) Unsigned integer calculation
(3) Floating point calculation
(4) Double-precision floating-point calculation
4.4. Context switching
The units of the following output results are all microseconds (us), and the smaller the value, the better the performance.
Multiple processes are connected in a ring by unix pipes; each process reads a token from its own pipe, performs some work, and then writes the token to the next process.
The context switching time includes the time to switch processes plus the time to restore all state of the process (including the cache state).
- 2p/0k: the time consumed by context switching with 2 processes, each of size 0 (performing no work);
- 2p/16k: the time consumed by context switching with 2 processes, each of size 16K (performing work).
The subsequent test items follow the same pattern.
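The ring described above can be sketched in Python as follows. This is an illustration under stated assumptions, not lmbench's lat_ctx: the function name ring_pass_us and its parameters are hypothetical, the "work" is modelled as summing a byte array of size_bytes, and the reported figure includes pipe read/write overhead that lmbench subtracts out. It requires a Unix-like system because it uses os.fork.

```python
import os
import time


def ring_pass_us(nprocs=2, laps=2000, size_bytes=0):
    """Pass a one-byte token around a ring of nprocs processes linked by pipes.

    Returns the average time per hop in microseconds. The figure includes
    pipe overhead and the per-process work, not only the context switch.
    """
    if nprocs < 2:
        raise ValueError("a ring needs at least 2 processes")
    work = bytearray(size_bytes)  # stand-in for the process "size" (0k, 16k, ...)
    # pipes[i] is the pipe that process i reads its token from.
    pipes = [os.pipe() for _ in range(nprocs)]
    children = []
    for i in range(1, nprocs):
        pid = os.fork()
        if pid == 0:
            try:
                # Child i reads from its own pipe and writes to the next process.
                rfd = pipes[i][0]
                wfd = pipes[(i + 1) % nprocs][1]
                for r, w in pipes:
                    if r != rfd:
                        os.close(r)
                    if w != wfd:
                        os.close(w)
                while os.read(rfd, 1):  # an empty read means EOF: shut down
                    sum(work)
                    os.write(wfd, b"t")
            finally:
                os._exit(0)
        children.append(pid)

    # The parent is process 0: it reads from pipes[0] and writes to pipes[1].
    rfd = pipes[0][0]
    wfd = pipes[1][1]
    for r, w in pipes:
        if r != rfd:
            os.close(r)
        if w != wfd:
            os.close(w)

    start = time.perf_counter()
    for _ in range(laps):
        sum(work)
        os.write(wfd, b"t")
        os.read(rfd, 1)  # the token has travelled the whole ring
    elapsed = time.perf_counter() - start

    os.close(wfd)  # EOF propagates around the ring and the children exit
    os.close(rfd)
    for pid in children:
        os.waitpid(pid, 0)
    return elapsed / (laps * nprocs) * 1e6


if __name__ == "__main__":
    print(f"2p/0k : {ring_pass_us(2, 2000, 0):.2f} us per hop")
    print(f"2p/16k: {ring_pass_us(2, 2000, 16 * 1024):.2f} us per hop")
```

Each process closes every pipe end it does not use, so closing the parent's write end is enough to shut the whole ring down cleanly.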
4.5. Local communication delay
The units of the following output results are all microseconds (us), and the smaller the value, the better the performance.
- 2p/0k: the time consumed by context switching with 2 processes, each of size 0 (performing no work);
- Pipe: the so-called hot potato test: two processes that perform no work communicate through a pipe, a token is passed back and forth between them, and the average round-trip time is reported;
- AF UNIX: the same as the Pipe test item, but the inter-process communication uses a unix socket;
- UDP: the same as the Pipe test item, but the inter-process communication uses UDP/IP;
- RPC/UDP: the same as the Pipe test item, but the inter-process communication uses sun RPC; by default, RPC uses the UDP protocol for transmission;
- TCP: the same as the Pipe test item, but the inter-process communication uses TCP/IP;
- RPC/TCP: the same as the Pipe test item, but the inter-process communication uses sun RPC, with RPC set to use the TCP protocol for transmission;
- TCP conn: the time it takes to create a socket descriptor and establish the connection.
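As an illustration of the hot potato pattern, the sketch below bounces a one-byte token between two processes over an AF_UNIX stream socket pair and reports the average round-trip time. It is a rough Python approximation rather than lmbench's lat_unix; the function name af_unix_round_trip_us is hypothetical and the code assumes a Unix-like system with os.fork.

```python
import os
import socket
import time


def af_unix_round_trip_us(round_trips=5000):
    """Average round-trip time of a 1-byte token over an AF_UNIX socket pair."""
    parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        try:
            parent_sock.close()
            while True:
                token = child_sock.recv(1)
                if not token:  # peer closed the socket: stop echoing
                    break
                child_sock.sendall(token)
        finally:
            os._exit(0)

    child_sock.close()
    start = time.perf_counter()
    for _ in range(round_trips):
        parent_sock.sendall(b"t")  # throw the potato
        parent_sock.recv(1)        # and wait for it to come back
    elapsed = time.perf_counter() - start

    parent_sock.close()
    os.waitpid(pid, 0)
    return elapsed / round_trips * 1e6


if __name__ == "__main__":
    print(f"AF UNIX: {af_unix_round_trip_us():.2f} us per round trip")
```

Swapping the socket pair for a pair of pipes, or for connected UDP or TCP sockets, gives the other variants in the list with the same measurement loop.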
4.6. File and memory delay
The units of the following output results are all microseconds (us), and the smaller the value, the better the performance.
- 0K File Create: the time used to create a 0K file;
- 0K File Delete: the time used to delete a 0K file;
- 10K File Create: the time used to create a 10K file;
- 10K File Delete: the time used to delete a 10K file;
- Mmap Latency: map n bytes of a file into memory with mmap and then unmap them; the total time of each mmap plus unmap is recorded, and the maximum of these times is taken;
- Prot Fault: protection fault latency;
- Page Fault: page fault latency;
- 100fd selct: the time for select on 100 file descriptors.
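The file creation and deletion items can be approximated with a short Python sketch like the one below. It is an assumption-laden illustration, not lmbench's lat_fs: the function name file_create_delete_us is hypothetical, the files live in a temporary directory chosen by tempfile, and results depend heavily on the file system behind that directory.

```python
import os
import tempfile
import time


def file_create_delete_us(size_bytes=0, count=1000):
    """Return (create_us, delete_us): average time per file in microseconds."""
    payload = b"\0" * size_bytes
    with tempfile.TemporaryDirectory() as directory:
        paths = [os.path.join(directory, f"f{i}") for i in range(count)]

        # Create phase: open, write the payload, close.
        start = time.perf_counter()
        for path in paths:
            with open(path, "wb") as handle:
                handle.write(payload)
        create_us = (time.perf_counter() - start) / count * 1e6

        # Delete phase: unlink every file that was just created.
        start = time.perf_counter()
        for path in paths:
            os.unlink(path)
        delete_us = (time.perf_counter() - start) / count * 1e6
    return create_us, delete_us


if __name__ == "__main__":
    for label, size in (("0K", 0), ("10K", 10 * 1024)):
        create_us, delete_us = file_create_delete_us(size)
        print(f"{label} File Create: {create_us:.1f} us, Delete: {delete_us:.1f} us")
```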
4.7. Local communication bandwidth
The units of the following output results are all MB/s, and the larger the value, the better the performance.
- Pipe: a pipe is established between two processes and 50MB of data is moved through it in chunks of 64K;
- AF UNIX: a unix stream socket connection is established and 10MB of data is transmitted through the socket in chunks of 64K;
- TCP: the same as the Pipe test item, but a TCP/IP socket is used between the processes, and the amount of transmitted data is 3MB;
- File reread: read a file and sum up its contents;
- Mmap reread: mmap the file into memory, then read the file from memory and sum up its contents;
- Bcopy(libc): bw_mem $i bcopy, the speed of copying a specified number of bytes from one memory area to another using the libc routine;
- Bcopy(hand): bw_mem $i fcp, the speed of copying data from one memory location to another using a hand-written copy loop;
- Mem read: bw_mem $i frd, sum the integer values in an array to test the bandwidth of reading data into the processor;
- Mem write: bw_mem $i fwr, set each member of an integer array to 1 to test the bandwidth of writing data to memory.
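Since every item in this section is a bandwidth, the reported figure is simply bytes moved divided by elapsed time. The sketch below shows that arithmetic plus a rough memory-copy measurement in Python. The names mb_per_s and copy_bandwidth_mb_s are hypothetical, MB is taken here to mean 2^20 bytes (an assumption; the source does not say which definition lmbench uses), and the bulk slice copy is only a stand-in for bw_mem's bcopy test.

```python
import time


def mb_per_s(bytes_moved, seconds):
    """Convert a byte count and a duration into MB/s (1 MB = 2**20 bytes here)."""
    return bytes_moved / seconds / (1024 * 1024)


def copy_bandwidth_mb_s(size_bytes=8 * 1024 * 1024, repeats=20):
    """Copy one buffer into another repeatedly and return the rate in MB/s."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src  # bulk copy of the whole buffer into dst
    elapsed = time.perf_counter() - start
    return mb_per_s(size_bytes * repeats, elapsed)


if __name__ == "__main__":
    # Example: 50MB moved in half a second is 100 MB/s.
    print(mb_per_s(50 * 1024 * 1024, 0.5))
    print(f"memory copy: {copy_bandwidth_mb_s():.0f} MB/s")
```

Choosing a buffer well above the cache size matters for the same reason as the Memory option during setup: a small buffer measures cache bandwidth rather than main memory bandwidth.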
4.8. Memory Operation Latency
The units of the following output results are all nanoseconds (ns), and the smaller the value, the better the performance.
The test runs lat_mem_rd locally, accumulating the value of every fourth element in an integer array; what is measured is how quickly data can be read into the processor.
- L1: level 1 cache;
- L2: level 2 cache;
- Main Mem: sequential main memory access;
- Rand Mem: random memory access latency;
- Guesses: if L1 and L2 are similar, "No L1 cache?" is displayed; if L2 and Main Mem are similar, "No L2 cache?" is displayed.
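The Guesses rule can be written down as a small function. This is only a sketch of the logic described above: the function name guess_missing_cache is hypothetical, and the 10% tolerance used to decide that two latencies are "similar" is an assumption, since the threshold lmbench actually applies is not stated here.

```python
def guess_missing_cache(l1_ns, l2_ns, main_mem_ns, tolerance=0.10):
    """Return the guess messages suggested by the three latencies (in ns).

    Two latencies count as "similar" when they differ by at most
    `tolerance` (a fraction) of the larger one. The 10% default is an
    assumed value chosen for illustration.
    """
    def similar(a, b):
        return abs(a - b) <= tolerance * max(a, b)

    guesses = []
    if similar(l1_ns, l2_ns):
        guesses.append("No L1 cache?")
    if similar(l2_ns, main_mem_ns):
        guesses.append("No L2 cache?")
    return guesses


if __name__ == "__main__":
    print(guess_missing_cache(1.2, 4.0, 60.0))   # distinct levels
    print(guess_missing_cache(1.0, 1.05, 60.0))  # L1 and L2 look alike
    print(guess_missing_cache(1.2, 58.0, 60.0))  # L2 and main memory look alike
```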