Benchmark

First, about Benchmark
A Benchmark is a way of evaluating performance by testing, and it has long been applied throughout the computing field. Its most successful application in computing is performance testing, which mainly measures the execution time of a workload, transfer speed, throughput, resource utilization and so on.
Benchmarks and profiling tools are the two main weapons of performance tuning. A Benchmark stress-tests the system to uncover the performance state of the whole system, while a profiling tool presents the system's running state and performance metrics as fully as possible, making it easier for users to diagnose performance problems and tune.
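As a rough illustration of measuring execution time and throughput, here is a minimal Python sketch. The names `run_benchmark` and `sum_of_squares` are made up for this example and do not come from any benchmark suite; a real Benchmark would also control warm-up, data size and cluster state.

```python
import statistics
import time


def run_benchmark(workload, n_items, repeats=5):
    """Time `workload(n_items)` several times and summarize the results."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload(n_items)
        durations.append(time.perf_counter() - start)
    best = min(durations)
    return {
        "repeats": repeats,
        "min_s": best,
        "median_s": statistics.median(durations),
        # Throughput from the fastest run: items processed per second.
        "throughput_items_per_s": n_items / best if best > 0 else float("inf"),
    }


def sum_of_squares(n):
    """Hypothetical CPU-bound workload used only for illustration."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(run_benchmark(sum_of_squares, n_items=200_000))
```

Repeating the run and reporting both the minimum and the median is one common way to reduce the effect of noise from other processes; it is a design choice of this sketch, not a rule.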
Second, the composition of a Benchmark
A Benchmark has three core parts: the data set, the workload and the metrics.
1, the data set
Data falls into three types: structured, semi-structured and unstructured. Because data types in a big data environment are complex and workloads are varied, a big data Benchmark needs to generate all three types of data and the corresponding workloads.
1) Structured data: the traditional relational data model, which can be represented as a two-dimensional table. Typical scenarios include e-commerce transactions, financial systems, hospital HIS databases, government information systems, etc.
2) Semi-structured data: similar to XML, HTML and the like; it is self-describing, with structure and content mixed together. Typical scenarios include mail systems, Web search engine storage, teaching resource libraries, file systems, etc.; consider using HBase or another typical key-value store.
3) Unstructured data: all kinds of documents, pictures, video and audio. Typical applications include video sites, photo albums, traffic video surveillance and so on.
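The three data types above can be sketched with a small synthetic generator. This is only an illustration using the Python standard library; the column names, field names and tag values are invented for the example, and real suites ship far more realistic generators.

```python
import csv
import io
import json
import random


def make_structured(n, rng):
    """Structured: fixed columns, fits a two-dimensional table (CSV here)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["order_id", "user_id", "amount"])
    for i in range(n):
        writer.writerow([i, rng.randint(1, 1000), round(rng.uniform(1, 500), 2)])
    return buf.getvalue()


def make_semi_structured(n, rng):
    """Semi-structured: self-describing records whose fields may vary (JSON lines)."""
    lines = []
    for i in range(n):
        record = {"id": i, "subject": f"message {i}"}
        if rng.random() < 0.5:  # optional field: structure differs per record
            record["tags"] = rng.sample(["spam", "work", "news"], k=2)
        lines.append(json.dumps(record))
    return "\n".join(lines)


def make_unstructured(n_bytes, rng):
    """Unstructured: an opaque byte blob standing in for image/audio/video data."""
    return bytes(rng.getrandbits(8) for _ in range(n_bytes))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    print(make_structured(2, rng))
    print(make_semi_structured(2, rng))
    print(len(make_unstructured(1024, rng)), "bytes of unstructured data")
```

Passing in a seeded `random.Random` keeps the generated data reproducible, which matters when two benchmark runs are meant to be compared.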
2, the workload
The Internet field has huge data volumes and large user bases, which makes it natural soil for big data problems. Workload design can be viewed along the following dimensions:
1) Resource intensity: CPU-intensive computing, I/O-intensive computing, network-intensive computing;
2) Computing paradigm: SQL, batch processing, stream computing, graph computing, machine learning;
3) Computing latency: online computing, offline computing, real-time computing;
4) Application domain: search engines, social networks, e-commerce, location services, media, games.
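To make the first dimension concrete, the following sketch contrasts a CPU-intensive workload with an I/O-intensive one. Both functions are hypothetical stand-ins written for this example (hashing in memory versus writing and re-reading a temporary file); they cover only the resource-intensity dimension, not the paradigm, latency or application dimensions.

```python
import hashlib
import os
import tempfile
import time


def cpu_intensive(rounds):
    """CPU-bound: repeatedly hash a small buffer in memory."""
    digest = b"seed"
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def io_intensive(n_bytes):
    """I/O-bound: write a temporary file, flush it to disk, and read it back."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(os.urandom(n_bytes))
        f.flush()
        os.fsync(f.fileno())
        f.seek(0)
        return len(f.read())


def timed(fn, arg):
    """Return the function result together with its wall-clock time in seconds."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, cpu_s = timed(cpu_intensive, 200_000)
    _, io_s = timed(io_intensive, 8 * 1024 * 1024)
    print(f"cpu-intensive: {cpu_s:.3f}s, io-intensive: {io_s:.3f}s")
```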
3, the metrics
The two main weapons of performance evaluation are the Benchmark and the profiling tool. A Benchmark stress-tests the system to uncover the performance state of the whole system, while a profiling tool presents the system's running state and performance metrics as fully as possible, making it easier for users to diagnose performance problems and tune.
1) Tools
a) At the architecture level: perf, nmon and other tools and commands;
b) At the JVM level: BTrace, JConsole, JVisualVM, jmap, jstack and other tools and commands;
c) At the Spark level: the web UI and the console log; you can also modify the Spark source to print logs for performance monitoring.
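The tools listed above target the hardware, the JVM and Spark. To show the general idea of profiling in a self-contained way, the sketch below swaps in Python's standard-library `cProfile` instead; it is a stand-in for illustration, not one of the tools named here, and `parse_records` and `job` are hypothetical functions.

```python
import cProfile
import io
import pstats


def parse_records(n):
    """Hypothetical hot spot: builds and splits many small strings."""
    return [f"{i},{i * 2}".split(",") for i in range(n)]


def job(n):
    """Hypothetical job whose running time we want to break down."""
    rows = parse_records(n)
    return sum(int(b) for _, b in rows)


def profile_job(n, top=5):
    """Run `job(n)` under the profiler and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = job(n)
    profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time so the most expensive call chains come first.
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()


if __name__ == "__main__":
    _, report = profile_job(100_000)
    print(report)
```

Where a Benchmark gives one number for the whole run, the report breaks that time down per function, which is the kind of view that helps locate a bottleneck.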
2) Metrics
a) From the architecture perspective: floating-point operation density, integer operation density, instruction interrupts, cache hit ratio, TLB misses;
b) From the perspective of Spark execution time and throughput: Job execution time, Job throughput, Stage execution time, Stage throughput, Task execution time, Task throughput;
c) From the perspective of Spark system resource utilization: CPU utilization, memory utilization, disk utilization and network bandwidth utilization, each over a specified period of time;
d) From the scalability perspective: scaling the data volume, scaling the number of cluster nodes (scale out), improving single-node performance (scale up).
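Several of these metrics reduce to simple ratios. The helper functions below are a minimal sketch under the usual textbook definitions (throughput as work per unit time, utilization as busy time over available time, scale-out efficiency as speedup divided by the growth in node count); the function names are invented for this example, and Spark itself reports its timings through the web UI and logs rather than through these helpers.

```python
def throughput(items, seconds):
    """Items (jobs, stages, tasks, or records) completed per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return items / seconds


def utilization(busy_seconds, window_seconds, capacity=1):
    """Fraction of a resource used during a time window.

    `capacity` is the number of parallel units (for example CPU cores).
    """
    return busy_seconds / (window_seconds * capacity)


def speedup(baseline_seconds, scaled_seconds):
    """How many times faster the scaled run is than the baseline."""
    return baseline_seconds / scaled_seconds


def scale_out_efficiency(baseline_seconds, scaled_seconds, node_ratio):
    """Speedup divided by the factor by which the node count grew (1.0 is ideal)."""
    return speedup(baseline_seconds, scaled_seconds) / node_ratio
```

With made-up numbers for illustration (not measurements): if a job takes 120 s on 2 nodes and 40 s on 8 nodes, then `speedup(120, 40)` is 3.0 and `scale_out_efficiency(120, 40, 4)` is 0.75, meaning the cluster delivers 75% of ideal linear scaling.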
Third, Benchmarks in use
1, HiBench: a benchmarking tool for Hadoop developed by Intel; it is open source and can be downloaded from its GitHub repository.
2, Berkeley Big Data Benchmark: a big data benchmarking tool developed by AMPLab following the introduction of Spark; see the official website.
3, Hadoop GridMix: the Benchmark that ships with Hadoop; as a bundled testing tool it is easy to use, its workloads are classic, and it is widely used.
4, BigBench: developed by Teradata, the University of Toronto, InfoSizing and Oracle; its design ideas and extensibility have research value. See the paper "BigBench: Towards an Industry Standard Benchmark for Big Data Analytics".
5, BigDataBench: developed by the Chinese Academy of Sciences; see the official introduction.
6, TPC-DS: widely used for evaluating SQL-on-Hadoop products.
7, Other Benchmarks: MalStone, CloudHarmony, YCSB, SWIM, LinkBench, DFSIO, the Hive performance Benchmark (Pavlo), etc.

Origin blog.csdn.net/myITliveAAA/article/details/89333516