---------------------------------------
QPS is the number of requests completed per second.
RT is the response time.
The relationship between QPS and RT:
--------Single-threaded scenario
Suppose the server has only one thread, so all requests execute serially. The system's QPS is then trivial to compute: QPS = 1000ms / RT. If one RT consists of 49ms of CPU compute time and 200ms of CPU wait time, then QPS = 1000 / (49 + 200) ≈ 4.02.
---------Multi-threaded scenario
Now raise the server's thread count to 2; the system's QPS becomes 2 * (1000 / (49 + 200)) ≈ 8.03. QPS appears to grow linearly with thread count, so if QPS is too low, just add threads? That sounds reasonable, and the formula agrees, but reality usually does not; more on that later.
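The arithmetic above can be checked in a few lines. This is only a sketch of the example's numbers (49ms / 200ms are the assumed figures from the text, not measurements):

```java
public class QpsEstimate {
    public static void main(String[] args) {
        double cpuTimeMs = 49;    // assumed CPU compute time per request
        double waitTimeMs = 200;  // assumed CPU wait time per request
        double rtMs = cpuTimeMs + waitTimeMs;       // RT = 249 ms
        double singleThreadQps = 1000.0 / rtMs;     // ≈ 4.02
        double twoThreadQps = 2 * singleThreadQps;  // ≈ 8.03
        System.out.printf("single-thread QPS=%.2f, two-thread QPS=%.2f%n",
                singleThreadQps, twoThreadQps);
    }
}
```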
----------Optimal thread count
From the single-threaded scenario, the CPU wait time is 200ms: during that time the CPU does nothing and sits idle, so we are clearly not using it fully. We should start more threads to serve requests and soak up that idle time. How many? A quick estimate: to convert the 200ms of idle time into CPU time we need (200 + 49) / 49 ≈ 5.08, i.e. about 5 threads if context switching is ignored. Factoring in the number of CPU cores and the target utilization gives the optimal-thread-count formula: RT / CPU Time * coreSize * cpuRatio
RT = 200 (thread wait time) + 49
CPU Time (thread CPU time) = 49
Optimal server thread count = ((thread wait time + thread CPU time) / thread CPU time) * number of CPUs
-----------Maximum QPS
QPS = thread count * single-thread QPS
= ((thread wait time + thread CPU time) / thread CPU time) * number of CPUs * cpuRatio * (1000ms / (thread wait time + thread CPU time))
= 1000ms / (thread CPU time) * number of CPUs * cpuRatio
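Both formulas can be verified numerically. The sketch below simply encodes the two expressions from the text (names like cpuRatio are my own), and checks that the (wait + cpu) terms cancel as the derivation claims:

```java
public class ThreadPoolMath {
    // Optimal threads: ((wait + cpu) / cpu) * cores * cpuRatio
    static double optimalThreads(double waitMs, double cpuMs, int cores, double cpuRatio) {
        return (waitMs + cpuMs) / cpuMs * cores * cpuRatio;
    }

    // Max QPS simplifies to 1000 / cpuTime * cores * cpuRatio,
    // because the (wait + cpu) factors cancel out.
    static double maxQps(double cpuMs, int cores, double cpuRatio) {
        return 1000.0 / cpuMs * cores * cpuRatio;
    }

    public static void main(String[] args) {
        // With the example's numbers on one fully utilized core:
        System.out.printf("threads=%.2f%n", optimalThreads(200, 49, 1, 1.0)); // ≈ 5.08
        System.out.printf("maxQps=%.2f%n", maxQps(49, 1, 1.0));               // ≈ 20.41
    }
}
```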
============================================================
epoll is the go-to mechanism for high-concurrency servers on Linux. Because it is event-driven, it beats select by more than an order of magnitude.
A single epoll thread can reach about 15,000 triggered events per second, but once business logic is added the picture changes: most business code talks to a database and blocks, so multiple threads are needed to keep throughput up.
With epoll dispatching into a thread pool, tests measured about 2,000 requests/s.
Also added: detection of dead sockets left behind after network disconnects.
How many concurrent epoll connections can a server with 4G of RAM handle?
May 31, 2017, 17:59:52, by libaineu2004
Source: http://www.jb51.net/LINUXjishu/346080.html
The answer below derives a maximum connection count from memory alone; the formulas are rough estimates, provided for reference only.
The question asks to derive a maximum number of concurrent connections from memory, so first identify where each connection consumes memory.
The first place is the socket buffers: one for read and one for write. Their default sizes live in:
/proc/sys/net/ipv4/tcp_rmem (for read)
/proc/sys/net/ipv4/tcp_wmem (for write)
The defaults are 87K (read) and 16K (write); the minimum is 4K each, and the maximum 2M each. In practice, even when tuning below the defaults you should keep at least 8K each.
Next comes the logical I/O buffer:
for example, if you listen for a recv event, when it fires you need memory ready to read into (usually allocated when the socket is established and released only when it closes).
This memory is managed by your own socket code; the floor is 4K + 4K, and realistically at least 8K + 8K.
Now set up an optimized configuration and a usage scenario. First assume all 4G of memory is free (the OS and other processes need memory too....
If every network packet can be kept under 4K, and no connection's path is congested (or any congestion backlog stays under 4K):
each connection consumes 4 + 4 + 4 + 4 = 16K,
so 4G / 16K ≈ 262,000 concurrent connections.
If packets can be kept under 8K, again assuming no congestion (or a backlog under 8K):
each socket occupies between 24K and 32K; conservatively, call it 32K,
so 4G / 32K ≈ 131,000 concurrent connections. As a pure network-layer memory figure, this is a usable reference for production.
With the default configuration, if every connection's path becomes severely congested, and ignoring the logical send queue's share:
the defaults cost 2M + 2M + 8K + 8K ≈ 4M,
so 4G / 4M = 1024 concurrent connections ( ...
If the send queue also backs up, extrapolate from there yourself.
If you are only benchmarking, optimizing purely for connection count, with no resident logical buffers, tiny socket throughput, and a smooth load, set the socket buffer size to the system minimum.
Then:
4G / 8K ≈ 524,000 concurrent connections, which should be the ceiling.
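All three estimates are just total memory divided by per-connection footprint. A minimal sketch of that arithmetic (4G taken as 4 * 2^30 bytes):

```java
public class EpollCapacity {
    // Rough ceiling on concurrent connections from memory alone.
    static long maxConnections(long memBytes, long perConnBytes) {
        return memBytes / perConnBytes;
    }

    public static void main(String[] args) {
        long fourGiB = 4L * 1024 * 1024 * 1024;
        System.out.println(maxConnections(fourGiB, 16 * 1024));        // 262144 (~262k)
        System.out.println(maxConnections(fourGiB, 32 * 1024));        // 131072 (~131k)
        System.out.println(maxConnections(fourGiB, 4L * 1024 * 1024)); // 1024
        System.out.println(maxConnections(fourGiB, 8 * 1024));         // 524288 (~524k)
    }
}
```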
============================================================
In Linux network programming, select was long the standard event-notification mechanism. Newer kernels offer a replacement: epoll.
epoll's biggest advantage over select is that its efficiency does not fall as the number of watched fds grows. The kernel's select implementation polls, so the more fds it polls, the longer each call takes. Moreover, the linux/posix_types.h header declares:
#define __FD_SETSIZE 1024
i.e. select can watch at most 1024 fds at once. You can raise the limit by editing the header and recompiling the kernel, but that hardly cures the root problem.
---------------------------------------------
The calling convention alone shows epoll's advantage over select/poll: the latter require you to pass the entire set of monitored sockets on every call, which means copying the user-space fd list into the kernel. With tens of thousands of handles, that is tens or hundreds of KB copied into kernel space on every call, which is very inefficient. Calling epoll_wait is the equivalent of calling select/poll, but no socket handles need to be passed, because the kernel already received the watch list via epoll_ctl.
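The same register-once model is what Java exposes through java.nio.channels.Selector (on Linux the JDK's default Selector implementation is epoll-based). A minimal sketch, using an ephemeral port so nothing else needs to be running:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class SelectorSketch {
    // Register a listening socket once, then poll for readiness without
    // re-passing the fd set on every call (the epoll_ctl/epoll_wait split).
    static int pollOnce() throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0)); // ephemeral port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT); // like epoll_ctl(ADD)
        try {
            // selector.select() would block like epoll_wait; selectNow() polls once.
            return selector.selectNow();
        } finally {
            server.close();
            selector.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("ready channels: " + pollOnce()); // 0: nobody has connected
    }
}
```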
Conclusion
Clearly the raw network event rate is not the bottleneck; CPU utilization is.
How to obtain a program's CPU wait time and compute time
Finally, a "Dark Magic" estimation method (I have not yet figured out exactly why it works), using the class listed in full further below. Its sample output:
1. Target queue memory usage (bytes): 100000 (the task queue is capped at 100,000 bytes in total)
2. createTask() produced pool_size_calculate.AsyncIOTask which took 40 bytes in a queue
(AsyncIOTask is the asynchronous task under test)
3. Formula: 100000 / 40
* Recommended queue capacity (bytes): 2500
Number of CPU: 4
Target utilization: 1
Elapsed time (nanos): 3000000000
Compute time (nanos): 47181000
Wait time (nanos): 2952819000
Formula: 4 * 1 * (1 + 2952819000 / 47181000)
* Optimal thread count: 256
The recommended queue capacity is 2500 and the thread count 256, which is somewhat unexpected. The pool can then be constructed as:
ThreadPoolExecutor pool =
new ThreadPoolExecutor(256, 256, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(2500));
============================================================
The "Dark Magic" class itself:
package pool_size_calculate;
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.BlockingQueue;
/**
 * A class that calculates the optimal thread pool boundaries. It takes the
 * desired target utilization and the desired work queue memory consumption as
 * input and returns thread count and work queue capacity.
 *
 * @author Niklas Schlimm
 */
public abstract class PoolSizeCalculator {
/**
 * The sample queue size used to calculate the size of a single
 * {@link Runnable} element.
 */
private final int SAMPLE_QUEUE_SIZE = 1000;
/**
 * Accuracy of a test run. It must finish within 20ms of the testtime,
 * otherwise we retry the test. This could be configurable.
 */
private final int EPSYLON = 20;
/**
 * Control variable for the CPU time investigation.
 */
private volatile boolean expired;
/**
 * Time (millis) of the test run in the CPU time calculation.
 */
private final long testtime = 3000;
/**
 * Calculates the boundaries of a thread pool for a given {@link Runnable}.
 *
 * @param targetUtilization
 *            the desired utilization of the CPUs (0 <= targetUtilization <= 1)
 * @param targetQueueSizeBytes
 *            the desired maximum work queue size of the thread pool (bytes)
 */
protected void calculateBoundaries(BigDecimal targetUtilization,
BigDecimal targetQueueSizeBytes) {
calculateOptimalCapacity(targetQueueSizeBytes);
Runnable task = creatTask();
start(task);
start(task); // warm up phase
long cputime = getCurrentThreadCPUTime();
start(task); // test interval
cputime = getCurrentThreadCPUTime() - cputime;
long waittime = (testtime * 1000000) - cputime;
calculateOptimalThreadCount(cputime, waittime, targetUtilization);
}
private void calculateOptimalCapacity(BigDecimal targetQueueSizeBytes) {
long mem = calculateMemoryUsage();
BigDecimal queueCapacity = targetQueueSizeBytes.divide(new BigDecimal(
mem), RoundingMode.HALF_UP);
System.out.println("Target queue memory usage (bytes): "
+ targetQueueSizeBytes);
System.out.println("createTask() produced "
+ creatTask().getClass().getName() + " which took " + mem
+ " bytes in a queue");
System.out.println("Formula: " + targetQueueSizeBytes + " / " + mem);
System.out.println("* Recommended queue capacity (bytes): "
+ queueCapacity);
}
/**
 * Brian Goetz' optimal thread count formula, see 'Java Concurrency in
 * Practice' (chapter 8.2).
 *
 * @param cpu
 *            cpu time consumed by the considered task
 * @param wait
 *            wait time of the considered task
 * @param targetUtilization
 *            target utilization of the system
 */
private void calculateOptimalThreadCount(long cpu, long wait,
BigDecimal targetUtilization) {
BigDecimal waitTime = new BigDecimal(wait);
BigDecimal computeTime = new BigDecimal(cpu);
BigDecimal numberOfCPU = new BigDecimal(Runtime.getRuntime()
.availableProcessors());
BigDecimal optimalthreadcount = numberOfCPU.multiply(targetUtilization)
.multiply(
new BigDecimal(1).add(waitTime.divide(computeTime,
RoundingMode.HALF_UP)));
System.out.println("Number of CPU: " + numberOfCPU);
System.out.println("Target utilization: " + targetUtilization);
System.out.println("Elapsed time (nanos): " + (testtime * 1000000));
System.out.println("Compute time (nanos): " + cpu);
System.out.println("Wait time (nanos): " + wait);
System.out.println("Formula: " + numberOfCPU + " * "
+ targetUtilization + " * (1 + " + waitTime + " / "
+ computeTime + ")");
System.out.println("* Optimal thread count: " + optimalthreadcount);
}
/**
 * Runs the {@link Runnable} over a period defined in {@link #testtime}.
 * Based on Heinz Kabbutz' ideas
 * (http://www.javaspecialists.eu/archive/Issue124.html).
 *
 * @param task
 *            the runnable under investigation
 */
public void start(Runnable task) {
long start = 0;
int runs = 0;
do {
if (++runs > 5) {
throw new IllegalStateException("Test not accurate");
}
expired = false;
start = System.currentTimeMillis();
Timer timer = new Timer();
timer.schedule(new TimerTask() {
public void run() {
expired = true;
}
}, testtime);
while (!expired) {
task.run();
}
start = System.currentTimeMillis() - start;
timer.cancel();
} while (Math.abs(start - testtime) > EPSYLON);
collectGarbage(3);
}
private void collectGarbage(int times) {
for (int i = 0; i < times; i++) {
System.gc();
try {
Thread.sleep(10);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
break;
}
}
}
/**
 * Calculates the memory usage of a single element in a work queue. Based on
 * Heinz Kabbutz' ideas
 * (http://www.javaspecialists.eu/archive/Issue029.html).
 *
 * @return memory usage of a single {@link Runnable} element in the thread
 *         pool's work queue
 */
public long calculateMemoryUsage() {
// Fill a sample queue once as a warm-up, then measure the heap delta
// between a GC'd baseline and a freshly filled queue.
BlockingQueue queue = createWorkQueue();
for (int i = 0; i < SAMPLE_QUEUE_SIZE; i++) {
queue.add(creatTask());
}
queue = null;
collectGarbage(15);
long mem0 = Runtime.getRuntime().totalMemory()
- Runtime.getRuntime().freeMemory();
queue = createWorkQueue();
for (int i = 0; i < SAMPLE_QUEUE_SIZE; i++) {
queue.add(creatTask());
}
collectGarbage(15);
long mem1 = Runtime.getRuntime().totalMemory()
- Runtime.getRuntime().freeMemory();
return (mem1 - mem0) / SAMPLE_QUEUE_SIZE;
}
/**
 * Create your runnable task here.
 *
 * @return an instance of your runnable task under investigation
 */
protected abstract Runnable creatTask();
/**
 * Return an instance of the queue used in the thread pool.
 *
 * @return queue instance
 */
protected abstract BlockingQueue createWorkQueue();
/**
 * Calculate current cpu time. Various frameworks may be used here,
 * depending on the operating system in use (e.g.
 * http://www.hyperic.com/products/sigar). The more accurate the CPU time
 * measurement, the more accurate the results for thread count boundaries.
 *
 * @return current cpu time of current thread
 */
protected abstract long getCurrentThreadCPUTime();
}
Then extend the abstract class and implement its three abstract methods. Below is an example I wrote (the task fetches data over the network), with a target CPU utilization of 1.0 (i.e. 100%) and the task queue capped at 100,000 bytes:
package pool_size_calculate;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.lang.management.ManagementFactory;
import java.math.BigDecimal;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
public class SimplePoolSizeCaculatorImpl extends PoolSizeCalculator {
@Override
protected Runnable creatTask() {
return new AsyncIOTask();
}
@Override
protected BlockingQueue createWorkQueue() {
return new LinkedBlockingQueue(1000);
}
@Override
protected long getCurrentThreadCPUTime() {
return ManagementFactory.getThreadMXBean().getCurrentThreadCpuTime();
}
public static void main(String[] args) {
PoolSizeCalculator poolSizeCalculator = new SimplePoolSizeCaculatorImpl();
poolSizeCalculator.calculateBoundaries(new BigDecimal(1.0),
new BigDecimal(100000));
}
}
/**
 * A custom asynchronous I/O task.
 *
 * @author Will
 */
class AsyncIOTask implements Runnable {
@Override
public void run() {
HttpURLConnection connection = null;
BufferedReader reader = null;
try {
String getURL = "http://baidu.com";
URL getUrl = new URL(getURL);
connection = (HttpURLConnection) getUrl.openConnection();
connection.connect();
reader = new BufferedReader(new InputStreamReader(
connection.getInputStream()));
String line;
while ((line = reader.readLine()) != null) {
// drain the response body
}
}
catch (IOException e) {
// ignore; fall through to cleanup
} finally {
if (reader != null) {
try {
reader.close();
} catch (Exception e) {
// ignore close failure
}
}
if (connection != null) {
connection.disconnect();
}
}
}
}