Optimal thread count and epoll event throughput

---------------------------------------
QPS is the number of requests fully processed per second.
RT is the response time.

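Relationship between QPS and RT
--------Single-threaded case
Suppose the server has only one thread, so all requests execute serially. The system's QPS is then easy to compute: QPS = 1000ms/RT. If one RT consists of 49ms of CPU computation and 200ms of CPU wait time, the QPS is 1000/(49+200) = 4.02.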

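---------Multi-threaded case
Now raise the server's thread count to 2. The QPS of the whole system becomes 2 * (1000/(49+200)) = 8.03. QPS appears to grow linearly with the thread count, so if QPS is too low, just add threads. That sounds reasonable and the formula agrees, but in practice it often does not hold; more on this later.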
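
The arithmetic in the two cases above can be sketched in a few lines of Java. The class and method names are illustrative and not part of the original notes, and the model is the idealized one from the text: no context-switch cost and no CPU saturation.

```java
// Sketch of the QPS arithmetic above; all names here are illustrative.
final class QpsEstimate {

    /** QPS of one thread that handles requests serially: 1000 ms / RT. */
    static double singleThreadQps(double cpuMs, double waitMs) {
        return 1000.0 / (cpuMs + waitMs);
    }

    /** Idealized QPS for several threads, assuming perfectly linear scaling. */
    static double idealQps(int threads, double cpuMs, double waitMs) {
        return threads * singleThreadQps(cpuMs, waitMs);
    }

    public static void main(String[] args) {
        System.out.printf("1 thread:  %.2f QPS%n", singleThreadQps(49, 200));
        System.out.printf("2 threads: %.2f QPS%n", idealQps(2, 49, 200));
    }
}
```

With 49ms of CPU time and 200ms of wait time this gives roughly 4.02 and 8.03.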

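----------Optimal thread count
In the single-threaded case above, the CPU wait time is 200ms. You can think of the CPU as idle during that time, so we are clearly not making full use of it. We need more threads to serve requests during the idle time. How many? The idle 200ms has to be filled with CPU time, which gives (200+49)/49 = 5.08, or about 5 threads if context switching is ignored. Taking the number of CPU cores and the target utilization into account as well gives the formula for the optimal thread count: RT/CPU Time * coreSize * cpuRatio
RT = 200 (thread wait time) + 49
CPU Time (thread CPU time) = 49

Optimal number of server threads = ((thread wait time + thread CPU time) / thread CPU time) * number of CPUs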
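
As a rough sketch, the formula can be written as a small helper. The names below are illustrative, and `cpuRatio` is assumed to be a fraction between 0 and 1.

```java
// Sketch of the optimal-thread-count formula above; names are illustrative.
final class ThreadCountEstimate {

    /**
     * ((wait + cpu) / cpu) * cores * cpuRatio, where cpuRatio is the
     * target CPU utilization as a fraction (assumed to be in 0..1).
     */
    static double optimalThreads(double waitMs, double cpuMs, int cores, double cpuRatio) {
        if (cpuMs <= 0) {
            throw new IllegalArgumentException("cpuMs must be positive");
        }
        return (waitMs + cpuMs) / cpuMs * cores * cpuRatio;
    }

    public static void main(String[] args) {
        // One core at full utilization, the case worked through in the text.
        System.out.println(optimalThreads(200, 49, 1, 1.0));
        // A hypothetical 4-core machine at 80% target utilization.
        System.out.println(optimalThreads(200, 49, 4, 0.8));
    }
}
```

The result is a fraction; in practice it would be rounded to a whole number of threads and then checked under load.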

-----------Maximum QPS
QPS = Thread num * single-thread QPS
= ((thread wait time + thread CPU time) / thread CPU time) * number of CPUs * cpuRatio * (1000ms / (thread wait time + thread CPU time))

= 1000ms / (thread CPU time) * number of CPUs * cpuRatio
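
The same derivation in compact notation, with a worked number. The symbols are shorthand introduced here, not from the original notes: W is the thread wait time and C the thread CPU time, both in milliseconds, N_cpu is the number of CPUs and U is cpuRatio.

```latex
\begin{aligned}
N_{\text{threads}} &= \frac{W + C}{C} \cdot N_{\text{cpu}} \cdot U \\
\text{QPS}_{\max} &= N_{\text{threads}} \cdot \frac{1000}{W + C}
                   = \frac{1000}{C} \cdot N_{\text{cpu}} \cdot U \\
C = 49,\ N_{\text{cpu}} = 1,\ U = 1 &\;\Rightarrow\; \text{QPS}_{\max} = \frac{1000}{49} \approx 20.4
\end{aligned}
```

The wait time cancels out: under this model the ceiling depends only on the CPU time per request, which matches 5.08 threads times 4.02 QPS per thread.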

============================================================
Digging in
=====================================================================
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
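epoll is the usual choice for high-concurrency servers on Linux. Because it is event-driven, it can be far faster than select when many connections are monitored.
A single-threaded epoll loop can reach a trigger rate of about 15,000. Once business logic is added, though, most of it talks to a database and therefore blocks, so multiple threads are needed to keep throughput up.

With epoll inside a thread pool, the measured result was 2,000 per second.
Detection of dead sockets left behind by network disconnects was also added.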

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
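How many concurrent epoll connections can a server with 4G of memory support?
2017-05-31 17:59:52 libaineu2004

Source: http://www.jb51.net/LINUXjishu/346080.html

This article works out how many concurrent epoll connections a server with 4G of memory can support. The formulas are rough estimates, for reference only.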

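The question asks for the maximum number of concurrent connections given the amount of memory, so the first step is to find where a single connection consumes memory.

The first consumer is the socket buffer. There is one for reads and one for writes, and their sizes are set in: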


/proc/sys/net/ipv4/tcp_rmem (for read)
/proc/sys/net/ipv4/tcp_wmem (for write)

The defaults are 87K and 16K, the minimums are 4K and 4K, and the maximums are 2M and 2M. In practice, at least 8K and 8K should be reserved.
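
Each of these two files holds three whitespace-separated byte counts: minimum, default and maximum. A minimal sketch of reading them follows; the class name is illustrative, and the sample values in the comment are hypothetical numbers chosen to match the sizes quoted above, not output from a real machine.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: parse the "min default max" triple found in tcp_rmem / tcp_wmem.
final class TcpMemSetting {
    final long minBytes;
    final long defaultBytes;
    final long maxBytes;

    TcpMemSetting(long minBytes, long defaultBytes, long maxBytes) {
        this.minBytes = minBytes;
        this.defaultBytes = defaultBytes;
        this.maxBytes = maxBytes;
    }

    /** Parses one line such as "4096 87380 2097152" (hypothetical values). */
    static TcpMemSetting parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected three fields: " + line);
        }
        return new TcpMemSetting(Long.parseLong(parts[0]),
                Long.parseLong(parts[1]), Long.parseLong(parts[2]));
    }

    public static void main(String[] args) throws IOException {
        // Linux only: the file does not exist on other systems.
        String line = Files.readAllLines(
                Paths.get("/proc/sys/net/ipv4/tcp_rmem")).get(0);
        TcpMemSetting rmem = parse(line);
        System.out.println("min=" + rmem.minBytes + " default="
                + rmem.defaultBytes + " max=" + rmem.maxBytes);
    }
}
```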

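Next is the application-level I/O buffer.

For example, if you listen for recv events, you need memory available when an event arrives (usually it is allocated when the socket is established and released only on disconnect).
You control this memory yourself when writing the socket program. The minimum is again 4K and 4K; in practice, at least 8K and 8K.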

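Now fix an optimization plan and a usage scenario. First assume all 4G of memory is free (the system and other processes need memory too….

If every network packet can be kept under 4K, and no connection is congested, or the total backlog during congestion stays under 4K:
One connection consumes 4+4+4+4=16K of memory
4G/16K = 262,000 concurrent connections

If every network packet can be kept under 8K, and no connection is congested, or the total backlog during congestion stays under 8K:
One socket occupies between 24K and 32K of memory; to be conservative, use 32K
4G/32K = 131,000 concurrent connections. As a pure network-layer memory cost in a production environment, this is a usable reference figure.

With the default configuration, if every connection is severely congested, and ignoring the application-level send queue,
the cost per connection is 2M+2M+8K+8K ~= 4M
4G/4M = 1024 concurrent connections ( …
If the send queues are congested as well, work that case out yourself.

If you optimize purely for a concurrency benchmark, with no resident application buffers, very low throughput per socket, and a smooth load, set the socket buffer size to the system minimum.
That gives
4G/8K = 524,000 concurrent connections, which should be the upper limit.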
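
The four scenarios above are the same division with different per-connection costs. A small sketch, with illustrative names and the same simplifying assumption as the text (all 4G is available to connections):

```java
// Sketch of the memory-based estimates above; names are illustrative.
final class ConnectionEstimate {
    static final long KIB = 1024L;
    static final long MIB = 1024L * KIB;
    static final long GIB = 1024L * MIB;

    /** Upper bound on connections if memory is the only limit. */
    static long maxConnections(long totalBytes, long bytesPerConnection) {
        if (bytesPerConnection <= 0) {
            throw new IllegalArgumentException("bytesPerConnection must be positive");
        }
        return totalBytes / bytesPerConnection;
    }

    public static void main(String[] args) {
        long total = 4 * GIB; // assumes the whole 4G is free, as in the text
        System.out.println("16K each: " + maxConnections(total, 16 * KIB));
        System.out.println("32K each: " + maxConnections(total, 32 * KIB));
        System.out.println("4M each:  " + maxConnections(total, 4 * MIB));
        System.out.println("8K each:  " + maxConnections(total, 8 * KIB));
    }
}
```

The exact quotients are 262144, 131072, 1024 and 524288, which the text rounds to 262,000, 131,000, 1024 and 524,000.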
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD

============================================================
Digging in
=====================================================================

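In Linux network programming, select was used for event notification for a long time. Newer Linux kernels provide a mechanism that replaces it: epoll.
Compared with select, epoll's biggest advantage is that its efficiency does not degrade as the number of monitored fds grows. The kernel's select implementation polls the fds, so the more fds there are, the longer each call takes. In addition, the header linux/posix_types.h contains this declaration:
#define __FD_SETSIZE 1024
This means select can monitor at most 1024 fds at a time. The limit can be raised by editing the header and recompiling the kernel, but that does not address the underlying problem.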

---------------------------------------------
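The calling pattern shows where epoll is better than select/poll. With the latter, every call has to pass all the sockets you want to monitor to the select/poll system call, which means copying the user-space socket list into kernel space. With tens of thousands of handles, each call copies tens or hundreds of KB into the kernel, which is very inefficient. Calling epoll_wait corresponds to calling select/poll, but no socket handles are passed to the kernel at that point, because the kernel already received the list of handles to monitor through epoll_ctl.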
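
The "register once, wait many times" pattern can be sketched with Java NIO, whose default Selector on Linux is backed by epoll. This is only an illustration of the calling pattern, not the C API discussed above: the class name is made up, and the local client exists solely so that the wait has an event to report.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Sketch: one registration (like epoll_ctl), then a wait with no fd list
// (like epoll_wait). Returns how many keys were ready.
final class SelectorSketch {

    static int acceptOne() {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 lets the OS pick any free port on the loopback address.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            // The interest set is handed over once, not on every wait.
            server.register(selector, SelectionKey.OP_ACCEPT);

            // Hypothetical local client, only here to trigger an event.
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                int ready = selector.select(5000); // wait up to 5 seconds
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {
                        SocketChannel accepted = server.accept();
                        if (accepted != null) {
                            accepted.close();
                        }
                    }
                }
                selector.selectedKeys().clear();
                return ready;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ready keys: " + acceptOne());
    }
}
```

A real server would call `select()` in a loop and register each accepted channel for reads; the sketch stops after the first event to stay short.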

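Conclusion

Clearly the network event rate is not the bottleneck. The issue is still CPU utilization.

How to obtain a program's CPU wait time and compute time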

Finally, a "Dark Magic" estimation method (so named because I have not yet worked out the principle behind it). Running the class listed further below on a sample task printed:

1. Target queue memory usage (bytes): 100000 (the task queue must not exceed 100,000 bytes in total)
2. createTask() produced pool_size_calculate.AsyncIOTask which took 40 bytes in a queue
(AsyncIOTask is the asynchronous task)
3. Formula: 100000 / 40
* Recommended queue capacity (bytes): 2500
Number of CPU: 4
Target utilization: 1
Elapsed time (total run time) (nanos): 3000000000
Compute time (nanos): 47181000
Wait time (nanos): 2952819000
Formula: 4 * 1 * (1 + 2952819000 / 47181000)
* Optimal thread count: 256

The recommended task queue size is 2500 and the thread count is 256, which is somewhat unexpected. I can construct a thread pool as follows:

ThreadPoolExecutor pool =
    new ThreadPoolExecutor(256, 256, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(2500));
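
The figure 256 can be reproduced from the printed numbers. The sketch below repeats the BigDecimal arithmetic that the calculator class performs, under illustrative names. Because `divide` is called without an explicit scale, the wait/compute ratio (about 62.59) is rounded to the whole number 63 before the multiplication, which is why the result is 4 * (1 + 63) = 256 rather than roughly 254.3.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Sketch: reproduce "Optimal thread count: 256" from the printed inputs.
final class GoetzFormulaCheck {

    /** cpus * utilization * (1 + wait / compute), ratio rounded HALF_UP at scale 0. */
    static BigDecimal optimalThreads(int cpus, BigDecimal utilization,
                                     long waitNanos, long computeNanos) {
        // Both operands have scale 0, so the quotient is rounded to an integer.
        BigDecimal ratio = new BigDecimal(waitNanos)
                .divide(new BigDecimal(computeNanos), RoundingMode.HALF_UP);
        return new BigDecimal(cpus)
                .multiply(utilization)
                .multiply(BigDecimal.ONE.add(ratio));
    }

    public static void main(String[] args) {
        System.out.println(optimalThreads(4, BigDecimal.ONE, 2952819000L, 47181000L));
    }
}
```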

========================================
======================================================
================================================================

The class behind the "Dark Magic" estimate (I still have not worked out the principle behind it) is:

package pool_size_calculate;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.BlockingQueue;

/**
 * A class that calculates the optimal thread pool boundaries. It takes the
 * desired target utilization and the desired work queue memory consumption as
 * input and returns thread count and work queue capacity.
 *
 * @author Niklas Schlimm
 */

public abstract class PoolSizeCalculator {

   /**
    * The sample queue size to calculate the size of a single {@link Runnable}
    * element.
    */
   private final int SAMPLE_QUEUE_SIZE = 1000;

   /**
    * Accuracy of test run. It must finish within 20ms of the testTime
    * otherwise we retry the test. This could be configurable.
    */
   private final int EPSYLON = 20;

   /**
    * Control variable for the CPU time investigation.
    */
   private volatile boolean expired;

   /**
    * Time (millis) of the test run in the CPU time calculation.
    */
   private final long testtime = 3000;

   /**
    * Calculates the boundaries of a thread pool for a given {@link Runnable}.
    *
    * @param targetUtilization
    *            the desired utilization of the CPUs (0 <= targetUtilization <=
    *            1)
    * @param targetQueueSizeBytes
    *            the desired maximum work queue size of the thread pool (bytes)
    */
   protected void calculateBoundaries(BigDecimal targetUtilization,
           BigDecimal targetQueueSizeBytes) {
       calculateOptimalCapacity(targetQueueSizeBytes);
       Runnable task = creatTask();
       start(task);
       start(task); // warm up phase
       long cputime = getCurrentThreadCPUTime();
       start(task); // test interval
       cputime = getCurrentThreadCPUTime() - cputime;
       long waittime = (testtime * 1000000) - cputime;
       calculateOptimalThreadCount(cputime, waittime, targetUtilization);
   }

   private void calculateOptimalCapacity(BigDecimal targetQueueSizeBytes) {
       long mem = calculateMemoryUsage();
       BigDecimal queueCapacity = targetQueueSizeBytes.divide(new BigDecimal(
               mem), RoundingMode.HALF_UP);
       System.out.println("Target queue memory usage (bytes): "
               + targetQueueSizeBytes);
       System.out.println("createTask() produced "
               + creatTask().getClass().getName() + " which took " + mem
               + " bytes in a queue");
       System.out.println("Formula: " + targetQueueSizeBytes + " / " + mem);
       System.out.println("* Recommended queue capacity (bytes): "
               + queueCapacity);
   }

   /**
    * Brian Goetz' optimal thread count formula, see 'Java Concurrency in
    * Practice' (chapter 8.2)
    *
    * @param cpu
    *            cpu time consumed by considered task
    * @param wait
    *            wait time of considered task
    * @param targetUtilization
    *            target utilization of the system
    */
   private void calculateOptimalThreadCount(long cpu, long wait,
           BigDecimal targetUtilization) {
       BigDecimal waitTime = new BigDecimal(wait);
       BigDecimal computeTime = new BigDecimal(cpu);
       BigDecimal numberOfCPU = new BigDecimal(Runtime.getRuntime()
               .availableProcessors());
       BigDecimal optimalthreadcount = numberOfCPU.multiply(targetUtilization)
               .multiply(
                       new BigDecimal(1).add(waitTime.divide(computeTime,
                               RoundingMode.HALF_UP)));
       System.out.println("Number of CPU: " + numberOfCPU);
       System.out.println("Target utilization: " + targetUtilization);
       System.out.println("Elapsed time (nanos): " + (testtime * 1000000));
       System.out.println("Compute time (nanos): " + cpu);
       System.out.println("Wait time (nanos): " + wait);
       System.out.println("Formula: " + numberOfCPU + " * "
               + targetUtilization + " * (1 + " + waitTime + " / "
               + computeTime + ")");
       System.out.println("* Optimal thread count: " + optimalthreadcount);
   }

   /**
    * Runs the {@link Runnable} over a period defined in {@link #testtime}.
    * Based on Heinz Kabutz' ideas
    * (http://www.javaspecialists.eu/archive/Issue124.html).
    *
    * @param task
    *            the runnable under investigation
    */
   public void start(Runnable task) {
       long start = 0;
       int runs = 0;
       do {
           // Give up after five attempts that miss the time window.
           if (++runs > 5) {
               throw new IllegalStateException("Test not accurate");
           }
           expired = false;
           start = System.currentTimeMillis();
           Timer timer = new Timer();
           timer.schedule(new TimerTask() {
               public void run() {
                   expired = true;
               }
           }, testtime);
           // Run the task repeatedly until the timer fires.
           while (!expired) {
               task.run();
           }
           start = System.currentTimeMillis() - start;
           timer.cancel();
       } while (Math.abs(start - testtime) > EPSYLON);
       collectGarbage(3);
   }

   private void collectGarbage(int times) {
       for (int i = 0; i < times; i++) {
           System.gc();
           try {
               Thread.sleep(10);
           } catch (InterruptedException e) {
               // Restore the interrupt flag and stop collecting.
               Thread.currentThread().interrupt();
               break;
           }
       }
   }

   /**
    * Calculates the memory usage of a single element in a work queue. Based on
    * Heinz Kabutz' ideas
    * (http://www.javaspecialists.eu/archive/Issue029.html).
    *
    * @return memory usage of a single {@link Runnable} element in the thread
    *         pools work queue
    */
   public long calculateMemoryUsage() {
       // First pass: fill a queue and touch the memory counters once.
       BlockingQueue queue = createWorkQueue();
       for (int i = 0; i < SAMPLE_QUEUE_SIZE; i++) {
           queue.add(creatTask());
       }
       long mem0 = Runtime.getRuntime().totalMemory()
               - Runtime.getRuntime().freeMemory();
       long mem1 = Runtime.getRuntime().totalMemory()
               - Runtime.getRuntime().freeMemory();
       queue = null;
       collectGarbage(15);
       // Baseline: used heap with no sample queue alive.
       mem0 = Runtime.getRuntime().totalMemory()
               - Runtime.getRuntime().freeMemory();
       queue = createWorkQueue();
       for (int i = 0; i < SAMPLE_QUEUE_SIZE; i++) {
           queue.add(creatTask());
       }
       collectGarbage(15);
       // Used heap with SAMPLE_QUEUE_SIZE tasks held in the queue.
       mem1 = Runtime.getRuntime().totalMemory()
               - Runtime.getRuntime().freeMemory();
       return (mem1 - mem0) / SAMPLE_QUEUE_SIZE;
   }

   /**
    * Create your runnable task here.
    *
    * @return an instance of your runnable task under investigation
    */
   protected abstract Runnable creatTask();

   /**
    * Return an instance of the queue used in the thread pool.
    *
    * @return queue instance
    */
   protected abstract BlockingQueue createWorkQueue();

   /**
    * Calculate current cpu time. Various frameworks may be used here,
    * depending on the operating system in use. (e.g.
    * http://www.hyperic.com/products/sigar). The more accurate the CPU time
    * measurement, the more accurate the results for thread count boundaries.
    *
    * @return current cpu time of current thread
    */
   protected abstract long getCurrentThreadCPUTime();

}

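Then extend this abstract class and implement its three abstract methods. Below is an example I wrote (the task fetches data over the network). I set the target CPU utilization to 1.0 (that is, 100%) and a total task queue size of at most 100,000 bytes: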

package pool_size_calculate;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.lang.management.ManagementFactory;
import java.math.BigDecimal;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SimplePoolSizeCaculatorImpl extends PoolSizeCalculator {

   @Override
   protected Runnable creatTask() {
       return new AsyncIOTask();
   }

   @Override
   protected BlockingQueue createWorkQueue() {
       return new LinkedBlockingQueue(1000);
   }

   @Override
   protected long getCurrentThreadCPUTime() {
       return ManagementFactory.getThreadMXBean().getCurrentThreadCpuTime();
   }

   public static void main(String[] args) {
       PoolSizeCalculator poolSizeCalculator = new SimplePoolSizeCaculatorImpl();
       poolSizeCalculator.calculateBoundaries(new BigDecimal(1.0),
               new BigDecimal(100000));
   }

}

/**
 * Custom asynchronous IO task
 *
 * @author Will
 */
class AsyncIOTask implements Runnable {

   @Override
   public void run() {
       HttpURLConnection connection = null;
       BufferedReader reader = null;
       try {
           String getURL = "http://baidu.com";
           URL getUrl = new URL(getURL);

           connection = (HttpURLConnection) getUrl.openConnection();
           connection.connect();
           reader = new BufferedReader(new InputStreamReader(
                   connection.getInputStream()));

           String line;
           while ((line = reader.readLine()) != null) {
               // empty loop: read and discard the response body
           }
       } catch (IOException e) {
           // ignored: only the time spent on the request matters here
       } finally {
           if (reader != null) {
               try {
                   reader.close();
               } catch (Exception e) {
                   // ignored on close
               }
           }
           // connection is still null if openConnection() failed
           if (connection != null) {
               connection.disconnect();
           }
       }
   }
}
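
The calculator above derives wait time as elapsed time minus the CPU time reported for the current thread. For a single run of a task, the same measurement can be sketched as follows. The class name is illustrative, and the sketch assumes the JVM supports thread CPU time measurement, which `ThreadMXBean` does not guarantee on every platform.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Sketch: split one run of a task into CPU time and wait time (nanoseconds).
final class CpuWaitProbe {

    /** Returns {cpuNanos, waitNanos} for one run on the calling thread. */
    static long[] measure(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = bean.getCurrentThreadCpuTime();
        task.run();
        long cpu = bean.getCurrentThreadCpuTime() - cpuStart;
        long wall = System.nanoTime() - wallStart;
        // Wait time is whatever part of the elapsed time was not spent on the CPU.
        return new long[] {cpu, Math.max(0L, wall - cpu)};
    }

    public static void main(String[] args) {
        long[] result = measure(() -> {
            try {
                Thread.sleep(50); // stands in for blocking I/O
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("cpu ns: " + result[0] + ", wait ns: " + result[1]);
    }
}
```

A single run is noisy; the calculator above repeats the task for three seconds for that reason.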
