JVM tuning experience at a large cross-border e-commerce company



Background:
Our large cross-border e-commerce business is growing very fast, and online machines are expanded frequently. However, there has never been a unified standard for how online machines run, especially for JVM memory settings; each application's owner decided on their own. After the 618 promotion, I discussed this with our operations colleagues. We wanted to standardize the JVM parameters of online servers and hand them to every application as one unified template, to improve the stability of online servers and reduce the time everyone spends tuning JVM parameters.
Drawing on the experience colleagues brought from their earlier work at Taobao and Tmall, and after group discussion, we settled on a recommended default JVM template based on JDK version and online machine configuration:

The final recommended JVM template, given as JDK version / machine configuration (remarks), followed by the recommended JVM parameters. The machine configuration is vCPUs and GB of RAM, e.g. 6V8G = 6 vCPUs, 8 GB.

jdk1.7 / 6V8G:
-server -Xms4g -Xmx4g -Xmn2g -Xss768k -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=68 -verbose:gc -XX:+PrintGCDetails -Xloggc:{CATALINA_BASE}/logs/gc.log -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath={CATALINA_BASE}/logs

jdk1.7 / 8V8G (frontend):
-server -Xms4g -Xmx4g -Xmn2g -Xss768k -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=68 -verbose:gc -XX:+PrintGCDetails -Xloggc:{CATALINA_BASE}/logs/gc.log -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath={CATALINA_BASE}/logs

jdk1.7 / 4V8G:
-server -Xms4g -Xmx4g -Xmn2g -Xss768k -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=68 -verbose:gc -XX:+PrintGCDetails -Xloggc:{CATALINA_BASE}/logs/gc.log -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath={CATALINA_BASE}/logs

jdk1.7 / 6V8G (frontend):
-server -Xms4g -Xmx4g -XX:MaxPermSize=512m \
-verbose:gc -XX:+PrintGCDetails -Xloggc:{CATALINA_BASE}/logs/gc.log -XX:+PrintGCTimeStamps \
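Because one template string is copied into every application's startup script, a single typo propagates everywhere: a stray space as in "-XX:+ UseCMSInitiatingOccupancyOnly" or a missing colon as in "-Xloggc{CATALINA_BASE}/logs/gc.log" would most likely make the JVM refuse to start. Below is a minimal, hypothetical lint sketch (the class name and its three checks are our own, not an existing ops tool) that scans a template string for exactly these kinds of mistakes. It is not a full parser of JVM options.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-deployment lint for a JVM flag template string.
// It only recognizes a few malformed patterns; it does not validate flag names.
public final class JvmTemplateLint {

    public static List<String> problems(String template) {
        List<String> found = new ArrayList<>();
        for (String token : template.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty template
            }
            // "-XX:+" or "-XX:-" alone: the flag name was split off by a space.
            if (token.equals("-XX:+") || token.equals("-XX:-")) {
                found.add("boolean flag without a name: '" + token + "'");
            }
            // A bare word would most likely be taken as the main class name.
            if (!token.startsWith("-")) {
                found.add("token does not start with '-': '" + token + "'");
            }
            // -Xloggc needs ':' before the log file path.
            if (token.startsWith("-Xloggc") && !token.startsWith("-Xloggc:")) {
                found.add("-Xloggc without ':': '" + token + "'");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String row = "-server -Xms4g -Xmx4g -XX:+ UseCMSInitiatingOccupancyOnly "
                + "-Xloggc{CATALINA_BASE}/logs/gc.log";
        for (String problem : problems(row)) {
            System.out.println(problem);
        }
    }
}
```

Running such a check in the deployment pipeline, before the string reaches CATALINA_OPTS or a similar variable, is cheaper than discovering the typo when a Tomcat instance fails to start.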


Recommended backend configuration from a large Internet company (one of BAT):

Configuration instructions:
1. Heap settings
o -Xms: initial heap size
o -Xmx: maximum heap size
o -XX:NewSize=n: sets the young generation size
o -XX:NewRatio=n: sets the ratio between the young and old generations. For example, 3 means young:old = 1:3, so the young generation takes 1/4 of the young and old generations combined.
o -XX:SurvivorRatio=n: sets the ratio of Eden to one Survivor space in the young generation. Note that there are two Survivor spaces: for example, 3 means Eden:Survivor = 3:2 (both Survivor spaces counted), so one Survivor space takes 1/5 of the whole young generation.
o -XX:MaxPermSize=n: sets the maximum size of the permanent generation
2. Collector settings
o -XX:+UseSerialGC: use the serial collector
o -XX:+UseParallelGC: use the parallel collector
o -XX:+UseParallelOldGC: use the parallel old-generation collector
o -XX:+UseConcMarkSweepGC: use the concurrent (CMS) collector
3. Garbage collection statistics
o -XX:+PrintGC
o -XX:+PrintGCDetails
o -XX:+PrintGCTimeStamps
o -Xloggc:filename
4. Parallel collector settings
o -XX:ParallelGCThreads=n: sets the number of threads the parallel collector uses for collection
o -XX:MaxGCPauseMillis=n: sets the target maximum pause time (in milliseconds) for parallel collection
o -XX:GCTimeRatio=n: sets the target share of total run time spent in garbage collection, computed as 1/(1+n)
5. Concurrent collector settings
o -XX:+CMSIncrementalMode: enables incremental mode, suited to single-CPU machines
o -XX:ParallelGCThreads=n: sets the number of threads used when the concurrent collector collects the young generation in parallel
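The ratio flags above are easy to misread, so here is the arithmetic they imply, as a small sketch. It assumes sizes in MB and ignores the alignment and rounding a real JVM applies, so actual generation sizes can differ slightly; the class and method names are illustrative only.

```java
// Idealized arithmetic for -XX:NewRatio, -XX:SurvivorRatio and -XX:GCTimeRatio.
// Real JVMs round sizes to alignment boundaries, so actual values can differ slightly.
public final class HeapSizing {

    // -XX:NewRatio=n means young : old = 1 : n, so young = heap / (n + 1).
    static long youngFromNewRatio(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    // -XX:SurvivorRatio=n means Eden : S0 : S1 = n : 1 : 1, so one survivor = young / (n + 2).
    static long survivorFromRatio(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    // -XX:GCTimeRatio=n sets a throughput goal: at most 1 / (1 + n) of run time spent in GC.
    static double gcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        long heap = 4096; // e.g. -Xmx4g
        long young = youngFromNewRatio(heap, 3);
        System.out.println("NewRatio=3: young=" + young + "m, old=" + (heap - young) + "m");
        System.out.println("SurvivorRatio=3: one survivor=" + survivorFromRatio(young, 3) + "m");
        System.out.printf("GCTimeRatio=99: GC time goal=%.2f%%%n", 100 * gcTimeFraction(99));
    }
}
```

With NewRatio=3 a 4096m heap splits into a 1024m young and 3072m old generation, matching the 1/4 share described above; SurvivorRatio=3 gives each Survivor space 1/5 of the young generation.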

Parameter explanation:

-Xms3072m -Xmx3072m
Sets the JVM heap: -Xms is the initial (minimum) heap size and -Xmx the maximum. Setting both to the same value keeps the heap from being resized at runtime.
-Xmn1024m sets the young generation size to 1024m.
The entire JVM memory size = young generation size + old generation size + permanent generation size (perm). Because -Xmx fixes the heap (young + old) and the permanent generation is sized separately (commonly a fixed 64m), increasing the young generation reduces the old generation. This value has a large impact on system performance; Sun officially recommends setting it to 3/8 of the entire heap.

-Xss768k
Sets the stack size of each thread. Since JDK 5.0 each thread's stack defaults to 1M; before that it was 256K. Size it to what the application's threads actually need: with the same physical memory, a smaller value allows more threads to be created. However, the operating system still limits the number of threads per process, so threads cannot be created indefinitely; the rule of thumb is around 3000-5000.

-XX:PermSize=512m -XX:MaxPermSize=512m
-XX:PermSize sets the initial size and -XX:MaxPermSize the maximum size of the permanent generation, the non-heap area that holds class metadata; setting both to the same value avoids resizing it. The defaults often quoted for these two flags (1/64 and 1/4 of physical memory) are actually HotSpot's defaults for the initial and maximum heap size (-Xms/-Xmx) on server-class machines; the permanent generation defaults are much smaller fixed values.
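To see how these heap and non-heap settings actually land on a running instance, the JVM exposes each memory pool through java.lang.management. The sketch below lists every pool with its type, usage, committed size and maximum. Pool names depend on the JVM version and collector: with ParNew + CMS on JDK 1.7 they are typically along the lines of "Par Eden Space", "CMS Old Gen" and "CMS Perm Gen", while on JDK 8 and later the permanent generation is replaced by "Metaspace", so -XX:PermSize/-XX:MaxPermSize no longer apply there.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Lists the JVM's memory pools so heap and non-heap sizing can be checked on a live instance.
public final class MemoryPools {

    // Number of pools of the given type (HEAP or NON_HEAP).
    static int countPools(MemoryType type) {
        int n = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            long max = usage.getMax(); // -1 when the pool has no defined maximum
            System.out.printf("%-30s %-9s used=%dM committed=%dM max=%s%n",
                    pool.getName(), pool.getType(),
                    usage.getUsed() >> 20, usage.getCommitted() >> 20,
                    max < 0 ? "undefined" : (max >> 20) + "M");
        }
    }
}
```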

-XX:+UseConcMarkSweepGC
Enables the CMS collector, also called the short-pause concurrent collector. It collects the old generation, running most of its work on multiple GC threads concurrently with the application in order to minimize GC pauses. The young generation is collected with a parallel copying algorithm similar to the Parallel collector's. CMS suits applications that cannot tolerate long pauses and need fast responses.

-XX:+UseParNewGC uses multi-threaded parallel collection for the young generation, so the collection is faster;

-XX:+CMSClassUnloadingEnabled
If you enable CMSClassUnloadingEnabled, garbage collection also cleans up the permanent generation, removing classes that are no longer used. This parameter only takes effect when UseConcMarkSweepGC is also enabled.

-XX:+DisableExplicitGC turns System.gc() calls into no-ops, so that a programmer's mistaken call to System.gc() cannot trigger a full GC and hurt performance;
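One way to confirm the flag is in effect is to compare the JVM's cumulative GC counts before and after calling System.gc(). The sketch below does that with GarbageCollectorMXBean; run it once with and once without -XX:+DisableExplicitGC. With the flag, System.gc() is ignored and the delta should be 0; without it, HotSpot usually performs a collection and the delta is at least 1, though the Java specification does not guarantee that System.gc() collects anything.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Compares total GC counts before and after System.gc().
// Run with and without -XX:+DisableExplicitGC to see the difference.
public final class ExplicitGcProbe {

    // Sum of collection counts over all collectors (young and old).
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Number of collections observed across one explicit System.gc() call.
    static long explicitGcDelta() {
        long before = totalCollections();
        System.gc();
        return totalCollections() - before;
    }

    public static void main(String[] args) {
        System.out.println("collections during System.gc(): " + explicitGcDelta());
    }
}
```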

-XX:+UseCMSInitiatingOccupancyOnly
Tells the JVM not to decide when to start a CMS cycle from statistics gathered at runtime. With this flag on, the JVM uses the value of CMSInitiatingOccupancyFraction for every CMS collection, not just the first. Keep in mind, however, that in most cases the JVM makes better garbage collection decisions than we do, so use this flag only when there is a good reason (such as testing) and you understand the lifecycle of the objects your application creates well.

-XX:CMSInitiatingOccupancyFraction=68
Starts a CMS collection when the tenured (old) generation is 68% full. (68% is the default commonly cited from older JDKs; JDK 6 and later compute a higher default, typically about 92%.) If your old generation grows slowly and you want fewer CMS cycles, you can raise this value appropriately;
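A quick back-of-the-envelope sketch of what the fraction means for the template above, assuming -Xmx4g -Xmn2g so the old generation is 4096m - 2048m = 2048m. The headroom above the trigger is the space that must absorb objects promoted while the concurrent cycle is still running; if it fills up before CMS finishes, the result is a concurrent mode failure and a stop-the-world full GC. The class and method names are illustrative.

```java
// Back-of-the-envelope numbers for -XX:CMSInitiatingOccupancyFraction,
// assuming -Xmx4g -Xmn2g (old generation = 4096m - 2048m = 2048m).
public final class CmsTrigger {

    // Old-generation occupancy (MB) at which a CMS cycle starts.
    static double triggerMb(long oldGenMb, int fraction) {
        return oldGenMb * fraction / 100.0;
    }

    // Space left for promotions while the concurrent cycle runs; running out of it
    // before CMS finishes causes a concurrent mode failure and a full GC.
    static double headroomMb(long oldGenMb, int fraction) {
        return oldGenMb - triggerMb(oldGenMb, fraction);
    }

    public static void main(String[] args) {
        long oldGen = 4096 - 2048;
        for (int fraction : new int[] {68, 80, 92}) {
            System.out.printf("fraction=%d%%: CMS starts at %.0fm, headroom %.0fm%n",
                    fraction, triggerMb(oldGen, fraction), headroomMb(oldGen, fraction));
        }
    }
}
```

Raising the fraction means fewer CMS cycles but less headroom, which is the trade-off behind the advice above to raise it only when the old generation grows slowly.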



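-XX:HeapDumpPath
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Xloggc:/usr/aaa/dump/heap_trace.txt
These parameters control GC logging and heap dump output: -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps make the GC log detailed and timestamped, -Xloggc writes the GC log to the given file, and -XX:HeapDumpPath sets where heap dumps are written.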
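To check which of these diagnostic flags (and -XX:+HeapDumpOnOutOfMemoryError, described next) are actually in effect on a running JVM, HotSpot exposes its flags through com.sun.management.HotSpotDiagnosticMXBean. The sketch below reads a few of them; getVMOption throws IllegalArgumentException for a flag the running JVM does not know, for example UseConcMarkSweepGC on JDK 14 and later, where CMS was removed. This is HotSpot-specific and will not work on other JVMs such as IBM's.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Reads the effective value of HotSpot flags from inside a running JVM.
public final class FlagCheck {

    // Returns the flag's current value, or null if this JVM has no such flag.
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = hotspot.getVMOption(name);
            return option.getValue();
        } catch (IllegalArgumentException e) {
            return null; // unknown flag on this JVM version
        }
    }

    public static void main(String[] args) {
        String[] flags = {"HeapDumpOnOutOfMemoryError", "HeapDumpPath",
                          "DisableExplicitGC", "UseConcMarkSweepGC"};
        for (String flag : flags) {
            String value = flagValue(flag);
            System.out.println(flag + " = " + (value == null ? "<not recognized>" : value));
        }
    }
}
```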
-XX:+HeapDumpOnOutOfMemoryError
Writes a heap dump when an OutOfMemoryError is thrown.

You may have noticed that CMS is the recommended garbage collector;
CMS aims for the shortest possible collection pauses, which effectively reduces server pause time.
CMS GC threads use relatively more CPU, but CMS still performs well on multi-core servers and is currently deployed at the major domestic e-commerce sites, so it is highly recommended here!

CMS concept:
CMS is the short-pause concurrent collector for the old generation described above. It uses a variety of techniques to minimize GC pause time and therefore pauses in the user program. The shorter pauses come at the cost of CPU throughput: it is a trade-off between pause time and throughput, which can be loosely understood as trading "space (performance)" for time.

Adjustment rhythm:
To avoid affecting online applications, the rollout was done in three steps:
Step 1: pilot the new parameters on a small number of machines, compare them with unadjusted machines, and observe the results;
Step 2: adjust the parameters of some applications, run stress tests, and observe the effect under high-concurrency load;
Step 3: adjust the JVM parameters of some core applications and use the 818 promotion as a real-world test.
The 818 promotion is now over, so here is a summary.

1: Long-term performance
First change: the number of full GCs dropped by more than half. For the mobile project, there were basically 1-2 full GCs per day before the adjustment and roughly one every 2-3 days afterwards:


 
Online (another project): the full GC frequency is clearly much lower;


 


Second change: full GC duration dropped.
A full GC used to take nearly 500 ms; now it takes less than 100 ms.
This also confirms that the biggest benefit of CMS is shorter full GC pauses.

2: Performance during stress testing and the big promotion
Full GC time is greatly shortened overall, young GC takes longer, and the number of young GCs changes little.
Data source: the test team's stress test summary.

Metric | xxxx-online4.server.org | xxxx-online1.server.org | xxxx-online34.server.org | Description
Garbage collector | CMS | CMS | default garbage collector |
Full GC count | 1 | 1 | 1 |
Full GC total time | 343 | 250 | 1219 |
Default GC / CMS (full GC time) | 3.55 | 4.88 | - | CMS full GC time is significantly lower than the default garbage collector's.
Full GC time point | 2:48:36 | 3:14:36 | 5:30:36 |
CPU usage during full GC | 40% | 10% | 16% |
Load average at full GC | 1.19 | 0.49 | 1.21 |
Young GC total count | 1094 | 1098 | 1078 |
Young GC total time | 44093 | 44632 | 30387 |
Young GC average time | 40.30 | 40.65 | 28.19 |
Young GC max time | 1332 | 1268 | 928 |
CMS / default GC (young GC total time) | 1.45 | 1.47 | - | CMS young GC time is longer than the default garbage collector's.
CMS / default GC (young GC average time) | 1.43 | 1.44 | - | CMS young GC time is longer than the default garbage collector's.
CMS / default GC (young GC max time) | 1.44 | 1.37 | - | CMS worst-case young GC is worse than the default garbage collector's.
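The derived rows in this table (averages and CMS/default ratios) follow directly from the raw counts and times. The sketch below recomputes them from the table's own numbers, rounded to two decimals, so the summary can be re-checked if new stress-test data arrives. Times are in whatever unit the test report used; the helper names are illustrative.

```java
// Recomputes the derived rows of the stress-test table from its raw numbers.
public final class StressRatios {

    static double round2(double x) {
        return Math.round(x * 100) / 100.0;
    }

    // Ratio a / b, rounded to two decimals as in the table.
    static double ratio(double a, double b) {
        return round2(a / b);
    }

    // Average time per collection, rounded to two decimals.
    static double average(double totalTime, double count) {
        return round2(totalTime / count);
    }

    public static void main(String[] args) {
        // Index 0: online4 (CMS), 1: online1 (CMS), 2: online34 (default collector).
        double[] fullGcTotal = {343, 250, 1219};
        double[] youngCount = {1094, 1098, 1078};
        double[] youngTotal = {44093, 44632, 30387};
        double[] youngMax = {1332, 1268, 928};
        int def = 2;

        for (int cms = 0; cms < 2; cms++) {
            System.out.println("host " + cms
                    + ": default/CMS full GC=" + ratio(fullGcTotal[def], fullGcTotal[cms])
                    + ", CMS/default young total=" + ratio(youngTotal[cms], youngTotal[def])
                    + ", young avg=" + average(youngTotal[cms], youngCount[cms])
                    + ", CMS/default young max=" + ratio(youngMax[cms], youngMax[def]));
        }
    }
}
```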

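3: Explanation of the full GC count reported by Sentinel (哨兵)
On Sentinel, we can safely say:
1. Full GC == Major GC, meaning a stop-the-world GC of the old generation / permanent generation.
2. Full GC count = the number of stop-the-world pauses during old-generation GC.
3. Full GC time = the total stop-the-world time during old-generation GC.
4. CMS is not the same as Full GC. CMS runs in several phases; only the stop-the-world phases count toward the Full GC count and time, while GC work that runs concurrently with business threads is not counted as Full GC.

Because the Full GC count is the number of stop-the-world pauses, one CMS cycle adds at least 2 to it: CMS initial mark and remark both stop the world and are counted as two. CMS can also fail and trigger one more full GC.
If a concurrent mode failure occurs during a CMS concurrent GC, the JVM next performs a mark-sweep-compact full GC, which is completely stop-the-world.

It is precisely this behavior that makes each CMS concurrent GC cycle update the full GC counter twice in total, once for initial mark and once for final remark; if a concurrent mode failure occurs, the full GC that follows counts once on its own.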
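The counting rule above can be modeled in a few lines: treat a CMS cycle as a sequence of phases, mark the stop-the-world ones, and count them. This is a toy model of the rule as stated here, not code that reads Sentinel or the JVM's real counters; the phase names follow the CMS phases, and FULL_GC_AFTER_FAILURE is our label for the mark-sweep-compact full GC that follows a concurrent mode failure.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the rule above: the "full GC" counter increments once per
// stop-the-world (STW) pause during old-generation GC.
public final class CmsFullGcCounter {

    enum Phase {
        INITIAL_MARK(true),
        CONCURRENT_MARK(false),
        CONCURRENT_PRECLEAN(false),
        REMARK(true),
        CONCURRENT_SWEEP(false),
        CONCURRENT_RESET(false),
        FULL_GC_AFTER_FAILURE(true); // mark-sweep-compact after a concurrent mode failure

        final boolean stopTheWorld;

        Phase(boolean stopTheWorld) {
            this.stopTheWorld = stopTheWorld;
        }
    }

    // Number of STW pauses, i.e. "full GCs" in this model, in a phase sequence.
    static int fullGcCount(List<Phase> phases) {
        int count = 0;
        for (Phase p : phases) {
            if (p.stopTheWorld) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Phase> normal = Arrays.asList(Phase.INITIAL_MARK, Phase.CONCURRENT_MARK,
                Phase.CONCURRENT_PRECLEAN, Phase.REMARK, Phase.CONCURRENT_SWEEP,
                Phase.CONCURRENT_RESET);
        List<Phase> failedDuringSweep = Arrays.asList(Phase.INITIAL_MARK,
                Phase.CONCURRENT_MARK, Phase.CONCURRENT_PRECLEAN, Phase.REMARK,
                Phase.CONCURRENT_SWEEP, Phase.FULL_GC_AFTER_FAILURE);
        System.out.println("normal cycle: " + fullGcCount(normal));
        System.out.println("cycle with concurrent mode failure: " + fullGcCount(failedDuringSweep));
    }
}
```

A normal cycle counts 2; a cycle that hits a concurrent mode failure after remark counts 3, which is why a Sentinel "full GC" number under CMS should not be compared one-to-one with full GCs under the default collector.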

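4: Problems encountered
Problem 1: Stack overflow
After the -Xss parameter was adjusted to 256k, Yuantao (远涛) reported that it might affect trace calls, and the following error was reported:
java.lang.StackOverflowError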
at net.sf.jsqlparser.util.deparser.ExpressionDeParser.visitBinaryExpression(ExpressionDeParser.java:278)
at net.sf.jsqlparser.util.deparser.ExpressionDeParser.visit(ExpressionDeParser.java:246)
at net.sf.jsqlparser.expression.operators.conditional.OrExpression.accept(OrExpression.java:37)
at net.sf.jsqlparser.util.deparser.ExpressionDeParser.visitBinaryExpression(ExpressionDeParser.java:278)
at net.sf.jsqlparser.util.deparser.ExpressionDeParser.visit(ExpressionDeParser.java:246)
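This is because the parameter sets each thread's stack size. Since JDK 5.0 each thread's stack defaults to 1M; before that it was 256K. With the same physical memory, a smaller value allows more threads, but each thread can also recurse less deeply. The trace above shows such deep recursion: JSqlParser's ExpressionDeParser.visitBinaryExpression calls visit, which calls OrExpression.accept, which calls visitBinaryExpression again, for each level of nested OR expressions.
So today we removed the -Xss256k parameter from one inventory machine to see whether it is the cause.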
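The effect of a smaller stack on recursion depth can be reproduced without touching -Xss: Thread has a constructor that takes a requested stack size for that one thread. The sketch below runs a trivial recursion on threads with different requested sizes and reports the depth reached before StackOverflowError. The exact depths depend on the JVM, JIT state and frame size, and the Thread Javadoc notes the requested size may be ignored on some platforms; HotSpot on Linux honors it.

```java
// Measures how deep a simple recursion gets before StackOverflowError on threads
// created with different requested stack sizes (a per-thread analogue of -Xss).
public final class StackDepthDemo {

    private static int depth;

    private static void recurse() {
        depth++;
        recurse();
    }

    // Runs the recursion on a new thread with the requested stack size; returns the depth reached.
    static int maxDepth(long stackSizeBytes) {
        final int[] result = new int[1];
        Thread probe = new Thread(null, new Runnable() {
            @Override
            public void run() {
                depth = 0;
                try {
                    recurse();
                } catch (StackOverflowError e) {
                    result[0] = depth;
                }
            }
        }, "stack-probe", stackSizeBytes);
        probe.start();
        try {
            probe.join(); // join makes result[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("256k stack: depth " + maxDepth(256L * 1024));
        System.out.println("1m stack:   depth " + maxDepth(1024L * 1024));
    }
}
```

Deeply recursive libraries such as SQL parsers walking long OR chains are exactly the code that hits the lower limit first, so a smaller -Xss should be tested against the application's worst-case inputs.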

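Problem 2: The initial mark phase takes too long
The general advice is that the two STW pauses in a CMS cycle should not exceed 200 ms. If the long pause comes from the CMS initial mark phase:
to minimize STW time during initial mark (CMS Initial mark), we can use
-XX:+CMSParallelInitialMarkEnabled
which parallelizes the initial mark and further improves its efficiency;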
Problem 3: The STW pause in the remark phase is too long
As shown in the figure below:



 
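Possible approach:
Run a young GC just before the CMS remark phase. This shrinks the young generation that remark has to scan (live young-generation objects act as roots into the old generation) and lowers the cost of remark; generally, about 80% of CMS GC time is spent in the remark phase.
-XX:+CMSScavengeBeforeRemark
jmap analysis: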
 

 

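Problem 4: OutOfMemoryError caused by an NIO framework using DirectMemory
Handling: with -XX:+DisableExplicitGC in use, increase the DirectMemory size. Because that flag turns the System.gc() calls some NIO frameworks use to release DirectMemory (see point 5) into no-ops, direct buffers are freed only when an ordinary heap GC happens to run, so DirectMemory needs enough headroom.
1. DirectMemory is not part of the Java heap; allocating it actually calls os::malloc(), i.e. native memory allocation by the operating system.
2. Its capacity can be set with -XX:MaxDirectMemorySize; if not specified, it defaults to the maximum Java heap size (set by -Xmx). Note that on the IBM JVM the default Direct Memory size is not directly related to -Xmx.
3. Using Direct Memory avoids copying data back and forth between the Java heap and the native heap, which improves performance in some scenarios.
4. Direct ByteBuffer objects clean up their native buffers automatically, but only as part of a Java heap GC, so they do not react automatically to pressure on the native heap.
5. A GC happens only when the Java heap is so full that an allocation request cannot be served, or when the application explicitly calls System.gc() to free memory (some NIO frameworks release the DirectMemory they hold this way).
6. Misusing this area can also cause an OutOfMemoryError.
7. Where Buffers are created frequently, DirectBuffer is a poor fit because creating and destroying one is expensive; but if DirectBuffers are reused, they can greatly improve performance under frequent reads and writes. (Reading and writing a DirectBuffer is faster than a heap Buffer, but creating and destroying one is slower.)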
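Point 7 in practice: allocate the direct buffer once and reuse it with clear()/flip() instead of calling ByteBuffer.allocateDirect() per message, and watch direct-memory usage through the "direct" BufferPoolMXBean that HotSpot exposes. The sketch below is illustrative; the class name, the 64 KB size and the truncate-on-overflow policy are assumptions, and a real NIO framework would also handle payloads larger than the buffer and thread confinement.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Reuses one direct buffer instead of allocating a new one per write.
// Not thread-safe: intended for use by a single I/O thread.
public final class DirectBufferReuse {

    private final ByteBuffer buffer;

    public DirectBufferReuse(int capacity) {
        this.buffer = ByteBuffer.allocateDirect(capacity); // one native allocation
    }

    // Copies the payload into the reused buffer (truncating to capacity)
    // and returns the number of bytes ready to be written to a channel.
    public int stage(byte[] payload) {
        buffer.clear(); // reset position/limit; no new native memory
        buffer.put(payload, 0, Math.min(payload.length, buffer.capacity()));
        buffer.flip();  // ready for channel.write(buffer)
        return buffer.remaining();
    }

    // Total capacity of live direct buffers as reported by the "direct" pool, or -1.
    public static long directPoolCapacity() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getTotalCapacity();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        DirectBufferReuse reuse = new DirectBufferReuse(64 * 1024);
        for (int i = 0; i < 1000; i++) {
            reuse.stage(("message-" + i).getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("direct pool capacity: " + directPoolCapacity() + " bytes");
    }
}
```

Monitoring the direct pool alongside the heap makes it easier to tell a DirectMemory OutOfMemoryError apart from ordinary heap exhaustion.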







 


Origin http://43.154.161.224:23101/article/api/json?id=326261251&siteId=291194637