V8 Lite Mode: the JavaScript V8 engine's memory overhaul

Late last year, the V8 team started a project called V8 Lite, aimed at dramatically reducing V8's memory usage. Initially, the team planned to ship V8 Lite as a standalone mode of V8 for low-memory mobile and embedded devices, since those devices care more about reducing memory usage than about execution speed.

In the course of this work, the team found that many of the memory optimizations made for Lite mode could in fact be carried over to regular V8, benefiting both. The V8 team recently published a post detailing some of the key optimizations built while developing V8 Lite, how they were brought over to regular V8, and their impact on real-world workloads. A brief overview follows.

Lite Mode

After analyzing how V8 uses memory and which object types account for the largest share of the V8 heap, the V8 team found that a large part of the heap is devoted to objects that are not essential for JavaScript execution, but are used to optimize execution and handle exceptional situations. Specifically: optimized code; type feedback used to determine how to optimize code; redundant metadata for bindings between C++ and JavaScript objects; and the like.

Starting from this observation, the team set out to significantly reduce memory usage by drastically cutting the allocation of these optional objects. This work became V8 Lite Mode.

Some Lite Mode optimizations could be applied directly by configuring existing V8 settings, such as disabling V8's TurboFan optimizing compiler, but supporting the other optimizations Lite Mode aimed for required deeper changes.

For example, while executing code, V8's Ignition interpreter collects feedback about the operand types seen by various operations (e.g., + or o.foo) in order to tailor later optimization to those types. This feedback is stored in feedback vectors, which account for a significant share of V8's heap memory usage. Lite Mode never optimizes code, so it could avoid allocating these feedback vectors; however, V8's interpreter relies on feedback vectors for its inline-cache infrastructure, so considerable rework was needed for Lite Mode to support executing code without feedback at all.
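To make the role of type feedback concrete, here is a deliberately simplified sketch of what an inline cache for a property load like o.foo conceptually records: the "shape" seen on a previous run and the property name, so a matching later run can take a fast path. This is illustrative only; the class name, fields, and the use of the prototype as a stand-in for V8's hidden classes are assumptions, not V8's actual API.

```javascript
// Illustrative sketch of a monomorphic inline cache for a property load.
// NOT V8 internals: names and the "map" stand-in are assumptions.
class LoadIC {
  constructor() {
    this.cachedMap = null; // shape observed on a previous execution
    this.cachedKey = null; // property name this call site loads
  }
  load(obj, key) {
    // Stand-in for a hidden class: the object's prototype.
    const map = Object.getPrototypeOf(obj);
    if (this.cachedMap === map && this.cachedKey === key) {
      return obj[key]; // fast path: recorded feedback matched
    }
    // Slow path: full lookup, then record feedback for next time.
    this.cachedMap = map;
    this.cachedKey = key;
    return obj[key];
  }
}
```

In real V8 this per-call-site state lives in slots of the function's feedback vector, which is why dropping feedback vectors also disables inline caching.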

Lite Mode launched in V8 v7.3. Compared with V8 v7.1, it reduces typical web-page heap size by 22% by disabling code optimization, not allocating feedback vectors, and aging out rarely executed bytecode.

In the process, the team also found that most of Lite Mode's memory savings could be brought to regular V8 by making V8 more "lazy", without affecting performance.

Lazy feedback allocation 

Completely disabling feedback vector allocation not only prevents V8's TurboFan compiler from optimizing code, it also prevents V8 from inline-caching common operations, such as object property loads, in the Ignition interpreter. This causes a significant regression in V8 execution time: on a typical interactive web page, page load time increased by 12% and V8's CPU time increased by 120%.

To bring most of the memory savings to regular V8 without these regressions, the team instead allocates feedback vectors lazily, after a function has executed a certain amount of bytecode (currently 1KB). Since most functions are not executed often, feedback vector allocation is avoided in most cases, but it happens quickly enough in hot functions to avoid performance regressions and still allow code to be optimized where needed.
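The mechanism above can be sketched as a per-function "bytecode budget" that is consumed as the function runs, with the feedback vector allocated only once the budget is exhausted. This is an illustrative model only; the field names and the exact accounting are assumptions, with 1KB taken from the threshold the article mentions.

```javascript
// Illustrative sketch of lazy feedback allocation (not V8 code).
const BUDGET = 1024; // ~1KB of executed bytecode, per the article

class Closure {
  constructor(bytecodeLength) {
    this.bytecodeLength = bytecodeLength;
    this.budget = BUDGET;
    this.feedbackVector = null; // deliberately not allocated up front
  }
  run() {
    // Each execution consumes budget proportional to the function's size.
    this.budget -= this.bytecodeLength;
    if (this.budget <= 0 && this.feedbackVector === null) {
      this.feedbackVector = []; // allocate lazily once the function is warm
    }
  }
}
```

Under this model, a rarely called function never pays for a feedback vector at all, while a hot function gets one after only a handful of runs.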

This approach introduces another problem. Feedback vectors form a tree: the feedback vector of an inner function is held as an entry in the feedback vector of its outer function. This is necessary so that a newly created closure receives the same feedback vector array as all other closures created for the same function. With lazily allocated feedback vectors, this tree cannot be formed directly, because there is no guarantee that an outer function's feedback vector is allocated before those of its inner functions.

As the following diagram shows, to solve this the team created a new ClosureFeedbackCellArray to maintain this tree; when a function becomes hot, its ClosureFeedbackCellArray is swapped for a full FeedbackVector.
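A rough sketch of this two-stage structure: a lightweight cell array is allocated eagerly so inner functions always have a place to link to, and only hot functions upgrade to a full feedback vector that preserves those same cells. The class names mirror the article, but the fields and upgrade logic are assumptions made for illustration.

```javascript
// Illustrative sketch of ClosureFeedbackCellArray → FeedbackVector (not V8 code).
class FeedbackCell {
  constructor() { this.value = null; } // shared by all closures of one inner function
}

class FunctionFeedback {
  constructor(innerFunctionCount) {
    // Stage 1: cheap array of cells, one per inner function, always allocated.
    this.cells = Array.from({ length: innerFunctionCount },
                            () => new FeedbackCell());
    this.vector = null; // full feedback vector, allocated only when hot
  }
  markHot() {
    // Stage 2: swap in a full feedback vector, keeping the same cells so
    // closures created before the swap still see consistent feedback.
    this.vector = { cells: this.cells, slots: [] };
  }
}
```

The key property is that the cells survive the swap, so the closure tree never has to be rebuilt.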

Experiments show that lazy feedback causes no performance regression on desktop, and on mobile it actually improves performance thanks to reduced garbage collection, so lazy feedback allocation has been enabled in all builds of V8.

Lazy source positions

When compiling JavaScript to bytecode, V8 generates source position tables that tie bytecode sequences to character positions in the JavaScript source. However, this information is only needed when symbolizing exceptions or performing developer tasks such as debugging, and is therefore rarely used.

To avoid this waste, bytecode is now compiled without collecting source positions (assuming no debugger or profiler is attached). Source positions are only collected when a stack trace is actually generated, for example when Error.stack is called or when an exception's stack trace is printed to the console. This has some cost, since generating source positions requires the function to be reparsed and compiled, but most websites do not symbolize stack traces in production and therefore see no performance impact.
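The trigger described above is an ordinary property read. In the example below, only the final line, reading err.stack, is the kind of operation that forces the engine to materialize the bytecode-offset-to-source-position mapping; code that never reads .stack never pays that cost.

```javascript
// Reading .stack is what requires source positions; creating the Error does not
// force them to be symbolized.
function inner() {
  return new Error("boom");
}

const err = inner();
// Only this read needs the source-position mapping for symbolization:
const stack = err.stack;
```

On V8-based runtimes the resulting string contains the message and a frame naming inner, which is exactly the information the source position table exists to provide.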

Bytecode flushing

Bytecode compiled from JavaScript source, including its related metadata, occupies a large amount of V8 heap space, typically about 15%. But many functions are only executed during initialization, or are rarely used after being compiled. So V8 added support for flushing compiled bytecode from functions during garbage collection if they have not been executed recently.

The specific mechanism: each function's bytecode has an age, which is incremented on every major (mark-compact) garbage collection and reset to zero when the function is executed. Any bytecode past the aging threshold is eligible to be collected by the next garbage collection; if it is collected and the function is later executed again, it is recompiled.

The difficulty in such a design is ensuring that bytecode is only flushed when it is genuinely no longer needed. For example, if function A calls a long-running function B, function A may age while it is still on the stack. We do not want to flush function A's bytecode even though it has reached the aging threshold, because it is still needed when the long-running function B returns.

The solution is to treat bytecode as weakly held once it reaches its aging threshold, while any reference to it from the stack or elsewhere holds it strongly; the code can only be flushed when no strong links remain.
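The aging and weak-holding rules above can be sketched together as follows. This is an illustrative model, not V8 code: the threshold value, the onStack flag standing in for strong references from stack frames, and all field names are assumptions.

```javascript
// Illustrative sketch of bytecode aging with weak holding (not V8 code).
const AGE_THRESHOLD = 5; // assumed value for illustration

class CompiledFunction {
  constructor() {
    this.age = 0;
    this.bytecode = "<bytecode>";
    this.onStack = false; // a live stack frame would hold the bytecode strongly
  }
  execute() {
    if (this.bytecode === null) this.bytecode = "<recompiled>"; // lazy recompile
    this.age = 0; // running the function resets its age
  }
  onMajorGC() {
    this.age += 1;
    // Past the threshold the bytecode is only weakly held; it is flushed
    // only when nothing (e.g. a stack frame) holds it strongly.
    if (this.age > AGE_THRESHOLD && !this.onStack) {
      this.bytecode = null;
    }
  }
}
```

This captures the function A / function B scenario: while A is suspended on the stack, its bytecode survives any number of GCs, and it only becomes flushable once the frame is gone.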

In addition to flushing bytecode, the feedback vectors associated with flushed functions are flushed as well. However, feedback vectors cannot be flushed in the same GC cycle as the bytecode, because they are not retained by the same object: bytecode is held by a SharedFunctionInfo, which is independent of the native context, whereas feedback vectors are held by the native-context-dependent JSFunction. As a result, feedback vectors are flushed in a subsequent GC cycle.

Additional optimizations

The team also reduced memory usage through further optimizations, such as shrinking FunctionTemplateInfo objects and deoptimizing TurboFan-optimized code.

Results

The optimizations above have been released over the last seven versions of V8. Typically they land in Lite Mode first, and are then brought to V8's default configuration.

In testing, they reduced V8 heap size by an average of 18% across a range of typical websites, which corresponds to an average saving of 1.5 MB on low-end AndroidGo mobile devices.

By additionally disabling function optimization, Lite Mode can provide further memory savings at some cost to JavaScript execution throughput. On average, Lite Mode saves 22% of memory, with some pages saving up to 32%. This corresponds to a 1.8 MB reduction in V8 heap size on AndroidGo devices.

For a detailed breakdown of each optimization's impact, and the full write-up, see the original blog post:

https://v8.dev/blog/v8-lite


Origin www.oschina.net/news/110032/a-lighter-v8