[Repost] Demystifying Sunway TaihuLight

The SW26010 processor's technology derives from DEC's Alpha 21164 processor.

http://server.zol.com.cn/592/5926903_all.html#p5934510

 

1 The world arena of supercomputing: what is the TOP500?

 

  In 1946, ENIAC, a computer custom-built for the US military, came into service. This "giant" could perform 5,000 additions or 400 multiplications per second. After 70 years of development, supercomputer speeds have reached the order of 10^16 operations per second (tens of petaflops). In June of this year, at ISC 2016, the TOP500 announced a new world champion: Sunway TaihuLight. Some cannot help asking: what allowed this supercomputer to surpass Tianhe-2? And what kind of secret weapon is the domestic ShenWei 26010 processor?

Figure: Sunway TaihuLight

The world arena of supercomputing: TOP500

  Since 1993, the international TOP500 organization has published a ranking of the 500 most powerful supercomputers deployed worldwide, based on Linpack test performance. The list is released twice a year and aims to promote exchange, cooperation and wider application in the supercomputing field. Until 2002, Chinese HPC (high-performance computing) sites did not submit Linpack results internationally and so were not included in the TOP500. Later, as the relevant institutions opened up to the test, China began to emerge on the international supercomputing stage and quickly became a regular in the top 10.

  Linpack was first used in April 1974 and is currently the most popular benchmark for the floating-point performance of high-performance computer systems. It assesses a machine by having it solve a dense system of N linear algebraic equations in N unknowns. The Linpack tests include Linpack100, Linpack1000 and HPL. Of these, HPL (the High-Performance Linpack benchmark for highly parallel computing) is the test mode designed for modern parallel computers and is the most widely used.
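To make the idea concrete, here is a minimal sketch of a Linpack-style measurement in pure Python. It is not the HPL code: it solves a small random dense system with Gaussian elimination and partial pivoting, then credits the run with the conventional operation count of 2/3 n^3 + 2 n^2. The function names, the problem size and the random test matrix are assumptions made for illustration, and the lower-order term of the operation count differs between Linpack variants.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: move the largest remaining entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_flops(n):
    """Conventional operation count credited for an n x n dense solve."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def run(n=100, seed=0):
    """Return (measured flop/s, max error) for a random n x n system."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    # Build b so that the exact solution is a vector of ones.
    b = [sum(row) for row in a]
    t0 = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - t0
    error = max(abs(xi - 1.0) for xi in x)
    return linpack_flops(n) / elapsed, error


if __name__ == "__main__":
    rate, error = run()
    print(f"{rate / 1e6:.2f} Mflop/s, max error {error:.2e}")
```

The reported rate is simply the credited operation count divided by the wall-clock time, which is also how a Linpack score is expressed on a real machine.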

  Peak floating-point performance is an important measure of a computer. It comes as a theoretical value and a measured value. The former is the maximum number of floating-point operations the machine can theoretically complete per second, determined mainly by the CPU clock frequency. The latter is the Linpack value, that is, the best result obtained by running the Linpack test program on the machine after a series of tuning adjustments.

  At the International Supercomputing Conference (ISC 2016) in Frankfurt, Germany, where the 47th edition of the TOP500 list was announced, the distinctly Chinese string "TaihuLight" became the focus of the audience. Its official name is Sunway TaihuLight. With a Linpack result of 93 petaflops it overtook Tianhe-2 to lead the TOP500, and its domestic ShenWei 26010 processor drew just as much attention. It is also worth mentioning that the number of Chinese HPC systems on the list (167) surpassed that of the United States (165) for the first time.

Figure: Sunway TaihuLight tops the TOP500

  Behind TaihuLight and Tianhe-2 in the top two places, positions three through ten went to Titan, Sequoia, Japan's K computer, Mira, Trinity, Piz Daint, Hazel Hen and Shaheen II. According to the official statement, China is the third country in the world, after the United States and Japan, to build a petaflops-class (10^15 operations per second) supercomputer using home-grown CPUs.

  Tianhe-2, a six-time winner of the TOP500, also deserves a mention. It was developed by the National University of Defense Technology and is installed at the National Supercomputer Center in Guangzhou. It is powered by Intel Xeon processors, runs the Linux-based Kylin operating system, and makes innovative use of a heterogeneous architecture. Tianhe-2's compute nodes consume about 18 MW, and total consumption including the cooling system is under 20 MW.

2 TaihuLight wins on its debut with domestically developed hardware and software

 

Getting to know TaihuLight

  In fact, this year's ISC is not the Sunway family's first appearance. In 2011 the Sunway BlueLight system, installed at the National Supercomputing Center in Jinan, ranked No. 14 on the TOP500. It used the third-generation ShenWei SW1600, a 16-core chip. Five years later, with support from China's national 863 Program, the National Research Center of Parallel Computer Engineering and Technology (NRCPC) introduced the more powerful ShenWei SW26010 processor, which helped TaihuLight take the title.

  TaihuLight is built on the ShenWei 26010 many-core processor and runs at the scale of 10^17 operations per second: its peak performance is 125.4 petaflops, its sustained performance reaches 93 petaflops, and its performance-to-power ratio is 6.051 gigaflops per watt. Compared with Tianhe-2, its sustained computing speed is nearly 3 times higher while its power draw during the test is lower (15,371 kW against 17,808 kW for Tianhe-2). In the Linpack test, TaihuLight completed in about four hours a computing task that takes Tianhe-2 more than 20 hours.
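The performance-to-power figure can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this article (93 petaflops sustained, 15,371 kW for TaihuLight and 17,808 kW for Tianhe-2); the constant and function names are illustrative assumptions.

```python
# Figures as quoted in the article.
RMAX_PFLOPS = 93.0                # sustained Linpack performance
TAIHULIGHT_POWER_KW = 15371.0     # power draw during the Linpack run
TIANHE2_POWER_KW = 17808.0


def gflops_per_watt(pflops, kilowatts):
    """Convert petaflops and kilowatts into gigaflops per watt."""
    flops = pflops * 1e15
    watts = kilowatts * 1e3
    return flops / watts / 1e9


taihulight_gflops_per_watt = gflops_per_watt(RMAX_PFLOPS, TAIHULIGHT_POWER_KW)
# Fraction by which TaihuLight's power draw is lower than Tianhe-2's.
power_saving = 1.0 - TAIHULIGHT_POWER_KW / TIANHE2_POWER_KW

print(f"{taihulight_gflops_per_watt:.2f} Gflops/W")
print(f"{power_saving:.1%} less power than Tianhe-2")
```

With the rounded 93 petaflops input the ratio comes out at about 6.05 gigaflops per watt, in line with the quoted 6.051.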

Figure: ShenWei 26010 processors on a two-node board (image from Jack Dongarra)

  TaihuLight's overall efficiency reached 74.16%, compared with 65.19% for Titan and 55.83% for Tianhe-2, which is no small feat for a system of greater performance and scale. In performance per watt, TaihuLight delivers about 6 Gflops/W, Titan 2.143 Gflops/W and Tianhe-2 1.95 Gflops/W. TaihuLight also placed in the top three of the Green500. Given that the top two systems on that list are powered by low-power Intel E5 processors, this achievement deserves recognition.
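Efficiency here means the measured Linpack result divided by the theoretical peak. A minimal sketch follows, assuming the sustained figure of 93 petaflops and a theoretical peak of 125.4 petaflops (12.54 x 10^16 operations per second) for TaihuLight; the function name is an assumption made for illustration.

```python
def hpl_efficiency(rmax_pflops, rpeak_pflops):
    """Fraction of theoretical peak achieved in the Linpack run."""
    if rpeak_pflops <= 0:
        raise ValueError("theoretical peak must be positive")
    return rmax_pflops / rpeak_pflops


# TaihuLight: 93 petaflops sustained against a 125.4 petaflops peak.
taihulight_efficiency = hpl_efficiency(93.0, 125.4)
print(f"{taihulight_efficiency:.2%}")
```

The result rounds to the 74.16% quoted above.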

  Overall, TaihuLight's architecture appears to follow BlueLight's MPP (massively parallel processing) distributed approach, which suits traditional HPC applications and runs them more efficiently. TaihuLight consists of 40 compute cabinets. Each cabinet holds four supernodes of 256 nodes each, for a total of 40,960 nodes. Each node has a single 260-core CPU, the system boards use a two-node design, and each CPU has 32 GB of soldered on-board DDR3-2133 memory.
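The system totals follow directly from that hierarchy. The short sketch below multiplies out the figures given in this paragraph; the variable names are illustrative assumptions.

```python
# Hierarchy as described in the article.
CABINETS = 40
SUPERNODES_PER_CABINET = 4
NODES_PER_SUPERNODE = 256
CORES_PER_NODE = 260          # one 260-core CPU per node
MEMORY_PER_NODE_GB = 32

total_nodes = CABINETS * SUPERNODES_PER_CABINET * NODES_PER_SUPERNODE
total_cores = total_nodes * CORES_PER_NODE
total_memory_gb = total_nodes * MEMORY_PER_NODE_GB

print(f"nodes:  {total_nodes:,}")
print(f"cores:  {total_cores:,}")
print(f"memory: {total_memory_gb:,} GB")
```

That works out to 40,960 nodes, 10,649,600 cores and 1,310,720 GB of memory across the machine.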

Figure: TaihuLight cabinets (image from Jack Dongarra)

Figure: A motherboard can hold four pairs of nodes, two on each side (image from Jack Dongarra)

Figure: A supernode (image from Jack Dongarra)

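  In addition, TaihuLight runs the Linux-based Sunway Raise OS 2.0.5 operating system. It comes with a compiler adapted to the many-core architecture and supports Fortran, C/C++ and OpenACC 2.0, along with the Sunway OpenACC compilation tools. For the interconnect it uses PCI-E 3.0 physical links, with the home-grown Sunway Network as the software protocol. The switch chips attached over PCI-E are treated as virtual network cards, which gives each node its own IP address.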

Figure: TaihuLight software stack (image from Jack Dongarra)

Figure: TaihuLight interconnect architecture (image from Jack Dongarra)

Figure: TaihuLight overall layout (image from Jack Dongarra)

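  As for the machine-room layout, TaihuLight places 20 compute cabinets plus storage cabinets on each side, with a single row of network-system cabinets in the middle. The installation occupies 605 square meters.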

 

3 The ShenWei 26010 makes its name, breaking dependence on foreign technology

The domestic ShenWei 26010 as secret weapon

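  Beyond TaihuLight taking the crown, what is even more exciting is its use of the domestic ShenWei 26010 processor, which carries real historical significance. In April 2015 the US Department of Commerce announced a ban on Intel selling Xeon Phi processors to four national supercomputing centers. Before that, the former TOP500 champion Tianhe-2 had relied on Xeon-series processors. The ban meant that Tianhe-2 could no longer obtain the newer Intel chips intended for upgrading the system.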

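  The arrival of the ShenWei 26010, however, shattered outside doubts about domestic development. The processor also brought a "new term" to the fore: many-core. The supercomputing community has long been open to heterogeneous computing as represented by GPUs and many-core chips. GPU use cases keep growing, and algorithmic support keeps broadening. In the long run heterogeneous clusters will matter more to supercomputing, pursuing higher performance and lower power consumption while preserving flexibility and software compatibility.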

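  HPC development has always been tied to military and research use, and TaihuLight is no exception. ShenWei has in fact long been known within the industry, so why is it so little known outside? The main reason is probably its military background. The ShenWei family of chips is developed by the Jiangnan Institute of Computing Technology (the 56th Research Institute of a General Staff department), while the ShenWei 26010 itself was produced at the National High Performance IC (Shanghai) Design Center and deployed at the National Supercomputing Center in Wuxi. The 56th Institute was founded in June 1951 and is located in Wuxi.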

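  ShenWei's original technology came from the Alpha 21164 developed by DEC, which appeared in 1995, was built on a 0.5 um process and ran at 200 MHz. As its research and development deepened, however, the Jiangnan Institute developed its own ShenWei-64 instruction set and stepped out of Alpha's shadow.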

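  The ShenWei 26010 adopts a "CPU + accelerator" design (management cores plus computing cores). It is a 64-bit RISC processor clocked at 1.45 GHz, with 260 processing cores and 4 memory controllers. The processor contains four core groups of 65 cores each. Each group consists of a cluster of computing processing elements (CPEs) arranged in an 8x8 mesh, one management processing element (MPE) and one memory controller (MC). The MPE and the CPEs are both complete processing cores: the former handles system management and communication, while the latter perform the floating-point computation. Each memory controller (128-bit DDR3) provides about 34 GB/s of bandwidth, so the processor as a whole offers 136.5 GB/s.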
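The core count and the bandwidth figure can both be reproduced from the layout just described. The sketch below assumes the DDR3-2133 memory mentioned earlier in the article, taken as 2133 million transfers per second on each 128-bit channel; the names are illustrative.

```python
# Processor layout as described in the article.
CORE_GROUPS = 4
CPES_PER_GROUP = 8 * 8            # CPEs arranged in an 8x8 mesh
MPES_PER_GROUP = 1

# Memory interface: one 128-bit DDR3 channel per core group.
BUS_WIDTH_BITS = 128
TRANSFERS_PER_SECOND = 2133e6     # assumption: DDR3-2133 data rate

cores_per_group = CPES_PER_GROUP + MPES_PER_GROUP
total_cores = CORE_GROUPS * cores_per_group

bytes_per_transfer = BUS_WIDTH_BITS // 8
bandwidth_per_controller_gbs = bytes_per_transfer * TRANSFERS_PER_SECOND / 1e9
total_bandwidth_gbs = CORE_GROUPS * bandwidth_per_controller_gbs

print(f"cores per group: {cores_per_group}, total cores: {total_cores}")
print(f"per controller: {bandwidth_per_controller_gbs:.1f} GB/s")
print(f"per processor:  {total_bandwidth_gbs:.1f} GB/s")
```

Under that assumption each controller supplies roughly 34.1 GB/s and the four together about 136.5 GB/s, matching the quoted figures.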

Figure: ShenWei 26010 core group structure (image from Jack Dongarra)

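  The ShenWei 26010 supports a 256-bit vector instruction set. It has 32 KB each of L1 instruction cache and L1 data cache, plus a 256 KB L2 cache, and no L3 cache. For a CPE, a single pipeline allows 8 floating-point operations per clock cycle, giving a floating-point performance of 11.6 GFLOPS. An MPE delivers roughly twice that of a CPE.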

Figure: Basic design of a ShenWei 26010 node (image from Jack Dongarra)

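  In addition, the ShenWei 26010 may not use a NUMA (non-uniform memory access) architecture, which makes it possible to share memory contents among the core groups within the processor. There is no cache-coherence requirement in hardware, and synchronization is left to software. By contrast, Intel's Knights Landing hands cache coherence entirely to the hardware. In terms of performance, the ShenWei 26010's double-precision floating-point peak is 3.06 TFlops, on a par with Knights Landing.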
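The 3.06 TFlops peak can be rebuilt from the per-core figures given earlier: 8 floating-point operations per cycle per CPE at 1.45 GHz, with four core groups of 64 CPEs and one MPE each. The sketch below assumes an MPE delivers exactly twice a CPE (the article says "roughly twice") and scales the result to the 40,960 nodes of the full system; all names are illustrative.

```python
CLOCK_GHZ = 1.45
CPE_FLOPS_PER_CYCLE = 8
MPE_FLOPS_PER_CYCLE = 16      # assumption: exactly twice a CPE
CPES_PER_PROCESSOR = 4 * 64   # four core groups, 8x8 CPEs each
MPES_PER_PROCESSOR = 4        # one MPE per core group
NODES = 40960                 # one processor per node

cpe_gflops = CPE_FLOPS_PER_CYCLE * CLOCK_GHZ
mpe_gflops = MPE_FLOPS_PER_CYCLE * CLOCK_GHZ

# Theoretical peak = sum over all cores of (flops per cycle x clock).
processor_gflops = (CPES_PER_PROCESSOR * cpe_gflops
                    + MPES_PER_PROCESSOR * mpe_gflops)
system_pflops = processor_gflops * NODES / 1e6

print(f"CPE: {cpe_gflops:.1f} GFLOPS, MPE: {mpe_gflops:.1f} GFLOPS")
print(f"processor peak: {processor_gflops / 1e3:.2f} TFlops")
print(f"system peak:    {system_pflops:.1f} petaflops")
```

Under these assumptions the processor peak is 3062.4 GFLOPS, about 3.06 TFlops, and the whole machine comes to roughly 125.4 petaflops.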

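  As a fully home-grown processor, however, the ShenWei 26010 also faces some problems. The first is the manufacturing process: some speculate that it is 28 nm. That is not an official figure, but it would still lag behind Intel's 14 nm. Second, TaihuLight's HPCG (High Performance Conjugate Gradients) result is mediocre, at 0.3% of peak, below Tianhe-2's 1.1%.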

Figure: TaihuLight's disappointing HPCG result (image from Jack Dongarra)

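  For the HPCG test, memory and interconnect bandwidth probably held the system back. As mentioned above, the ShenWei 26010 uses DDR3, whereas Intel's Knights Landing already uses six-channel DDR4, and Xeon Phi memory bandwidth reaches 512 GB/s. So although TaihuLight leads by a wide margin on Linpack, its suitability for general HPC work is somewhat reduced. Overall, the ShenWei 26010's strength in raw computing power is plain to see, but because it leans toward military use, some functions have been specially tuned and its range of applications is somewhat limited.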
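One way to see why memory bandwidth matters is a simple roofline-style estimate, sketched below. It uses the per-processor figures from this article (3.06 TFlops peak, 136.5 GB/s memory bandwidth). The arithmetic intensity of 0.25 flops per byte is a hypothetical value chosen to represent a memory-bound sparse kernel; it is not an HPCG measurement, and the roofline model is used here only as an illustration, not as the method behind the published results.

```python
# Per-processor figures quoted in the article.
PEAK_GFLOPS = 3060.0          # 3.06 TFlops double-precision peak
MEM_BANDWIDTH_GBS = 136.5     # four DDR3 controllers combined

# Hypothetical arithmetic intensity (flops per byte moved) for a sparse,
# memory-bound kernel; an illustrative assumption, not an HPCG measurement.
ASSUMED_INTENSITY = 0.25


def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: the lower of the compute limit and the memory limit."""
    return min(peak_gflops, bandwidth_gbs * intensity)


# Flops the processor can issue for every byte it can fetch from memory.
machine_balance = PEAK_GFLOPS / MEM_BANDWIDTH_GBS
bound = attainable_gflops(PEAK_GFLOPS, MEM_BANDWIDTH_GBS, ASSUMED_INTENSITY)
fraction_of_peak = bound / PEAK_GFLOPS

print(f"machine balance: {machine_balance:.1f} flops per byte")
print(f"roofline bound at intensity {ASSUMED_INTENSITY}: {bound:.1f} GFLOPS "
      f"({fraction_of_peak:.1%} of peak)")
```

The processor can issue about 22 floating-point operations for every byte it can fetch, so any kernel that performs far fewer operations per byte is limited by memory traffic to a small fraction of peak, whatever its Linpack score.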

 

4 TaihuLight's applications contribute much, lighting the way for Chinese supercomputing

 

The road for supercomputing is not smooth; TaihuLight is only the beginning

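  From the 70% domestic content of the Tianhe series, to more than 85% for Sunway BlueLight, to today's fully home-grown Sunway TaihuLight, which took three years to develop, Chinese supercomputing has taken a historic step under the pressure of the US chip embargo. It is worth mentioning that three full-machine applications running on TaihuLight were shortlisted for the Gordon Bell Prize, known as the Nobel Prize of supercomputing. No Chinese team had been shortlisted since the prize was established in 1987.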

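  With the support of the national 863 Program, supercomputers, as "pillars of the nation", will play a greater role in industrial manufacturing, aerospace, the military, medicine, scientific research and other fields, and will help drive the development of deep learning and artificial intelligence. In the future TaihuLight will contribute in four directions: high-resolution global simulation, providing a quantitative basis for climate change research; advanced manufacturing, helping "Made in China" shift to "Created in China"; life sciences, supporting new drug development and the exploration of the mysteries of life; and big data analytics.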

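  For example, the National Supercomputing Center in Wuxi, working with Tsinghua University and Beijing Normal University, has restructured and optimized the CAM global atmospheric model on TaihuLight and built an experimental framework for a global ultra-high-resolution atmospheric model. The experimental framework has preliminarily achieved a resolution of 3 km, second only to the 870 m resolution of Japan's NICAM. Xue Wei, associate professor in the Department of Computer Science and Technology at Tsinghua University, said: "With this computer system, we can complete a simulation of the Earth's climate for the next 100 years within 30 days, comprehensively improving our country's disaster reduction and prevention capability in the face of extreme climate events and natural disasters."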

Figure: Atmospheric model experimental framework (image from the National Supercomputing Center in Wuxi)

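  With TaihuLight, the National Computational Fluid Dynamics Laboratory has run numerical simulations of the return trajectory of Tiangong-1, which will provide accurate predictions for its return. The Shanghai Institute of Materia Medica's drug screening and disease mechanism research completed within two weeks computations that would originally have taken 10 months, speeding up drug design for leukemia, cancer, avian influenza and other diseases. TaihuLight will also make major contributions to "high-resolution numerical simulation of ocean waves" and "phase-field simulation of microstructure evolution in titanium alloys". To date, the National Supercomputing Center in Wuxi has established application partnerships with more than 30 domestic institutions, including Peking University, the Institute of Software of the Chinese Academy of Sciences, CSIC's 702 Institute, Envision Energy, Tsinghua University and the National Computational Fluid Dynamics Laboratory.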

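  Beyond the contributions of national research institutions, Chinese companies such as Lenovo and Sugon have also performed well in supercomputing. In the latest TOP500 list, Lenovo ranked second worldwide and first in China by share of systems, with 92 entries. Going forward, supercomputing will keep pushing toward higher performance and lower power consumption. Chip design, task allocation, algorithm optimization, range of applications and cooling systems remain the key areas of effort.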

  Of course, alongside the celebration, China's supercomputing hardware and the practical applications that go with it still have much room for improvement. TaihuLight's achievement is gratifying, but its number of applications cannot compare with Tianhe-2's. On the software side, TaihuLight can handle problems in specific fields, since that software is specially optimized for the ShenWei processor, but moving into commercial or other areas requires attention to compatibility. At present, application software development accounts for less than 10% of domestic supercomputer R&D funding, against more than 30% in the United States. A machine that cannot be put to commercial use is, in effect, a waste of resources.

  Since the birth of Yinhe-I ("Galaxy-I") in 1983, Chinese supercomputing has gone from a blank slate to independent research and development and then to global leadership. This is tremendous progress that deserves praise, and it is a necessary road to becoming a technological power. Along the way the voices of doubt have never stopped, but every technological innovation involves a learning process. There is no need to fear a late start, detours or difficulties. Backed by fully home-grown technology, TaihuLight will make Chinese supercomputing shine brighter.

(Note: part of the technical content of this article comes from Zhihu users Sean and yuan zhao, from Lookout Think Tank, and from the English-language academic report published by Dr. Jack Dongarra of the University of Tennessee and Oak Ridge National Laboratory, Tech Report UT-EECS-16-742)

 

5 Appendix: more information on TaihuLight and the TOP500

 

Attached tables (content from Jack Dongarra's report):

Figure: Sunway TaihuLight system parameters

Figure: Comparison of TaihuLight, Tianhe-2 and Titan

Figure: Comparison of six supercomputers

Figure: Comparison of TaihuLight with Intel KNC and KNL

Figure: The top 10 supercomputers on the latest TOP500 list


Origin www.cnblogs.com/jinanxiaolaohu/p/10990616.html