Summit (OLCF-4) – Supercomputers – WikiChip

From WikiChip
Summit (OLCF-4) is Titan's successor, a 200-petaFLOP supercomputer operated by the DoE Oak Ridge National Laboratory. Summit was officially unveiled on June 8, 2018 as the fastest supercomputer in the world, overtaking the Sunway TaihuLight. Summit is expected to be succeeded by Frontier in 2021.

History

Summit is one of three systems procured under the Collaboration of Oak Ridge, Argonne, and Lawrence Livermore Labs (CORAL) program. Research and planning started in 2012, with initial system delivery arriving in late 2017. The full system arrived in early 2018 and was officially unveiled on June 8, 2018. Summit is estimated to have cost around $200 million as part of the CORAL procurement program.

Overview

Summit was designed to deliver a 5-10x improvement over Titan in performance on real big-science workloads. Titan had 18,688 nodes (AMD Opteron + Nvidia Kepler) with a 9 MW power consumption. Summit slightly increased the power consumption to 13 MW and reduced the number of nodes to only 4,608, but raised the peak theoretical performance roughly tenfold, from 27 petaFLOPS to around 225 PF. Summit has over 200 petaFLOPS of theoretical compute power and over 3 AI exaFLOPS for AI workloads.

Components
Processor | CPU | GPU
Type | POWER9 | V100
Count | 9,216 (2 × 18 × 256) | 27,648 (6 × 18 × 256)
Peak FLOPS | 9.96 PF | 215.7 PF
Peak AI FLOPS | - | 3.456 EF

System
Rack | Compute Racks | Storage Racks | Switch Racks
Type | AC922 | SSC (4 ESS GL4) | Mellanox IB EDR
Count | 256 Racks × 18 Nodes | 40 Racks × 8 Servers | 18 Racks
Power | 59 kW | 38 kW | -
Total system power | 13 MW
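
The counts and peak figures in the table follow from multiplying per-device numbers by the device count. The Python sketch below reproduces that arithmetic using only figures quoted in this article; the variable names are illustrative. Note that the CPU figure of 9.96 PF is reproduced when the single-precision per-socket number (22 × 49.12 GFLOPS) is used, while the GPU figure uses the V100's double-precision number.

```python
# Rack and device counts as given in the system table.
COMPUTE_RACKS = 256
NODES_PER_RACK = 18
CPUS_PER_NODE = 2
GPUS_PER_NODE = 6

# Per-device peak figures quoted in this article.
CPU_SP_GFLOPS = 22 * 49.12   # 22 cores per POWER9, single precision
GPU_DP_TFLOPS = 7.8          # V100, double precision
GPU_AI_TFLOPS = 125.0        # V100, "AI" operations

nodes = COMPUTE_RACKS * NODES_PER_RACK
cpus = nodes * CPUS_PER_NODE
gpus = nodes * GPUS_PER_NODE

cpu_peak_pf = cpus * CPU_SP_GFLOPS / 1e6   # GFLOPS -> PFLOPS
gpu_peak_pf = gpus * GPU_DP_TFLOPS / 1e3   # TFLOPS -> PFLOPS
ai_peak_ef = gpus * GPU_AI_TFLOPS / 1e6    # TFLOPS -> EFLOPS

print(f"nodes={nodes}, CPUs={cpus}, GPUs={gpus}")
print(f"CPU peak: {cpu_peak_pf:.2f} PF, GPU peak: {gpu_peak_pf:.1f} PF")
print(f"AI peak: {ai_peak_ef:.3f} EF")
```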

Summit has over 10 petabytes of memory.

Summit Total Memory
Type | DDR4 | HBM2 | NVMe
Node | 512 GiB | 96 GiB | 1.6 TB
Summit | 2.53 PB | 475 TB | 7.37 PB
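
The system-wide totals can be checked by scaling the per-node capacities to 4,608 nodes. The sketch below is one way to do that, taking the per-node NVMe capacity as 1.6 TB (the figure given in the compute node description) and reporting the totals in decimal units; the dictionary layout and names are illustrative.

```python
NODES = 4608   # 256 compute racks x 18 nodes

GIB = 2**30    # binary gibibyte, in bytes
TB = 10**12    # decimal terabyte, in bytes
PB = 10**15    # decimal petabyte, in bytes

# Per-node capacities, in bytes.
per_node = {
    "DDR4": 512 * GIB,
    "HBM2": 96 * GIB,
    "NVMe": 1.6 * TB,
}

# Scale each capacity to the full system and sum them.
system = {kind: size * NODES for kind, size in per_node.items()}
total_pb = sum(system.values()) / PB

for kind, size in system.items():
    print(f"{kind}: {size / PB:.3f} PB")
print(f"total: {total_pb:.2f} PB")
```

Under these assumptions the three pools add up to a little over 10 PB, which is where the "over 10 petabytes" figure comes from.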

Architecture

System

Weighing over 340 tons, Summit takes up 5,600 sq. ft. of floor space at Oak Ridge National Laboratory. Summit consists of 256 compute racks, 40 storage racks, 18 switching director racks, and 4 infrastructure racks. Servers are linked via a Mellanox IB EDR interconnect in a three-level non-blocking fat-tree topology.

Compute Rack

Each of Summit's 256 compute racks consists of 18 compute nodes along with a Mellanox IB EDR switch for a non-blocking fat-tree interconnect topology (these actually appear to be pruned 3-level fat-trees). With 18 nodes, each rack has 9 TB of DDR4 memory and another 1.7 TB of HBM2 memory, for a total of 10.7 TB of memory. A rack has a maximum power of 59 kW and a total of 864 TF/s of peak compute power (ORNL reports 775 TF/s).
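
The per-rack memory figures are the per-node capacities (512 GiB of DDR4 and 96 GiB of HBM2) multiplied by 18 nodes. A short sketch of that calculation, with illustrative names and the results expressed in binary terabytes:

```python
NODES_PER_RACK = 18
DDR4_PER_NODE_GIB = 512
HBM2_PER_NODE_GIB = 96

# 1024 GiB per binary terabyte.
ddr4_tib = NODES_PER_RACK * DDR4_PER_NODE_GIB / 1024
hbm2_tib = NODES_PER_RACK * HBM2_PER_NODE_GIB / 1024
total_tib = ddr4_tib + hbm2_tib

print(f"DDR4: {ddr4_tib:.1f}, HBM2: {hbm2_tib:.1f}, total: {total_tib:.1f}")
```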

Compute Node

The basic compute node is the Power Systems AC922 (Accelerated Computing), formerly codenamed Witherspoon. The AC922 comes in a 19-inch 2U rack-mount case.
Each node has two 2200W power supplies, four PCIe Gen 4 slots, and a BMC card. There are two 22-core POWER9 processors per node, each with eight DIMMs. For the Summit supercomputer, there are eight 32-GiB DDR4-2666 DIMMs for a total of 256 GiB and 170.7 GB/s of aggregate memory bandwidth per socket. There are three V100 GPUs per POWER9 socket. They use the SXM2 form factor and come with 16 GiB of HBM2 memory each, for a total of 48 GiB of HBM2 and 2.7 TB/s of aggregate bandwidth per socket.
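
The per-socket bandwidth figures can be derived from the DIMM and GPU counts. The sketch below assumes each DIMM sits on its own 64-bit (8-byte) DDR4 channel and takes DDR4-2666 at its nominal 2666.67 MT/s; the 900 GB/s per-GPU HBM2 figure comes from the capability tables in this article. Names are illustrative.

```python
TRANSFER_RATE_MT_S = 8000 / 3   # DDR4-2666, nominally 2666.67 MT/s
BUS_WIDTH_BYTES = 8             # 64-bit data bus per DDR4 channel
DIMMS_PER_SOCKET = 8            # assumed: one DIMM per memory channel
GPUS_PER_SOCKET = 3
HBM2_GB_S_PER_GPU = 900

# MT/s x bytes per transfer gives MB/s; divide by 1000 for GB/s.
per_dimm_gb_s = TRANSFER_RATE_MT_S * BUS_WIDTH_BYTES / 1000
ddr4_gb_s = per_dimm_gb_s * DIMMS_PER_SOCKET
hbm2_tb_s = GPUS_PER_SOCKET * HBM2_GB_S_PER_GPU / 1000

print(f"per DIMM: {per_dimm_gb_s:.2f} GB/s")
print(f"DDR4 per socket: {ddr4_gb_s:.1f} GB/s")
print(f"HBM2 per socket: {hbm2_tb_s:.1f} TB/s")
```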

Socket

Since IBM POWER9 processors have native on-die NVLink connectivity, the GPUs are connected directly to the CPUs. The POWER9 processor has six NVLink 2.0 bricks, which are divided into three groups of two bricks. Since NVLink 2.0 has bumped the signaling rate to 25 GT/s, two bricks allow for 100 GB/s of bandwidth between the CPU and a GPU. In addition, there are 48 PCIe Gen 4 lanes for I/O. The Volta GPUs also have six NVLink 2.0 bricks, which are divided into three groups. One group is used for the CPU while the other two groups interconnect every GPU to every other GPU. As with the GPU-CPU link, the aggregate bandwidth between two GPUs is also 100 GB/s.
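
The 100 GB/s figure and the brick allocation can be sanity-checked with a small model of one socket. The sketch below assumes each brick carries 8 lanes per direction at one bit per lane per transfer, which is not stated in the text above, and counts bandwidth in both directions. The device labels are illustrative.

```python
from itertools import combinations

SIGNAL_RATE_GT_S = 25     # NVLink 2.0 signaling rate, from the text
LANES_PER_DIRECTION = 8   # assumption: 8 lanes per brick in each direction
BRICKS_PER_LINK = 2       # bricks are grouped in pairs, from the text

# One bit per lane per transfer, so GT/s x lanes / 8 gives GB/s per direction.
brick_gb_s_one_way = SIGNAL_RATE_GT_S * LANES_PER_DIRECTION / 8
link_gb_s = brick_gb_s_one_way * 2 * BRICKS_PER_LINK   # both directions

# Links within one socket: every GPU to the CPU, and every GPU to every other GPU.
gpus = ["GPU0", "GPU1", "GPU2"]
links = [("CPU", g) for g in gpus] + list(combinations(gpus, 2))

# Count how many bricks each device spends on its links.
bricks_used = {dev: 0 for dev in ["CPU"] + gpus}
for a, b in links:
    bricks_used[a] += BRICKS_PER_LINK
    bricks_used[b] += BRICKS_PER_LINK

print(f"{link_gb_s:.0f} GB/s per link")
print(bricks_used)
```

With three CPU-GPU links and three GPU-GPU links, every device ends up using exactly its six bricks, which is consistent with the grouping described above.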

Single-socket Capabilities
Processor | POWER9 | V100
Count | 1 | 3
FLOPS (SP) | 1.081 TFLOPS (22 × 49.12 GFLOPS) | 47.1 TFLOPS (3 × 15.7 TFLOPS)
FLOPS (DP) | 540.3 GFLOPS (22 × 24.56 GFLOPS) | 23.4 TFLOPS (3 × 7.8 TFLOPS)
AI FLOPS | - | 375 TFLOPS (3 × 125 TFLOPS)
Memory | 256 GiB DDR4 (8 × 32 GiB) | 48 GiB HBM2 (3 × 16 GiB)
Bandwidth | 170.7 GB/s (8 × 21.33 GB/s) | 900 GB/s/GPU

There are two sockets per node. Communication between the two POWER9 CPUs is done over IBM's X Bus. The X Bus is a 4-byte 16 GT/s link providing 64 GB/s of bidirectional bandwidth. A node has four PCIe Gen 4 slots consisting of two x16 (with CAPI support), a single x8 (also with CAPI support), and a single x4 slot. One of the x16 slots comes from one CPU, the other comes from the second. The x8 is configurable from either one of the CPUs, and the last x4 slot comes from the second CPU only. The rest of the PCIe lanes are used for various I/O applications (PEX, USB, BMC, and 1 Gbps Ethernet).

The node has a Mellanox InfiniBand ConnectX5 (IB EDR) NIC installed, which supports 100 Gbps of bi-directional traffic. This card sits on a PCIe Gen4 x8 shared slot which directly connects x8 lanes to each of the two processors. With 12.5 GB/s per port (25 GB/s peak bandwidth), there is a maximum bandwidth of 16 GB/s per x8 link (32 GB/s peak aggregate bandwidth) to the CPUs. This gives each CPU direct access to the InfiniBand card, reducing bottlenecks with higher bandwidth.
Each POWER9 processor operates at 3.07 GHz and supports simultaneous execution of two vector single-precision operations. In other words, each core can execute 16 single-precision floating-point operations per cycle. At 3.07 GHz, this works out to 49.12 gigaFLOPS of peak theoretical performance per core. A full node has a little under 1.1 teraFLOPS (DP) of peak performance from the CPUs and around 47 teraFLOPS (DP) from the GPUs. Note that there is a slight discrepancy between our numbers and ORNL's. Buddy Bland, OLCF Program Director, informed us that their peak performance for Summit only includes the GPUs' peak performance numbers because that is what most of the FP-intensive code will use to achieve the highest performance. In theory, if we were to include everything, Summit actually has a higher peak performance of about 220 petaFLOPS. Each node also has a 1.6 TB NVMe flash adapter attached, along with the Mellanox InfiniBand EDR NIC.
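
The per-core, per-node, and system-wide peak figures above can be reproduced step by step. The sketch below assumes double precision runs at half the single-precision rate on the POWER9 (consistent with the 24.56 GFLOPS per-core DP figure in the tables) and uses 7.8 TFLOPS (DP) per V100 and 4,608 nodes. Names are illustrative.

```python
CLOCK_GHZ = 3.07
SP_FLOPS_PER_CYCLE = 16                        # per core, from the text
DP_FLOPS_PER_CYCLE = SP_FLOPS_PER_CYCLE // 2   # assumed: DP at half the SP rate
CORES_PER_CPU = 22
CPUS_PER_NODE = 2
GPUS_PER_NODE = 6
GPU_DP_TFLOPS = 7.8
NODES = 4608

# Peak per core is clock rate times operations per cycle.
core_sp_gflops = CLOCK_GHZ * SP_FLOPS_PER_CYCLE
core_dp_gflops = CLOCK_GHZ * DP_FLOPS_PER_CYCLE

# Per-node double-precision peak from CPUs and GPUs.
cpu_dp_tflops_node = core_dp_gflops * CORES_PER_CPU * CPUS_PER_NODE / 1000
gpu_dp_tflops_node = GPU_DP_TFLOPS * GPUS_PER_NODE

# System peak when both CPU and GPU contributions are counted.
system_dp_pflops = (cpu_dp_tflops_node + gpu_dp_tflops_node) * NODES / 1000

print(f"per core: {core_sp_gflops:.2f} GFLOPS (SP), {core_dp_gflops:.2f} GFLOPS (DP)")
print(f"per node: {cpu_dp_tflops_node:.3f} TFLOPS (CPU), {gpu_dp_tflops_node:.1f} TFLOPS (GPU)")
print(f"system: {system_dp_pflops:.1f} PFLOPS (DP)")
```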

Full Node Capabilities
Processor | POWER9 | V100
Count | 2 | 6
FLOPS (SP) | 2.161 TFLOPS (2 × 22 × 49.12 GFLOPS) | 94.2 TFLOPS (6 × 15.7 TFLOPS)
FLOPS (DP) | 1.081 TFLOPS (2 × 22 × 24.56 GFLOPS) | 46.8 TFLOPS (6 × 7.8 TFLOPS)
AI FLOPS | - | 750 TFLOPS (6 × 125 TFLOPS)
Memory | 512 GiB DDR4 (16 × 32 GiB) | 96 GiB HBM2 (6 × 16 GiB)
Bandwidth | 341.33 GB/s (16 × 21.33 GB/s) | 900 GB/s/GPU


