
Yitian 710 Performance Monitoring: DDR PMU Subsystem

2023-05-30 15:09:33 Source: OpenAnolis Community

1. The DDR5 Subsystem of Yitian 710

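Yitian 710 supports state-of-the-art DDR5 DRAM, providing massive memory bandwidth for cloud computing and HPC. Yitian 710 has 8 DDR5 channels, 4 on each die. Each channel serves the system's memory requests independently and supports DDR5-4400 with 1DPC (one DIMM Per Channel) or DDR5-4000 with 2DPC.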

1.1 DDR5 Architecture

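A major change in DDR5 is the new DIMM channel architecture (Channel Architecture in Fig 2). A DDR4 DIMM has a 72-bit bus consisting of 64 data bits and 8 ECC bits. A DDR5 DIMM instead has two independent subchannels, each with a 40-bit bus: 32 data bits and 8 ECC bits. Although DDR4 and DDR5 have the same data width (64 bits in total), the two independent subchannels improve memory access efficiency and reduce latency: a single channel can perform only a read or a write at a time, whereas DDR5's two subchannels allow reads and writes to proceed concurrently.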
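To make the width comparison concrete, here is a small Python sketch that models each DIMM as a list of independent buses, using only the bit counts stated above (the `Bus` class and `summarize` helper are illustrative names, not part of any tool). It shows that both DIMM types carry 64 data bits, while DDR5 splits them over two 40-bit subchannels and therefore carries 16 ECC bits per DIMM instead of 8.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Bus:
    """One independent DIMM bus: data bits plus ECC bits."""
    data_bits: int
    ecc_bits: int

    @property
    def total_bits(self) -> int:
        return self.data_bits + self.ecc_bits


# Bus layouts taken from the paragraph above.
DDR4_DIMM = [Bus(data_bits=64, ecc_bits=8)]      # one 72-bit channel
DDR5_DIMM = [Bus(data_bits=32, ecc_bits=8)] * 2  # two independent 40-bit subchannels


def summarize(buses: list[Bus]) -> dict[str, int]:
    """Totals across all independent buses of one DIMM."""
    return {
        "independent_buses": len(buses),
        "data_bits": sum(b.data_bits for b in buses),
        "ecc_bits": sum(b.ecc_bits for b in buses),
        "total_bits": sum(b.total_bits for b in buses),
    }


print("DDR4:", summarize(DDR4_DIMM))
print("DDR5:", summarize(DDR5_DIMM))
```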

1.2 DDR5 Theoretical Bandwidth

The theoretical bandwidth of Yitian 710 with 2DPC DDR5-4000 is:



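$$
4000\,\text{MHz} \times \frac{32\,\text{bit}}{8} \times 8 \times 2 = 128 \times 10^{9} \times 2\ \text{B/s} = 128\ \text{GB/s} \times 2 = 256\ \text{GB/s}
$$

where 4000 MHz is the effective (equivalent) memory frequency, 32 bit is the subchannel width (divided by 8 to convert bits to bytes), 8 is the number of subchannels per die (4 channels × 2 subchannels), and 2 is the number of dies.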
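The same arithmetic as a hedged Python sketch; `theoretical_bandwidth_gbps` is an illustrative helper, and its inputs are the values in the formula above.

```python
def theoretical_bandwidth_gbps(data_rate_mts: int, subchannel_width_bits: int,
                               subchannels_per_die: int, dies: int) -> float:
    """Peak DRAM bandwidth in GB/s, where 1 GB = 10**9 bytes.

    data_rate_mts: effective ("equivalent") memory frequency in millions of
    transfers per second, e.g. 4000 for DDR5-4000.
    """
    bytes_per_transfer = subchannel_width_bits // 8
    bytes_per_second = (data_rate_mts * 10**6 * bytes_per_transfer
                        * subchannels_per_die * dies)
    return bytes_per_second / 10**9


# 2DPC DDR5-4000: 4 channels x 2 subchannels = 8 subchannels per die, 2 dies.
print(theoretical_bandwidth_gbps(4000, 32, 8, 1))  # per die: 128.0
print(theoretical_bandwidth_gbps(4000, 32, 8, 2))  # whole chip: 256.0
```

Under the same assumptions (8 subchannels per die, 2 dies), `theoretical_bandwidth_gbps(4400, 32, 8, 2)` gives 281.6 GB/s for the 1DPC DDR5-4400 configuration mentioned in section 1; that figure is derived here, not quoted from the article.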

Note the difference between GB and GiB:

1 GB = 1000000000 bytes (= 1000^3 B = 10^9 B)
1 GiB = 1073741824 bytes (= 1024^3 B = 2^30 B)
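A two-line conversion shows how much the choice of unit matters for the peak computed above; the constant and function names are illustrative.

```python
GB = 1000**3   # 10**9 bytes
GiB = 1024**3  # 2**30 bytes


def gb_to_gib(value_gb: float) -> float:
    """Convert a quantity expressed in decimal GB into binary GiB."""
    return value_gb * GB / GiB


print(f"256 GB/s = {gb_to_gib(256):.2f} GiB/s")  # 256 GB/s = 238.42 GiB/s
print(f"1 GiB = {GiB / GB:.6f} GB")
```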

2. Yitian 710 DDRSS PMU

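The Yitian 710 DDRSS (DDR subsystem) implements an independent PMU for every subchannel, for performance and functional debugging; each subchannel PMU provides 16 general-purpose counters.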

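The bandwidth formulas are:

$$
\text{DRAM Read Bandwidth} = \frac{\text{perf\_hif\_rd} \times \text{DDRC\_WIDTH} \times \text{DDRC\_Freq}}{\text{DDRC\_Cycle}}
$$

$$
\text{DRAM Write Bandwidth} = \frac{(\text{perf\_hif\_wr} + \text{perf\_hif\_rmw}) \times \text{DDRC\_WIDTH} \times \text{DDRC\_Freq}}{\text{DDRC\_Cycle}}
$$

DDRC_WIDTH is 64 bytes: each counted command corresponds to one 64-byte transfer. DDRC_Cycle is the number of DDRC clock cycles counted over the measurement interval (the perf_cycle event in section 4), so DDRC_Cycle / DDRC_Freq is the interval length in seconds.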
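The two formulas translate directly into Python. Because DDRC_Cycle / DDRC_Freq is just the length of the sampling window in seconds, the frequency matters only for turning cycles into time. The article does not state the DDRC clock; the sketch plugs in 500 MHz, an assumption inferred from the perf_cycle counts in section 4 (about 500.6 million cycles in a window of roughly 1.0009 s). Function names are illustrative.

```python
DDRC_WIDTH = 64  # bytes per counted HIF command, from the formula above

# Assumption: the article does not state the DDRC clock. The perf_cycle counts in
# section 4 (about 500.6 million in a roughly 1.0009 s window) suggest ~500 MHz.
ASSUMED_DDRC_FREQ_HZ = 500e6


def read_bandwidth(perf_hif_rd: int, ddrc_cycle: int, ddrc_freq_hz: float) -> float:
    """DRAM read bandwidth of one subchannel PMU, in bytes per second."""
    return perf_hif_rd * DDRC_WIDTH * ddrc_freq_hz / ddrc_cycle


def write_bandwidth(perf_hif_wr: int, perf_hif_rmw: int,
                    ddrc_cycle: int, ddrc_freq_hz: float) -> float:
    """DRAM write bandwidth of one subchannel PMU; RMW commands count as writes."""
    return (perf_hif_wr + perf_hif_rmw) * DDRC_WIDTH * ddrc_freq_hz / ddrc_cycle


# ali_drw_23000 counters from section 4.2 (C0M0 rd).
rd_bw = read_bandwidth(40159522, 500613505, ASSUMED_DDRC_FREQ_HZ)
print(f"one subchannel: {rd_bw / 1e6:.1f} MB/s; x8 subchannels: {8 * rd_bw / 1e6:.1f} MB/s")
```

Multiplying one subchannel by 8 mirrors the shortcut used in section 4; summing all eight subchannel counters is the more precise alternative.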

3. Cloud-kernel Support for the DDRSS PMU

```
# lscpu
Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    1
Core(s) per socket:    128
Socket(s):             1
NUMA node(s):          2
...
```

The test machine has 1 socket with 2 dies, which appear as two NUMA nodes.

```
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 0 size: 257416 MB
node 0 free: 187991 MB
node 1 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
node 1 size: 257014 MB
node 1 free: 194504 MB
node distances:
node   0   1
  0:  10  15
  1:  15  10
```

Each NUMA node has 256 GB of memory.

#dmidecode|grep -P -A5 "Memorys+Device"|grep Size|grep -v Range        Size: 32 GB        Size: 32 GB        Size: 32 GB        Size: 32 GB        Size: 32 GB        Size: 32 GB        Size: 32 GB        Size: 32 GB        Size: 32 GB        Size: 32 GB        Size: 32 GB        Size: 32 GB        Size: 32 GB        Size: 32 GB        Size: 32 GB        Size: 32 GB        Size: No Module Installed ...#dmidecode -t memory | grep Speed:        Speed: 4000 MHz        Configured Clock Speed: 4000 MHz

The system is populated 2DPC with 16 DIMMs in total, 8 per die, running at an effective frequency of 4000 MHz.
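As a cross-check of the two listings, the sketch below compares the installed capacity per die with what numactl reports. It assumes that dmidecode's "GB" and numactl's "MB" are binary units (GiB and MiB); the part that numactl does not report is presumably reserved by firmware or the kernel, which this article does not discuss.

```python
# Values copied from the dmidecode and numactl output above. Assumption: both
# tools print binary units, so "32 GB" means 32 GiB and "MB" means MiB.
DIMM_SIZE_GIB = 32
DIMMS_PER_DIE = 8
NUMACTL_SIZE_MIB = {0: 257416, 1: 257014}

installed_mib = DIMMS_PER_DIE * DIMM_SIZE_GIB * 1024  # 8 x 32 GiB per NUMA node

for node, visible_mib in NUMACTL_SIZE_MIB.items():
    hidden = installed_mib - visible_mib
    print(f"node {node}: {installed_mib} MiB installed, {visible_mib} MiB visible, "
          f"{hidden} MiB ({100 * hidden / installed_mib:.2f}%) not reported")
```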

```
# ls /sys/bus/event_source/devices/ | grep drw
ali_drw_21000
ali_drw_21080
ali_drw_23000
ali_drw_23080
ali_drw_25000
ali_drw_25080
ali_drw_27000
ali_drw_27080
ali_drw_40021000
ali_drw_40021080
ali_drw_40023000
ali_drw_40023080
ali_drw_40025000
ali_drw_40025080
ali_drw_40027000
ali_drw_40027080
```

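With all 2DPC slots populated there are 16 PMU devices in total. ali_drw_21000 and ali_drw_21080 are the two subchannels of the same channel on Die 0; the ali_drw_2X000 and ali_drw_2X080 devices belong to Die 0, and the ali_drw_4002X000 and ali_drw_4002X080 devices belong to Die 1.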
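The naming pattern can be decoded mechanically. The sketch below groups the PMU devices found under /sys/bus/event_source/devices by die and by the channel digit X. The regular expression encodes only the pattern described above and is an assumption, not a documented kernel interface; which physical channel each X corresponds to is not stated in the article.

```python
import os
import re
from collections import defaultdict

# Naming pattern inferred from the device list above (an assumption, not a
# documented kernel interface):
#   ali_drw_2X000 / ali_drw_2X080       -> Die 0
#   ali_drw_4002X000 / ali_drw_4002X080 -> Die 1
#   same X, suffix 000 vs 080           -> the two subchannels of one channel
PMU_NAME = re.compile(r"^ali_drw_(?P<die>2|4002)(?P<chan>[0-9a-f])(?P<sub>000|080)$")
SYSFS_PMUS = "/sys/bus/event_source/devices"


def group_pmus(names):
    """Return {die: {X: [subchannel PMU names]}}, ignoring non-DDR PMUs."""
    layout = defaultdict(lambda: defaultdict(list))
    for name in sorted(names):
        m = PMU_NAME.match(name)
        if m is None:
            continue  # e.g. cpu, software, or other uncore PMUs
        die = 0 if m["die"] == "2" else 1
        layout[die][m["chan"]].append(name)
    return layout


if __name__ == "__main__":
    names = os.listdir(SYSFS_PMUS) if os.path.isdir(SYSFS_PMUS) else []
    for die, channels in sorted(group_pmus(names).items()):
        for chan, subchannels in sorted(channels.items()):
            print(f"die {die}, X={chan}: {' + '.join(subchannels)}")
```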

4. DDR Bandwidth Accuracy Verification

4.1 TL;DR

Bandwidth unit: MB/s

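In every case below, the bandwidth derived from the DDR PMU deviates from the bw_mem result by no more than 1%. For the test methodology, see "Yitian 710 Performance Monitoring: CMN Flit Traffic Trace with Watchpoint Event".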
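The sub-1% claim can be recomputed from the figures printed at the end of sections 4.2 to 4.5 (the PMU-derived value from the `>>>` line and the bw_mem result); the dictionary below simply copies those numbers.

```python
# (PMU-derived MB/s, bw_mem MB/s), copied from sections 4.2 to 4.5.
RESULTS = {
    "C0M0 rd": (20561.675, 20507.82),
    "C1M1 rd": (20516.303, 20492.53),
    "C0M0 fwr": (21969.226, 21936.50),
    "C1M1 fwr": (21966.889, 21934.51),
}


def relative_error(pmu_mbps: float, bw_mem_mbps: float) -> float:
    """Deviation of the PMU estimate from bw_mem, as a fraction of bw_mem."""
    return (pmu_mbps - bw_mem_mbps) / bw_mem_mbps


for case, (pmu, ref) in RESULTS.items():
    print(f"{case:<9} PMU {pmu:>10.3f}  bw_mem {ref:>9.2f}  "
          f"error {100 * relative_error(pmu, ref):+.2f}%")
```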

4.2 C0M0 rd

```
# First, run bw_mem as a background workload
# numactl --cpubind=0 --membind=0 ./bw_mem 40960M rd
# Then run the perf command in another console
perf stat \
  -e ali_drw_21000/perf_hif_wr/ \
  -e ali_drw_21000/perf_hif_rd/ \
  -e ali_drw_21000/perf_hif_rmw/ \
  -e ali_drw_21000/perf_cycle/ \
  -e ali_drw_21080/perf_hif_wr/ \
  -e ali_drw_21080/perf_hif_rd/ \
  -e ali_drw_21080/perf_hif_rmw/ \
  -e ali_drw_21080/perf_cycle/ \
  -e ali_drw_23000/perf_hif_wr/ \
  -e ali_drw_23000/perf_hif_rd/ \
  -e ali_drw_23000/perf_hif_rmw/ \
  -e ali_drw_23000/perf_cycle/ \
  -e ali_drw_23080/perf_hif_wr/ \
  -e ali_drw_23080/perf_hif_rd/ \
  -e ali_drw_23080/perf_hif_rmw/ \
  -e ali_drw_23080/perf_cycle/ \
  -e ali_drw_25000/perf_hif_wr/ \
  -e ali_drw_25000/perf_hif_rd/ \
  -e ali_drw_25000/perf_hif_rmw/ \
  -e ali_drw_25000/perf_cycle/ \
  -e ali_drw_25080/perf_hif_wr/ \
  -e ali_drw_25080/perf_hif_rd/ \
  -e ali_drw_25080/perf_hif_rmw/ \
  -e ali_drw_25080/perf_cycle/ \
  -e ali_drw_27000/perf_hif_wr/ \
  -e ali_drw_27000/perf_hif_rd/ \
  -e ali_drw_27000/perf_hif_rmw/ \
  -e ali_drw_27000/perf_cycle/ \
  -e ali_drw_27080/perf_hif_wr/ \
  -e ali_drw_27080/perf_hif_rd/ \
  -e ali_drw_27080/perf_hif_rmw/ \
  -e ali_drw_27080/perf_cycle/ \
  -a -- sleep 1

 Performance counter stats for "system wide":

            12398      ali_drw_21000/perf_hif_wr/
         40160751      ali_drw_21000/perf_hif_rd/
              743      ali_drw_21000/perf_hif_rmw/
        500620725      ali_drw_21000/perf_cycle/
            12252      ali_drw_21080/perf_hif_wr/
         40161013      ali_drw_21080/perf_hif_rd/
              767      ali_drw_21080/perf_hif_rmw/
        500619340      ali_drw_21080/perf_cycle/
            11960      ali_drw_23000/perf_hif_wr/
         40159522      ali_drw_23000/perf_hif_rd/
              737      ali_drw_23000/perf_hif_rmw/
        500613505      ali_drw_23000/perf_cycle/
            12044      ali_drw_23080/perf_hif_wr/
         40159066      ali_drw_23080/perf_hif_rd/
              773      ali_drw_23080/perf_hif_rmw/
        500607620      ali_drw_23080/perf_cycle/
            12698      ali_drw_25000/perf_hif_wr/
         40160138      ali_drw_25000/perf_hif_rd/
              709      ali_drw_25000/perf_hif_rmw/
        500601240      ali_drw_25000/perf_cycle/
            12521      ali_drw_25080/perf_hif_wr/
         40160169      ali_drw_25080/perf_hif_rd/
              727      ali_drw_25080/perf_hif_rmw/
        500594755      ali_drw_25080/perf_cycle/
            12171      ali_drw_27000/perf_hif_wr/
         40159404      ali_drw_27000/perf_hif_rd/
              706      ali_drw_27000/perf_hif_rmw/
        500589945      ali_drw_27000/perf_cycle/
            12290      ali_drw_27080/perf_hif_wr/
         40157620      ali_drw_27080/perf_hif_rd/
              710      ali_drw_27080/perf_hif_rmw/
        500583305      ali_drw_27080/perf_cycle/

       1.000923276 seconds time elapsed

>>> 40159522*8*64/1000/1000.0
20561.675

# set CPU and memory to the same NUMA node
numactl --cpubind=0 --membind=0 ./bw_mem 40960M rd
40960.00 20507.82
```
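Typing 32 `-e` options by hand is error-prone. This sketch rebuilds the same perf stat command from the PMU names listed in section 3; the helper name is illustrative, and it only prints the command rather than running it.

```python
import shlex

EVENTS = ("perf_hif_wr", "perf_hif_rd", "perf_hif_rmw", "perf_cycle")

# Subchannel PMU names as listed in /sys/bus/event_source/devices/ in section 3.
DIE0_PMUS = [f"ali_drw_2{x}{sub}" for x in "1357" for sub in ("000", "080")]
DIE1_PMUS = [f"ali_drw_4002{x}{sub}" for x in "1357" for sub in ("000", "080")]


def perf_stat_argv(pmus, seconds=1):
    """Build the system-wide perf stat command used in sections 4.2 to 4.5."""
    argv = ["perf", "stat"]
    for pmu in pmus:
        for event in EVENTS:
            argv += ["-e", f"{pmu}/{event}/"]
    return argv + ["-a", "--", "sleep", str(seconds)]


# Print the Die 0 command; pass the list to subprocess.run() to execute it.
print(shlex.join(perf_stat_argv(DIE0_PMUS)))
```

Swapping in `DIE1_PMUS` produces the command used for the C1M1 runs.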

4.3 C1M1 rd

```
# First, run bw_mem as a background workload
# numactl --cpubind=1 --membind=1 ./bw_mem 40960M rd
# Then run the perf command in another console
perf stat \
  -e ali_drw_40021000/perf_hif_wr/ \
  -e ali_drw_40021000/perf_hif_rd/ \
  -e ali_drw_40021000/perf_hif_rmw/ \
  -e ali_drw_40021000/perf_cycle/ \
  -e ali_drw_40021080/perf_hif_wr/ \
  -e ali_drw_40021080/perf_hif_rd/ \
  -e ali_drw_40021080/perf_hif_rmw/ \
  -e ali_drw_40021080/perf_cycle/ \
  -e ali_drw_40023000/perf_hif_wr/ \
  -e ali_drw_40023000/perf_hif_rd/ \
  -e ali_drw_40023000/perf_hif_rmw/ \
  -e ali_drw_40023000/perf_cycle/ \
  -e ali_drw_40023080/perf_hif_wr/ \
  -e ali_drw_40023080/perf_hif_rd/ \
  -e ali_drw_40023080/perf_hif_rmw/ \
  -e ali_drw_40023080/perf_cycle/ \
  -e ali_drw_40025000/perf_hif_wr/ \
  -e ali_drw_40025000/perf_hif_rd/ \
  -e ali_drw_40025000/perf_hif_rmw/ \
  -e ali_drw_40025000/perf_cycle/ \
  -e ali_drw_40025080/perf_hif_wr/ \
  -e ali_drw_40025080/perf_hif_rd/ \
  -e ali_drw_40025080/perf_hif_rmw/ \
  -e ali_drw_40025080/perf_cycle/ \
  -e ali_drw_40027000/perf_hif_wr/ \
  -e ali_drw_40027000/perf_hif_rd/ \
  -e ali_drw_40027000/perf_hif_rmw/ \
  -e ali_drw_40027000/perf_cycle/ \
  -e ali_drw_40027080/perf_hif_wr/ \
  -e ali_drw_40027080/perf_hif_rd/ \
  -e ali_drw_40027080/perf_hif_rmw/ \
  -e ali_drw_40027080/perf_cycle/ \
  -a -- sleep 1

 Performance counter stats for "system wide":

             2329      ali_drw_40021000/perf_hif_wr/
         40071983      ali_drw_40021000/perf_hif_rd/
               58      ali_drw_40021000/perf_hif_rmw/
        500572165      ali_drw_40021000/perf_cycle/
             2374      ali_drw_40021080/perf_hif_wr/
         40071737      ali_drw_40021080/perf_hif_rd/
               39      ali_drw_40021080/perf_hif_rmw/
        500569615      ali_drw_40021080/perf_cycle/
             2330      ali_drw_40023000/perf_hif_wr/
         40071063      ali_drw_40023000/perf_hif_rd/
               74      ali_drw_40023000/perf_hif_rmw/
        500565635      ali_drw_40023000/perf_cycle/
             2372      ali_drw_40023080/perf_hif_wr/
         40070344      ali_drw_40023080/perf_hif_rd/
               54      ali_drw_40023080/perf_hif_rmw/
        500561355      ali_drw_40023080/perf_cycle/
             2362      ali_drw_40025000/perf_hif_wr/
         40070906      ali_drw_40025000/perf_hif_rd/
               45      ali_drw_40025000/perf_hif_rmw/
        500557480      ali_drw_40025000/perf_cycle/
             2385      ali_drw_40025080/perf_hif_wr/
         40070168      ali_drw_40025080/perf_hif_rd/
               46      ali_drw_40025080/perf_hif_rmw/
        500552550      ali_drw_40025080/perf_cycle/
             2333      ali_drw_40027000/perf_hif_wr/
         40069233      ali_drw_40027000/perf_hif_rd/
               28      ali_drw_40027000/perf_hif_rmw/
        500548745      ali_drw_40027000/perf_cycle/
             2211      ali_drw_40027080/perf_hif_wr/
         40068227      ali_drw_40027080/perf_hif_rd/
               30      ali_drw_40027080/perf_hif_rmw/
        500544450      ali_drw_40027080/perf_cycle/

       1.000863258 seconds time elapsed

>>> 40070906*8*64/1000/1000.0
20516.303

numactl --cpubind=1 --membind=1 ./bw_mem 40960M rd
40960.00 20492.53
```
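The `>>>` line above multiplies a single subchannel's counter by 8. As a sketch of the alternative, the parser below extracts every counter from perf stat text and sums the read commands over all eight subchannels. Its regular expression is an assumption that fits the plain-number output shown here, and the sample is an excerpt of the output above.

```python
import re

# A counter line, e.g. "  40071983      ali_drw_40021000/perf_hif_rd/".
# Assumes counts are printed without thousands separators, as in the output above.
COUNTER_LINE = re.compile(r"^\s*(\d+)\s+([^/\s]+)/([^/\s]+)/\s*$")


def parse_perf_stat(text):
    """Return {(pmu, event): count} for every counter line in perf stat output."""
    counts = {}
    for line in text.splitlines():
        m = COUNTER_LINE.match(line)
        if m:
            counts[(m.group(2), m.group(3))] = int(m.group(1))
    return counts


# Excerpt of the C1M1 rd output above: one write counter plus all eight read counters.
SAMPLE = """
             2329      ali_drw_40021000/perf_hif_wr/
         40071983      ali_drw_40021000/perf_hif_rd/
         40071737      ali_drw_40021080/perf_hif_rd/
         40071063      ali_drw_40023000/perf_hif_rd/
         40070344      ali_drw_40023080/perf_hif_rd/
         40070906      ali_drw_40025000/perf_hif_rd/
         40070168      ali_drw_40025080/perf_hif_rd/
         40069233      ali_drw_40027000/perf_hif_rd/
         40068227      ali_drw_40027080/perf_hif_rd/
"""

counts = parse_perf_stat(SAMPLE)
rd_total = sum(c for (pmu, event), c in counts.items() if event == "perf_hif_rd")

# Both figures assume a 1 s window and 64 bytes per command, like the >>> line.
print(f"sum of 8 subchannels: {rd_total * 64 / 1e6:.3f} MB/s")
print(f"ali_drw_40025000 x 8: {counts[('ali_drw_40025000', 'perf_hif_rd')] * 8 * 64 / 1e6:.3f} MB/s")
```

For this run the two figures differ by about 0.23 MB/s, so the shortcut is harmless here.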

4.4 C0M0 fwr

```
# First, run bw_mem as a background workload
# numactl --cpubind=0 --membind=0 ./bw_mem 40960M fwr
# Then run the perf command in another console
perf stat \
  -e ali_drw_21000/perf_hif_wr/ \
  -e ali_drw_21000/perf_hif_rd/ \
  -e ali_drw_21000/perf_hif_rmw/ \
  -e ali_drw_21000/perf_cycle/ \
  -e ali_drw_21080/perf_hif_wr/ \
  -e ali_drw_21080/perf_hif_rd/ \
  -e ali_drw_21080/perf_hif_rmw/ \
  -e ali_drw_21080/perf_cycle/ \
  -e ali_drw_23000/perf_hif_wr/ \
  -e ali_drw_23000/perf_hif_rd/ \
  -e ali_drw_23000/perf_hif_rmw/ \
  -e ali_drw_23000/perf_cycle/ \
  -e ali_drw_23080/perf_hif_wr/ \
  -e ali_drw_23080/perf_hif_rd/ \
  -e ali_drw_23080/perf_hif_rmw/ \
  -e ali_drw_23080/perf_cycle/ \
  -e ali_drw_25000/perf_hif_wr/ \
  -e ali_drw_25000/perf_hif_rd/ \
  -e ali_drw_25000/perf_hif_rmw/ \
  -e ali_drw_25000/perf_cycle/ \
  -e ali_drw_25080/perf_hif_wr/ \
  -e ali_drw_25080/perf_hif_rd/ \
  -e ali_drw_25080/perf_hif_rmw/ \
  -e ali_drw_25080/perf_cycle/ \
  -e ali_drw_27000/perf_hif_wr/ \
  -e ali_drw_27000/perf_hif_rd/ \
  -e ali_drw_27000/perf_hif_rmw/ \
  -e ali_drw_27000/perf_cycle/ \
  -e ali_drw_27080/perf_hif_wr/ \
  -e ali_drw_27080/perf_hif_rd/ \
  -e ali_drw_27080/perf_hif_rmw/ \
  -e ali_drw_27080/perf_cycle/ \
  -a -- sleep 1

 Performance counter stats for "system wide":

         42910737      ali_drw_21000/perf_hif_wr/
           108397      ali_drw_21000/perf_hif_rd/
              495      ali_drw_21000/perf_hif_rmw/
        500708510      ali_drw_21000/perf_cycle/
         42911223      ali_drw_21080/perf_hif_wr/
           117280      ali_drw_21080/perf_hif_rd/
              515      ali_drw_21080/perf_hif_rmw/
        500706780      ali_drw_21080/perf_cycle/
         42910038      ali_drw_23000/perf_hif_wr/
           109179      ali_drw_23000/perf_hif_rd/
              516      ali_drw_23000/perf_hif_rmw/
        500702100      ali_drw_23000/perf_cycle/
         42911620      ali_drw_23080/perf_hif_wr/
           111038      ali_drw_23080/perf_hif_rd/
              523      ali_drw_23080/perf_hif_rmw/
        500697340      ali_drw_23080/perf_cycle/
         42910435      ali_drw_25000/perf_hif_wr/
           111748      ali_drw_25000/perf_hif_rd/
              469      ali_drw_25000/perf_hif_rmw/
        500692500      ali_drw_25000/perf_cycle/
         42908786      ali_drw_25080/perf_hif_wr/
           110177      ali_drw_25080/perf_hif_rd/
              456      ali_drw_25080/perf_hif_rmw/
        500686595      ali_drw_25080/perf_cycle/
         42908903      ali_drw_27000/perf_hif_wr/
           114093      ali_drw_27000/perf_hif_rd/
              490      ali_drw_27000/perf_hif_rmw/
        500681405      ali_drw_27000/perf_cycle/
         42908156      ali_drw_27080/perf_hif_wr/
           109668      ali_drw_27080/perf_hif_rd/
              489      ali_drw_27080/perf_hif_rmw/
        500676420      ali_drw_27080/perf_cycle/

       1.001100811 seconds time elapsed

>>> (42908156+489)*8*64/1000/1000.0
21969.226

numactl --cpubind=0 --membind=0 ./bw_mem 40960M fwr
40960.00 21936.50
```
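The shortcut of multiplying one subchannel by 8 only works if traffic is spread evenly across subchannels. The sketch below checks that for the C0M0 fwr counters above, counting perf_hif_rmw as writes as the write-bandwidth formula does; the variable names are illustrative.

```python
from statistics import mean

# (perf_hif_wr, perf_hif_rmw) per subchannel PMU, copied from the C0M0 fwr run above.
COUNTERS = {
    "ali_drw_21000": (42910737, 495),
    "ali_drw_21080": (42911223, 515),
    "ali_drw_23000": (42910038, 516),
    "ali_drw_23080": (42911620, 523),
    "ali_drw_25000": (42910435, 469),
    "ali_drw_25080": (42908786, 456),
    "ali_drw_27000": (42908903, 490),
    "ali_drw_27080": (42908156, 489),
}

# Write commands per subchannel, with RMW counted as writes as in the formula.
writes = {pmu: wr + rmw for pmu, (wr, rmw) in COUNTERS.items()}
spread = (max(writes.values()) - min(writes.values())) / mean(writes.values())

print(f"die write bandwidth (sum, 1 s window): {sum(writes.values()) * 64 / 1e6:.3f} MB/s")
print(f"max-min spread across subchannels: {100 * spread:.4f}%")
```

Here the busiest and least busy subchannels differ by well under 0.01%; a less uniform workload might not behave this way, in which case summing all eight counters is the safer choice.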

4.5 C1M1 fwr

```
# First, run bw_mem as a background workload
# numactl --cpubind=1 --membind=1 ./bw_mem 40960M fwr
# Then run the perf command in another console
perf stat \
  -e ali_drw_40021000/perf_hif_wr/ \
  -e ali_drw_40021000/perf_hif_rd/ \
  -e ali_drw_40021000/perf_hif_rmw/ \
  -e ali_drw_40021000/perf_cycle/ \
  -e ali_drw_40021080/perf_hif_wr/ \
  -e ali_drw_40021080/perf_hif_rd/ \
  -e ali_drw_40021080/perf_hif_rmw/ \
  -e ali_drw_40021080/perf_cycle/ \
  -e ali_drw_40023000/perf_hif_wr/ \
  -e ali_drw_40023000/perf_hif_rd/ \
  -e ali_drw_40023000/perf_hif_rmw/ \
  -e ali_drw_40023000/perf_cycle/ \
  -e ali_drw_40023080/perf_hif_wr/ \
  -e ali_drw_40023080/perf_hif_rd/ \
  -e ali_drw_40023080/perf_hif_rmw/ \
  -e ali_drw_40023080/perf_cycle/ \
  -e ali_drw_40025000/perf_hif_wr/ \
  -e ali_drw_40025000/perf_hif_rd/ \
  -e ali_drw_40025000/perf_hif_rmw/ \
  -e ali_drw_40025000/perf_cycle/ \
  -e ali_drw_40025080/perf_hif_wr/ \
  -e ali_drw_40025080/perf_hif_rd/ \
  -e ali_drw_40025080/perf_hif_rmw/ \
  -e ali_drw_40025080/perf_cycle/ \
  -e ali_drw_40027000/perf_hif_wr/ \
  -e ali_drw_40027000/perf_hif_rd/ \
  -e ali_drw_40027000/perf_hif_rmw/ \
  -e ali_drw_40027000/perf_cycle/ \
  -e ali_drw_40027080/perf_hif_wr/ \
  -e ali_drw_40027080/perf_hif_rd/ \
  -e ali_drw_40027080/perf_hif_rmw/ \
  -e ali_drw_40027080/perf_cycle/ \
  -a -- sleep 1

 Performance counter stats for "system wide":

         42906048      ali_drw_40021000/perf_hif_wr/
            33939      ali_drw_40021000/perf_hif_rd/
               76      ali_drw_40021000/perf_hif_rmw/
        500629355      ali_drw_40021000/perf_cycle/
         42905967      ali_drw_40021080/perf_hif_wr/
            34018      ali_drw_40021080/perf_hif_rd/
               63      ali_drw_40021080/perf_hif_rmw/
        500631900      ali_drw_40021080/perf_cycle/
         42905422      ali_drw_40023000/perf_hif_wr/
            33843      ali_drw_40023000/perf_hif_rd/
               75      ali_drw_40023000/perf_hif_rmw/
        500628540      ali_drw_40023000/perf_cycle/
         42905547      ali_drw_40023080/perf_hif_wr/
            33858      ali_drw_40023080/perf_hif_rd/
               68      ali_drw_40023080/perf_hif_rmw/
        500623970      ali_drw_40023080/perf_cycle/
         42905230      ali_drw_40025000/perf_hif_wr/
            34028      ali_drw_40025000/perf_hif_rd/
               56      ali_drw_40025000/perf_hif_rmw/
        500620630      ali_drw_40025000/perf_cycle/
         42904734      ali_drw_40025080/perf_hif_wr/
            34141      ali_drw_40025080/perf_hif_rd/
               61      ali_drw_40025080/perf_hif_rmw/
        500615840      ali_drw_40025080/perf_cycle/
         42903390      ali_drw_40027000/perf_hif_wr/
            33712      ali_drw_40027000/perf_hif_rd/
               84      ali_drw_40027000/perf_hif_rmw/
        500610635      ali_drw_40027000/perf_cycle/
         42903975      ali_drw_40027080/perf_hif_wr/
            33916      ali_drw_40027080/perf_hif_rd/
              106      ali_drw_40027080/perf_hif_rmw/
        500606645      ali_drw_40027080/perf_cycle/

       1.000953335 seconds time elapsed

>>> (42903975+106)*8*64/1000/1000.0
21966.889

numactl --cpubind=1 --membind=1 ./bw_mem 40960M fwr
40960.00 21934.51
```
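The `>>>` calculations implicitly assume a 1 s window, while perf reports 1.000953335 s elapsed for this run. As a hedged refinement that the article itself does not make, the sketch below rescales the C1M1 fwr estimate by the reported elapsed time (treated here as an approximation of the counting window) and also expresses it as a share of the 128 GB/s per-die peak from the theoretical bandwidth section.

```python
# Figures copied from the C1M1 fwr run above and from the theoretical bandwidth section.
PMU_ESTIMATE_MBPS = (42903975 + 106) * 8 * 64 / 1e6  # the >>> calculation, assumes 1 s
ELAPSED_S = 1.000953335        # "seconds time elapsed" reported by perf
BW_MEM_MBPS = 21934.51         # bw_mem result
PEAK_PER_DIE_MBPS = 128_000    # 128 GB/s per die, decimal units

# Assumption: perf's elapsed time approximates the window the counters covered.
normalized = PMU_ESTIMATE_MBPS / ELAPSED_S


def error_pct(value):
    """Deviation from the bw_mem result, in percent."""
    return 100 * (value - BW_MEM_MBPS) / BW_MEM_MBPS


print(f"1 s assumed:    {PMU_ESTIMATE_MBPS:.3f} MB/s ({error_pct(PMU_ESTIMATE_MBPS):+.3f}% vs bw_mem)")
print(f"elapsed-scaled: {normalized:.3f} MB/s ({error_pct(normalized):+.3f}% vs bw_mem)")
print(f"share of the per-die peak: {100 * normalized / PEAK_PER_DIE_MBPS:.1f}%")
```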
