GTPin
The Simdprof tool counts the effective number of SIMD operations executed by the kernel.
To run the Simdprof tool in its default configuration, use the following command:
Profilers/Bin/gtpin -t simdprof -- app
A number of factors affect how many operations a single instruction executes. Some of these factors can be evaluated statically, by analyzing instruction attributes and operands. Others depend on the runtime architectural state, and evaluating them requires dynamic calculations during the execution of the application and/or kernel.
The following figure shows a typical GEN instruction:
The following table describes the input parameters of the SIMD counting algorithm, as well as the methods used to collect them.
Once all input parameters are collected, the tool can compute the SIMD operation count for a particular instruction. The computation is done by the COUNT_SIMD_OPS procedure, as shown in the following pseudocode:
The Simdprof tool counts the dynamic number of operations performed by the EUs of the graphics device (essentially the number of active SIMD channels). When you run the in-house GTPin Simdprof tool in the default configuration, the tool generates the directory GTPIN_PROFILE_SIMDPROF0. Profiling results are stored in the file GTPIN_PROFILE_SIMDPROF0\Session_Final\simdprof.out, which has the following format:
Channels (SIMD operations) executed by kernels/BBLs
====================================================
----------------------------------------------------------------------------------------------------
BitonicSort___CS_asm6b96b239a92a0daa_simd32
BBL  Head Ins ID  Tail Ins ID  Channels
  0            0           13     55136
  1           14           16     16272
  2           17           94    280080
  3           95           95         0
  4           96           99     13824
  5          100          100         0
  6          101          104      9216
  7          105          105         0
  8          106          106       360
  9          107          129     32400
 10          130          130         0
 11          131          148     23040
 12          149          149         0
 13          150          167     18432
 14          168          168         0
 15          169          169       504
 16          170          182      2560
 17          183          191      2064
 18          192          199         0
 19          200          207      1024
 20          208          208         0
 21          209          226      2304
 22          227          227         0
 23          228          245      2304
 24          246          246         0
 25          247          247      4160
 26          248          248      4160
Total                            467840
Total number of kernels: 1
Total number of channels (SIMD operations): 40337408000
For each kernel/shader, the data is broken down by basic block (BBL). For each BBL, the output lists its ID, the ID of its head (first) instruction, the ID of its tail (last) instruction, and the dynamic number of active channels summed over all instructions executed in this BBL.
To find the instructions that make up a specific BBL, look the BBL up in the assembly dump of the corresponding kernel, which is saved in the folder GTPIN_PROFILE_SIMDPROF0\ASM. For example:
// kernel name: BitonicSort
// BBL0
[  0] (W) mov (8|M0) r100.0<1>:ud r0.0<1;1,0>:ud
[  1] (W) or (1|M0) cr0.0<1>:ud cr0.0<0;1,0>:ud 0x4C0:uw {Switch}
[  2] (W) mul (1|M0) r8.0<1>:d r9.0<0;1,0>:d r100.1<0;1,0>:d {Compacted}
[  3] (W) cmp (16|M0) (eq)f1.0 null<1>:d r8.2<0;1,0>:d 0:w
[  4] (W) cmp (16|M16) (eq)f1.0 null<1>:d r8.2<0;1,0>:d 0:w
[  5] add (8|M0) r3.0<1>:q r1.0<8;8,1>:uw r8.0<0;1,0>:ud
[  6] add (8|M8) r5.0<1>:q r1.8<8;8,1>:uw r8.0<0;1,0>:ud
[  7] add (8|M16) r11.0<1>:q r2.0<8;8,1>:uw r8.0<0;1,0>:ud
[  8] add (8|M24) r9.0<1>:q r2.8<8;8,1>:uw r8.0<0;1,0>:ud
[  9] add (8|M0) r60.0<1>:q r3.0<4;4,1>:q r7.0<0;1,0>:ud
[ 10] add (8|M8) r58.0<1>:q r5.0<4;4,1>:q r7.0<0;1,0>:ud
[ 11] add (8|M16) r4.0<1>:q r11.0<4;4,1>:q r7.0<0;1,0>:ud
[ 12] add (8|M24) r2.0<1>:q r9.0<4;4,1>:q r7.0<0;1,0>:ud
[ 13] (W&f1.0) jmpi 2296
// BBL1
[ 14] (W) cmp (16|M0) (eq)f0.0 null<1>:d r8.3<0;1,0>:d 0:w
[ 15] (W) cmp (16|M16) (eq)f0.0 null<1>:d r8.3<0;1,0>:d 0:w
[ 16] (W&f0.0) jmpi 1376
// BBL2
[ 17] (W) add (1|M0) r8.0<1>:d r8.3<0;1,0>:d 31:w
[ 18] (W) mov (1|M0) r6.0<1>:w 1:w
[ 19] (W) add (1|M0) r8.7<1>:d r8.3<0;1,0>:d 63:w
[ 20] (W) and (1|M0) r8.6<1>:d r8.3<0;1,0>:d 63:w
[ 21] (W) and (1|M0) r8.1<1>:d r8.0<0;1,0>:d 31:w
[ 22] (W) and (1|M0) r8.0<1>:d r8.7<0;1,0>:d 63:w
[ 23] (W) shl (1|M0) r8.3<1>:d r6.0<0;1,0>:w r8.1<0;1,0>:d
[ 24] shr (8|M0) r6.0<1>:q r60.0<4;4,1>:uq r8.0<0;1,0>:ud
[ 25] shr (8|M8) r9.0<1>:q r58.0<4;4,1>:uq r8.0<0;1,0>:ud
[ 26] shr (8|M16) r11.0<1>:q r4.0<4;4,1>:uq r8.0<0;1,0>:ud
[ 27] shr (8|M24) r13.0<1>:q r2.0<4;4,1>:uq r8.0<0;1,0>:ud
[ 28] (W) add (1|M0) r8.0<1>:q r8.3<0;1,0>:d -1:w
[ 29] shl (8|M0) r25.0<1>:q r6.0<4;4,1>:q r8.6<0;1,0>:ud
[ 30] shl (8|M8) r23.0<1>:q r9.0<4;4,1>:q r8.6<0;1,0>:ud
[ 31] shl (8|M16) r21.0<1>:q r11.0<4;4,1>:q r8.6<0;1,0>:ud
[ 32] shl (8|M24) r19.0<1>:q r13.0<4;4,1>:q r8.6<0;1,0>:ud
[ 33] and (8|M0) r6.0<1>:q r60.0<4;4,1>:q r8.0<0;1,0>:q
[ 34] and (8|M8) r9.0<1>:q r58.0<4;4,1>:q r8.0<0;1,0>:q
[ 35] and (8|M16) r15.0<1>:q r4.0<4;4,1>:q r8.0<0;1,0>:q
[ 36] and (8|M24) r17.0<1>:q r2.0<4;4,1>:q r8.0<0;1,0>:q
[ 37] add (8|M0) r13.0<1>:q r25.0<4;4,1>:q r6.0<4;4,1>:q
[ 38] add (8|M8) r11.0<1>:q r23.0<4;4,1>:q r9.0<4;4,1>:q
[ 39] add (8|M16) r9.0<1>:q r21.0<4;4,1>:q r15.0<4;4,1>:q
[ 40] add (8|M24) r6.0<1>:q r19.0<4;4,1>:q r17.0<4;4,1>:q
[ 41] (W) add (1|M0) r8.0<1>:d r8.2<0;1,0>:d 63:w
[ 42] add (8|M0) r15.0<2>:d r13.0<4;4,1>:q r8.3<0;1,0>:d
[ 43] add (8|M8) r19.0<2>:d r11.0<4;4,1>:q r8.3<0;1,0>:d
[ 44] add (8|M16) r17.0<2>:d r9.0<4;4,1>:q r8.3<0;1,0>:d
[ 45] add (8|M24) r21.0<2>:d r6.0<4;4,1>:q r8.3<0;1,0>:d
[ 46] mov (8|M0) r62.0<1>:d r13.0<2;1,0>:d
[ 47] mov (8|M8) r63.0<1>:d r11.0<2;1,0>:d
[ 48] mov (8|M16) r64.0<1>:d r9.0<2;1,0>:d
[ 49] mov (8|M24) r65.0<1>:d r6.0<2;1,0>:d
[ 50] mov (8|M0) r6.0<1>:d r15.0<2;1,0>:d
[ 51] mov (8|M8) r7.0<1>:d r19.0<2;1,0>:d
[ 52] mov (8|M16) r66.0<1>:d r17.0<2;1,0>:d
[ 53] mov (8|M24) r67.0<1>:d r21.0<2;1,0>:d
[ 54] shl (16|M0) r62.0<1>:d r62.0<8;8,1>:d 4:w
[ 55] shl (16|M16) r64.0<1>:d r64.0<8;8,1>:d 4:w
[ 56] shl (16|M0) r6.0<1>:d r6.0<8;8,1>:d 4:w
[ 57] shl (16|M16) r66.0<1>:d r66.0<8;8,1>:d 4:w
[ 58] add (16|M0) r62.0<1>:d r62.0<8;8,1>:d r8.5<0;1,0>:d {Compacted}
[ 59] add (16|M16) r64.0<1>:d r64.0<8;8,1>:d r8.5<0;1,0>:d
[ 60] add (16|M0) r6.0<1>:d r6.0<8;8,1>:d r8.5<0;1,0>:d {Compacted}
[ 61] add (16|M16) r66.0<1>:d r66.0<8;8,1>:d r8.5<0;1,0>:d
[ 62] send (16|M0) r18:w r62 0xC 0x4805000
[ 63] send (16|M16) r50:w r64 0xC 0x4805000
[ 64] send (16|M0) r10:w r6 0xC 0x4805000
[ 65] send (16|M16) r42:w r66 0xC 0x4805000
[ 66] (W) and (1|M0) r8.0<1>:d r8.0<0;1,0>:d 63:w
[ 67] shr (8|M0) r26.0<1>:q r60.0<4;4,1>:uq r8.0<0;1,0>:ud
[ 68] shr (8|M8) r28.0<1>:q r58.0<4;4,1>:uq r8.0<0;1,0>:ud
[ 69] shr (8|M16) r34.0<1>:q r4.0<4;4,1>:uq r8.0<0;1,0>:ud
[ 70] shr (8|M24) r36.0<1>:q r2.0<4;4,1>:uq r8.0<0;1,0>:ud
[ 71] and (8|M0) r32.0<1>:q r26.0<4;4,1>:q 1:w
[ 72] and (8|M8) r30.0<1>:q r28.0<4;4,1>:q 1:w
[ 73] and (8|M16) r28.0<1>:q r34.0<4;4,1>:q 1:w
[ 74] and (8|M24) r26.0<1>:q r36.0<4;4,1>:q 1:w
[ 75] cmp (8|M0) (eq)f0.0 null<1>:q r32.0<4;4,1>:q r8.4<0;1,0>:ud
[ 76] cmp (8|M8) (eq)f0.0 null<1>:q r30.0<4;4,1>:q r8.4<0;1,0>:ud
[ 77] cmp (8|M16) (eq)f0.0 null<1>:q r28.0<4;4,1>:q r8.4<0;1,0>:ud
[ 78] cmp (8|M24) (eq)f0.0 null<1>:q r26.0<4;4,1>:q r8.4<0;1,0>:ud
[ 79] sel (16|M0) (lt)f0.0 r34.0<1>:d r18.0<8;8,1>:d r10.0<8;8,1>:d {Compacted}
[ 80] sel (16|M0) (lt)f0.0 r36.0<1>:d r20.0<8;8,1>:d r12.0<8;8,1>:d {Compacted}
[ 81] sel (16|M0) (lt)f0.0 r38.0<1>:d r22.0<8;8,1>:d r14.0<8;8,1>:d {Compacted}
[ 82] sel (16|M0) (lt)f0.0 r40.0<1>:d r24.0<8;8,1>:d r16.0<8;8,1>:d {Compacted}
[ 83] sel (16|M0) (ge)f0.0 r26.0<1>:d r10.0<8;8,1>:d r18.0<8;8,1>:d {Compacted}
[ 84] sel (16|M0) (ge)f0.0 r28.0<1>:d r12.0<8;8,1>:d r20.0<8;8,1>:d {Compacted}
[ 85] sel (16|M0) (ge)f0.0 r30.0<1>:d r14.0<8;8,1>:d r22.0<8;8,1>:d {Compacted}
[ 86] sel (16|M0) (ge)f0.0 r32.0<1>:d r16.0<8;8,1>:d r24.0<8;8,1>:d {Compacted}
[ 87] sel (16|M16) (lt)f0.0 r17.0<1>:d r50.0<8;8,1>:d r42.0<8;8,1>:d
[ 88] sel (16|M16) (lt)f0.0 r19.0<1>:d r52.0<8;8,1>:d r44.0<8;8,1>:d
[ 89] sel (16|M16) (lt)f0.0 r21.0<1>:d r54.0<8;8,1>:d r46.0<8;8,1>:d
[ 90] sel (16|M16) (lt)f0.0 r23.0<1>:d r56.0<8;8,1>:d r48.0<8;8,1>:d
[ 91] sel (16|M16) (ge)f0.0 r9.0<1>:d r42.0<8;8,1>:d r50.0<8;8,1>:d
[ 92] sel (16|M16) (ge)f0.0 r11.0<1>:d r44.0<8;8,1>:d r52.0<8;8,1>:d
[ 93] sel (16|M16) (ge)f0.0 r13.0<1>:d r46.0<8;8,1>:d r54.0<8;8,1>:d
[ 94] sel (16|M16) (ge)f0.0 r15.0<1>:d r48.0<8;8,1>:d r56.0<8;8,1>:d
// BBL3
[ 95] (~f0.0) if (32|M0) 96 160
// BBL4
[ 96] sends (16|M0) null:w r62 r34 0x20C 0x4025000
[ 97] sends (16|M16) null:w r64 r17 0x20C 0x4025000
[ 98] sends (16|M0) null:w r6 r26 0x20C 0x4025000
[ 99] sends (16|M16) null:w r66 r9 0x20C 0x4025000
// BBL5
[100] else (32|M0) 80 80
// BBL6
[101] sends (16|M0) null:w r6 r34 0x20C 0x4025000
[102] sends (16|M16) null:w r66 r17 0x20C 0x4025000
[103] sends (16|M0) null:w r62 r26 0x20C 0x4025000
[104] sends (16|M16) null:w r64 r9 0x20C 0x4025000
// BBL7
[105] endif (32|M0) 16
// BBL8
[106] (W) jmpi 872
// BBL9
[107] mov (8|M0) r6.0<1>:d r60.0<2;1,0>:d
[108] mov (8|M8) r7.0<1>:d r58.0<2;1,0>:d
[109] mov (8|M16) r42.0<1>:d r4.0<2;1,0>:d
[110] mov (8|M24) r43.0<1>:d r2.0<2;1,0>:d
[111] (W) and (1|M0) r8.0<1>:d r8.2<0;1,0>:d 63:w
[112] shl (16|M0) r6.0<1>:d r6.0<8;8,1>:d 4:w
[113] shl (16|M16) r42.0<1>:d r42.0<8;8,1>:d 4:w
[114] shr (8|M0) r9.0<1>:q r60.0<4;4,1>:uq r8.0<0;1,0>:ud
[115] shr (8|M8) r11.0<1>:q r58.0<4;4,1>:uq r8.0<0;1,0>:ud
[116] shr (8|M16) r18.0<1>:q r4.0<4;4,1>:uq r8.0<0;1,0>:ud
[117] add (16|M0) r6.0<1>:d r6.0<8;8,1>:d r8.5<0;1,0>:d {Compacted}
[118] add (16|M16) r42.0<1>:d r42.0<8;8,1>:d r8.5<0;1,0>:d
[119] shr (8|M24) r26.0<1>:q r2.0<4;4,1>:uq r8.0<0;1,0>:ud
[120] and (8|M0) r24.0<1>:q r9.0<4;4,1>:q 1:w
[121] and (8|M8) r22.0<1>:q r11.0<4;4,1>:q 1:w
[122] send (16|M0) r10:w r6 0xC 0x4805000
[123] send (16|M16) r34:w r42 0xC 0x4805000
[124] and (8|M16) r20.0<1>:q r18.0<4;4,1>:q 1:w
[125] and (8|M24) r18.0<1>:q r26.0<4;4,1>:q 1:w
[126] cmp (8|M0) (eq)f1.0 null<1>:q r24.0<4;4,1>:q r8.4<0;1,0>:ud
[127] cmp (8|M8) (eq)f1.0 null<1>:q r22.0<4;4,1>:q r8.4<0;1,0>:ud
[128] cmp (8|M16) (eq)f1.0 null<1>:q r20.0<4;4,1>:q r8.4<0;1,0>:ud
[129] cmp (8|M24) (eq)f1.0 null<1>:q r18.0<4;4,1>:q r8.4<0;1,0>:ud
// BBL10
[130] (~f1.0) if (32|M0) 256 480
// BBL11
[131] sel (16|M0) (lt)f0.0 r30.0<1>:d r10.0<8;8,1>:d r14.0<8;8,1>:d {Compacted}
[132] sel (16|M0) (lt)f0.0 r28.0<1>:d r12.0<8;8,1>:d r16.0<8;8,1>:d {Compacted}
[133] sel (16|M0) (ge)f0.0 r18.0<1>:d r14.0<8;8,1>:d r10.0<8;8,1>:d {Compacted}
[134] sel (16|M0) (ge)f0.0 r10.0<1>:d r16.0<8;8,1>:d r12.0<8;8,1>:d {Compacted}
[135] sel (16|M16) (lt)f0.0 r22.0<1>:d r34.0<8;8,1>:d r38.0<8;8,1>:d
[136] sel (16|M16) (lt)f0.0 r20.0<1>:d r36.0<8;8,1>:d r40.0<8;8,1>:d
[137] sel (16|M16) (ge)f0.0 r44.0<1>:d r38.0<8;8,1>:d r34.0<8;8,1>:d
[138] sel (16|M16) (ge)f0.0 r24.0<1>:d r40.0<8;8,1>:d r36.0<8;8,1>:d
[139] sel (16|M0) (lt)f0.0 r26.0<1>:d r30.0<8;8,1>:d r28.0<8;8,1>:d {Compacted}
[140] sel (16|M0) (ge)f0.0 r28.0<1>:d r30.0<8;8,1>:d r28.0<8;8,1>:d {Compacted}
[141] sel (16|M0) (lt)f0.0 r30.0<1>:d r18.0<8;8,1>:d r10.0<8;8,1>:d {Compacted}
[142] sel (16|M0) (ge)f0.0 r32.0<1>:d r18.0<8;8,1>:d r10.0<8;8,1>:d {Compacted}
[143] sel (16|M16) (lt)f0.0 r18.0<1>:d r22.0<8;8,1>:d r20.0<8;8,1>:d
[144] sel (16|M16) (ge)f0.0 r20.0<1>:d r22.0<8;8,1>:d r20.0<8;8,1>:d
[145] sel (16|M16) (lt)f0.0 r22.0<1>:d r44.0<8;8,1>:d r24.0<8;8,1>:d
[146] sel (16|M16) (ge)f0.0 r24.0<1>:d r44.0<8;8,1>:d r24.0<8;8,1>:d
[147] sends (16|M0) null:w r6 r26 0x20C 0x4025000
[148] sends (16|M16) null:w r42 r18 0x20C 0x4025000
// BBL12
[149] else (32|M0) 240 240
// BBL13
[150] sel (16|M0) (ge)f0.0 r22.0<1>:d r14.0<8;8,1>:d r10.0<8;8,1>:d {Compacted}
[151] sel (16|M0) (ge)f0.0 r20.0<1>:d r16.0<8;8,1>:d r12.0<8;8,1>:d {Compacted}
[152] sel (16|M0) (lt)f0.0 r30.0<1>:d r10.0<8;8,1>:d r14.0<8;8,1>:d {Compacted}
[153] sel (16|M0) (lt)f0.0 r28.0<1>:d r12.0<8;8,1>:d r16.0<8;8,1>:d {Compacted}
[154] sel (16|M16) (ge)f0.0 r13.0<1>:d r38.0<8;8,1>:d r34.0<8;8,1>:d
[155] sel (16|M16) (ge)f0.0 r11.0<1>:d r40.0<8;8,1>:d r36.0<8;8,1>:d
[156] sel (16|M16) (lt)f0.0 r25.0<1>:d r34.0<8;8,1>:d r38.0<8;8,1>:d
[157] sel (16|M16) (lt)f0.0 r15.0<1>:d r36.0<8;8,1>:d r40.0<8;8,1>:d
[158] sel (16|M0) (ge)f0.0 r17.0<1>:d r20.0<8;8,1>:d r22.0<8;8,1>:d {Compacted}
[159] sel (16|M0) (lt)f0.0 r19.0<1>:d r20.0<8;8,1>:d r22.0<8;8,1>:d {Compacted}
[160] sel (16|M0) (ge)f0.0 r21.0<1>:d r28.0<8;8,1>:d r30.0<8;8,1>:d {Compacted}
[161] sel (16|M0) (lt)f0.0 r23.0<1>:d r28.0<8;8,1>:d r30.0<8;8,1>:d {Compacted}
[162] sel (16|M16) (ge)f0.0 r9.0<1>:d r11.0<8;8,1>:d r13.0<8;8,1>:d
[163] sel (16|M16) (lt)f0.0 r11.0<1>:d r11.0<8;8,1>:d r13.0<8;8,1>:d
[164] sel (16|M16) (ge)f0.0 r13.0<1>:d r15.0<8;8,1>:d r25.0<8;8,1>:d
[165] sel (16|M16) (lt)f0.0 r15.0<1>:d r15.0<8;8,1>:d r25.0<8;8,1>:d
[166] sends (16|M0) null:w r6 r17 0x20C 0x4025000
[167] sends (16|M16) null:w r42 r9 0x20C 0x4025000
// BBL14
[168] endif (32|M0) 16
// BBL15
[169] (W) jmpi 1048
// BBL16
[170] mov (8|M0) r6.0<1>:d r60.0<2;1,0>:d
[171] mov (8|M8) r7.0<1>:d r58.0<2;1,0>:d
[172] mov (8|M16) r26.0<1>:d r4.0<2;1,0>:d
[173] mov (8|M24) r27.0<1>:d r2.0<2;1,0>:d
[174] (W) cmp (16|M0) (eq)f1.0 null<1>:d r8.4<0;1,0>:d 0:w
[175] (W) cmp (16|M16) (eq)f1.0 null<1>:d r8.4<0;1,0>:d 0:w
[176] shl (16|M0) r6.0<1>:d r6.0<8;8,1>:d 4:w
[177] shl (16|M16) r26.0<1>:d r26.0<8;8,1>:d 4:w
[178] add (16|M0) r6.0<1>:d r6.0<8;8,1>:d r8.5<0;1,0>:d {Compacted}
[179] add (16|M16) r26.0<1>:d r26.0<8;8,1>:d r8.5<0;1,0>:d
[180] send (16|M0) r10:w r6 0xC 0x4805000
[181] send (16|M16) r18:w r26 0xC 0x4805000
[182] (W&f1.0) jmpi 128
// BBL17
[183] sel (16|M0) (lt)f0.0 r32.0<1>:d r10.0<8;8,1>:d r12.0<8;8,1>:d {Compacted}
[184] sel (16|M16) (lt)f0.0 r42.0<1>:d r18.0<8;8,1>:d r20.0<8;8,1>:d
[185] sel (16|M0) (ge)f0.0 r30.0<1>:d r12.0<8;8,1>:d r10.0<8;8,1>:d {Compacted}
[186] sel (16|M16) (ge)f0.0 r38.0<1>:d r20.0<8;8,1>:d r18.0<8;8,1>:d
[187] sel (16|M0) (ge)f0.0 r28.0<1>:d r14.0<8;8,1>:d r16.0<8;8,1>:d {Compacted}
[188] sel (16|M16) (ge)f0.0 r40.0<1>:d r22.0<8;8,1>:d r24.0<8;8,1>:d
[189] sel (16|M0) (lt)f0.0 r34.0<1>:d r16.0<8;8,1>:d r14.0<8;8,1>:d {Compacted}
[190] sel (16|M16) (lt)f0.0 r36.0<1>:d r24.0<8;8,1>:d r22.0<8;8,1>:d
[191] (W) jmpi 112
// BBL18
[192] sel (16|M0) (ge)f0.0 r32.0<1>:d r12.0<8;8,1>:d r10.0<8;8,1>:d {Compacted}
[193] sel (16|M16) (ge)f0.0 r42.0<1>:d r20.0<8;8,1>:d r18.0<8;8,1>:d
[194] sel (16|M0) (lt)f0.0 r30.0<1>:d r10.0<8;8,1>:d r12.0<8;8,1>:d {Compacted}
[195] sel (16|M16) (lt)f0.0 r38.0<1>:d r18.0<8;8,1>:d r20.0<8;8,1>:d
[196] sel (16|M0) (lt)f0.0 r28.0<1>:d r16.0<8;8,1>:d r14.0<8;8,1>:d {Compacted}
[197] sel (16|M16) (lt)f0.0 r40.0<1>:d r24.0<8;8,1>:d r22.0<8;8,1>:d
[198] sel (16|M0) (ge)f0.0 r34.0<1>:d r14.0<8;8,1>:d r16.0<8;8,1>:d {Compacted}
[199] sel (16|M16) (ge)f0.0 r36.0<1>:d r22.0<8;8,1>:d r24.0<8;8,1>:d
// BBL19
[200] and (8|M0) r11.0<1>:q r60.0<4;4,1>:q 1:w
[201] and (8|M8) r9.0<1>:q r58.0<4;4,1>:q 1:w
[202] and (8|M16) r4.0<1>:q r4.0<4;4,1>:q 1:w
[203] and (8|M24) r2.0<1>:q r2.0<4;4,1>:q 1:w
[204] cmp (8|M0) (eq)f0.0 null<1>:q r11.0<4;4,1>:q r8.4<0;1,0>:ud
[205] cmp (8|M8) (eq)f0.0 null<1>:q r9.0<4;4,1>:q r8.4<0;1,0>:ud
[206] cmp (8|M16) (eq)f0.0 null<1>:q r4.0<4;4,1>:q r8.4<0;1,0>:ud
[207] cmp (8|M24) (eq)f0.0 null<1>:q r2.0<4;4,1>:q r8.4<0;1,0>:ud
// BBL20
[208] (~f0.0) if (32|M0) 256 480
// BBL21
[209] sel (16|M0) (lt)f0.0 r20.0<1>:d r32.0<8;8,1>:d r28.0<8;8,1>:d {Compacted}
[210] sel (16|M0) (lt)f0.0 r18.0<1>:d r30.0<8;8,1>:d r34.0<8;8,1>:d {Compacted}
[211] sel (16|M0) (ge)f0.0 r8.0<1>:d r28.0<8;8,1>:d r32.0<8;8,1>:d {Compacted}
[212] sel (16|M0) (ge)f0.0 r4.0<1>:d r34.0<8;8,1>:d r30.0<8;8,1>:d {Compacted}
[213] sel (16|M16) (lt)f0.0 r12.0<1>:d r42.0<8;8,1>:d r40.0<8;8,1>:d
[214] sel (16|M16) (lt)f0.0 r10.0<1>:d r38.0<8;8,1>:d r36.0<8;8,1>:d
[215] sel (16|M16) (ge)f0.0 r14.0<1>:d r40.0<8;8,1>:d r42.0<8;8,1>:d
[216] sel (16|M16) (ge)f0.0 r2.0<1>:d r36.0<8;8,1>:d r38.0<8;8,1>:d
[217] sel (16|M0) (lt)f0.0 r16.0<1>:d r20.0<8;8,1>:d r18.0<8;8,1>:d {Compacted}
[218] sel (16|M0) (ge)f0.0 r18.0<1>:d r20.0<8;8,1>:d r18.0<8;8,1>:d {Compacted}
[219] sel (16|M0) (lt)f0.0 r20.0<1>:d r8.0<8;8,1>:d r4.0<8;8,1>:d {Compacted}
[220] sel (16|M0) (ge)f0.0 r22.0<1>:d r8.0<8;8,1>:d r4.0<8;8,1>:d {Compacted}
[221] sel (16|M16) (lt)f0.0 r8.0<1>:d r12.0<8;8,1>:d r10.0<8;8,1>:d
[222] sel (16|M16) (ge)f0.0 r10.0<1>:d r12.0<8;8,1>:d r10.0<8;8,1>:d
[223] sel (16|M16) (lt)f0.0 r12.0<1>:d r14.0<8;8,1>:d r2.0<8;8,1>:d
[224] sel (16|M16) (ge)f0.0 r14.0<1>:d r14.0<8;8,1>:d r2.0<8;8,1>:d
[225] sends (16|M0) null:w r6 r16 0x20C 0x4025000
[226] sends (16|M16) null:w r26 r8 0x20C 0x4025000
// BBL22
[227] else (32|M0) 240 240
// BBL23
[228] sel (16|M0) (ge)f0.0 r20.0<1>:d r28.0<8;8,1>:d r32.0<8;8,1>:d {Compacted}
[229] sel (16|M0) (ge)f0.0 r18.0<1>:d r34.0<8;8,1>:d r30.0<8;8,1>:d {Compacted}
[230] sel (16|M0) (lt)f0.0 r14.0<1>:d r32.0<8;8,1>:d r28.0<8;8,1>:d {Compacted}
[231] sel (16|M0) (lt)f0.0 r8.0<1>:d r30.0<8;8,1>:d r34.0<8;8,1>:d {Compacted}
[232] sel (16|M16) (ge)f0.0 r12.0<1>:d r40.0<8;8,1>:d r42.0<8;8,1>:d
[233] sel (16|M16) (ge)f0.0 r10.0<1>:d r36.0<8;8,1>:d r38.0<8;8,1>:d
[234] sel (16|M16) (lt)f0.0 r4.0<1>:d r42.0<8;8,1>:d r40.0<8;8,1>:d
[235] sel (16|M16) (lt)f0.0 r2.0<1>:d r38.0<8;8,1>:d r36.0<8;8,1>:d
[236] sel (16|M0) (ge)f0.0 r16.0<1>:d r18.0<8;8,1>:d r20.0<8;8,1>:d {Compacted}
[237] sel (16|M0) (lt)f0.0 r18.0<1>:d r18.0<8;8,1>:d r20.0<8;8,1>:d {Compacted}
[238] sel (16|M0) (ge)f0.0 r20.0<1>:d r8.0<8;8,1>:d r14.0<8;8,1>:d {Compacted}
[239] sel (16|M0) (lt)f0.0 r22.0<1>:d r8.0<8;8,1>:d r14.0<8;8,1>:d {Compacted}
[240] sel (16|M16) (ge)f0.0 r8.0<1>:d r10.0<8;8,1>:d r12.0<8;8,1>:d
[241] sel (16|M16) (lt)f0.0 r10.0<1>:d r10.0<8;8,1>:d r12.0<8;8,1>:d
[242] sel (16|M16) (ge)f0.0 r12.0<1>:d r2.0<8;8,1>:d r4.0<8;8,1>:d
[243] sel (16|M16) (lt)f0.0 r14.0<1>:d r2.0<8;8,1>:d r4.0<8;8,1>:d
[244] sends (16|M0) null:w r6 r16 0x20C 0x4025000
[245] sends (16|M16) null:w r26 r8 0x20C 0x4025000
// BBL24
[246] endif (32|M0) 16
// BBL25
[247] (W) mov (8|M0) r112.0<1>:ud r100.0<8;8,1>:ud {Compacted}
// BBL26
[248] (W) send (8|M0) null r112 0x27 0x2000010 {EOT}
/*========================== begin_copyright_notice ============================
Copyright (C) 2019-2022 Intel Corporation

SPDX-License-Identifier: MIT
============================= end_copyright_notice ===========================*/

/*!
 * @file SIMD operation counting tool definitions
 */
#ifndef SIMDPROF_H_
#define SIMDPROF_H_

#include <vector>
#include <map>
#include <string>

#include "gtpin_api.h"
#include "gtpin_tool_utils.h"

using namespace gtpin;

/* ============================================================================================= */
// Struct SimdProfRecord
/* ============================================================================================= */
/*!
 * Layout of records collected in profile buffer by the Simdprof tool
 */
struct SimdProfRecord
{
    uint64_t opCount; ///< Number of SIMD operations executed by a group of instructions
};

/* ============================================================================================= */
// Struct SimdProfArgs
/* ============================================================================================= */
/*!
 * SimdProf instrumentation arguments (instruction properties).
 * Each unique combination of these arguments requires a separate instrumentation procedure
 * to be generated for each group of instructions with these properties
 */
struct SimdProfArgs
{
    SimdProfArgs(bool ctrl, uint32_t mask, GtPredicate pred, bool isSend = false) :
        maskCtrl(ctrl), execMask(mask), predicate(pred), isSendIns(isSend){}

    inline bool operator < (const SimdProfArgs& other) const;

    bool        maskCtrl;   ///< 'MaskCtrl' flag of instrumented instructions
    uint32_t    execMask;   ///< Execution mask of instrumented instructions
    GtPredicate predicate;  ///< Predicate of instrumented instructions
    bool        isSendIns;  ///< true if instrumented instructions are SEND instructions
};

/* ============================================================================================= */
// Struct SimdProfGroup
/* ============================================================================================= */
/*!
 * Structure that holds information and profiling results for a group of instructions being
 * instrumented by a single instrumentation routine.
 * @note All instructions within a group have exactly the same SimdProfArgs.
 * @note In order to provide separate channel counters per instruction category (e.g. integer, FP, etc.),
 *       replace the {insCount, opCount} pair with an array of counter pairs per category.
 */
struct SimdProfGroup
{
    SimdProfGroup(uint32_t bbl, uint32_t numIns) : bblId(bbl), insCount(numIns), opCount(0) {}

    BblId    bblId;    ///< Identifier of a BBL that contains this group of instructions
    uint32_t insCount; ///< Number of instructions in the group
    uint64_t opCount;  ///< Number of SIMD operations (effective channels) executed by all instructions in the group
};

/* ============================================================================================= */
// Struct SimdProfSection
/* ============================================================================================= */
/*!
 * Structure that holds information on a SimdProf section - sequence of instructions for which
 * instrumentation routines can be inserted at the same point.
 * @note All instructions within a section are executed with the same value of the flag register -
 *       single dynamic parameter of the SIMD operation calculator
 */
struct SimdProfSection
{
    SimdProfSection(const IGtIns& headIns) : headInsId(headIns.Id()) {}

    /// Add a new instruction to the section. Update the corresponding SimdProf group within this section
    void AddInstruction(const IGtIns& ins);

    InsId headInsId;                         ///< First instruction of the section - common
                                             ///< instrumentation point for all groups in the section
    std::map<SimdProfArgs, uint32_t> groups; ///< SimdProf groups along with the number of instructions
};

/* ============================================================================================= */
// Class SimdProfKernelProfile
/* ============================================================================================= */
/*!
 * Class that represents a kernel profiled by the SimdProf instrumentation
 */
class SimdProfKernelProfile
{
public:
    SimdProfKernelProfile(const IGtKernel& kernel);

    /*!
     * Instrument the kernel.
     * The function is called by the OnKernelBuild handler
     */
    void Instrument(IGtKernelInstrument& instrumentor);

    /*!
     * Read profiling results which are assumed to be collected and stored in the buffer
     * associated with the kernel.
     * The function is called by the OnKernelComplete handler
     */
    void ReadProfileData(const IGtProfileBuffer* buffer);

    /// @return Total number of SIMD operations executed by the kernel
    uint64_t GetTotalOpCounter() const { return _totalOpCount; }

    std::string ToString() const; ///< @return Text representation of the profile data
    const GtProfileArray& GetProfileArray() const { return _profileArray; } ///< @return Profile buffer accessor

private:
    /*!
     * Generate instrumentation procedures for all SimdProf groups of the specified SimdProf section.
     * Insert instrumentation at the beginning of the section.
     * Initialize the _profileData array
     * @param[in] instrumentor  Instrumentor of the GEN kernel
     * @param[in] section       SimdProf section to be instrumented
     */
    void InstrumentSection(IGtKernelInstrument& instrumentor, const SimdProfSection& section);

    /// @return true/false - use 64-bit/32-bit integer for the operation counter
    static bool Use64BitCounters(const IGtGenCoder& coder);

    /// Increment counter of SIMD operations for the specified BBL by 'incValue'
    void UpdateBblOpCounter(BblId bblId, uint64_t incValue);

    /// @return Extended kernel name
    std::string ExtendedName() const { return _extName; }

private:
    /// Kernel descriptor
    std::string   _name;            ///< Kernel's name
    std::string   _extName;         ///< Kernel's extended name
    GtKernelType  _type;            ///< Kernel's type
    GtGpuPlatform _platform;        ///< Kernel's platform
    uint64_t      _hashId;          ///< Kernel's hash identifier
    GtSimdWidth   _simd;            ///< Kernel's SIMD width
    uint64_t      _binarySignature; ///< Kernel's binary signature

    GtProfileArray             _profileArray; ///< Profile buffer accessor
    std::vector<SimdProfGroup> _profileData;  ///< Profiling data for instrumented SimdProf groups

    GtReg _addrReg; ///< Virtual register that holds address within profile buffer
    GtReg _dataReg; ///< Virtual register that holds data to be read from/written to profile buffer

    std::map<BblId, std::pair<InsId, InsId> > _bblInsInfo;   ///< Head and tail instructions per BBL
    std::map<BblId, uint64_t>                 _bblOpCounts;  ///< Number of executed SIMD operations per BBL
    uint64_t                                  _totalOpCount; ///< Number of SIMD operations executed by the kernel
};

/* ============================================================================================= */
// Class SimdProf
/* ============================================================================================= */
/*!
 * Implementation of the IGtTool interface for the SimdProf tool
 */
class SimdProf : public GtTool
{
public:
    /// Implementation of the IGtTool interface
    const char* Name() const { return "simdprof"; }

    void OnKernelBuild(IGtKernelInstrument& instrumentor);
    void OnKernelRun(IGtKernelDispatch& dispatcher);
    void OnKernelComplete(IGtKernelDispatch& dispatcher);

public:
    std::string ToString() const;                ///< @return Text representation of the profile data
    static SimdProf* Instance();                 ///< @return Single instance of this class
    static void OnFini() { Instance()->Fini(); } ///< Callback function registered with atexit()

protected:
    SimdProf() = default;
    SimdProf(const SimdProf&) = delete;
    SimdProf& operator = (const SimdProf&) = delete;
    ~SimdProf() = default;

    void Fini(); ///< Post process and dump profiling data

private:
    /// Collection of kernel profiles
    typedef std::map<GtKernelId, SimdProfKernelProfile> KernelProfiles;
    KernelProfiles _kernels;
};
#endif
/*========================== begin_copyright_notice ============================
Copyright (C) 2019-2025 Intel Corporation

SPDX-License-Identifier: MIT
============================= end_copyright_notice ===========================*/

/*!
 * @file Implementation of the SIMD operation counting tool
 */

#include <algorithm>
#include <vector>
#include <map>
#include <string>
#include <tuple>
#include <fstream>
#include <sstream>
#include <iomanip>
#include <assert.h>

#include "simdprof.h"

using namespace gtpin;
using namespace std;

/* ============================================================================================= */
// Configuration
/* ============================================================================================= */
Knob<int> knobNumThreadBuckets("num_thread_buckets", 32, "Number of thread buckets. 0 - maximum thread buckets");

/* ============================================================================================= */
// SimdProfArgs implementation
/* ============================================================================================= */

bool SimdProfArgs::operator < (const SimdProfArgs& other) const
{
    return std::make_tuple(maskCtrl, execMask, predicate, isSendIns) <
           std::make_tuple(other.maskCtrl, other.execMask, other.predicate, other.isSendIns);
}

/* ============================================================================================= */
// SimdProfSection implementation
/* ============================================================================================= */

void SimdProfSection::AddInstruction(const IGtIns& ins)
{
    uint32_t execMask = ins.ExecMask().Bits();
    GtPredicate predicate = ins.Predicate();
    bool maskCtrl = !ins.IsWriteMaskEnabled();
    bool isSendIns = ins.IsSendMessage();

    auto it = groups.emplace(SimdProfArgs(maskCtrl, execMask, predicate, isSendIns), 0).first;
    ++(it->second);
}

/* ============================================================================================= */
// SimdProfKernelProfile implementation
/* ============================================================================================= */

SimdProfKernelProfile::SimdProfKernelProfile(const IGtKernel& kernel) :
    _name(GlueString(kernel.Name())), _extName(ExtendedKernelName(kernel)), _type(kernel.Type()), _platform(kernel.GpuPlatform()),
    _hashId(kernel.HashId()), _simd(kernel.SimdWidth()), _binarySignature(kernel.BinarySignature()),
    _totalOpCount(0) {}

void SimdProfKernelProfile::Instrument(IGtKernelInstrument& instrumentor)
{
    const IGtGenCoder& coder = instrumentor.Coder();
    const IGtKernel& kernel = instrumentor.Kernel();
    const IGtCfg& cfg = instrumentor.Cfg();
    IGtVregFactory& vregs = coder.VregFactory();
    bool is64BitCounter = Use64BitCounters(coder);

    // Initialize virtual registers
    _addrReg = vregs.MakeMsgAddrScratch();
    _dataReg = vregs.MakeMsgDataScratch(is64BitCounter ? VREG_TYPE_QWORD : VREG_TYPE_DWORD);

    // Identify SimdProf sections and #groups in the kernel
    std::vector<SimdProfSection> sections; // All SimdProf sections in the kernel
    uint32_t numGroups = 0;                // Number of SimdProf groups in the kernel

    for (auto bblPtr : cfg.Bbls())
    {
        bool isSectionBegin = true;

        // Iterate through sections within the current BBL
        for (auto insPtr : bblPtr->Instructions())
        {
            const IGtIns& ins = *insPtr;

            if (ins.Id() < (uint32_t)knobMinInstrumentIns || ins.Id() > (uint32_t)knobMaxInstrumentIns)
            {
                continue;
            }

            if (isSectionBegin)
            {
                sections.emplace_back(ins);
                isSectionBegin = false;
            }

            SimdProfSection& section = sections.back();
            section.AddInstruction(ins);

            if (ins.IsFlagModifier() || (ins.Id() == bblPtr->LastIns().Id())) // section end
            {
                numGroups += (uint32_t)section.groups.size();
                isSectionBegin = true;
            }
        }

        if (isSectionBegin == false)
        {
            numGroups += (uint32_t)sections.back().groups.size();
        }
    }

    // Allocate the profile buffer. It holds a single SimdProfRecord per group in each thread bucket
    uint32_t numThreadBuckets = (knobNumThreadBuckets == 0) ? kernel.GenModel().MaxThreadBuckets() : knobNumThreadBuckets;
    _profileArray = GtProfileArray(sizeof(SimdProfRecord), numGroups, numThreadBuckets);
    _profileArray.Allocate(instrumentor.ProfileBufferAllocator());

    // Instrument SimdProf sections and initialize the _profileData array
    for (auto& section : sections) { InstrumentSection(instrumentor, section); }

    // Save BBL information for the post processing phase
    for (auto bblPtr : cfg.Bbls())
    {
        _bblInsInfo.emplace(bblPtr->Id(), std::make_pair(bblPtr->FirstIns().Id(), bblPtr->LastIns().Id()));
    }
}

void SimdProfKernelProfile::InstrumentSection(IGtKernelInstrument& instrumentor, const SimdProfSection& section)
{
    const IGtGenCoder& coder = instrumentor.Coder();
    IGtInsFactory& insF = coder.InstructionFactory();
    const IGtCfg& cfg = instrumentor.Cfg();
    bool is64BitCounter = Use64BitCounters(coder);
    GtReg dataRegL = {_dataReg, sizeof(uint32_t), 0}; // Low 32-bits of the data payload register

    // Instrument each SimdProf group:
    // - If a group is associated with non-SEND instructions, compute the SIMD count by applying CBIT to the SIMD mask.
    // - Otherwise, if a group is created for SEND instructions, increment the SIMD count for each SEND whose SIMD mask
    //   is nonzero. From the EU perspective, a SEND instruction is 1 operation, unless the SIMD mask is zero
    // Insert each per-group instrumentation procedure at the beginning of the corresponding section

    // Insert SimdProf instrumentation at the beginning of the current section
    const IGtIns& ins = cfg.GetInstruction(section.headInsId);
    const IGtBbl& bbl = cfg.GetBbl(ins);

    for (auto& group : section.groups)
    {
        GtGenProcedure proc;
        const SimdProfArgs& args = group.first;

        if (is64BitCounter)
        {
            // Clear the high 32-bits of the data payload register
            GtReg dataRegH = {_dataReg, sizeof(uint32_t), 1};
            proc += insF.MakeMov(dataRegH, 0);
        }

        // dataRegL = SIMD mask
        coder.ComputeSimdMask(proc, dataRegL, args.maskCtrl, args.execMask, args.predicate);

        // dataRegL = number of SIMD operations executed
        if (!args.isSendIns)
        {
            proc += insF.MakeCbit(dataRegL, dataRegL);
        }
        else
        {
            proc += insF.MakeSel(dataRegL, dataRegL, 1).SetCondModifier(GED_COND_MODIFIER_l); // dataRegL = min(dataRegL, 1)
        }

        // Generate code that updates the SIMD operation counter in the corresponding SimdProfRecord
        uint32_t recordNum = (uint32_t)_profileData.size();
        _profileArray.ComputeAddress(coder, proc, _addrReg, recordNum);

        proc += insF.MakeAtomicAdd(NullReg(), _addrReg, _dataReg, (is64BitCounter ? GED_DATA_TYPE_uq : GED_DATA_TYPE_ud));

        // Insert a new instrumentation routine and append the new group to _profileData
        proc.front()->AppendAnnotation(__func__);
        SimdProf::Instance()->InstrumentInstruction(instrumentor, ins, GtIpoint::Before(), proc);
        _profileData.emplace_back(bbl.Id(), group.second);
    }
}

bool SimdProfKernelProfile::Use64BitCounters(const IGtGenCoder& coder)
{
    return coder.InstructionFactory().CanAccessAtomically(GED_DATA_TYPE_uq);
}

void SimdProfKernelProfile::ReadProfileData(const IGtProfileBuffer* buffer)
{
    GTPIN_ASSERT(_profileData.size() == _profileArray.NumRecords());
    uint32_t recordNum = 0;

    // Iterate through all SimdProf groups and read counters of executed operations (channels).
    for (auto& group : _profileData)
    {
        // Accumulate counters for all threads in which this group of instructions was executed
        for (uint32_t threadBucket = 0; threadBucket < _profileArray.NumThreadBuckets(); ++threadBucket)
        {
            SimdProfRecord record;
            if (!_profileArray.Read(*buffer, &record, recordNum, 1, threadBucket))
            {
                GTPIN_ERROR_MSG(string("SIMDPROF : ") + _name + " : Failed to read from memory buffer");
            }
            else
            {
                // Update counters of executed operations
                uint64_t opCount = record.opCount * group.insCount;
                group.opCount += opCount;
                UpdateBblOpCounter(group.bblId, opCount);
                _totalOpCount += opCount;
            }
        }
        recordNum++;
    }
}

void SimdProfKernelProfile::UpdateBblOpCounter(BblId bblId, uint64_t incValue)
{
    auto it = _bblOpCounts.emplace(bblId, 0).first;
    it->second += incValue;
}

std::string SimdProfKernelProfile::ToString() const
{
    ostringstream ostr;
    ostr << ExtendedName() << endl;

    if (!_bblOpCounts.empty())
    {
        ostr << setw(10) << "BBL" <<
setw(15) << "Head Ins ID" << setw(15) << "Tail Ins ID" << setw(20) << "Channels" << endl; 00235 for (const auto& bc : _bblOpCounts) 00236 { 00237 ostr << setw(10) << bc.first << setw(15) << _bblInsInfo.at(bc.first).first << setw(15) << _bblInsInfo.at(bc.first).second << setw(20) << bc.second << endl; 00238 } 00239 ostr << setw(10) << "Total" << setw(15) << _totalOpCount << endl; 00240 } 00241 else 00242 { 00243 ostr << "No channels executed" << endl; 00244 } 00245 00246 return ostr.str(); 00247 } 00248 00249 /* ============================================================================================= */ 00250 // SimdProf implementation 00251 /* ============================================================================================= */ 00252 SimdProf* SimdProf::Instance() 00253 { 00254 static SimdProf instance; 00255 return &instance; 00256 } 00257 00258 void SimdProf::OnKernelBuild(IGtKernelInstrument& instrumentor) 00259 { 00260 const IGtKernel& kernel = instrumentor.Kernel(); 00261 auto it = _kernels.emplace(kernel.Id(), kernel).first; 00262 it->second.Instrument(instrumentor); 00263 } 00264 00265 void SimdProf::OnKernelRun(IGtKernelDispatch& dispatcher) 00266 { 00267 bool isProfileEnabled = false; 00268 00269 const IGtKernel& kernel = dispatcher.Kernel(); 00270 GtKernelExecDesc execDesc; dispatcher.GetExecDescriptor(execDesc); 00271 if (kernel.IsInstrumented() && IsKernelExecProfileEnabled(execDesc, kernel.GpuPlatform())) 00272 { 00273 auto it = _kernels.find(kernel.Id()); 00274 00275 if (it != _kernels.end()) 00276 { 00277 IGtProfileBuffer* buffer = dispatcher.CreateProfileBuffer(); GTPIN_ASSERT(buffer); 00278 SimdProfKernelProfile& kernelProfile = it->second; 00279 const GtProfileArray& profileArray = kernelProfile.GetProfileArray(); 00280 if (profileArray.Initialize(*buffer)) 00281 { 00282 isProfileEnabled = true; 00283 } 00284 else 00285 { 00286 GTPIN_ERROR_MSG(string("SIMDPROF : ") + string(kernel.Name()) + " : Failed to write into memory buffer"); 
00287 } 00288 } 00289 } 00290 dispatcher.SetProfilingMode(isProfileEnabled); 00291 } 00292 00293 void SimdProf::OnKernelComplete(IGtKernelDispatch& dispatcher) 00294 { 00295 if (!dispatcher.IsProfilingEnabled()) 00296 { 00297 return; // Do nothing with unprofiled kernel dispatches 00298 } 00299 00300 const IGtKernel& kernel = dispatcher.Kernel(); 00301 auto it = _kernels.find(kernel.Id()); 00302 00303 if (it != _kernels.end()) 00304 { 00305 const IGtProfileBuffer* buffer = dispatcher.GetProfileBuffer(); GTPIN_ASSERT(buffer); 00306 SimdProfKernelProfile& kernelProfile = it->second; 00307 kernelProfile.ReadProfileData(buffer); 00308 } 00309 } 00310 00311 void SimdProf::Fini() 00312 { 00313 string profileDir = GTPin_GetCore()->ProfileDir(); 00314 string filePath = JoinPath(profileDir, "simdprof.txt"); 00315 00316 ofstream fs(filePath); 00317 if (fs.is_open()) 00318 { 00319 fs << ToString(); 00320 fs.close(); 00321 } 00322 else 00323 { 00324 GTPIN_WARNING("SIMDPROF : could not create file: " + filePath); 00325 } 00326 } 00327 00328 string SimdProf::ToString() const 00329 { 00330 ostringstream ostr; 00331 ostr << "Channels (SIMD operations) executed by kernels/BBLs" << endl; 00332 ostr << "===================================================" << endl; 00333 00334 uint64_t totalOpCount = 0; 00335 for (const auto& k : _kernels) 00336 { 00337 ostr << string(100, '-') << endl; 00338 ostr << k.second.ToString() << endl; 00339 totalOpCount += k.second.GetTotalOpCounter(); 00340 } 00341 ostr << "Total number of kernels: " << _kernels.size() << std::endl; 00342 ostr << "Total number of channels (SIMD operations): " << totalOpCount << std::endl; 00343 00344 return ostr.str(); 00345 } 00346 00347 // Define DETACHED_SIMDPROF to use SimdProf functionality in a different tool 00348 #if !defined (DETACHED_SIMDPROF) 00349 /* ============================================================================================= */ 00350 // GTPin_Entry 00351 /* 
============================================================================================= */ 00352 EXPORT_C_FUNC void GTPin_Entry(int argc, const char* argv[]) 00353 { 00354 ConfigureGTPin(argc, argv); 00355 SimdProf::Instance()->Register(); 00356 atexit(SimdProf::OnFini); 00357 } 00358 #endif
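The listing splits the counting work between two places. The instrumentation inserted by InstrumentSection runs on the device. ReadProfileData runs on the host after the kernel completes. The following is a minimal host-side C++ sketch of that arithmetic and is not GTPin code. It assumes the input mask is already the effective SIMD mask (execution mask combined with the predicate) that ComputeSimdMask would produce. The names ChannelsPerExecution, GroupModel, Aggregate and RunExample are hypothetical and are not part of the GTPin API.

The sketch reproduces three rules from the listing:

- CBIT (population count) for non-SEND groups.
- min(mask, 1) for SEND groups.
- Each per-bucket counter is multiplied by the group's instruction count.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <map>
#include <vector>

// Hypothetical host-side model of SimdProf counting; none of these names are GTPin API.

// Channels (SIMD operations) counted for one execution of an instruction whose
// effective SIMD mask (execution mask combined with the predicate) is simdMask.
uint32_t ChannelsPerExecution(uint32_t simdMask, bool isSend)
{
    if (isSend)
    {
        // A SEND counts as one operation unless no channel is enabled: min(mask, 1)
        return simdMask != 0 ? 1u : 0u;
    }
    // Non-SEND: one operation per enabled channel, i.e. CBIT (population count)
    return static_cast<uint32_t>(std::bitset<32>(simdMask).count());
}

// Hypothetical stand-in for one SimdProf group and its per-thread-bucket counters.
struct GroupModel
{
    uint32_t bblId;                       // BBL whose section contains the group
    uint32_t insCount;                    // instructions sharing the same mask arguments
    std::vector<uint64_t> perBucketCount; // counter value read from each thread bucket
};

// Mirrors the arithmetic of ReadProfileData: scale each bucket counter by insCount,
// then accumulate it per BBL and into the kernel total.
uint64_t Aggregate(const std::vector<GroupModel>& groups, std::map<uint32_t, uint64_t>& bblOpCounts)
{
    uint64_t total = 0;
    for (const auto& group : groups)
    {
        for (uint64_t bucketCount : group.perBucketCount)
        {
            uint64_t opCount = bucketCount * group.insCount;
            bblOpCounts[group.bblId] += opCount;
            total += opCount;
        }
    }
    return total;
}

// Worked example: prints a reduced BBL table in the style of SimdProfKernelProfile::ToString.
uint64_t RunExample()
{
    // Group of 3 non-SEND instructions in BBL 0, executed once with all 32 channels
    // enabled and once with 16 channels enabled: 32 + 16 = 48 per instruction.
    uint64_t aluCount = ChannelsPerExecution(0xFFFFFFFFu, false) + ChannelsPerExecution(0x0000FFFFu, false);

    // Single SEND in BBL 1, executed three times, once with an empty mask: 1 + 0 + 1 = 2.
    uint64_t sendCount = ChannelsPerExecution(0xFFu, true) + ChannelsPerExecution(0u, true) +
                         ChannelsPerExecution(0x1u, true);

    // Both groups ran only in thread bucket 0 of two buckets.
    std::vector<GroupModel> groups = {{0, 3, {aluCount, 0}}, {1, 1, {sendCount, 0}}};

    std::map<uint32_t, uint64_t> bblOpCounts;
    uint64_t total = Aggregate(groups, bblOpCounts);
    assert(total == 3 * 48 + 1 * 2);

    std::cout << std::setw(10) << "BBL" << std::setw(20) << "Channels" << '\n';
    for (const auto& bc : bblOpCounts)
    {
        std::cout << std::setw(10) << bc.first << std::setw(20) << bc.second << '\n';
    }
    std::cout << std::setw(10) << "Total" << std::setw(20) << total << '\n';
    return total;
}
```

Called from a small main, RunExample reports 144 channels for BBL 0, 2 channels for BBL 1, and 146 in total. The tool scales one counter per group by insCount, and Instrument closes a section at every flag-modifying instruction. Together, these appear to ensure that the predicates read by the instructions of a group do not change between the section head and the end of the section. Under that assumption, a single atomic update per group is enough to account for every instruction in it.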
Copyright (C) 2013-2025 Intel Corporation
SPDX-License-Identifier: MIT
1.7.4