GTPin
The Memorytrace tool generates a dynamic trace of the memory addresses accessed by the kernel.
The trace is provided for each kernel, for each Draw/Enqueue command, and for each individual HW thread.
The Memorytrace tool (like all GTPin tracing tools) works in two phases, pre-processing (phase 1) and trace gathering (phase 2), which should be run separately.
To run the pre-processing phase of the Memorytrace tool in its default configuration, use this command:
Profilers/Bin/gtpin -t memorytrace --phase 1 -- app
NOTE: You may run this phase only once per application.
To run the trace gathering phase of the Memorytrace tool (in its default configuration), use this command:
Profilers/Bin/gtpin -t memorytrace --phase 2 -- app
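Because the two phases are separate invocations that differ only in the --phase value, they are easy to script. The following is a minimal sketch that only assembles the two command lines shown above; the helper name build_phase_commands and the idea of driving GTPin from Python are assumptions made for illustration, not part of GTPin.

```python
import shlex

GTPIN = "Profilers/Bin/gtpin"  # launcher path as used in the commands above


def build_phase_commands(app_argv, gtpin=GTPIN):
    """Return the phase 1 and phase 2 command lines as argument lists.

    build_phase_commands is a hypothetical helper, not a GTPin API.
    app_argv is the application command line, for example ["app"].
    """
    commands = []
    for phase in (1, 2):  # 1 = pre-processing, 2 = trace gathering
        prefix = [gtpin, "-t", "memorytrace", "--phase", str(phase), "--"]
        commands.append(prefix + list(app_argv))
    return commands


# Print the commands instead of executing them; to launch a phase for real,
# each list could be passed to subprocess.run(argv, check=True).
for argv in build_phase_commands(["app"]):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the commands as argument lists avoids shell quoting problems when the application has its own arguments.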
When you run the in-house GTPin Memorytrace tool in its default configuration for pre-processing (phase 1), the tool generates a directory called GTPIN_PROFILE_MEMORYTRACE0. In addition, the tool creates the following two files in the current directory:
memorytrace_pre_process.txt: Provides an estimate of the trace sizes that can be generated by kernels executed on the device. The file contains the per-kernel buffer capacity required to hold any trace that a dispatch of that kernel can generate. For example, if the kernel generates traces of between 100 and 150 bytes in different dispatches, the buffer space for that kernel should be large enough to hold 150 bytes. This file is an input to the trace gathering phase. It has the following format:
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 142606336
where, for each kernel, the required trace size is provided.
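As a rough illustration of this format, the sketch below reads "kernel name, trace size" pairs and reproduces the sizing rule from the example (the capacity must cover the largest trace of any dispatch). The function names parse_pre_process and required_capacity are hypothetical, and the assumption that the two fields are separated by whitespace is taken from the sample line only.

```python
def parse_pre_process(text):
    """Map each kernel name to its required trace size in bytes.

    Assumes one "<kernel name> <size>" pair per line, as in the sample above.
    """
    sizes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, size = parts
        sizes[name] = int(size)
    return sizes


def required_capacity(dispatch_trace_sizes):
    """Capacity that holds the largest trace produced by any dispatch."""
    return max(dispatch_trace_sizes)


sample = "BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 142606336\n"
for kernel, size in parse_pre_process(sample).items():
    print(kernel, size, "bytes")

# Traces between 100 and 150 bytes need room for 150 bytes.
print(required_capacity([100, 120, 150]))
```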
memorytrace_pre_process_dispatch.txt: Provides information about the trace sizes and the threads profiled in kernel dispatches. This file contains informational data only, and has the following format:
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 142606336 262144 OpenCL 0 0
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 100034048 131072 OpenCL 0 1
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 142606336 262144 OpenCL 0 2
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 100034048 131072 OpenCL 0 3
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 100034048 131072 OpenCL 0 4
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 142606336 262144 OpenCL 0 5
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 100034048 131072 OpenCL 0 6
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 100034048 131072 OpenCL 0 7
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 100034048 131072 OpenCL 0 8
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 142606336 262144 OpenCL 0 9
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 100034048 131072 OpenCL 0 10
where each line corresponds to a single kernel dispatch. The fields have the following meaning (from left to right):
When the Memorytrace tool is run for the trace gathering phase (phase 2), it generates the directory GTPIN_PROFILE_MEMORYTRACE1. GTPin saves the profiling results in the folder GTPIN_PROFILE_MEMORYTRACE1\Session_Final. The traces for each kernel are saved in a separate sub-folder that has the same name as the kernel. Each Draw/Enqueue command has a separate trace, which is saved in a corresponding sub-directory, as shown in the following screenshot:
Each trace is saved in a compressed binary format in a file called memorytrace_compressed.bin, as shown above. To uncompress the trace, run the Profilers\Scripts\uncompress_memtrace.py Python* script as follows (Python 3.5 or above is required):
python3 Profilers\Scripts\uncompress_memtrace.py --input_dir GTPIN_PROFILE_MEMORYTRACE1\Session_Final\BitonicSort\device_0__enqueue_0 --gen 9 -v
NOTE: The -v parameter is optional. It provides additional metadata (such as access size, access type, address width, operand width, etc.) for each memory access.
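When a session contains many kernels and dispatches, the uncompress command has to be repeated for every sub-directory that holds a memorytrace_compressed.bin file. The sketch below walks a Session_Final folder and assembles one command per dispatch directory. It relies only on the directory layout and the options shown above (--input_dir, --gen, -v); the helper names are hypothetical, and the temporary directory stands in for a real profile folder.

```python
import os
import tempfile

TRACE_FILE = "memorytrace_compressed.bin"
# Script path as used in the command above.
SCRIPT = os.path.join("Profilers", "Scripts", "uncompress_memtrace.py")


def find_dispatch_dirs(session_dir):
    """Return the sorted directories under session_dir that hold a compressed trace."""
    found = []
    for root, _dirs, files in os.walk(session_dir):
        if TRACE_FILE in files:
            found.append(root)
    return sorted(found)


def build_uncompress_command(dispatch_dir, gen, verbose=False):
    """Assemble the uncompress command line for one dispatch directory."""
    cmd = ["python3", SCRIPT, "--input_dir", dispatch_dir, "--gen", str(gen)]
    if verbose:
        cmd.append("-v")  # optional per-access metadata
    return cmd


# Demonstration on a throwaway directory tree that mimics the layout above.
with tempfile.TemporaryDirectory() as session_dir:
    dispatch = os.path.join(session_dir, "BitonicSort", "device_0__enqueue_0")
    os.makedirs(dispatch)
    open(os.path.join(dispatch, TRACE_FILE), "wb").close()
    for path in find_dispatch_dirs(session_dir):
        print(" ".join(build_uncompress_command(path, gen=9, verbose=True)))
```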
Running the script unpacks the compressed trace into separate traces, one per HW thread, as shown in the following screenshot:
where the trace generated on each HW thread is saved in a text file with a name such as memorytrace___s_0_ss_0_eu_1_tid_5.out. The file name encodes the HW thread topology ID: Slice (S), DualSubSlice (DSS), SubSlice (SS), Execution Unit (EU), and HW thread ID (TID). The resulting trace is provided in the following format:
BBL-ID DP SEND-OFFSET RW ADDRESS SIZE
======================================================
16 1 0x00000a58 R 0x00004800 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x00004810 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x00004820 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x00004830 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x00004840 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x00004850 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x00004860 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x00004870 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x00004880 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x00004890 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x000048a0 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x000048b0 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x000048c0 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x000048d0 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x000048e0 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x000048f0 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x00004900 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x00004910 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x00004920 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x00004930 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x00004940 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x00004950 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x00004960 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x00004970 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x00004980 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x00004990 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x000049a0 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x000049b0 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x000049c0 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x000049d0 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x000049e0 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a68 R 0x000049f0 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000ca8 W 0x00004800 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000ca8 W 0x00004820 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000ca8 W 0x00004840 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000ca8 W 0x00004860 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000ca8 W 0x00004880 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000ca8 W 0x000048a0 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000ca8 W 0x000048c0 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000ca8 W 0x000048e0 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000cb8 W 0x00004900 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000cb8 W 0x00004920 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000cb8 W 0x00004940 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000cb8 W 0x00004960 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000cb8 W 0x00004980 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000cb8 W 0x000049a0 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000cb8 W 0x000049c0 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000cb8 W 0x000049e0 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000d98 W 0x00004810 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000d98 W 0x00004830 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000d98 W 0x00004850 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000d98 W 0x00004870 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000d98 W 0x00004890 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000d98 W 0x000048b0 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000d98 W 0x000048d0 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000d98 W 0x000048f0 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000da8 W 0x00004910 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000da8 W 0x00004930 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000da8 W 0x00004950 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000da8 W 0x00004970 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000da8 W 0x00004990 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000da8 W 0x000049b0 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000da8 W 0x000049d0 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
23 1 0x00000da8 W 0x000049f0 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
----EOT----
Each line corresponds to a single memory access. The different fields in the output have the following meanings (from left to right):
An EOT marker separates consecutive dispatches of different SW threads of the kernel on the same HW thread.
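The per-thread text files are simple enough to post-process with a short script. The following sketch parses the topology IDs out of a file name and splits a trace into segments at each EOT marker. The key names come from the header row of the trace (BBL-ID, DP, SEND-OFFSET, RW, ADDRESS, SIZE); the function names, and the choice to read SEND-OFFSET and ADDRESS as hexadecimal because they appear that way in the sample, are assumptions.

```python
import re


def parse_thread_file_name(file_name):
    """Extract topology IDs, e.g. {'s': 0, 'ss': 0, 'eu': 1, 'tid': 5}."""
    pairs = re.findall(r"_(dss|ss|s|eu|tid)_(\d+)", file_name)
    return {key: int(value) for key, value in pairs}


def parse_trace(text):
    """Split a per-thread trace into segments separated by EOT markers.

    Returns a list of segments; each segment is a list of record dicts.
    """
    segments, current = [], []
    for line in text.splitlines():
        line = line.split("//", 1)[0].strip()  # drop the optional -v metadata
        if not line or line.startswith("BBL-ID") or line.startswith("===="):
            continue  # blank line, header row, or separator
        if line.startswith("----EOT"):
            segments.append(current)
            current = []
            continue
        bbl, dp, offset, rw, addr, size = line.split()
        current.append({
            "bbl_id": int(bbl), "dp": int(dp), "send_offset": int(offset, 16),
            "rw": rw, "address": int(addr, 16), "size": int(size),
        })
    if current:  # trace that does not end with an EOT marker
        segments.append(current)
    return segments


SAMPLE = """\
BBL-ID DP SEND-OFFSET RW ADDRESS SIZE
======================================================
16 1 0x00000a58 R 0x00004800 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
16 1 0x00000a58 R 0x00004810 16 // R Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
21 1 0x00000ca8 W 0x00004800 16 // W Scatter BTS 32-bit access SIMD16 BTI = 00 Operand = DW Access size = 16
----EOT----
"""

print(parse_thread_file_name("memorytrace___s_0_ss_0_eu_1_tid_5.out"))
for index, segment in enumerate(parse_trace(SAMPLE)):
    print("segment", index, "records:", len(segment))
```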
To map the basic block IDs and instruction offsets to the kernel code, look in the Session_Final\ISA sub-folder, where the GEN assembly of all the kernels is saved.
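Before opening the assembly, it can help to reduce a trace to the distinct SEND instructions it touches. The sketch below counts accesses per (BBL-ID, SEND-OFFSET, RW) combination, which gives a short list of offsets to look up in the ISA dump. The function name accesses_per_send is hypothetical, and the layout of the files in the ISA sub-folder is not assumed here.

```python
from collections import Counter


def accesses_per_send(trace_lines):
    """Count trace records per (BBL-ID, SEND-OFFSET, RW) combination."""
    counts = Counter()
    for line in trace_lines:
        fields = line.split("//", 1)[0].split()
        if len(fields) != 6 or not fields[0].isdigit():
            continue  # header row, separator, EOT marker, or blank line
        bbl_id, _dp, send_offset, rw, _address, _size = fields
        counts[(int(bbl_id), int(send_offset, 16), rw)] += 1
    return counts


lines = [
    "BBL-ID DP SEND-OFFSET RW ADDRESS SIZE",
    "16 1 0x00000a58 R 0x00004800 16",
    "16 1 0x00000a58 R 0x00004810 16",
    "16 1 0x00000a68 R 0x00004900 16",
    "21 1 0x00000ca8 W 0x00004800 16",
    "----EOT----",
]
for (bbl_id, offset, rw), count in sorted(accesses_per_send(lines).items()):
    print("BBL", bbl_id, "offset", hex(offset), rw, "accesses:", count)
```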
00001 /*========================== begin_copyright_notice ============================ 00002 Copyright (C) 2018-2025 Intel Corporation 00003 00004 SPDX-License-Identifier: MIT 00005 ============================= end_copyright_notice ===========================*/ 00006 00007 /*! 00008 * @file Memorytrace tool definitions 00009 */ 00010 00011 #ifndef MEMORYTRACE_H_ 00012 #define MEMORYTRACE_H_ 00013 00014 #include <list> 00015 #include <map> 00016 #include <vector> 00017 00018 #include "gtpin_api.h" 00019 #include "gtpin_tool_utils.h" 00020 #include "kernel_weight.h" 00021 #include "gen_send_decoder.h" 00022 #include "gt_basic_defs.h" 00023 00024 using namespace gtpin; 00025 00026 /* 00027 * The trace of memory accesses consists of records, each of which holds data pertaining to a single 00028 * execution of a basic block that accesses memory. 00029 * Each trace record comprises two data portions: 00030 * - Header that details architectural state during this BBL execution. The MemTraceRecordHeader 00031 * structure defines layout of the data elements in the header 00032 * - Array of address payload register values in SEND instructions of this BBL. The values are 00033 * stored in the order of the corresponding instructions in the basic block. This portion of the 00034 * record is of a variable length - different BBLs may access different number of data elements. 00035 * Both portions (header and address payloads) are aligned with the GRF register size. 00036 */ 00037 00038 /* ============================================================================================= */ 00039 // Struct MemTraceRecordHeader 00040 /* ============================================================================================= */ 00041 /*! 00042 * Structure of the trace record header. 00043 * The header details architectural state during execution of a BBL that accesses memory. In the 00044 * trace record, the header is followed by the array of address payload registers. 
00045 */ 00046 #pragma pack(push, 1) 00047 struct MemTraceRecordHeader 00048 { 00049 uint16_t bblId; ///< BBL identifier 00050 uint16_t sr0; ///< LSB-16 of the State register sr0.0:ud 00051 uint32_t tileId; ///< Tile ID 00052 uint32_t ce; ///< Channel Enable register ce:ud 00053 uint32_t dm; ///< Dispatch Mask register dm:ud 00054 uint32_t res; ///< Reserved 00055 uint32_t tm00; ///< tm0.0:ud 00056 uint32_t tm01; ///< tm0.1:ud 00057 00058 /// @return Size of this structure aligned with the GRF register size in the specified GEN model 00059 static uint32_t AlignedSize(const IGtGenModel& genModel) 00060 { 00061 return (uint32_t)AlignUp(sizeof(MemTraceRecordHeader), genModel.GrfRegSize()); 00062 } 00063 }; 00064 #pragma pack(pop) 00065 00066 /* ============================================================================================= */ 00067 // Struct MemIns 00068 /* ============================================================================================= */ 00069 /// SEND instruction descriptor 00070 struct MemIns 00071 { 00072 InsId id; ///< Instruction ID 00073 uint32_t offset; ///< Offset ot the instruction within the kernel 00074 DcSendMsg msg; ///< Decoded SEND message instruction 00075 }; 00076 using MemInsList = std::list<MemIns>; 00077 00078 /* ============================================================================================= */ 00079 // Class BblMemAccessInfo 00080 /* ============================================================================================= */ 00081 /*! 
00082 * Class that retrieves information about memory accesses in the basic block, and provides an 00083 * interface for querying this information 00084 */ 00085 class BblMemAccessInfo 00086 { 00087 public: 00088 BblMemAccessInfo() : _recordSize(0) {} 00089 00090 /// Populate this object with the information about memory accesses in the specified basic block 00091 BblMemAccessInfo(const IGtKernelInstrument& kernelInstrument, const IGtBbl& bbl) : BblMemAccessInfo() { Build(kernelInstrument, bbl); } 00092 BblMemAccessInfo& Build(const IGtKernelInstrument& kernelInstrument, const IGtBbl& bbl); 00093 00094 /// @return List of memory instructions in the basic block 00095 const MemInsList& MemInstructions() const { return _memInstructions; } 00096 00097 /// @return Size, in bytes, of the trace record in the basic block 00098 uint32_t RecordSize() const { return _recordSize; } 00099 00100 /// @return true if basic block does not contain any memory instruction of interest 00101 bool IsEmpty() const { return (_recordSize == 0); } 00102 00103 private: 00104 MemInsList _memInstructions; ///< List of memory instructions in the basic block 00105 uint32_t _recordSize; ///< Size, in bytes, of the trace record in the basic block 00106 }; 00107 00108 /* ============================================================================================= */ 00109 // Class KernelMemAccessInfo 00110 /* ============================================================================================= */ 00111 /*! 
00112 * Class that retrieves information about memory accesses in the kernel, and provides an 00113 * interface for querying this information 00114 */ 00115 class KernelMemAccessInfo 00116 { 00117 public: 00118 KernelMemAccessInfo() : _maxRecordSize(0) {} 00119 00120 /// Populate this object with the information about memory accesses in the specified kernel 00121 explicit KernelMemAccessInfo(const IGtKernelInstrument& kernelInstrument) { Build(kernelInstrument); } 00122 KernelMemAccessInfo& Build(const IGtKernelInstrument& kernelInstrument); 00123 00124 /// @return Information about memory accesses in the kernel, indexed by BBL identifiers 00125 using MemAccessMap = std::map<BblId, BblMemAccessInfo>; 00126 const MemAccessMap& GetMemAccessMap() const { return _memAccessMap; } 00127 00128 /// @return Information about memory accesses in the specified BBL, or NULL if requested information is not found 00129 const BblMemAccessInfo* GetBblInfo(BblId bblId) const; 00130 00131 /// @return Number of basic blocks that access memory 00132 uint32_t NumMemBbls() const { return (uint32_t)_memAccessMap.size(); } 00133 00134 /// @return Maximum size, in bytes, of the trace record in the kernel 00135 uint32_t MaxRecordSize() const { return _maxRecordSize; } 00136 00137 private: 00138 MemAccessMap _memAccessMap; ///< Information about memory accesses, indexed by BBL identifiers 00139 uint32_t _maxRecordSize; ///< Max. size, in bytes, of the trace record in the kernel 00140 }; 00141 00142 /* ============================================================================================= */ 00143 // Class MemTraceDispatch 00144 /* ============================================================================================= */ 00145 /*! 
00146 * Class that holds memory trace collected during a single kernel dispatch 00147 */ 00148 class MemTraceDispatch 00149 { 00150 public: 00151 /// Construct a MemTraceDispatch object with the empty trace 00152 explicit MemTraceDispatch(const IGtKernelDispatch& dispatch) : _isTrimmed(false) { dispatch.GetExecDescriptor(_kernelExecDesc); } 00153 00154 /// Read the entire trace from the specified profile buffer into this object 00155 bool ReadTrace(const GtProfileTrace& traceAccessor, const IGtProfileBuffer& profileBuffer); 00156 00157 const GtKernelExecDesc& KernelExecDesc() const { return _kernelExecDesc; } ///< @return Descriptor of this kernel dispatch 00158 uint32_t Size() const { return (uint32_t)_rawTrace.size(); } ///< @return Trace size in bytes 00159 const uint8_t* Data() const { return _rawTrace.data(); } ///< @return Trace data collected in this dispatch 00160 uint8_t* Data() { return _rawTrace.data(); } ///< @return Trace data collected in this dispatch 00161 bool IsEmpty() const; ///< @return true if the trace is empty 00162 bool IsTrimmed() const { return _isTrimmed; } ///< @return true if the trace has been trimmed 00163 00164 private: 00165 GtKernelExecDesc _kernelExecDesc; ///< Kernel execution descriptor 00166 std::vector<uint8_t> _rawTrace; ///< Trace data collected in this kernel dispatch 00167 bool _isTrimmed; ///< true if the trace has been trimmed to avoid buffer overflow 00168 }; 00169 00170 /* ============================================================================================= */ 00171 // Class MemTraceKernel 00172 /* ============================================================================================= */ 00173 /*! 
00174 * Class that contains 00175 * - Static information about memory accesses in the kernel 00176 * - Collection of memory traces recorded by kernel dispatches 00177 */ 00178 class MemTraceKernel 00179 { 00180 public: 00181 MemTraceKernel() = default; 00182 00183 /// Construct a MemTraceKernel object intended to hold traces of the specified kernel 00184 explicit MemTraceKernel(const IGtKernelInstrument& kernelInstrument, uint32_t numTiles); 00185 00186 /*! 00187 * Read a trace recorded by the specified kernel dispatch. Create and add the corresponding MemTraceDispatch 00188 * instance to this object 00189 */ 00190 MemTraceDispatch& AddMemTrace(IGtKernelDispatch& kernelDispatch); 00191 00192 std::string Name() const { return _name; } ///< @return Kernel's name 00193 std::string ExtendedName() const { return _extName; } ///< @return Kernel's extended name 00194 std::string UniqueName() const { return _uniqueName; } ///< @return Kernel's unique name 00195 const GtGpuPlatform Platform() const { return _platform; } ///< @return Kernel's platform 00196 const IGtGenModel& GenModel() const { return GetGenModel(_genId); } ///< @return Kernel's GEN model 00197 const GtProfileTrace& TraceAccessor() const { return _traceAccessor; } ///< @return Trace accessor 00198 void DumpAsm() const; ///< Dump kernel's assembly text to file 00199 00200 /// @return true, if tracing of this kernel is enabled 00201 uint32_t IsEnabled() const { return (_traceAccessor.MaxTraceSize() != 0); } 00202 00203 /// @return Static information about kernel's memory accesses 00204 const KernelMemAccessInfo& GetMemAccessInfo() const { return _memAccessInfo; } 00205 00206 /// @return Traces collected in kernel's dispatches 00207 typedef std::list<MemTraceDispatch> Traces; 00208 const Traces& GetTraces() const { return _traces; } 00209 00210 /// @return Number of tiles 00211 uint32_t NumTiles() const { return _numTiles; } 00212 private: 00213 std::string _name; ///< Kernel's name 00214 std::string 
_uniqueName; ///< Kernel's unique name 00215 std::string _extName; ///< Kernel's extended name 00216 GtGpuPlatform _platform; ///< Kernel's platform 00217 GtGenModelId _genId; ///< Identifier of the GEN model, the kernel is compiled for 00218 std::string _asmText; ///< Kernel's assembly text 00219 KernelMemAccessInfo _memAccessInfo; ///< Static information about kernel's memory accesses 00220 GtProfileTrace _traceAccessor; ///< Trace accessor 00221 Traces _traces; ///< Traces collected in kernel's dispatches 00222 uint32_t _numTiles; ///< The number of supported tiles 00223 }; 00224 00225 /* ============================================================================================= */ 00226 // Class MemTrace 00227 /* ============================================================================================= */ 00228 /*! 00229 * Implementation of the IGtTool interface for the Memorytrace tool 00230 */ 00231 class MemTrace : public GtTool 00232 { 00233 public: 00234 /// Implementation of the IGtTool interface 00235 const char* Name() const { return "memorytrace"; } 00236 00237 void OnKernelBuild(IGtKernelInstrument& instrumentor); 00238 void OnKernelRun(IGtKernelDispatch& dispatcher); 00239 void OnKernelComplete(IGtKernelDispatch& dispatcher); 00240 00241 public: 00242 static void OnFini(); ///< Callback function registered with atexit() 00243 static MemTrace* Instance(); ///< @return Single instance of this class 00244 00245 private: 00246 MemTrace() = default; 00247 MemTrace(const MemTrace&) = delete; 00248 MemTrace& operator = (const MemTrace&) = delete; 00249 ~MemTrace() = default; 00250 00251 /*! 
00252 * Generate instrumentation for the specified basic block 00253 * @param[in] instrumentor Interface of the kernel being instrumented 00254 * @param[in] bbl Basic block to be instrumented 00255 * @param[in] memTraceKernel Object that holds information about memory accesses in the kernel 00256 * @return success/failure status 00257 */ 00258 bool InstrumentBbl(IGtKernelInstrument& instrumentor, const IGtBbl& bbl, const MemTraceKernel& memTraceKernel); 00259 00260 /*! 00261 * Generate procedure that allocates space for the new trace record in the trace and stores the record header 00262 * for the specified basic block. The procedure sets _offsetReg equal to the offset of the location within the 00263 * profile buffer immediately following the record header. 00264 * If new record can not be allocated due to the trace capacity limitations, the procedure zeroes _offsetReg. 00265 * 00266 * @param[in, out] proc Procedure, the generated code is appended to 00267 * @param[in] coder GEN code generator 00268 * @param[in] bbl Basic block being instrumented 00269 * @param[in] memTraceKernel Object that holds information about memory accesses in the kernel 00270 * @param[in] recordSize Size of the new record, in bytes 00271 * @param[in] firstSendInBbl Indicates whether this is the first Send instruction in BBL 00272 */ 00273 void StoreRecordHeader(GtGenProcedure& proc, const IGtGenCoder& coder, const IGtBbl& bbl, 00274 const MemTraceKernel& memTraceKernel, uint32_t recordSize, bool firstSendInBbl = true); 00275 00276 /*! 00277 * Generate procedure that stores the specified range of GRF registers in the trace. 00278 * On entry, the procedure assumes that _offsetReg holds offset of the location within the profile buffer 00279 * at which address payload is about to be stored. 00280 * On exit, the procedure advances _offsetReg to the location just after the stored register range. 00281 * If input value of _offsetReg is zero. 
no registers are stored, and _offsetReg retains value zero. 00282 * 00283 * @param[in, out] proc Procedure, the generated code is appended to 00284 * @param[in] coder GEN code generator 00285 * @param[in] firstRegNum First GRF register to be stored 00286 * @param[in] numRegs Number of registers to be stored 00287 */ 00288 void StoreRegRange(GtGenProcedure& proc, const IGtGenCoder& coder, uint32_t firstRegNum, uint32_t numRegs); 00289 00290 private: 00291 std::map<GtKernelId, MemTraceKernel> _kernels; ///< Collection of traces per kernel 00292 00293 GtReg _addrReg; ///< Virtual register that holds address within profile buffer 00294 GtReg _dataReg; ///< Virtual register that holds data to be read from/written to profile buffer 00295 GtReg _offsetReg; ///< Virtual register that holds offset within the trace 00296 GtReg _tileIdReg; ///< Virtual register that holds tile ID 00297 }; 00298 00299 /* ============================================================================================= */ 00300 // Class MemoryTracePreProcessor 00301 /* ============================================================================================= */ 00302 /*! 
00303 * Class that computes per-kernel trace sizes in the preprocessing phase, and provides access to 00304 * this data in the trace gathering phase 00305 */ 00306 class MemoryTracePreProcessor : public KernelWeight 00307 { 00308 public: 00309 uint64_t TraceSize(const std::string& extKernelName) const; ///< Given extended kernel name, return the trace size in bytes 00310 static MemoryTracePreProcessor* Instance(); ///< @return Single instance of this class 00311 static void OnFini(); ///< Callback function registered with atexit() 00312 00313 private: 00314 MemoryTracePreProcessor(); 00315 MemoryTracePreProcessor(const MemoryTracePreProcessor&) = delete; 00316 MemoryTracePreProcessor& operator = (const MemoryTracePreProcessor&) = delete; 00317 00318 private: 00319 /// KernelWeight interface overrides (@see description of KernelWeight functions) 00320 uint32_t GetBblWeight(IGtKernelInstrument& kernelInstrument, const IGtBbl& bbl) const; 00321 void AggregateDispatchCounters(KernelWeightCounters& kc, KernelWeightCounters dc) const; 00322 00323 private: 00324 KernelWeightProfileData _kernelCounters; ///< Per-kernel counters of required trace records; collected in preprocessing phase 00325 00326 static const char* _kernelPreProcessFileName; ///< Name of the file that contains preprocessing data per kernel 00327 static const char* _dispatchPreProcessFileName; ///< Name of the file that contains preprocessing data per kernel dispatch 00328 }; 00329 00330 /* ============================================================================================= */ 00331 // Class MemoryTracePostProcessor 00332 /* ============================================================================================= */ 00333 /*! 
 * Function object that processes kernel traces - stores them in files within the profile directory:
 *
 *     kernel_name
 *     |
 *     |- kernel_dispatch_1
 *        |- memorytrace_compressed.bin
 *     |- kernel_dispatch_2
 *        |- memorytrace_compressed.bin
 * The .bin trace files can be uncompressed by the uncompress_memtrace.exe utility.
 *
 * Format of .bin trace files:
 * - Static information:
 *   - Number of BBLs that access memory
 *   - For each BBL that accesses memory:
 *     - BBL ID
 *     - Number of SEND instructions in this basic block
 *     - For each SEND instruction:
 *       - Decoded information including offset, address model, address type, address payload size, etc
 *
 * - Dynamic trace data:
 *   - Number of HW threads in which the trace was collected
 *   - For each HW thread:
 *     - HW Thread ID (in the format of sr0.0)
 *     - Number of records collected for this HW thread
 *     - All the records collected for this HW thread
 */
class MemoryTracePostProcessor
{
public:
    /// Construct a MemoryTracePostProcessor object for the specified collection of kernel traces
    MemoryTracePostProcessor(const IGtCore& gtpinCore, const MemTraceKernel& memTraceKernel);

    /// Process all kernel traces associated with this object - store them in files within the profile directory
    bool operator()();

private:
    /// Store the specified trace in the specified file stream
    void StoreTrace(const MemTraceDispatch& trace, std::ofstream& fs);

    /// Store static information about memory accesses in the kernel
    void StoreMemAccessInfo(std::ofstream& fs);

    /// Store Global Thread Identifier
    void StoreGlobalTid(uint32_t gtid, std::ofstream& fs);

    /// Store the specified value in the specified file stream in the binary format
    template <typename T> void Store(const T& val, std::ofstream& fs) { fs.write((const char*)&val, sizeof(val)); }

private:
    struct TraceRecord ///< Reference to the trace record
    {
        const MemTraceRecordHeader* header; ///< Pointer to the header of the record
        uint32_t                    size;   ///< Size of the record in bytes, including header
    };
    using TraceRecordList     = std::list<TraceRecord>;       ///< List of references to trace records
    using PerTileTraceRecords = std::vector<TraceRecordList>; ///< Per tile trace records

    /// The SEND instruction description stored in the trace
    struct MemInsInfo
    {
        explicit MemInsInfo(const MemIns& memIns); ///< Construct packed variant of the specified MemIns structure

        uint32_t offset;            ///< Offset of the instruction within the kernel
        uint32_t isWrite;           ///< Is write (true) or read (false) operation
        uint32_t isBlock2D;         ///< Is block 2D load or store access
        uint32_t isScatter;         ///< Is scatter
        uint32_t isBTS;             ///< Is Binding Table State address model
        uint32_t isSLM;             ///< Is Shared Local Memory access
        uint32_t isScratch;         ///< Is access to scratch block
        uint32_t isAtomic;          ///< Is atomic operation
        uint32_t isFence;           ///< Is fence
        uint32_t addressWidth;      ///< Address width (32bit or 64bit)
        uint32_t simdWidth;         ///< SIMD width
        uint32_t bti;               ///< BTI value
        uint32_t addrPayloadLength; ///< Address payload length
        uint32_t dataPort;          ///< Data port ID: 0=DP0, 1=DP1, 2=UGM, 3=UGML, 4=TGM, 5=SLM, 6=all others
        uint32_t isEOT;             ///< Is SEND.EOT
        uint32_t isMedia;           ///< Is media block access
        uint32_t elementSize;       ///< Data element size in bytes
        uint32_t numElements;       ///< Number of elements
        uint32_t execSize;          ///< Instruction execution size
        uint32_t channelOffset;     ///< Channel mask offset
    };

private:
    const MemTraceKernel*            _kernel;             ///< Kernel and its traces to be processed
    const KernelMemAccessInfo*       _memAccessInfo;      ///< Static information about memory accesses in the kernel
    std::string                      _kernelDir;          ///< Directory to store kernel's trace files
    std::vector<PerTileTraceRecords> _threadTraceRecords; ///< Lists of trace records, indexed by the thread ID

    static const char* _traceFileName; ///< Name of the file to store trace in
};

#endif
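The `Store` helper above writes each value as its raw in-memory bytes, so a reader of the `.bin` file has to consume the same types in the same order. The sketch below illustrates that round trip on an in-memory stream. `StoreRaw`, `LoadRaw`, `DemoInsInfo` and `RoundTripDemo` are hypothetical names introduced here for illustration; they are not part of the GTPin sources, and `std::ostream`/`std::istream` stand in for the `std::ofstream` used by the tool.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <sstream>
#include <type_traits>

// Write the raw bytes of a value, mirroring what the tool's Store() helper does.
// (Assumption: generalized to std::ostream so it can be exercised without a file.)
template <typename T>
void StoreRaw(const T& val, std::ostream& os)
{
    static_assert(std::is_trivially_copyable<T>::value, "raw byte I/O needs a trivially copyable type");
    os.write(reinterpret_cast<const char*>(&val), sizeof(val));
}

// Hypothetical counterpart of StoreRaw: read the raw bytes back.
// Returns false if fewer than sizeof(T) bytes were available.
template <typename T>
bool LoadRaw(T& val, std::istream& is)
{
    static_assert(std::is_trivially_copyable<T>::value, "raw byte I/O needs a trivially copyable type");
    is.read(reinterpret_cast<char*>(&val), sizeof(val));
    return is.gcount() == static_cast<std::streamsize>(sizeof(val));
}

// Hypothetical two-field stand-in for a packed descriptor such as MemInsInfo.
struct DemoInsInfo
{
    uint32_t offset;
    uint32_t isWrite;
};

// Store a BBL ID followed by one descriptor, then read both back in the same order.
bool RoundTripDemo()
{
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    const uint32_t    bblId = 7;
    const DemoInsInfo info{0x40, 1};
    StoreRaw(bblId, ss);
    StoreRaw(info, ss);

    uint32_t    bblIdIn = 0;
    DemoInsInfo infoIn{};
    return LoadRaw(bblIdIn, ss) && LoadRaw(infoIn, ss) &&
           (bblIdIn == bblId) && (infoIn.offset == info.offset) && (infoIn.isWrite == info.isWrite);
}
```

Because the bytes are written exactly as they sit in memory, a file produced this way is only guaranteed to be readable on a machine with the same endianness and type sizes. Declaring every `MemInsInfo` field as `uint32_t` keeps the struct free of padding, which makes the on-disk layout easy to predict.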
/*========================== begin_copyright_notice ============================
Copyright (C) 2018-2025 Intel Corporation

SPDX-License-Identifier: MIT
============================= end_copyright_notice ===========================*/

/*!
 * @file Implementation of the Memorytrace tool
 */

#include <fstream>
#include <string>

#include "memorytrace.h"
#include "gtpin_tool_utils.h"

using namespace gtpin;
using namespace std;

/* ============================================================================================= */
// Configuration
/* ============================================================================================= */
Knob<int> gKnobMaxTraceBufferInMB("max_buffer_mb", 3072, "memorytrace - the max allowed size of the trace buffer per kernel in MB\n");
Knob<int> gKnobPhase("phase", 0, "tracing tool - processing phase\n { 1 - pre-processing, 2 - processing - trace gathering} ");
Knob<bool> gKnobTimeStamp("include_timestamp", false, "true - includes time stamp, false - doesn't include time stamp");

/* ============================================================================================= */
// BblMemAccessInfo implementation
/* ============================================================================================= */
BblMemAccessInfo& BblMemAccessInfo::Build(const IGtKernelInstrument& kernelInstrument, const IGtBbl& bbl)
{
    const IGtCfg&      cfg = kernelInstrument.Cfg();
    uint32_t           addrPayloadSize = 0;
    uint32_t           headerSize = 0;
    const IGtGenModel& genModel = kernelInstrument.Kernel().GenModel();

    _memInstructions.clear();
    for (auto insPtr : bbl.Instructions())
    {
        const IGtIns& ins = *insPtr;

        if ((ins.Id() < uint32_t(knobMinInstrumentIns)) || (ins.Id() > uint32_t(knobMaxInstrumentIns)))
        {
            continue;
        }
        if (ins.IsSendMessage())
        {
            DcSendMsg msg = DcSendMsg::Decode(ins.GetGedIns());
            if (msg.IsValid() || ins.IsEot())
            {
                addrPayloadSize += (msg.AddrPayloadLength() * genModel.GrfRegSize()); // Accumulate SEND address payloads
                headerSize = MemTraceRecordHeader::AlignedSize(genModel);             // Add one trace record header per BBL
                _memInstructions.emplace_back(MemIns{ins.Id(), cfg.GetInstructionOffset(ins), std::move(msg)});
            }
        }
    }
    _recordSize = addrPayloadSize + headerSize;
    if (!_memInstructions.empty() && gKnobTimeStamp)
    {
        _recordSize = addrPayloadSize + (uint32_t)_memInstructions.size() * headerSize;
    }
    return *this;
}

/* ============================================================================================= */
// KernelMemAccessInfo implementation
/* ============================================================================================= */
KernelMemAccessInfo& KernelMemAccessInfo::Build(const IGtKernelInstrument& kernelInstrument)
{
    const IGtCfg& cfg = kernelInstrument.Cfg();

    _memAccessMap.clear();
    _maxRecordSize = 0;
    for (auto bblPtr : cfg.Bbls())
    {
        const IGtBbl&    bbl = *bblPtr;
        BblMemAccessInfo bblMemAccessInfo(kernelInstrument, bbl);
        if (!bblMemAccessInfo.IsEmpty())
        {
            _maxRecordSize = std::max(_maxRecordSize, bblMemAccessInfo.RecordSize());
            _memAccessMap.emplace(bbl.Id(), std::move(bblMemAccessInfo));
        }
    }
    return *this;
}

const BblMemAccessInfo* KernelMemAccessInfo::GetBblInfo(BblId bblId) const
{
    auto it = _memAccessMap.find(bblId);
    return (it == _memAccessMap.end()) ? nullptr : &(it->second);
}

/* ============================================================================================= */
// MemTraceDispatch implementation
/* ============================================================================================= */
bool MemTraceDispatch::ReadTrace(const GtProfileTrace& traceAccessor, const IGtProfileBuffer& profileBuffer)
{
    uint32_t traceSize = traceAccessor.Size(profileBuffer);
    _rawTrace.resize(traceSize);
    _isTrimmed = traceAccessor.IsTruncated(profileBuffer);
    return traceAccessor.Read(profileBuffer, _rawTrace.data(), 0, traceSize);
}

bool MemTraceDispatch::IsEmpty() const
{
    return _rawTrace.size() < sizeof(MemTraceRecordHeader);
}

/* ============================================================================================= */
// MemTraceKernel implementation
/* ============================================================================================= */
MemTraceKernel::MemTraceKernel(const IGtKernelInstrument& kernelInstrument, uint32_t numTiles) : _numTiles(numTiles)
{
    const IGtKernel& kernel = kernelInstrument.Kernel();
    const IGtCfg&    cfg = kernelInstrument.Cfg();

    _name = GlueString(kernel.Name());
    _extName = ExtendedKernelName(kernel);
    _platform = kernel.GpuPlatform();
    _genId = kernel.GenModel().Id();
    _asmText = CfgAsmText(cfg);
    _uniqueName = kernel.UniqueName();

    // Build static information about memory accesses in the kernel
    _memAccessInfo.Build(kernelInstrument);

    // Initialize trace accessor. The trace capacity is expected to be computed during the preprocessing phase.
    uint64_t traceCapacity = MemoryTracePreProcessor::Instance()->TraceSize(_extName);
    if (traceCapacity == 0)
    {
        // Unknown trace capacity
        GTPIN_WARNING("MEMORYTRACE: unknown trace capacity for kernel " + _name + ". Assuming the kernel is filtered out. "
                      "Allocating a buffer of 8KB size. If the kernel is supposed to run, expect buffer overflow. "
                      "In this case, please re-run phase 1 and make sure the kernel is not filtered out.");
        traceCapacity = 0x2000;
    }
    else
    {
        traceCapacity += 0x2000; // Add some space to account for possible fluctuation of trace sizes between phases
        if (traceCapacity > UINT32_MAX)
        {
            GTPIN_WARNING("MEMORYTRACE: The kernel " + _name + " exceeded maximum trace capacity.");
            traceCapacity = UINT32_MAX;
        }
    }
    if (traceCapacity > (uint64_t(gKnobMaxTraceBufferInMB) * 0x100000))
    {
        GTPIN_WARNING("MEMORYTRACE: required capacity (" + DecStr(traceCapacity) + ") for kernel " + _name + " is too big - cut to " + DecStr(gKnobMaxTraceBufferInMB) + "MB. "
                      "Expect the final trace to contain partial data.");
        traceCapacity = uint64_t(gKnobMaxTraceBufferInMB) * 0x100000;
    }
    uint32_t maxRecordSize = _memAccessInfo.MaxRecordSize();
    _traceAccessor = GtProfileTrace((uint32_t)traceCapacity, maxRecordSize);
    _traceAccessor.Allocate(kernelInstrument.ProfileBufferAllocator());
}

MemTraceDispatch& MemTraceKernel::AddMemTrace(IGtKernelDispatch& kernelDispatch)
{
    // Create a new MemTraceDispatch object and store the entire trace within this object
    _traces.emplace_back(kernelDispatch);
    MemTraceDispatch& memTraceDispatch = _traces.back();
    if (!memTraceDispatch.ReadTrace(_traceAccessor, *kernelDispatch.GetProfileBuffer()))
    {
        GTPIN_ERROR_MSG("MEMORYTRACE: Failed to read profile buffer for kernel " + _name);
    }
    return memTraceDispatch;
}

void MemTraceKernel::DumpAsm() const
{
    DumpKernelAsmText(_name, _uniqueName, _asmText);
}

/* ============================================================================================= */
// MemTrace implementation
/* ============================================================================================= */
MemTrace* MemTrace::Instance()
{
    static MemTrace instance;
    return &instance;
}

void MemTrace::OnKernelBuild(IGtKernelInstrument& instrumentor)
{
    const IGtKernel& kernel = instrumentor.Kernel();
    uint32_t numTiles = (instrumentor.Coder().IsTileIdSupported()) ? GTPin_GetCore()->GenArch().MaxTiles(kernel.GpuPlatform()) : 1;

    // Handle common instrumentation knobs
    HandleCommonInstrumnetationKnobs(instrumentor);
    // Create new KernelData object and add it to the database
    auto ret = _kernels.emplace(piecewise_construct, forward_as_tuple(kernel.Id()), forward_as_tuple(instrumentor, numTiles));
    if (ret.second)
    {
        MemTraceKernel& memTraceKernel = (*ret.first).second;
        if (!memTraceKernel.IsEnabled())
        {
            GTPIN_WARNING("MEMORYTRACE: The trace won't be generated for kernel " + memTraceKernel.Name());
            return;
        }

        const IGtCfg&   cfg = instrumentor.Cfg();
        IGtVregFactory& vregs = instrumentor.Coder().VregFactory();

        // Initialize virtual registers
        _addrReg = vregs.MakeMsgAddrScratch();
        _dataReg = vregs.MakeMsgDataScratch(VREG_TYPE_HWORD);
        _offsetReg = vregs.Make(VREG_TYPE_DWORD);
        _tileIdReg = vregs.Make(VREG_TYPE_DWORD);

        GtGenProcedure loadTileIdProc;
        instrumentor.Coder().LoadTileId(loadTileIdProc, _tileIdReg);

        // Instrument kernel entries
        instrumentor.InstrumentEntries(loadTileIdProc);

        // Instrument basic blocks
        for (auto bblPtr : cfg.Bbls())
        {
            InstrumentBbl(instrumentor, *bblPtr, memTraceKernel);
        }
    }
}

void MemTrace::OnKernelRun(IGtKernelDispatch& dispatcher)
{
    bool isProfileEnabled = false;

    const IGtKernel& kernel = dispatcher.Kernel();
    GtKernelExecDesc execDesc; dispatcher.GetExecDescriptor(execDesc);
    if (kernel.IsInstrumented() && IsKernelExecProfileEnabled(execDesc, kernel.GpuPlatform(), kernel.Name().Get()))
    {
        auto it = _kernels.find(kernel.Id());
        if (it != _kernels.end())
        {
            const MemTraceKernel& memTraceKernel = it->second;
            if (memTraceKernel.IsEnabled())
            {
                IGtProfileBuffer* buffer = dispatcher.CreateProfileBuffer();
                GTPIN_ASSERT(buffer);
                const GtProfileTrace& traceAccessor = memTraceKernel.TraceAccessor();
                if (traceAccessor.Initialize(*buffer))
                {
                    isProfileEnabled = true;
                }
                else
                {
                    GTPIN_ERROR_MSG("MEMORYTRACE: Failed to write into memory buffer for kernel " + string(kernel.Name()));
                }
            }
        }
    }
    dispatcher.SetProfilingMode(isProfileEnabled);
}

void MemTrace::OnKernelComplete(IGtKernelDispatch& dispatcher)
{
    if (!dispatcher.IsProfilingEnabled())
    {
        return; // Do nothing with unprofiled kernel dispatches
    }

    const IGtKernel& kernel = dispatcher.Kernel();
    auto it = _kernels.find(kernel.Id());
    if (it != _kernels.end())
    {
        // Read the trace from the profile buffer
        MemTraceKernel& memTraceKernel = it->second;
        memTraceKernel.AddMemTrace(dispatcher);
    }
}

bool MemTrace::InstrumentBbl(IGtKernelInstrument& instrumentor, const IGtBbl& bbl, const MemTraceKernel& memTraceKernel)
{
    const KernelMemAccessInfo& kernelMemAccessInfo = memTraceKernel.GetMemAccessInfo();
    const BblMemAccessInfo*    bblInfoPtr = kernelMemAccessInfo.GetBblInfo(bbl.Id());
    if ((bblInfoPtr == nullptr) || bblInfoPtr->IsEmpty())
    {
        return false; // The basic block does not contain any memory instruction of interest
    }

    const IGtGenCoder&      coder = instrumentor.Coder();
    const IGtCfg&           cfg = instrumentor.Cfg();
    const BblMemAccessInfo& bblInfo = *bblInfoPtr;

    // Generate code that allocates space for the new record in the trace and stores the trace record header.
    // Insert this procedure before the first memory access in the basic block.
    GtGenProcedure headerProc;
    const IGtIns&  firstMemIns = cfg.GetInstruction(bblInfo.MemInstructions().front().id);
    StoreRecordHeader(headerProc, coder, bbl, memTraceKernel, bblInfo.RecordSize());
    instrumentor.InstrumentInstruction(firstMemIns, GtIpoint::Before(), headerProc);

    // Generate code that stores address payload registers of SEND instructions in the basic block
    for (const auto& insInfo : bblInfo.MemInstructions())
    {
        GtGenProcedure   proc;
        const IGtIns&    ins = cfg.GetInstruction(insInfo.id);
        const DcSendMsg& msg = insInfo.msg;
        GtRegNum         src0RegNum = msg.Src0();
        GtRegNum         src1RegNum = msg.Src1();
        uint32_t         addrPayloadLength = msg.AddrPayloadLength();
        uint32_t         src0Length = msg.Src0Length();

        if (gKnobTimeStamp && (ins.Id() != firstMemIns.Id()))
        {
            GtGenProcedure headerProc;
            StoreRecordHeader(headerProc, coder, bbl, memTraceKernel, bblInfo.RecordSize(), false);
            instrumentor.InstrumentInstruction(ins, GtIpoint::Before(), headerProc);
        }

        // If size of the SRC0 register range is greater or equal to the address payload length, the SRC0 range contains
        // the entire address payload.
        // Otherwise the address payload is split between SRC0 and SRC1 register ranges
        if (src0Length >= addrPayloadLength)
        {
            StoreRegRange(proc, coder, src0RegNum, addrPayloadLength);
        }
        else
        {
            StoreRegRange(proc, coder, src0RegNum, src0Length);
            if (src1RegNum.IsValid())
            {
                StoreRegRange(proc, coder, src1RegNum, addrPayloadLength - src0Length);
            }
        }
        instrumentor.InstrumentInstruction(ins, GtIpoint::Before(), proc);
    }

    return true;
}

void MemTrace::StoreRecordHeader(GtGenProcedure& proc, const IGtGenCoder& coder, const IGtBbl& bbl,
                                 const MemTraceKernel& memTraceKernel, uint32_t recordSize, bool firstSendInBbl)
{
    /// @return Subregister of _dataReg intended to hold the value of the MemTraceRecordHeader field whose offset is specified
    auto fieldReg = [&](uint32_t fieldOffset) -> GtReg { return GtReg(_dataReg, sizeof(uint32_t), fieldOffset / sizeof(uint32_t)); };

    GtReg idFieldReg = fieldReg(offsetof(MemTraceRecordHeader, bblId));
    GtReg tileIdFieldReg = fieldReg(offsetof(MemTraceRecordHeader, tileId));
    GtReg ceFieldReg = fieldReg(offsetof(MemTraceRecordHeader, ce));
    GtReg dmFieldReg = fieldReg(offsetof(MemTraceRecordHeader, dm));
    GtReg tm0FieldReg = fieldReg(offsetof(MemTraceRecordHeader, tm00));

    IGtInsFactory& insF = coder.InstructionFactory();
    GtPredicate    predicate(FlagReg(0));

    // Set values of MemTraceRecordHeader fields in _dataReg
    if (firstSendInBbl)
    {
        proc += insF.MakeShl(idFieldReg, StateReg(0), 16);                // idFieldReg[16:31] = sr0.0
        proc += insF.MakeAdd(idFieldReg, idFieldReg, GtImmU32(bbl.Id())); // idFieldReg[0:15] = bbl.Id()
        proc += insF.MakeMov(tileIdFieldReg, _tileIdReg);                 // Tile ID
        proc += insF.MakeMov(ceFieldReg, ChannelEnableReg());             // ceFieldReg = ChannelEnableReg()
        proc += insF.MakeMov(dmFieldReg, DispatchMaskReg());              // dmFieldReg = DispatchMaskReg()
    }
    if (gKnobTimeStamp)
    {
        proc += insF.MakeMov(tm0FieldReg, GtRegRegion(TimeStampReg(), GtStride(2, 2, 1), GED_DATA_TYPE_ud), { 2 });
    }

    // Allocate new record in the trace.
    if (firstSendInBbl)
    {
        // Set _offsetReg = offset of the allocated record in the profile buffer, _addrReg = address of the allocated record
        memTraceKernel.TraceAccessor().ComputeNewRecordOffset(coder, proc, recordSize, _offsetReg);
    }
    coder.ComputeAddress(proc, _addrReg, _offsetReg);

    // Zero _offsetReg if the trace buffer is overflowed (predicate == true)
    proc += insF.MakeMov(_offsetReg, 0).SetPredicate(predicate);

    // if (!predicate) { STORE buffer[_offsetReg] = _dataReg; _offsetReg += aligned-header-size }
    uint32_t alignedHeaderSize = MemTraceRecordHeader::AlignedSize(memTraceKernel.GenModel());
    coder.StoreMemBlock(proc, _addrReg, _dataReg, alignedHeaderSize, !predicate);
    proc += insF.MakeAdd(_offsetReg, _offsetReg, alignedHeaderSize).SetPredicate(!predicate);

    if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
}

void MemTrace::StoreRegRange(GtGenProcedure& proc, const IGtGenCoder& coder, uint32_t firstRegNum, uint32_t numRegs)
{
    if (numRegs == 0) { return; }

    IGtInsFactory& insF = coder.InstructionFactory();
    uint32_t       grfRegSize = insF.GenModel().GrfRegSize();
    GtReg          flagReg = FlagReg(0);
    GtPredicate    predicate(flagReg);
    uint32_t       blockSize = numRegs * grfRegSize;

    // predicate = (_offsetReg != 0) = trace is not overflowed
    proc += insF.MakeCmp(GED_COND_MODIFIER_nz, flagReg, _offsetReg, 0, {16});

    // store address payload in addrReg = buffer[offsetReg]
    coder.ComputeAddress(proc, _addrReg, _offsetReg);
    coder.StoreRegRange(proc, _addrReg, firstRegNum, numRegs, predicate);

    proc += insF.MakeAdd(_offsetReg, _offsetReg, blockSize).SetPredicate(predicate);

    if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
}

void MemTrace::OnFini()
{
    MemTrace& me = *Instance();
    IGtCore*  gtpinCore = GTPin_GetCore();
    for (auto& ref : me._kernels)
    {
        const MemTraceKernel& memTraceKernel = ref.second;
        MemoryTracePostProcessor(*gtpinCore, memTraceKernel)();
        memTraceKernel.DumpAsm();
    }
}

/* ============================================================================================= */
// MemoryTracePreProcessor implementation
/* ============================================================================================= */
const char* MemoryTracePreProcessor::_kernelPreProcessFileName = "memorytrace_pre_process.txt";
const char* MemoryTracePreProcessor::_dispatchPreProcessFileName = "memorytrace_pre_process_dispatch.txt";

MemoryTracePreProcessor::MemoryTracePreProcessor()
{
    if (gKnobPhase == 2)
    {
        // Read the data collected during the preprocessing phase
        std::ifstream is(_kernelPreProcessFileName);
        GTPIN_ASSERT_MSG(is, string("File ") + _kernelPreProcessFileName + " does not exist. The trace won't be generated");
        is >> _kernelCounters;
    }
    else if (gKnobPhase == 1)
    {
        // Create pre_process files or remove old pre_process files' content if they exist
        CreateCleanFile(_kernelPreProcessFileName);
        CreateCleanFile(_dispatchPreProcessFileName);
    }
}

MemoryTracePreProcessor* MemoryTracePreProcessor::Instance()
{
    static MemoryTracePreProcessor instance;
    return &instance;
}

void MemoryTracePreProcessor::OnFini()
{
    MemoryTracePreProcessor& tool = *Instance();
    tool.DumpKernelProfiles(_kernelPreProcessFileName);
    tool.DumpDispatchProfiles(_dispatchPreProcessFileName);
}

uint64_t MemoryTracePreProcessor::TraceSize(const string& extKernelName) const
{
    auto it = _kernelCounters.find(extKernelName);
    return ((it == _kernelCounters.end()) ? 0 : it->second.weight);
}

uint32_t MemoryTracePreProcessor::GetBblWeight(IGtKernelInstrument& kernelInstrument, const IGtBbl& bbl) const
{
    // For the memorytrace tool, the weight of the BBL is the trace record size in this BBL
    return BblMemAccessInfo(kernelInstrument, bbl).RecordSize();
}

void MemoryTracePreProcessor::AggregateDispatchCounters(KernelWeightCounters& kc, KernelWeightCounters dc) const
{
    kc.weight = std::max(kc.weight, dc.weight);
    kc.freq += dc.freq;
}

/* ============================================================================================= */
// MemoryTracePostProcessor implementation
/* ============================================================================================= */
const char* MemoryTracePostProcessor::_traceFileName = "memorytrace_compressed.bin";

MemoryTracePostProcessor::MemoryTracePostProcessor(const IGtCore& gtpinCore, const MemTraceKernel& memTraceKernel) :
    _kernel(&memTraceKernel),
    _memAccessInfo(&memTraceKernel.GetMemAccessInfo()),
    _kernelDir(JoinPath(string(gtpinCore.ProfileDir()), memTraceKernel.UniqueName())) {}

bool MemoryTracePostProcessor::operator()()
{
    if (!MakeDirectory(_kernelDir))
    {
        GTPIN_WARNING("MEMORYTRACE: Could not create directory " + _kernelDir);
        return false;
    }

    // Process traces recorded in kernel dispatches
    for (const MemTraceDispatch& trace : _kernel->GetTraces())
    {
        if (!trace.IsEmpty())
        {
            if (trace.IsTrimmed())
            {
                GTPIN_WARNING("MEMORYTRACE: Detected trace buffer overflow in kernel " + _kernel->Name());
            }

            string subdir = trace.KernelExecDesc().ToString(_kernel->Platform(), ExecDescFileNameFormat());
            string dir = MakeSubDirectory(_kernelDir, subdir);
            string filePath = JoinPath(dir, _traceFileName);

            ofstream fs(filePath, std::ios::binary);
            if (!fs)
            {
                GTPIN_WARNING("MEMORYTRACE: Could not create file " + filePath);
                continue;
            }
            StoreTrace(trace, fs);
        }
    }
    return true;
}

void MemoryTracePostProcessor::StoreTrace(const MemTraceDispatch& trace, std::ofstream& fs)
{
    const uint8_t* traceData = trace.Data();
    uint32_t       traceSize = trace.Size();
    uint32_t       alignedHeaderSize = MemTraceRecordHeader::AlignedSize(_kernel->GenModel());

    // Associate trace records with threads - populate _threadTraceRecords array
    const GtStateRegAccessor& sra = _kernel->GenModel().StateRegAccessor();
    uint32_t                  maxThreads = _kernel->GenModel().MaxThreads(); // Max number of HW threads

    _threadTraceRecords.resize(_kernel->NumTiles());

    for (uint32_t tile = 0; tile < _kernel->NumTiles(); tile++)
    {
        _threadTraceRecords[tile].clear();
        _threadTraceRecords[tile].resize(maxThreads);
    }

    std::vector<uint32_t> numProfiledThreads; // Number of profiled (active) threads
    numProfiledThreads.resize(_kernel->NumTiles(), 0);

    for (uint32_t recordOffset = 0; recordOffset + sizeof(MemTraceRecordHeader) <= traceSize;)
    {
        // Retrieve thread ID and BBL ID from the record header
        const MemTraceRecordHeader* header = (const MemTraceRecordHeader*)(traceData + recordOffset);
        uint32_t tid = sra.GetGlobalTid(header->sr0);
        uint32_t bblId = header->bblId;
        uint32_t tileId = header->tileId; GTPIN_ASSERT(tileId < _kernel->NumTiles());
        const BblMemAccessInfo* bblInfoPtr = _memAccessInfo->GetBblInfo(bblId); GTPIN_ASSERT(bblInfoPtr != nullptr);
        uint32_t recordSize = bblInfoPtr->RecordSize();
        if (recordOffset + recordSize > traceSize)
        {
            break; // end of trace
        }

        auto& tileThreadRecords = _threadTraceRecords[tileId];
        auto& threadTraceRecords = tileThreadRecords[tid];

        // Add a new trace record reference to _threadTraceRecords
        if (threadTraceRecords.empty()) { ++numProfiledThreads[tileId]; } // Increment thread count on the first relevant record
        threadTraceRecords.emplace_back(TraceRecord{ header, recordSize });

        recordOffset += recordSize;
    }

    uint32_t registerSize = _kernel->GenModel().GrfRegSize();

    StoreMemAccessInfo(fs);  // Store static information about memory accesses in the kernel
    Store(registerSize, fs); // Store register size

    uint32_t timeStampIncluded = gKnobTimeStamp ? 1 : 0;

    Store(timeStampIncluded, fs); // Store whether timestamp is included

    // Compute and store the number of involved tiles
    uint32_t numOfTiles = 0;
    for (uint32_t i = 0; i < numProfiledThreads.size(); i++)
    {
        numOfTiles += (numProfiledThreads[i] == 0) ? 0 : 1;
    }
    Store(numOfTiles, fs);

    for (uint32_t tileId = 0; tileId < _threadTraceRecords.size(); tileId++)
    {
        if (numProfiledThreads[tileId] == 0) { continue; }

        Store(tileId, fs);

        Store(numProfiledThreads[tileId], fs);

        // Store per-thread traces
        for (uint32_t tid = 0; tid < maxThreads; tid++)
        {
            const auto& tileThreadRecords = _threadTraceRecords[tileId];
            const auto& traceRecordList = tileThreadRecords[tid];

            if (traceRecordList.empty()) { continue; }

            StoreGlobalTid(tid, fs); // Store Global Thread Identifier

            uint32_t numRecords = (uint32_t)traceRecordList.size();
            Store(numRecords, fs); // Store #records collected in the thread

            // Store trace records
            for (const auto& record : traceRecordList)
            {
                const auto& header = *(record.header);
                uint32_t    bblId = header.bblId;
                uint32_t    execMask = header.ce & header.dm;

                Store(bblId, fs);    // Store BBL ID
                Store(execMask, fs); // Store dynamic execution mask

                if (timeStampIncluded)
                {
                    Store(header.tm00, fs); // Store tm0.0
                    Store(header.tm01, fs); // Store tm0.1
                }
                // Store address payloads
                if (record.size > alignedHeaderSize)
                {
                    fs.write((const char*)(record.header) + alignedHeaderSize, record.size - alignedHeaderSize);
                }
            }
        }
    }
}

void MemoryTracePostProcessor::StoreMemAccessInfo(std::ofstream& fs)
{
    // Store static information about memory accesses in BBLs
    uint32_t numBbls = _memAccessInfo->NumMemBbls();
    Store(numBbls, fs); // Store the number of BBLs that access memory

    for (const auto& entry : _memAccessInfo->GetMemAccessMap())
    {
        uint32_t                bblId = entry.first;
        const BblMemAccessInfo& bblInfo = entry.second;
        uint32_t                numMemInstructions = (uint32_t)bblInfo.MemInstructions().size();

        Store(bblId, fs);              // Store BBL ID
        Store(numMemInstructions, fs); // Store the number of memory instructions in BBL
        for (const auto& memIns : bblInfo.MemInstructions())
        {
            MemInsInfo memInsInfo(memIns);
            Store(memInsInfo, fs); // Store the memory instruction descriptor
        }
    }
}

void MemoryTracePostProcessor::StoreGlobalTid(uint32_t gtid, std::ofstream& fs)
{
    const GtStateRegAccessor& sra = _kernel->GenModel().StateRegAccessor();
    uint32_t                  sr0 = sra.SetGlobalTid(0, gtid);

    auto storeSr0Field = [&](const ScatteredBitFieldU32& sbf)
    {
        uint32_t val = (sbf.IsEmpty() ? UINT32_MAX : sbf.GetValue(sr0));
        Store(val, fs);
    };

    storeSr0Field(sra.SliceIdField());
    storeSr0Field(sra.DualSubSliceIdField());
    storeSr0Field(sra.SubSliceIdField());
    storeSr0Field(sra.EuIdField());
    storeSr0Field(sra.ThreadSlotField());
}

MemoryTracePostProcessor::MemInsInfo::MemInsInfo(const MemIns& memIns)
{
    offset = memIns.offset;
    isWrite = memIns.msg.IsWrite() && (memIns.msg.Opcode() != GED_DP_OPCODE_LOAD_2D_BLOCK);
    isBlock2D = (memIns.msg.Opcode() == GED_DP_OPCODE_LOAD_2D_BLOCK) || (memIns.msg.Opcode() == GED_DP_OPCODE_STORE_2D_BLOCK);
    isScatter = memIns.msg.IsScatter();
    isBTS = memIns.msg.IsBts();
    isSLM = memIns.msg.IsSlm();
    isScratch = memIns.msg.IsScratch();
    isAtomic = memIns.msg.IsAtomic();
    isFence = memIns.msg.IsMemFence();
    addressWidth = (memIns.msg.IsA64() || isBlock2D) ? 64 : 32;
    simdWidth = isBlock2D ? 1 : memIns.msg.SimdWidth();
    bti = memIns.msg.Bti();
    elementSize = memIns.msg.ElementSize();
    numElements = memIns.msg.NumElements();
    addrPayloadLength = memIns.msg.AddrPayloadLength();
    dataPort = (memIns.msg.IsDp0() ? 0 :
               (memIns.msg.IsDp1() ? 1 :
               (memIns.msg.IsUgm() ? 2 :
               (memIns.msg.IsUgml()? 3 :
               (memIns.msg.IsTgm() ? 4 :
               (memIns.msg.IsSlm() ? 5 : 6))))));
    isEOT = memIns.msg.IsEot();
    isMedia = memIns.msg.IsMedia();
    execSize = memIns.msg.ExecSize();
    channelOffset = memIns.msg.ChannelOffset();
}

/* ============================================================================================= */
// GTPin_Entry
/* ============================================================================================= */
EXPORT_C_FUNC void GTPin_Entry(int argc, const char *argv[])
{
    ConfigureGTPin(argc, argv);
    if (gKnobPhase == 1)
    {
        MemoryTracePreProcessor::Instance()->Register();
        atexit(MemoryTracePreProcessor::OnFini);
    }
    else
    {
        GTPIN_ASSERT_MSG((gKnobPhase == 2), "MEMORYTRACE: Invalid phase value. Should be 1 or 2, provided " + std::to_string(gKnobPhase));
        MemTrace::Instance()->Register();
        atexit(MemTrace::OnFini);
    }
}
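The sizing logic in the `MemTraceKernel` constructor can be easier to follow as a pure function. The sketch below restates it under that assumption: `ClampTraceCapacity` and `kSlackBytes` are hypothetical names introduced here, the warnings are omitted, and the `max_buffer_mb` knob value is passed in as a parameter instead of being read from `gKnobMaxTraceBufferInMB`.

```cpp
#include <cstdint>

// Slack of 0x2000 bytes (8 KB): used as the whole buffer when phase 1 gave no estimate,
// and added on top of the estimate otherwise, as in the constructor above.
constexpr uint64_t kSlackBytes = 0x2000;

// Hypothetical helper restating the capacity computation.
//   estimatedBytes - per-kernel trace size from memorytrace_pre_process.txt (0 = unknown kernel)
//   maxBufferMB    - value of the max_buffer_mb knob, in megabytes
uint32_t ClampTraceCapacity(uint64_t estimatedBytes, uint32_t maxBufferMB)
{
    uint64_t capacity = (estimatedBytes == 0) ? kSlackBytes : (estimatedBytes + kSlackBytes);

    // The trace accessor takes a 32-bit capacity, so anything larger is cut first.
    if (capacity > UINT32_MAX)
    {
        capacity = UINT32_MAX;
    }

    // Then apply the user-configurable limit (1 MB = 0x100000 bytes).
    const uint64_t limit = uint64_t(maxBufferMB) * 0x100000;
    if (capacity > limit)
    {
        capacity = limit;
    }
    return static_cast<uint32_t>(capacity);
}
```

With the default knob value of 3072 MB, the `BitonicSort` estimate of 142606336 bytes from the phase 1 sample output becomes 142606336 + 8192 = 142614528 bytes. Lowering the knob to 64 MB would cut the same buffer to 67108864 bytes, in which case the tool warns that the final trace may contain partial data.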
Copyright (C) 2013-2025 Intel Corporation
SPDX-License-Identifier: MIT