GTPin: Memorytrace Sample Tool

The Memorytrace tool generates a dynamic trace of the memory addresses accessed by the kernel.

The trace is provided per kernel, per Draw/Enqueue command, and per individual HW thread.

Running the Memorytrace tool

The Memorytrace tool (like all GTPin tracing tools) works in two phases, which should be run separately: a pre-processing phase (phase 1) followed by a trace gathering phase (phase 2).

To run the pre-processing phase of the Memorytrace tool in its default configuration, use this command:

Profilers/Bin/gtpin -t memorytrace --phase 1 -- app

NOTE: You may run this phase only once per application.

To run the trace gathering phase of the Memorytrace tool (in its default configuration), use this command:

Profilers/Bin/gtpin -t memorytrace --phase 2 -- app
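
The two invocations above can be scripted so that phase 2 only starts after phase 1 has succeeded. The following is a minimal sketch that assembles exactly the command lines shown in this section. The helper names gtpin_command and run_both_phases, and the default value of gtpin_path, are illustrative assumptions and are not part of GTPin.

```python
import subprocess

GTPIN_DEFAULT = "Profilers/Bin/gtpin"  # path taken from the commands above


def gtpin_command(phase, app_cmd, gtpin_path=GTPIN_DEFAULT):
    """Build the argument list for one Memorytrace phase (1 or 2)."""
    if phase not in (1, 2):
        raise ValueError("Memorytrace has exactly two phases: 1 and 2")
    return [gtpin_path, "-t", "memorytrace", "--phase", str(phase), "--"] + list(app_cmd)


def run_both_phases(app_cmd, gtpin_path=GTPIN_DEFAULT):
    """Run pre-processing, then trace gathering; stop on the first failure."""
    for phase in (1, 2):
        # check=True raises CalledProcessError if the phase exits with an error.
        subprocess.run(gtpin_command(phase, app_cmd, gtpin_path), check=True)
```

For example, run_both_phases(["app"]) would launch the phase 1 command and then the phase 2 command for an application named app.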

How to understand Memorytrace results

When you run the in-house GTPin Memorytrace tool in its default configuration for pre-processing (phase 1), the tool generates a directory called GTPIN_PROFILE_MEMORYTRACE0. In addition, the tool creates two files in the current directory.

The first file is an input to the trace gathering phase. It has the following format:

BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 142606336

Each line lists a kernel name followed by the required trace size for that kernel.
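
A file in this format is easy to load for inspection. The sketch below assumes only what the sample line shows: one kernel per line, with the trace size as the last whitespace-separated token. The function name parse_trace_sizes is hypothetical, and the function takes the file contents as a string.

```python
def parse_trace_sizes(text):
    """Map each kernel name to the required trace size listed for it."""
    sizes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # The size is the last token; everything before it is the kernel name.
        name, size = line.rsplit(None, 1)
        sizes[name] = int(size)
    return sizes
```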

The second file contains informational data only, and has the following format:

BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 142606336 262144  OpenCL 0  0
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 100034048 131072  OpenCL 0  1
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 142606336 262144  OpenCL 0  2
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 100034048 131072  OpenCL 0  3
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 100034048 131072  OpenCL 0  4
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 142606336 262144  OpenCL 0  5
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 100034048 131072  OpenCL 0  6
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 100034048 131072  OpenCL 0  7
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 100034048 131072  OpenCL 0  8
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 142606336 262144  OpenCL 0  9
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 100034048 131072  OpenCL 0  10

where each line corresponds to a single kernel dispatch. The fields have the following meaning (from left to right):

When the Memorytrace tool is run for the trace gathering phase (phase 2), it generates the directory GTPIN_PROFILE_MEMORYTRACE1 and saves the profiling results in the folder GTPIN_PROFILE_MEMORYTRACE1\Session_Final. The traces for each kernel are saved in a separate sub-folder that has the same name as the kernel. Each Draw/Enqueue command has its own trace, which is saved in a corresponding sub-directory, as shown in the following screenshot:

[Screenshot: memorytrace_res_dir_structure.jpg]

How to uncompress Memorytrace and read the trace

Each trace is saved in a compressed binary format within a file called memorytrace_compressed.bin, as shown above. To uncompress the trace, run the Profilers\Scripts\uncompress_memtrace.py Python script as follows (Python 3.5 or above is required):

python3 Profilers\Scripts\uncompress_memtrace.py --input_dir GTPIN_PROFILE_MEMORYTRACE1\Session_Final\BitonicSort\device_0__enqueue_0 --gen 9 -v

NOTE: The -v parameter is optional. It provides additional metadata (such as access size, access type, address width, and operand width) for each memory access.
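
Because every Draw/Enqueue command has its own sub-directory, the script typically has to be run once per dispatch. The sketch below walks a Session_Final folder and builds one command line per directory that contains memorytrace_compressed.bin, using only the options shown above (--input_dir, --gen, -v). The function name uncompress_commands and the default script location are assumptions made for illustration.

```python
import os


def uncompress_commands(session_dir, gen, verbose=False,
                        script=os.path.join("Profilers", "Scripts", "uncompress_memtrace.py")):
    """Return one command per directory that holds memorytrace_compressed.bin."""
    commands = []
    for root, _dirs, files in os.walk(session_dir):
        if "memorytrace_compressed.bin" in files:
            cmd = ["python3", script, "--input_dir", root, "--gen", str(gen)]
            if verbose:
                cmd.append("-v")  # optional per-access metadata
            commands.append(cmd)
    return sorted(commands)
```

Each returned list can be passed to subprocess.run to uncompress the corresponding trace.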

Running the script expands the compressed trace into separate traces, one per HW thread, as shown in the following screenshot:

[Screenshot: memorytrace_uncompressed_res.jpg]

The trace generated on each HW thread is saved in a separate text file, for example memorytrace___s_0_ss_0_eu_1_tid_5.out. The file name encodes the HW thread topology ID: Slice (S), DualSubSlice (DSS), SubSlice (SS), Execution Unit (EU), and HW thread ID (TID). The resulting trace is provided in the following format:
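
The topology IDs can be recovered from the file name when grouping traces by slice or EU. The sketch below assumes the name follows the pattern of the example above, that is, the prefix memorytrace___ followed by alternating key and value tokens separated by underscores. The function name parse_thread_file_name is hypothetical.

```python
import os


def parse_thread_file_name(path):
    """Extract topology IDs from a name like memorytrace___s_0_ss_0_eu_1_tid_5.out."""
    stem, ext = os.path.splitext(os.path.basename(path))
    prefix = "memorytrace___"
    if ext != ".out" or not stem.startswith(prefix):
        raise ValueError("not a per-thread trace file: %s" % path)
    tokens = stem[len(prefix):].split("_")
    if len(tokens) % 2 != 0:
        raise ValueError("unexpected topology suffix in: %s" % path)
    # Tokens alternate between a key (s, dss, ss, eu, tid) and its numeric value.
    return {key: int(value) for key, value in zip(tokens[0::2], tokens[1::2])}
```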

 BBL-ID   DP  SEND-OFFSET    RW      ADDRESS      SIZE
======================================================
    16    1    0x00000a58     R      0x00004800             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a58     R      0x00004810             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a58     R      0x00004820             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a58     R      0x00004830             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a58     R      0x00004840             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a58     R      0x00004850             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a58     R      0x00004860             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a58     R      0x00004870             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a58     R      0x00004880             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a58     R      0x00004890             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a58     R      0x000048a0             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a58     R      0x000048b0             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a58     R      0x000048c0             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a58     R      0x000048d0             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a58     R      0x000048e0             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a58     R      0x000048f0             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x00004900             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x00004910             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x00004920             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x00004930             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x00004940             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x00004950             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x00004960             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x00004970             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x00004980             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x00004990             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x000049a0             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x000049b0             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x000049c0             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x000049d0             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x000049e0             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    16    1    0x00000a68     R      0x000049f0             16  // R  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000ca8     W      0x00004800             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000ca8     W      0x00004820             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000ca8     W      0x00004840             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000ca8     W      0x00004860             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000ca8     W      0x00004880             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000ca8     W      0x000048a0             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000ca8     W      0x000048c0             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000ca8     W      0x000048e0             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000cb8     W      0x00004900             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000cb8     W      0x00004920             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000cb8     W      0x00004940             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000cb8     W      0x00004960             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000cb8     W      0x00004980             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000cb8     W      0x000049a0             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000cb8     W      0x000049c0             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    21    1    0x00000cb8     W      0x000049e0             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000d98     W      0x00004810             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000d98     W      0x00004830             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000d98     W      0x00004850             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000d98     W      0x00004870             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000d98     W      0x00004890             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000d98     W      0x000048b0             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000d98     W      0x000048d0             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000d98     W      0x000048f0             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000da8     W      0x00004910             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000da8     W      0x00004930             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000da8     W      0x00004950             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000da8     W      0x00004970             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000da8     W      0x00004990             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000da8     W      0x000049b0             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000da8     W      0x000049d0             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     
    23    1    0x00000da8     W      0x000049f0             16  // W  Scatter  BTS   32-bit access  SIMD16  BTI = 00  Operand = DW  Access size = 16     

----EOT----

Each line corresponds to a single memory access. The different fields in the output have the following meanings (from left to right):

An EOT marker separates consecutive dispatches of different SW threads of the kernel on the same HW thread.
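
An uncompressed per-thread trace can be loaded for further analysis with a few lines of Python. The sketch below relies only on the column layout of the sample output above and splits the records into EOT-delimited segments. The names Access and parse_thread_trace are illustrative, the DP column is kept as a raw integer without interpretation, and the text after // (present when -v is used) is stored unparsed.

```python
from collections import namedtuple

Access = namedtuple("Access", "bbl_id dp send_offset rw address size note")


def parse_thread_trace(text):
    """Split an uncompressed per-thread trace into EOT-delimited segments."""
    segments, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("BBL-ID", "=")):
            continue  # blank line, column header, or ruler
        if stripped.strip("-") == "EOT":
            segments.append(current)  # close the current SW thread segment
            current = []
            continue
        body, _, note = stripped.partition("//")
        f = body.split()
        current.append(Access(int(f[0]), int(f[1]), int(f[2], 16), f[3],
                              int(f[4], 16), int(f[5]), note.strip()))
    if current:
        segments.append(current)  # records without a trailing EOT marker
    return segments
```

Each segment is a list of Access records, so a call such as {a.address for a in segment if a.rw == "W"} gives the set of addresses written in that segment.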

To map the basic block IDs and instruction offsets to the kernel code, look in the Session_Final\ISA sub-folder, where the GEN assembly of all kernels is saved.


memorytrace.h
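
The listing that follows defines MemTraceRecordHeader as a packed structure whose size is rounded up to the GRF register size. As a reading aid, the sketch below mirrors that layout with Python's struct module. It assumes little-endian byte order, and it applies to raw trace records as described in the header comments, not to the compressed .bin file. The GRF sizes used in the usage note are example values only.

```python
import struct

# "<" means little-endian with no padding, mirroring #pragma pack(push, 1).
HEADER_FORMAT = "<HHIIIIII"
HEADER_FIELDS = ("bblId", "sr0", "tileId", "ce", "dm", "res", "tm00", "tm01")


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def aligned_header_size(grf_reg_size):
    """Python counterpart of MemTraceRecordHeader::AlignedSize()."""
    return align_up(struct.calcsize(HEADER_FORMAT), grf_reg_size)


def unpack_header(buf):
    """Decode the header fields found at the start of a raw record."""
    return dict(zip(HEADER_FIELDS, struct.unpack_from(HEADER_FORMAT, buf)))
```

The packed header occupies 28 bytes, so aligned_header_size(32) evaluates to 32 and aligned_header_size(64) evaluates to 64.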

/*========================== begin_copyright_notice ============================
Copyright (C) 2018-2025 Intel Corporation

SPDX-License-Identifier: MIT
============================= end_copyright_notice ===========================*/

/*!
 * @file Memorytrace tool definitions
 */

#ifndef MEMORYTRACE_H_
#define MEMORYTRACE_H_

#include <list>
#include <map>
#include <vector>

#include "gtpin_api.h"
#include "gtpin_tool_utils.h"
#include "kernel_weight.h"
#include "gen_send_decoder.h"
#include "gt_basic_defs.h"

using namespace gtpin;

/*
 * The trace of memory accesses consists of records, each of which holds data pertaining to a single
 * execution of a basic block that accesses memory.
 * Each trace record comprises two data portions:
 *  - Header that details architectural state during this BBL execution. The MemTraceRecordHeader
 *    structure defines layout of the data elements in the header
 *  - Array of address payload register values in SEND instructions of this BBL. The values are
 *    stored in the order of the corresponding instructions in the basic block. This portion of the
 *    record is of a variable length - different BBLs may access different number of data elements.
 * Both portions (header and address payloads) are aligned with the GRF register size.
 */

/* ============================================================================================= */
// Struct MemTraceRecordHeader
/* ============================================================================================= */
/*!
 * Structure of the trace record header.
 * The header details architectural state during execution of a BBL that accesses memory. In the
 * trace record, the header is followed by the array of address payload registers.
 */
#pragma pack(push, 1)
struct MemTraceRecordHeader
{
    uint16_t bblId;     ///< BBL identifier
    uint16_t sr0;       ///< LSB-16 of the State register sr0.0:ud
    uint32_t tileId;    ///< Tile ID
    uint32_t ce;        ///< Channel Enable register ce:ud
    uint32_t dm;        ///< Dispatch Mask register dm:ud
    uint32_t res;       ///< Reserved
    uint32_t tm00;      ///< tm0.0:ud
    uint32_t tm01;      ///< tm0.1:ud

    /// @return Size of this structure aligned with the GRF register size in the specified GEN model
    static uint32_t AlignedSize(const IGtGenModel& genModel)
    {
        return (uint32_t)AlignUp(sizeof(MemTraceRecordHeader), genModel.GrfRegSize());
    }
};
#pragma pack(pop)

/* ============================================================================================= */
// Struct MemIns
/* ============================================================================================= */
/// SEND instruction descriptor
struct MemIns
{
    InsId       id;         ///< Instruction ID
    uint32_t    offset;     ///< Offset of the instruction within the kernel
    DcSendMsg   msg;        ///< Decoded SEND message instruction
};
using MemInsList = std::list<MemIns>;

/* ============================================================================================= */
// Class BblMemAccessInfo
/* ============================================================================================= */
/*!
 * Class that retrieves information about memory accesses in the basic block, and provides an
 * interface for querying this information
 */
class BblMemAccessInfo
{
public:
    BblMemAccessInfo() : _recordSize(0) {}

    /// Populate this object with the information about memory accesses in the specified basic block
    BblMemAccessInfo(const IGtKernelInstrument& kernelInstrument, const IGtBbl& bbl) : BblMemAccessInfo() { Build(kernelInstrument, bbl); }
    BblMemAccessInfo& Build(const IGtKernelInstrument& kernelInstrument, const IGtBbl& bbl);

    /// @return List of memory instructions in the basic block
    const MemInsList& MemInstructions() const { return _memInstructions; }

    /// @return Size, in bytes, of the trace record in the basic block
    uint32_t RecordSize() const { return _recordSize; }

    /// @return true if basic block does not contain any memory instruction of interest
    bool IsEmpty() const { return (_recordSize == 0); }

private:
    MemInsList  _memInstructions;   ///< List of memory instructions in the basic block
    uint32_t    _recordSize;        ///< Size, in bytes, of the trace record in the basic block
};

/* ============================================================================================= */
// Class KernelMemAccessInfo
/* ============================================================================================= */
/*!
 * Class that retrieves information about memory accesses in the kernel, and provides an
 * interface for querying this information
 */
class KernelMemAccessInfo
{
public:
    KernelMemAccessInfo() : _maxRecordSize(0) {}

    /// Populate this object with the information about memory accesses in the specified kernel
    explicit KernelMemAccessInfo(const IGtKernelInstrument& kernelInstrument) { Build(kernelInstrument); }
    KernelMemAccessInfo& Build(const IGtKernelInstrument& kernelInstrument);

    /// @return Information about memory accesses in the kernel, indexed by BBL identifiers
    using MemAccessMap = std::map<BblId, BblMemAccessInfo>;
    const MemAccessMap& GetMemAccessMap() const { return _memAccessMap; }

    /// @return Information about memory accesses in the specified BBL, or NULL if requested information is not found
    const BblMemAccessInfo* GetBblInfo(BblId bblId) const;

    /// @return Number of basic blocks that access memory
    uint32_t NumMemBbls() const { return (uint32_t)_memAccessMap.size(); }

    /// @return Maximum size, in bytes, of the trace record in the kernel
    uint32_t MaxRecordSize() const { return _maxRecordSize; }

private:
    MemAccessMap _memAccessMap;  ///< Information about memory accesses, indexed by BBL identifiers
    uint32_t     _maxRecordSize; ///< Max. size, in bytes, of the trace record in the kernel
};

/* ============================================================================================= */
// Class MemTraceDispatch
/* ============================================================================================= */
/*!
 * Class that holds memory trace collected during a single kernel dispatch
 */
class MemTraceDispatch
{
public:
    /// Construct a MemTraceDispatch object with the empty trace
    explicit MemTraceDispatch(const IGtKernelDispatch& dispatch) : _isTrimmed(false) { dispatch.GetExecDescriptor(_kernelExecDesc); }

    /// Read the entire trace from the specified profile buffer into this object
    bool ReadTrace(const GtProfileTrace& traceAccessor, const IGtProfileBuffer& profileBuffer);

    const GtKernelExecDesc& KernelExecDesc() const { return _kernelExecDesc; } ///< @return Descriptor of this kernel dispatch
    uint32_t        Size()      const { return (uint32_t)_rawTrace.size(); }   ///< @return Trace size in bytes
    const uint8_t*  Data()      const { return _rawTrace.data(); }             ///< @return Trace data collected in this dispatch
    uint8_t*        Data()            { return _rawTrace.data(); }             ///< @return Trace data collected in this dispatch
    bool            IsEmpty()   const;                                         ///< @return true if the trace is empty
    bool            IsTrimmed() const { return _isTrimmed; }                   ///< @return true if the trace has been trimmed

private:
    GtKernelExecDesc        _kernelExecDesc; ///< Kernel execution descriptor
    std::vector<uint8_t>    _rawTrace;       ///< Trace data collected in this kernel dispatch
    bool                    _isTrimmed;      ///< true if the trace has been trimmed to avoid buffer overflow
};

/* ============================================================================================= */
// Class MemTraceKernel
/* ============================================================================================= */
/*!
 * Class that contains
 *  - Static information about memory accesses in the kernel
 *  - Collection of memory traces recorded by kernel dispatches
 */
class MemTraceKernel
{
public:
    MemTraceKernel() = default;

    /// Construct a MemTraceKernel object intended to hold traces of the specified kernel
    explicit MemTraceKernel(const IGtKernelInstrument& kernelInstrument, uint32_t numTiles);

    /*!
     * Read a trace recorded by the specified kernel dispatch. Create and add the corresponding MemTraceDispatch
     * instance to this object
     */
    MemTraceDispatch& AddMemTrace(IGtKernelDispatch& kernelDispatch);

    std::string           Name()            const { return _name; }               ///< @return Kernel's name
    std::string           ExtendedName()    const { return _extName; }            ///< @return Kernel's extended name
    std::string           UniqueName()      const { return _uniqueName; }         ///< @return Kernel's unique name
    const GtGpuPlatform   Platform()        const { return _platform; }           ///< @return Kernel's platform
    const IGtGenModel&    GenModel()        const { return GetGenModel(_genId); } ///< @return Kernel's GEN model
    const GtProfileTrace& TraceAccessor()   const { return _traceAccessor; }      ///< @return Trace accessor
    void  DumpAsm()                         const;                                ///< Dump kernel's assembly text to file

    /// @return true, if tracing of this kernel is enabled
    uint32_t IsEnabled() const { return (_traceAccessor.MaxTraceSize() != 0); }

    /// @return Static information about kernel's memory accesses
    const KernelMemAccessInfo& GetMemAccessInfo() const { return _memAccessInfo; }

    /// @return Traces collected in kernel's dispatches
    typedef std::list<MemTraceDispatch>  Traces;
    const Traces& GetTraces() const { return _traces; }

    /// @return Number of tiles
    uint32_t NumTiles() const { return _numTiles; }
private:
    std::string         _name;              ///< Kernel's name
    std::string         _uniqueName;        ///< Kernel's unique name
    std::string         _extName;           ///< Kernel's extended name
    GtGpuPlatform       _platform;          ///< Kernel's platform
    GtGenModelId        _genId;             ///< Identifier of the GEN model, the kernel is compiled for
    std::string         _asmText;           ///< Kernel's assembly text
    KernelMemAccessInfo _memAccessInfo;     ///< Static information about kernel's memory accesses
    GtProfileTrace      _traceAccessor;     ///< Trace accessor
    Traces              _traces;            ///< Traces collected in kernel's dispatches
    uint32_t            _numTiles;          ///< The number of supported tiles
};

/* ============================================================================================= */
// Class MemTrace
/* ============================================================================================= */
/*!
 * Implementation of the IGtTool interface for the Memorytrace tool
 */
class MemTrace : public GtTool
{
public:
    /// Implementation of the IGtTool interface
    const char* Name() const { return "memorytrace"; }

    void OnKernelBuild(IGtKernelInstrument& instrumentor);
    void OnKernelRun(IGtKernelDispatch& dispatcher);
    void OnKernelComplete(IGtKernelDispatch& dispatcher);

public:
    static void      OnFini();      ///< Callback function registered with atexit()
    static MemTrace* Instance();    ///< @return Single instance of this class

private:
    MemTrace() = default;
    MemTrace(const MemTrace&) = delete;
    MemTrace& operator = (const MemTrace&) = delete;
    ~MemTrace() = default;

    /*!
     * Generate instrumentation for the specified basic block
     * @param[in] instrumentor      Interface of the kernel being instrumented
     * @param[in] bbl               Basic block to be instrumented
     * @param[in] memTraceKernel    Object that holds information about memory accesses in the kernel
     * @return success/failure status
     */
    bool InstrumentBbl(IGtKernelInstrument& instrumentor, const IGtBbl& bbl, const MemTraceKernel& memTraceKernel);

    /*!
     * Generate procedure that allocates space for the new trace record in the trace and stores the record header
     * for the specified basic block. The procedure sets _offsetReg equal to the offset of the location within the
     * profile buffer immediately following the record header.
     * If new record cannot be allocated due to the trace capacity limitations, the procedure zeroes _offsetReg.
     *
     * @param[in, out]  proc            Procedure, the generated code is appended to
     * @param[in]       coder           GEN code generator
     * @param[in]       bbl             Basic block being instrumented
     * @param[in]       memTraceKernel  Object that holds information about memory accesses in the kernel
     * @param[in]       recordSize      Size of the new record, in bytes
     * @param[in]       firstSendInBbl  Indicates whether this is the first Send instruction in BBL
     */
    void StoreRecordHeader(GtGenProcedure& proc, const IGtGenCoder& coder, const IGtBbl& bbl,
                           const MemTraceKernel& memTraceKernel, uint32_t recordSize, bool firstSendInBbl = true);

    /*!
     * Generate procedure that stores the specified range of GRF registers in the trace.
     * On entry, the procedure assumes that _offsetReg holds offset of the location within the profile buffer
     * at which address payload is about to be stored.
     * On exit, the procedure advances _offsetReg to the location just after the stored register range.
     * If input value of _offsetReg is zero, no registers are stored, and _offsetReg retains value zero.
     *
     * @param[in, out]  proc            Procedure, the generated code is appended to
     * @param[in]       coder           GEN code generator
     * @param[in]       firstRegNum     First GRF register to be stored
     * @param[in]       numRegs         Number of registers to be stored
     */
    void StoreRegRange(GtGenProcedure& proc, const IGtGenCoder& coder, uint32_t firstRegNum, uint32_t numRegs);

private:
    std::map<GtKernelId, MemTraceKernel>    _kernels;  ///< Collection of traces per kernel

    GtReg _addrReg;     ///< Virtual register that holds address within profile buffer
    GtReg _dataReg;     ///< Virtual register that holds data to be read from/written to profile buffer
    GtReg _offsetReg;   ///< Virtual register that holds offset within the trace
    GtReg _tileIdReg;   ///< Virtual register that holds tile ID
};

/* ============================================================================================= */
// Class MemoryTracePreProcessor
/* ============================================================================================= */
/*!
 * Class that computes per-kernel trace sizes in the preprocessing phase, and provides access to
 * this data in the trace gathering phase
 */
class MemoryTracePreProcessor : public KernelWeight
{
public:
    uint64_t TraceSize(const std::string& extKernelName) const; ///< Given extended kernel name, return the trace size in bytes
    static MemoryTracePreProcessor* Instance();                 ///< @return Single instance of this class
    static void OnFini();                                       ///< Callback function registered with atexit()

private:
    MemoryTracePreProcessor();
    MemoryTracePreProcessor(const MemoryTracePreProcessor&) = delete;
    MemoryTracePreProcessor& operator = (const MemoryTracePreProcessor&) = delete;

private:
    /// KernelWeight interface overrides (@see description of KernelWeight functions)
    uint32_t GetBblWeight(IGtKernelInstrument& kernelInstrument, const IGtBbl& bbl) const;
    void AggregateDispatchCounters(KernelWeightCounters& kc, KernelWeightCounters dc) const;

private:
    KernelWeightProfileData     _kernelCounters; ///< Per-kernel counters of required trace records; collected in preprocessing phase

    static const char* _kernelPreProcessFileName;   ///< Name of the file that contains preprocessing data per kernel
    static const char* _dispatchPreProcessFileName; ///< Name of the file that contains preprocessing data per kernel dispatch
};

/* ============================================================================================= */
// Class MemoryTracePostProcessor
/* ============================================================================================= */
/*!
 * Function object that processes kernel traces - stores them in files within the profile directory:
 *
 *    kernel_name
 *    |
 *        |- kernel_dispatch_1
 *           |- memorytrace_compressed.bin
 *        |- kernel_dispatch_2
 *           |- memorytrace_compressed.bin
 * The .bin trace files can be uncompressed by the uncompress_memtrace.exe utility.
 *
 * Format of .bin trace files:
 * - Static information:
 *     - Number of BBLs that access memory
 *     - For each BBL that accesses memory:
 *        - BBL ID
 *        - Number of SEND instructions in this basic block
 *        - For each SEND instruction:
 *            - Decoded information including offset, address model, address type, address payload size, etc
 *
 * - Dynamic trace data:
 *      - Number of HW threads in which the trace was collected
 *      - For each HW thread:
00356  *          - HW Thread ID (in the format of sr0.0)
00357  *          - Number of records collected for this HW thread
00358  *          - All the records collected for this HW thread
00359  */
00360 class MemoryTracePostProcessor
00361 {
00362 public:
00363     /// Construct a MemoryTracePostProcessor object for the specified collection of kernel traces
00364     MemoryTracePostProcessor(const IGtCore& gtpinCore, const MemTraceKernel&  memTraceKernel);
00365 
00366     /// Process all kernel traces associated with this object - store them in files within the profile directory
00367     bool operator()();
00368 
00369 private:
00370     /// Store the specified trace in the specified file stream
00371     void StoreTrace(const MemTraceDispatch& trace, std::ofstream& fs);
00372 
00373     /// Store static information about memory accesses in the kernel
00374     void StoreMemAccessInfo(std::ofstream& fs);
00375 
00376     /// Store Global Thread Identifier
00377     void StoreGlobalTid(uint32_t gtid, std::ofstream& fs);
00378 
00379     /// Store the specified value in the specified file stream in the binary format
00380     template <typename T> void Store(const T& val, std::ofstream& fs) { fs.write((const char*)&val, sizeof(val)); }
00381 
00382 private:
00383     struct TraceRecord                              ///< Reference to the trace record
00384     {
00385         const MemTraceRecordHeader* header;         ///< Pointer to the header of the record
00386         uint32_t                    size;           ///< Size of the record in bytes, including header
00387     };
00388     using TraceRecordList = std::list<TraceRecord>;           ///< List of references to trace records
00389     using PerTileTraceRecords = std::vector<TraceRecordList>; ///< Per tile trace records
00390     /// The SEND instruction description stored in the trace
00391     struct MemInsInfo
00392     {
00393         explicit MemInsInfo(const MemIns& memIns);    ///< Construct packed variant of the specified MemIns structure
00394 
00395         uint32_t    offset;             ///< Offset of the instruction within the kernel
00396         uint32_t    isWrite;            ///< Is write (true) or read (false) operation
00397         uint32_t    isBlock2D;          ///< Is block 2D load or store access
00398         uint32_t    isScatter;          ///< Is scatter
00399         uint32_t    isBTS;              ///< Is Binding Table State address model
00400         uint32_t    isSLM;              ///< Is Shared Local Memory access
00401         uint32_t    isScratch;          ///< Is access to scratch block
00402         uint32_t    isAtomic;           ///< Is atomic operation
00403         uint32_t    isFence;            ///< Is fence
00404         uint32_t    addressWidth;       ///< Address width (32bit or 64bit)
00405         uint32_t    simdWidth;          ///< SIMD width
00406         uint32_t    bti;                ///< BTI value
00407         uint32_t    addrPayloadLength;  ///< Address payload length
00408         uint32_t    dataPort;           ///< Data port ID: 0=DP0, 1=DP1, 2=UGM, 3=UGML, 4=TGM, 5=SLM, 6=all others
00409         uint32_t    isEOT;              ///< Is SEND.EOT
00410         uint32_t    isMedia;            ///< Is media block access
00411         uint32_t    elementSize;        ///< Data element size in bytes
00412         uint32_t    numElements;        ///< Number of elements
00413         uint32_t    execSize;           ///< Instruction execution size
00414         uint32_t    channelOffset;      ///< Channel mask offset
00415     };
00416 
00417 private:
00418     const MemTraceKernel*            _kernel;            ///< Kernel and its traces to be processed
00419     const KernelMemAccessInfo*       _memAccessInfo;     ///< Static information about memory accesses in the kernel
00420     std::string                      _kernelDir;         ///< Directory to store kernel's trace files
00421     std::vector<PerTileTraceRecords> _threadTraceRecords;///< Lists of trace records, indexed by tile ID, then by thread ID
00422 
00423     static const char*               _traceFileName;     ///< Name of the file to store trace in
00424 };
00425 
00426 #endif
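
The Store template declared in MemoryTracePostProcessor writes a value to the trace file as raw bytes. The sketch below illustrates that idea outside the tool; it is not part of GTPin. It assumes a generic std::ostream instead of std::ofstream so the data can be checked in memory. Load, SampleInfo, and RoundTrip are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

// Same idea as MemoryTracePostProcessor::Store, generalized to any std::ostream (assumption)
template <typename T>
void Store(const T& val, std::ostream& os)
{
    static_assert(std::is_trivially_copyable<T>::value, "raw byte dump requires a trivially copyable type");
    os.write(reinterpret_cast<const char*>(&val), sizeof(val));
}

// Hypothetical counterpart of Store: read sizeof(T) raw bytes back into val
// Returns true only if the full value was read
template <typename T>
bool Load(T& val, std::istream& is)
{
    static_assert(std::is_trivially_copyable<T>::value, "raw byte load requires a trivially copyable type");
    is.read(reinterpret_cast<char*>(&val), sizeof(val));
    return is.gcount() == static_cast<std::streamsize>(sizeof(val));
}

// Hypothetical two-field record, loosely modeled on the first fields of MemInsInfo
struct SampleInfo
{
    uint32_t offset;   // Offset of the instruction within the kernel
    uint32_t isWrite;  // 1 for a write, 0 for a read
};

// Store a record into an in-memory binary stream and read it back
bool RoundTrip(const SampleInfo& in, SampleInfo& out)
{
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    Store(in, ss);
    return Load(out, ss);
}
```

Because the bytes are written exactly as they sit in memory, a reader has to agree with the writer on field order, field width, and byte order. This is presumably why MemInsInfo uses fixed-width uint32_t fields throughout.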

memorytrace.cpp

00001 /*========================== begin_copyright_notice ============================
00002 Copyright (C) 2018-2025 Intel Corporation
00003 
00004 SPDX-License-Identifier: MIT
00005 ============================= end_copyright_notice ===========================*/
00006 
00007 /*!
00008  * @file Implementation of the Memorytrace tool
00009  */
00010 
00011 #include <fstream>
00012 #include <string>
00013 
00014 #include "memorytrace.h"
00015 #include "gtpin_tool_utils.h"
00016 
00017 using namespace gtpin;
00018 using namespace std;
00019 
00020 /* ============================================================================================= */
00021 // Configuration
00022 /* ============================================================================================= */
00023 Knob<int>  gKnobMaxTraceBufferInMB("max_buffer_mb", 3072, "memorytrace - the max allowed size of the trace buffer per kernel in MB\n");
00024 Knob<int>  gKnobPhase("phase", 0, "tracing tool - processing phase\n { 1 - pre-processing, 2 - processing - trace gathering} ");
00025 Knob<bool> gKnobTimeStamp("include_timestamp", false, "true - includes time stamp, false - doesn't include time stamp");
00026 
00027 /* ============================================================================================= */
00028 // BblMemAccessInfo implementation
00029 /* ============================================================================================= */
00030 BblMemAccessInfo& BblMemAccessInfo::Build(const IGtKernelInstrument& kernelInstrument, const IGtBbl& bbl)
00031 {
00032     const IGtCfg&       cfg             = kernelInstrument.Cfg();
00033     uint32_t            addrPayloadSize = 0;
00034     uint32_t            headerSize      = 0;
00035     const IGtGenModel&  genModel        = kernelInstrument.Kernel().GenModel();
00036 
00037     _memInstructions.clear();
00038     for (auto insPtr : bbl.Instructions())
00039     {
00040         const IGtIns& ins = *insPtr;
00041 
00042         if ((ins.Id() < uint32_t(knobMinInstrumentIns)) || (ins.Id() > uint32_t(knobMaxInstrumentIns)))
00043         {
00044             continue;
00045         }
00046         if (ins.IsSendMessage())
00047         {
00048             DcSendMsg msg = DcSendMsg::Decode(ins.GetGedIns());
00049             if (msg.IsValid() || ins.IsEot())
00050             {
00051                 addrPayloadSize += (msg.AddrPayloadLength() * genModel.GrfRegSize());   // Accumulate SEND address payloads
00052                 headerSize = MemTraceRecordHeader::AlignedSize(genModel);               // Add one trace record header per BBL
00053                 _memInstructions.emplace_back(MemIns{ins.Id(), cfg.GetInstructionOffset(ins), std::move(msg)});
00054             }
00055         }
00056     }
00057     _recordSize = addrPayloadSize + headerSize;
00058     if (!_memInstructions.empty() && gKnobTimeStamp)
00059     {
00060         _recordSize = addrPayloadSize + (uint32_t)_memInstructions.size() * headerSize;
00061     }
00062     return *this;
00063 }
00064 
00065 /* ============================================================================================= */
00066 // KernelMemAccessInfo implementation
00067 /* ============================================================================================= */
00068 KernelMemAccessInfo& KernelMemAccessInfo::Build(const IGtKernelInstrument& kernelInstrument)
00069 {
00070     const IGtCfg&  cfg = kernelInstrument.Cfg();
00071 
00072     _memAccessMap.clear();
00073     _maxRecordSize = 0;
00074     for (auto bblPtr : cfg.Bbls())
00075     {
00076         const IGtBbl& bbl = *bblPtr;
00077         BblMemAccessInfo bblMemAccessInfo(kernelInstrument, bbl);
00078         if (!bblMemAccessInfo.IsEmpty())
00079         {
00080             _maxRecordSize = std::max(_maxRecordSize, bblMemAccessInfo.RecordSize());
00081             _memAccessMap.emplace(bbl.Id(), std::move(bblMemAccessInfo));
00082         }
00083     }
00084     return *this;
00085 }
00086 
00087 const BblMemAccessInfo* KernelMemAccessInfo::GetBblInfo(BblId bblId) const 
00088 {
00089     auto it = _memAccessMap.find(bblId);
00090     return (it == _memAccessMap.end()) ? nullptr: &(it->second);
00091 }
00092 
00093 /* ============================================================================================= */
00094 // MemTraceDispatch implementation
00095 /* ============================================================================================= */
00096 bool MemTraceDispatch::ReadTrace(const GtProfileTrace& traceAccessor, const IGtProfileBuffer& profileBuffer)
00097 {
00098     uint32_t traceSize = traceAccessor.Size(profileBuffer);
00099     _rawTrace.resize(traceSize);
00100     _isTrimmed = traceAccessor.IsTruncated(profileBuffer);
00101     return traceAccessor.Read(profileBuffer, _rawTrace.data(), 0, traceSize);
00102 }
00103 
00104 bool MemTraceDispatch::IsEmpty() const
00105 {
00106     return _rawTrace.size() < sizeof(MemTraceRecordHeader);
00107 }
00108 
00109 /* ============================================================================================= */
00110 // MemTraceKernel implementation
00111 /* ============================================================================================= */
00112 MemTraceKernel::MemTraceKernel(const IGtKernelInstrument& kernelInstrument, uint32_t numTiles) : _numTiles(numTiles)
00113 {
00114     const IGtKernel& kernel = kernelInstrument.Kernel();
00115     const IGtCfg&    cfg    = kernelInstrument.Cfg();
00116 
00117     _name       = GlueString(kernel.Name());
00118     _extName    = ExtendedKernelName(kernel);
00119     _platform   = kernel.GpuPlatform();
00120     _genId      = kernel.GenModel().Id();
00121     _asmText    = CfgAsmText(cfg);
00122     _uniqueName = kernel.UniqueName();
00123 
00124     // Build static information about memory accesses in the kernel
00125     _memAccessInfo.Build(kernelInstrument);
00126 
00127     // Initialize trace accessor. The trace capacity is expected to be computed during the preprocessing phase.
00128     uint64_t traceCapacity =  MemoryTracePreProcessor::Instance()->TraceSize(_extName);
00129     if (traceCapacity == 0)
00130     {
00131         // Unknown trace capacity
00132         GTPIN_WARNING("MEMORYTRACE: unknown trace capacity for kernel " + _name + ". Assuming the kernel is filtered out. "
00133             "Allocating a buffer of 8KB size. If the kernel is supposed to run, expect buffer overflow. "
00134             "In this case, please re-run phase 1 and make sure the kernel is not filtered out.");
00135         traceCapacity = 0x2000;
00136     }
00137     else
00138     {
00139         traceCapacity += 0x2000; // Add some space to account for possible fluctuation of trace sizes between phases
00140         if (traceCapacity > UINT32_MAX)
00141         {
00142             GTPIN_WARNING("MEMORYTRACE: The kernel " + _name + " exceeded maximum trace capacity.");
00143             traceCapacity = UINT32_MAX;
00144         }
00145     }
00146     if (traceCapacity > (uint64_t(gKnobMaxTraceBufferInMB) * 0x100000))
00147     {
00148         GTPIN_WARNING("MEMORYTRACE: required capacity (" + DecStr(traceCapacity) + ") for kernel " + _name + " is too big - cut to " + DecStr(gKnobMaxTraceBufferInMB) + "MB. "
00149                       "Expect the final trace to contain partial data.");
00150         traceCapacity = uint64_t(gKnobMaxTraceBufferInMB) * 0x100000;
00151     }
00152     uint32_t maxRecordSize = _memAccessInfo.MaxRecordSize();
00153     _traceAccessor = GtProfileTrace((uint32_t)traceCapacity, maxRecordSize);
00154     _traceAccessor.Allocate(kernelInstrument.ProfileBufferAllocator());
00155 }
00156 
00157 MemTraceDispatch& MemTraceKernel::AddMemTrace(IGtKernelDispatch& kernelDispatch)
00158 {
00159     // Create a new MemTraceDispatch object and store the entire trace within this object
00160     _traces.emplace_back(kernelDispatch);
00161     MemTraceDispatch& memTraceDispatch  = _traces.back();
00162     if (!memTraceDispatch.ReadTrace(_traceAccessor, *kernelDispatch.GetProfileBuffer()))
00163     {
00164         GTPIN_ERROR_MSG("MEMORYTRACE: Failed to read profile buffer for kernel " + _name);
00165     }
00166     return memTraceDispatch;
00167 }
00168 
00169 void MemTraceKernel::DumpAsm() const
00170 {
00171     DumpKernelAsmText(_name, _uniqueName, _asmText);
00172 }
00173 
00174 /* ============================================================================================= */
00175 // MemTrace implementation
00176 /* ============================================================================================= */
00177 MemTrace* MemTrace::Instance()
00178 {
00179     static MemTrace instance;
00180     return &instance;
00181 }
00182 
00183 void MemTrace::OnKernelBuild(IGtKernelInstrument& instrumentor)
00184 {
00185     const IGtKernel& kernel = instrumentor.Kernel();
00186     uint32_t numTiles = (instrumentor.Coder().IsTileIdSupported()) ? GTPin_GetCore()->GenArch().MaxTiles(kernel.GpuPlatform()) : 1;
00187 
00188 
00189     // Handle common instrumentation knobs
00190     HandleCommonInstrumnetationKnobs(instrumentor);
00191     // Create a new MemTraceKernel object and add it to the database
00192     auto ret = _kernels.emplace(piecewise_construct, forward_as_tuple(kernel.Id()), forward_as_tuple(instrumentor, numTiles));
00193     if (ret.second)
00194     {
00195         MemTraceKernel& memTraceKernel = (*ret.first).second;
00196         if (!memTraceKernel.IsEnabled())
00197         {
00198             GTPIN_WARNING("MEMORYTRACE: The trace won't be generated for kernel " + memTraceKernel.Name());
00199             return;
00200         }
00201 
00202         const IGtCfg&   cfg   = instrumentor.Cfg();
00203         IGtVregFactory& vregs = instrumentor.Coder().VregFactory();
00204 
00205         // Initialize virtual registers
00206         _addrReg   = vregs.MakeMsgAddrScratch();
00207         _dataReg   = vregs.MakeMsgDataScratch(VREG_TYPE_HWORD);
00208         _offsetReg = vregs.Make(VREG_TYPE_DWORD);
00209         _tileIdReg = vregs.Make(VREG_TYPE_DWORD);
00210 
00211         GtGenProcedure loadTileIdProc;
00212         instrumentor.Coder().LoadTileId(loadTileIdProc, _tileIdReg);
00213 
00214         // Instrument kernel entries
00215         instrumentor.InstrumentEntries(loadTileIdProc);
00216 
00217         // Instrument basic blocks
00218         for (auto bblPtr : cfg.Bbls())
00219         {
00220             InstrumentBbl(instrumentor, *bblPtr, memTraceKernel);
00221         }
00222     }
00223 }
00224 
00225 void MemTrace::OnKernelRun(IGtKernelDispatch& dispatcher)
00226 {
00227     bool isProfileEnabled = false;
00228 
00229     const IGtKernel& kernel = dispatcher.Kernel();
00230     GtKernelExecDesc execDesc; dispatcher.GetExecDescriptor(execDesc);
00231     if (kernel.IsInstrumented() && IsKernelExecProfileEnabled(execDesc, kernel.GpuPlatform(), kernel.Name().Get()))
00232     {
00233         auto it = _kernels.find(kernel.Id());
00234         if (it != _kernels.end())
00235         {
00236             const MemTraceKernel&  memTraceKernel = it->second;
00237             if (memTraceKernel.IsEnabled())
00238             {
00239                 IGtProfileBuffer*     buffer        = dispatcher.CreateProfileBuffer(); GTPIN_ASSERT(buffer);
00240                 const GtProfileTrace& traceAccessor = memTraceKernel.TraceAccessor();
00241                 if (traceAccessor.Initialize(*buffer))
00242                 {
00243                     isProfileEnabled = true;
00244                 }
00245                 else
00246                 {
00247                     GTPIN_ERROR_MSG("MEMORYTRACE: Failed to write into memory buffer for kernel " + string(kernel.Name()));
00248                 }
00249             }
00250         }
00251     }
00252     dispatcher.SetProfilingMode(isProfileEnabled);
00253 }
00254 
00255 void MemTrace::OnKernelComplete(IGtKernelDispatch& dispatcher)
00256 {
00257     if (!dispatcher.IsProfilingEnabled())
00258     {
00259         return; // Do nothing with unprofiled kernel dispatches
00260     }
00261 
00262     const IGtKernel& kernel = dispatcher.Kernel();
00263     auto it = _kernels.find(kernel.Id());
00264     if (it != _kernels.end())
00265     {
00266         // Read the trace from the profile buffer
00267         MemTraceKernel&  memTraceKernel = it->second;
00268         memTraceKernel.AddMemTrace(dispatcher);
00269     }
00270 }
00271 
00272 bool MemTrace::InstrumentBbl(IGtKernelInstrument& instrumentor, const IGtBbl& bbl, const MemTraceKernel& memTraceKernel)
00273 {
00274     const KernelMemAccessInfo&  kernelMemAccessInfo = memTraceKernel.GetMemAccessInfo();
00275     const BblMemAccessInfo*     bblInfoPtr          = kernelMemAccessInfo.GetBblInfo(bbl.Id());
00276     if ((bblInfoPtr == nullptr) || bblInfoPtr->IsEmpty())
00277     {
00278         return false; // The basic block does not contain any memory instruction of interest
00279     }
00280 
00281     const IGtGenCoder&      coder         = instrumentor.Coder();
00282     const IGtCfg&           cfg           = instrumentor.Cfg();
00283     const BblMemAccessInfo& bblInfo       = *bblInfoPtr;
00284 
00285     // Generate code that allocates space for the new record in the trace and stores the trace record header.
00286     // Insert this procedure before the first memory access in the basic block.
00287     GtGenProcedure headerProc;
00288     const IGtIns&  firstMemIns = cfg.GetInstruction(bblInfo.MemInstructions().front().id);
00289     StoreRecordHeader(headerProc, coder, bbl, memTraceKernel, bblInfo.RecordSize());
00290     instrumentor.InstrumentInstruction(firstMemIns, GtIpoint::Before(), headerProc);
00291 
00292     // Generate code that stores address payload registers of SEND instructions in the basic block
00293     for (const auto& insInfo : bblInfo.MemInstructions())
00294     {
00295         GtGenProcedure      proc;
00296         const IGtIns&       ins                 = cfg.GetInstruction(insInfo.id);
00297         const DcSendMsg&    msg                 = insInfo.msg;
00298         GtRegNum            src0RegNum          = msg.Src0();
00299         GtRegNum            src1RegNum          = msg.Src1();
00300         uint32_t            addrPayloadLength   = msg.AddrPayloadLength();
00301         uint32_t            src0Length          = msg.Src0Length();
00302 
00303         if (gKnobTimeStamp && (ins.Id() != firstMemIns.Id()))
00304         {
00305             GtGenProcedure headerProc;
00306             StoreRecordHeader(headerProc, coder, bbl, memTraceKernel, bblInfo.RecordSize(), false);
00307             instrumentor.InstrumentInstruction(ins, GtIpoint::Before(), headerProc);
00308         }
00309 
00310         // If the size of the SRC0 register range is greater than or equal to the address payload length, the SRC0 range contains
00311         // the entire address payload.
00312         // Otherwise the address payload is split between SRC0 and SRC1 register ranges
00313         if (src0Length >= addrPayloadLength)
00314         {
00315             StoreRegRange(proc, coder, src0RegNum, addrPayloadLength);
00316         }
00317         else
00318         {
00319             StoreRegRange(proc, coder, src0RegNum, src0Length);
00320             if (src1RegNum.IsValid())
00321             {
00322                 StoreRegRange(proc, coder, src1RegNum, addrPayloadLength - src0Length);
00323             }
00324         }
00325         instrumentor.InstrumentInstruction(ins, GtIpoint::Before(), proc);
00326     }
00327 
00328     return true;
00329 }
00330 
00331 void MemTrace::StoreRecordHeader(GtGenProcedure& proc, const IGtGenCoder& coder, const IGtBbl& bbl,
00332                                  const MemTraceKernel& memTraceKernel, uint32_t recordSize, bool firstSendInBbl)
00333 {
00334     /// @return Subregister of _dataReg intended to hold the value of the MemTraceRecordHeader field whose offset is specified
00335     auto fieldReg = [&](uint32_t fieldOffset) -> GtReg { return GtReg(_dataReg, sizeof(uint32_t), fieldOffset / sizeof(uint32_t)); };
00336 
00337     GtReg idFieldReg     = fieldReg(offsetof(MemTraceRecordHeader, bblId));
00338     GtReg tileIdFieldReg = fieldReg(offsetof(MemTraceRecordHeader, tileId));
00339     GtReg ceFieldReg     = fieldReg(offsetof(MemTraceRecordHeader, ce));
00340     GtReg dmFieldReg     = fieldReg(offsetof(MemTraceRecordHeader, dm));
00341     GtReg tm0FieldReg    = fieldReg(offsetof(MemTraceRecordHeader, tm00));
00342 
00343     IGtInsFactory&  insF = coder.InstructionFactory();
00344     GtPredicate     predicate(FlagReg(0));
00345 
00346     // Set values of MemTraceRecordHeader fields in _dataReg 
00347     if (firstSendInBbl)
00348     {
00349         proc += insF.MakeShl(idFieldReg, StateReg(0), 16);                      // idFieldReg[16:31] = sr0.0
00350         proc += insF.MakeAdd(idFieldReg, idFieldReg, GtImmU32(bbl.Id()));       // idFieldReg[0:15]  = bbl.Id()
00351         proc += insF.MakeMov(tileIdFieldReg, _tileIdReg);                       // Tile ID
00352         proc += insF.MakeMov(ceFieldReg, ChannelEnableReg());                   // ceFieldReg        = ChannelEnableReg()
00353         proc += insF.MakeMov(dmFieldReg, DispatchMaskReg());                    // dmFieldReg        = DispatchMaskReg()
00354     }
00355     if (gKnobTimeStamp)
00356     {
00357         proc += insF.MakeMov(tm0FieldReg, GtRegRegion(TimeStampReg(), GtStride(2, 2, 1), GED_DATA_TYPE_ud), { 2 });
00358     }
00359 
00360     // Allocate new record in the trace.
00361     if (firstSendInBbl)
00362     {
00363         // Set _offsetReg = offset of the allocated record in the profile buffer, _addrReg = address of the allocated record
00364         memTraceKernel.TraceAccessor().ComputeNewRecordOffset(coder, proc, recordSize, _offsetReg);
00365     }
00366     coder.ComputeAddress(proc, _addrReg, _offsetReg);
00367 
00368     // Zero _offsetReg if the trace buffer is overflowed (predicate == true)
00369     proc += insF.MakeMov(_offsetReg, 0).SetPredicate(predicate);
00370 
00371     // if (!predicate) { STORE buffer[_offsetReg] = _dataReg; _offsetReg += aligned-header-size }
00372     uint32_t alignedHeaderSize = MemTraceRecordHeader::AlignedSize(memTraceKernel.GenModel());
00373     coder.StoreMemBlock(proc, _addrReg, _dataReg, alignedHeaderSize, !predicate);
00374     proc += insF.MakeAdd(_offsetReg, _offsetReg, alignedHeaderSize).SetPredicate(!predicate);
00375 
00376     if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
00377 }
00378 
00379 void MemTrace::StoreRegRange(GtGenProcedure& proc, const IGtGenCoder& coder, uint32_t firstRegNum, uint32_t numRegs)
00380 {
00381     if (numRegs == 0) { return; }
00382 
00383     IGtInsFactory&  insF        = coder.InstructionFactory();
00384     uint32_t        grfRegSize  = insF.GenModel().GrfRegSize();
00385     GtReg           flagReg     = FlagReg(0);
00386     GtPredicate     predicate(flagReg);
00387     uint32_t blockSize = numRegs * grfRegSize;
00388 
00389     // predicate = (_offsetReg != 0) = trace is not overflowed
00390     proc += insF.MakeCmp(GED_COND_MODIFIER_nz, flagReg, _offsetReg, 0, {16});
00391     
00392     // store address payload in addrReg = buffer[offsetReg]
00393     coder.ComputeAddress(proc, _addrReg, _offsetReg);
00394     coder.StoreRegRange(proc, _addrReg, firstRegNum, numRegs, predicate);
00395    
00396     proc += insF.MakeAdd(_offsetReg, _offsetReg, blockSize).SetPredicate(predicate);
00397 
00398     if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
00399 }
00400 
00401 void MemTrace::OnFini()
00402 {
00403     MemTrace& me = *Instance();
00404     IGtCore* gtpinCore = GTPin_GetCore();
00405     for (auto& ref : me._kernels)
00406     {
00407         const MemTraceKernel&  memTraceKernel = ref.second;
00408         MemoryTracePostProcessor(*gtpinCore, memTraceKernel)();
00409         memTraceKernel.DumpAsm();
00410     }
00411 }
00412 
00413 /* ============================================================================================= */
00414 // MemoryTracePreProcessor implementation
00415 /* ============================================================================================= */
00416 const char* MemoryTracePreProcessor::_kernelPreProcessFileName   = "memorytrace_pre_process.txt";
00417 const char* MemoryTracePreProcessor::_dispatchPreProcessFileName = "memorytrace_pre_process_dispatch.txt";
00418 
00419 MemoryTracePreProcessor::MemoryTracePreProcessor()
00420 {
00421     if (gKnobPhase == 2)
00422     {
00423         // Read the data collected during the preprocessing phase
00424         std::ifstream is(_kernelPreProcessFileName);
00425         GTPIN_ASSERT_MSG(is, string("File ") + _kernelPreProcessFileName + " does not exist. The trace won't be generated");
00426         is >> _kernelCounters;
00427     }
00428     else if (gKnobPhase == 1)
00429     {
00430         // Create the pre_process files, or clear their content if they already exist
00431         CreateCleanFile(_kernelPreProcessFileName);
00432         CreateCleanFile(_dispatchPreProcessFileName);
00433     }
00434 }
00435 
00436 MemoryTracePreProcessor* MemoryTracePreProcessor::Instance()
00437 {
00438     static MemoryTracePreProcessor instance;
00439     return &instance;
00440 }
00441 
00442 void MemoryTracePreProcessor::OnFini()
00443 {
00444     MemoryTracePreProcessor&  tool = *Instance();
00445     tool.DumpKernelProfiles(_kernelPreProcessFileName);
00446     tool.DumpDispatchProfiles(_dispatchPreProcessFileName);
00447 }
00448 
00449 uint64_t MemoryTracePreProcessor::TraceSize(const string& extKernelName) const
00450 {
00451     auto it = _kernelCounters.find(extKernelName);
00452     return ((it == _kernelCounters.end()) ? 0 : it->second.weight);
00453 }
00454 
00455 uint32_t MemoryTracePreProcessor::GetBblWeight(IGtKernelInstrument& kernelInstrument, const IGtBbl& bbl) const
00456 {
00457     // For the memorytrace tool, the weight of the BBL is the trace record size in this BBL
00458     return BblMemAccessInfo(kernelInstrument, bbl).RecordSize();
00459 }
00460 
00461 void MemoryTracePreProcessor::AggregateDispatchCounters(KernelWeightCounters& kc, KernelWeightCounters dc) const
00462 {
00463     kc.weight = std::max(kc.weight, dc.weight);
00464     kc.freq += dc.freq;
00465 }
00466 
00467 /* ============================================================================================= */
00468 // MemoryTracePostProcessor implementation
00469 /* ============================================================================================= */
00470 const char* MemoryTracePostProcessor::_traceFileName = "memorytrace_compressed.bin";
00471 
00472 MemoryTracePostProcessor::MemoryTracePostProcessor(const IGtCore& gtpinCore, const MemTraceKernel& memTraceKernel) :
00473     _kernel(&memTraceKernel), _memAccessInfo(&memTraceKernel.GetMemAccessInfo()),
00474     _kernelDir(JoinPath(string(gtpinCore.ProfileDir()), memTraceKernel.UniqueName())) {}
00475 
00476 bool MemoryTracePostProcessor::operator()()
00477 {
00478     if (!MakeDirectory(_kernelDir))
00479     {
00480         GTPIN_WARNING("MEMORYTRACE: Could not create directory " + _kernelDir);
00481         return false;
00482     }
00483 
00484     // Process traces recorded in kernel dispatches
00485     for (const MemTraceDispatch& trace : _kernel->GetTraces())
00486     {
00487         if (!trace.IsEmpty())
00488         {
00489             if (trace.IsTrimmed())
00490             {
00491                 GTPIN_WARNING("MEMORYTRACE: Detected trace buffer overflow in kernel " + _kernel->Name());
00492             }
00493 
00494             string subdir   = trace.KernelExecDesc().ToString(_kernel->Platform(), ExecDescFileNameFormat());
00495             string dir      = MakeSubDirectory(_kernelDir, subdir);
00496             string filePath = JoinPath(dir, _traceFileName);
00497 
00498             ofstream fs(filePath, std::ios::binary);
00499             if (!fs)
00500             {
00501                 GTPIN_WARNING("MEMORYTRACE: Could not create file " + filePath);
00502                 continue;
00503             }
00504             StoreTrace(trace, fs);
00505         }
00506     }
00507     return true;
00508 }
00509 
00510 void MemoryTracePostProcessor::StoreTrace(const MemTraceDispatch& trace, std::ofstream& fs)
00511 {
00512     const uint8_t* traceData         = trace.Data();
00513     uint32_t       traceSize         = trace.Size();
00514     uint32_t       alignedHeaderSize = MemTraceRecordHeader::AlignedSize(_kernel->GenModel());
00515 
00516     // Associate trace records with threads - populate _threadTraceRecords array
00517     const GtStateRegAccessor& sra = _kernel->GenModel().StateRegAccessor();
00518     uint32_t maxThreads = _kernel->GenModel().MaxThreads(); // Max number of HW threads
00519 
00520     _threadTraceRecords.resize(_kernel->NumTiles());
00521 
00522     for (uint32_t tile = 0; tile < _kernel->NumTiles(); tile++)
00523     {
00524         _threadTraceRecords[tile].clear();
00525         _threadTraceRecords[tile].resize(maxThreads);
00526     }
00527 
00528     std::vector<uint32_t> numProfiledThreads; // Number of profiled (active) threads
00529     numProfiledThreads.resize(_kernel->NumTiles(), 0);
00530 
00531     for (uint32_t recordOffset = 0; recordOffset + sizeof(MemTraceRecordHeader) <= traceSize;)
00532     {
00533         // Retrieve thread ID and BBL ID from the record header
00534         const MemTraceRecordHeader* header = (const MemTraceRecordHeader*)(traceData + recordOffset);
00535         uint32_t tid    = sra.GetGlobalTid(header->sr0);
00536         uint32_t bblId  = header->bblId;
00537         uint32_t tileId = header->tileId; GTPIN_ASSERT(tileId < _kernel->NumTiles());
00538         const BblMemAccessInfo* bblInfoPtr = _memAccessInfo->GetBblInfo(bblId); GTPIN_ASSERT(bblInfoPtr != nullptr);
00539         uint32_t recordSize = bblInfoPtr->RecordSize();
00540         if (recordOffset + recordSize > traceSize)
00541         {
00542             break; // end of trace
00543         }
00544 
00545         auto& tileThreadRecords = _threadTraceRecords[tileId];
00546         auto& threadTraceRecords = tileThreadRecords[tid];
00547 
00548         // Add a new trace record reference to _threadTraceRecords
00549         if (threadTraceRecords.empty()) { ++numProfiledThreads[tileId]; } // Increment thread count on the first relevant record
00550         threadTraceRecords.emplace_back(TraceRecord{ header, recordSize });
00551 
00552         recordOffset += recordSize;
00553     }
00554 
    uint32_t registerSize = _kernel->GenModel().GrfRegSize();

    StoreMemAccessInfo(fs);         // Store static information about memory accesses in the kernel
    Store(registerSize, fs);        // Store register size

    uint32_t timeStampIncluded = gKnobTimeStamp ? 1 : 0;

    Store(timeStampIncluded, fs);   // Store whether timestamp is included

    // Compute and store the number of involved tiles
    uint32_t numOfTiles = 0;
    for (uint32_t i = 0; i < numProfiledThreads.size(); i++)
    {
        numOfTiles += (numProfiledThreads[i] == 0) ? 0 : 1;
    }
    Store(numOfTiles, fs);

    for (uint32_t tileId = 0; tileId < _threadTraceRecords.size(); tileId++)
    {
        if (numProfiledThreads[tileId] == 0) { continue; }

        Store(tileId, fs);

        Store(numProfiledThreads[tileId], fs);

        // Store per-thread traces
        for (uint32_t tid = 0; tid < maxThreads; tid++)
        {
            const auto& tileThreadRecords = _threadTraceRecords[tileId];
            const auto& traceRecordList = tileThreadRecords[tid];

            if (traceRecordList.empty()) { continue; }

            StoreGlobalTid(tid, fs);    // Store Global Thread Identifier

            uint32_t numRecords = (uint32_t)traceRecordList.size();
            Store(numRecords, fs);      // Store #records collected in the thread

            // Store trace records
            for (const auto& record : traceRecordList)
            {
                const auto& header   = *(record.header);
                uint32_t    bblId    = header.bblId;
                uint32_t    execMask = header.ce & header.dm;

                Store(bblId, fs);       // Store BBL ID
                Store(execMask, fs);    // Store dynamic execution mask

                if (timeStampIncluded)
                {
                    Store(header.tm00, fs);  // Store tm0.0
                    Store(header.tm01, fs);  // Store tm0.1
                }
                // Store address payloads
                if (record.size > alignedHeaderSize)
                {
                    fs.write((const char*)(record.header) + alignedHeaderSize, record.size - alignedHeaderSize);
                }
            }
        }
    }
}

void MemoryTracePostProcessor::StoreMemAccessInfo(std::ofstream& fs)
{
    // Store static information about memory accesses in BBLs
    uint32_t numBbls = _memAccessInfo->NumMemBbls();
    Store(numBbls, fs);                                 // Store the number of BBLs that access memory

    for (const auto& entry : _memAccessInfo->GetMemAccessMap())
    {
        uint32_t                bblId               = entry.first;
        const BblMemAccessInfo& bblInfo             = entry.second;
        uint32_t                numMemInstructions  = (uint32_t)bblInfo.MemInstructions().size();

        Store(bblId, fs);                               // Store BBL ID
        Store(numMemInstructions, fs);                  // Store the number of memory instructions in BBL
        for (const auto& memIns : bblInfo.MemInstructions())
        {
            MemInsInfo memInsInfo(memIns);
            Store(memInsInfo, fs);                      // Store the memory instruction descriptor
        }
    }
}

void MemoryTracePostProcessor::StoreGlobalTid(uint32_t gtid, std::ofstream& fs)
{
    const GtStateRegAccessor& sra = _kernel->GenModel().StateRegAccessor();
    uint32_t sr0 = sra.SetGlobalTid(0, gtid);

    auto storeSr0Field = [&](const ScatteredBitFieldU32& sbf)
    {
        uint32_t val = (sbf.IsEmpty() ? UINT32_MAX : sbf.GetValue(sr0));
        Store(val, fs);
    };

    storeSr0Field(sra.SliceIdField());
    storeSr0Field(sra.DualSubSliceIdField());
    storeSr0Field(sra.SubSliceIdField());
    storeSr0Field(sra.EuIdField());
    storeSr0Field(sra.ThreadSlotField());
}

MemoryTracePostProcessor::MemInsInfo::MemInsInfo(const MemIns& memIns)
{
    offset              = memIns.offset;
    isWrite             = memIns.msg.IsWrite() && (memIns.msg.Opcode() != GED_DP_OPCODE_LOAD_2D_BLOCK);
    isBlock2D           = (memIns.msg.Opcode() == GED_DP_OPCODE_LOAD_2D_BLOCK) || (memIns.msg.Opcode() == GED_DP_OPCODE_STORE_2D_BLOCK);
    isScatter           = memIns.msg.IsScatter();
    isBTS               = memIns.msg.IsBts();
    isSLM               = memIns.msg.IsSlm();
    isScratch           = memIns.msg.IsScratch();
    isAtomic            = memIns.msg.IsAtomic();
    isFence             = memIns.msg.IsMemFence();
    addressWidth        = (memIns.msg.IsA64() || isBlock2D) ? 64 : 32;
    simdWidth           = isBlock2D ? 1 : memIns.msg.SimdWidth();
    bti                 = memIns.msg.Bti();
    elementSize         = memIns.msg.ElementSize();
    numElements         = memIns.msg.NumElements();
    addrPayloadLength   = memIns.msg.AddrPayloadLength();
    dataPort            = (memIns.msg.IsDp0()  ? 0 :
                          (memIns.msg.IsDp1()  ? 1 :
                          (memIns.msg.IsUgm()  ? 2 :
                          (memIns.msg.IsUgml() ? 3 :
                          (memIns.msg.IsTgm()  ? 4 :
                          (memIns.msg.IsSlm()  ? 5 : 6))))));
    isEOT               = memIns.msg.IsEot();
    isMedia             = memIns.msg.IsMedia();
    execSize            = memIns.msg.ExecSize();
    channelOffset       = memIns.msg.ChannelOffset();
}

/* ============================================================================================= */
// GTPin_Entry
/* ============================================================================================= */
EXPORT_C_FUNC void GTPin_Entry(int argc, const char *argv[])
{
    ConfigureGTPin(argc, argv);
    if (gKnobPhase == 1)
    {
        MemoryTracePreProcessor::Instance()->Register();
        atexit(MemoryTracePreProcessor::OnFini);
    }
    else
    {
        GTPIN_ASSERT_MSG((gKnobPhase == 2), "MEMORYTRACE: Invalid phase value. Should be 1 or 2, provided " + std::to_string(gKnobPhase));
        MemTrace::Instance()->Register();
        atexit(MemTrace::OnFini);
    }
}
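
The record-association loop in StoreTrace walks a raw buffer of variable-size records, looks up each record's size from its BBL ID, and files the record under its thread. The following is a minimal standalone sketch of that pattern, not the GTPin implementation. DemoHeader, DemoRecord, GroupByThread and AppendRecord are hypothetical names introduced only for illustration; the recordSizes table stands in for BblMemAccessInfo::RecordSize(), and the tid field stands in for the thread ID that the real code decodes from sr0 through GtStateRegAccessor. Tiles are omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified record header (the real MemTraceRecordHeader has more fields).
struct DemoHeader
{
    uint32_t tid;   // Stand-in for the global thread ID decoded from sr0
    uint32_t bblId; // Basic block that produced the record
};

// Reference to one record inside the trace buffer
struct DemoRecord
{
    uint32_t offset; // Byte offset of the record in the trace buffer
    uint32_t size;   // Total record size, header included
};

// Test helper: append a zero-filled record of 'size' bytes with the given header
void AppendRecord(std::vector<uint8_t>& trace, uint32_t tid, uint32_t bblId, uint32_t size)
{
    assert(size >= sizeof(DemoHeader));
    DemoHeader header{ tid, bblId };
    size_t start = trace.size();
    trace.resize(start + size, 0);
    std::memcpy(trace.data() + start, &header, sizeof(header));
}

// Walk a buffer of variable-size records and group them per thread.
// recordSizes[bblId] plays the role of BblMemAccessInfo::RecordSize() (assumption).
std::vector<std::vector<DemoRecord>> GroupByThread(const std::vector<uint8_t>& trace,
                                                   const std::vector<uint32_t>& recordSizes,
                                                   uint32_t maxThreads)
{
    std::vector<std::vector<DemoRecord>> perThread(maxThreads);
    size_t offset = 0;
    while (offset + sizeof(DemoHeader) <= trace.size())
    {
        DemoHeader header;
        std::memcpy(&header, trace.data() + offset, sizeof(header)); // Copy to avoid unaligned reads

        // Unlike the listing above, this sketch stops on out-of-range IDs instead of asserting
        if (header.tid >= maxThreads || header.bblId >= recordSizes.size()) { break; }

        uint32_t recordSize = recordSizes[header.bblId];
        if (recordSize < sizeof(DemoHeader) || offset + recordSize > trace.size())
        {
            break; // Truncated or malformed record: end of trace
        }

        perThread[header.tid].push_back({ static_cast<uint32_t>(offset), recordSize });
        offset += recordSize;
    }
    return perThread;
}
```

A caller would build or load a trace buffer, pass the per-BBL record sizes, and then iterate the returned per-thread lists in thread order, which is the order in which StoreTrace writes them out. Rejecting a zero or undersized record size also guards against a loop that never advances.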

  Copyright (C) 2013-2025 Intel Corporation
SPDX-License-Identifier: MIT