GTPin: Opcodeprof Sample Tool

The Opcodeprof tool reports the dynamic frequency of each kernel instruction (the "instruction mix") in the form of opcode histograms.

Running the Opcodeprof tool

To run the Opcodeprof tool (in its default configuration), use the following command:

Profilers/Bin/gtpin -t opcodeprof -- app

How to understand Opcodeprof results

When you run the in-house GTPin Opcodeprof tool in its default configuration, the tool generates the directory: GTPIN_PROFILE_OPCODEPROF0. The profiling results are stored in the sub-folder: GTPIN_PROFILE_OPCODEPROF0\Session_Final. Results are provided for each kernel/shader configuration, and also in an accumulated (total) configuration.

For each binary kernel/shader that was dispatched to the device, the tool generates a directory with an extended kernel name: KernelName__CompilerGeneratedName, as shown in the following screenshot:


(Screenshot: opcodeprof.jpg)


NOTE: If GTPin does not know the name of the kernel, a compiler-generated name of the form CS_asmf54af91315561f54_simd8 is assigned as the kernel name. In the compiler-generated name, the prefix indicates the kernel type, the suffix indicates the SIMD width for which the kernel was compiled, and the 16-digit hexadecimal number is the hash ID of the IR representation of the kernel. This is shown in the previous screenshot.
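
The naming scheme in the note above can be illustrated with a small parser. This is a minimal sketch, not part of GTPin: the type GeneratedKernelName and the function ParseGeneratedKernelName are hypothetical helpers, and the sketch assumes the name always has the form <type>_asm<16 hex digits>_simd<width>, as in the two examples on this page.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Parts of a compiler-generated kernel name such as "CS_asmf54af91315561f54_simd8".
// Illustrative type; it does not exist in the GTPin API.
struct GeneratedKernelName {
    std::string   kernelType;   // prefix, e.g. "CS" or "PS"
    std::uint64_t hashId = 0;   // 16 hex digits: hash ID of the kernel's IR
    unsigned      simdWidth = 0; // suffix, e.g. 8, 16 or 32
};

// Hypothetical helper: splits <type>_asm<16 hex digits>_simd<width>.
// Returns false (leaving 'out' untouched) if the name does not match that form.
bool ParseGeneratedKernelName(const std::string& name, GeneratedKernelName& out)
{
    const std::string asmTag  = "_asm";
    const std::string simdTag = "_simd";

    const std::size_t asmPos  = name.find(asmTag);
    const std::size_t simdPos = name.rfind(simdTag);
    if (asmPos == std::string::npos || simdPos == std::string::npos || asmPos == 0) {
        return false;
    }

    // The hash must be exactly the 16 characters between "_asm" and "_simd"
    const std::size_t hashBegin = asmPos + asmTag.size();
    if (simdPos < hashBegin || simdPos - hashBegin != 16) {
        return false;
    }

    const std::string hash  = name.substr(hashBegin, 16);
    const std::string width = name.substr(simdPos + simdTag.size());
    if (hash.find_first_not_of("0123456789abcdefABCDEF") != std::string::npos) {
        return false;
    }
    if (width.empty() || width.size() > 3 ||
        width.find_first_not_of("0123456789") != std::string::npos) {
        return false;
    }

    out.kernelType = name.substr(0, asmPos);
    out.hashId     = std::stoull(hash, nullptr, 16);
    out.simdWidth  = static_cast<unsigned>(std::stoul(width));
    return true;
}
```

For the name CS_asmf54af91315561f54_simd8, the sketch yields the kernel type CS, the hash ID 0xf54af91315561f54, and the SIMD width 8. Extended names of the form KernelName__CompilerGeneratedName are not handled here.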

Each kernel/shader folder contains the resulting file opcodeprof_total.out, which summarizes the results for this specific kernel/shader. In addition, an opcodeprof_total.out file in the root directory summarizes all the application kernels/shaders together.

Each individual opcodeprof_total.out file has the following format:

DYNAMIC OPCODE HISTOGRAMS PER EXECUTION DATA TYPES
==================================================

DATA TYPE: ud

OPCODE Report :  Opcode  SIMD         Static (%)            Dynamic (%)
                 sendsc    16              2 (16.7%)          11870 (16.7%)
                  sends    16              2 (16.7%)          11870 (16.7%)
                    mov     1              2 (16.7%)          11870 (16.7%)
                    add     1              1 ( 8.3%)           5935 ( 8.3%)
                    mov     8              1 ( 8.3%)           5935 ( 8.3%)


DATA TYPE: f

OPCODE Report :  Opcode  SIMD         Static (%)            Dynamic (%)
                    pln    16              4 (33.3%)          23740 (33.3%)


// kernel name:    PS_asm537c7dc831196b2d_simd32

Static  instruction count = 12
Dynamic instruction count = 71220


[           5935]     (W)      mov (8|M0)               r21.0<1>:ud   r0.0<8;8,1>:ud                   {Compacted}
[           5935]     (W)      pln (16|M0)              r19.0<1>:f    r12.4<0;1,0>:f    r3.0<8;8,1>:f    {Compacted}
[           5935]     (W)      pln (16|M0)              r17.0<1>:f    r12.0<0;1,0>:f    r3.0<8;8,1>:f    {Compacted}
[           5935]     (W)      add (1|M0)               a0.0<1>:ud    r11.0<0;1,0>:ud   0x102:ud         {Compacted}
[           5935]     (W)      mov (1|M0)               r21.3<1>:ud   r11.1<0;1,0>:ud                 
[           5935]     (W)      pln (16|M16)             r15.0<1>:f    r12.4<0;1,0>:f    r7.0<8;8,1>:f   
[           5935]     (W)      mov (1|M0)               r21.2<1>:ud   0x0:ud                           {Compacted}
[           5935]     (W)      pln (16|M16)             r13.0<1>:f    r12.0<0;1,0>:f    r7.0<8;8,1>:f   
[           5935]              sends (16|M0)            r0:w     r21     r17     a0.0        0x28C00FC
[           5935]              sends (16|M16)           r112:w   r21     r13     a0.0        0x28C00FC
[           5935]              sendsc (16|M0)           null:w   r0      null    0x5         0x10031000
[           5935]              sendsc (16|M16)          null:w   r112    null    0x25        0x10031800 {EOT}

The results are presented as dynamic opcode histograms, grouped by execution data type. For each data type and opcode, the report lists: the opcode mnemonic; its SIMD width; the static count, that is, the number of kernel instructions matching the tuple (opcode, SIMD width, data type); that count as a percentage of all static instructions in the kernel; the dynamic count of the same tuple; and that count as a percentage of all dynamic instructions counted. The histograms are followed by the kernel name, the total number of static instructions, and the total number of dynamic instructions. The file ends with a listing of all assembly instructions of the kernel, each prefixed with its dynamic count.
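
The relationship between the assembly listing and the histogram can be sketched as a simple aggregation over (data type, opcode, SIMD width) tuples. The code below is an illustration only, not the tool's implementation: InsCount, HistogramEntry, BuildHistogram and Percent are hypothetical names, and the data type of each instruction is assumed to be supplied by the caller, because the rule the tool uses to pick the execution data type is not described on this page.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// One instruction of the assembly listing with its dynamic count.
// Illustrative type; it does not exist in the GTPin API.
struct InsCount {
    std::string   opcode;        // e.g. "pln"
    unsigned      simd;          // e.g. 16
    std::string   dataType;      // e.g. "f" (assumed to be known to the caller)
    std::uint64_t dynamicCount;  // value shown in brackets in the listing
};

struct HistogramEntry {
    std::uint64_t staticCount  = 0;  // number of instructions with this tuple
    std::uint64_t dynamicCount = 0;  // sum of their dynamic counts
};

using HistogramKey = std::tuple<std::string, std::string, unsigned>;  // (data type, opcode, SIMD)
using Histogram    = std::map<HistogramKey, HistogramEntry>;

// Group the listing by (data type, opcode, SIMD) and sum static and dynamic counts
Histogram BuildHistogram(const std::vector<InsCount>& listing)
{
    Histogram histogram;
    for (const InsCount& ins : listing) {
        HistogramEntry& entry = histogram[HistogramKey(ins.dataType, ins.opcode, ins.simd)];
        entry.staticCount  += 1;
        entry.dynamicCount += ins.dynamicCount;
    }
    return histogram;
}

// Share of 'part' in 'total', in percent; 0 for an empty total
double Percent(std::uint64_t part, std::uint64_t total)
{
    return (total == 0) ? 0.0 : 100.0 * static_cast<double>(part) / static_cast<double>(total);
}
```

Applied to the sample above, the four pln (16) instructions of type f each executed 5935 times, which gives a static count of 4 out of 12 and a dynamic count of 23740 out of 71220, that is 33.3% in both columns.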

In the root directory, the file opcodeprof_total.out summarizes the results of all kernels/shaders of the application. It provides the total number of binary kernels, the total dynamic number of instructions, the dynamic opcode histograms per data type, and the assembly listing of all kernels, along with the dynamic frequency of each instruction, as shown here:

Total number of kernels:      8
Total number of instructions: 1907624

DYNAMIC OPCODE HISTOGRAMS PER EXECUTION DATA TYPES
==================================================

DATA TYPE: ud

OPCODE Report :  Opcode  SIMD        Dynamic (%)
                  sends     8         146262 ( 7.7%)
                  sends    16          92520 ( 4.9%)
                    add     1          89247 ( 4.7%)
                    mov     1          38574 ( 2.0%)
                 sendsc    16          22560 ( 1.2%)
                    mov     8           9615 ( 0.5%)
                 sendsc     8           6342 ( 0.3%)


DATA TYPE: d

OPCODE Report :  Opcode  SIMD        Dynamic (%)
                    mov     8          69960 ( 3.7%)


DATA TYPE: uw

OPCODE Report :  Opcode  SIMD        Dynamic (%)
                    mov     1          69960 ( 3.7%)


DATA TYPE: f

OPCODE Report :  Opcode  SIMD        Dynamic (%)
                    mad     8         559680 (29.3%)
                    mul     8         285182 (14.9%)
                    add     8         279840 (14.7%)
                    mov     8         145262 ( 7.6%)
                    pln    16          45120 ( 2.4%)
                    mul    16          17408 ( 0.9%)
                    mov    16          17408 ( 0.9%)
                    pln     8          12684 ( 0.7%)




[          34980]     (W)      mov (1|M0)               r0.2<1>:ud    0x0:uw
[          34980]     (W)      add (1|M0)               a0.0<1>:ud    r2.2<0;1,0>:ud    0xA:ud           {Compacted}
[          34980]              mov (8|M0)               r119.0<1>:f   r6.0<8;8,1>:f                    {Compacted}
[          34980]     (W)      sends (16|M0)            r117:w   r0      null    a0.0        0x22843FC  //  wr:1h+?, rd:2, ?
[          34980]              mov (8|M0)               r120.0<1>:f   r7.0<8;8,1>:f                    {Compacted}
[          34980]     (W)      mov (8|M0)               r112.0<1>:d   r1.0<8;8,1>:d     
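
For post-processing, the histogram rows of the root opcodeprof_total.out file can be read back with a few lines of code. The following is a minimal sketch that assumes rows have exactly the layout shown in the sample above (opcode, SIMD width, dynamic count, percentage in parentheses); HistogramRow and ParseDynamicRow are hypothetical names, and the per-kernel files, which have an additional Static column, are not covered.

```cpp
#include <cstdio>
#include <string>

// One row of an "OPCODE Report" table in the root opcodeprof_total.out file.
// Illustrative type; it does not exist in the GTPin API.
struct HistogramRow {
    std::string        opcode;
    unsigned           simd = 0;
    unsigned long long dynamicCount = 0;
    double             percent = 0.0;
};

// Hypothetical helper: parses a row such as
//   "                    mad     8         559680 (29.3%)"
// Returns false for header lines, blank lines and anything else that does not match.
bool ParseDynamicRow(const std::string& line, HistogramRow& row)
{
    char               opcode[32] = {0};
    unsigned           simd = 0;
    unsigned long long count = 0;
    double             percent = 0.0;

    // Whitespace in the format matches any amount of whitespace in the input
    const int matched = std::sscanf(line.c_str(), " %31s %u %llu ( %lf%%",
                                    opcode, &simd, &count, &percent);
    if (matched != 4) {
        return false;
    }

    row.opcode       = opcode;
    row.simd         = simd;
    row.dynamicCount = count;
    row.percent      = percent;
    return true;
}
```

The header line beginning with "OPCODE Report :" is rejected because the token after the first word is not a number, so a caller can feed every line of the file to the function and keep only the rows for which it returns true.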

Opcodeprof tool: modes of operation

To print only the accumulated (total) results for the application, use the following command line:

Profilers/Bin/gtpin -t opcodeprof --total_only -- app

In the above case, only one file is generated: GTPIN_PROFILE_OPCODEPROF0\Session_Final\opcodeprof_total.out.

To obtain the profiling for each HW thread, use the following command line:

Profilers/Bin/gtpin -t opcodeprof --per_thread_data --num_thread_buckets 0 -- app

In this case, files such as opcodeprof__s_0_ss_2_eu_7_tid_5.out are also generated for each binary kernel/shader in the corresponding sub-folder. In the file names, s indicates the slice number, ss the sub-slice number, eu the Execution Unit number, and tid the HW thread ID.
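
The per-thread file names can be decoded mechanically. The sketch below assumes every per-thread file follows exactly the pattern of the example name above; HwThreadLocation and ParsePerThreadFileName are hypothetical helpers, not part of GTPin.

```cpp
#include <cstdio>
#include <string>

// Location of a HW thread, as encoded in a per-thread file name.
// Illustrative type; it does not exist in the GTPin API.
struct HwThreadLocation {
    unsigned slice = 0;     // s
    unsigned subSlice = 0;  // ss
    unsigned eu = 0;        // eu
    unsigned threadId = 0;  // tid
};

// Hypothetical helper: decodes names like "opcodeprof__s_0_ss_2_eu_7_tid_5.out".
// Returns false if the name does not match the pattern completely.
bool ParsePerThreadFileName(const std::string& fileName, HwThreadLocation& loc)
{
    unsigned s = 0, ss = 0, eu = 0, tid = 0;
    int consumed = 0;

    // %n stores the number of characters consumed; it is only reached
    // if the trailing ".out" matched as well
    const int matched = std::sscanf(fileName.c_str(),
                                    "opcodeprof__s_%u_ss_%u_eu_%u_tid_%u.out%n",
                                    &s, &ss, &eu, &tid, &consumed);
    if (matched != 4 || consumed <= 0 ||
        static_cast<std::size_t>(consumed) != fileName.size()) {
        return false;
    }

    loc.slice    = s;
    loc.subSlice = ss;
    loc.eu       = eu;
    loc.threadId = tid;
    return true;
}
```

For opcodeprof__s_0_ss_2_eu_7_tid_5.out this yields slice 0, sub-slice 2, Execution Unit 7, and HW thread 5. Other files in the same folder, such as opcodeprof_total.out, are rejected.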


opcodeprof.h

/*========================== begin_copyright_notice ============================
Copyright (C) 2018-2024 Intel Corporation

SPDX-License-Identifier: MIT
============================= end_copyright_notice ===========================*/

/*!
 * @file Opcodeprof tool definitions
 */

#ifndef OPCODEPROF_H_
#define OPCODEPROF_H_

#include <map>
#include <vector>
#include <string>
#include <tuple>

#include "gtpin_api.h"
#include "gtpin_tool_utils.h"
#include "opcodeprof_utils.h"

using namespace gtpin;


/* ============================================================================================= */
// Class Opcodeprof
/* ============================================================================================= */
/*!
 * Implementation of the IGtTool interface for the opcodeprof tool
 */
class Opcodeprof : public GtTool
{
public:
    /// Implementation of the IGtTool interface
    const char* Name() const { return "opcodeprof"; }

    void OnKernelBuild(IGtKernelInstrument& instrumentor);
    void OnKernelRun(IGtKernelDispatch& dispatcher);
    void OnKernelComplete(IGtKernelDispatch& dispatcher);

public:

    static Opcodeprof* Instance();               ///< @return Single instance of this class
    static void OnFini() { Instance()->Fini(); } ///< Callback function registered with atexit()

private:
    Opcodeprof() = default;
    Opcodeprof(const Opcodeprof&) = delete;
    Opcodeprof& operator = (const Opcodeprof&) = delete;
    ~Opcodeprof() = default;

    void Fini();              /// Post process and dump profiling data

private:
    std::map<GtKernelId, OpcodeprofKernelProfile> _kernels;  ///< Collection of kernel profiles
};

#endif

opcodeprof.cpp

/*========================== begin_copyright_notice ============================
Copyright (C) 2018-2025 Intel Corporation

SPDX-License-Identifier: MIT
============================= end_copyright_notice ===========================*/

/*!
 * @file Implementation of the Opcodeprof tool
 */

#include <fstream>
#include <sstream>
#include <iomanip>
#include <algorithm>
#include <functional>

#include "opcodeprof.h"

using namespace gtpin;

/* ============================================================================================= */
// Configuration
/* ============================================================================================= */
Knob<int>  knobNumThreadBuckets("num_thread_buckets", 32, "Number of thread buckets. 0 - maximum thread buckets");

/* ============================================================================================= */
// Opcodeprof implementation
/* ============================================================================================= */
Opcodeprof* Opcodeprof::Instance()
{
    static Opcodeprof instance;
    return &instance;
}

void Opcodeprof::OnKernelBuild(IGtKernelInstrument& instrumentor)
{
    const IGtKernel&           kernel    = instrumentor.Kernel();
    const IGtCfg&              cfg       = instrumentor.Cfg();
    const IGtGenCoder&         coder     = instrumentor.Coder();
    const IGtGenModel&         genModel  = kernel.GenModel();
    IGtProfileBufferAllocator& allocator = instrumentor.ProfileBufferAllocator();
    IGtVregFactory&            vregs     = coder.VregFactory();
    IGtInsFactory&             insF      = coder.InstructionFactory();

    // Initialize virtual registers
    GtReg addrReg = vregs.MakeMsgAddrScratch(); ///< Virtual register that holds address within profile buffer

    // Allocate the profile buffer. It will hold single OpcodeprofRecord per each basic block in each thread bucket
    uint32_t numThreadBuckets = (knobNumThreadBuckets == 0) ? genModel.MaxThreadBuckets() : knobNumThreadBuckets;
    uint32_t numRecords = cfg.NumBbls();
    GtProfileArray profileArray(sizeof(OpcodeprofRecord), numRecords, numThreadBuckets);
    profileArray.Allocate(allocator);

    // Instrument basic blocks
    for (auto bblPtr : cfg.Bbls())
    {
        if (!bblPtr->IsEmpty())
        {
            GtGenProcedure proc;
            uint32_t recordNum = bblPtr->Id();

            // addrReg =  address of the current thread's OpcodeprofRecord in the profile buffer
            profileArray.ComputeAddress(coder, proc, addrReg, recordNum);

            // [addrReg].freq++
            proc += insF.MakeAtomicInc(NullReg(), addrReg, GED_DATA_TYPE_ud);

            if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
            InstrumentBbl(instrumentor, *bblPtr, GtIpoint::Before(), proc);
        }
    }

    // Create OpcodeprofKernelProfile object that represents profile of this kernel
    _kernels.emplace(kernel.Id(), OpcodeprofKernelProfile(kernel, cfg, profileArray));
}

void Opcodeprof::OnKernelRun(IGtKernelDispatch& dispatcher)
{
    bool isProfileEnabled = false;

    const IGtKernel& kernel = dispatcher.Kernel();
    GtKernelExecDesc execDesc; dispatcher.GetExecDescriptor(execDesc);
    if (kernel.IsInstrumented() && IsKernelExecProfileEnabled(execDesc, kernel.GpuPlatform(), kernel.Name().Get()))
    {
        auto it = _kernels.find(kernel.Id());

        if (it != _kernels.end())
        {
            IGtProfileBuffer*          buffer        = dispatcher.CreateProfileBuffer(); GTPIN_ASSERT(buffer);
            OpcodeprofKernelProfile&   kernelProfile = it->second;
            const GtProfileArray&      profileArray  = kernelProfile.GetProfileArray();
            if (profileArray.Initialize(*buffer))
            {
                isProfileEnabled = true;
            }
            else
            {
                GTPIN_ERROR_MSG("OPCODEPROF: " + std::string(kernel.Name()) + " : Failed to write into memory buffer");
            }
        }
    }
    dispatcher.SetProfilingMode(isProfileEnabled);
}

void Opcodeprof::OnKernelComplete(IGtKernelDispatch& dispatcher)
{
    if (!dispatcher.IsProfilingEnabled())
    {
        return; // Do nothing with unprofiled kernel dispatches
    }

    const IGtKernel& kernel = dispatcher.Kernel();
    auto it = _kernels.find(kernel.Id());

    if (it != _kernels.end())
    {
        const IGtProfileBuffer*  buffer        = dispatcher.GetProfileBuffer(); GTPIN_ASSERT(buffer);
        OpcodeprofKernelProfile& kernelProfile = it->second;
        const GtProfileArray&    profileArray  = kernelProfile.GetProfileArray();

        for (uint32_t recordNum = 0; recordNum != profileArray.NumRecords(); ++recordNum)
        {
            for (uint32_t threadBucket = 0; threadBucket < profileArray.NumThreadBuckets(); ++threadBucket)
            {
                OpcodeprofRecord record;
                if (!profileArray.Read(*buffer, &record, recordNum, 1, threadBucket))
                {
                    GTPIN_ERROR_MSG("OPCODEPROF: " + std::string(kernel.Name()) + " : Failed to read from memory buffer");
                }
                else
                {
                    kernelProfile.Accumulate(record, (BblId)recordNum);
                }
            }
        }
    }
}

void Opcodeprof::Fini()
{
    DumpProfile(_kernels);
    DumpAsm(_kernels);
}

/* ============================================================================================= */
// GTPin_Entry
/* ============================================================================================= */
EXPORT_C_FUNC void GTPin_Entry(int argc, const char* argv[])
{
    ConfigureGTPin(argc, argv);
    Opcodeprof::Instance()->Register();
    atexit(Opcodeprof::OnFini);
}
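
The counting scheme used by OnKernelBuild and OnKernelComplete (one record per basic block in each thread bucket, incremented on the device and summed over buckets on the host) can be pictured with a plain host-side model. This sketch deliberately uses no GTPin types: ModelRecord and ProfileArrayModel are hypothetical stand-ins, only the freq counter is implied by the comments in the listing, and the memory layout chosen here is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for OpcodeprofRecord. Only a "freq" counter is implied by the
// "[addrReg].freq++" comment in the listing; any other fields are unknown.
struct ModelRecord {
    std::uint32_t freq = 0;
};

// Host-side model of a profile array with numRecords x numThreadBuckets counters.
// The record-major layout used here is an assumption, not the layout of GtProfileArray.
class ProfileArrayModel {
public:
    ProfileArrayModel(std::size_t numRecords, std::size_t numThreadBuckets)
        : _numRecords(numRecords),
          _numThreadBuckets(numThreadBuckets),
          _records(numRecords * numThreadBuckets) {}

    // Models the instrumentation inserted before a basic block: [addr].freq++
    void CountExecution(std::size_t recordNum, std::size_t threadBucket)
    {
        ++_records.at(recordNum * _numThreadBuckets + threadBucket).freq;
    }

    // Models the loop in OnKernelComplete: sum each record over all thread buckets
    std::vector<std::uint64_t> AccumulatePerBbl() const
    {
        std::vector<std::uint64_t> totals(_numRecords, 0);
        for (std::size_t recordNum = 0; recordNum != _numRecords; ++recordNum) {
            for (std::size_t bucket = 0; bucket != _numThreadBuckets; ++bucket) {
                totals[recordNum] += _records[recordNum * _numThreadBuckets + bucket].freq;
            }
        }
        return totals;
    }

private:
    std::size_t              _numRecords;
    std::size_t              _numThreadBuckets;
    std::vector<ModelRecord> _records;
};
```

In this model, every instruction of a basic block inherits the accumulated count of that block, which is consistent with the sample listing above, where all instructions of the single-block kernel show the same count of 5935. Spreading the counters over several thread buckets presumably reduces contention between HW threads on the same counter, at the cost of a larger buffer.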





  Copyright (C) 2013-2025 Intel Corporation
SPDX-License-Identifier: MIT