GTPin
Instructions and recommendations on how to develop, build and run a GTPin tool
The list of ready-to-use sample tools can be found in GTPin Sample Tools.
Users planning to develop their own tool are advised to review GTPin: API Reference to learn the supported GTPin interfaces in detail.
The profiling tool communicates with the GTPin framework by means of the GTPin Tool API. With this API the tool can:
The GTPin API can be viewed as a two-level interface that lets tools use either low-level or high-level instrumentation facilities, depending on their goals and requirements. If needed, both interface levels can be freely mixed in the same tool.
Low Level Instrumentation interface (LLIF)
High Level Instrumentation interface (HLIF)
The GTPin Tool API is a collection of C++ abstract interfaces that are safely exposed through the library boundaries, and simple, inlinable primitives, naturally sharable between GTPin and tools.
The only C function exposed by GTPin is GTPin_GetCore(). The only C function that must be implemented by the tool is GTPin_Entry().
The diagram below depicts the major GTPin Tool API classes and the communication flow between GTPin engine and the tool:
Logically, the communication session between GTPin and the tool can be divided into five phases, described in the following sections.
The GTPin initialization flow includes a number of steps:
The prototype of the main function, which must be implemented and exported by the tool, is GTPin_Entry(int argc, const char *argv[]).
The main function is expected to complete the initialization phase: parse the command line, configure GTPin, and register itself with the GTPin core.
A typical implementation of the GTPin_Entry function looks like this:
    EXPORT_C_FUNC void GTPin_Entry(int argc, const char *argv[])
    {
        // Parse command line and configure GTPin
        ConfigureGTPin(argc, argv);

        // Register the tool (callbacks) with the GTPin core
        MyGTPinTool::Instance()->Register();

        // Register the termination function
        atexit(MyGTPinTool::OnFini);
    }

where MyGTPinTool is a class that implements the IGtTool interface:
    class MyGTPinTool : public IGtTool
    {
    public:
        // Implementation of the IGtTool interface
        const char* Name() const override { return "My GTPin Tool"; }
        uint32_t ApiVersion() const override { return GTPIN_API_VERSION; }

        void OnKernelBuild(IGtKernelInstrument&) override;
        void OnKernelRun(IGtKernelDispatch&) override;
        void OnKernelComplete(IGtKernelDispatch&) override;

        // Register the MyGTPinTool instance with the GTPin core
        bool Register()
        {
            IGtCore* gtpinCore = GTPin_GetCore();
            if (!gtpinCore->RegisterTool(*this))
            {
                GTPIN_ERROR_MSG(std::string(Name()) + ": Failed registration with GTPin core.");
                return false;
            }
            gtpinCore->RegisterEventHandler(_eventHandler);
            return true;
        }

        static MyGTPinTool* Instance(); // Return single instance of this class
        static void OnFini();           // Termination handler registered with atexit()

    private:
        // Associate the specified profile array with the specified kernel ID
        void SetProfileArray(GtKernelId kernelId, const GtProfileArray& profileArray);

        // Get profile array associated with the specified kernel ID
        GtProfileArray& GetProfileArray(GtKernelId kernelId);

    private:
        uint64_t       _totalCycleCount = 0; // Total number of cycles executed by GPU kernels
        uint64_t       _totalExeCount   = 0; // Total number of kernel executions in HW threads
        MyEventHandler _eventHandler;        // Event handler
    };
Usually, the tool registers a single implementation of the IGtTool interface with GTPin. However, if needed, a tool may call the IGtCore::RegisterTool function more than once to register multiple implementations of the GTPin callback handlers.
Also, the IGtCore interface allows tools to unregister all or specific callbacks using the IGtCore::UnregisterTool function.
Additionally, the tool can register its own IGtEventHandler implementation to monitor and handle events reported by GTPin.
For each new binary kernel created by the compiler, GTPin invokes the IGtTool::OnKernelBuild function in the profiling tool. With this function, GTPin passes the IGtKernelInstrument object that refers to the kernel being compiled and provides the instrumentation interface for the tool.
The instrumentation interface is divided into classes that help tools to assemble analysis procedures and insert them into the original code of the kernel:
Once the analysis procedure is composed, it can be inserted into the original binary using one of the IGtKernelInstrument methods:
The following code snippet demonstrates the kernel instrumentation function in a tool that measures (a) accumulated kernel cycles and (b) the number of kernel dispatches to HW threads:
    // Layout of records collected in the profile buffer by MyGTPinTool
    struct MyRecord
    {
        uint32_t cycleCount; // Number of cycles
        uint32_t exeCount;   // Number of kernel executions in HW threads
    };

    void MyGTPinTool::OnKernelBuild(IGtKernelInstrument& instrumentor)
    {
        const IGtKernel&           kernel    = instrumentor.Kernel();                 // The kernel being instrumented
        const IGtGenCoder&         coder     = instrumentor.Coder();                  // GEN code generator
        IGtProfileBufferAllocator& allocator = instrumentor.ProfileBufferAllocator(); // Profile buffer allocator
        IGtVregFactory&            vregs     = coder.VregFactory();                   // Factory of virtual registers
        IGtInsFactory&             insF      = coder.InstructionFactory();            // GEN instruction factory

        // Create virtual registers
        GtReg timeReg = vregs.Make(VREG_TYPE_DWORD);
        GtReg addrReg = vregs.MakeMsgAddrScratch();
        GtReg dataReg = vregs.MakeMsgDataScratch();

        // There will be one MyRecord per each HW thread in the profile buffer.
        // Initialize and store the corresponding profile array.
        GtProfileArray profileArray(sizeof(MyRecord), 1, kernel.GenModel().MaxThreadBuckets());
        profileArray.Allocate(allocator);
        SetProfileArray(kernel.Id(), profileArray);

        // Generate the 'preCode' procedure that starts timer at kernel's entry
        GtGenProcedure preCode;
        coder.StartTimer(preCode, timeReg); // timeReg = tm0.0

        // Generate the 'postCode' procedure that stops timer at kernel's exits
        GtGenProcedure postCode;

        // buffer[TID].cycleCount += (tm0.0 - timeReg)
        coder.StopTimerExt(postCode, timeReg); // timeReg = (tm0.0 - timeReg)
        postCode += insF.MakeMov(dataReg, timeReg);
        profileArray.ComputeAddress(coder, postCode, addrReg, 0, offsetof(MyRecord, cycleCount));
        postCode += insF.MakeAtomicAdd(NullReg(), addrReg, dataReg, GED_DATA_TYPE_ud);

        // buffer[TID].exeCount += 1
        profileArray.ComputeAddress(coder, postCode, addrReg, 0, offsetof(MyRecord, exeCount));
        postCode += insF.MakeAtomicInc(NullReg(), addrReg, GED_DATA_TYPE_ud);

        // Insert preCode at kernel entries
        instrumentor.InstrumentEntries(preCode);

        // Insert postCode at kernel exits
        instrumentor.InstrumentExits(postCode);
    }
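The stop-timer step in the snippet above stores the difference between two readings of a free-running counter. As a host-side illustration only (plain C++, not GTPin code), the sketch below assumes a 32-bit counter, matching the dword-sized `timeReg`, and shows that unsigned subtraction still yields the elapsed count when the counter wraps once between the two readings. The helper name `ElapsedCycles` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: elapsed ticks between two readings of a 32-bit
// free-running counter. Unsigned arithmetic is modulo 2^32, so the result
// is still correct if the counter wrapped once between the readings.
inline uint32_t ElapsedCycles(uint32_t start, uint32_t stop)
{
    return stop - start;
}
```

A record field of the same width can overflow for long-running kernels, which is one reason a tool might accumulate the per-record values into wider host-side totals.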
Once a kernel is compiled (and instrumented), the application can dispatch it for execution in a graphics device.
The same kernel can be executed multiple times, and before each dispatch, GTPin calls the IGtTool::OnKernelRun function in the profiling tool. With this function, GTPin passes the IGtKernelDispatch object that refers to the IGtKernel instance that represents the kernel being dispatched, and provides access to the profile buffer associated with this kernel’s instance.
While handling the IGtTool::OnKernelRun callback, the tool is expected to indicate whether the instrumented or original version of the kernel code should be executed. If the tool chooses the instrumented version, it must initialize the data in the profile buffer before returning control to GTPin.
The following code snippet demonstrates an IGtTool::OnKernelRun implementation that submits the original kernel version in the first numOrigDispatches dispatches, and selects the instrumented kernel version for all other dispatches:
    Knob<int> numOrigDispatches("num_orig_dispatches", 0, "Number of original kernel dispatches");

    void MyGTPinTool::OnKernelRun(IGtKernelDispatch& dispatcher)
    {
        // The number of kernel dispatches so far
        static int dispatchCount = 0;

        // Skip first numOrigDispatches dispatches
        if (++dispatchCount > numOrigDispatches)
        {
            // Create and initialize memory buffer for this kernel dispatch
            IGtProfileBuffer* buffer = dispatcher.CreateProfileBuffer();
            GtProfileArray& profileArray = GetProfileArray(dispatcher.Kernel().Id());
            if (profileArray.Initialize(*buffer))
            {
                // Tell GTPin to run instrumented code
                dispatcher.SetProfilingMode(true);
            }
        }
    }
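The selection rule described above (original code for the first N dispatches, instrumented code for all later ones) can be isolated from GTPin and checked on its own. The following is a minimal sketch in plain C++; the function `UseInstrumentedVersion` is hypothetical and only models the counting logic, not the profile buffer setup.

```cpp
// Hypothetical stand-in for the decision made in OnKernelRun: returns true
// when the current dispatch should run the instrumented kernel version.
// 'dispatchCount' counts the dispatches seen so far and is updated in place.
inline bool UseInstrumentedVersion(int& dispatchCount, int numOrigDispatches)
{
    ++dispatchCount;                          // 1-based index of this dispatch
    return dispatchCount > numOrigDispatches; // the first N dispatches stay original
}
```

With `numOrigDispatches` set to 2, the first two calls return false and every later call returns true; with the default of 0, every dispatch is instrumented.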
When execution of the dispatched kernel instance finishes, GTPin calls the IGtTool::OnKernelComplete function in the profiling tool. The function gets the same IGtKernelDispatch object as was passed to the corresponding IGtTool::OnKernelRun function.
Usually, the implementation of this function reads the data collected in the profile buffer and stores it in the host memory pending the post-processing phase which is performed just before the application exits.
The following code snippet demonstrates an IGtTool::OnKernelComplete implementation that accumulates counters collected from kernel dispatches:
    void MyGTPinTool::OnKernelComplete(IGtKernelDispatch& dispatcher)
    {
        const IGtProfileBuffer& buffer = *dispatcher.GetProfileBuffer();

        // Read all records stored in the profile buffer, and accumulate their data
        // in _totalCycleCount and _totalExeCount
        GtProfileArray& profileArray = GetProfileArray(dispatcher.Kernel().Id());
        for (uint32_t tBucket = 0; tBucket < profileArray.NumThreadBuckets(); ++tBucket)
        {
            MyRecord record;
            profileArray.Read(buffer, &record, 0, 1, tBucket);
            _totalCycleCount += record.cycleCount;
            _totalExeCount   += record.exeCount;
        }
    }
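The accumulation step can be tried without a GPU. In the sketch below, a `std::vector` stands in for the records a tool would read from the profile buffer (one per thread bucket), and the hypothetical `Totals` structure plays the role of the 64-bit counters kept by the tool. It illustrates why the totals are wider than the 32-bit record fields: the sums can exceed 32 bits even when each record does not.

```cpp
#include <cstdint>
#include <vector>

// Same layout as the MyRecord structure used by the sample tool
struct MyRecord
{
    uint32_t cycleCount;
    uint32_t exeCount;
};

// Hypothetical host-side accumulator for records read after each dispatch
struct Totals
{
    uint64_t cycles     = 0;
    uint64_t executions = 0;

    void Accumulate(const std::vector<MyRecord>& records)
    {
        for (const MyRecord& r : records)
        {
            cycles     += r.cycleCount; // 32-bit value widened to 64 bits before adding
            executions += r.exeCount;
        }
    }
};
```

Calling `Accumulate` once per completed dispatch keeps running totals across the whole application run.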
This is a completely optional phase. If the tool processes (dumps) profiling results at the Kernel Completion phase, the post-processing phase can be omitted.
Alternatively, the tool developer may decide to handle all profiles collected by all kernel dispatches at the very end, just before the application exits. Usually, such a tool registers its termination handler within the GTPin_Entry() function as shown in the Initialization section.
The purpose of the termination handler is to post-process the profile data, which is a completely tool-specific task. Note, however, that the only GTPin interface available in the termination handler is IGtCore.
Below is an example of the termination handler that dumps the profiling results:
    void MyGTPinTool::OnFini()
    {
        MyGTPinTool& me = *Instance();
        std::cout << std::endl << "MyGTPinTool results:" << std::endl;
        std::cout << "totalCycleCount = " << me._totalCycleCount
                  << ", totalExeCount = " << me._totalExeCount << std::endl;
    }
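The single-instance-plus-atexit pattern used for post-processing can be sketched in standard C++ alone. The class `SketchTool` below is hypothetical and does not use any GTPin interface; it only shows one way to make sure the instance is constructed before the handler is registered, so that the instance is still alive when the handler runs at exit.

```cpp
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a tool object that accumulates totals
class SketchTool
{
public:
    // Single instance, constructed on first use
    static SketchTool& Instance()
    {
        static SketchTool tool;
        return tool;
    }

    void Add(uint64_t cycles, uint64_t executions)
    {
        _totalCycleCount += cycles;
        _totalExeCount   += executions;
    }

    // Build the report text; kept separate from printing so it can be checked
    std::string Report() const
    {
        std::ostringstream os;
        os << "totalCycleCount = " << _totalCycleCount
           << ", totalExeCount = " << _totalExeCount;
        return os.str();
    }

    // Termination handler: runs during normal process exit
    static void OnFini() { std::cout << Instance().Report() << std::endl; }

    // Register OnFini with the C++ runtime; std::atexit returns 0 on success
    static bool RegisterFini()
    {
        Instance(); // construct the instance first so it outlives the handler
        return std::atexit(&SketchTool::OnFini) == 0;
    }

private:
    uint64_t _totalCycleCount = 0;
    uint64_t _totalExeCount   = 0;
};
```

Static objects and atexit handlers are processed in reverse order of construction and registration, which is why `RegisterFini` touches the instance before calling `std::atexit`.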
HLIF is an integral part of the GTPin API. It extends the capabilities of the low-level interface by enabling complex and portable instrumentation procedures written in a high-level language such as OpenCL C.
Historically, the GTPin API was designed to support low-intrusion performance instruments through a low-level, close-to-assembly API called LLIF. This interface is well suited to tools that measure, for example, the latency of code fragments, functions, and kernels.
However, using this API to create more complex and portable analyses is challenging. Besides the natural limitation of an inlined instrumentation procedure (a single basic block), LLIF is not easy to use: tool writers are assumed to have some knowledge of the GEN ISA and of the limitations of the GEN architecture.
HLIF was later added to the GTPin API to serve tools that are willing to trade instrumentation overhead for an easier and more portable interface.
HLIF overcomes the LLIF limitations by allowing analysis procedures to be written in a high-level, portable language. It does not assume GEN ISA knowledge, but it may incur higher overhead, so it may be unsuitable for low-intrusion performance tools.
HLIF follows the same GTPin communication protocol as LLIF. It is available in the same IGtTool::OnKernelBuild, IGtTool::OnKernelRun, and IGtTool::OnKernelComplete callbacks, as described in the following sections.
Below is a simple, fully functional example of a tool that instruments GPU workloads using HLIF API.
This example demonstrates some important HLIF features provided by the GtHliFunction, IGtHliLibrary, IGtMemoryMapper classes, as well as the extended IGtKernelInstrument interface.
The tool is naturally divided into two parts:
File my_hlif_tool.cpp. The C++ code of the tool running on CPU
    class MyHlifTool : public IGtTool
    {
        GtHliFunction<uint64_t, uint64_t*> _atomicIncFunc; // Instrumentation function: uint64_t AtomicInc(uint64_t* ptr);
        IGtHliModuleHandle                 _hliModule;     // Handle to the module that contains AtomicInc function
        map<GtKernelId, vector<uint64_t>>  _bblCounters;   // BBL counters per kernel

    public:
        // Constructor
        MyHlifTool() : _atomicIncFunc("AtomicInc")
        {
            // Register this tool with the GTPin core
            GTPin_GetCore()->RegisterTool(*this);

            // Compile and load module that contains AtomicInc function
            _hliModule = GTPin_GetCore()->HliLibrary().CompileModuleFromFile("hli_samples.cl");
        }

        // Callback that instruments original code of the specified kernel
        void OnKernelBuild(IGtKernelInstrument& instrumentor)
        {
            // Zero-initialize BBL counters for this kernel
            auto& counters = _bblCounters.emplace(instrumentor.Kernel().Id(), instrumentor.Cfg().NumBbls()).first->second;

            // Share BBL counters between host and device memory
            instrumentor.MemoryMapper().Map(counters);

            // Link the kernel with the module that contains AtomicInc function
            instrumentor.LinkHliModule(_hliModule);

            // Insert call to AtomicInc() at each BBL entry
            for (auto bblPtr : instrumentor.Cfg().Bbls())
            {
                _atomicIncFunc.InsertCallAtBbl(instrumentor, *bblPtr, GtIpoint::Before(),
                                               NullReg(),                         // Unused return value
                                               counters.data() + bblPtr->Id());  // arg[0]: Counter to be incremented
            }
        }

        // Callback that initializes profile data (BBL counters) before dispatching the specified kernel for execution
        void OnKernelRun(IGtKernelDispatch& dispatcher)
        {
            auto& counters = _bblCounters.at(dispatcher.Kernel().Id());
            counters.assign(counters.size(), 0);
            dispatcher.SetProfilingMode(true);
        }

        // Callback that processes (prints out) profile data generated by the specified kernel dispatch
        void OnKernelComplete(IGtKernelDispatch& dispatcher)
        {
            cout << endl << string(80, '-') << endl
                 << "Kernel: " << dispatcher.Kernel().Name() << "." << dispatcher.DispatchId() << endl
                 << string(80, '-') << endl;

            auto& counters = _bblCounters.at(dispatcher.Kernel().Id());
            for (uint32_t bblId = 0; bblId != counters.size(); ++bblId)
            {
                cout << "BBL: " << bblId << " Counter: " << counters[bblId] << endl;
            }
        }
    };

    EXPORT_C_FUNC void GTPin_Entry(int argc, const char *argv[])
    {
        static MyHlifTool myTool;
    }
File hli_samples.cl. The OpenCL code of the tool running on GPU
    // All HLI functions must comply with the IGC stack-call ABI
    #define IGC_STACK_CALL __attribute((annotate("igc-force-stackcall")))

    // Atomically increment value of *ptr. Return old value
    IGC_STACK_CALL ulong AtomicInc(volatile __global ulong *ptr)
    {
        return atom_inc(ptr);
    }
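For readers more familiar with host-side C++ than with OpenCL C, the semantics of the device function above (increment atomically, return the previous value) correspond to `std::atomic::fetch_add`. The helper below is a hypothetical host-side analogue for illustration only; it is not part of GTPin and is not executed on the GPU.

```cpp
#include <atomic>
#include <cstdint>

// Host-side analogue (not GTPin or OpenCL code) of the AtomicInc function:
// atomically add 1 to the counter and return the value it held before.
inline uint64_t AtomicIncHost(std::atomic<uint64_t>& counter)
{
    return counter.fetch_add(1);
}
```

Because the increment is a single atomic read-modify-write, concurrent callers never lose an update, which is the property the per-BBL counters rely on when many HW threads execute the same basic block.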
### HLIF Module Loading and Linking
All modules that contain instrumentation functions must be registered with the HLIF library using IGtHliLibrary interface. The HLIF module is a collection of OpenCL C functions compiled to the [SPIR-V representation](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html).
HLIF Modules can be compiled by an offline compiler, or by calling the corresponding IGtHliLibrary methods.
There are several available offline OpenCL™ compilers that generate SPIR-V representation, e.g.
If an HLIF module is compiled offline, it should be registered with the HLIF library by calling either the IGtHliLibrary::LoadModule or the IGtHliLibrary::AddModule API:
    IGtHliLibrary& hliLib = GTPin_GetCore()->HliLibrary();

    // Load SPIR-V module from the file...
    IGtHliModuleHandle module = hliLib.LoadModule("my_hli_module.spv");

    // ... or add memory-resident content of the SPIR-V module
    uint8_t myHliModuleSpv[] = ...; // Content of the HLIF module in the SPIR-V format
    IGtHliModuleHandle module = hliLib.AddModule(myHliModuleSpv);
Alternatively, the compilation can be performed at runtime, during the tool initialization phase (see Initialization).
This method is even more convenient because the IGtHliLibrary::CompileModule and IGtHliLibrary::CompileModuleFromFile functions not only compile the OpenCL source of HLIF modules, but also load them into the HLIF library:
    IGtHliLibrary& hliLib = GTPin_GetCore()->HliLibrary();

    // Compile and register HLIF module specified by the OpenCL source file...
    IGtHliModuleHandle module = hliLib.CompileModuleFromFile("my_hli_module.cl");

    // ... or compile HLIF module from the memory-resident OpenCL source
    char myHliModuleSource[] = ...; // Source code of the HLI module
    IGtHliModuleHandle module = hliLib.CompileModule(myHliModuleSource);
Once a module is registered with the HLIF library, it can be 'linked' to original kernels, and its functions can be used for instrumenting these kernels.
While HLIF module registration is a one-time operation, usually completed during the tool initialization, linking an HLIF module to a kernel should be performed in the IGtTool::OnKernelBuild callback associated with that kernel:

    void OnKernelBuild(IGtKernelInstrument& instrumentor)
    {
        IGtHliLibrary& hliLib = GTPin_GetCore()->HliLibrary();
        for (uint32_t i = 0; i != hliLib.NumModules(); ++i)
        {
            instrumentor.LinkHliModule(hliLib.GetModule(i));
        }
    }
### HLIF Instrumentation Function and Arguments
Inserting an instrumentation function call into the original code of the kernel involves three steps, detailed below. As shown later in this section, all of these steps can be performed by a single GtHliFunction API.
Example of the HLIF instrumentation call, not using the GtHliFunction template
    MyType myVar; // Variable shared between CPU and GPU (see HLIF Memory Mapping)

    // Insert a call to the HLIF function:
    // void MyFunc(MyType* myData, uint32_t threadId, uint8_t* grfRange);
    void OnKernelBuild(IGtKernelInstrument& instrumentor)
    {
        // Create a list of the MyFunc() arguments
        IGtIargFactory& factory = instrumentor.IargFactory();
        const IGtIarg* args[] = {
            &factory.MakeHostPtr(&myVar),  // arg0: Pointer to 'myVar', mapped to the GPU device memory
            &factory.MakeTid(),            // arg1: Current thread ID
            &factory.MakeGrfRange(0, 10)   // arg2: Pointer to array of [r0, r9] register values
        };

        // Generate a procedure that assigns arguments to MyFunc parameters and calls the function
        GtGenProcedure proc;
        instrumentor.Coder().GenerateHliCall(proc, "MyFunc", nullptr, args);

        // Insert call to MyFunc() at each BBL entry
        for (auto bblPtr : instrumentor.Cfg().Bbls())
        {
            instrumentor.InstrumentBbl(*bblPtr, GT_IPOINT_BEFORE, proc);
        }
    }

The GtHliFunction template class provides a more compact and convenient interface for generating HLIF function calls.
The template arguments of this class mirror the signature of the function as declared in its source code, e.g., in the OpenCL C function declaration.
Example of the HLIF instrumentation call using the GtHliFunction template
    MyType myVar; // Variable shared between CPU and GPU (see HLIF Memory Mapping)

    // Descriptor of the function: void MyFunc(MyType* myData, uint32_t threadId, uint8_t* grfRange);
    GtHliFunction<void, MyType*, uint32_t, uint8_t*> myFunc { "MyFunc" };

    void OnKernelBuild(IGtKernelInstrument& instrumentor)
    {
        // Insert call to MyFunc() at each BBL entry
        for (auto bblPtr : instrumentor.Cfg().Bbls())
        {
            myFunc.InsertCallAtBbl(instrumentor, *bblPtr, GT_IPOINT_BEFORE,
                NullReg(),           // retVal: void/unused return value
                &myVar,              // arg0: Pointer to 'myVar', mapped to the GPU device memory
                IargTid(),           // arg1: Current thread ID
                IargGrfRange(0, 10)  // arg2: Pointer to array of [r0, r9] register values
            );
        }
    }
The GtHliFunction interface defines various methods that insert a call to the HLI function at the specified location, e.g. GtHliFunction::InsertCallAtInstruction, GtHliFunction::InsertCallAtBbl, etc.
These methods accept a variadic list of the function arguments assigned to the function parameters after the corresponding conversion implemented by GTPin.
Below are currently defined types of HLI function arguments. This is an open-ended list which is supposed to grow along with the growing base of HLIF tools.
### HLIF Memory Mapping
The host-to-device memory mapping is a GTPin service that allows tools to access shared data through the host pointers.
Depending on the sharing method, GTPin synchronizes content of mapped memory regions before kernel run, and/or after the kernel completion.
This mechanism can be used in both LLIF and HLIF tools, though it is especially useful in mapping arguments of HLIF functions.
These functions are executed by a GPU device, so all pointer parameters of the function should refer to objects in the profile buffer. However, for tools that define instrumentation functions in the host (CPU) space, addresses in the device space are unknown.
The host-to-device memory mapping interface IGtMemoryMapper and helper class GtMapped allow tools to specify host pointers in function arguments, and instruct GTPin to copy the referenced objects to/from the profile buffer in the device address space.
For a detailed description of the memory mapping interface, see GTPin: Host-to-Device Memory Mapper.
Example of an HLIF function's argument specified by the host pointer
    uint64_t counter; // A counter accessed by CPU and GPU

    void OnKernelBuild(IGtKernelInstrument& instrumentor)
    {
        // Share the counter between host and device memory
        instrumentor.MemoryMapper().Map(counter, GT_MMAP_SHARE);

        // Use counter as an HLIF function argument
        _atomicIncFunc.InsertCallAtKernelEntries(instrumentor, NullReg(), &counter);
    }
The IGtMemoryMapper::Map interface supports the following methods of memory mapping and sharing:
The IGtMemoryMapper::Map interface can be used to map raw byte ranges, as well as typed objects and arrays of objects.
Additionally, the tool can use variadic templates GtMakeShared, GtMakeSharedConst and GtMakeSharedRet to share collections of objects:
    auto mySharedVars      = GtMakeShared(myVar1, myVar2);         // Declare myVar... variables as host<->device synchronized
    auto mySharedConstVars = GtMakeSharedConst(myCvar1, myCvar2);  // Declare myCvar... variables as host->device synchronized
    auto mySharedRetVars   = GtMakeSharedRet(myRvar1, myRvar2);    // Declare myRvar... variables as host<-device synchronized

    void OnKernelBuild(IGtKernelInstrument& instrumentor)
    {
        auto& mapper = instrumentor.MemoryMapper();
        mySharedVars.Map(mapper);      // Share myVar... variables using GT_MMAP_SHARE method
        mySharedConstVars.Map(mapper); // Share myCvar... variables using GT_MMAP_SHARE_CONST method
        mySharedRetVars.Map(mapper);   // Share myRvar... variables using GT_MMAP_SHARE_RET method
    }
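The three synchronization directions named in the comments above (host<->device, host->device, host<-device) can be modeled in a few lines of plain C++. This is only a sketch of the idea, under the assumption that synchronization is a copy before the kernel runs and a copy after it completes; the types `ShareMode` and `MappedRegion` are hypothetical, a `std::vector` stands in for device memory, and nothing here reflects how GTPin implements the mapper internally.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sharing modes, named after the methods mentioned in the text
enum class ShareMode { Share, ShareConst, ShareRet };

// Hypothetical record of one mapped host region and its device-side copy
struct MappedRegion
{
    void*                hostPtr;
    ShareMode            mode;
    std::vector<uint8_t> deviceCopy; // stand-in for device memory

    MappedRegion(void* ptr, std::size_t size, ShareMode m)
        : hostPtr(ptr), mode(m), deviceCopy(size, 0) {}

    // Host -> device copy, done before the kernel runs (skipped for ShareRet)
    void SyncBeforeRun()
    {
        if (mode != ShareMode::ShareRet)
            std::memcpy(deviceCopy.data(), hostPtr, deviceCopy.size());
    }

    // Device -> host copy, done after the kernel completes (skipped for ShareConst)
    void SyncAfterComplete()
    {
        if (mode != ShareMode::ShareConst)
            std::memcpy(hostPtr, deviceCopy.data(), deviceCopy.size());
    }
};
```

In this model a constant input costs one copy per dispatch, a result-only variable costs one copy back, and a fully shared variable costs both, which is a reasonable way to think about choosing the narrowest sharing method that fits.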
The purpose of the event handling mechanism is to inform tools about errors, warnings, and other significant events detected by GTPin.
There are various types of events, and each event type is accompanied by its corresponding details. The IGtEvent is the abstract representation of the GTPin event. Concrete implementations of IGtEvents can be found in the GTPin: Event Handling section.
Events can be monitored and handled using the IGtEventHandler, which defines the event handler interface. A tool may register its own handler using the IGtCore::RegisterEventHandler method, and it will be notified once an event occurs.
The event handler returns an EVENT_REACTION value, which instructs GTPin on the next steps, such as continuing, retrying, or terminating the GTPin profiling process.
In addition, even if a tool does not provide its own handler, the IGtCore::LastError() method allows the tool to analyze the last error detected in the current thread. This feature can be used to obtain error details, when a GTPin API indicates a failure.
Below is an example of an event handler:
    class MyEventHandler : public IGtEventHandler
    {
        GtEventReaction HandleEvent(const IGtEvent& event)
        {
            std::cout << event.ToString() << std::endl;
            return EVENT_REACTION_DEFAULT;
        }

        GtEventReaction OnEvent(const IGtEvent& event)         { return HandleEvent(event); }
        GtEventReaction OnEvent(const IGtKernelEvent& event)   { return HandleEvent(event); }
        GtEventReaction OnEvent(const IGtInternalEvent& event) { return HandleEvent(event); }
    };

The complete code of the sample tool demonstrated in the sections above can be found in the Profilers/Examples/my_gtpin_tool.cpp file.
All the header files required to build a GTPin tool are located in the Profilers/Include/api directory.
The file gtpin_api.h is a wrapper that includes all the required headers.
Optionally, the tool may include headers located in the Profilers/Examples/utils directory. These headers define interfaces to the gtpintool_utils library whose primary purpose is providing utilities for the sample tools distributed within the GTPin kit.
Besides the GTPin configuration arguments described in GTPin Configuration, the tool developer can define additional arguments to customize the tool-specific functionality.
There are three types of configuration arguments (Knobs) a tool can define: int, bool, and string. The Knob definition includes the argument name (string value), its initial (default) value, and an optional help line (string value). The following example shows how to define arguments of different types:
    Knob<int>    myIntKnob("intArg", 10, "My int argument help line");
    Knob<bool>   myBoolKnob("boolArg", false, "My bool argument help line");
    Knob<string> myStringKnob("stringArg", "assa", "My string argument help line");
Once a new Knob is defined, it is automatically added to the list of all configuration arguments; no additional action is required. In the tool's code, configuration arguments are used just like regular variables of the corresponding type:
std::cout << "intArg = " << myIntKnob << " boolArg = " << myBoolKnob << " stringArg = " << myStringKnob;
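One common way to obtain both properties described above (automatic registration on definition, and use "like a regular variable") is a constructor that adds the object to a global list plus an implicit conversion operator. The sketch below shows that pattern in plain C++. The classes `SketchKnobBase` and `SketchKnob` are hypothetical and are not the GTPin `Knob` implementation, whose internals are not described here.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class: every knob adds itself to one global list.
// Knobs are assumed to be globals that live for the whole program.
class SketchKnobBase
{
public:
    explicit SketchKnobBase(std::string name) : _name(std::move(name))
    {
        Registry().push_back(this);
    }
    virtual ~SketchKnobBase() = default;

    const std::string& Name() const { return _name; }

    // A function-local static avoids initialization-order problems between
    // knobs defined in different translation units
    static std::vector<SketchKnobBase*>& Registry()
    {
        static std::vector<SketchKnobBase*> knobs;
        return knobs;
    }

private:
    std::string _name;
};

// Hypothetical typed knob: holds a value and converts implicitly to T
template <typename T>
class SketchKnob : public SketchKnobBase
{
public:
    SketchKnob(std::string name, T defaultValue, std::string help = "")
        : SketchKnobBase(std::move(name)), _value(std::move(defaultValue)), _help(std::move(help)) {}

    operator const T&() const { return _value; } // read like a plain variable
    void Set(T value) { _value = std::move(value); }
    const std::string& Help() const { return _help; }

private:
    T           _value;
    std::string _help;
};
```

A command-line parser can then walk `SketchKnobBase::Registry()` to find every defined argument without the tool listing them anywhere else.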
The tool arguments are specified in the command line the same way as the GTPin core arguments:
Profilers/Bin/gtpin -t toolname [GTPin arguments] [tool arguments] -- app [application arguments]
For example, the command:
Profilers/Bin/gtpin -t toolname --intArg 5 --boolArg --stringArg "bassa" -- app
produces the following output:
intArg = 5 boolArg = true stringArg = bassa
Another command that runs the same tool with the default configuration:
Profilers/Bin/gtpin -t toolname -- app
produces a different output:
intArg = 10 boolArg = false stringArg = assa
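The two runs above differ only in which arguments appear on the command line. The following sketch mimics that behavior with a tiny stand-alone parser: int and string arguments take the next token as their value, while a bool argument takes no value and flips its default when named, as `--boolArg` does in the first command. The `SketchOptions` structure and `ParseArgs` function are hypothetical and are not the parser shipped with GTPin.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Hypothetical option table: name -> current value, pre-filled with defaults
struct SketchOptions
{
    std::map<std::string, int>         ints;
    std::map<std::string, bool>        bools;
    std::map<std::string, std::string> strings;
};

// Parse "--name [value]" tokens. A bool option takes no value; naming it
// flips its default. Unknown tokens are ignored in this sketch.
inline void ParseArgs(const std::vector<std::string>& args, SketchOptions& opts)
{
    for (std::size_t i = 0; i < args.size(); ++i)
    {
        if (args[i].compare(0, 2, "--") != 0) continue; // not an option token
        const std::string name = args[i].substr(2);

        if (opts.bools.count(name))
        {
            opts.bools[name] = !opts.bools[name];
        }
        else if (i + 1 < args.size() && opts.ints.count(name))
        {
            opts.ints[name] = std::atoi(args[++i].c_str());
        }
        else if (i + 1 < args.size() && opts.strings.count(name))
        {
            opts.strings[name] = args[++i];
        }
    }
}
```

Filling the table with the defaults 10, false, and "assa" and parsing `--intArg 5 --boolArg --stringArg bassa` leaves it holding 5, true, and "bassa"; parsing an empty list leaves the defaults untouched.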
If the tool uses the Profilers/Examples/utils/knob_parser.h interface to parse the command line, it should define its Knob arguments before calling ConfigureGTPin().
Specifying a bool argument in the command line inverts its default value.
GTPin sample tools are provided in the Profilers/Examples folder of the package. Users who implement their own profiling tools can take an existing sample tool as a basis and extend or modify its code. To build the modified sample, follow these steps:
Step 1: Add new tool - optional:
Within the 'CMakeLists.txt' file, add your tool's name to the "Examples" lists:
> set ( EXAMPLES funtime ... <YOUR_TOOL_NAME>)
Add your tool to the "add_library" section:
> add_library( <TOOL_NAME> SHARED <TOOL_NAME>.cpp )
Step 2: Generate the make files:
>cd Profilers/Examples
>mkdir build
>cd build
>cmake .. -G <cmakeGenerator> -DARCH=<ia32|intel64> -DGTPIN_KIT=<gtpinKitPath> # On Windows
# or
>cmake .. -DCMAKE_BUILD_TYPE=<Release|Debug> -DARCH=<ia32|intel64> -DGTPIN_KIT=<gtpinKitPath> # On Linux
where
Step 3: Build the tool using make files generated in the previous step:
>cmake --build . --config <Release|Debug> --target install # On Windows
# or
>make install # On Linux
The following command runs the application under GTPin and profiles GPU kernels with the tool toolname:
Profilers/Bin/gtpin -t toolname [GTPin arguments] -- app [application arguments]
where
gtpin --help command
Copyright (C) 2013-2025 Intel Corporation
SPDX-License-Identifier: MIT