==> Building on aurorus
==> Checking for remote environment...
==> Syncing package to remote host...
sending incremental file list
created directory packages/composable-kernel
./
.SRCINFO
            670 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=3/5)
.nvchecker.toml
            128 100%  125.00kB/s    0:00:00 (xfr#2, to-chk=2/5)
PKGBUILD
          1,567 100%    1.49MB/s    0:00:00 (xfr#3, to-chk=1/5)
composable-kernel-6.4.0-1.log
            609 100%  594.73kB/s    0:00:00 (xfr#4, to-chk=0/5)

sent 1,953 bytes  received 144 bytes  4,194.00 bytes/sec
total size is 2,559  speedup is 1.22
==> Patching arch to riscv64...
==> Running pkgctl build --arch riscv64 --repo extra on remote host...
==> WARNING: unsupported architecture: riscv64
==> Building composable-kernel
  -> repo: extra
  -> arch: riscv64
  -> worker: felix-0
==> Building composable-kernel for [extra] (riscv64)
:: Synchronizing package databases...
 core downloading...
 extra downloading...
:: Starting full system upgrade...
 there is nothing to do
==> Building in chroot for [extra] (riscv64)...
==> Synchronizing chroot copy [/var/lib/archbuild/extra-riscv64/root] -> [felix-0]...done
==> Making package: composable-kernel 6.4.0-1 (Tue May 20 14:23:16 2025)
==> Retrieving sources...
  -> Downloading composable-kernel-6.4.0.tar.gz...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4270k    0 4270k    0     0   865k      0 --:--:--  0:00:04 --:--:-- 1279k
==> Validating source files with sha256sums...
    composable-kernel-6.4.0.tar.gz ... Passed
==> Making package: composable-kernel 6.4.0-1 (Tue May 20 14:23:50 2025)
==> Checking runtime dependencies...
==> Installing missing dependencies...
resolving dependencies...
looking for conflicting packages...
warning: dependency cycle detected:
warning: libglvnd will be installed before its mesa dependency

Package (34)                  New Version       Net Change

extra/comgr                   6.4.0-4           168.66 MiB
extra/default-cursors         3-1                 0.00 MiB
extra/fmt                     11.2.0-1            0.67 MiB
extra/gflags                  2.2.2-5             5.39 MiB
extra/google-glog             0.7.1-1             0.34 MiB
extra/hsa-rocr                6.4.0-1             3.53 MiB
extra/libdrm                  2.4.124-1           1.18 MiB
core/libedit                  20250104_3.1-1      0.25 MiB
extra/libglvnd                1.7.0-1             3.72 MiB
extra/libpciaccess            0.18.1-2            0.05 MiB
extra/libx11                  1.8.12-1            9.73 MiB
extra/libxau                  1.0.12-1            0.02 MiB
extra/libxcb                  1.17.0-1            3.69 MiB
extra/libxdmcp                1.1.5-1             0.13 MiB
extra/libxext                 1.3.6-1             0.29 MiB
extra/libxshmfence            1.3.3-1             0.01 MiB
extra/libxxf86vm              1.1.6-1             0.03 MiB
extra/llvm-libs               19.1.7-2          126.12 MiB
extra/lm_sensors              1:3.6.2-1           0.43 MiB
extra/mesa                    1:25.0.5-1.1       75.02 MiB
core/mpdecimal                4.0.1-1             0.31 MiB
extra/numactl                 2.0.19-1            0.20 MiB
core/pciutils                 3.13.0-2            0.34 MiB
core/python                   3.13.3-1          108.92 MiB
extra/rocm-device-libs        6.4.0-4             3.19 MiB
extra/rocm-llvm               6.4.0-4          7913.67 MiB
extra/rocminfo                6.4.0-1             0.06 MiB
extra/rocprofiler-register    6.4.0-1             0.28 MiB
extra/spirv-tools             1:1.4.313.0-1       6.44 MiB
extra/wayland                 1.23.1-2            0.79 MiB
extra/xcb-proto               1.17.0-3            1.02 MiB
extra/xorgproto               2024.1-2            1.46 MiB
extra/hip-runtime-amd         6.4.0-1             8.92 MiB
extra/rocm-core               6.4.0-1             0.04 MiB

Total Installed Size: 8444.92 MiB

:: Proceed with installation? [Y/n]
checking keyring...
checking package integrity...
loading package files...
checking for file conflicts...
:: Processing package changes...
installing rocm-core...
installing numactl...
installing libpciaccess...
installing libdrm...
Optional dependencies for libdrm
    cairo: needed for modetest tool
installing xcb-proto...
installing xorgproto...
installing libxdmcp...
installing libxau...
installing libxcb...
installing libx11...
installing libxext...
installing libglvnd...
installing libxshmfence...
installing libxxf86vm...
installing libedit...
installing llvm-libs...
installing lm_sensors...
Optional dependencies for lm_sensors
    rrdtool: for logging with sensord
    perl: for sensor detection and configuration convert [installed]
installing spirv-tools...
installing default-cursors...
Optional dependencies for default-cursors
    adwaita-cursors: default cursor theme
installing wayland...
installing mesa...
Optional dependencies for mesa
    opengl-man-pages: for the OpenGL API man pages
installing rocm-llvm...
installing rocm-device-libs...
installing comgr...
installing pciutils...
Optional dependencies for pciutils
    which: for update-pciids [installed]
    grep: for update-pciids [installed]
    curl: for update-pciids [installed]
installing mpdecimal...
installing python...
Optional dependencies for python
    python-setuptools: for building Python packages using tooling that is usually bundled with Python
    python-pip: for installing Python packages using tooling that is usually bundled with Python
    python-pipx: for installing Python software not packaged on Arch Linux
    sqlite: for a default database integration [installed]
    xz: for lzma [installed]
    tk: for tkinter
installing hsa-rocr...
installing rocminfo...
installing fmt...
installing gflags...
installing google-glog...
installing rocprofiler-register...
installing hip-runtime-amd...
Optional dependencies for hip-runtime-amd
    inetutils: Print hostname in hipconfig
:: Running post-transaction hooks...
(1/2) Reloading system manager configuration...
  Skipped: Current root is not booted.
(2/2) Arming ConditionNeedsUpdate...
==> Checking buildtime dependencies...
==> Installing missing dependencies...
resolving dependencies...
looking for conflicting packages...
Package (14)                  New Version       Net Change

extra/cppdap                  1.58.0-2            1.48 MiB
extra/hicolor-icon-theme      0.18-1              0.05 MiB
extra/jsoncpp                 1.9.6-3             3.16 MiB
extra/libuv                   1.51.0-1            0.60 MiB
extra/perl-error              0.17030-1           0.04 MiB
extra/perl-mailtools          2.22-1              0.10 MiB
extra/perl-timedate           2.33-7              0.08 MiB
extra/rhash                   1.4.4-1             0.31 MiB
extra/zlib-ng                 2.2.4-1             0.21 MiB
extra/cmake                   4.0.2-1            71.25 MiB
extra/git                     2.49.0-2           27.48 MiB
extra/ninja                   1.12.1-2            0.31 MiB
extra/openmp                  19.1.7-1            1.83 MiB
extra/rocm-cmake              6.4.0-2             0.12 MiB

Total Installed Size: 107.03 MiB

:: Proceed with installation? [Y/n]
checking keyring...
checking package integrity...
loading package files...
checking for file conflicts...
:: Processing package changes...
installing perl-error...
installing perl-timedate...
installing perl-mailtools...
installing zlib-ng...
installing git...
Optional dependencies for git
    git-zsh-completion: upstream zsh completion
    tk: gitk and git gui
    openssh: ssh transport and crypto
    man: show help with `git command --help`
    perl-libwww: git svn
    perl-term-readkey: git svn and interactive.singlekey setting
    perl-io-socket-ssl: git send-email TLS support
    perl-authen-sasl: git send-email TLS support
    perl-mediawiki-api: git mediawiki support
    perl-datetime-format-iso8601: git mediawiki support
    perl-lwp-protocol-https: git mediawiki https support
    perl-cgi: gitweb (web interface) support
    python: git svn & git p4 [installed]
    subversion: git svn
    org.freedesktop.secrets: keyring credential helper
    libsecret: libsecret credential helper [installed]
installing cppdap...
installing hicolor-icon-theme...
installing jsoncpp...
Optional dependencies for jsoncpp
    jsoncpp-doc: documentation
installing libuv...
installing rhash...
installing cmake...
Optional dependencies for cmake
    make: for unix Makefile generator [installed]
    ninja: for ninja generator [pending]
    qt6-base: cmake-gui
installing ninja...
installing rocm-cmake...
installing openmp...
Optional dependencies for openmp
    cuda: offloading to NVIDIA GPUs
    hsa-rocr: offloading to AMD GPUs [installed]
:: Running post-transaction hooks...
(1/4) Creating system user accounts...
Creating group 'git' with GID 971.
Creating user 'git' (git daemon user) with UID 971 and GID 971.
(2/4) Reloading system manager configuration...
  Skipped: Current root is not booted.
(3/4) Arming ConditionNeedsUpdate...
(4/4) Checking for old perl modules...
==> Retrieving sources...
  -> Found composable-kernel-6.4.0.tar.gz
==> WARNING: Skipping all source file integrity checks.
==> Extracting sources...
  -> Extracting composable-kernel-6.4.0.tar.gz with bsdtar
==> Starting prepare()...
==> Starting build()...
-- The CXX compiler identification is Clang 19.0.0
-- The HIP compiler identification is Clang 19.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/bin/hipcc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting HIP compiler ABI info
-- Detecting HIP compiler ABI info - done
-- Check for working HIP compiler: /opt/rocm/lib/llvm/bin/clang++ - skipped
-- Detecting HIP compile features
-- Detecting HIP compile features - done
-- Found Python3: /usr/bin/python3.13 (found suitable version "3.13.3", minimum required is "3.8") found components: Interpreter
-- Found Git: /usr/bin/git (found version "2.49.0")
fatal: not a git repository (or any of the parent directories): .git
GPU_TARGETS=
GPU_ARCHS=
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
hip_version_flat=600443482
checking which targets are supported
-- Performing Test COMPILER_HAS_TARGET_ID_gfx908
-- Performing Test COMPILER_HAS_TARGET_ID_gfx908 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx942
-- Performing Test COMPILER_HAS_TARGET_ID_gfx942 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1030
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1030 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1102
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1102 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1200
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1200 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1201
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1201 - Success
Building CK for the following targets: gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201
Enabling XDL instances
Enabling FP8 gemms on native architectures
Enabling WMMA instances
-- Performing Test HAS_NO_OFFLOAD_UNIFORM_BLOCK
-- Performing Test HAS_NO_OFFLOAD_UNIFORM_BLOCK - Success
Adding the fno-offload-uniform-block compiler flag
-- Performing Test HAS_LSR_DROP_SOLUTION
-- Performing Test HAS_LSR_DROP_SOLUTION - Success
Adding the lsr-drop-solution=1 compiler flag
-- Performing Test HAS_ENABLE_POST_MISCHED
-- Performing Test HAS_ENABLE_POST_MISCHED - Success
Adding the enable-post-misched=0 compiler flag
-- Performing Test check-coerce
-- Performing Test check-coerce - Success
Adding the amdgpu-coerce-illegal-types=1
Adding -amdgpu-early-inline-all=true and -amdgpu-function-calls=false
CMAKE_CXX_COMPILER: /opt/rocm/bin/hipcc
CMAKE_HIP_COMPILER: /opt/rocm/bin/hipcc
OpenMP_CXX_LIB_NAMES: libomp;libgomp;libiomp5
OpenMP_gomp_LIBRARY:
OpenMP_pthread_LIBRARY:
OpenMP_CXX_FLAGS: -fopenmp=libomp -Wno-unused-command-line-argument
-- Build with HIP
-- Clang tidy found: 19.0.0git
-- Clang tidy checks: *,-abseil-*,-android-cloexec-fopen,-cert-msc30-c,-bugprone-exception-escape,-bugprone-macro-parentheses,-cert-env33-c,-cert-msc32-c,-cert-msc50-cpp,-cert-msc51-cpp,-cert-dcl37-c,-cert-dcl51-cpp,-clang-analyzer-alpha.core.CastToStruct,-clang-analyzer-optin.performance.Padding,-clang-diagnostic-deprecated-declarations,-clang-diagnostic-extern-c-compat,-clang-diagnostic-unused-command-line-argument,-cppcoreguidelines-avoid-c-arrays,-cppcoreguidelines-avoid-magic-numbers,-cppcoreguidelines-explicit-virtual-functions,-cppcoreguidelines-init-variables,-cppcoreguidelines-macro-usage,-cppcoreguidelines-non-private-member-variables-in-classes,-cppcoreguidelines-pro-bounds-array-to-pointer-decay,-cppcoreguidelines-pro-bounds-constant-array-index,-cppcoreguidelines-pro-bounds-pointer-arithmetic,-cppcoreguidelines-pro-type-member-init,-cppcoreguidelines-pro-type-reinterpret-cast,-cppcoreguidelines-pro-type-union-access,-cppcoreguidelines-pro-type-vararg,-cppcoreguidelines-special-member-functions,-fuchsia-*,-google-explicit-constructor,-google-readability-braces-around-statements,-google-readability-todo,-google-runtime-int,-google-runtime-references,-hicpp-vararg,-hicpp-braces-around-statements,-hicpp-explicit-conversions,-hicpp-named-parameter,-hicpp-no-array-decay,-hicpp-avoid-c-arrays,-hicpp-signed-bitwise,-hicpp-special-member-functions,-hicpp-uppercase-literal-suffix,-hicpp-use-auto,-hicpp-use-equals-default,-hicpp-use-override,-llvm-header-guard,-llvm-include-order,-llvmlibc-restrict-system-libc-headers,-llvmlibc-callee-namespace,-llvmlibc-implementation-in-namespace,-llvm-else-after-return,-llvm-qualified-auto,-misc-misplaced-const,-misc-non-private-member-variables-in-classes,-misc-no-recursion,-modernize-avoid-bind,-modernize-avoid-c-arrays,-modernize-pass-by-value,-modernize-use-auto,-modernize-use-default-member-init,-modernize-use-equals-default,-modernize-use-trailing-return-type,-modernize-use-transparent-functors,-performance-unnecessary-value-param,-readability-braces-around-statements,-readability-else-after-return,-readability-function-cognitive-complexity,-readability-isolate-declaration,-readability-magic-numbers,-readability-named-parameter,-readability-uppercase-literal-suffix,-readability-convert-member-functions-to-static,-readability-qualified-auto,-readability-redundant-string-init,-bugprone-narrowing-conversions,-cppcoreguidelines-narrowing-conversions,-altera-struct-pack-align,-cppcoreguidelines-prefer-member-initializer
CMAKE_CXX_FLAGS:
adding instance device_avg_pool2d_bwd_instance
add_instance_library device_avg_pool2d_bwd_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/avg_pool2d_bwd
adding instance device_avg_pool3d_bwd_instance
add_instance_library device_avg_pool3d_bwd_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/avg_pool3d_bwd
adding instance device_batched_gemm_instance
add_instance_library device_batched_gemm_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/batched_gemm
adding instance device_batched_gemm_add_relu_gemm_add_instance
add_instance_library device_batched_gemm_add_relu_gemm_add_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/batched_gemm_add_relu_gemm_add
adding instance device_batched_gemm_bias_permute_instance
add_instance_library device_batched_gemm_bias_permute_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/batched_gemm_bias_permute
adding instance device_batched_gemm_gemm_instance
add_instance_library device_batched_gemm_gemm_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/batched_gemm_gemm
Found only dl instances, but DL_KERNELS is not set. Skipping.
skip_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/batched_gemm_multi_d
adding instance device_batched_gemm_reduce_instance
add_instance_library device_batched_gemm_reduce_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/batched_gemm_reduce
adding instance device_batched_gemm_softmax_gemm_instance
add_instance_library device_batched_gemm_softmax_gemm_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm
adding instance device_batched_gemm_softmax_gemm_permute_instance
add_instance_library device_batched_gemm_softmax_gemm_permute_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm_permute
adding instance device_batchnorm_instance
add_instance_library device_batchnorm_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/batchnorm
instance should be built for all types!
adding instance device_column_to_image_instance
add_instance_library device_column_to_image_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/column_to_image
adding instance device_contraction_bilinear_instance
add_instance_library device_contraction_bilinear_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/contraction_bilinear
adding instance device_contraction_scale_instance
add_instance_library device_contraction_scale_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/contraction_scale
adding instance device_conv1d_bwd_data_instance
add_instance_library device_conv1d_bwd_data_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/conv1d_bwd_data
adding instance device_conv2d_bwd_data_instance
removing dl instance device_conv2d_bwd_data_dl_nhwc_kyxc_nhwk_f32_instance.cpp
removing dl instance device_conv2d_bwd_data_dl_nhwc_kyxc_nhwk_f16_instance.cpp
removing dl instance device_conv2d_bwd_data_dl_nhwc_kyxc_nhwk_int8_instance.cpp
add_instance_library device_conv2d_bwd_data_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/conv2d_bwd_data
adding instance device_conv2d_fwd_instance
add_instance_library device_conv2d_fwd_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/conv2d_fwd
adding instance device_conv2d_fwd_bias_relu_instance
add_instance_library device_conv2d_fwd_bias_relu_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/conv2d_fwd_bias_relu
adding instance device_conv2d_fwd_bias_relu_add_instance
add_instance_library device_conv2d_fwd_bias_relu_add_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/conv2d_fwd_bias_relu_add
adding instance device_conv3d_bwd_data_instance
add_instance_library device_conv3d_bwd_data_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/conv3d_bwd_data
instance should be built for all types!
adding instance device_elementwise_instance
add_instance_library device_elementwise_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/elementwise
adding instance device_elementwise_normalization_instance
add_instance_library device_elementwise_normalization_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/elementwise_normalization
adding instance device_gemm_instance
removing dpp instance device_gemm_dpp_f16_f16_f16_km_kn_mn_instance.cpp
removing dpp instance device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp
removing dpp instance device_gemm_dpp_f16_f16_f16_mk_kn_mn_instance.cpp
removing dpp instance device_gemm_dpp_f16_f16_f16_mk_nk_mn_instance.cpp
removing dpp instance device_gemm_dpp_f16_f16_f16_km_kn_mn_irregular_instance.cpp
removing dpp instance device_gemm_dpp_f16_f16_f16_km_nk_mn_irregular_instance.cpp
removing dpp instance device_gemm_dpp_f16_f16_f16_mk_kn_mn_irregular_instance.cpp
removing dpp instance device_gemm_dpp_f16_f16_f16_mk_nk_mn_irregular_instance.cpp
removing dl instance device_gemm_dl_f32_f32_f32_mk_kn_mn_instance.cpp
removing dl instance device_gemm_dl_f32_f32_f32_mk_nk_mn_instance.cpp
removing dl instance device_gemm_dl_f32_f32_f32_km_kn_mn_instance.cpp
removing dl instance device_gemm_dl_f32_f32_f32_km_nk_mn_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_mk_kn_mn_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_mk_kn_mn_irregular_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_mk_nk_mn_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_mk_nk_mn_irregular_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_km_kn_mn_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_km_kn_mn_irregular_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_km_nk_mn_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_km_nk_mn_irregular_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_mk_kn_mn_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_mk_kn_mn_irregular_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_mk_nk_mn_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_mk_nk_mn_irregular_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_km_kn_mn_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_km_kn_mn_irregular_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_km_nk_mn_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_km_nk_mn_irregular_instance.cpp
add_instance_library device_gemm_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm
adding instance device_gemm_ab_scale_instance
add_instance_library device_gemm_ab_scale_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_ab_scale
adding instance device_gemm_add_instance
add_instance_library device_gemm_add_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_add
adding instance device_gemm_add_add_fastgelu_instance
add_instance_library device_gemm_add_add_fastgelu_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_add_add_fastgelu
adding instance device_gemm_add_fastgelu_instance
add_instance_library device_gemm_add_fastgelu_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_add_fastgelu
adding instance device_gemm_add_multiply_instance
add_instance_library device_gemm_add_multiply_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_add_multiply
adding instance device_gemm_add_relu_instance
add_instance_library device_gemm_add_relu_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_add_relu
adding instance device_gemm_add_relu_add_layernorm_instance
add_instance_library device_gemm_add_relu_add_layernorm_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_add_relu_add_layernorm
adding instance device_gemm_add_silu_instance
add_instance_library device_gemm_add_silu_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_add_silu
adding instance device_gemm_b_scale_instance
add_instance_library device_gemm_b_scale_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_b_scale
adding instance device_gemm_bias_add_reduce_instance
add_instance_library device_gemm_bias_add_reduce_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_bias_add_reduce
adding instance device_gemm_bilinear_instance
add_instance_library device_gemm_bilinear_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_bilinear
adding instance device_gemm_fastgelu_instance
add_instance_library device_gemm_fastgelu_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_fastgelu
adding instance device_gemm_multi_abd_instance
add_instance_library device_gemm_multi_abd_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_multi_abd
adding instance device_gemm_multiply_add_instance
add_instance_library device_gemm_multiply_add_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_multiply_add
adding instance device_gemm_multiply_multiply_instance
add_instance_library device_gemm_multiply_multiply_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_multiply_multiply
adding instance device_gemm_reduce_instance
add_instance_library device_gemm_reduce_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_reduce
adding instance device_gemm_splitk_instance
add_instance_library device_gemm_splitk_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_splitk
adding instance device_gemm_streamk_instance
add_instance_library device_gemm_streamk_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_streamk
adding instance device_gemm_universal_instance
add_instance_library device_gemm_universal_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_universal
adding instance device_gemm_universal_batched_instance
add_instance_library device_gemm_universal_batched_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_universal_batched
adding instance device_gemm_universal_reduce_instance
add_instance_library device_gemm_universal_reduce_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_universal_reduce
adding instance device_gemm_universal_streamk_instance
add_instance_library device_gemm_universal_streamk_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm_universal_streamk
adding instance device_grouped_conv1d_bwd_weight_instance
add_instance_library device_grouped_conv1d_bwd_weight_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv1d_bwd_weight
adding instance device_grouped_conv1d_fwd_instance
add_instance_library device_grouped_conv1d_fwd_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd
adding instance device_grouped_conv2d_bwd_data_instance
add_instance_library device_grouped_conv2d_bwd_data_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data
adding instance device_grouped_conv2d_bwd_weight_instance
add_instance_library device_grouped_conv2d_bwd_weight_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_weight
adding instance device_grouped_conv2d_fwd_instance
removing dl instance dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f16_instance.cpp
removing dl instance dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f32_instance.cpp
removing dl instance dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f16_instance.cpp
removing dl instance dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f32_instance.cpp
add_instance_library device_grouped_conv2d_fwd_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd
adding instance device_grouped_conv2d_fwd_dynamic_op_instance
add_instance_library device_grouped_conv2d_fwd_dynamic_op_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd_dynamic_op
adding instance device_grouped_conv3d_bwd_data_instance
add_instance_library device_grouped_conv3d_bwd_data_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_data
adding instance device_grouped_conv3d_bwd_data_bilinear_instance
add_instance_library device_grouped_conv3d_bwd_data_bilinear_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_data_bilinear
adding instance device_grouped_conv3d_bwd_data_scale_instance
add_instance_library device_grouped_conv3d_bwd_data_scale_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_data_scale
adding instance device_grouped_conv3d_bwd_weight_instance
add_instance_library device_grouped_conv3d_bwd_weight_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight
adding instance device_grouped_conv3d_bwd_weight_bilinear_instance
add_instance_library device_grouped_conv3d_bwd_weight_bilinear_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight_bilinear
adding instance device_grouped_conv3d_bwd_weight_scale_instance
add_instance_library device_grouped_conv3d_bwd_weight_scale_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight_scale
adding instance device_grouped_conv3d_fwd_instance
add_instance_library device_grouped_conv3d_fwd_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd
adding instance device_grouped_conv3d_fwd_bilinear_instance
add_instance_library device_grouped_conv3d_fwd_bilinear_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd_bilinear
adding instance device_grouped_conv3d_fwd_convinvscale_instance
add_instance_library device_grouped_conv3d_fwd_convinvscale_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd_convinvscale
adding instance device_grouped_conv3d_fwd_convscale_instance
add_instance_library device_grouped_conv3d_fwd_convscale_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd_convscale
adding instance device_grouped_conv3d_fwd_convscale_add_instance
add_instance_library device_grouped_conv3d_fwd_convscale_add_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd_convscale_add
adding instance device_grouped_conv3d_fwd_convscale_relu_instance
add_instance_library device_grouped_conv3d_fwd_convscale_relu_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd_convscale_relu
adding instance device_grouped_conv3d_fwd_dynamic_op_instance
add_instance_library device_grouped_conv3d_fwd_dynamic_op_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd_dynamic_op
adding instance device_grouped_conv3d_fwd_scale_instance
add_instance_library device_grouped_conv3d_fwd_scale_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd_scale
adding instance device_grouped_conv3d_fwd_scaleadd_ab_instance
add_instance_library device_grouped_conv3d_fwd_scaleadd_ab_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd_scaleadd_ab
adding instance device_grouped_conv3d_fwd_scaleadd_scaleadd_relu_instance
add_instance_library device_grouped_conv3d_fwd_scaleadd_scaleadd_relu_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd_scaleadd_scaleadd_relu
adding instance device_grouped_gemm_instance
add_instance_library device_grouped_gemm_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_gemm
adding instance device_grouped_gemm_bias_instance
add_instance_library device_grouped_gemm_bias_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_gemm_bias
adding instance device_grouped_gemm_fastgelu_instance
add_instance_library device_grouped_gemm_fastgelu_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_gemm_fastgelu
adding instance device_grouped_gemm_fixed_nk_instance
add_instance_library device_grouped_gemm_fixed_nk_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_gemm_fixed_nk
adding instance device_grouped_gemm_fixed_nk_multi_abd_instance
add_instance_library device_grouped_gemm_fixed_nk_multi_abd_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_gemm_fixed_nk_multi_abd
adding instance device_grouped_gemm_tile_loop_instance
add_instance_library device_grouped_gemm_tile_loop_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/grouped_gemm_tile_loop
instance should be built for all types!
adding instance device_image_to_column_instance
add_instance_library device_image_to_column_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/image_to_column
adding instance device_max_pool_bwd_instance
add_instance_library device_max_pool_bwd_instance
add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/max_pool_bwd
instance should be built for all types!
-- Found Python3: /usr/bin/python3.13 (found version "3.13.3") found components: Interpreter Development Development.Module Development.Embed adding instance device_mha_instance add_instance_library device_mha_instance add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/mha adding instance device_normalization_bwd_data_instance add_instance_library device_normalization_bwd_data_instance add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/normalization_bwd_data adding instance device_normalization_bwd_gamma_beta_instance add_instance_library device_normalization_bwd_gamma_beta_instance add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/normalization_bwd_gamma_beta adding instance device_normalization_fwd_instance add_instance_library device_normalization_fwd_instance add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/normalization_fwd adding instance device_permute_scale_instance add_instance_library device_permute_scale_instance add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/permute_scale adding instance device_pool2d_fwd_instance add_instance_library device_pool2d_fwd_instance add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/pool2d_fwd adding instance device_pool3d_fwd_instance add_instance_library device_pool3d_fwd_instance add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/pool3d_fwd adding instance device_quantization_instance removing dl instance conv2d_fwd/device_conv2d_dl_perlayer_quantization_int8_instance.cpp removing dl instance 
conv2d_fwd/device_conv2d_dl_perchannel_quantization_int8_instance.cpp removing dl instance conv2d_fwd/device_conv2d_dl_bias_perlayer_quantization_int8_instance.cpp removing dl instance conv2d_fwd/device_conv2d_dl_bias_perchannel_quantization_int8_instance.cpp removing dl instance gemm/device_gemm_quantization_dl_c_shuffle_i8_i8_i8_km_kn_mn_instance.cpp removing dl instance gemm/device_gemm_quantization_dl_c_shuffle_i8_i8_i8_km_nk_mn_instance.cpp removing dl instance gemm/device_gemm_quantization_dl_c_shuffle_i8_i8_i8_mk_kn_mn_instance.cpp removing dl instance gemm/device_gemm_quantization_dl_c_shuffle_i8_i8_i8_mk_nk_mn_instance.cpp add_instance_library device_quantization_instance add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/quantization adding instance device_reduce_instance add_instance_library device_reduce_instance add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/reduce adding instance device_softmax_instance add_instance_library device_softmax_instance add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/softmax instance should be built for all types! adding instance device_transpose_instance add_instance_library device_transpose_instance add_instance_directory /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/transpose Adding --offload-compress flag for ckProfiler -- Configuring done (582.7s) -- Generating done (257.4s) CMake Warning: Manually-specified variables were not used by the project: INSTANCES_ONLY -- Build files have been written to: /build/composable-kernel/src/build [1/4327] Generating mha kernel (cpp) files now ... 
[2/4327] Building CXX object library/src/utility/CMakeFiles/utility.dir/device_memory.cpp.o [3/4327] Building CXX object library/src/utility/CMakeFiles/utility.dir/convolution_parameter.cpp.o [4/4327] Building CXX object library/src/utility/CMakeFiles/utility.dir/host_tensor.cpp.o [5/4327] Building CXX object library/src/tensor_operation_instance/gpu/avg_pool2d_bwd/CMakeFiles/device_avg_pool2d_bwd_instance.dir/device_avg_pool2d_bwd_nhwc_bf16_instance.cpp.o [6/4327] Building CXX object library/src/tensor_operation_instance/gpu/avg_pool3d_bwd/CMakeFiles/device_avg_pool3d_bwd_instance.dir/device_avg_pool3d_bwd_ndhwc_bf16_instance.cpp.o [7/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_bf16_bf16_bf16_gkm_gnk_gmn_instance.cpp.o [8/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_bf16_bf16_bf16_gmk_gnk_gmn_instance.cpp.o [9/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_knn_instance.cpp.o [10/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_compute_f32_mnnn_instance.cpp.o [11/4327] Building CXX object library/src/tensor_operation_instance/gpu/avg_pool3d_bwd/CMakeFiles/device_avg_pool3d_bwd_instance.dir/device_avg_pool3d_bwd_ndhwc_f16_instance.cpp.o [12/4327] Building CXX object library/src/tensor_operation_instance/gpu/avg_pool2d_bwd/CMakeFiles/device_avg_pool2d_bwd_instance.dir/device_avg_pool2d_bwd_nhwc_f32_instance.cpp.o [13/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_compute_f32_knn_instance.cpp.o [14/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_compute_f32_knnn_instance.cpp.o [15/4327] Building CXX object library/src/tensor_operation_instance/gpu/avg_pool3d_bwd/CMakeFiles/device_avg_pool3d_bwd_instance.dir/device_avg_pool3d_bwd_ndhwc_f32_instance.cpp.o [16/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_mkn_instance.cpp.o [17/4327] Building CXX object library/src/tensor_operation_instance/gpu/avg_pool2d_bwd/CMakeFiles/device_avg_pool2d_bwd_instance.dir/device_avg_pool2d_bwd_nhwc_int8_instance.cpp.o [18/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_compute_f16_kkn_instance.cpp.o [19/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_int8_int8_int8_gmk_gnk_gmn_instance.cpp.o [20/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_compute_bf16_kknn_instance.cpp.o [21/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_gemm/CMakeFiles/device_batched_gemm_gemm_instance.dir/device_batched_gemm_gemm_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp.o [22/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/avg_pool2d_bwd/CMakeFiles/device_avg_pool2d_bwd_instance.dir/device_avg_pool2d_bwd_nhwc_f16_instance.cpp.o [23/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_compute_f32_kknn_instance.cpp.o [24/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_bf16_bf16_bf16_gmk_gkn_gmn_instance.cpp.o [25/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_f64_mknn_instance.cpp.o [26/4327] Building CXX object library/src/tensor_operation_instance/gpu/column_to_image/CMakeFiles/device_column_to_image_instance.dir/device_column_to_image_nwgc_1d_instance.cpp.o [27/4327] Building CXX object library/src/tensor_operation_instance/gpu/avg_pool2d_bwd/CMakeFiles/device_avg_pool2d_bwd_instance.dir/device_avg_pool2d_bwd_nhwc_f8_instance.cpp.o [28/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_compute_f16_mnn_instance.cpp.o [29/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_add_relu_gemm_add/CMakeFiles/device_batched_gemm_add_relu_gemm_add_instance.dir/device_batched_gemm_add_relu_gemm_add_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp.o [30/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_mnn_instance.cpp.o [31/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f16_f16_f16_f16_compute_f32_kknn_instance.cpp.o [32/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f16_f16_f16_gkm_gnk_gmn_instance.cpp.o [33/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_mkn_instance.cpp.o [34/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f16_f16_f16_gmk_gnk_gmn_instance.cpp.o [35/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_f64_mnnn_instance.cpp.o [36/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f16_f16_f16_f16_compute_f32_mnnn_instance.cpp.o [37/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_f32_compute_f16_kknn_instance.cpp.o [38/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_compute_f16_kknn_instance.cpp.o [39/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_compute_f16_mknn_instance.cpp.o [40/4327] Building CXX object library/src/tensor_operation_instance/gpu/column_to_image/CMakeFiles/device_column_to_image_instance.dir/device_column_to_image_gnwc_1d_instance.cpp.o [41/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_reduce/CMakeFiles/device_batched_gemm_reduce_instance.dir/device_batched_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_gkm_gkn_gmn_instance.cpp.o [42/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f16_f16_f16_f16_compute_f32_kknn_instance.cpp.o [43/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_bf16_bf16_bf16_bf16_compute_f32_kknn_instance.cpp.o [44/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f32_f32_f32_gkm_gkn_gmn_instance.cpp.o [45/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_compute_f16_knnn_instance.cpp.o [46/4327] Building CXX object library/src/tensor_operation_instance/gpu/column_to_image/CMakeFiles/device_column_to_image_instance.dir/device_column_to_image_nhwgc_2d_instance.cpp.o [47/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_int8_int8_int8_gkm_gkn_gmn_instance.cpp.o [48/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_bf16_bf16_bf16_bf16_compute_f32_knnn_instance.cpp.o [49/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f16_f16_f16_f16_compute_f32_mknn_instance.cpp.o [50/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_f32_compute_bf16_mnnn_instance.cpp.o [51/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_compute_f32_kkn_instance.cpp.o [52/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_compute_bf16_mnnn_instance.cpp.o [53/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_mnn_instance.cpp.o [54/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_bf16_bf16_bf16_bf16_compute_f32_mknn_instance.cpp.o [55/4327] Building CXX object library/src/tensor_operation_instance/gpu/column_to_image/CMakeFiles/device_column_to_image_instance.dir/device_column_to_image_gndhwc_3d_instance.cpp.o [56/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f32_f32_f32_gmk_gnk_gmn_instance.cpp.o [57/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_f32_compute_bf16_mknn_instance.cpp.o [58/4327] Building CXX object library/src/tensor_operation_instance/gpu/column_to_image/CMakeFiles/device_column_to_image_instance.dir/device_column_to_image_ndhwgc_3d_instance.cpp.o [59/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f16_f16_f16_gmk_gkn_gmn_instance.cpp.o [60/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_mknn_instance.cpp.o [61/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_bf16_bf16_bf16_bf16_compute_f32_mknn_instance.cpp.o [62/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_compute_bf16_kkn_instance.cpp.o [63/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_kknn_instance.cpp.o [64/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_kknn_instance.cpp.o [65/4327] Building CXX 
object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_bf16_bf16_bf16_gkm_gkn_gmn_instance.cpp.o [66/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_mknn_instance.cpp.o [67/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_compute_f32_mknn_instance.cpp.o [68/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_knnn_instance.cpp.o [69/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_mnnn_instance.cpp.o [70/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_f32_compute_bf16_knnn_instance.cpp.o [71/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_knn_instance.cpp.o [72/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_compute_bf16_mknn_instance.cpp.o [73/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/batched_gemm_bias_permute/CMakeFiles/device_batched_gemm_bias_permute_instance.dir/device_batched_gemm_bias_permute_m2_n3_k1_xdl_c_shuffle_f16_f16_f16_f16_instance.cpp.o [74/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_bf16_bf16_bf16_compute_f32_kkn_instance.cpp.o [75/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_mnnn_instance.cpp.o [76/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_knnn_instance.cpp.o [77/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f16_f16_f16_f16_compute_f32_knnn_instance.cpp.o [78/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f32_f32_f32_gmk_gkn_gmn_instance.cpp.o [79/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f16_f16_f16_gkm_gkn_gmn_instance.cpp.o [80/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_f64_kknn_instance.cpp.o [81/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_compute_bf16_knn_instance.cpp.o [82/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_kkn_instance.cpp.o [83/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_gemm/CMakeFiles/device_batched_gemm_gemm_instance.dir/device_batched_gemm_gemm_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gon_gmo_instance.cpp.o [84/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_kkn_instance.cpp.o [85/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_knn_instance.cpp.o [86/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_lds_direct_load_f32_f32_f32_km_kn_mn_instance.cpp.o [87/4327] Building CXX object library/src/tensor_operation_instance/gpu/column_to_image/CMakeFiles/device_column_to_image_instance.dir/device_column_to_image_gnhwc_2d_instance.cpp.o [88/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_compute_f32_kkn_instance.cpp.o [89/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_kkn_instance.cpp.o [90/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_mnn_instance.cpp.o [91/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_compute_f32_mkn_instance.cpp.o [92/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_lds_direct_load_f32_f32_f32_km_nk_mn_instance.cpp.o [93/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f64_f64_f64_km_kn_mn_instance.cpp.o [94/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_compute_f16_knn_instance.cpp.o [95/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_compute_f16_mkn_instance.cpp.o [96/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_bf16_bf16_bf16_compute_f32_mkn_instance.cpp.o [97/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_compute_bf16_knnn_instance.cpp.o [98/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_add_instance.cpp.o [99/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_compute_f32_knn_instance.cpp.o [100/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_compute_f32_mnn_instance.cpp.o [101/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_f64_compute_f32_mknn_instance.cpp.o [102/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f64_f64_f64_mk_nk_mn_instance.cpp.o [103/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_f32_compute_bf16_kknn_instance.cpp.o [104/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f16_f16_f16_compute_f32_mnn_instance.cpp.o [105/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_bf16_bf16_bf16_compute_f32_knn_instance.cpp.o [106/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_add_instance.cpp.o [107/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_irregular_default_pipeline_v2_instance.cpp.o [108/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_lds_direct_load_f32_f32_f32_mk_kn_mn_instance.cpp.o [109/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_f64_compute_f32_mnnn_instance.cpp.o [110/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_irregular_interwave_pipeline_v1_instance.cpp.o [111/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_mkn_instance.cpp.o [112/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_kn_mn_add_instance.cpp.o [113/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_irregular_default_pipeline_v2_instance.cpp.o [114/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_irregular_interwave_pipeline_v1_instance.cpp.o [115/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f32_f32_f32_gkm_gnk_gmn_instance.cpp.o [116/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_nk_mn_add_instance.cpp.o [117/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_lds_direct_load_f32_f32_f32_mk_nk_mn_instance.cpp.o [118/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_compute_f16_knn_instance.cpp.o [119/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_f32_compute_f16_mnnn_instance.cpp.o [120/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_kn_mn_irregular_default_pipeline_v1_instance.cpp.o [121/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_default_pipeline_v2_opt_instance.cpp.o [122/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_kn_mn_irregular_default_pipeline_v2_instance.cpp.o [123/4327] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_infer_f64_instance.cpp.o [124/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_compute_bf16_knn_instance.cpp.o [125/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f32_f32_f32_mk_kn_mn_instance.cpp.o [126/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_reduce/CMakeFiles/device_batched_gemm_reduce_instance.dir/device_batched_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_gkm_gnk_gmn_instance.cpp.o [127/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_mkn_instance.cpp.o [128/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_kn_mn_default_pipeline_v2_opt_instance.cpp.o [129/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_f64_knnn_instance.cpp.o [130/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_f32_compute_f16_mknn_instance.cpp.o [131/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_irregular_default_pipeline_v1_instance.cpp.o [132/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f16_f16_f16_compute_f32_mnn_instance.cpp.o [133/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_compute_f32_mnn_instance.cpp.o [134/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_nk_mn_default_pipeline_v2_opt_instance.cpp.o [135/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_f64_compute_f32_knnn_instance.cpp.o [136/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_backward_f16_instance.cpp.o [137/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_kkn_instance.cpp.o [138/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f16_f16_f16_compute_f32_kkn_instance.cpp.o [139/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_bf16_bf16_bf16_compute_f32_kkn_instance.cpp.o [140/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_bf16_bf16_bf16_bf16_compute_f32_mnnn_instance.cpp.o [141/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_default_pipeline_v2_opt_instance.cpp.o [142/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_compute_bf16_mnn_instance.cpp.o [143/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f32_f32_f32_mk_nk_mn_instance.cpp.o [144/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_f32_mnnn_instance.cpp.o [145/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_irregular_default_pipeline_v1_instance.cpp.o [146/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_kn_mn_irregular_interwave_pipeline_v1_instance.cpp.o [147/4327] Building CXX object library/src/tensor_operation_instance/gpu/elementwise_normalization/CMakeFiles/device_elementwise_normalization_instance.dir/device_elementwise_normalization_f16_instance.cpp.o [148/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_bf16_bf16_bf16_compute_f32_mkn_instance.cpp.o [149/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f16_f16_f16_f16_compute_f32_knnn_instance.cpp.o [150/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f32_f32_f32_km_kn_mn_instance.cpp.o [151/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f64_f64_f64_km_nk_mn_instance.cpp.o [152/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_bf16_bf16_bf16_compute_f32_knn_instance.cpp.o [153/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f64_f64_f64_mk_kn_mn_instance.cpp.o [154/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_f32_mknn_instance.cpp.o [155/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f64_f64_f64_f64_compute_f32_kknn_instance.cpp.o [156/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f32_f32_f32_km_nk_mn_instance.cpp.o [157/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_int8_int8_int8_gmk_gkn_gmn_instance.cpp.o [158/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f16_f16_f16_compute_f32_kkn_instance.cpp.o [159/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_nk_mn_irregular_default_pipeline_v1_instance.cpp.o [160/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_add_relu_gemm_add/CMakeFiles/device_batched_gemm_add_relu_gemm_add_instance.dir/device_batched_gemm_add_relu_gemm_add_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gon_gmo_instance.cpp.o [161/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_bf16_bf16_bf16_compute_f32_mnn_instance.cpp.o [162/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd/CMakeFiles/device_conv2d_fwd_instance.dir/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp.o [163/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm_permute/CMakeFiles/device_batched_gemm_softmax_gemm_permute_instance.dir/device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp.o [164/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f32_f32_f32_mk_kn_mn_instance.cpp.o [165/4327] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_forward_f32_instance.cpp.o [166/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_default_pipeline_v2_instance.cpp.o [167/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_interwave_pipeline_v1_instance.cpp.o [168/4327] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_backward_f64_instance.cpp.o [169/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_nk_mn_irregular_interwave_pipeline_v1_instance.cpp.o [170/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_f32_instance.cpp.o [171/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_bf16_bf16_bf16_bf16_compute_f32_kknn_instance.cpp.o [172/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_bf16_bf16_bf16_bf16_compute_f32_mnnn_instance.cpp.o [173/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_reduce/CMakeFiles/device_batched_gemm_reduce_instance.dir/device_batched_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_gmk_gnk_gmn_instance.cpp.o [174/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_bwd_data/CMakeFiles/device_conv2d_bwd_data_instance.dir/device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_f32_instance.cpp.o [175/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_reduce/CMakeFiles/device_batched_gemm_reduce_instance.dir/device_batched_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_gmk_gkn_gmn_instance.cpp.o [176/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f16_f16_f16_compute_f32_mkn_instance.cpp.o [177/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/2D/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_compute_f16_mnnn_instance.cpp.o [178/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd/CMakeFiles/device_conv2d_fwd_instance.dir/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp.o [179/4327] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_backward_f32_instance.cpp.o [180/4327] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_forward_f64_instance.cpp.o [181/4327] Building CXX object 
library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_nk_mn_default_pipeline_v1_instance.cpp.o [182/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp.o FAILED: library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp.o /opt/rocm/bin/hipcc -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_TIME_KERNEL=1 -DCK_USE_FNUZ_FP8 -DCK_USE_GFX94 -DCK_USE_OCP_FP8 -DCK_USE_WMMA -DCK_USE_XDL -DUSE_PROF_API=1 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/include -I/build/composable-kernel/src/composable_kernel-rocm-6.4.0/include -I/build/composable-kernel/src/build/include -O3 -DNDEBUG -std=c++17 -fPIC -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Werror -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Wno-missing-field-initializers -Wno-deprecated-declarations -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Werror -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-float-conversion -Wno-gnu-anonymous-struct -Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes -Wno-nested-anon-types -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option 
-Wno-unused-command-line-argument -Wno-weak-vtables -Wno-covered-switch-default -Wno-unsafe-buffer-usage -Wno-unused-lambda-capture -Wno-nvcc-compat -Wno-bit-int-extension -Wno-pass-failed -Wno-switch-default -fno-offload-uniform-block -mllvm --lsr-drop-solution=1 -mllvm -enable-post-misched=0 -mllvm -amdgpu-coerce-illegal-types=1 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -fcolor-diagnostics --offload-compress -x hip --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx942 -MD -MT library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp.o -MF library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp.o.d -o library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp.o -c /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. 
Program arguments: /opt/rocm/lib/llvm/bin/clang-19 -cc1 -triple amdgcn-amd-amdhsa -aux-triple riscv64-unknown-linux-gnu -Werror=atomic-alignment -emit-obj -disable-free -clear-ast-before-backend -main-file-name device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp -mrelocation-model pic -pic-level 2 -fhalf-no-semantic-interposition -mframe-pointer=none -fno-rounding-math -mconstructor-aliases -aux-target-cpu generic-rv64 -aux-target-feature +m -aux-target-feature +a -aux-target-feature +f -aux-target-feature +d -aux-target-feature +c -aux-target-feature +zicsr -aux-target-feature +zmmul -aux-target-feature -b -aux-target-feature -e -aux-target-feature -h -aux-target-feature -shcounterenw -aux-target-feature -shgatpa -aux-target-feature -shtvala -aux-target-feature -shvsatpa -aux-target-feature -shvstvala -aux-target-feature -shvstvecd -aux-target-feature -smaia -aux-target-feature -smcdeleg -aux-target-feature -smcsrind -aux-target-feature -smepmp -aux-target-feature -smstateen -aux-target-feature -ssaia -aux-target-feature -ssccfg -aux-target-feature -ssccptr -aux-target-feature -sscofpmf -aux-target-feature -sscounterenw -aux-target-feature -sscsrind -aux-target-feature -ssstateen -aux-target-feature -ssstrict -aux-target-feature -sstc -aux-target-feature -sstvala -aux-target-feature -sstvecd -aux-target-feature -ssu64xl -aux-target-feature -svade -aux-target-feature -svadu -aux-target-feature -svbare -aux-target-feature -svinval -aux-target-feature -svnapot -aux-target-feature -svpbmt -aux-target-feature -v -aux-target-feature -xcvalu -aux-target-feature -xcvbi -aux-target-feature -xcvbitmanip -aux-target-feature -xcvelw -aux-target-feature -xcvmac -aux-target-feature -xcvmem -aux-target-feature -xcvsimd -aux-target-feature -xsfcease -aux-target-feature -xsfvcp -aux-target-feature -xsfvfnrclipxfqf -aux-target-feature -xsfvfwmaccqqq -aux-target-feature -xsfvqmaccdod -aux-target-feature -xsfvqmaccqoq -aux-target-feature -xsifivecdiscarddlone 
-aux-target-feature -xsifivecflushdlone -aux-target-feature -xtheadba -aux-target-feature -xtheadbb -aux-target-feature -xtheadbs -aux-target-feature -xtheadcmo -aux-target-feature -xtheadcondmov -aux-target-feature -xtheadfmemidx -aux-target-feature -xtheadmac -aux-target-feature -xtheadmemidx -aux-target-feature -xtheadmempair -aux-target-feature -xtheadsync -aux-target-feature -xtheadvdot -aux-target-feature -xventanacondops -aux-target-feature -xwchc -aux-target-feature -za128rs -aux-target-feature -za64rs -aux-target-feature -zaamo -aux-target-feature -zabha -aux-target-feature -zacas -aux-target-feature -zalrsc -aux-target-feature -zama16b -aux-target-feature -zawrs -aux-target-feature -zba -aux-target-feature -zbb -aux-target-feature -zbc -aux-target-feature -zbkb -aux-target-feature -zbkc -aux-target-feature -zbkx -aux-target-feature -zbs -aux-target-feature -zca -aux-target-feature -zcb -aux-target-feature -zcd -aux-target-feature -zce -aux-target-feature -zcf -aux-target-feature -zcmop -aux-target-feature -zcmp -aux-target-feature -zcmt -aux-target-feature -zdinx -aux-target-feature -zfa -aux-target-feature -zfbfmin -aux-target-feature -zfh -aux-target-feature -zfhmin -aux-target-feature -zfinx -aux-target-feature -zhinx -aux-target-feature -zhinxmin -aux-target-feature -zic64b -aux-target-feature -zicbom -aux-target-feature -zicbop -aux-target-feature -zicboz -aux-target-feature -ziccamoa -aux-target-feature -ziccif -aux-target-feature -zicclsm -aux-target-feature -ziccrse -aux-target-feature -zicntr -aux-target-feature -zicond -aux-target-feature -zifencei -aux-target-feature -zihintntl -aux-target-feature -zihintpause -aux-target-feature -zihpm -aux-target-feature -zimop -aux-target-feature -zk -aux-target-feature -zkn -aux-target-feature -zknd -aux-target-feature -zkne -aux-target-feature -zknh -aux-target-feature -zkr -aux-target-feature -zks -aux-target-feature -zksed -aux-target-feature -zksh -aux-target-feature -zkt -aux-target-feature -ztso 
-aux-target-feature -zvbb -aux-target-feature -zvbc -aux-target-feature -zve32f -aux-target-feature -zve32x -aux-target-feature -zve64d -aux-target-feature -zve64f -aux-target-feature -zve64x -aux-target-feature -zvfbfmin -aux-target-feature -zvfbfwma -aux-target-feature -zvfh -aux-target-feature -zvfhmin -aux-target-feature -zvkb -aux-target-feature -zvkg -aux-target-feature -zvkn -aux-target-feature -zvknc -aux-target-feature -zvkned -aux-target-feature -zvkng -aux-target-feature -zvknha -aux-target-feature -zvknhb -aux-target-feature -zvks -aux-target-feature -zvksc -aux-target-feature -zvksed -aux-target-feature -zvksg -aux-target-feature -zvksh -aux-target-feature -zvkt -aux-target-feature -zvl1024b -aux-target-feature -zvl128b -aux-target-feature -zvl16384b -aux-target-feature -zvl2048b -aux-target-feature -zvl256b -aux-target-feature -zvl32768b -aux-target-feature -zvl32b -aux-target-feature -zvl4096b -aux-target-feature -zvl512b -aux-target-feature -zvl64b -aux-target-feature -zvl65536b -aux-target-feature -zvl8192b -aux-target-feature -experimental-smmpm -aux-target-feature -experimental-smnpm -aux-target-feature -experimental-ssnpm -aux-target-feature -experimental-sspm -aux-target-feature -experimental-ssqosid -aux-target-feature -experimental-supm -aux-target-feature -experimental-zalasr -aux-target-feature -experimental-zicfilp -aux-target-feature -experimental-zicfiss -aux-target-feature +relax -fcuda-is-device -mllvm -amdgpu-internalize-symbols -fcuda-allow-variadic-functions -fvisibility=hidden -fapply-global-visibility-to-externs -mlink-builtin-bitcode /opt/rocm/amdgcn/bitcode/hip.bc -mlink-builtin-bitcode /opt/rocm/amdgcn/bitcode/ocml.bc -mlink-builtin-bitcode /opt/rocm/amdgcn/bitcode/ockl.bc -mlink-builtin-bitcode /opt/rocm/amdgcn/bitcode/oclc_daz_opt_off.bc -mlink-builtin-bitcode /opt/rocm/amdgcn/bitcode/oclc_unsafe_math_off.bc -mlink-builtin-bitcode /opt/rocm/amdgcn/bitcode/oclc_finite_only_off.bc -mlink-builtin-bitcode 
/opt/rocm/amdgcn/bitcode/oclc_correctly_rounded_sqrt_on.bc -mlink-builtin-bitcode /opt/rocm/amdgcn/bitcode/oclc_wavefrontsize64_on.bc -mlink-builtin-bitcode /opt/rocm/amdgcn/bitcode/oclc_isa_version_908.bc -mlink-builtin-bitcode /opt/rocm/amdgcn/bitcode/oclc_abi_version_600.bc -target-cpu gfx908 -debugger-tuning=gdb -fdebug-compilation-dir=/build/composable-kernel/src/build -resource-dir /opt/rocm/lib/llvm/lib/clang/19 -dependency-file library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp.o.d -MT library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp.o -sys-header-deps -internal-isystem /opt/rocm/lib/llvm/lib/clang/19/include/cuda_wrappers -idirafter /opt/rocm/include -include __clang_hip_runtime_wrapper.h -D CK_ENABLE_BF16 -D CK_ENABLE_BF8 -D CK_ENABLE_FP16 -D CK_ENABLE_FP32 -D CK_ENABLE_FP64 -D CK_ENABLE_FP8 -D CK_ENABLE_INT8 -D CK_TIME_KERNEL=1 -D CK_USE_FNUZ_FP8 -D CK_USE_GFX94 -D CK_USE_OCP_FP8 -D CK_USE_WMMA -D CK_USE_XDL -D USE_PROF_API=1 -D __HIP_PLATFORM_AMD__=1 -D __HIP_PLATFORM_HCC__=1 -I /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/include -I /build/composable-kernel/src/composable_kernel-rocm-6.4.0/include -I /build/composable-kernel/src/build/include -D NDEBUG -internal-isystem /usr/lib/gcc/riscv64-unknown-linux-gnu/15.1.1/../../../../include/c++/15.1.1 -internal-isystem /usr/lib/gcc/riscv64-unknown-linux-gnu/15.1.1/../../../../include/c++/15.1.1/riscv64-unknown-linux-gnu -internal-isystem /usr/lib/gcc/riscv64-unknown-linux-gnu/15.1.1/../../../../include/c++/15.1.1/backward -internal-isystem /usr/lib/gcc/riscv64-unknown-linux-gnu/15.1.1/../../../../include/c++/15.1.1 -internal-isystem /usr/lib/gcc/riscv64-unknown-linux-gnu/15.1.1/../../../../include/c++/15.1.1/riscv64-unknown-linux-gnu -internal-isystem 
/usr/lib/gcc/riscv64-unknown-linux-gnu/15.1.1/../../../../include/c++/15.1.1/backward -internal-isystem /opt/rocm/lib/llvm/lib/clang/19/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/riscv64-unknown-linux-gnu/15.1.1/../../../../riscv64-unknown-linux-gnu/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /opt/rocm/lib/llvm/lib/clang/19/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/riscv64-unknown-linux-gnu/15.1.1/../../../../riscv64-unknown-linux-gnu/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -source-date-epoch 1747750951 -O3 -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Werror -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Wno-missing-field-initializers -Wno-deprecated-declarations -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Werror -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-float-conversion -Wno-gnu-anonymous-struct -Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes -Wno-nested-anon-types -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-weak-vtables -Wno-covered-switch-default -Wno-unsafe-buffer-usage -Wno-unused-lambda-capture -Wno-nvcc-compat -Wno-bit-int-extension -Wno-pass-failed -Wno-switch-default -std=c++17 -fdeprecated-macro -fno-autolink -ferror-limit 19 -fhip-new-launch-api -fno-signed-char 
-fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -fcolor-diagnostics -vectorize-loops -vectorize-slp -mllvm --lsr-drop-solution=1 -mllvm -enable-post-misched=0 -mllvm -amdgpu-coerce-illegal-types=1 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -cuid=cc0211adb95d687 -fcuda-allow-variadic-functions -fno-offload-uniform-block -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance-gfx908-53b619.o -x hip /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp 1. parser at end of file 2. Optimizer 3. Running pass "require,function(invalidate),require,cgscc(devirt<4>(inline,inline,function-attrs,argpromotion,openmp-opt-cgscc,function(amdgpu-promote-kernel-arguments,infer-address-spaces,amdgpu-lower-kernel-attributes,amdgpu-promote-alloca-to-vector),function(sroa,early-cse,speculative-execution,jump-threading,correlated-propagation,simplifycfg,instcombine,aggressive-instcombine,libcalls-shrinkwrap,amdgpu-usenative,amdgpu-simplifylib,tailcallelim,simplifycfg,reassociate,constraint-elimination,loop-mssa(loop-instsimplify,loop-simplifycfg,licm,loop-rotate,licm,simple-loop-unswitch),simplifycfg,instcombine,loop(loop-idiom,indvars,simple-loop-unswitch,loop-deletion,loop-unroll-full),sroa,vector-combine,mldst-motion,gvn<>,sccp,bdce,instcombine,amdgpu-usenative,amdgpu-simplifylib,jump-threading,correlated-propagation,adce,memcpyopt,dse,move-auto-init,loop-mssa(licm),coro-elide,simplifycfg,instcombine,amdgpu-usenative,amdgpu-simplifylib),function-attrs,function(require),coro-split)),function(invalidate),cgscc(devirt<4>())" on module "/build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp" 4. 
Running pass "cgscc(devirt<4>(inline,inline,function-attrs,argpromotion,openmp-opt-cgscc,function(amdgpu-promote-kernel-arguments,infer-address-spaces,amdgpu-lower-kernel-attributes,amdgpu-promote-alloca-to-vector),function(sroa,early-cse,speculative-execution,jump-threading,correlated-propagation,simplifycfg,instcombine,aggressive-instcombine,libcalls-shrinkwrap,amdgpu-usenative,amdgpu-simplifylib,tailcallelim,simplifycfg,reassociate,constraint-elimination,loop-mssa(loop-instsimplify,loop-simplifycfg,licm,loop-rotate,licm,simple-loop-unswitch),simplifycfg,instcombine,loop(loop-idiom,indvars,simple-loop-unswitch,loop-deletion,loop-unroll-full),sroa,vector-combine,mldst-motion,gvn<>,sccp,bdce,instcombine,amdgpu-usenative,amdgpu-simplifylib,jump-threading,correlated-propagation,adce,memcpyopt,dse,move-auto-init,loop-mssa(licm),coro-elide,simplifycfg,instcombine,amdgpu-usenative,amdgpu-simplifylib),function-attrs,function(require),coro-split))" on module "/build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp" 5. 
Running pass "sccp" on function "_ZNK2ck6detail15static_for_implINS_8SequenceIJLi0ELi1ELi2ELi3ELi4ELi5ELi6ELi7EEEEEclIZNS_43GridwiseGemm_k0mk1_k0nk1_mn_xdl_cshuffle_v1INS_13tensor_layout4gemm11ColumnMajorES9_NS8_8RowMajorENS_8f8_ocp_tESB_fSB_SB_NS_16tensor_operation12element_wise11PassThroughESE_SE_LNSC_6device18GemmSpecializationE0ELNS_25InMemoryDataOperationEnumE0ELi1ELi256ELi256ELi128ELi64ELi16ELi16ELi32ELi32ELi4ELi2ENS2_IJLi4ELi64ELi1EEEENS2_IJLi0ELi2ELi1EEEESJ_Li1ELi4ELi16ELb0ELi1ESI_NS2_IJLi1ELi0ELi2EEEESK_Li2ELi16ELi16ELb0ELi1ELi1ELi1ENS2_IJLi1ELi64ELi1ELi4EEEELi16ELNS_13LoopSchedulerE0ELNS_15PipelineVersionE0ESB_SB_E3RunILb1EEEvPKSB_SR_PSB_PvRKNSO_7ProblemEEUlT_E_EEvSX_" #0 0x0000002ab4a3f7b6 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/rocm/lib/llvm/bin/clang-19+0x26d17b6) #1 0x0000002ab4a3cf60 (/opt/rocm/lib/llvm/bin/clang-19+0x26cef60) #2 0x0000003fa04ad800 (linux-vdso.so.1+0x800) #3 0x0000002ab8572ac6 llvm::SCCPSolver::isConstant(llvm::ValueLatticeElement const&) (/opt/rocm/lib/llvm/bin/clang-19+0x6204ac6) #4 0x0000003fc121db70 clang++: error: unable to execute command: Segmentation fault (core dumped) clang++: error: clang frontend command failed due to signal (use -v to see invocation) clang version 19.0.0git (/startdir/rocm-llvm c7fe45cf4b819c5991fe208aaa96edf142730f1d) Target: riscv64-unknown-linux-gnu Thread model: posix InstalledDir: /opt/rocm/lib/llvm/bin Build config: +assertions clang++: note: diagnostic msg: Error generating preprocessed source(s). 
failed to execute:/opt/rocm/lib/llvm/bin/clang++ --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx942 -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_TIME_KERNEL=1 -DCK_USE_FNUZ_FP8 -DCK_USE_GFX94 -DCK_USE_OCP_FP8 -DCK_USE_WMMA -DCK_USE_XDL -DUSE_PROF_API=1 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/include -I/build/composable-kernel/src/composable_kernel-rocm-6.4.0/include -I/build/composable-kernel/src/build/include -O3 -DNDEBUG -std=c++17 -fPIC -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Werror -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Wno-missing-field-initializers -Wno-deprecated-declarations -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Werror -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-float-conversion -Wno-gnu-anonymous-struct -Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes -Wno-nested-anon-types -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-weak-vtables -Wno-covered-switch-default -Wno-unsafe-buffer-usage -Wno-unused-lambda-capture -Wno-nvcc-compat -Wno-bit-int-extension -Wno-pass-failed -Wno-switch-default -fno-offload-uniform-block -mllvm --lsr-drop-solution=1 -mllvm -enable-post-misched=0 -mllvm -amdgpu-coerce-illegal-types=1 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -fcolor-diagnostics --offload-compress -x hip -MD -MT library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp.o -MF library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp.o.d -o "library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp.o" -c /build/composable-kernel/src/composable_kernel-rocm-6.4.0/library/src/tensor_operation_instance/gpu/gemm/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_nk_mn_instance.cpp
[183/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f16_f16_f16_km_nk_mn_instance.cpp.o
[184/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_2_stage_f16_f16_f16_mk_nk_mn_instance.cpp.o
[185/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_lds_direct_load_f16_f16_f16_mk_nk_mn_instance.cpp.o
[186/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_f32_compute_f16_knnn_instance.cpp.o
[187/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_default_pipeline_v2_instance.cpp.o
[188/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm/CMakeFiles/device_batched_gemm_softmax_gemm_instance.dir/device_batched_gemm_softmax_gemm_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp.o
[189/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_nk_mn_default_pipeline_v2_instance.cpp.o
[190/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_fastgelu/CMakeFiles/device_gemm_add_fastgelu_instance.dir/device_gemm_add_fastgelu_xdl_c_shuffle_f16_i8_f16_f16_mk_kn_mn_mn_instance.cpp.o
[191/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_ab_scale/CMakeFiles/device_gemm_ab_scale_instance.dir/device_gemm_ab_scale_xdl_f8_f8_bf16/device_gemm_ab_scale_xdl_f8_f8_bf16_mk_nk_mn_128_128_128_comp_kpadding_instance.cpp.o
[192/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_mnn_instance.cpp.o
[193/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_compute_f32_mkn_instance.cpp.o
[194/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f32_f32_f32_km_nk_mn_instance.cpp.o
[195/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_relu/CMakeFiles/device_gemm_add_relu_instance.dir/device_gemm_add_relu_xdl_c_shuffle_f16_i8_f16_f16_mk_kn_mn_mn_instance.cpp.o
[196/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_compute_f16_kkn_instance.cpp.o
[197/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f32_f32_f32_km_kn_mn_instance.cpp.o
[198/4327] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_backward_bf16_instance.cpp.o
[199/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_bf16_bf16_bf16_km_kn_mn_instance.cpp.o
[200/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f32_f32_f32_mk_nk_mn_instance.cpp.o
[201/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add/CMakeFiles/device_gemm_add_instance.dir/device_gemm_add_xdl_c_shuffle_bf16_i8_bf16_bf16_mk_kn_mn_mn_instance.cpp.o
[202/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_relu/CMakeFiles/device_gemm_add_relu_instance.dir/device_gemm_add_relu_xdl_c_shuffle_bf16_i8_bf16_bf16_mk_kn_mn_mn_instance.cpp.o
[203/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_ab_scale/CMakeFiles/device_gemm_ab_scale_instance.dir/device_gemm_ab_scale_xdl_f8_f8_bf16/device_gemm_ab_scale_xdl_f8_f8_bf16_mk_nk_mn_128_128_128_comp_mnpadding_instance.cpp.o
[204/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_fastgelu/CMakeFiles/device_gemm_add_fastgelu_instance.dir/device_gemm_add_fastgelu_xdl_c_shuffle_bf16_i8_bf16_bf16_mk_kn_mn_mn_instance.cpp.o
[205/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_compute_bf16_mnn_instance.cpp.o
[206/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_kn_mn_default_pipeline_v2_instance.cpp.o
[207/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_i8_i8_i8_mk_nk_mn_instance.cpp.o
[208/4327] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_infer_bf16_instance.cpp.o
[209/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_bf16_bf16_bf16_km_nk_mn_instance.cpp.o
[210/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_i8_i8_i8_km_nk_mn_instance.cpp.o
[211/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_kn_mn_default_pipeline_v1_instance.cpp.o
[212/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_ab_scale/CMakeFiles/device_gemm_ab_scale_instance.dir/device_gemm_ab_scale_xdl_f8_f8_bf16/device_gemm_ab_scale_xdl_f8_f8_bf16_mk_nk_mn_128_128_128_comp_mnkpadding_instance.cpp.o
[213/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_default_pipeline_v1_instance.cpp.o
[214/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_i8_i8_i8_km_kn_mn_instance.cpp.o
[215/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_int8_int8_int8_gkm_gnk_gmn_instance.cpp.o
[216/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_kn_mn_interwave_pipeline_v1_instance.cpp.o
[217/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f16_f16_f16_f16_compute_f32_mknn_instance.cpp.o
[218/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_default_pipeline_v1_instance.cpp.o
[219/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_i8_i8_i8_mk_kn_mn_instance.cpp.o
[220/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_interwave_pipeline_v1_instance.cpp.o
[221/4327] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_forward_f16_instance.cpp.o
[222/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_compute_bf16_mkn_instance.cpp.o
[223/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_ab_scale/CMakeFiles/device_gemm_ab_scale_instance.dir/device_gemm_ab_scale_xdl_f8_f8_bf16/device_gemm_ab_scale_xdl_f8_f8_bf16_mk_nk_mn_128_128_128_comp_default_instance.cpp.o
[224/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add/CMakeFiles/device_gemm_add_instance.dir/device_gemm_add_xdl_c_shuffle_f16_i8_f16_f16_mk_kn_mn_mn_instance.cpp.o
[225/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f16_f16_f16_compute_f32_mkn_instance.cpp.o
[226/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_knn_instance.cpp.o
[227/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv3d_bwd_data/CMakeFiles/device_conv3d_bwd_data_instance.dir/device_conv3d_bwd_data_xdl_ndhwc_kzyxc_ndhwk_bf16_instance.cpp.o
[228/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv3d_bwd_data/CMakeFiles/device_conv3d_bwd_data_instance.dir/device_conv3d_bwd_data_xdl_ndhwc_kzyxc_ndhwk_f32_instance.cpp.o
[229/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_f32_kknn_instance.cpp.o
[230/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f16_f16_f16_compute_f32_knn_instance.cpp.o
[231/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_nk_mn_irregular_default_pipeline_v2_instance.cpp.o
[232/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_silu/CMakeFiles/device_gemm_add_silu_instance.dir/device_gemm_add_silu_xdl_c_shuffle_f16_i8_f16_f16_mk_kn_mn_mn_instance.cpp.o
[233/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f16_f16_f16_compute_f32_knn_instance.cpp.o
[234/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_compute_bf16_kkn_instance.cpp.o
[235/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/mk_nk_mn_interwave_pipeline_v1_instance.cpp.o
[236/4327] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_forward_bf16_instance.cpp.o
[237/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv3d_bwd_data/CMakeFiles/device_conv3d_bwd_data_instance.dir/device_conv3d_bwd_data_xdl_ndhwc_kzyxc_ndhwk_int8_instance.cpp.o
[238/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd/CMakeFiles/device_conv2d_fwd_instance.dir/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp.o
[239/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm_permute/CMakeFiles/device_batched_gemm_softmax_gemm_permute_instance.dir/device_batched_gemm_bias_softmax_gemm_permute_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp.o
[240/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_silu/CMakeFiles/device_gemm_add_silu_instance.dir/device_gemm_add_silu_xdl_c_shuffle_bf16_i8_bf16_bf16_mk_kn_mn_mn_instance.cpp.o
[241/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_multiply/CMakeFiles/device_gemm_add_multiply_instance.dir/device_gemm_add_multiply_xdl_c_shuffle_f16_f16_f16_f16_f16_km_nk_mn_mn_mn_instance.cpp.o
[242/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o
[243/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_f32_knnn_instance.cpp.o
[244/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bias_add_reduce/CMakeFiles/device_gemm_bias_add_reduce_instance.dir/device_gemm_bias_add_mean_squaremean_xdl_cshuffle_f16_f16_f16_f32_f32_mk_nk_mn_instance.cpp.o
[245/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_bf16_bf16_bf16_bf16_compute_f32_knnn_instance.cpp.o
[246/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_compute_bf16_mkn_instance.cpp.o
[247/4327] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_infer_f32_instance.cpp.o
[248/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/6D/device_contraction_bilinear_m6_n6_k6_xdl_c_shuffle_f16_f16_f16_f16_compute_f32_mnnn_instance.cpp.o
[249/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_compute_f16_mnn_instance.cpp.o
[250/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd/CMakeFiles/device_conv2d_fwd_instance.dir/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp.o
[251/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_fastgelu/CMakeFiles/device_gemm_add_fastgelu_instance.dir/device_gemm_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_km_kn_mn_mn_instance.cpp.o
[252/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_bwd_data/CMakeFiles/device_conv2d_bwd_data_instance.dir/device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp.o
[253/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_add_fastgelu/CMakeFiles/device_gemm_add_add_fastgelu_instance.dir/device_gemm_add_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_nk_mn_mn_mn_instance.cpp.o
[254/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/2D/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_bf16_bf16_bf16_compute_f32_mnn_instance.cpp.o
[255/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bias_add_reduce/CMakeFiles/device_gemm_bias_add_reduce_instance.dir/device_gemm_bias_add_mean_squaremean_xdl_cshuffle_f16_f16_f16_f32_f32_mk_kn_mn_instance.cpp.o
[256/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_add_fastgelu/CMakeFiles/device_gemm_add_add_fastgelu_instance.dir/device_gemm_add_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_kn_mn_mn_mn_instance.cpp.o
[257/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_add_fastgelu/CMakeFiles/device_gemm_add_add_fastgelu_instance.dir/device_gemm_add_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_f16_km_nk_mn_mn_mn_instance.cpp.o
[258/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd_bias_relu/CMakeFiles/device_conv2d_fwd_bias_relu_instance.dir/device_conv2d_fwd_xdl_c_shuffle_bias_relu_nhwc_kyxc_nhwk_f16_instance.cpp.o
[259/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_add_fastgelu/CMakeFiles/device_gemm_add_add_fastgelu_instance.dir/device_gemm_add_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_f16_km_kn_mn_mn_mn_instance.cpp.o
[260/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_fastgelu/CMakeFiles/device_gemm_add_fastgelu_instance.dir/device_gemm_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_km_nk_mn_mn_instance.cpp.o
[261/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_multiply/CMakeFiles/device_gemm_add_multiply_instance.dir/device_gemm_add_multiply_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_nk_mn_mn_mn_instance.cpp.o
[262/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_bwd_data/CMakeFiles/device_conv2d_bwd_data_instance.dir/device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_f16_instance.cpp.o
[263/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_multiply/CMakeFiles/device_gemm_add_multiply_instance.dir/device_gemm_add_multiply_xdl_c_shuffle_f16_f16_f16_f16_f16_km_kn_mn_mn_mn_instance.cpp.o
[264/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_fastgelu/CMakeFiles/device_gemm_add_fastgelu_instance.dir/device_gemm_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_mk_nk_mn_mn_instance.cpp.o
[265/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f16_f16_f16_mk_nk_mn_instance.cpp.o
[266/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_mk_kn_mn_v1_padded_instance.cpp.o
[267/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_fastgelu/CMakeFiles/device_gemm_add_fastgelu_instance.dir/device_gemm_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_mk_kn_mn_mn_instance.cpp.o
[268/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_bf16_instance.cpp.o
[269/4327] Building CXX object library/src/tensor_operation_instance/gpu/elementwise/CMakeFiles/device_elementwise_instance.dir/device_normalize_instance.cpp.o
[270/4327] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_infer_f16_instance.cpp.o
[271/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f16_f16_f16_km_kn_mn_instance.cpp.o
[272/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_multiply/CMakeFiles/device_gemm_add_multiply_instance.dir/device_gemm_add_multiply_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_kn_mn_mn_mn_instance.cpp.o
[273/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_bf16_bf16_bf16_mk_nk_mn_instance.cpp.o
[274/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_mk_kn_mn_v1_interwave_default_instance.cpp.o
[275/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_bwd_data/CMakeFiles/device_conv2d_bwd_data_instance.dir/device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_int8_instance.cpp.o
[276/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd/CMakeFiles/device_conv2d_fwd_instance.dir/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp.o
[277/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm_permute/CMakeFiles/device_batched_gemm_softmax_gemm_permute_instance.dir/device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp.o
[278/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bias_add_reduce/CMakeFiles/device_gemm_bias_add_reduce_instance.dir/device_gemm_bias_add_mean_squaremean_xdl_cshuffle_f16_f16_f16_f32_f32_km_kn_mn_instance.cpp.o
[279/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_ab_scale/CMakeFiles/device_gemm_ab_scale_instance.dir/device_gemm_ab_scale_xdl_f8_f8_bf16/device_gemm_ab_scale_xdl_f8_f8_bf16_mk_nk_mn_128_128_128_mem_v1_mnkpadding_instance.cpp.o
[280/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_ab_scale/CMakeFiles/device_gemm_ab_scale_instance.dir/device_gemm_ab_scale_xdl_f8_f8_bf16/device_gemm_ab_scale_xdl_f8_f8_bf16_mk_nk_mn_128_128_128_mem_v1_kpadding_instance.cpp.o
[281/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_relu_add_layernorm/CMakeFiles/device_gemm_add_relu_add_layernorm_instance.dir/device_gemm_add_relu_add_xdl_c_shuffle_layernorm_f16_mk_nk_mn_mn_mn_instance.cpp.o
[282/4327] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/6D/device_contraction_scale_m6_n6_k6_xdl_c_shuffle_f32_f32_f32_compute_f16_mkn_instance.cpp.o
[283/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_wmma_f16_f16_f16_mk_nk_mn_instance.cpp.o
[284/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_mk_kn_mn_v1_default_instance.cpp.o
[285/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_mk_nk_mn_instance.cpp.o
[286/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_ab_scale/CMakeFiles/device_gemm_ab_scale_instance.dir/device_gemm_ab_scale_xdl_f8_f8_bf16/device_gemm_ab_scale_xdl_f8_f8_bf16_mk_nk_mn_128_128_128_mem_v1_default_instance.cpp.o
[287/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_f16_instance.cpp.o
[288/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_mk_kn_mn_v2_default_instance.cpp.o
[289/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_relu_add_layernorm/CMakeFiles/device_gemm_add_relu_add_layernorm_instance.dir/device_gemm_add_relu_add_xdl_c_shuffle_layernorm_f16_km_kn_mn_mn_mn_instance.cpp.o
[290/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv3d_bwd_data/CMakeFiles/device_conv3d_bwd_data_instance.dir/device_conv3d_bwd_data_xdl_ndhwc_kzyxc_ndhwk_f16_instance.cpp.o
[291/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_b_scale/CMakeFiles/device_gemm_b_scale_instance.dir/device_gemm_b_scale_xdl_f16_i4_f16/device_gemm_b_scale_xdl_f16_i4_f16_mk_nk_mn_mem_v2_default_instance.cpp.o
[292/4327] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm_permute/CMakeFiles/device_batched_gemm_softmax_gemm_permute_instance.dir/device_batched_gemm_bias_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp.o
[293/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_km_kn_mn_instance.cpp.o
[294/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_wmma_int8_int8_int8_mk_nk_mn_instance.cpp.o
[295/4327] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd_bias_relu_add/CMakeFiles/device_conv2d_fwd_bias_relu_add_instance.dir/device_conv2d_fwd_xdl_c_shuffle_bias_relu_add_nhwc_kyxc_nhwk_f16_instance.cpp.o
[296/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_wmma_bf16_bf16_bf16_km_nk_mn_instance.cpp.o
[297/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_wmma_bf16_bf16_bf16_mk_nk_mn_instance.cpp.o
[298/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_mk_kn_mn_v1_interwave_padded_instance.cpp.o
[299/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_relu_add_layernorm/CMakeFiles/device_gemm_add_relu_add_layernorm_instance.dir/device_gemm_add_relu_add_xdl_c_shuffle_layernorm_f16_mk_kn_mn_mn_mn_instance.cpp.o
[300/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_wmma_int8_int8_int8_km_nk_mn_instance.cpp.o
[301/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_wmma_f16_f16_f16_km_kn_mn_instance.cpp.o
[302/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_relu_add_layernorm/CMakeFiles/device_gemm_add_relu_add_layernorm_instance.dir/device_gemm_add_relu_add_xdl_c_shuffle_layernorm_f16_km_nk_mn_mn_mn_instance.cpp.o
[303/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f16_f16_f16_mk_kn_mn_instance.cpp.o
[304/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_wmma_bf16_bf16_bf16_mk_kn_mn_instance.cpp.o
[305/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_bf16_bf16_bf16_mk_kn_mn_instance.cpp.o
[306/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_fp8_fp8_fp8_mk_kn_mn_v2_padded_instance.cpp.o
[307/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_wmma_int8_int8_int8_mk_kn_mn_instance.cpp.o
[308/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_wmma_f16_f16_f16_km_nk_mn_instance.cpp.o
[309/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_wmma_f16_f16_f16_mk_kn_mn_instance.cpp.o
[310/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_wmma_bf16_bf16_bf16_km_kn_mn_instance.cpp.o
[311/4327] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_wmma_int8_int8_int8_km_kn_mn_instance.cpp.o
ninja: build stopped: subcommand failed.
==> ERROR: A failure occurred in build().
    Aborting...
==> ERROR: Build failed, check /var/lib/archbuild/extra-riscv64/felix-0/build
receiving incremental file list
composable-kernel-6.4.0-1-riscv64-build.log
composable-kernel-6.4.0-1-riscv64-prepare.log

sent 62 bytes received 12,212 bytes 24,548.00 bytes/sec
total size is 119,688 speedup is 9.75