Available on target_arch=amdgpu only.
amdgpu intrinsics
The reference is the LLVM amdgpu guide and the LLVM implementation. The order of intrinsics here follows the order in the LLVM implementation.
Functions
- llvm_ballot 🔒
- llvm_dispatch_id 🔒
- llvm_ds_bpermute 🔒 ⚠
- llvm_ds_permute 🔒 ⚠
- llvm_endpgm 🔒
- llvm_groupstaticsize 🔒
- llvm_inverse_ballot 🔒
- llvm_mbcnt_hi 🔒
- llvm_mbcnt_lo 🔒
- llvm_perm 🔒 ⚠
- llvm_permlane16_swap 🔒 ⚠
- llvm_permlane16_u32 🔒 ⚠
- llvm_permlane16_var 🔒 ⚠
- llvm_permlane32_swap 🔒 ⚠
- llvm_permlane64_u32 🔒 ⚠
- llvm_permlanex16_u32 🔒 ⚠
- llvm_permlanex16_var 🔒 ⚠
- llvm_readfirstlane_u32 🔒
- llvm_readfirstlane_u64 🔒
- llvm_readlane_u32 🔒 ⚠
- llvm_readlane_u64 🔒 ⚠
- llvm_s_barrier 🔒
- llvm_s_barrier_signal 🔒 ⚠
- llvm_s_barrier_signal_isfirst 🔒 ⚠
- llvm_s_barrier_wait 🔒 ⚠
- llvm_s_get_barrier_state 🔒 ⚠
- llvm_s_get_waveid_in_workgroup 🔒
- llvm_s_getpc 🔒
- llvm_s_memrealtime 🔒
- llvm_s_sethalt 🔒
- llvm_s_sleep 🔒
- llvm_sched_barrier 🔒 ⚠
- llvm_sched_group_barrier 🔒 ⚠
- llvm_update_dpp 🔒 ⚠
- llvm_wave_barrier 🔒
- llvm_wave_id 🔒
- llvm_wave_reduce_add 🔒
- llvm_wave_reduce_and 🔒
- llvm_wave_reduce_max 🔒
- llvm_wave_reduce_min 🔒
- llvm_wave_reduce_or 🔒
- llvm_wave_reduce_umax 🔒
- llvm_wave_reduce_umin 🔒
- llvm_wave_reduce_xor 🔒
- llvm_wavefrontsize 🔒
- llvm_workgroup_id_x 🔒
- llvm_workgroup_id_y 🔒
- llvm_workgroup_id_z 🔒
- llvm_workitem_id_x 🔒
- llvm_workitem_id_y 🔒
- llvm_workitem_id_z 🔒
- llvm_writelane_u32 🔒 ⚠
- llvm_writelane_u64 🔒 ⚠
- ballot
Experimental - Returns a bitfield (`u32` or `u64`) containing the result of its i1 argument in all active lanes, and zero in all inactive lanes.
- dispatch_id
Experimental - Returns the ID of the dispatch that is currently being executed.
- ds_bpermute ⚠
Experimental - Gather data across all lanes in a wavefront.
- ds_permute ⚠
Experimental - Scatter data across all lanes in a wavefront.
- endpgm
Experimental - Stop execution of the wavefront.
- groupstaticsize
Experimental - Returns the size of statically allocated shared memory for this program in bytes.
- inverse_ballot
Experimental - Indexes into `value` with the current lane ID and returns, for each lane, whether the corresponding bit is set.
- mbcnt_hi
Experimental - Masked bit count, high 32 lanes.
- mbcnt_lo
Experimental - Masked bit count, low 32 lanes.
- perm ⚠
Experimental - Permute a 64-bit value.
- permlane16_swap ⚠
Experimental - Provides direct access to the `v_permlane16_swap_b32` instruction on supported targets.
- permlane16_u32 ⚠
Experimental - Performs an arbitrary gather-style operation within a row (16 contiguous lanes) of the second input operand.
- permlane16_var ⚠
Experimental - Performs an arbitrary gather-style operation within a row (16 contiguous lanes) of the second input operand.
- permlane32_swap ⚠
Experimental - Provides direct access to the `v_permlane32_swap_b32` instruction on supported targets.
- permlane64_u32 ⚠
Experimental - Swaps `value` between the upper and lower 32 lanes in a wavefront.
- permlanex16_u32 ⚠
Experimental - Performs an arbitrary gather-style operation across two rows (16 contiguous lanes) of the second input operand.
- permlanex16_var ⚠
Experimental - Performs an arbitrary gather-style operation across two rows (16 contiguous lanes) of the second input operand.
- readfirstlane_u32
Experimental - Get `value` from the first active lane in the wavefront.
- readfirstlane_u64
Experimental - Get `value` from the first active lane in the wavefront.
- readlane_u32 ⚠
Experimental - Get `value` from the lane at index `lane` in the wavefront.
- readlane_u64 ⚠
Experimental - Get `value` from the lane at index `lane` in the wavefront.
- s_barrier
Experimental - Synchronize all wavefronts in a workgroup.
- s_barrier_signal ⚠
Experimental - Signal a specific barrier type.
- s_barrier_signal_isfirst ⚠
Experimental - Signal a specific barrier type.
- s_barrier_wait ⚠
Experimental - Wait for a specific barrier type.
- s_get_barrier_state ⚠
Experimental - Get the state of a specific barrier type.
- s_get_waveid_in_workgroup
Experimental - Get the index of the current wavefront in the workgroup.
- s_getpc
Experimental - Returns the current program counter.
- s_memrealtime
Experimental - Measures time based on a fixed frequency.
- s_sethalt
Experimental - Stop execution of the kernel.
- s_sleep
Experimental - Sleeps for approximately `COUNT * 64` cycles.
- sched_barrier ⚠
Experimental - Prevent movement of some instruction types.
- sched_group_barrier ⚠
Experimental - Creates schedule groups with specific properties to create custom scheduling pipelines.
- update_dpp ⚠
Experimental - The `update_dpp` intrinsic represents the `update.dpp` operation in AMDGPU. It takes an old value, a source operand, a DPP control operand, a row mask, a bank mask, and a bound control. This operation is equivalent to a sequence of `v_mov_b32` operations.
- wave_barrier
Experimental - A barrier for only the threads within the current wavefront.
- wave_id
Experimental - Get the index of the current wavefront in the workgroup.
- wave_reduce_add
Experimental - Performs an arithmetic add reduction on the values provided by each lane in the wavefront.
- wave_reduce_and
Experimental - Performs a logical and reduction on the unsigned values provided by each lane in the wavefront.
- wave_reduce_max
Experimental - Performs an arithmetic max reduction on the signed values provided by each lane in the wavefront.
- wave_reduce_min
Experimental - Performs an arithmetic min reduction on the signed values provided by each lane in the wavefront.
- wave_reduce_or
Experimental - Performs a logical or reduction on the unsigned values provided by each lane in the wavefront.
- wave_reduce_umax
Experimental - Performs an arithmetic max reduction on the unsigned values provided by each lane in the wavefront.
- wave_reduce_umin
Experimental - Performs an arithmetic min reduction on the unsigned values provided by each lane in the wavefront.
- wave_reduce_xor
Experimental - Performs a logical xor reduction on the unsigned values provided by each lane in the wavefront.
- wavefrontsize
Experimental - Returns the number of threads in a wavefront.
- workgroup_id_x
Experimental - Returns the x coordinate of the workgroup index within the dispatch.
- workgroup_id_y
Experimental - Returns the y coordinate of the workgroup index within the dispatch.
- workgroup_id_z
Experimental - Returns the z coordinate of the workgroup index within the dispatch.
- workitem_id_x
Experimental - Returns the x coordinate of the workitem index within the workgroup.
- workitem_id_y
Experimental - Returns the y coordinate of the workitem index within the workgroup.
- workitem_id_z
Experimental - Returns the z coordinate of the workitem index within the workgroup.
- writelane_u32 ⚠
Experimental - Return `value` for the lane at index `lane` in the wavefront. Return `default` for all other lanes.
- writelane_u64 ⚠
Experimental - Return `value` for the lane at index `lane` in the wavefront. Return `default` for all other lanes.
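
The lane-level behaviour described in the list above can be hard to picture without a GPU at hand. The following is a minimal host-side sketch in plain Rust, not the intrinsics themselves: it models a wavefront as a fixed-size array with one slot per lane. The constant `WAVE`, the helper names `mbcnt`, `gather` and `scatter`, and the use of lane indices in place of hardware addresses are assumptions made for illustration. The model treats all lanes as active everywhere except in `ballot`, and it assumes no two lanes scatter to the same destination.

```rust
/// Hypothetical wavefront width for this model; device code would
/// query `wavefrontsize` instead of hard-coding a value.
pub const WAVE: usize = 32;

/// Model of `ballot`: bit `i` of the result is set when lane `i` is
/// active and its predicate is true; inactive lanes contribute zero.
pub fn ballot(active: u64, pred: &[bool; WAVE]) -> u64 {
    let mut mask = 0u64;
    for lane in 0..WAVE {
        if (active >> lane) & 1 == 1 && pred[lane] {
            mask |= 1u64 << lane;
        }
    }
    mask
}

/// Model of `inverse_ballot`: lane `i` sees whether bit `i` of `mask` is set.
pub fn inverse_ballot(mask: u64) -> [bool; WAVE] {
    std::array::from_fn(|lane| (mask >> lane) & 1 == 1)
}

/// Model of the masked bit count: the number of set bits in `mask`
/// strictly below `lane`. On the device this is typically assembled
/// from `mbcnt_lo` and `mbcnt_hi`; here it is a single helper.
/// `lane` must be smaller than `WAVE`.
pub fn mbcnt(mask: u64, lane: usize) -> u32 {
    (mask & ((1u64 << lane) - 1)).count_ones()
}

/// Gather, the direction described for `ds_bpermute`: lane `i` reads
/// the value held by lane `src[i]`.
pub fn gather(data: &[u32; WAVE], src: &[usize; WAVE]) -> [u32; WAVE] {
    std::array::from_fn(|lane| data[src[lane] % WAVE])
}

/// Scatter, the direction described for `ds_permute`: lane `i` pushes
/// its value to lane `dst[i]`. Lanes that nobody writes stay at 0 in
/// this model (a modelling choice, not a documented guarantee).
pub fn scatter(data: &[u32; WAVE], dst: &[usize; WAVE]) -> [u32; WAVE] {
    let mut out = [0u32; WAVE];
    for lane in 0..WAVE {
        out[dst[lane] % WAVE] = data[lane];
    }
    out
}

/// Model of `wave_reduce_add`: the wrapping sum over all lanes, which
/// every lane would then receive.
pub fn wave_reduce_add(data: &[u32; WAVE]) -> u32 {
    data.iter().fold(0u32, |acc, &v| acc.wrapping_add(v))
}
```

In this model, gathering with a rotation pattern and then scattering with the same pattern restores the original data, which is one way to see that the two operations move values in opposite directions. Combining `ballot` with `mbcnt` gives each lane its rank among the lanes whose predicate was true, a common building block for stream compaction.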