Available on x86 or x86-64 only.
Macros
- cmp_asm 🔒
- fpclass_asm 🔒
Functions
- vaddph 🔒 ⚠
- vaddsh 🔒 ⚠
- vcmpsh 🔒 ⚠
- vcomish 🔒 ⚠
- vcvtdq2ph_128 🔒 ⚠
- vcvtdq2ph_256 🔒 ⚠
- vcvtdq2ph_512 🔒 ⚠
- vcvtpd2ph_128 🔒 ⚠
- vcvtpd2ph_256 🔒 ⚠
- vcvtpd2ph_512 🔒 ⚠
- vcvtph2dq_128 🔒 ⚠
- vcvtph2dq_256 🔒 ⚠
- vcvtph2dq_512 🔒 ⚠
- vcvtph2pd_128 🔒 ⚠
- vcvtph2pd_256 🔒 ⚠
- vcvtph2pd_512 🔒 ⚠
- vcvtph2psx_128 🔒 ⚠
- vcvtph2psx_256 🔒 ⚠
- vcvtph2psx_512 🔒 ⚠
- vcvtph2qq_128 🔒 ⚠
- vcvtph2qq_256 🔒 ⚠
- vcvtph2qq_512 🔒 ⚠
- vcvtph2udq_128 🔒 ⚠
- vcvtph2udq_256 🔒 ⚠
- vcvtph2udq_512 🔒 ⚠
- vcvtph2uqq_128 🔒 ⚠
- vcvtph2uqq_256 🔒 ⚠
- vcvtph2uqq_512 🔒 ⚠
- vcvtph2uw_128 🔒 ⚠
- vcvtph2uw_256 🔒 ⚠
- vcvtph2uw_512 🔒 ⚠
- vcvtph2w_128 🔒 ⚠
- vcvtph2w_256 🔒 ⚠
- vcvtph2w_512 🔒 ⚠
- vcvtps2phx_128 🔒 ⚠
- vcvtps2phx_256 🔒 ⚠
- vcvtps2phx_512 🔒 ⚠
- vcvtqq2ph_128 🔒 ⚠
- vcvtqq2ph_256 🔒 ⚠
- vcvtqq2ph_512 🔒 ⚠
- vcvtsd2sh 🔒 ⚠
- vcvtsh2sd 🔒 ⚠
- vcvtsh2si32 🔒 ⚠
- vcvtsh2ss 🔒 ⚠
- vcvtsh2usi32 🔒 ⚠
- vcvtsi2sh 🔒 ⚠
- vcvtss2sh 🔒 ⚠
- vcvttph2dq_128 🔒 ⚠
- vcvttph2dq_256 🔒 ⚠
- vcvttph2dq_512 🔒 ⚠
- vcvttph2qq_128 🔒 ⚠
- vcvttph2qq_256 🔒 ⚠
- vcvttph2qq_512 🔒 ⚠
- vcvttph2udq_128 🔒 ⚠
- vcvttph2udq_256 🔒 ⚠
- vcvttph2udq_512 🔒 ⚠
- vcvttph2uqq_128 🔒 ⚠
- vcvttph2uqq_256 🔒 ⚠
- vcvttph2uqq_512 🔒 ⚠
- vcvttph2uw_128 🔒 ⚠
- vcvttph2uw_256 🔒 ⚠
- vcvttph2uw_512 🔒 ⚠
- vcvttph2w_128 🔒 ⚠
- vcvttph2w_256 🔒 ⚠
- vcvttph2w_512 🔒 ⚠
- vcvttsh2si32 🔒 ⚠
- vcvttsh2usi32 🔒 ⚠
- vcvtudq2ph_128 🔒 ⚠
- vcvtudq2ph_256 🔒 ⚠
- vcvtudq2ph_512 🔒 ⚠
- vcvtuqq2ph_128 🔒 ⚠
- vcvtuqq2ph_256 🔒 ⚠
- vcvtuqq2ph_512 🔒 ⚠
- vcvtusi2sh 🔒 ⚠
- vcvtuw2ph_128 🔒 ⚠
- vcvtuw2ph_256 🔒 ⚠
- vcvtuw2ph_512 🔒 ⚠
- vcvtw2ph_128 🔒 ⚠
- vcvtw2ph_256 🔒 ⚠
- vcvtw2ph_512 🔒 ⚠
- vdivph 🔒 ⚠
- vdivsh 🔒 ⚠
- vfcmaddcph_mask3_128 🔒 ⚠
- vfcmaddcph_mask3_256 🔒 ⚠
- vfcmaddcph_mask3_512 🔒 ⚠
- vfcmaddcph_maskz_128 🔒 ⚠
- vfcmaddcph_maskz_256 🔒 ⚠
- vfcmaddcph_maskz_512 🔒 ⚠
- vfcmaddcsh_mask 🔒 ⚠
- vfcmaddcsh_maskz 🔒 ⚠
- vfcmulcph_128 🔒 ⚠
- vfcmulcph_256 🔒 ⚠
- vfcmulcph_512 🔒 ⚠
- vfcmulcsh 🔒 ⚠
- vfmaddcph_mask3_128 🔒 ⚠
- vfmaddcph_mask3_256 🔒 ⚠
- vfmaddcph_mask3_512 🔒 ⚠
- vfmaddcph_maskz_128 🔒 ⚠
- vfmaddcph_maskz_256 🔒 ⚠
- vfmaddcph_maskz_512 🔒 ⚠
- vfmaddcsh_mask 🔒 ⚠
- vfmaddcsh_maskz 🔒 ⚠
- vfmaddph_512 🔒 ⚠
- vfmaddsh 🔒 ⚠
- vfmaddsubph_128 🔒 ⚠
- vfmaddsubph_256 🔒 ⚠
- vfmaddsubph_512 🔒 ⚠
- vfmulcph_128 🔒 ⚠
- vfmulcph_256 🔒 ⚠
- vfmulcph_512 🔒 ⚠
- vfmulcsh 🔒 ⚠
- vfpclasssh 🔒 ⚠
- vgetexpph_128 🔒 ⚠
- vgetexpph_256 🔒 ⚠
- vgetexpph_512 🔒 ⚠
- vgetexpsh 🔒 ⚠
- vgetmantph_128 🔒 ⚠
- vgetmantph_256 🔒 ⚠
- vgetmantph_512 🔒 ⚠
- vgetmantsh 🔒 ⚠
- vmaxph_128 🔒 ⚠
- vmaxph_256 🔒 ⚠
- vmaxph_512 🔒 ⚠
- vmaxsh 🔒 ⚠
- vminph_128 🔒 ⚠
- vminph_256 🔒 ⚠
- vminph_512 🔒 ⚠
- vminsh 🔒 ⚠
- vmulph 🔒 ⚠
- vmulsh 🔒 ⚠
- vrcpph_128 🔒 ⚠
- vrcpph_256 🔒 ⚠
- vrcpph_512 🔒 ⚠
- vrcpsh 🔒 ⚠
- vreduceph_128 🔒 ⚠
- vreduceph_256 🔒 ⚠
- vreduceph_512 🔒 ⚠
- vreducesh 🔒 ⚠
- vrndscaleph_128 🔒 ⚠
- vrndscaleph_256 🔒 ⚠
- vrndscaleph_512 🔒 ⚠
- vrndscalesh 🔒 ⚠
- vrsqrtph_128 🔒 ⚠
- vrsqrtph_256 🔒 ⚠
- vrsqrtph_512 🔒 ⚠
- vrsqrtsh 🔒 ⚠
- vscalefph_128 🔒 ⚠
- vscalefph_256 🔒 ⚠
- vscalefph_512 🔒 ⚠
- vscalefsh 🔒 ⚠
- vsqrtph_512 🔒 ⚠
- vsqrtsh 🔒 ⚠
- vsubph 🔒 ⚠
- vsubsh 🔒 ⚠
- _mm256_abs_ph Experimental avx512fp16 and avx512vl
- Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the result in dst.
- _mm256_add_ph Experimental avx512fp16 and avx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm256_castpd_ph Experimental avx512fp16
- Cast vector of type __m256d to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph128_ph256 Experimental avx512fp16
- Cast vector of type __m128h to type __m256h. The upper 8 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm256_castph256_ph128 Experimental avx512fp16
- Cast vector of type __m256h to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph_pd Experimental avx512fp16
- Cast vector of type __m256h to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph_ps Experimental avx512fp16
- Cast vector of type __m256h to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph_si256 Experimental avx512fp16
- Cast vector of type __m256h to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castps_ph Experimental avx512fp16
- Cast vector of type __m256 to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castsi256_ph Experimental avx512fp16
- Cast vector of type __m256i to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_cmp_ph_mask Experimental avx512fp16 and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
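The compare-to-mask behaviour described above can be pictured with a scalar model. This is only a sketch, not the intrinsic itself: `cmp_mask_model` and `mask_cmp_mask_model` are hypothetical names, `f32` lanes stand in for the 16 half-precision lanes of a 256-bit vector, and a closure stands in for the imm8 comparison predicate.

```rust
/// Scalar model of a packed compare that yields a bitmask.
/// Bit `i` of the result is set when `pred(a[i], b[i])` holds.
/// Assumption: `f32` lanes stand in for the 16 half-precision lanes.
fn cmp_mask_model(a: &[f32; 16], b: &[f32; 16], pred: impl Fn(f32, f32) -> bool) -> u16 {
    let mut k = 0u16;
    for i in 0..16 {
        if pred(a[i], b[i]) {
            k |= 1 << i;
        }
    }
    k
}

/// Zeromasked variant: result bits whose bit in `k1` is clear are forced to zero.
fn mask_cmp_mask_model(
    k1: u16,
    a: &[f32; 16],
    b: &[f32; 16],
    pred: impl Fn(f32, f32) -> bool,
) -> u16 {
    k1 & cmp_mask_model(a, b, pred)
}
```

Calling `cmp_mask_model(&a, &b, |x, y| x < y)` yields one bit per lane; a lane holding NaN produces a clear bit for ordered predicates such as `<`.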
- _mm256_cmul_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_conj_pch Experimental avx512fp16 and avx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_cvtepi16_ph Experimental avx512fp16 and avx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
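The complex (pch) entries such as _mm256_cmul_pch and _mm256_conj_pch treat each pair of adjacent elements as one complex number, real part first. The arithmetic can be sketched in scalar Rust as follows. The `Complex` type and the function names are hypothetical, and `f32` is used in place of half precision, so rounding will differ from the real instructions.

```rust
/// One complex number stored as two adjacent lanes: [re, im].
/// Assumption: f32 stands in for half precision.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex {
    re: f32,
    im: f32,
}

/// Complex conjugate: negate the imaginary part.
fn conj(z: Complex) -> Complex {
    Complex { re: z.re, im: -z.im }
}

/// Ordinary complex product a * b.
fn mul(a: Complex, b: Complex) -> Complex {
    Complex {
        re: a.re * b.re - a.im * b.im,
        im: a.re * b.im + a.im * b.re,
    }
}

/// Product of a with the conjugate of b, as the cmul/fcmul entries describe.
fn cmul(a: Complex, b: Complex) -> Complex {
    mul(a, conj(b))
}

/// Apply `cmul` to interleaved [re0, im0, re1, im1, ...] slices of equal, even length.
fn cmul_packed(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.chunks_exact(2)
        .zip(b.chunks_exact(2))
        .flat_map(|(x, y)| {
            let r = cmul(Complex { re: x[0], im: x[1] }, Complex { re: y[0], im: y[1] });
            [r.re, r.im]
        })
        .collect()
}
```

With a = 1 + 2i and b = 3 + 4i, `mul` gives -5 + 10i while `cmul` gives 11 + 2i.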
- _mm256_cvtepi32_ph Experimental avx512fp16 and avx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepi64_ph Experimental avx512fp16 and avx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm256_cvtepu16_ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepu32_ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepu64_ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm256_cvtpd_ph Experimental avx512fp16 and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm256_cvtph_epi16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm256_cvtph_epi32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm256_cvtph_epi64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm256_cvtph_epu16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm256_cvtph_epu32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
- _mm256_cvtph_epu64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm256_cvtph_pd Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm256_cvtsh_h Experimental avx512fp16
- Copy the lower half-precision (16-bit) floating-point element from a to dst.
- _mm256_cvttph_epi16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_epi32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_epi64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_epu16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_epu32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm256_cvttph_epu64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
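The difference between the cvtph_* and cvttph_* families is the rounding step. A rough scalar picture, assuming the default round-to-nearest-even rounding mode for the non-truncating conversions (the listing does not state the mode, so treat that as an assumption); the function names are hypothetical and `f32` stands in for half precision:

```rust
/// Model of a non-truncating conversion.
/// Assumption: the current rounding mode is the default, round to nearest even.
/// Out-of-range and NaN inputs are not modelled faithfully here.
fn cvt_model(x: f32) -> i32 {
    x.round_ties_even() as i32
}

/// Model of a conversion "with truncation": the fraction is discarded,
/// rounding toward zero regardless of the rounding mode.
fn cvtt_model(x: f32) -> i32 {
    x.trunc() as i32
}
```

For 3.5 the two models return 4 and 3; for -1.7 they return -2 and -1.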
- _mm256_cvtxph_ps Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtxps_ph Experimental avx512fp16 and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_div_ph Experimental avx512fp16 and avx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
- _mm256_fcmadd_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_fcmul_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_fmadd_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_fmadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm256_fmaddsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm256_fmsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm256_fmsubadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
- _mm256_fmul_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_fnmadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm256_fnmsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
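The sign conventions of the fused multiply-add family listed above (fmadd, fmsub, fnmadd, fnmsub, fmaddsub, fmsubadd) are easy to mix up. A scalar sketch follows, with hypothetical function names and `f32` in place of half precision. The lane parity used for fmaddsub and fmsubadd (even lanes subtract for fmaddsub, even lanes add for fmsubadd) follows the usual x86 convention and is an assumption here, since the listing only says the operations alternate.

```rust
// Per-lane models; `mul_add` keeps the multiply and add fused (one rounding).
fn fmadd(a: f32, b: f32, c: f32) -> f32 { a.mul_add(b, c) }        //  (a*b) + c
fn fmsub(a: f32, b: f32, c: f32) -> f32 { a.mul_add(b, -c) }       //  (a*b) - c
fn fnmadd(a: f32, b: f32, c: f32) -> f32 { (-a).mul_add(b, c) }    // -(a*b) + c
fn fnmsub(a: f32, b: f32, c: f32) -> f32 { (-a).mul_add(b, -c) }   // -(a*b) - c

/// fmaddsub over equal-length slices.
/// Assumption: even lanes subtract c, odd lanes add c.
fn fmaddsub_packed(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    (0..a.len())
        .map(|i| if i % 2 == 0 { fmsub(a[i], b[i], c[i]) } else { fmadd(a[i], b[i], c[i]) })
        .collect()
}

/// fmsubadd over equal-length slices.
/// Assumption: even lanes add c, odd lanes subtract c.
fn fmsubadd_packed(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    (0..a.len())
        .map(|i| if i % 2 == 0 { fmadd(a[i], b[i], c[i]) } else { fmsub(a[i], b[i], c[i]) })
        .collect()
}
```

With a = 2, b = 3, c = 1 in every lane, the four scalar forms give 7, 5, -5 and -7.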
- _mm256_fpclass_ph_mask Experimental avx512fp16 and avx512vl
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm256_getexp_ph Experimental avx512fp16 and avx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm256_getmant_ph Experimental avx512fp16 and avx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm256_load_ph ⚠ Experimental avx512fp16 and avx512vl
- Load 256-bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 32 bytes or a general-protection exception may be generated.
- _mm256_loadu_ph ⚠ Experimental avx512fp16 and avx512vl
- Load 256-bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
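The only difference between the two load entries is the alignment requirement. A small sketch of how a caller might guarantee or check 32-byte alignment before choosing the aligned form; `Aligned32` and `is_aligned_to` are hypothetical helpers, and `u16` stands in for the 16-bit element type:

```rust
/// A 32-byte-aligned buffer holding 16 sixteen-bit elements (256 bits).
/// Assumption: u16 stands in for the half-precision element type.
#[repr(C, align(32))]
struct Aligned32([u16; 16]);

/// Returns true when `ptr` is a multiple of `align` bytes.
fn is_aligned_to(ptr: *const u16, align: usize) -> bool {
    (ptr as usize) % align == 0
}
```

A pointer to the start of an `Aligned32` passes the check and would be suitable for an aligned load; a pointer one element further in does not, and would need the unaligned form.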
- _mm256_mask3_fcmadd_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask3_fmadd_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mask3_fmadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_fmaddsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_fmsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_fmsubadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_fnmadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_fnmsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask_add_ph Experimental avx512fp16 and avx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_blend_ph Experimental avx512fp16 and avx512vl
- Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
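Writemask and blend behaviour, as described for _mm256_mask_add_ph and _mm256_mask_blend_ph, can be sketched per lane. The function names below are hypothetical and `f32` lanes stand in for half precision. For the blend, the assumption (following the usual convention for mask blends) is that a set bit selects the element from b and a clear bit selects it from a.

```rust
const LANES: usize = 16; // 16 half-precision lanes in a 256-bit vector

/// Writemask add: lane i is a[i] + b[i] when bit i of k is set,
/// otherwise it is copied from src.
fn mask_add_model(
    src: &[f32; LANES],
    k: u16,
    a: &[f32; LANES],
    b: &[f32; LANES],
) -> [f32; LANES] {
    let mut dst = *src;
    for i in 0..LANES {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] + b[i];
        }
    }
    dst
}

/// Blend: lane i comes from b when bit i of k is set, otherwise from a.
/// Assumption: this selection order matches the usual mask-blend convention.
fn mask_blend_model(k: u16, a: &[f32; LANES], b: &[f32; LANES]) -> [f32; LANES] {
    let mut dst = *a;
    for i in 0..LANES {
        if (k >> i) & 1 == 1 {
            dst[i] = b[i];
        }
    }
    dst
}
```

With k = 0b101, `mask_add_model` computes sums in lanes 0 and 2 and leaves every other lane equal to src.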
- _mm256_mask_cmp_ph_mask Experimental avx512fp16 and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmul_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_conj_pch Experimental avx512fp16 and avx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_cvtepi16_ph Experimental avx512fp16 and avx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvtepi32_ph Experimental avx512fp16 and avx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvtepi64_ph Experimental avx512fp16 and avx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_mask_cvtepu16_ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvtepu32_ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvtepu64_ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_mask_cvtpd_ph Experimental avx512fp16 and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_mask_cvtph_epi16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_epi32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_epi64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_epu16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_epu32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_epu64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_pd Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epi16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epi32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epi64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epu16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epu32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epu64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtxph_ps Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvtxps_ph Experimental avx512fp16 and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_div_ph Experimental avx512fp16 and avx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_fcmadd_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_fcmul_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_fmadd_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mask_fmadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmaddsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmsubadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmul_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mask_fnmadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fnmsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
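The mask_ and mask3_ fused multiply-add entries differ only in which operand supplies the unselected lanes: a for mask_, c for mask3_. A scalar sketch of that difference, with hypothetical function names, `f32` in place of half precision, and slices of at most 16 lanes:

```rust
/// mask_ form: lanes whose mask bit is clear are copied from `a`.
/// Slices must have equal length, at most 16 lanes.
fn mask_fmadd_model(a: &[f32], k: u16, b: &[f32], c: &[f32]) -> Vec<f32> {
    assert!(a.len() <= 16 && a.len() == b.len() && a.len() == c.len());
    (0..a.len())
        .map(|i| if (k >> i) & 1 == 1 { a[i].mul_add(b[i], c[i]) } else { a[i] })
        .collect()
}

/// mask3_ form: lanes whose mask bit is clear are copied from `c`.
/// Slices must have equal length, at most 16 lanes.
fn mask3_fmadd_model(a: &[f32], b: &[f32], c: &[f32], k: u16) -> Vec<f32> {
    assert!(a.len() <= 16 && a.len() == b.len() && a.len() == c.len());
    (0..a.len())
        .map(|i| if (k >> i) & 1 == 1 { a[i].mul_add(b[i], c[i]) } else { c[i] })
        .collect()
}
```

With a = [2, 2], b = [3, 3], c = [1, 1] and k = 0b01, the mask_ model returns [7, 2] and the mask3_ model returns [7, 1].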
- _mm256_mask_fpclass_ph_mask Experimental avx512fp16 and avx512vl
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm256_
mask_ getexp_ ph Experimental avx512fp16
andavx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm256_
mask_ getmant_ ph Experimental avx512fp16
andavx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm256_
mask_ max_ ph Experimental avx512fp16
andavx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm256_
mask_ min_ ph Experimental avx512fp16
andavx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm256_
mask_ mul_ pch Experimental avx512fp16
andavx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_
mask_ mul_ ph Experimental avx512fp16
andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_
mask_ rcp_ ph Experimental avx512fp16
andavx512vl
- Compute the approximate reciprocal of packed 16-bit floating-point elements in a and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_
mask_ reduce_ ph Experimental avx512fp16
andavx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_
mask_ roundscale_ ph Experimental avx512fp16
andavx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_
mask_ rsqrt_ ph Experimental avx512fp16
andavx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_
mask_ scalef_ ph Experimental avx512fp16
andavx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_
mask_ sqrt_ ph Experimental avx512fp16
andavx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_
mask_ sub_ ph Experimental avx512fp16
andavx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_
maskz_ add_ ph Experimental avx512fp16
andavx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
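The difference between the writemask (mask_) and zeromask (maskz_) forms can be sketched with a scalar model. The helper names below are hypothetical, and f32 stands in for the half-precision element type, so this shows only the lane selection rule, not the exact arithmetic.

```rust
/// Number of 16-bit lanes in a 256-bit vector.
const LANES: usize = 16;

/// Writemask model (hypothetical helper): lane i is a[i] + b[i] when
/// bit i of k is set, otherwise it is copied from src.
fn mask_add_model(
    src: [f32; LANES],
    k: u16,
    a: [f32; LANES],
    b: [f32; LANES],
) -> [f32; LANES] {
    let mut dst = src;
    for i in 0..LANES {
        if k & (1 << i) != 0 {
            dst[i] = a[i] + b[i];
        }
    }
    dst
}

/// Zeromask model (hypothetical helper): the same selection rule, but
/// unselected lanes are zeroed instead of copied from a source vector.
fn maskz_add_model(k: u16, a: [f32; LANES], b: [f32; LANES]) -> [f32; LANES] {
    mask_add_model([0.0; LANES], k, a, b)
}
```

Expressing the zeromask form as the writemask form with an all-zero src is a modelling choice made here for brevity.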
- _mm256_
maskz_ cmul_ pch Experimental avx512fp16
andavx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_
maskz_ conj_ pch Experimental avx512fp16
andavx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_
maskz_ cvtepi16_ ph Experimental avx512fp16
andavx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvtepi32_ ph Experimental avx512fp16
andavx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvtepi64_ ph Experimental avx512fp16
andavx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_
maskz_ cvtepu16_ ph Experimental avx512fp16
andavx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvtepu32_ ph Experimental avx512fp16
andavx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvtepu64_ ph Experimental avx512fp16
andavx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_
maskz_ cvtpd_ ph Experimental avx512fp16
andavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_
maskz_ cvtph_ epi16 Experimental avx512fp16
andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvtph_ epi32 Experimental avx512fp16
andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvtph_ epi64 Experimental avx512fp16
andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvtph_ epu16 Experimental avx512fp16
andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvtph_ epu32 Experimental avx512fp16
andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvtph_ epu64 Experimental avx512fp16
andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvtph_ pd Experimental avx512fp16
andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvttph_ epi16 Experimental avx512fp16
andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvttph_ epi32 Experimental avx512fp16
andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvttph_ epi64 Experimental avx512fp16
andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvttph_ epu16 Experimental avx512fp16
andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvttph_ epu32 Experimental avx512fp16
andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvttph_ epu64 Experimental avx512fp16
andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvtxph_ ps Experimental avx512fp16
andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ cvtxps_ ph Experimental avx512fp16
andavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ div_ ph Experimental avx512fp16
andavx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ fcmadd_ pch Experimental avx512fp16
andavx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_
maskz_ fcmul_ pch Experimental avx512fp16
andavx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_
maskz_ fmadd_ pch Experimental avx512fp16
andavx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_
maskz_ fmadd_ ph Experimental avx512fp16
andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ fmaddsub_ ph Experimental avx512fp16
andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
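Which lanes add and which subtract is easy to misread from the wording of the fmaddsub and fmsubadd entries, so here is a scalar sketch. It assumes the usual x86 convention that fmaddsub subtracts c in even-indexed lanes and adds it in odd-indexed lanes, with fmsubadd doing the opposite. The helper names are hypothetical, masking is left out, and f32 stands in for the half-precision element type.

```rust
/// Scalar model (hypothetical helper) of fmaddsub without masking:
/// even lanes compute a*b - c, odd lanes compute a*b + c.
/// All three slices are assumed to have the same length.
fn fmaddsub_model(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    (0..a.len())
        .map(|i| {
            if i % 2 == 0 {
                a[i].mul_add(b[i], -c[i])
            } else {
                a[i].mul_add(b[i], c[i])
            }
        })
        .collect()
}

/// Scalar model (hypothetical helper) of fmsubadd without masking:
/// even lanes compute a*b + c, odd lanes compute a*b - c.
fn fmsubadd_model(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    (0..a.len())
        .map(|i| {
            if i % 2 == 0 {
                a[i].mul_add(b[i], c[i])
            } else {
                a[i].mul_add(b[i], -c[i])
            }
        })
        .collect()
}
```

For a = 1, b = 2, c = 1 in every lane, the fmaddsub model yields 1, 3, 1, 3 and the fmsubadd model yields 3, 1, 3, 1.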
- _mm256_
maskz_ fmsub_ ph Experimental avx512fp16
andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ fmsubadd_ ph Experimental avx512fp16
andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ fmul_ pch Experimental avx512fp16
andavx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_
maskz_ fnmadd_ ph Experimental avx512fp16
andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ fnmsub_ ph Experimental avx512fp16
andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ getexp_ ph Experimental avx512fp16
andavx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm256_
maskz_ getmant_ ph Experimental avx512fp16
andavx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm256_
maskz_ max_ ph Experimental avx512fp16
andavx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm256_
maskz_ min_ ph Experimental avx512fp16
andavx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm256_
maskz_ mul_ pch Experimental avx512fp16
andavx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_
maskz_ mul_ ph Experimental avx512fp16
andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ rcp_ ph Experimental avx512fp16
andavx512vl
- Compute the approximate reciprocal of packed 16-bit floating-point elements in a and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_
maskz_ reduce_ ph Experimental avx512fp16
andavx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ roundscale_ ph Experimental avx512fp16
andavx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ rsqrt_ ph Experimental avx512fp16
andavx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_
maskz_ scalef_ ph Experimental avx512fp16
andavx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ sqrt_ ph Experimental avx512fp16
andavx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
maskz_ sub_ ph Experimental avx512fp16
andavx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_
max_ ph Experimental avx512fp16
andavx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm256_
min_ ph Experimental avx512fp16
andavx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm256_
mul_ pch Experimental avx512fp16
andavx512vl
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_
mul_ ph Experimental avx512fp16
andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
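The element-wise multiply (ph) and the complex multiplies (pch) read the same lanes very differently. The scalar sketch below contrasts them. It assumes the interleaved layout given in the pch entries (even lane = real part, odd lane = imaginary part), uses hypothetical helper names, ignores masking, and uses f32 as a stand-in for the half-precision element type.

```rust
/// Element-wise product, modelling the ph form (hypothetical helper).
fn mul_ph_model(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter().zip(b).map(|(x, y)| x * y).collect()
}

/// Complex product, modelling the mul_pch / fmul_pch forms: each pair
/// of adjacent lanes (re, im) is multiplied as a complex number.
fn mul_pch_model(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.chunks_exact(2)
        .zip(b.chunks_exact(2))
        .flat_map(|(x, y)| {
            let (ar, ai, br, bi) = (x[0], x[1], y[0], y[1]);
            [ar * br - ai * bi, ar * bi + ai * br]
        })
        .collect()
}

/// Conjugate product, modelling the cmul_pch / fcmul_pch forms:
/// a multiplied by the complex conjugate of b.
fn cmul_pch_model(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.chunks_exact(2)
        .zip(b.chunks_exact(2))
        .flat_map(|(x, y)| {
            let (ar, ai, br, bi) = (x[0], x[1], y[0], y[1]);
            [ar * br + ai * bi, ai * br - ar * bi]
        })
        .collect()
}
```

For a = [1, 2, 3, 4] and b = [5, 6, 7, 8], the element-wise model gives [5, 12, 21, 32], the complex model gives [-7, 16, -11, 52], and the conjugate model gives [17, 4, 53, 4].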
- _mm256_
permutex2var_ ph Experimental avx512fp16
andavx512vl
- Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
- _mm256_
permutexvar_ ph Experimental avx512fp16
andavx512vl
- Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
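A scalar sketch of the two shuffles for the 16-lane (256-bit) case follows. It assumes the convention used by the 16-bit integer permutes: the low 4 bits of each idx element select a lane, and for the two-source form bit 4 selects between a (0) and b (1). The helper names are hypothetical and f32 stands in for the half-precision element type.

```rust
/// Number of 16-bit lanes in a 256-bit vector.
const LANES: usize = 16;

/// Model of permutexvar (hypothetical helper): dst[i] = a[idx[i] mod 16].
fn permutexvar_model(idx: [u16; LANES], a: [f32; LANES]) -> [f32; LANES] {
    let mut dst = [0.0; LANES];
    for i in 0..LANES {
        dst[i] = a[(idx[i] & 0xF) as usize];
    }
    dst
}

/// Model of permutex2var (hypothetical helper): bit 4 of idx[i] picks
/// the source vector, the low 4 bits pick the lane within it.
fn permutex2var_model(
    a: [f32; LANES],
    idx: [u16; LANES],
    b: [f32; LANES],
) -> [f32; LANES] {
    let mut dst = [0.0; LANES];
    for i in 0..LANES {
        let lane = (idx[i] & 0xF) as usize;
        dst[i] = if idx[i] & 0x10 != 0 { b[lane] } else { a[lane] };
    }
    dst
}
```

An idx of 15, 14, ..., 0 passed to the single-source model reverses the vector.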
- _mm256_
rcp_ ph Experimental avx512fp16
andavx512vl
- Compute the approximate reciprocal of packed 16-bit floating-point elements in a and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_
reduce_ add_ ph Experimental avx512fp16
andavx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
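The reduce_ family folds all 16 lanes into one scalar. The sketch below models that with a pairwise halving scheme (fold the upper half onto the lower half until one lane is left). The halving order is an assumption made for illustration; the entries here do not state the evaluation order, and for floating-point addition and multiplication the order can change the rounded result. `reduce_model` is a hypothetical helper, and f32 stands in for the half-precision element type.

```rust
/// Scalar model (hypothetical helper) of a horizontal reduction over
/// 16 lanes, combining the upper half with the lower half repeatedly.
/// The combination order is an assumption, not documented behaviour.
fn reduce_model(a: [f32; 16], op: impl Fn(f32, f32) -> f32) -> f32 {
    let mut v = a.to_vec();
    while v.len() > 1 {
        let half = v.len() / 2;
        let (lo, hi) = v.split_at(half);
        let folded: Vec<f32> = lo.iter().zip(hi).map(|(x, y)| op(*x, *y)).collect();
        v = folded;
    }
    v[0]
}

/// Sum of all lanes.
fn reduce_add_model(a: [f32; 16]) -> f32 {
    reduce_model(a, |x, y| x + y)
}

/// Product of all lanes.
fn reduce_mul_model(a: [f32; 16]) -> f32 {
    reduce_model(a, |x, y| x * y)
}

/// Maximum of all lanes. f32::max handles NaN differently from the
/// hardware max, so NaN inputs are outside the scope of this model.
fn reduce_max_model(a: [f32; 16]) -> f32 {
    reduce_model(a, f32::max)
}

/// Minimum of all lanes, with the same NaN caveat as the maximum.
fn reduce_min_model(a: [f32; 16]) -> f32 {
    reduce_model(a, f32::min)
}
```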
- _mm256_
reduce_ max_ ph Experimental avx512fp16
andavx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm256_
reduce_ min_ ph Experimental avx512fp16
andavx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm256_
reduce_ mul_ ph Experimental avx512fp16
andavx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
- _mm256_
reduce_ ph Experimental avx512fp16
andavx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
- _mm256_
roundscale_ ph Experimental avx512fp16
andavx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
- _mm256_
rsqrt_ ph Experimental avx512fp16
andavx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_
scalef_ ph Experimental avx512fp16
andavx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm256_
set1_ ph Experimental avx512fp16
- Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
- _mm256_
set_ ph Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
- _mm256_
setr_ ph Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
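The argument order of set_ph and setr_ph is a common source of confusion. The sketch below assumes the convention of the other x86 set/setr intrinsics: set takes its arguments from the highest lane down to lane 0, while setr takes them starting at lane 0. The helpers are hypothetical, take the arguments as an array in call order, and use f32 as a stand-in for the half-precision element type.

```rust
/// Model of set_ph (hypothetical helper): the first argument lands in
/// the highest lane, the last argument in lane 0.
/// Returns the lanes in memory order (index 0 = lowest lane).
fn set_model(args_in_call_order: [f32; 16]) -> [f32; 16] {
    let mut lanes = args_in_call_order;
    lanes.reverse();
    lanes
}

/// Model of setr_ph (hypothetical helper): the first argument lands in
/// lane 0, so call order and memory order coincide.
fn setr_model(args_in_call_order: [f32; 16]) -> [f32; 16] {
    args_in_call_order
}
```

Calling the set model with 15, 14, ..., 0 and the setr model with 0, 1, ..., 15 produces the same lanes.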
- _mm256_
setzero_ ph Experimental avx512fp16
andavx512vl
- Return vector of type __m256h with all elements set to zero.
- _mm256_
sqrt_ ph Experimental avx512fp16
andavx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
- _mm256_
store_ ph ⚠ Experimental avx512fp16
andavx512vl
- Store 256-bits (composed of 16 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 32 bytes or a general-protection exception may be generated.
- _mm256_
storeu_ ph ⚠ Experimental avx512fp16
andavx512vl
- Store 256-bits (composed of 16 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
- _mm256_
sub_ ph Experimental avx512fp16
andavx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
- _mm256_
undefined_ ph Experimental avx512fp16
andavx512vl
- Return vector of type __m256h with indeterminate elements. Despite using the word "undefined" (following Intel's naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm256_
zextph128_ ph256 Experimental avx512fp16
- Cast vector of type __m128h to type __m256h. The upper 8 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_
abs_ ph Experimental avx512fp16
- Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the result in dst.
- _mm512_
add_ ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_
add_ round_ ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
castpd_ ph Experimental avx512fp16
- Cast vector of type __m512d to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_
castph128_ ph512 Experimental avx512fp16
- Cast vector of type __m128h to type __m512h. The upper 24 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_
castph256_ ph512 Experimental avx512fp16
- Cast vector of type __m256h to type __m512h. The upper 16 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_
castph512_ ph128 Experimental avx512fp16
- Cast vector of type __m512h to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_
castph512_ ph256 Experimental avx512fp16
- Cast vector of type __m512h to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_
castph_ pd Experimental avx512fp16
- Cast vector of type __m512h to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_
castph_ ps Experimental avx512fp16
- Cast vector of type __m512h to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_
castph_ si512 Experimental avx512fp16
- Cast vector of type __m512h to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_
castps_ ph Experimental avx512fp16
- Cast vector of type __m512 to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_
castsi512_ ph Experimental avx512fp16
- Cast vector of type __m512i to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_
cmp_ ph_ mask Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
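How a comparison fills the mask vector k can be sketched with a scalar model over 32 lanes. Only the first eight predicate encodings are modelled (0 = EQ_OQ, 1 = LT_OS, 2 = LE_OS, 3 = UNORD_Q, 4 = NEQ_UQ, 5 = NLT_US, 6 = NLE_US, 7 = ORD_Q), and the signalling versus quiet distinction is ignored. `cmp_mask_model` is a hypothetical helper, and f32 stands in for the half-precision element type.

```rust
/// Scalar model (hypothetical helper) of a packed compare into a
/// 32-bit mask: bit i of the result is set when the predicate holds
/// for lane i. Returns None for predicate encodings not modelled here.
fn cmp_mask_model(a: [f32; 32], b: [f32; 32], imm8: u8) -> Option<u32> {
    let mut k = 0u32;
    for i in 0..32 {
        let (x, y) = (a[i], b[i]);
        let hit = match imm8 {
            0 => x == y,                     // equal, ordered
            1 => x < y,                      // less than
            2 => x <= y,                     // less than or equal
            3 => x.is_nan() || y.is_nan(),   // unordered
            4 => x != y,                     // not equal, or unordered
            5 => !(x < y),                   // not less than
            6 => !(x <= y),                  // not less than or equal
            7 => !x.is_nan() && !y.is_nan(), // ordered
            _ => return None,
        };
        if hit {
            k |= 1 << i;
        }
    }
    Some(k)
}
```

A NaN in either operand makes every ordered predicate false for that lane, which is why the lane's bit stays clear for the less-than predicate but is set for the unordered one.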
- _mm512_
cmp_ round_ ph_ mask Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_
cmul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
cmul_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
conj_ pch Experimental avx512fp16
- Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
cvt_ roundepi16_ ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
cvt_ roundepi32_ ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
cvt_ roundepi64_ ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
cvt_ roundepu16_ ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
cvt_ roundepu32_ ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
cvt_ roundepu64_ ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
cvt_ roundpd_ ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
cvt_ roundph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm512_
cvt_ roundph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_
cvt_ roundph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm512_
cvt_ roundph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm512_
cvt_ roundph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
- _mm512_
cvt_ roundph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm512_
cvt_ roundph_ pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_
cvtepi16_ ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
cvtepi32_ ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
cvtepi64_ ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
cvtepu16_ ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
cvtepu32_ ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
cvtepu64_ ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
cvtpd_ ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
cvtph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm512_
cvtph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_
cvtph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm512_
cvtph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm512_
cvtph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
- _mm512_
cvtph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm512_
cvtph_ pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_
cvtsh_ h Experimental avx512fp16
- Copy the lower half-precision (16-bit) floating-point element from a to dst.
- _mm512_
cvtt_ roundph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm512_
cvtt_ roundph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm512_
cvtt_ roundph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm512_
cvtt_ roundph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm512_
cvtt_ roundph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm512_
cvtt_ roundph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
- _mm512_
cvttph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm512_
cvttph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm512_
cvttph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm512_
cvttph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm512_
cvttph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm512_
cvttph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
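The difference between the `cvtph_*` and `cvttph_*` conversions above can be sketched with a scalar model of a single lane. This is only an illustration under stated assumptions: `f32` stands in for the 16-bit lane type, the helper names `round_ties_even`, `cvt_lane` and `cvtt_lane` are hypothetical, and the non-truncating form is assumed to run with the default MXCSR rounding mode (round to nearest, ties to even). Out-of-range inputs are not modeled; Rust's `as` cast saturates, which is not necessarily what the hardware returns.

```rust
/// Round to nearest, ties to even: the default MXCSR rounding mode
/// (assumed here) for the non-truncating conversions.
fn round_ties_even(x: f32) -> f32 {
    let r = x.round(); // `round` resolves ties away from zero
    let is_tie = (x - x.trunc()).abs() == 0.5;
    if is_tie && r % 2.0 != 0.0 {
        r - x.signum() // step back toward zero to land on the even neighbour
    } else {
        r
    }
}

/// Scalar model of one lane of a `cvtph_epi16`-style conversion.
fn cvt_lane(x: f32) -> i16 {
    round_ties_even(x) as i16
}

/// Scalar model of one lane of a `cvttph_epi16`-style conversion:
/// the fractional part is discarded (round toward zero).
fn cvtt_lane(x: f32) -> i16 {
    x.trunc() as i16
}
```

In this model `cvt_lane(2.7)` gives 3 while `cvtt_lane(2.7)` gives 2, and a tie such as 2.5 rounds to the even value 2.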
- _mm512_
cvtx_ roundph_ ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_
cvtx_ roundps_ ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
cvtxph_ ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_
cvtxps_ ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_
div_ ph Experimental avx512fp16
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
- _mm512_
div_ round_ ph Experimental avx512fp16
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
fcmadd_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed
of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
fcmadd_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed
of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
fcmul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
fcmul_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
fmadd_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_
fmadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm512_
fmadd_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_
fmadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
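The complex (`pch`) entries above treat each pair of adjacent lanes as one complex number. A minimal scalar sketch of that arithmetic follows, assuming `f32` as a stand-in for the 16-bit lanes and ignoring any intermediate rounding the hardware performs. The names `Complex`, `cmul`, `cmul_conj`, `cmadd` and `cmadd_conj` are hypothetical helpers, not part of the API.

```rust
/// One complex number stored as two adjacent lanes: (real, imaginary).
/// `f32` stands in for the 16-bit lanes, purely for illustration.
type Complex = (f32, f32);

/// Plain complex product a * b, as described for the `fmul_pch` family.
fn cmul(a: Complex, b: Complex) -> Complex {
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}

/// Product of a with the complex conjugate of b, as described for the
/// `fcmul_pch` family.
fn cmul_conj(a: Complex, b: Complex) -> Complex {
    cmul(a, (b.0, -b.1))
}

/// Multiply and accumulate into c, as described for the `fmadd_pch` family.
fn cmadd(a: Complex, b: Complex, c: Complex) -> Complex {
    let p = cmul(a, b);
    (p.0 + c.0, p.1 + c.1)
}

/// Conjugate multiply and accumulate into c, as described for the
/// `fcmadd_pch` family.
fn cmadd_conj(a: Complex, b: Complex, c: Complex) -> Complex {
    let p = cmul_conj(a, b);
    (p.0 + c.0, p.1 + c.1)
}
```

For a = 1 + 2i and b = 3 + 4i, `cmul` gives -5 + 10i and `cmul_conj` gives 11 + 2i.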
- _mm512_
fmaddsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_
fmaddsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_
fmsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_
fmsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_
fmsubadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_
fmsubadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
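The lane pattern of the `fmaddsub` and `fmsubadd` entries above is easy to misread, so here is a scalar sketch. It assumes the usual x86 convention that `fmaddsub` subtracts c in even-indexed lanes and adds it in odd-indexed lanes, with `fmsubadd` doing the reverse. The function names are hypothetical, `f32` stands in for the 16-bit lanes, and all three slices are assumed to have the same length.

```rust
/// Scalar model of `fmaddsub`: even lanes compute a*b - c, odd lanes a*b + c.
/// Assumes `a`, `b` and `c` have equal length.
fn fmaddsub(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    (0..a.len())
        .map(|i| {
            // `mul_add` keeps the multiply and add fused, with a single rounding.
            if i % 2 == 0 {
                a[i].mul_add(b[i], -c[i])
            } else {
                a[i].mul_add(b[i], c[i])
            }
        })
        .collect()
}

/// Scalar model of `fmsubadd`: even lanes compute a*b + c, odd lanes a*b - c.
/// Assumes `a`, `b` and `c` have equal length.
fn fmsubadd(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    (0..a.len())
        .map(|i| {
            if i % 2 == 0 {
                a[i].mul_add(b[i], c[i])
            } else {
                a[i].mul_add(b[i], -c[i])
            }
        })
        .collect()
}
```

With a = [1, 2, 3, 4], b = [2, 2, 2, 2] and c = [1, 1, 1, 1], the first model yields [1, 5, 5, 9] and the second [3, 3, 7, 7].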
- _mm512_
fmul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed
of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_
fmul_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed
of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
fnmadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm512_
fnmadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm512_
fnmsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm512_
fnmsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
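The four fused multiply-add variants described above differ only in the signs applied to the product and to c. A per-lane sketch, using `f32` in place of the 16-bit lane type and hypothetical helper names:

```rust
/// a*b + c
fn fmadd(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// a*b - c
fn fmsub(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c)
}

/// -(a*b) + c: the product is subtracted from c
fn fnmadd(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c)
}

/// -(a*b) - c: c is subtracted from the negated product
fn fnmsub(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c)
}
```

For a = 2, b = 3, c = 1 these give 7, 5, -5 and -7 respectively.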
- _mm512_
fpclass_ ph_ mask Experimental avx512fp16
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm512_
getexp_ ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst.
This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_
getexp_ round_ ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst.
This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_
getmant_ ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the
interval range defined by norm and the sign depends on sign and the source sign.
- _mm512_
getmant_ round_ ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the
interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_
load_ ph Experimental avx512fp16
- Load 512 bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 64 bytes or a general-protection exception may be generated.
- _mm512_
loadu_ ph Experimental avx512fp16
- Load 512 bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
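The alignment requirement that separates the two load entries above can be checked before choosing between them. The sketch below is illustrative only: `Aligned64` and `is_aligned_64` are hypothetical names, and the 32 half-precision lanes are stored as raw `u16` bit patterns so that the example does not depend on an `f16` type.

```rust
/// 32 half-precision lanes (kept here as raw `u16` bit patterns) in a buffer
/// whose start is guaranteed to sit on a 64-byte boundary.
#[repr(C, align(64))]
struct Aligned64([u16; 32]);

/// Returns true when `ptr` meets the 64-byte requirement of the aligned
/// 512-bit load; otherwise the unaligned form is the one to use.
fn is_aligned_64(ptr: *const u16) -> bool {
    (ptr as usize) % 64 == 0
}
```

A pointer to the start of an `Aligned64` passes the check, while a pointer one element further in (2 bytes later) does not.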
- _mm512_
mask3_ fcmadd_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is
copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
mask3_ fcmadd_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c using writemask k (the element is copied from c when the corresponding
mask bit is not set), and store the results in dst. Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
mask3_ fmadd_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using writemask k (the element is copied from c when the corresponding
mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_
mask3_ fmadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_
mask3_ fmadd_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using writemask k (the element is copied from c when the corresponding
mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_
mask3_ fmadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_
mask3_ fmaddsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_
mask3_ fmaddsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_
mask3_ fmsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_
mask3_ fmsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_
mask3_ fmsubadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_
mask3_ fmsubadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_
mask3_ fnmadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_
mask3_ fnmadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_
mask3_ fnmsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_
mask3_ fnmsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_
mask_ add_ ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ add_ round_ ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
mask_ blend_ ph Experimental avx512fp16
- Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
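The masked entries above differ mainly in where an unselected lane gets its value: from src (`mask_`), from c (`mask3_`), or from a (`mask_blend`). A scalar sketch of the three cases follows, under the assumptions that bit i of k controls lane i, that `f32` stands in for the 16-bit lanes, that all slices have equal length of at most 32, and that the helper names are hypothetical.

```rust
/// True when bit `i` of the writemask `k` is set.
fn bit(k: u32, i: usize) -> bool {
    (k >> i) & 1 == 1
}

/// `mask_add`-style: a + b where the bit is set, otherwise copied from `src`.
fn mask_add(src: &[f32], k: u32, a: &[f32], b: &[f32]) -> Vec<f32> {
    (0..a.len())
        .map(|i| if bit(k, i) { a[i] + b[i] } else { src[i] })
        .collect()
}

/// `mask3_fmadd`-style: a*b + c where the bit is set, otherwise copied from `c`.
fn mask3_fmadd(a: &[f32], b: &[f32], c: &[f32], k: u32) -> Vec<f32> {
    (0..a.len())
        .map(|i| if bit(k, i) { a[i].mul_add(b[i], c[i]) } else { c[i] })
        .collect()
}

/// `mask_blend`-style: the lane comes from `b` where the bit is set,
/// otherwise from `a`.
fn mask_blend(k: u32, a: &[f32], b: &[f32]) -> Vec<f32> {
    (0..a.len())
        .map(|i| if bit(k, i) { b[i] } else { a[i] })
        .collect()
}
```

With k = 0b0101, only lanes 0 and 2 take the computed (or b) value; lanes 1 and 3 keep the fallback.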
- _mm512_
mask_ cmp_ ph_ mask Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
mask_ cmp_ round_ ph_ mask Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
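For the masked comparisons above, the result is itself a bit mask, and the input mask zeroes out the result bits of unselected lanes. The sketch below models only a less-than predicate; the real intrinsics pick the predicate through imm8, which is not modeled here. The name `mask_cmp_lt` is hypothetical, `f32` stands in for the 16-bit lanes, and the slices are assumed to have equal length of at most 32.

```rust
/// Scalar model of a masked less-than comparison: result bit `i` is set only
/// when bit `i` of `k1` is set and a[i] < b[i]; all other bits stay zero.
fn mask_cmp_lt(k1: u32, a: &[f32], b: &[f32]) -> u32 {
    let mut k = 0u32;
    for i in 0..a.len() {
        let selected = (k1 >> i) & 1 == 1;
        if selected && a[i] < b[i] {
            k |= 1 << i;
        }
    }
    k
}
```

With a = [1, 5, 2, 0] and b = [3, 3, 3, 3], lanes 0, 2 and 3 compare true, so a full mask gives 0b1101 and k1 = 0b1011 gives 0b1001.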
- _mm512_
mask_ cmul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate
conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
mask_ cmul_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate
conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
mask_ conj_ pch Experimental avx512fp16
- Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k
(the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two
adjacent half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
mask_ cvt_ roundepi16_ ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvt_ roundepi32_ ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvt_ roundepi64_ ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvt_ roundepu16_ ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvt_ roundepu32_ ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvt_ roundepu64_ ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvt_ roundpd_ ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvt_ roundph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvt_ roundph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvt_ roundph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvt_ roundph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvt_ roundph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvt_ roundph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvt_ roundph_ pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvtepi16_ ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvtepi32_ ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvtepi64_ ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvtepu16_ ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvtepu32_ ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvtepu64_ ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvtpd_ ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvtph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvtph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvtph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvtph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvtph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvtph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvtph_ pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvtt_ roundph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvtt_ roundph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvtt_ roundph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvtt_ roundph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvtt_ roundph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvtt_ roundph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvttph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
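The cvtph_* and cvttph_* entries differ only in how the fractional part is handled. The sketch below models a single lane, with f32 standing in for a half-precision value. The function names are hypothetical, round-to-nearest-even is assumed as the rounding mode of the non-truncating form, and out-of-range or NaN inputs are not modeled.

```rust
/// Round to the nearest integer, ties to even.
/// Assumption: this is the rounding mode in effect for the non-truncating form.
fn round_ties_even_model(x: f32) -> f32 {
    let r = x.round(); // `round` breaks ties away from zero
    let is_tie = (x - x.trunc()).abs() == 0.5;
    if is_tie && r % 2.0 != 0.0 {
        r - x.signum() // step back to the even neighbour
    } else {
        r
    }
}

/// Hypothetical model of one lane of a cvtph_epi16-style conversion.
fn cvt_lane_model(x: f32) -> i16 {
    round_ties_even_model(x) as i16
}

/// Hypothetical model of one lane of a cvttph_epi16-style conversion:
/// the fractional part is discarded (rounding toward zero).
fn cvtt_lane_model(x: f32) -> i16 {
    x.trunc() as i16
}
```

For an input of 2.7 the first model yields 3 and the truncating model yields 2; for -2.7 they yield -3 and -2.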
- _mm512_
mask_ cvttph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvttph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvttph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvttph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvttph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ cvtx_ roundph_ ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvtx_ roundps_ ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvtxph_ ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ cvtxps_ ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_
mask_ div_ ph Experimental avx512fp16
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
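The writemask behaviour described for the mask_ entries can be sketched in scalar code. This is only a model: 8 f32 lanes and a u8 mask stand in for the 32 half-precision lanes and 32-bit mask of the 512-bit form, and mask_div_model is a hypothetical name, not part of the API.

```rust
/// Scalar model of writemask (merge-masking) semantics for a division.
/// Assumption: 8 f32 lanes stand in for the 32 half-precision lanes.
fn mask_div_model(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = src; // lanes whose mask bit is clear keep the value from `src`
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] / b[i]; // lanes whose mask bit is set receive the result
        }
    }
    dst
}
```

With k = 0b0000_0101, only lanes 0 and 2 receive a[i] / b[i]; every other lane is copied from src.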
- _mm512_
mask_ div_ round_ ph Experimental avx512fp16
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
mask_ fcmadd_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm512_
mask_ fcmadd_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm512_
mask_ fcmul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm512_
mask_ fcmul_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm512_
mask_ fmadd_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm512_
mask_ fmadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
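For the complex (pch) entries in this list, each pair of adjacent elements is read as (real, imaginary). The sketch below models one such pair with f32 values; the type alias and function names are hypothetical. It shows the plain product used by the mul/fmul/fmadd forms and the conjugate product used by the cmul/fcmul/fcmadd forms, without the masking or accumulation steps.

```rust
/// One complex number stored as two adjacent lanes: (real, imaginary).
type Complex = (f32, f32);

/// a * b, the product used by the mul_pch / fmul_pch style entries.
fn mul_model(a: Complex, b: Complex) -> Complex {
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}

/// a * conj(b), the product used by the cmul_pch / fcmul_pch style entries.
/// conj(b) = b.0 - i * b.1, so the signs of the b.1 terms flip.
fn cmul_model(a: Complex, b: Complex) -> Complex {
    (a.0 * b.0 + a.1 * b.1, a.1 * b.0 - a.0 * b.1)
}

/// a * b + c, the accumulate step described for the fmadd_pch style entries.
fn fmadd_model(a: Complex, b: Complex, c: Complex) -> Complex {
    let p = mul_model(a, b);
    (p.0 + c.0, p.1 + c.1)
}
```

For a = 1 + 2i and b = 3 + 4i, the plain product is -5 + 10i and the conjugate product is 11 + 2i.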
- _mm512_
mask_ fmadd_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm512_
mask_ fmadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_
mask_ fmaddsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
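The add/subtract pattern of the fmaddsub entries can be sketched per lane. This model assumes the convention of Intel's other fmaddsub instructions, where even-indexed lanes subtract c and odd-indexed lanes add c; fmsubadd is assumed to be the mirror image. The function name is hypothetical, 4 f32 lanes stand in for the real vector, masking is omitted, and the model rounds the product and the sum separately, whereas a fused operation rounds once.

```rust
/// Scalar model of the fmaddsub lane pattern.
/// Assumption: even-indexed lanes compute a*b - c, odd-indexed lanes a*b + c.
fn fmaddsub_model(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut dst = [0.0; 4];
    for i in 0..4 {
        let prod = a[i] * b[i]; // intermediate result (rounded here, unlike a fused op)
        dst[i] = if i % 2 == 0 { prod - c[i] } else { prod + c[i] };
    }
    dst
}
```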
- _mm512_
mask_ fmaddsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_
mask_ fmsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_
mask_ fmsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_
mask_ fmsubadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_
mask_ fmsubadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_
mask_ fmul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm512_
mask_ fmul_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
mask_ fnmadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_
mask_ fnmadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_
mask_ fnmsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_
mask_ fnmsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_
mask_ fpclass_ ph_ mask Experimental avx512fp16
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of:
- _mm512_
mask_ getexp_ ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates `floor(log2(x))` for each element.
- _mm512_
mask_ getexp_ round_ ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates `floor(log2(x))` for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_
mask_ getmant_ ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates `±(2^k)*|x.significand|`, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm512_
mask_ getmant_ round_ ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates `±(2^k)*|x.significand|`, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_
mask_ max_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
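The non-IEEE behaviour noted for the max and min entries can be illustrated with a one-lane model. It assumes the rule documented for the older x86 packed max instructions: the second operand is returned whenever either input is NaN or both inputs are zeros. The function name is hypothetical and f32 stands in for a half-precision lane.

```rust
/// Scalar model of one lane of an x86-style packed maximum.
/// Assumption: when the comparison `a > b` is false (including NaN inputs and
/// +0.0 vs -0.0), the second operand `b` is returned unchanged.
fn max_lane_model(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}
```

Under this model the result depends on operand order: max_lane_model(1.0, NaN) is NaN, while max_lane_model(NaN, 1.0) is 1.0. Likewise max_lane_model(0.0, -0.0) returns -0.0.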
- _mm512_
mask_ max_ round_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_
mask_ min_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_
mask_ min_ round_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_
mask_ mul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm512_
mask_ mul_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ mul_ round_ pch Experimental avx512fp16
- Multiply the packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm512_
mask_ mul_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
mask_ rcp_ ph Experimental avx512fp16
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than `1.5*2^-12`.
- _mm512_
mask_ reduce_ ph Experimental avx512fp16
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ reduce_ round_ ph Experimental avx512fp16
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ roundscale_ ph Experimental avx512fp16
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
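Rounding "to the number of fraction bits" can be read as scaling by 2^M, rounding to an integer, and scaling back, and the reduced argument of the reduce entries as what is left over. The sketch below models one lane under that reading. It assumes M is passed directly rather than decoded from imm8, uses round-down as the rounding mode for simplicity, and uses hypothetical function names with f32 in place of a half-precision lane.

```rust
/// Scalar model of roundscale for one lane: keep `fraction_bits` binary
/// fraction bits. Assumption: round-down (floor) is the selected rounding mode.
fn roundscale_lane_model(x: f32, fraction_bits: u32) -> f32 {
    let scale = (1u32 << fraction_bits) as f32; // 2^M
    (x * scale).floor() / scale
}

/// Scalar model of reduce for one lane: the part of `x` that roundscale
/// discards. Assumption: reduce(x) = x - roundscale(x) with the same M.
fn reduce_lane_model(x: f32, fraction_bits: u32) -> f32 {
    x - roundscale_lane_model(x, fraction_bits)
}
```

With two fraction bits, 1.37 rounds down to 1.25 (a multiple of 1/4), and the reduced argument is about 0.12.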
- _mm512_
mask_ roundscale_ round_ ph Experimental avx512fp16
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_
mask_ rsqrt_ ph Experimental avx512fp16
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than `1.5*2^-12`.
- _mm512_
mask_ scalef_ ph Experimental avx512fp16
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
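"Scale ... using values from b" is commonly defined for the x86 scalef family as a * 2^floor(b). The one-lane sketch below assumes that definition applies here as well; the function name is hypothetical, f32 stands in for a half-precision lane, and special cases (NaN, infinities, denormals, overflow) are not modeled.

```rust
/// Scalar model of scalef for one lane.
/// Assumption: the result is a * 2^floor(b), with no special-case handling.
fn scalef_lane_model(a: f32, b: f32) -> f32 {
    a * 2f32.powi(b.floor() as i32)
}
```

Both scalef_lane_model(3.0, 2.0) and scalef_lane_model(3.0, 2.9) give 12.0, because only the floor of b contributes to the exponent.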
- _mm512_
mask_ scalef_ round_ ph Experimental avx512fp16
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ sqrt_ ph Experimental avx512fp16
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ sqrt_ round_ ph Experimental avx512fp16
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
mask_ sub_ ph Experimental avx512fp16
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_
mask_ sub_ round_ ph Experimental avx512fp16
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
maskz_ add_ ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
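The zeromask behaviour of the maskz_ entries differs from the writemask forms only in what unselected lanes receive. A scalar sketch, again using 8 f32 lanes and a u8 mask as stand-ins for the real 32-lane vector and 32-bit mask, with a hypothetical function name:

```rust
/// Scalar model of zeromask (zero-masking) semantics for an addition.
/// Assumption: 8 f32 lanes stand in for the 32 half-precision lanes.
fn maskz_add_model(k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0; 8]; // lanes whose mask bit is clear are zeroed
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] + b[i]; // lanes whose mask bit is set receive the sum
        }
    }
    dst
}
```

There is no src argument in this form: with k = 0b1000_0001, lanes 0 and 7 hold a[i] + b[i] and all other lanes are 0.0.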
- _mm512_
maskz_ add_ round_ ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
maskz_ cmul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm512_
maskz_ cmul_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm512_
maskz_ conj_ pch Experimental avx512fp16
- Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm512_
maskz_ cvt_ roundepi16_ ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvt_ roundepi32_ ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvt_ roundepi64_ ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvt_ roundepu16_ ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvt_ roundepu32_ ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvt_ roundepu64_ ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvt_ roundpd_ ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvt_ roundph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvt_ roundph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvt_ roundph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvt_ roundph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvt_ roundph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvt_ roundph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvt_ roundph_ pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtepi16_ ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtepi32_ ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtepi64_ ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtepu16_ ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtepu32_ ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtepu64_ ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtpd_ ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtph_ pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtt_ roundph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtt_ roundph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtt_ roundph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtt_ roundph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtt_ roundph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtt_ roundph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvttph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvttph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvttph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvttph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvttph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvttph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
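The cvtt entries above differ from the cvt entries only in how the fractional part is discarded. As a rough scalar sketch of that difference (not the intrinsics themselves): `cvt_lane` and `cvtt_lane` are hypothetical helper names, `f32` stands in for the half-precision input, and the non-truncating form is assumed to run under the default round-to-nearest-even mode.

```rust
/// Hypothetical one-lane model of a non-truncating conversion,
/// assuming the default round-to-nearest-even rounding mode.
/// Out-of-range inputs are not modeled here.
fn cvt_lane(x: f32) -> i32 {
    x.round_ties_even() as i32
}

/// Hypothetical one-lane model of a truncating (cvtt) conversion:
/// the fractional part is dropped, rounding toward zero.
fn cvtt_lane(x: f32) -> i32 {
    x.trunc() as i32
}
```

In this model 2.7 converts to 3 without truncation and to 2 with truncation, and -2.7 converts to -3 and -2 respectively.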
- _mm512_
maskz_ cvtx_ roundph_ ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtx_ roundps_ ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtxph_ ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ cvtxps_ ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ div_ ph Experimental avx512fp16
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
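The zeromask behavior repeated throughout these entries can be illustrated with a small scalar model. This is only a sketch under stated assumptions: `maskz_div` is a hypothetical name, the vector is shortened to 4 lanes, and `f32` stands in for the half-precision element type.

```rust
/// Hypothetical 4-lane model of a zero-masked element-wise divide.
/// Bit i of `k` selects lane i; lanes whose bit is clear are zeroed.
fn maskz_div(k: u8, a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut dst = [0.0f32; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] / b[i];
        }
    }
    dst
}
```

With `k = 0b0101`, only lanes 0 and 2 receive a quotient; lanes 1 and 3 are zero regardless of the inputs.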
- _mm512_
maskz_ div_ round_ ph Experimental avx512fp16
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
maskz_ fcmadd_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
maskz_ fcmadd_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c using zeromask k (the element is zeroed out when the corresponding mask bit is not set), and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
maskz_ fcmul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
maskz_ fcmul_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_
maskz_ fmadd_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_
maskz_ fmadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
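The complex-number entries above (the pch forms) treat each adjacent pair of elements as one complex value. A minimal scalar sketch of the three operations they describe follows; `Complex`, `cmul`, `cmul_conj`, and `cfmadd` are hypothetical names, `f32` stands in for half precision, and rounding of intermediates is not modeled.

```rust
/// Hypothetical pair layout: (real, imaginary), i.e. (vec.fp16[0], vec.fp16[1]).
type Complex = (f32, f32);

/// a * b
fn cmul(a: Complex, b: Complex) -> Complex {
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}

/// a * conj(b), where conj(b) = (b.0, -b.1)
fn cmul_conj(a: Complex, b: Complex) -> Complex {
    (a.0 * b.0 + a.1 * b.1, a.1 * b.0 - a.0 * b.1)
}

/// a * b + c
fn cfmadd(a: Complex, b: Complex, c: Complex) -> Complex {
    let p = cmul(a, b);
    (p.0 + c.0, p.1 + c.1)
}
```

For a = 1 + 2i and b = 3 + 4i, the plain product is -5 + 10i, while the conjugate form gives 11 + 2i.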
- _mm512_
maskz_ fmadd_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_
maskz_ fmadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ fmaddsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ fmaddsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ fmsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ fmsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ fmsubadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ fmsubadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
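The fmaddsub and fmsubadd entries above alternate the sign applied to c by lane parity. The sketch below models that on 4 lanes, assuming the usual x86 convention that fmaddsub subtracts in even-indexed lanes and adds in odd-indexed lanes, with fmsubadd doing the opposite. The function names are hypothetical and `f32` stands in for half precision.

```rust
/// Hypothetical 4-lane model of fmaddsub:
/// even lanes compute a*b - c, odd lanes compute a*b + c.
fn fmaddsub(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut dst = [0.0f32; 4];
    for i in 0..4 {
        let addend = if i % 2 == 0 { -c[i] } else { c[i] };
        dst[i] = a[i].mul_add(b[i], addend); // fused multiply-add
    }
    dst
}

/// Hypothetical 4-lane model of fmsubadd:
/// even lanes compute a*b + c, odd lanes compute a*b - c.
fn fmsubadd(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut dst = [0.0f32; 4];
    for i in 0..4 {
        let addend = if i % 2 == 0 { c[i] } else { -c[i] };
        dst[i] = a[i].mul_add(b[i], addend);
    }
    dst
}
```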
- _mm512_
maskz_ fmul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_
maskz_ fmul_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
maskz_ fnmadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ fnmadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ fnmsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ fnmsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
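The four fused forms listed above (fmadd, fmsub, fnmadd, fnmsub) differ only in the signs applied to the product and to c. A one-lane sketch, with hypothetical function names and `f32` standing in for half precision:

```rust
/// a*b + c
fn fmadd(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// a*b - c
fn fmsub(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c)
}

/// -(a*b) + c: the intermediate product is subtracted from c
fn fnmadd(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c)
}

/// -(a*b) - c: c is subtracted from the negated product
fn fnmsub(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c)
}
```

With a = 2, b = 3, c = 10, these give 16, -4, 4, and -16 respectively.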
- _mm512_
maskz_ getexp_ ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_
maskz_ getexp_ round_ ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_
maskz_ getmant_ ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm512_
maskz_ getmant_ round_ ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_
maskz_ max_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_
maskz_ max_ round_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_
maskz_ min_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_
maskz_ min_ round_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_
maskz_ mul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_
maskz_ mul_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ mul_ round_ pch Experimental avx512fp16
- Multiply the packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_
maskz_ mul_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
maskz_ rcp_ ph Experimental avx512fp16
- Compute the approximate reciprocal of packed 16-bit floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_
maskz_ reduce_ ph Experimental avx512fp16
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ reduce_ round_ ph Experimental avx512fp16
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ roundscale_ ph Experimental avx512fp16
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ roundscale_ round_ ph Experimental avx512fp16
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_
maskz_ rsqrt_ ph Experimental avx512fp16
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_
maskz_ scalef_ ph Experimental avx512fp16
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ scalef_ round_ ph Experimental avx512fp16
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ sqrt_ ph Experimental avx512fp16
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ sqrt_ round_ ph Experimental avx512fp16
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
maskz_ sub_ ph Experimental avx512fp16
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_
maskz_ sub_ round_ ph Experimental avx512fp16
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
max_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_
max_ round_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_
min_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_
min_ round_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
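The note that max and min do not follow IEEE 754 for NaN or signed-zero inputs can be made concrete with a one-lane model. This sketch assumes the standard x86 max/min rule, where the second operand is returned whenever the comparison is false (which includes any NaN input and the case where both inputs are zeros). `max_lane` and `min_lane` are hypothetical names and `f32` stands in for half precision.

```rust
/// Hypothetical one-lane model of the x86 max rule:
/// return a only when a > b is true, otherwise return b.
fn max_lane(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// Hypothetical one-lane model of the x86 min rule:
/// return a only when a < b is true, otherwise return b.
fn min_lane(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

Under this model the result depends on operand order: `max_lane(NaN, 1.0)` is 1.0 but `max_lane(1.0, NaN)` is NaN, and `max_lane(0.0, -0.0)` is -0.0. By contrast, Rust's `f32::max` returns the non-NaN operand in either position.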
- _mm512_
mul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_
mul_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_
mul_ round_ pch Experimental avx512fp16
- Multiply the packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_
mul_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
permutex2var_ ph Experimental avx512fp16
- Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
- _mm512_
permutexvar_ ph Experimental avx512fp16
- Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
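The single-source shuffle above can be modeled as an indexed gather within one vector. In this sketch `permutexvar` is a hypothetical scalar stand-in, `f32` replaces the half-precision element type, and only the low 5 bits of each index are assumed to be used, since a 512-bit vector holds 32 such elements.

```rust
/// Hypothetical 32-lane model: dst[i] = a[idx[i] mod 32].
fn permutexvar(idx: [u16; 32], a: [f32; 32]) -> [f32; 32] {
    let mut dst = [0.0f32; 32];
    for i in 0..32 {
        // Assumption: only the low 5 bits of each index select a lane.
        dst[i] = a[(idx[i] & 31) as usize];
    }
    dst
}
```

Passing indices 31, 30, ..., 0 reverses the vector; passing the same index in every position broadcasts one lane.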
- _mm512_
rcp_ ph Experimental avx512fp16
- Compute the approximate reciprocal of packed 16-bit floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_
reduce_ add_ ph Experimental avx512fp16
- Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
- _mm512_
reduce_ max_ ph Experimental avx512fp16
- Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm512_
reduce_ min_ ph Experimental avx512fp16
- Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm512_
reduce_ mul_ ph Experimental avx512fp16
- Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
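The four horizontal reductions above each collapse a vector to a single value. A scalar sketch follows, using hypothetical function names and `f32` in place of half precision. It folds left to right, which is an assumption: the entries do not state the order in which lanes are combined, and for floating-point addition and multiplication the order can affect the rounded result.

```rust
/// Sum of all lanes (left-to-right order assumed).
fn reduce_add(a: &[f32]) -> f32 {
    a.iter().sum()
}

/// Product of all lanes (left-to-right order assumed).
fn reduce_mul(a: &[f32]) -> f32 {
    a.iter().product()
}

/// Largest lane; NaN handling of the real intrinsic is not modeled.
fn reduce_max(a: &[f32]) -> f32 {
    a.iter().copied().fold(f32::NEG_INFINITY, |m, x| if x > m { x } else { m })
}

/// Smallest lane; NaN handling of the real intrinsic is not modeled.
fn reduce_min(a: &[f32]) -> f32 {
    a.iter().copied().fold(f32::INFINITY, |m, x| if x < m { x } else { m })
}
```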
- _mm512_
reduce_ ph Experimental avx512fp16
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
- _mm512_
reduce_ round_ ph Experimental avx512fp16
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
- _mm512_
roundscale_ ph Experimental avx512fp16
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
- _mm512_
roundscale_ round_ ph Experimental avx512fp16
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
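The roundscale and reduce entries above are closely related: one keeps a chosen number of fraction bits, the other returns what was discarded. The sketch below is a one-lane model under several assumptions: `m` is the fraction-bit count already extracted from imm8 (the bit layout of imm8 is not modeled), rounding is to nearest even, `f32` stands in for half precision, and the function names are hypothetical.

```rust
/// Hypothetical model: round x to m fraction bits,
/// i.e. to the nearest multiple of 2^-m (ties to even).
fn roundscale(x: f32, m: u32) -> f32 {
    let scale = 2.0f32.powi(m as i32);
    (x * scale).round_ties_even() / scale
}

/// Hypothetical model of the reduced argument:
/// the part of x removed by roundscale.
fn reduce(x: f32, m: u32) -> f32 {
    x - roundscale(x, m)
}
```

For example, with one fraction bit 1.3 rounds to 1.5, so the reduced argument is about -0.2.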
- _mm512_
rsqrt_ ph Experimental avx512fp16
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_
scalef_ ph Experimental avx512fp16
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm512_
scalef_ round_ ph Experimental avx512fp16
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
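The scalef entries above say only that a is scaled using values from b. Assuming the usual x86 scalef definition, each result is a multiplied by 2 raised to floor(b). A one-lane sketch with a hypothetical function name, `f32` in place of half precision, and no modeling of special values such as NaN or infinity:

```rust
/// Hypothetical one-lane model: a * 2^floor(b).
fn scalef(a: f32, b: f32) -> f32 {
    a * 2.0f32.powi(b.floor() as i32)
}
```

So scaling 3.0 by 2.7 gives 3.0 * 2^2 = 12.0, and scaling 1.0 by -1.5 gives 2^-2 = 0.25.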
- _mm512_
set1_ ph Experimental avx512fp16
- Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
- _mm512_
set_ ph Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
- _mm512_
setr_ ph Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
- _mm512_
setzero_ ph Experimental avx512fp16
- Return vector of type __m512h with all elements set to zero.
- _mm512_
sqrt_ ph Experimental avx512fp16
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
- _mm512_
sqrt_ round_ ph Experimental avx512fp16
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
store_ ph Experimental avx512fp16
- Store 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 64 bytes or a general-protection exception may be generated.
- _mm512_
storeu_ ph Experimental avx512fp16
- Store 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
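The difference between the aligned and unaligned stores above comes down to whether the destination address is a multiple of 64. One way to obtain and check such a buffer in Rust is sketched below. `Aligned64` and `is_aligned_64` are hypothetical helpers, and `u16` is used as a 16-bit placeholder for the half-precision element type.

```rust
/// Hypothetical 64-byte-aligned buffer holding 32 16-bit elements (512 bits).
#[repr(C, align(64))]
struct Aligned64([u16; 32]);

/// Returns true when the pointer is suitable for an aligned 512-bit store.
fn is_aligned_64<T>(p: *const T) -> bool {
    (p as usize) % 64 == 0
}
```

A pointer to the start of an `Aligned64` passes the check. A pointer one element into the same buffer does not, and would call for the unaligned store instead.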
- _mm512_
sub_ ph Experimental avx512fp16
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
- _mm512_
sub_ round_ ph Experimental avx512fp16
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_
undefined_ ph Experimental avx512fp16
- Return vector of type __m512h with indeterminate elements. Despite using the word "undefined" (following Intel's naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm512_
zextph128_ ph512 Experimental avx512fp16
- Cast vector of type __m128h to type __m512h. The upper 24 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_
zextph256_ ph512 Experimental avx512fp16
- Cast vector of type __m256h to type __m512h. The upper 16 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm_
abs_ ph Experimental avx512fp16 and avx512vl
- Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the results in dst.
- _mm_
add_ ph Experimental avx512fp16 and avx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm_
add_ round_ sh Experimental avx512fp16
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm_
add_ sh Experimental avx512fp16
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
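The scalar (sh) forms above operate on the lowest element only and pass the remaining elements of a through unchanged. A minimal 8-lane model, using a hypothetical function name and `f32` in place of half precision:

```rust
/// Hypothetical 8-lane model of a scalar add:
/// lane 0 is a[0] + b[0]; lanes 1..=7 are copied from a.
fn add_sh(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = a;
    dst[0] = a[0] + b[0];
    dst
}
```

The upper seven lanes of b never influence the result.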
- _mm_
castpd_ ph Experimental avx512fp16
- Cast vector of type __m128d to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_
castph_ pd Experimental avx512fp16
- Cast vector of type __m128h to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_
castph_ ps Experimental avx512fp16
- Cast vector of type __m128h to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_
castph_ si128 Experimental avx512fp16
- Cast vector of type __m128h to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_
castps_ ph Experimental avx512fp16
- Cast vector of type __m128 to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_
castsi128_ ph Experimental avx512fp16
- Cast vector of type __m128i to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_
cmp_ ph_ mask Experimental avx512fp16 and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_
cmp_ round_ sh_ mask Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_
cmp_ sh_ mask Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k.
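As a rough illustration of how a compare produces a mask, the sketch below sets one bit per lane. It is a scalar model under stated assumptions: `cmp_mask_model` is a hypothetical helper, `f32` stands in for half precision, and a closure takes the place of the imm8 comparison operand.

```rust
/// Scalar model of a packed compare that yields a bitmask, one bit per lane.
/// Assumption: the `pred` closure stands in for the imm8 comparison operand.
fn cmp_mask_model(a: [f32; 8], b: [f32; 8], pred: impl Fn(f32, f32) -> bool) -> u8 {
    let mut k = 0u8;
    for i in 0..8 {
        if pred(a[i], b[i]) {
            k |= 1 << i; // lane i controls bit i of the mask
        }
    }
    k
}
```

The scalar `sh` variants correspond to looking at lane 0 only, so only bit 0 of the mask is meaningful there.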
- _mm_
cmul_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_
cmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_
cmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_
comi_ round_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_
comi_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1).
- _mm_
comieq_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for equality, and return the boolean result (0 or 1).
- _mm_
comige_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than-or-equal, and return the boolean result (0 or 1).
- _mm_
comigt_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than, and return the boolean result (0 or 1).
- _mm_
comile_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than-or-equal, and return the boolean result (0 or 1).
- _mm_
comilt_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than, and return the boolean result (0 or 1).
- _mm_
comineq_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for not-equal, and return the boolean result (0 or 1).
- _mm_
conj_ pch Experimental avx512fp16
and avx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_
cvt_ roundi32_ sh Experimental avx512fp16
- Convert the signed 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
cvt_ roundsd_ sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
cvt_ roundsh_ i32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
- _mm_
cvt_ roundsh_ sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_
cvt_ roundsh_ ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_
cvt_ roundsh_ u32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer, and store the result in dst.
- _mm_
cvt_ roundss_ sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
cvt_ roundu32_ sh Experimental avx512fp16
- Convert the unsigned 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
cvtepi16_ ph Experimental avx512fp16
and avx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm_
cvtepi32_ ph Experimental avx512fp16
and avx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm_
cvtepi64_ ph Experimental avx512fp16
and avx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
- _mm_
cvtepu16_ ph Experimental avx512fp16
and avx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm_
cvtepu32_ ph Experimental avx512fp16
and avx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm_
cvtepu64_ ph Experimental avx512fp16
and avx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
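The "upper 64 bits" and "upper 96 bits" figures in the integer-to-half conversions follow from lane counts: a 128-bit source holds fewer lanes as the element width grows, and each lane becomes one 16-bit result. A small arithmetic sketch, where `zeroed_upper_bits` is a hypothetical helper introduced only for illustration:

```rust
/// Number of upper bits of a 128-bit destination that carry no result
/// (and are therefore zeroed) when every `src_bits`-wide source lane
/// becomes one 16-bit half-precision lane.
fn zeroed_upper_bits(src_bits: u32) -> u32 {
    let lanes = 128 / src_bits; // source lanes in a 128-bit vector
    128 - lanes * 16 // destination bits not covered by the results
}
```

With 16-bit sources the destination is fully used, with 32-bit sources half of it is, and with 64-bit sources only the lowest 32 bits are.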
- _mm_
cvti32_ sh Experimental avx512fp16
- Convert the signed 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
cvtpd_ ph Experimental avx512fp16
and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
- _mm_
cvtph_ epi16 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm_
cvtph_ epi32 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm_
cvtph_ epi64 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm_
cvtph_ epu16 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm_
cvtph_ epu32 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
- _mm_
cvtph_ epu64 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm_
cvtph_ pd Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm_
cvtsd_ sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
cvtsh_ h Experimental avx512fp16
- Copy the lower half-precision (16-bit) floating-point element from `a` to `dst`.
- _mm_
cvtsh_ i32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
- _mm_
cvtsh_ sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_
cvtsh_ ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_
cvtsh_ u32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer, and store the result in dst.
- _mm_
cvtsi16_ si128 Experimental avx512fp16
- Copy 16-bit integer a to the lower elements of dst, and zero the upper elements of dst.
- _mm_
cvtsi128_ si16 Experimental avx512fp16
- Copy the lower 16-bit integer in a to dst.
- _mm_
cvtss_ sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
cvtt_ roundsh_ i32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
- _mm_
cvtt_ roundsh_ u32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer with truncation, and store the result in dst.
- _mm_
cvttph_ epi16 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm_
cvttph_ epi32 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm_
cvttph_ epi64 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm_
cvttph_ epu16 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm_
cvttph_ epu32 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm_
cvttph_ epu64 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
- _mm_
cvttsh_ i32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
- _mm_
cvttsh_ u32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer with truncation, and store the result in dst.
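"With truncation" in the `cvtt` entries means rounding toward zero, so the fractional part is discarded. A minimal scalar sketch, assuming `f32` as a stand-in for the half-precision input; `cvtt_i32_model` and `cvtt_u32_model` are hypothetical helpers, and NaN or out-of-range inputs are deliberately not modelled:

```rust
/// Scalar model of conversion with truncation: the fractional part is
/// dropped, which rounds toward zero.
/// Assumption: `f32` stands in for the half-precision input.
fn cvtt_i32_model(x: f32) -> i32 {
    x.trunc() as i32
}

/// Same idea for the unsigned variant; shown for non-negative inputs only.
fn cvtt_u32_model(x: f32) -> u32 {
    x.trunc() as u32
}
```

Note that truncation differs from rounding to nearest for values such as 2.7, which truncates to 2.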
- _mm_
cvtu32_ sh Experimental avx512fp16
- Convert the unsigned 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
cvtxph_ ps Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm_
cvtxps_ ph Experimental avx512fp16
and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm_
div_ ph Experimental avx512fp16
and avx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
- _mm_
div_ round_ sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_
div_ sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
fcmadd_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_
fcmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_
fcmadd_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_
fcmul_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_
fcmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_
fcmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_
fmadd_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_
fmadd_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
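The complex-number entries in this listing (the `pch` and `sch` families) treat each adjacent pair of lanes as one complex value, real part first. The sketch below models the three basic operations on a single pair. It is an illustration under stated assumptions: `Complex`, `fmul_model`, `fcmul_model`, and `fmadd_model` are hypothetical names, and `f32` stands in for half precision.

```rust
/// A complex number stored as two adjacent lanes: `[re, im]`.
/// Assumption: `f32` stands in for the half-precision lanes.
type Complex = [f32; 2];

/// Model of the plain complex multiply (the fmul entries): a * b.
fn fmul_model(a: Complex, b: Complex) -> Complex {
    [a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0]]
}

/// Model of the conjugate multiply (the fcmul and cmul entries): a * conj(b).
fn fcmul_model(a: Complex, b: Complex) -> Complex {
    let conj_b = [b[0], -b[1]]; // conjugate = re - i * im
    fmul_model(a, conj_b)
}

/// Model of the complex multiply-accumulate (the fmadd `pch`/`sch` entries): a * b + c.
fn fmadd_model(a: Complex, b: Complex, c: Complex) -> Complex {
    let p = fmul_model(a, b);
    [p[0] + c[0], p[1] + c[1]]
}
```

For example, with a = 1 + 2i and b = 3 + 4i, the plain product is -5 + 10i while the conjugate product is 11 + 2i.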
- _mm_
fmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_
fmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
fmadd_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_
fmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
fmaddsub_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm_
fmsub_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm_
fmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
fmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
fmsubadd_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
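The alternating pattern of the fmaddsub and fmsubadd entries can be sketched per lane. The lane parity used here is an assumption taken from Intel's description of the FMADDSUB instruction family rather than from the summaries above, and `fmaddsub_model` and `fmsubadd_model` are hypothetical helpers that use `f32` in place of half precision.

```rust
/// Scalar model of fmaddsub: even lanes compute a*b - c, odd lanes a*b + c.
/// Assumption: lane parity follows Intel's FMADDSUB description.
fn fmaddsub_model(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        let p = a[i] * b[i];
        dst[i] = if i % 2 == 0 { p - c[i] } else { p + c[i] };
    }
    dst
}

/// Scalar model of fmsubadd: the mirror image, even lanes add and odd lanes subtract.
fn fmsubadd_model(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        let p = a[i] * b[i];
        dst[i] = if i % 2 == 0 { p + c[i] } else { p - c[i] };
    }
    dst
}
```

Unlike this model, the hardware operation is fused, so the product is not rounded before the add or subtract.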
- _mm_
fmul_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_
fmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_
fmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_
fnmadd_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm_
fnmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
fnmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
fnmsub_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm_
fnmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
fnmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
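The four fused multiply-add flavours differ only in the signs applied to the product and to `c`. A scalar sketch using the conventional definitions, with `f32::mul_add` standing in for the fused half-precision operation; the function names are hypothetical and chosen to mirror the intrinsic families:

```rust
/// fmadd: a*b + c
fn fmadd_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// fmsub: a*b - c
fn fmsub_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c)
}

/// fnmadd: -(a*b) + c
fn fnmadd_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c)
}

/// fnmsub: -(a*b) - c
fn fnmsub_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c)
}
```

With a = 2, b = 3, c = 1 the four variants give 7, 5, -5 and -7 respectively.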
- _mm_
fpclass_ ph_ mask Experimental avx512fp16
and avx512vl
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm8 is a bitwise combination of category flags.
- _mm_
fpclass_ sh_ mask Experimental avx512fp16
- Test the lower half-precision (16-bit) floating-point element in a for special categories specified by imm8, and store the result in mask vector k. imm8 is a bitwise combination of category flags.
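A category test of this kind can be modelled as "classify the value, then AND with the requested categories". The sketch below covers only zeros and infinities. The bit values are an assumption based on Intel's documentation of the fpclass instructions and are not given in the summaries above; the constant and function names are hypothetical, and `f32` stands in for half precision.

```rust
// Category bits (assumed from Intel's fpclass documentation; subset only).
const POS_ZERO: u8 = 0x02;
const NEG_ZERO: u8 = 0x04;
const POS_INF: u8 = 0x08;
const NEG_INF: u8 = 0x10;

/// Returns true when `x` falls into any category selected by `imm8`.
/// Only zeros and infinities are modelled here.
fn fpclass_model(x: f32, imm8: u8) -> bool {
    let mut class = 0u8;
    if x == 0.0 {
        class |= if x.is_sign_negative() { NEG_ZERO } else { POS_ZERO };
    }
    if x.is_infinite() {
        class |= if x.is_sign_negative() { NEG_INF } else { POS_INF };
    }
    (class & imm8) != 0
}
```

Combining flags with `|` asks whether the value belongs to any of the selected categories, for example `POS_INF | NEG_INF` for "is infinite".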
- _mm_
getexp_ ph Experimental avx512fp16
and avx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates `floor(log2(x))` for each element.
- _mm_
getexp_ round_ sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates `floor(log2(x))` for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_
getexp_ sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates `floor(log2(x))` for the lower element.
- _mm_
getmant_ ph Experimental avx512fp16
and avx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates `±(2^k)*|x.significand|`, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_
getmant_ round_ sh Experimental avx512fp16
- Normalize the mantissa of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates `±(2^k)*|x.significand|`, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_
getmant_ sh Experimental avx512fp16
- Normalize the mantissa of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates `±(2^k)*|x.significand|`, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_
load_ ph Experimental avx512fp16
and avx512vl
- Load 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 16 bytes or a general-protection exception may be generated.
- _mm_
load_ sh Experimental avx512fp16
- Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector, and zero the upper elements.
- _mm_
loadu_ ph Experimental avx512fp16
and avx512vl
- Load 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
- _mm_
mask3_ fcmadd_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_
mask3_ fcmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from c when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_
mask3_ fcmadd_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from c when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm_
mask3_ fmadd_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_
mask3_ fmadd_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
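The `mask3` writemask behaviour, where unselected lanes keep the value from `c`, can be sketched as follows. This is a scalar model under stated assumptions: `mask3_fmadd_model` is a hypothetical helper, `f32` stands in for half precision, and bit i of `k` is taken to control lane i.

```rust
/// Scalar model of a mask3 fused multiply-add over 8 lanes.
/// Lanes whose mask bit is clear are copied from `c`.
fn mask3_fmadd_model(a: [f32; 8], b: [f32; 8], c: [f32; 8], k: u8) -> [f32; 8] {
    let mut dst = c; // start from c so unselected lanes keep its values
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].mul_add(b[i], c[i]); // a*b + c for selected lanes
        }
    }
    dst
}
```

With `k = 0b0000_0101`, only lanes 0 and 2 receive the computed value; the remaining lanes are passed through from `c`.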
- _mm_
mask3_ fmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from c when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_
mask3_ fmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_
mask3_ fmadd_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from c when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`.
- _mm_
mask3_ fmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_
mask3_ fmaddsub_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_
mask3_ fmsub_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_
mask3_ fmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_
mask3_ fmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_
mask3_ fmsubadd_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
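The alternating pattern of the fmaddsub and fmsubadd entries above can be sketched with a scalar model. This is only an illustration, not the intrinsic itself: the function names are hypothetical, `f32` stands in for the half-precision type, and the assumption (following the usual x86 FMADDSUB convention) is that even-indexed lanes subtract c while odd-indexed lanes add c, with FMSUBADD doing the opposite.

```rust
/// Scalar model of a mask3-style fmaddsub over 8 lanes.
/// Hypothetical helper; `f32` stands in for f16, so rounding differs from hardware.
fn mask3_fmaddsub_model(a: [f32; 8], b: [f32; 8], c: [f32; 8], k: u8) -> [f32; 8] {
    let mut dst = c; // mask3 variants copy unselected lanes from c
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Assumed convention: even lanes compute a*b - c, odd lanes a*b + c.
            dst[i] = if i % 2 == 0 {
                a[i].mul_add(b[i], -c[i])
            } else {
                a[i].mul_add(b[i], c[i])
            };
        }
    }
    dst
}

/// Same model for fmsubadd: even lanes compute a*b + c, odd lanes a*b - c.
fn mask3_fmsubadd_model(a: [f32; 8], b: [f32; 8], c: [f32; 8], k: u8) -> [f32; 8] {
    let mut dst = c;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = if i % 2 == 0 {
                a[i].mul_add(b[i], c[i])
            } else {
                a[i].mul_add(b[i], -c[i])
            };
        }
    }
    dst
}
```

With a = 2, b = 3, c = 1 in every lane and k = 0b0000_1111, the first model yields 5, 7, 5, 7 in the low four lanes and leaves the upper four lanes at the value of c.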
- _mm_
mask3_ fnmadd_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_
mask3_ fnmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_
mask3_ fnmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_
mask3_ fnmsub_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_
mask3_ fnmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_
mask3_ fnmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
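The four fused multiply-add families listed above (fmadd, fmsub, fnmadd, fnmsub) differ only in the signs applied to the product and to c. A minimal scalar sketch of the mask3 scalar (`_sh`) form follows. The enum and function names are hypothetical, and `f32` stands in for the half-precision type, so results are rounded differently than on real hardware.

```rust
/// The four sign conventions of the FMA family (hypothetical enum for illustration).
#[derive(Clone, Copy)]
enum Fma {
    Fmadd,  //  (a * b) + c
    Fmsub,  //  (a * b) - c
    Fnmadd, // -(a * b) + c
    Fnmsub, // -(a * b) - c
}

/// Compute one lane with a single fused rounding step (in f32, not f16).
fn fma_lane(op: Fma, a: f32, b: f32, c: f32) -> f32 {
    match op {
        Fma::Fmadd => a.mul_add(b, c),
        Fma::Fmsub => a.mul_add(b, -c),
        Fma::Fnmadd => (-a).mul_add(b, c),
        Fma::Fnmsub => (-a).mul_add(b, -c),
    }
}

/// Scalar model of a mask3 `_sh` operation: only lane 0 is computed,
/// and every lane that is not written comes from c.
fn mask3_fma_sh_model(op: Fma, a: [f32; 8], b: [f32; 8], c: [f32; 8], k: u8) -> [f32; 8] {
    let mut dst = c; // upper 7 lanes, and lane 0 when mask bit 0 is clear
    if k & 1 == 1 {
        dst[0] = fma_lane(op, a[0], b[0], c[0]);
    }
    dst
}
```

For a = 2, b = 3, c = 1 the four operations give 7, 5, -5 and -7 respectively in lane 0 when mask bit 0 is set, and 1 (the value from c) when it is clear.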
- _mm_
mask_ add_ ph Experimental avx512fp16
and avx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ add_ round_ sh Experimental avx512fp16
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm_
mask_ add_ sh Experimental avx512fp16
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
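The scalar (`_sh`) writemask behaviour described above mixes three sources: the computed value, src, and a. The sketch below models it with plain arrays. The function name is hypothetical and `f32` stands in for the half-precision type.

```rust
/// Scalar model of a masked `_sh` addition (hypothetical helper, f32 stands in for f16).
/// Lane 0 is a[0] + b[0] when mask bit 0 is set, otherwise src[0];
/// lanes 1..8 are always copied from a.
fn mask_add_sh_model(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = a; // upper 7 lanes come from a
    dst[0] = if k & 1 == 1 { a[0] + b[0] } else { src[0] };
    dst
}
```

Note that only bit 0 of the mask matters here; the remaining mask bits are ignored by this model.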
- _mm_
mask_ blend_ ph Experimental avx512fp16
and avx512vl
- Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
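A blend is the simplest masked operation: each mask bit picks one of the two inputs. The following scalar sketch assumes the usual AVX-512 blend convention (a set bit selects b, a clear bit selects a); the function name is hypothetical and `f32` stands in for the half-precision type.

```rust
/// Scalar model of a masked blend over 8 lanes (hypothetical helper).
/// Assumed convention: bit i set selects b[i], bit i clear selects a[i].
fn mask_blend_ph_model(k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    std::array::from_fn(|i| if (k >> i) & 1 == 1 { b[i] } else { a[i] })
}
```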
- _mm_
mask_ cmp_ ph_ mask Experimental avx512fp16
and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
mask_ cmp_ round_ sh_ mask Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_
mask_ cmp_ sh_ mask Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1.
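The masked comparisons above produce a bitmask rather than a vector, and the zeromask clears any result bit whose input mask bit is not set. A minimal scalar sketch follows. The function name is hypothetical, `f32` stands in for the half-precision type, and a closure replaces the imm8 predicate encoding, which is not reproduced here.

```rust
/// Scalar model of a zeromasked packed comparison (hypothetical helper).
/// Bit i of the result is set only if bit i of k1 is set AND pred(a[i], b[i]) holds.
fn mask_cmp_ph_mask_model(
    k1: u8,
    a: [f32; 8],
    b: [f32; 8],
    pred: impl Fn(f32, f32) -> bool,
) -> u8 {
    let mut k = 0u8;
    for i in 0..8 {
        if (k1 >> i) & 1 == 1 && pred(a[i], b[i]) {
            k |= 1u8 << i;
        }
    }
    k
}
```

The scalar (`_sh`) forms behave like this model restricted to lane 0, so only bit 0 of the result can ever be set.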
- _mm_
mask_ cmul_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
define the complex number (complex = vec.fp16[0] + i * vec.fp16[1]) or its complex conjugate
(conjugate = vec.fp16[0] - i * vec.fp16[1]).
- _mm_
mask_ cmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
define the complex number (complex = vec.fp16[0] + i * vec.fp16[1]) or its complex conjugate
(conjugate = vec.fp16[0] - i * vec.fp16[1]).
- _mm_
mask_ cmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
define the complex number (complex = vec.fp16[0] + i * vec.fp16[1]) or its complex conjugate
(conjugate = vec.fp16[0] - i * vec.fp16[1]).
- _mm_
mask_ conj_ pch Experimental avx512fp16
and avx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k
(the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two
adjacent half-precision (16-bit) floating-point elements, which define the complex number
(complex = vec.fp16[0] + i * vec.fp16[1]) or its complex conjugate (conjugate = vec.fp16[0] - i * vec.fp16[1]).
- _mm_
mask_ cvt_ roundsd_ sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ cvt_ roundsh_ sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_
mask_ cvt_ roundsh_ ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_
mask_ cvt_ roundss_ sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ cvtepi16_ ph Experimental avx512fp16
and avx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm_
mask_ cvtepi32_ ph Experimental avx512fp16
and avx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_
mask_ cvtepi64_ ph Experimental avx512fp16
and avx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_
mask_ cvtepu16_ ph Experimental avx512fp16
and avx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm_
mask_ cvtepu32_ ph Experimental avx512fp16
and avx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_
mask_ cvtepu64_ ph Experimental avx512fp16
and avx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_
mask_ cvtpd_ ph Experimental avx512fp16
and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
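The "upper 64 bits" and "upper 96 bits" notes in the conversions above follow from lane counts: a 128-bit source holds four 32-bit or two 64-bit values, so only four or two of the eight 16-bit result lanes are produced and the rest are zeroed. A scalar sketch under those assumptions is shown below. The function name is hypothetical, `i64` stands in for any source integer width, and `f32` stands in for the half-precision result (a real f16 has an 11-bit significand, so large integers would round more coarsely).

```rust
/// Scalar model of a masked narrowing conversion into an 8-lane f16 vector
/// (hypothetical helper). N is the number of source elements, e.g. 4 for
/// 32-bit sources or 2 for 64-bit sources in a 128-bit vector.
fn mask_cvt_to_ph_model<const N: usize>(src: [f32; 8], k: u8, a: [i64; N]) -> [f32; 8] {
    assert!(N <= 8, "an 8-lane destination cannot hold more than 8 results");
    let mut dst = [0.0f32; 8]; // lanes N..8 stay zero
    for i in 0..N {
        dst[i] = if (k >> i) & 1 == 1 { a[i] as f32 } else { src[i] };
    }
    dst
}
```

With two 64-bit inputs, lanes 2 through 7 (96 bits) are zero regardless of the mask; with four 32-bit inputs, lanes 4 through 7 (64 bits) are zero.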
- _mm_
mask_ cvtph_ epi16 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ cvtph_ epi32 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ cvtph_ epi64 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ cvtph_ epu16 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ cvtph_ epu32 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ cvtph_ epu64 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ cvtph_ pd Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm_
mask_ cvtsd_ sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ cvtsh_ sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_
mask_ cvtsh_ ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_
mask_ cvtss_ sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ cvttph_ epi16 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ cvttph_ epi32 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ cvttph_ epi64 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ cvttph_ epu16 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ cvttph_ epu32 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ cvttph_ epu64 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
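The difference between the cvtph and cvttph families above is the rounding step. The sketch below contrasts the two for in-range inputs. It assumes the non-truncating forms use the default round-to-nearest-even mode (the hardware actually follows the current MXCSR rounding mode), the function names are hypothetical, and `f32` stands in for the half-precision type. Out-of-range and NaN inputs are not modelled, because Rust's `as` cast saturates while the hardware behaves differently.

```rust
/// Non-truncating conversion of one lane, assuming round-to-nearest-even.
fn cvtph_epi16_lane(x: f32) -> i16 {
    x.round_ties_even() as i16
}

/// Truncating conversion of one lane: round toward zero.
fn cvttph_epi16_lane(x: f32) -> i16 {
    x.trunc() as i16
}

/// Masked 8-lane model (hypothetical helper): unselected lanes come from src.
fn mask_cvt_epi16_model(src: [i16; 8], k: u8, a: [f32; 8], truncate: bool) -> [i16; 8] {
    let mut dst = src;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = if truncate {
                cvttph_epi16_lane(a[i])
            } else {
                cvtph_epi16_lane(a[i])
            };
        }
    }
    dst
}
```

For example, 3.5 converts to 4 without truncation and to 3 with it, while -1.5 converts to -2 and -1 respectively.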
- _mm_
mask_ cvtxph_ ps Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
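Widening a half-precision value to single precision is exact, and it can be modelled by decoding the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits). The following is a sketch with hypothetical function names; it works on raw `u16` bit patterns so that it runs on stable Rust without the `f16` type.

```rust
/// Decode an IEEE 754 binary16 bit pattern into an f32 (exact for every input).
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h >> 15) & 1) as u32;
    let exp = ((h >> 10) & 0x1f) as u32;
    let frac = (h & 0x3ff) as u32;
    let bits = match (exp, frac) {
        (0, 0) => sign << 31, // signed zero
        (0, _) => {
            // Subnormal: value is frac * 2^-24 (0x3380_0000 is 2^-24 as f32).
            let v = frac as f32 * f32::from_bits(0x3380_0000);
            return if sign == 1 { -v } else { v };
        }
        (0x1f, 0) => (sign << 31) | 0x7f80_0000, // infinity
        (0x1f, _) => (sign << 31) | 0x7fc0_0000 | (frac << 13), // NaN (quieted)
        // Normal: rebias the exponent from 15 to 127 and widen the fraction.
        _ => (sign << 31) | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(bits)
}

/// Masked model of a 128-bit half-to-single conversion (hypothetical helper):
/// the lower 4 half-precision lanes of a become 4 single-precision lanes.
fn mask_cvtxph_ps_model(src: [f32; 4], k: u8, a: [u16; 8]) -> [f32; 4] {
    let mut dst = src;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = f16_bits_to_f32(a[i]);
        }
    }
    dst
}
```

As a check on the layout, the bit pattern 0x3C00 decodes to 1.0 and 0x7BFF decodes to 65504, the largest finite half-precision value.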
- _mm_
mask_ cvtxps_ ph Experimental avx512fp16
and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_
mask_ div_ ph Experimental avx512fp16
and avx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ div_ round_ sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm_
mask_ div_ sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
- _mm_
mask_ fcmadd_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is
copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which define the complex number
(complex = vec.fp16[0] + i * vec.fp16[1]) or its complex conjugate (conjugate = vec.fp16[0] - i * vec.fp16[1]).
- _mm_
mask_ fcmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b,
accumulate to the lower complex number in c, and store the result in the lower elements of dst using
writemask k (the element is copied from a when the corresponding mask bit is not set), and copy the upper
6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which define the complex number
(complex = vec.fp16[0] + i * vec.fp16[1]) or its complex conjugate (conjugate = vec.fp16[0] - i * vec.fp16[1]).
- _mm_
mask_ fcmadd_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b,
accumulate to the lower complex number in c, and store the result in the lower elements of dst using
writemask k (the element is copied from a when the corresponding mask bit is not set), and copy the upper
6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which define the complex number
(complex = vec.fp16[0] + i * vec.fp16[1]) or its complex conjugate (conjugate = vec.fp16[0] - i * vec.fp16[1]).
- _mm_
mask_ fcmul_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
define the complex number (complex = vec.fp16[0] + i * vec.fp16[1]) or its complex conjugate
(conjugate = vec.fp16[0] - i * vec.fp16[1]).
- _mm_
mask_ fcmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
define the complex number (complex = vec.fp16[0] + i * vec.fp16[1]) or its complex conjugate
(conjugate = vec.fp16[0] - i * vec.fp16[1]).
- _mm_
mask_ fcmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
define the complex number (complex = vec.fp16[0] + i * vec.fp16[1]) or its complex conjugate
(conjugate = vec.fp16[0] - i * vec.fp16[1]).
- _mm_
mask_ fmadd_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using writemask k (the element is copied from a when the corresponding
mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which define the complex number (complex = vec.fp16[0] + i * vec.fp16[1]).
- _mm_
mask_ fmadd_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
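The complex (`_pch`) multiply-accumulate entries above treat each pair of adjacent lanes as one complex number, so a 128-bit vector holds four of them. The sketch below models both the plain form (a * b + c) and the conjugate form (a * conj(b) + c). It assumes that each mask bit governs one complex pair rather than one 16-bit lane; the names are hypothetical and `f32` stands in for the half-precision type.

```rust
/// One complex number as (real, imaginary); f32 stands in for f16.
type Complex = (f32, f32);

fn conj(z: Complex) -> Complex {
    (z.0, -z.1)
}

fn cmul(x: Complex, y: Complex) -> Complex {
    (x.0 * y.0 - x.1 * y.1, x.0 * y.1 + x.1 * y.0)
}

/// Scalar model of a masked complex multiply-accumulate over 4 complex pairs
/// (hypothetical helper). Unselected pairs are copied from a.
/// Assumption: mask bit j controls the pair stored in lanes 2j and 2j + 1.
fn mask_fmadd_pch_model(
    a: [f32; 8],
    k: u8,
    b: [f32; 8],
    c: [f32; 8],
    conjugate_b: bool,
) -> [f32; 8] {
    let mut dst = a;
    for j in 0..4 {
        if (k >> j) & 1 == 1 {
            let (re, im) = (2 * j, 2 * j + 1);
            let bj = if conjugate_b { conj((b[re], b[im])) } else { (b[re], b[im]) };
            let p = cmul((a[re], a[im]), bj);
            dst[re] = p.0 + c[re];
            dst[im] = p.1 + c[im];
        }
    }
    dst
}
```

For a = 1 + 2i, b = 3 + 4i and c = 10 + 20i, the plain form gives 5 + 30i and the conjugate form gives 21 + 22i.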
- _mm_
mask_ fmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst using writemask k (elements are copied from a when
mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst.
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements,
which define the complex number (complex = vec.fp16[0] + i * vec.fp16[1]).
- _mm_
mask_ fmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ fmadd_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst using writemask k (elements are copied from a when
mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst.
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements,
which define the complex number (complex = vec.fp16[0] + i * vec.fp16[1]).
- _mm_
mask_ fmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ fmaddsub_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_
mask_ fmsub_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_
mask_ fmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ fmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ fmsubadd_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_
mask_ fmul_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which define the complex number
(complex = vec.fp16[0] + i * vec.fp16[1]).
- _mm_
mask_ fmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which define the complex number (complex = vec.fp16[0] + i * vec.fp16[1]).
- _mm_
mask_ fmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which define the complex number (complex = vec.fp16[0] + i * vec.fp16[1]).
- _mm_
mask_ fnmadd_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_
mask_ fnmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ fnmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ fnmsub_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_
mask_ fnmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ fnmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ fpclass_ ph_ mask Experimental avx512fp16
and avx512vl
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of:
- _mm_
mask_ fpclass_ sh_ mask Experimental avx512fp16
- Test the lower half-precision (16-bit) floating-point element in a for special categories specified by imm8, and store the result in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of:
- _mm_
mask_ getexp_ ph Experimental avx512fp16
and avx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k
(elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates
floor(log2(x)) for each element.
- _mm_
mask_ getexp_ round_ sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision
(16-bit) floating-point number representing the integer exponent, store the result in the lower element
of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7
packed elements from a to the upper elements of dst. This intrinsic essentially calculates
floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_
mask_ getexp_ sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision
(16-bit) floating-point number representing the integer exponent, store the result in the lower element
of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7
packed elements from a to the upper elements of dst. This intrinsic essentially calculates
floor(log2(x)) for the lower element.
- _mm_
mask_ getmant_ ph Experimental avx512fp16
and avx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range
defined by norm and the sign depends on sign and the source sign.
- _mm_
mask_ getmant_ round_ sh Experimental avx512fp16
- Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store
the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set),
and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates
±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_
mask_ getmant_ sh Experimental avx512fp16
- Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store
the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set),
and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates
±(2^k)*|x.significand|
, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. - _mm_
mask_ load_ sh Experimental avx512fp16
- Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector using writemask k (the element is copied from src when mask bit 0 is not set), and zero the upper elements.
- _mm_
mask_ max_ ph Experimental avx512fp16
and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_
mask_ max_ round_ sh Experimental avx512fp16
and avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_
mask_ max_ sh Experimental avx512fp16
and avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_
mask_ min_ ph Experimental avx512fp16
and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_
mask_ min_ round_ sh Experimental avx512fp16
and avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_
mask_ min_ sh Experimental avx512fp16
and avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
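The max and min entries above note that they do not follow IEEE 754 for NaN or signed-zero inputs. The following is a minimal scalar sketch of that behaviour, not the intrinsics themselves. It assumes these operations follow the same rule as the older x86 MAX/MIN instructions: the second operand is returned unless the comparison is strictly true. The names `x86_max_model` and `x86_min_model` are hypothetical, and `f32` stands in for the 16-bit element type.

```rust
/// Scalar model of the assumed x86 MAX rule: return `a` only when `a > b`
/// is true, otherwise return the second operand `b`. A NaN in either
/// operand, or two zeros of any sign, therefore yields `b`.
fn x86_max_model(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// Same assumed rule for MIN: return `a` only when `a < b` is true.
fn x86_min_model(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

Under this model the result depends on operand order: `x86_max_model(f32::NAN, 1.0)` gives `1.0`, while `x86_max_model(1.0, f32::NAN)` gives NaN. Rust's own `f32::max` returns the non-NaN operand in both cases.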
- _mm_
mask_ move_ sh Experimental avx512fp16
- Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
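The masked scalar ("sh") entries all share the lane layout described above: only mask bit 0 is consulted, lane 0 receives either the new value or the value from src, and the upper 7 lanes come from a. A small software model of the masked move, with `f32` standing in for the 16-bit lanes and `mask_move_sh_model` as a hypothetical name, might look like this:

```rust
/// Software model of a masked scalar move over eight lanes.
/// Lane 0 is the "lower element"; lanes 1..8 are the "upper 7 elements".
fn mask_move_sh_model(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    // Start from `a` so that the upper 7 lanes are copied from it.
    let mut dst = a;
    // Only mask bit 0 decides what lands in the lower lane.
    dst[0] = if k & 1 != 0 { b[0] } else { src[0] };
    dst
}
```

In this sketch, setting any mask bit other than bit 0 has no effect on the result.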
- _mm_
mask_ mul_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
mask_ mul_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ mul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using
writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed
elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
mask_ mul_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm_
mask_ mul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using
writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed
elements from a to the upper elements of dst. Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
mask_ mul_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
- _mm_
mask_ rcp_ ph Experimental avx512fp16
and avx512vl
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a,
and store the results in dst using writemask k (elements are copied from src when the corresponding
mask bit is not set). The maximum relative error for this approximation is less than
1.5*2^-12
. - _mm_
mask_ rcp_ sh Experimental avx512fp16
- Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b,
store the result in the lower element of dst using writemask k (the element is copied from src when
mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
The maximum relative error for this approximation is less than
1.5*2^-12
. - _mm_
mask_ reduce_ ph Experimental avx512fp16
and avx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ reduce_ round_ sh Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ reduce_ sh Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ roundscale_ ph Experimental avx512fp16
and avx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ roundscale_ round_ sh Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ roundscale_ sh Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
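The reduce and roundscale entries above both take an imm8 operand. The sketch below models their arithmetic in scalar form under an assumed encoding taken from the Intel VRNDSCALE/VREDUCE definitions: imm8[7:4] holds the number of fraction bits M to keep, and imm8[1:0] selects the rounding mode (0 = nearest even, 1 = down, 2 = up, 3 = truncate). Bits 2 and 3 (MXCSR selection and exception suppression) are ignored. All function names are hypothetical and `f32` stands in for the 16-bit type.

```rust
/// Round to the nearest integer, ties to even, written out by hand.
fn round_ties_even(v: f32) -> f32 {
    let r = v.round(); // rounds ties away from zero
    if (v - v.trunc()).abs() == 0.5 && r % 2.0 != 0.0 {
        r - v.signum() // step back to the even neighbour
    } else {
        r
    }
}

/// Model of roundscale: 2^-M * round(2^M * x), with M = imm8[7:4].
fn roundscale_model(x: f32, imm8: u8) -> f32 {
    let m = (imm8 >> 4) as i32; // number of fraction bits to keep
    let scale = 2f32.powi(m);
    let scaled = x * scale;
    let rounded = match imm8 & 0b11 {
        0 => round_ties_even(scaled),
        1 => scaled.floor(),
        2 => scaled.ceil(),
        _ => scaled.trunc(),
    };
    rounded / scale
}

/// Model of reduce: the part of x that roundscale discards.
fn reduce_model(x: f32, imm8: u8) -> f32 {
    x - roundscale_model(x, imm8)
}
```

For example, `roundscale_model(1.3, 0x10)` keeps one fraction bit and yields `1.5`, and `reduce_model(1.75, 0x03)` truncates to an integer and returns the remainder `0.75`. Special values (NaN, infinities, denormals) are not modelled.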
- _mm_
mask_ rsqrt_ ph Experimental avx512fp16
and avx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point
elements in a, and store the results in dst using writemask k (elements are copied from src when
the corresponding mask bit is not set).
The maximum relative error for this approximation is less than
1.5*2^-12
. - _mm_
mask_ rsqrt_ sh Experimental avx512fp16
- Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point
element in b, store the result in the lower element of dst using writemask k (the element is copied from src
when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
The maximum relative error for this approximation is less than
1.5*2^-12
. - _mm_
mask_ scalef_ ph Experimental avx512fp16
and avx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ scalef_ round_ sh Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the value from the lower element of b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ scalef_ sh Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the value from the lower element of b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
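The scalef entries above say only that a is scaled "using values from b". As a hedged illustration, the x86 SCALEF family is generally defined as a * 2^floor(b). The scalar sketch below assumes that definition, covers only ordinary finite inputs, uses `f32` in place of the 16-bit type, and uses the hypothetical name `scalef_model`.

```rust
/// Scalar model of the assumed SCALEF rule: a * 2^floor(b).
/// NaN, infinity and denormal handling are not modelled.
fn scalef_model(a: f32, b: f32) -> f32 {
    // floor(b) is converted to an integer exponent so that the power of
    // two is computed exactly by `powi`.
    let exponent = b.floor() as i32;
    a * 2f32.powi(exponent)
}
```

For instance, `scalef_model(3.0, 2.7)` uses floor(2.7) = 2 and returns `12.0`, and `scalef_model(5.0, -0.5)` uses floor(-0.5) = -1 and returns `2.5`.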
- _mm_
mask_ sqrt_ ph Experimental avx512fp16
and avx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ sqrt_ round_ sh Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm_
mask_ sqrt_ sh Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mask_ store_ sh Experimental avx512fp16
- Store the lower half-precision (16-bit) floating-point element from a into memory using writemask k.
- _mm_
mask_ sub_ ph Experimental avx512fp16
and avx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_
mask_ sub_ round_ sh Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm_
mask_ sub_ sh Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
- _mm_
maskz_ add_ ph Experimental avx512fp16
and avx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
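The difference between the writemask (mask_) and zeromask (maskz_) forms can be sketched with a lane-by-lane software model. This is an illustration only: `f32` stands in for the 16-bit lanes, and `mask_add_model` and `maskz_add_model` are hypothetical names rather than the intrinsics.

```rust
/// Writemask model: lanes whose mask bit is clear are copied from `src`.
fn mask_add_model(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0; 8];
    for i in 0..8 {
        // Bit i of k selects between the computed sum and the fallback lane.
        dst[i] = if (k >> i) & 1 == 1 { a[i] + b[i] } else { src[i] };
    }
    dst
}

/// Zeromask model: the same operation with an all-zero fallback vector.
fn maskz_add_model(k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    mask_add_model([0.0; 8], k, a, b)
}
```

Viewed this way, a zeromask operation is a writemask operation whose src is a zero vector.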
- _mm_
maskz_ add_ round_ sh Experimental avx512fp16
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm_
maskz_ add_ sh Experimental avx512fp16
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
- _mm_
maskz_ cmul_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
, or the complex conjugate
conjugate = vec.fp16[0] - i * vec.fp16[1]
. - _mm_
maskz_ cmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
, or the complex conjugate
conjugate = vec.fp16[0] - i * vec.fp16[1]
. - _mm_
maskz_ cmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
, or the complex conjugate
conjugate = vec.fp16[0] - i * vec.fp16[1]
. - _mm_
maskz_ conj_ pch Experimental avx512fp16
and avx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k
(the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
, or the complex conjugate
conjugate = vec.fp16[0] - i * vec.fp16[1]
. - _mm_
maskz_ cvt_ roundsd_ sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ cvt_ roundsh_ sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_
maskz_ cvt_ roundsh_ ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_
maskz_ cvt_ roundss_ sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ cvtepi16_ ph Experimental avx512fp16
and avx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ cvtepi32_ ph Experimental avx512fp16
and avx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_
maskz_ cvtepi64_ ph Experimental avx512fp16
and avx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_
maskz_ cvtepu16_ ph Experimental avx512fp16
and avx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ cvtepu32_ ph Experimental avx512fp16
and avx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_
maskz_ cvtepu64_ ph Experimental avx512fp16
and avx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_
maskz_ cvtpd_ ph Experimental avx512fp16
and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_
maskz_ cvtph_ epi16 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ cvtph_ epi32 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ cvtph_ epi64 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ cvtph_ epu16 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ cvtph_ epu32 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ cvtph_ epu64 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ cvtph_ pd Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ cvtsd_ sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ cvtsh_ sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_
maskz_ cvtsh_ ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_
maskz_ cvtss_ sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ cvttph_ epi16 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ cvttph_ epi32 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ cvttph_ epi64 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ cvttph_ epu16 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ cvttph_ epu32 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ cvttph_ epu64 Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
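The cvtph_ and cvttph_ entries above differ only in how the fractional part is handled. The scalar sketch below illustrates this under one assumption: the non-truncating form uses the default round-to-nearest-even mode, whereas the real instruction follows whatever rounding mode is currently configured. `f32` stands in for the 16-bit type, the function names are hypothetical, and out-of-range inputs are not modelled.

```rust
/// Round to the nearest integer, ties to even, written out by hand.
fn round_ties_even(v: f32) -> f32 {
    let r = v.round(); // rounds ties away from zero
    if (v - v.trunc()).abs() == 0.5 && r % 2.0 != 0.0 {
        r - v.signum() // step back to the even neighbour
    } else {
        r
    }
}

/// Model of the rounding conversion (assumed default rounding mode).
fn cvt_epi16_model(x: f32) -> i16 {
    round_ties_even(x) as i16
}

/// Model of the truncating conversion: the fraction is simply dropped.
fn cvtt_epi16_model(x: f32) -> i16 {
    x.trunc() as i16
}
```

With these models, `3.5` converts to `4` with rounding and `3` with truncation, and `-1.7` converts to `-2` and `-1` respectively.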
- _mm_
maskz_ cvtxph_ ps Experimental avx512fp16
and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ cvtxps_ ph Experimental avx512fp16
and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_
maskz_ div_ ph Experimental avx512fp16
and avx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ div_ round_ sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm_
maskz_ div_ sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
- _mm_
maskz_ fcmadd_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is
zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
, or the complex conjugate
conjugate = vec.fp16[0] - i * vec.fp16[1]
. - _mm_
maskz_ fcmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b,
accumulate to the lower complex number in c using zeromask k (the element is zeroed out when the corresponding
mask bit is not set), and store the result in the lower elements of dst, and copy the upper 6 packed elements
from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
, or the complex conjugate
conjugate = vec.fp16[0] - i * vec.fp16[1]
. - _mm_
maskz_ fcmadd_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b,
accumulate to the lower complex number in c, and store the result in the lower elements of dst using
zeromask k (the element is zeroed out when the corresponding mask bit is not set), and copy the upper
6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
, or the complex conjugate
conjugate = vec.fp16[0] - i * vec.fp16[1]
. - _mm_
maskz_ fcmul_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
, or the complex conjugate
conjugate = vec.fp16[0] - i * vec.fp16[1]
. - _mm_
maskz_ fcmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
, or the complex conjugate
conjugate = vec.fp16[0] - i * vec.fp16[1]
. - _mm_
maskz_ fcmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
, or the complex conjugate
conjugate = vec.fp16[0] - i * vec.fp16[1]
. - _mm_
maskz_ fmadd_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask
bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point
elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
maskz_ fmadd_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
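The complex (pch/sch) entries above treat each pair of adjacent 16-bit lanes as one complex number, `vec.fp16[0] + i * vec.fp16[1]`. The sketch below writes out the arithmetic for the plain product, the product with the conjugate of b, and a zeromasked multiply-accumulate. It makes several assumptions: one mask bit covers one complex pair, `f32` stands in for the 16-bit type, the accumulate is not fused as it may be in hardware, and all names are hypothetical.

```rust
/// One complex number as (real, imaginary).
type Complex = (f32, f32);

/// a * b = (a0*b0 - a1*b1) + i*(a0*b1 + a1*b0)
fn cmul(a: Complex, b: Complex) -> Complex {
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}

/// a * conj(b) = (a0*b0 + a1*b1) + i*(a1*b0 - a0*b1)
fn cmul_conj(a: Complex, b: Complex) -> Complex {
    (a.0 * b.0 + a.1 * b.1, a.1 * b.0 - a.0 * b.1)
}

/// Zeromasked complex multiply-accumulate over four complex lanes.
/// Assumption: bit i of k controls complex lane i as a whole.
fn maskz_fmadd_pch_model(
    k: u8,
    a: [Complex; 4],
    b: [Complex; 4],
    c: [Complex; 4],
) -> [Complex; 4] {
    let mut dst = [(0.0, 0.0); 4]; // masked-off lanes stay zero
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            let p = cmul(a[i], b[i]);
            dst[i] = (p.0 + c[i].0, p.1 + c[i].1);
        }
    }
    dst
}
```

For a = 1 + 2i and b = 3 + 4i, the plain product is -5 + 10i and the product with the conjugate of b is 11 + 2i.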
- _mm_
maskz_ fmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst using zeromask k (elements are zeroed out when mask
bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each
complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
maskz_ fmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ fmadd_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst using zeromask k (elements are zeroed out when mask
bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each
complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
maskz_ fmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ fmaddsub_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
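To make "alternatively add and subtract" concrete, here is a small scalar sketch of the zero-masked fmaddsub pattern. It assumes the usual x86 convention that even-indexed lanes subtract c and odd-indexed lanes add c. The function name is hypothetical, f32 stands in for f16, and the multiply and add are not fused here.

```rust
/// Hypothetical scalar model of a zero-masked packed fmaddsub over 8 lanes.
fn maskz_fmaddsub_ph_model(k: u8, a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    // Lanes whose mask bit is clear stay zero.
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            let prod = a[i] * b[i];
            // Assumed convention: even lanes subtract c, odd lanes add c.
            dst[i] = if i % 2 == 0 { prod - c[i] } else { prod + c[i] };
        }
    }
    dst
}
```

With a = [1, 2, ..., 8], b all 2, c all 1 and k = 0b0000_1111, the model returns [1, 5, 5, 9, 0, 0, 0, 0].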
- _mm_
maskz_ fmsub_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ fmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ fmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ fmsubadd_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ fmul_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element
is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
maskz_ fmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the results in dst using zeromask k (the element
is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
maskz_ fmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the results in dst using zeromask k (the element
is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
maskz_ fnmadd_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ fnmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ fnmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ fnmsub_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ fnmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ fnmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
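The four fused multiply-add families listed above (fmadd, fmsub, fnmadd, fnmsub) differ only in the signs applied to the product and to c. The scalar sketch below spells out those sign conventions. The function names are hypothetical and f32 is used as a stand-in for f16, so it illustrates the arithmetic only, not the vector or masking behaviour.

```rust
// Hypothetical scalar models of the four sign conventions (f32 stands in for f16).
fn fmadd(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c) //  (a * b) + c
}
fn fmsub(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c) //  (a * b) - c
}
fn fnmadd(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c) // -(a * b) + c
}
fn fnmsub(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c) // -(a * b) - c
}
```

For a = 2, b = 3, c = 4 these evaluate to 10, 2, -2 and -10 respectively.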
- _mm_
maskz_ getexp_ ph Experimental avx512fp16
and avx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask
k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates
floor(log2(x))
for each element. - _mm_
maskz_ getexp_ round_ sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision
(16-bit) floating-point number representing the integer exponent, store the result in the lower element
of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed
elements from a to the upper elements of dst. This intrinsic essentially calculates
floor(log2(x))
for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. - _mm_
maskz_ getexp_ sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision
(16-bit) floating-point number representing the integer exponent, store the result in the lower element
of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed
elements from a to the upper elements of dst. This intrinsic essentially calculates
floor(log2(x))
for the lower element. - _mm_
maskz_ getmant_ ph Experimental avx512fp16
and avx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
This intrinsic essentially calculates
±(2^k)*|x.significand|
, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. - _mm_
maskz_ getmant_ round_ sh Experimental avx512fp16
- Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store
the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set),
and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates
±(2^k)*|x.significand|
, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. - _mm_
maskz_ getmant_ sh Experimental avx512fp16
- Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store
the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set),
and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates
±(2^k)*|x.significand|
, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. - _mm_
maskz_ load_ sh Experimental avx512fp16
- Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector using zeromask k (the element is zeroed out when mask bit 0 is not set), and zero the upper elements.
- _mm_
maskz_ max_ ph Experimental avx512fp16
and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_
maskz_ max_ round_ sh Experimental avx512fp16
and avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_
maskz_ max_ sh Experimental avx512fp16
and avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_
maskz_ min_ ph Experimental avx512fp16
and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_
maskz_ min_ round_ sh Experimental avx512fp16
and avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_
maskz_ min_ sh Experimental avx512fp16
and avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_
maskz_ move_ sh Experimental avx512fp16
- Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ mul_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element
is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
maskz_ mul_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
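The "pch" and "ph" multiply entries above operate on the same 8 lanes but interpret them differently: "ph" multiplies lane by lane, while "pch" treats each adjacent pair as one complex number (real part first). The sketch below contrasts the two. It assumes that the complex variant consumes one mask bit per complex number (4 bits for a 128-bit vector); the function names are hypothetical and f32 stands in for f16.

```rust
/// Hypothetical model of a zero-masked element-wise multiply (one mask bit per lane).
fn maskz_mul_ph_model(k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] * b[i];
        }
    }
    dst
}

/// Hypothetical model of a zero-masked complex multiply.
/// Assumption: one mask bit per complex number, i.e. per pair of lanes.
fn maskz_mul_pch_model(k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for j in 0..4 {
        if (k >> j) & 1 == 1 {
            let (ar, ai) = (a[2 * j], a[2 * j + 1]);
            let (br, bi) = (b[2 * j], b[2 * j + 1]);
            dst[2 * j] = ar * br - ai * bi; // real part
            dst[2 * j + 1] = ar * bi + ai * br; // imaginary part
        }
    }
    dst
}
```

For example, the pair (1, 2) times the pair (5, 6) gives (-7, 16) in the complex model, since (1 + 2i)(5 + 6i) = -7 + 16i, whereas the element-wise model gives (5, 12).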
- _mm_
maskz_ mul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using
zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements
from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
maskz_ mul_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_
maskz_ mul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using
zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements
from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
maskz_ mul_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ rcp_ ph Experimental avx512fp16
and avx512vl
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_
maskz_ rcp_ sh Experimental avx512fp16
- Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b,
store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0
is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
The maximum relative error for this approximation is less than
1.5*2^-12
. - _mm_
maskz_ reduce_ ph Experimental avx512fp16
and avx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ reduce_ round_ sh Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ reduce_ sh Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ roundscale_ ph Experimental avx512fp16
and avx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
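"Round to the number of fraction bits specified by imm8" can be read as: scale by 2^M, round to an integer, scale back. The sketch below models that for one lane. It assumes the usual x86 encoding in which imm8[7:4] holds M and imm8[1:0] selects the rounding mode (0 nearest-even, 1 down, 2 up, 3 toward zero); the remaining imm8 bits are ignored here. All names are hypothetical and f32 stands in for f16.

```rust
/// Round to the nearest integer, ties to even (helper for the model below).
fn round_ties_even(y: f32) -> f32 {
    let r = y.round(); // rounds ties away from zero
    if (r - y).abs() == 0.5 && r % 2.0 != 0.0 {
        r - y.signum() // pull an odd tie result back to the even neighbour
    } else {
        r
    }
}

/// Hypothetical scalar model of roundscale for a single lane.
/// Assumption: imm8[7:4] = fraction bits to keep, imm8[1:0] = rounding mode.
fn roundscale_model(x: f32, imm8: u8) -> f32 {
    let scale = 2.0f32.powi((imm8 >> 4) as i32);
    let y = x * scale;
    let rounded = match imm8 & 0b11 {
        0 => round_ties_even(y),
        1 => y.floor(),
        2 => y.ceil(),
        _ => y.trunc(),
    };
    rounded / scale
}
```

With x = 1.375 and two fraction bits, truncation (imm8 = 0x23) gives 1.25 and rounding up (imm8 = 0x22) gives 1.5.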
- _mm_
maskz_ roundscale_ round_ sh Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ roundscale_ sh Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ rsqrt_ ph Experimental avx512fp16
and avx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point
elements in a, and store the results in dst using zeromask k (elements are zeroed out when the
corresponding mask bit is not set).
The maximum relative error for this approximation is less than
1.5*2^-12
. - _mm_
maskz_ rsqrt_ sh Experimental avx512fp16
- Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point
element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when
mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
The maximum relative error for this approximation is less than
1.5*2^-12
. - _mm_
maskz_ scalef_ ph Experimental avx512fp16
and avx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ scalef_ round_ sh Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ scalef_ sh Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ sqrt_ ph Experimental avx512fp16
and avx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ sqrt_ round_ sh Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_
maskz_ sqrt_ sh Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
maskz_ sub_ ph Experimental avx512fp16
and avx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_
maskz_ sub_ round_ sh Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_
maskz_ sub_ sh Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
max_ ph Experimental avx512fp16
and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_
max_ round_ sh Experimental avx512fp16
and avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_
max_ sh Experimental avx512fp16
and avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_
min_ ph Experimental avx512fp16
and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_
min_ round_ sh Experimental avx512fp16
and avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_
min_ sh Experimental avx512fp16
and avx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
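The note that min and max do not follow IEEE 754 for NaN or signed-zero inputs can be illustrated with a scalar sketch. It assumes the long-standing x86 min/max rule, where the result is the second operand unless the strict comparison succeeds, so a NaN in either position or a pair of zeros yields b. The function names are hypothetical, and f32 stands in for f16.

```rust
/// Hypothetical model of x86-style min: returns `b` unless `a < b` holds.
fn x86_min(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}

/// Hypothetical model of x86-style max: returns `b` unless `a > b` holds.
fn x86_max(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}
```

Under this model the operation is not commutative: x86_min(NaN, 1.0) is 1.0 but x86_min(1.0, NaN) is NaN, and x86_min(0.0, -0.0) returns -0.0 while x86_min(-0.0, 0.0) returns 0.0. Rust's own f32::min, by contrast, returns 1.0 for both NaN orderings.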
- _mm_
move_ sh Experimental avx512fp16
- Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
mul_ pch Experimental avx512fp16
and avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is
composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex
number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
mul_ ph Experimental avx512fp16
and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm_
mul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst,
and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is
composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex
number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
mul_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_
mul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst,
and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is
composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex
number
complex = vec.fp16[0] + i * vec.fp16[1]
. - _mm_
mul_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
permutex2var_ ph Experimental avx512fp16
and avx512vl
- Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
- _mm_
permutexvar_ ph Experimental avx512fp16
and avx512vl
- Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
- _mm_
rcp_ ph Experimental avx512fp16
and avx512vl
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_
rcp_ sh Experimental avx512fp16
- Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b,
store the result in the lower element of dst, and copy the upper 7 packed elements from a to the
upper elements of dst.
The maximum relative error for this approximation is less than
1.5*2^-12
. - _mm_
reduce_ add_ ph Experimental avx512fp16
and avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
- _mm_
reduce_ max_ ph Experimental avx512fp16
and avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm_
reduce_ min_ ph Experimental avx512fp16
and avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm_
reduce_ mul_ ph Experimental avx512fp16
and avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
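The four horizontal reductions above collapse all 8 lanes into a single scalar. A rough scalar sketch follows, with hypothetical function names and f32 standing in for f16. It combines lanes left to right; the order in which a real implementation combines lanes is not stated in the text above, and in half precision a different order can round differently.

```rust
// Hypothetical scalar models of the horizontal reductions over 8 lanes.
fn reduce_add(a: [f32; 8]) -> f32 {
    a.iter().sum()
}
fn reduce_mul(a: [f32; 8]) -> f32 {
    a.iter().product()
}
fn reduce_min(a: [f32; 8]) -> f32 {
    a.iter().copied().fold(f32::INFINITY, f32::min)
}
fn reduce_max(a: [f32; 8]) -> f32 {
    a.iter().copied().fold(f32::NEG_INFINITY, f32::max)
}
```

For the lanes 1 through 8 these return 36, 40320, 1 and 8.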
- _mm_
reduce_ ph Experimental avx512fp16
and avx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
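Unlike the horizontal reduce_add/min/max/mul entries, "extract the reduced argument" works per lane: it is what remains after rounding the input to M fraction bits. A minimal sketch, assuming the reduced argument is x minus x rounded to M fraction bits, with M and the rounding mode both derived from imm8 in the real intrinsic. Here they are passed explicitly as hypothetical parameters, and f32 stands in for f16.

```rust
/// Hypothetical scalar model of the reduced argument of `x`.
/// `m` is the number of fraction bits kept; `round` is the rounding function.
fn reduce_model(x: f32, m: i32, round: impl Fn(f32) -> f32) -> f32 {
    let scale = 2.0f32.powi(m);
    // Subtract the value rounded to `m` fraction bits; what is left is the remainder.
    x - round(x * scale) / scale
}
```

For instance, reduce_model(1.375, 2, f32::trunc) leaves 0.125, because 1.375 truncated to two fraction bits is 1.25.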
- _mm_
reduce_ round_ sh Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
reduce_ sh Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
roundscale_ ph Experimental avx512fp16
and avx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
- _mm_
roundscale_ round_ sh Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
roundscale_ sh Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
rsqrt_ ph Experimental avx512fp16
and avx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point
elements in a, and store the results in dst.
The maximum relative error for this approximation is less than
1.5*2^-12
. - _mm_
rsqrt_ sh Experimental avx512fp16
- Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point
element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a
to the upper elements of dst.
The maximum relative error for this approximation is less than
1.5*2^-12
. - _mm_
scalef_ ph Experimental avx512fp16
and avx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
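"Scale a using values from b" is terse. On x86 the scalef family computes a * 2^floor(b) per lane, and the sketch below assumes the half-precision variant does the same. The function name is hypothetical, f32 stands in for f16, and special cases (NaN, infinities, denormals, overflow) are not modelled.

```rust
/// Hypothetical scalar model of scalef for one lane: a * 2^floor(b).
fn scalef_model(a: f32, b: f32) -> f32 {
    // powi with an integer exponent keeps powers of two exact.
    a * 2.0f32.powi(b.floor() as i32)
}
```

So scalef_model(3.0, 2.7) is 12.0 (the exponent floors to 2) and scalef_model(3.0, -0.5) is 1.5 (the exponent floors to -1).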
- _mm_
scalef_ round_ sh Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
scalef_ sh Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
set1_ ph Experimental avx512fp16
- Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
- _mm_
set_ ph Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
- _mm_
set_ sh Experimental avx512fp16
- Copy the half-precision (16-bit) floating-point element a to the lower element of dst and zero the upper 7 elements.
- _mm_
setr_ ph Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
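The difference between the set and setr entries above is only argument order. The sketch below assumes the usual Intel convention: set takes the highest lane first (e7 down to e0), while setr takes lanes in memory order (e0 up to e7). The function names are hypothetical, and f32 arrays indexed by lane stand in for the f16 vector.

```rust
/// Hypothetical model of set: arguments run from the highest lane down to lane 0.
#[allow(clippy::too_many_arguments)]
fn set_ph_model(e7: f32, e6: f32, e5: f32, e4: f32, e3: f32, e2: f32, e1: f32, e0: f32) -> [f32; 8] {
    [e0, e1, e2, e3, e4, e5, e6, e7]
}

/// Hypothetical model of setr: arguments run from lane 0 up to the highest lane.
#[allow(clippy::too_many_arguments)]
fn setr_ph_model(e0: f32, e1: f32, e2: f32, e3: f32, e4: f32, e5: f32, e6: f32, e7: f32) -> [f32; 8] {
    [e0, e1, e2, e3, e4, e5, e6, e7]
}
```

Under this convention, set_ph_model(8.0, 7.0, ..., 1.0) and setr_ph_model(1.0, 2.0, ..., 8.0) build the same vector, with 1.0 in lane 0.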
- _mm_
setzero_ ph Experimental avx512fp16
and avx512vl
- Return vector of type __m128h with all elements set to zero.
- _mm_
sqrt_ ph Experimental avx512fp16
and avx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
- _mm_
sqrt_ round_ sh Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_
sqrt_ sh Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
store_ ph Experimental avx512fp16
and avx512vl
- Store 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 16 bytes or a general-protection exception may be generated.
- _mm_
store_ sh Experimental avx512fp16
- Store the lower half-precision (16-bit) floating-point element from a into memory.
- _mm_
storeu_ ph Experimental avx512fp16
and avx512vl
- Store 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
- _mm_
sub_ ph Experimental avx512fp16
and avx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
- _mm_
sub_ round_ sh Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_
sub_ sh Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_
ucomieq_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_
ucomige_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_
ucomigt_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_
ucomile_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_
ucomilt_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_
ucomineq_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_
undefined_ ph Experimental avx512fp16
and avx512vl
- Return vector of type __m128h with indeterminate elements. Despite using the word "undefined" (following Intel's naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.