Module avx512fp16

Available on x86 or x86-64 only.

Macros

cmp_asm 🔒
fpclass_asm 🔒

Functions

vaddph 🔒 ⚠
vaddsh 🔒 ⚠
vcmpsh 🔒 ⚠
vcomish 🔒 ⚠
vcvtdq2ph_128 🔒 ⚠
vcvtdq2ph_256 🔒 ⚠
vcvtdq2ph_512 🔒 ⚠
vcvtpd2ph_128 🔒 ⚠
vcvtpd2ph_256 🔒 ⚠
vcvtpd2ph_512 🔒 ⚠
vcvtph2dq_128 🔒 ⚠
vcvtph2dq_256 🔒 ⚠
vcvtph2dq_512 🔒 ⚠
vcvtph2pd_128 🔒 ⚠
vcvtph2pd_256 🔒 ⚠
vcvtph2pd_512 🔒 ⚠
vcvtph2psx_128 🔒 ⚠
vcvtph2psx_256 🔒 ⚠
vcvtph2psx_512 🔒 ⚠
vcvtph2qq_128 🔒 ⚠
vcvtph2qq_256 🔒 ⚠
vcvtph2qq_512 🔒 ⚠
vcvtph2udq_128 🔒 ⚠
vcvtph2udq_256 🔒 ⚠
vcvtph2udq_512 🔒 ⚠
vcvtph2uqq_128 🔒 ⚠
vcvtph2uqq_256 🔒 ⚠
vcvtph2uqq_512 🔒 ⚠
vcvtph2uw_128 🔒 ⚠
vcvtph2uw_256 🔒 ⚠
vcvtph2uw_512 🔒 ⚠
vcvtph2w_128 🔒 ⚠
vcvtph2w_256 🔒 ⚠
vcvtph2w_512 🔒 ⚠
vcvtps2phx_128 🔒 ⚠
vcvtps2phx_256 🔒 ⚠
vcvtps2phx_512 🔒 ⚠
vcvtqq2ph_128 🔒 ⚠
vcvtqq2ph_256 🔒 ⚠
vcvtqq2ph_512 🔒 ⚠
vcvtsd2sh 🔒 ⚠
vcvtsh2sd 🔒 ⚠
vcvtsh2si32 🔒 ⚠
vcvtsh2ss 🔒 ⚠
vcvtsh2usi32 🔒 ⚠
vcvtsi2sh 🔒 ⚠
vcvtss2sh 🔒 ⚠
vcvttph2dq_128 🔒 ⚠
vcvttph2dq_256 🔒 ⚠
vcvttph2dq_512 🔒 ⚠
vcvttph2qq_128 🔒 ⚠
vcvttph2qq_256 🔒 ⚠
vcvttph2qq_512 🔒 ⚠
vcvttph2udq_128 🔒 ⚠
vcvttph2udq_256 🔒 ⚠
vcvttph2udq_512 🔒 ⚠
vcvttph2uqq_128 🔒 ⚠
vcvttph2uqq_256 🔒 ⚠
vcvttph2uqq_512 🔒 ⚠
vcvttph2uw_128 🔒 ⚠
vcvttph2uw_256 🔒 ⚠
vcvttph2uw_512 🔒 ⚠
vcvttph2w_128 🔒 ⚠
vcvttph2w_256 🔒 ⚠
vcvttph2w_512 🔒 ⚠
vcvttsh2si32 🔒 ⚠
vcvttsh2usi32 🔒 ⚠
vcvtudq2ph_128 🔒 ⚠
vcvtudq2ph_256 🔒 ⚠
vcvtudq2ph_512 🔒 ⚠
vcvtuqq2ph_128 🔒 ⚠
vcvtuqq2ph_256 🔒 ⚠
vcvtuqq2ph_512 🔒 ⚠
vcvtusi2sh 🔒 ⚠
vcvtuw2ph_128 🔒 ⚠
vcvtuw2ph_256 🔒 ⚠
vcvtuw2ph_512 🔒 ⚠
vcvtw2ph_128 🔒 ⚠
vcvtw2ph_256 🔒 ⚠
vcvtw2ph_512 🔒 ⚠
vdivph 🔒 ⚠
vdivsh 🔒 ⚠
vfcmaddcph_mask3_128 🔒 ⚠
vfcmaddcph_mask3_256 🔒 ⚠
vfcmaddcph_mask3_512 🔒 ⚠
vfcmaddcph_maskz_128 🔒 ⚠
vfcmaddcph_maskz_256 🔒 ⚠
vfcmaddcph_maskz_512 🔒 ⚠
vfcmaddcsh_mask 🔒 ⚠
vfcmaddcsh_maskz 🔒 ⚠
vfcmulcph_128 🔒 ⚠
vfcmulcph_256 🔒 ⚠
vfcmulcph_512 🔒 ⚠
vfcmulcsh 🔒 ⚠
vfmaddcph_mask3_128 🔒 ⚠
vfmaddcph_mask3_256 🔒 ⚠
vfmaddcph_mask3_512 🔒 ⚠
vfmaddcph_maskz_128 🔒 ⚠
vfmaddcph_maskz_256 🔒 ⚠
vfmaddcph_maskz_512 🔒 ⚠
vfmaddcsh_mask 🔒 ⚠
vfmaddcsh_maskz 🔒 ⚠
vfmaddph_512 🔒 ⚠
vfmaddsh 🔒 ⚠
vfmaddsubph_128 🔒 ⚠
vfmaddsubph_256 🔒 ⚠
vfmaddsubph_512 🔒 ⚠
vfmulcph_128 🔒 ⚠
vfmulcph_256 🔒 ⚠
vfmulcph_512 🔒 ⚠
vfmulcsh 🔒 ⚠
vfpclasssh 🔒 ⚠
vgetexpph_128 🔒 ⚠
vgetexpph_256 🔒 ⚠
vgetexpph_512 🔒 ⚠
vgetexpsh 🔒 ⚠
vgetmantph_128 🔒 ⚠
vgetmantph_256 🔒 ⚠
vgetmantph_512 🔒 ⚠
vgetmantsh 🔒 ⚠
vmaxph_128 🔒 ⚠
vmaxph_256 🔒 ⚠
vmaxph_512 🔒 ⚠
vmaxsh 🔒 ⚠
vminph_128 🔒 ⚠
vminph_256 🔒 ⚠
vminph_512 🔒 ⚠
vminsh 🔒 ⚠
vmulph 🔒 ⚠
vmulsh 🔒 ⚠
vrcpph_128 🔒 ⚠
vrcpph_256 🔒 ⚠
vrcpph_512 🔒 ⚠
vrcpsh 🔒 ⚠
vreduceph_128 🔒 ⚠
vreduceph_256 🔒 ⚠
vreduceph_512 🔒 ⚠
vreducesh 🔒 ⚠
vrndscaleph_128 🔒 ⚠
vrndscaleph_256 🔒 ⚠
vrndscaleph_512 🔒 ⚠
vrndscalesh 🔒 ⚠
vrsqrtph_128 🔒 ⚠
vrsqrtph_256 🔒 ⚠
vrsqrtph_512 🔒 ⚠
vrsqrtsh 🔒 ⚠
vscalefph_128 🔒 ⚠
vscalefph_256 🔒 ⚠
vscalefph_512 🔒 ⚠
vscalefsh 🔒 ⚠
vsqrtph_512 🔒 ⚠
vsqrtsh 🔒 ⚠
vsubph 🔒 ⚠
vsubsh 🔒 ⚠
_mm256_abs_ph Experimental avx512fp16 and avx512vl
Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the result in dst.
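As a rough illustration of what the absolute-value operation does per lane, the sketch below clears the sign bit of raw IEEE 754 binary16 bit patterns. It is a scalar model in stable Rust, not the intrinsic itself; the name `abs_ph_model` and the use of `u16` to stand in for each half-precision element are assumptions made for illustration.

```rust
/// Scalar model: absolute value of 16 half-precision lanes stored as raw
/// binary16 bit patterns (1 sign bit, 5 exponent bits, 10 fraction bits).
fn abs_ph_model(v2: [u16; 16]) -> [u16; 16] {
    // Clearing bit 15 removes the sign and leaves the magnitude untouched.
    v2.map(|bits| bits & 0x7FFF)
}
```

With this model, the pattern `0xBC00` (-1.0) becomes `0x3C00` (1.0), and `0x8000` (-0.0) becomes `0x0000` (+0.0).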
_mm256_add_ph Experimental avx512fp16 and avx512vl
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm256_castpd_ph Experimental avx512fp16
Cast vector of type __m256d to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castph128_ph256 Experimental avx512fp16
Cast vector of type __m128h to type __m256h. The upper 8 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
_mm256_castph256_ph128 Experimental avx512fp16
Cast vector of type __m256h to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castph_pd Experimental avx512fp16
Cast vector of type __m256h to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castph_ps Experimental avx512fp16
Cast vector of type __m256h to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castph_si256 Experimental avx512fp16
Cast vector of type __m256h to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castps_ph Experimental avx512fp16
Cast vector of type __m256 to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castsi256_ph Experimental avx512fp16
Cast vector of type __m256i to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_cmp_ph_mask Experimental avx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
_mm256_cmul_pch Experimental avx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_conj_pch Experimental avx512fp16 and avx512vl
Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
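The complex forms above can be sketched with a small scalar model. This is only an illustration of the arithmetic on one complex number (one pair of adjacent lanes); the `Complex` struct, the function names, and the use of `f32` in place of half precision are assumptions, and hardware rounding of intermediate products may differ.

```rust
/// One complex number as two adjacent lanes:
/// `re` plays the role of vec.fp16[0], `im` of vec.fp16[1].
/// `f32` stands in for the half-precision element type in this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex {
    re: f32,
    im: f32,
}

/// conjugate = re - i * im
fn conj(x: Complex) -> Complex {
    Complex { re: x.re, im: -x.im }
}

/// Plain complex product a * b.
fn mul(a: Complex, b: Complex) -> Complex {
    Complex {
        re: a.re * b.re - a.im * b.im,
        im: a.re * b.im + a.im * b.re,
    }
}

/// Product of a with the complex conjugate of b, as described for the `cmul` form.
fn cmul(a: Complex, b: Complex) -> Complex {
    mul(a, conj(b))
}
```

For example, with a = 1 + 2i and b = 3 + 4i, `cmul(a, b)` gives 11 + 2i, while `mul(a, b)` gives -5 + 10i.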
_mm256_cvtepi16_ph Experimental avx512fp16 and avx512vl
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm256_cvtepi32_ph Experimental avx512fp16 and avx512vl
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm256_cvtepi64_ph Experimental avx512fp16 and avx512vl
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
_mm256_cvtepu16_ph Experimental avx512fp16 and avx512vl
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm256_cvtepu32_ph Experimental avx512fp16 and avx512vl
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm256_cvtepu64_ph Experimental avx512fp16 and avx512vl
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
_mm256_cvtpd_ph Experimental avx512fp16 and avx512vl
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
_mm256_cvtph_epi16 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
_mm256_cvtph_epi32 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
_mm256_cvtph_epi64 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
_mm256_cvtph_epu16 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
_mm256_cvtph_epu32 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
_mm256_cvtph_epu64 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
_mm256_cvtph_pd Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
_mm256_cvtsh_h Experimental avx512fp16
Copy the lower half-precision (16-bit) floating-point element from a to dst.
_mm256_cvttph_epi16 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
_mm256_cvttph_epi32 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
_mm256_cvttph_epi64 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
_mm256_cvttph_epu16 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
_mm256_cvttph_epu32 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
_mm256_cvttph_epu64 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
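The difference between the plain conversions and the "with truncation" conversions can be sketched per lane as follows. This is a scalar model under stated assumptions: the non-truncating form is assumed to use round-to-nearest-even (the default rounding mode), `f32` stands in for half precision, and only finite, in-range inputs are considered. The names `round_ties_even`, `cvt_lane`, and `cvtt_lane` are hypothetical.

```rust
/// Round to the nearest integer, ties to even (assumed default rounding mode).
fn round_ties_even(x: f32) -> f32 {
    let r = x.round(); // `round` sends ties away from zero
    let is_tie = (x - x.trunc()).abs() == 0.5;
    // On a tie that landed on an odd integer, step back toward zero.
    if is_tie && r % 2.0 != 0.0 { r - x.signum() } else { r }
}

/// Model of the non-truncating conversion for one lane.
fn cvt_lane(x: f32) -> i32 {
    round_ties_even(x) as i32
}

/// Model of the truncating conversion for one lane (round toward zero).
fn cvtt_lane(x: f32) -> i32 {
    x.trunc() as i32
}
```

Under these assumptions, 2.7 converts to 3 without truncation and to 2 with truncation, and -1.5 converts to -2 and -1 respectively.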
_mm256_cvtxph_ps Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
_mm256_cvtxps_ph Experimental avx512fp16 and avx512vl
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm256_div_ph Experimental avx512fp16 and avx512vl
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
_mm256_fcmadd_pch Experimental avx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_fcmul_pch Experimental avx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_fmadd_pch Experimental avx512fp16 and avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_fmadd_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
_mm256_fmaddsub_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
_mm256_fmsub_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
_mm256_fmsubadd_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
_mm256_fmul_pch Experimental avx512fp16 and avx512vl
Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_fnmadd_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
_mm256_fnmsub_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
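The sign conventions of the fused multiply-add family are easy to mix up, so here is a minimal scalar sketch of the six real-valued forms. It uses `f32::mul_add` in place of half-precision hardware FMA. The lane order for the alternating forms (subtract in even lanes and add in odd lanes for `fmaddsub`, the reverse for `fmsubadd`) is an assumption taken from how Intel documents the analogous single-precision intrinsics; the function names are hypothetical.

```rust
/// fmadd:  a*b + c
fn fmadd(a: f32, b: f32, c: f32) -> f32 { a.mul_add(b, c) }
/// fmsub:  a*b - c
fn fmsub(a: f32, b: f32, c: f32) -> f32 { a.mul_add(b, -c) }
/// fnmadd: -(a*b) + c
fn fnmadd(a: f32, b: f32, c: f32) -> f32 { (-a).mul_add(b, c) }
/// fnmsub: -(a*b) - c
fn fnmsub(a: f32, b: f32, c: f32) -> f32 { (-a).mul_add(b, -c) }

/// fmaddsub: assumed to subtract c in even lanes and add c in odd lanes.
fn fmaddsub<const N: usize>(a: [f32; N], b: [f32; N], c: [f32; N]) -> [f32; N] {
    let mut dst = [0.0; N];
    for i in 0..N {
        dst[i] = if i % 2 == 0 { fmsub(a[i], b[i], c[i]) } else { fmadd(a[i], b[i], c[i]) };
    }
    dst
}

/// fmsubadd: assumed to add c in even lanes and subtract c in odd lanes.
fn fmsubadd<const N: usize>(a: [f32; N], b: [f32; N], c: [f32; N]) -> [f32; N] {
    let mut dst = [0.0; N];
    for i in 0..N {
        dst[i] = if i % 2 == 0 { fmadd(a[i], b[i], c[i]) } else { fmsub(a[i], b[i], c[i]) };
    }
    dst
}
```

With a = 2, b = 3, c = 1 in every lane, the four scalar forms give 7, 5, -5, and -7, and the alternating forms give the patterns 5, 7, 5, 7 and 7, 5, 7, 5.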
_mm256_fpclass_ph_mask Experimental avx512fp16 and avx512vl
Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm8 can be a combination of category flags.
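To make the category test concrete, the sketch below classifies raw binary16 bit patterns in plain Rust and builds the resulting lane mask. The flag values and category definitions are assumptions based on Intel's published VFPCLASS classifier encoding as I understand it, not something stated on this page; the constant and function names are hypothetical, and `u16` stands in for each half-precision element.

```rust
// Category flags (assumed from Intel's VFPCLASS imm8 encoding).
const QNAN: u8 = 0x01;
const POS_ZERO: u8 = 0x02;
const NEG_ZERO: u8 = 0x04;
const POS_INF: u8 = 0x08;
const NEG_INF: u8 = 0x10;
const DENORMAL: u8 = 0x20;
const NEG_FINITE: u8 = 0x40;
const SNAN: u8 = 0x80;

/// Return the set of categories one binary16 bit pattern falls into.
fn categories(bits: u16) -> u8 {
    let neg = bits & 0x8000 != 0;
    let exp = (bits >> 10) & 0x1F;
    let frac = bits & 0x03FF;
    let quiet = frac & 0x0200 != 0; // top fraction bit marks a quiet NaN
    let mut c = 0u8;
    match (exp, frac) {
        (0x1F, 0) => c |= if neg { NEG_INF } else { POS_INF },
        (0x1F, _) => c |= if quiet { QNAN } else { SNAN },
        (0, 0) => c |= if neg { NEG_ZERO } else { POS_ZERO },
        (0, _) => c |= DENORMAL,
        _ => {}
    }
    // Negative, not infinity/NaN, and not zero (assumed definition).
    if neg && exp != 0x1F && !(exp == 0 && frac == 0) {
        c |= NEG_FINITE;
    }
    c
}

/// Model of the mask result: bit i is set when lane i matches any category in imm8.
fn fpclass_mask(a: [u16; 16], imm8: u8) -> u16 {
    let mut k = 0u16;
    for (i, &bits) in a.iter().enumerate() {
        if categories(bits) & imm8 != 0 {
            k |= 1 << i;
        }
    }
    k
}
```

Passing `QNAN | SNAN` as `imm8` in this model selects every NaN lane, regardless of whether it is quiet or signaling.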
_mm256_getexp_ph Experimental avx512fp16 and avx512vl
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
_mm256_getmant_ph Experimental avx512fp16 and avx512vl
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
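The exponent and mantissa extraction described above can be illustrated on a single value by reading the fields of a float directly. The sketch below is a scalar model that uses `f32` in place of half precision and handles only normal, finite, non-zero inputs. It assumes the normalization interval [1, 2) (so k = 0) and a result sign taken from the source; the names `getexp` and `getmant` here are hypothetical helpers, not the intrinsics.

```rust
/// floor(log2(|x|)) for a normal, finite, non-zero f32, read from the exponent field.
fn getexp(x: f32) -> f32 {
    let biased = ((x.to_bits() >> 23) & 0xFF) as i32;
    (biased - 127) as f32
}

/// Significand normalized to [1, 2), keeping the source sign.
fn getmant(x: f32) -> f32 {
    // Keep the sign and fraction bits, force the biased exponent to 127 (2^0).
    f32::from_bits((x.to_bits() & 0x807F_FFFF) | (127 << 23))
}
```

For x = 10.0 this model yields an exponent of 3.0 and a mantissa of 1.25, and 1.25 * 2^3 reconstructs the input.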
_mm256_load_ph ⚠ Experimental avx512fp16 and avx512vl
Load 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 32 bytes or a general-protection exception may be generated.
_mm256_loadu_ph ⚠ Experimental avx512fp16 and avx512vl
Load 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
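The alignment requirement that separates the two load forms can be checked in ordinary Rust. The sketch below is not the intrinsic; it shows one way to obtain 32-byte-aligned storage, how to test a pointer for that alignment, and a model of an unaligned 32-byte read. `Aligned32`, `is_aligned_32`, and `loadu_model` are hypothetical names, and `u16` stands in for each half-precision element.

```rust
/// 16 half-precision lanes as raw bit patterns, forced to 32-byte alignment.
#[repr(align(32))]
struct Aligned32([u16; 16]);

/// True when `ptr` satisfies the 32-byte requirement of the aligned load.
fn is_aligned_32<T>(ptr: *const T) -> bool {
    (ptr as usize) % 32 == 0
}

/// Model of the unaligned load: copy 32 bytes from any address.
///
/// Safety: `ptr` must be valid for reading 16 consecutive `u16` values.
unsafe fn loadu_model(ptr: *const u16) -> [u16; 16] {
    unsafe { std::ptr::read_unaligned(ptr as *const [u16; 16]) }
}
```

A pointer into an `Aligned32` value always passes `is_aligned_32`, whereas a pointer one element into an ordinary `[u16; N]` buffer generally does not and would call for the unaligned form.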
_mm256_mask3_fcmadd_pch Experimental avx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_mask3_fmadd_pch Experimental avx512fp16 and avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_mask3_fmadd_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask3_fmaddsub_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask3_fmsub_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask3_fmsubadd_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask3_fnmadd_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask3_fnmsub_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm256_mask_add_ph Experimental avx512fp16 and avx512vl
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_blend_ph Experimental avx512fp16 and avx512vl
Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
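Writemask and blend behaviour is the same idea applied per lane, which a short scalar model may make clearer. In this sketch `f32` stands in for half precision and a `u16` holds one mask bit per lane; the function names are hypothetical, and the blend selection order (b where the bit is set, a otherwise) is an assumption based on how Intel documents the analogous blend intrinsics.

```rust
/// Writemask form: lane i takes a[i] + b[i] when bit i of k is set, else src[i].
fn mask_add(src: [f32; 16], k: u16, a: [f32; 16], b: [f32; 16]) -> [f32; 16] {
    let mut dst = src;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] + b[i];
        }
    }
    dst
}

/// Blend: lane i takes b[i] when bit i of k is set, else a[i] (assumed order).
fn mask_blend(k: u16, a: [f32; 16], b: [f32; 16]) -> [f32; 16] {
    let mut dst = a;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = b[i];
        }
    }
    dst
}
```

With `k = 0b101`, only lanes 0 and 2 of `mask_add` receive the sum; every other lane keeps its value from `src`.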
_mm256_mask_cmp_ph_mask Experimental avx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
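The relationship between the plain compare and its zeromasked variant can be sketched as a bitmask computation. This is a scalar model only: the comparison predicate is passed as a function pointer rather than encoded in `imm8`, `f32` stands in for half precision, and the names `cmp_mask` and `mask_cmp_mask` are hypothetical.

```rust
/// Unmasked compare: bit i of the result is set when pred(a[i], b[i]) holds.
fn cmp_mask(a: [f32; 16], b: [f32; 16], pred: fn(f32, f32) -> bool) -> u16 {
    let mut k = 0u16;
    for i in 0..16 {
        if pred(a[i], b[i]) {
            k |= 1 << i;
        }
    }
    k
}

/// Zeromask form: result bits are cleared wherever the input mask k1 is clear.
fn mask_cmp_mask(k1: u16, a: [f32; 16], b: [f32; 16], pred: fn(f32, f32) -> bool) -> u16 {
    cmp_mask(a, b, pred) & k1
}
```

If lanes hold the values 0 through 15 and are compared "less than" against 8, the unmasked result is `0x00FF`; zeromasking with `0x0F0F` leaves `0x000F`.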
_mm256_mask_cmul_pch Experimental avx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_mask_conj_pch Experimental avx512fp16 and avx512vl
Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_mask_cvtepi16_ph Experimental avx512fp16 and avx512vl
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvtepi32_ph Experimental avx512fp16 and avx512vl
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvtepi64_ph Experimental avx512fp16 and avx512vl
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_mask_cvtepu16_ph Experimental avx512fp16 and avx512vl
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvtepu32_ph Experimental avx512fp16 and avx512vl
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvtepu64_ph Experimental avx512fp16 and avx512vl
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_mask_cvtpd_ph Experimental avx512fp16 and avx512vl
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_mask_cvtph_epi16 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_epi32 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_epi64 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_epu16 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_epu32 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_epu64 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtph_pd Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvttph_epi16 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvttph_epi32 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvttph_epi64 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvttph_epu16 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvttph_epu32 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvttph_epu64 Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_cvtxph_ps Experimental avx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_cvtxps_ph Experimental avx512fp16 and avx512vl
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm256_mask_div_ph Experimental avx512fp16 and avx512vl
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_fcmadd_pch Experimental avx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_mask_fcmul_pch Experimental avx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_mask_fmadd_pch Experimental avx512fp16 and avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_mask_fmadd_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fmaddsub_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fmsub_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fmsubadd_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fmul_pch Experimental avx512fp16 and avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_mask_fnmadd_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm256_mask_fnmsub_ph Experimental avx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
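The `mask` and `mask3` fused multiply-add forms differ only in which operand supplies the lanes whose mask bit is clear: `a` for `mask`, `c` for `mask3`. A minimal scalar sketch of that difference follows, with `f32` standing in for half precision. The function names and argument order are illustrative assumptions, not the intrinsic signatures.

```rust
/// `mask` form: lanes with a clear mask bit keep the value from `a`.
fn mask_fmadd(a: [f32; 16], k: u16, b: [f32; 16], c: [f32; 16]) -> [f32; 16] {
    let mut dst = a;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].mul_add(b[i], c[i]);
        }
    }
    dst
}

/// `mask3` form: lanes with a clear mask bit keep the value from `c`.
fn mask3_fmadd(a: [f32; 16], b: [f32; 16], c: [f32; 16], k: u16) -> [f32; 16] {
    let mut dst = c;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].mul_add(b[i], c[i]);
        }
    }
    dst
}
```

With a = 2, b = 3, c = 1 in every lane and only bit 0 of the mask set, lane 0 is 7 in both forms, while lane 1 is 2 for `mask_fmadd` and 1 for `mask3_fmadd`.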
_mm256_mask_fpclass_ph_mask Experimental avx512fp16 and avx512vl
Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of category flags.
_mm256_mask_getexp_ph Experimental avx512fp16 and avx512vl
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm256_mask_getmant_ph Experimental avx512fp16 and avx512vl
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm256_mask_max_ph Experimental avx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm256_mask_min_phExperimentalavx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
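The departure from IEEE 754 noted for max and min can be illustrated with a scalar model. It assumes the usual x86 convention, in which the second operand is returned whenever the comparison is false (one input is NaN, or both are zeros of either sign); that this convention carries over unchanged to the half-precision forms is an assumption here, and f32 stands in for half precision.

```rust
/// x86-style maximum: returns `b` unless `a > b` holds.
fn max_model(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// x86-style minimum: returns `b` unless `a < b` holds.
fn min_model(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

Under this model max_model(NaN, 1.0) is 1.0 but max_model(1.0, NaN) is NaN, and max_model(0.0, -0.0) returns -0.0. Rust's f32::max, by contrast, returns the non-NaN operand regardless of argument order.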
_mm256_mask_mul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
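The complex layout described above, with the real part in fp16[0] and the imaginary part in fp16[1], leads to the following scalar sketch of a masked complex multiply. It assumes one mask bit per complex number (eight per 256-bit vector) and uses f32 pairs in place of half-precision pairs; the names are hypothetical.

```rust
const PAIRS: usize = 8; // 16 half-precision lanes form 8 complex numbers
type Complex = (f32, f32); // (real, imaginary)

fn complex_mul(a: Complex, b: Complex) -> Complex {
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}

/// Writemask form: unselected complex numbers are copied from `src`.
fn mask_mul_pch_model(
    src: [Complex; PAIRS],
    k: u8,
    a: [Complex; PAIRS],
    b: [Complex; PAIRS],
) -> [Complex; PAIRS] {
    let mut dst = src;
    for i in 0..PAIRS {
        if (k >> i) & 1 == 1 {
            dst[i] = complex_mul(a[i], b[i]);
        }
    }
    dst
}
```

For example, (1 + 2i) * (3 + 4i) yields -5 + 10i in every selected position.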
_mm256_mask_mul_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_rcp_phExperimentalavx512fp16 and avx512vl
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
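The stated error bound can be turned into a simple check for any candidate result. The helper below is a hypothetical test utility, written in f64 for clarity; it does not reproduce the hardware approximation.

```rust
/// The documented bound: 1.5 * 2^-12, roughly 3.66e-4.
const RCP_MAX_REL_ERR: f64 = 1.5 / 4096.0;

/// Returns true when `approx` is within the documented relative error of 1/x.
fn within_rcp_bound(x: f64, approx: f64) -> bool {
    let exact = 1.0 / x;
    ((approx - exact) / exact).abs() < RCP_MAX_REL_ERR
}
```

For x = 3, the value 0.3333 passes the check (relative error about 1e-4), while 0.3330 does not (relative error about 1e-3).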
_mm256_mask_reduce_phExperimentalavx512fp16 and avx512vl
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_roundscale_phExperimentalavx512fp16 and avx512vl
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
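The relationship between roundscale and reduce can be sketched per element: roundscale keeps M fraction bits, and reduce returns what roundscale discards. In this sketch M is passed directly as a parameter and rounding is fixed to round-down; how imm8 encodes M and the rounding mode is not modeled, and both choices are assumptions made for illustration.

```rust
/// Round `x` down to a multiple of 2^-m (m fraction bits kept).
fn roundscale_model(x: f32, m: u32) -> f32 {
    let scale = (1u32 << m) as f32; // 2^m
    (x * scale).floor() / scale
}

/// Reduced argument: the part of `x` that roundscale drops.
fn reduce_model(x: f32, m: u32) -> f32 {
    x - roundscale_model(x, m)
}
```

With x = 2.75 and m = 1, roundscale_model gives 2.5 and reduce_model gives 0.25.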
_mm256_mask_rsqrt_phExperimentalavx512fp16 and avx512vl
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_mask_scalef_phExperimentalavx512fp16 and avx512vl
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
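A per-element sketch of the scaling step, assuming the usual scalef definition a * 2^floor(b) and ignoring special cases such as NaN, infinities, and denormals. f32 stands in for half precision, and the function name is hypothetical.

```rust
/// Model of scalef for ordinary finite inputs: a * 2^floor(b).
fn scalef_model(a: f32, b: f32) -> f32 {
    a * 2f32.powi(b.floor() as i32)
}
```

For instance, scalef_model(3.0, 2.5) is 12.0, because floor(2.5) = 2 and 3 * 2^2 = 12.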
_mm256_mask_sqrt_phExperimentalavx512fp16 and avx512vl
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_sub_phExperimentalavx512fp16 and avx512vl
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_add_phExperimentalavx512fp16 and avx512vl
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
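The difference between the writemask (mask) and zeromask (maskz) forms used throughout this listing can be sketched with a scalar model of addition. f32 stands in for half precision, and the function names are hypothetical.

```rust
const LANES: usize = 16; // a 256-bit vector holds 16 half-precision lanes

/// Writemask form: lanes with a clear mask bit are copied from `src`.
fn mask_add_model(src: [f32; LANES], k: u16, a: [f32; LANES], b: [f32; LANES]) -> [f32; LANES] {
    let mut dst = src;
    for i in 0..LANES {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] + b[i];
        }
    }
    dst
}

/// Zeromask form: lanes with a clear mask bit are set to zero.
fn maskz_add_model(k: u16, a: [f32; LANES], b: [f32; LANES]) -> [f32; LANES] {
    mask_add_model([0.0; LANES], k, a, b)
}
```

With the mask 0b11, only lanes 0 and 1 receive a sum; the remaining lanes hold src values in the first form and zeros in the second.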
_mm256_maskz_cmul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_maskz_conj_pchExperimentalavx512fp16 and avx512vl
Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
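The conjugate forms can be sketched the same way. The model below follows the description above (a multiplied by the conjugate of b), assumes one mask bit per complex number, and uses f32 pairs in place of half-precision pairs; all names are hypothetical.

```rust
type Complex = (f32, f32); // (real, imaginary)

/// Complex conjugate: negate the imaginary part.
fn conj_model(z: Complex) -> Complex {
    (z.0, -z.1)
}

/// a * conj(b).
fn cmul_model(a: Complex, b: Complex) -> Complex {
    let cb = conj_model(b);
    (a.0 * cb.0 - a.1 * cb.1, a.0 * cb.1 + a.1 * cb.0)
}

/// Zeromask form over 8 complex numbers: unselected positions become 0 + 0i.
fn maskz_cmul_pch_model(k: u8, a: [Complex; 8], b: [Complex; 8]) -> [Complex; 8] {
    let mut dst = [(0.0, 0.0); 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = cmul_model(a[i], b[i]);
        }
    }
    dst
}
```

For example, (1 + 2i) multiplied by the conjugate of (3 + 4i) is 11 + 2i.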
_mm256_maskz_cvtepi16_phExperimentalavx512fp16 and avx512vl
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtepi32_phExperimentalavx512fp16 and avx512vl
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtepi64_phExperimentalavx512fp16 and avx512vl
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_maskz_cvtepu16_phExperimentalavx512fp16 and avx512vl
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtepu32_phExperimentalavx512fp16 and avx512vl
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtepu64_phExperimentalavx512fp16 and avx512vl
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_maskz_cvtpd_phExperimentalavx512fp16 and avx512vl
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm256_maskz_cvtph_epi16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_epi32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_epi64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_epu16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_epu32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_epu64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtph_pdExperimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epi16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epi32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epi64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epu16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epu32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvttph_epu64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
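The cvtph and cvttph families differ only in how the fractional part is handled. The sketch below assumes the non-truncating forms round to nearest with ties to even (the default rounding mode), uses f32 in place of half precision, and targets i32. Out-of-range and NaN inputs are not modeled: Rust's `as` cast saturates, which may not match the hardware result.

```rust
/// Round to nearest, ties to even, written out by hand for clarity.
fn round_ties_even_model(x: f32) -> f32 {
    let r = x.round(); // rounds halfway cases away from zero
    let is_tie = (x - x.trunc()).abs() == 0.5;
    if is_tie && r % 2.0 != 0.0 { r - x.signum() } else { r }
}

/// Model of a rounding conversion (cvtph_* family).
fn cvtph_epi32_model(x: f32) -> i32 {
    round_ties_even_model(x) as i32
}

/// Model of a truncating conversion (cvttph_* family): round toward zero.
fn cvttph_epi32_model(x: f32) -> i32 {
    x.trunc() as i32
}
```

For 2.7 the rounding model gives 3 and the truncating model gives 2; for the tie 2.5 the rounding model gives 2, and for 3.5 it gives 4.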
_mm256_maskz_cvtxph_psExperimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_cvtxps_phExperimentalavx512fp16 and avx512vl
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_div_phExperimentalavx512fp16 and avx512vl
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fcmadd_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_maskz_fcmul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm256_maskz_fmadd_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_maskz_fmadd_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fmaddsub_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fmsub_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fmsubadd_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
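The alternating forms can be sketched as follows. The lane parity is an assumption based on the usual x86 convention: fmaddsub subtracts c in even lanes and adds it in odd lanes, and fmsubadd does the opposite. f32 stands in for half precision, and the names are hypothetical.

```rust
const LANES: usize = 16;

/// Zeromask fmaddsub model: even lanes a*b - c, odd lanes a*b + c.
fn maskz_fmaddsub_model(k: u16, a: [f32; LANES], b: [f32; LANES], c: [f32; LANES]) -> [f32; LANES] {
    let mut dst = [0.0; LANES];
    for i in 0..LANES {
        if (k >> i) & 1 == 1 {
            let addend = if i % 2 == 0 { -c[i] } else { c[i] };
            dst[i] = a[i].mul_add(b[i], addend);
        }
    }
    dst
}

/// Zeromask fmsubadd model: even lanes a*b + c, odd lanes a*b - c.
fn maskz_fmsubadd_model(k: u16, a: [f32; LANES], b: [f32; LANES], c: [f32; LANES]) -> [f32; LANES] {
    let mut dst = [0.0; LANES];
    for i in 0..LANES {
        if (k >> i) & 1 == 1 {
            let addend = if i % 2 == 0 { c[i] } else { -c[i] };
            dst[i] = a[i].mul_add(b[i], addend);
        }
    }
    dst
}
```

With a = 2, b = 3, c = 1 in every lane, the fmaddsub model produces 5, 7, 5, 7, ... in selected lanes and the fmsubadd model produces 7, 5, 7, 5, ...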
_mm256_maskz_fmul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_maskz_fnmadd_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm256_maskz_fnmsub_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm256_maskz_getexp_phExperimentalavx512fp16 and avx512vl
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm256_maskz_getmant_phExperimentalavx512fp16 and avx512vl
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm256_maskz_max_phExperimentalavx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm256_maskz_min_phExperimentalavx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm256_maskz_mul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_maskz_mul_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_rcp_phExperimentalavx512fp16 and avx512vl
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_maskz_reduce_phExperimentalavx512fp16 and avx512vl
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_roundscale_phExperimentalavx512fp16 and avx512vl
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_rsqrt_phExperimentalavx512fp16 and avx512vl
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_maskz_scalef_phExperimentalavx512fp16 and avx512vl
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_sqrt_phExperimentalavx512fp16 and avx512vl
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_sub_phExperimentalavx512fp16 and avx512vl
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_max_phExperimentalavx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm256_min_phExperimentalavx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm256_mul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm256_mul_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm256_permutex2var_phExperimentalavx512fp16 and avx512vl
Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
_mm256_permutexvar_phExperimentalavx512fp16 and avx512vl
Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
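The two shuffles can be sketched on raw 16-bit lanes. The model assumes that, for a 16-lane vector, the low 4 bits of each index pick the source lane and, in the two-source form, bit 4 selects between a and b. Lanes are represented as u16 bit patterns, and the function names are hypothetical.

```rust
const LANES: usize = 16;

/// Single-source shuffle: dst[i] = a[idx[i] mod 16].
fn permutexvar_model(idx: [u16; LANES], a: [u16; LANES]) -> [u16; LANES] {
    let mut dst = [0u16; LANES];
    for i in 0..LANES {
        dst[i] = a[(idx[i] & 0xF) as usize];
    }
    dst
}

/// Two-source shuffle: bit 4 of each index selects `b` (set) or `a` (clear).
fn permutex2var_model(a: [u16; LANES], idx: [u16; LANES], b: [u16; LANES]) -> [u16; LANES] {
    let mut dst = [0u16; LANES];
    for i in 0..LANES {
        let lane = (idx[i] & 0xF) as usize;
        dst[i] = if idx[i] & 0x10 != 0 { b[lane] } else { a[lane] };
    }
    dst
}
```

An index of 15 reads lane 15 of a, while an index of 16 + 15 reads lane 15 of b in the two-source model.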
_mm256_rcp_phExperimentalavx512fp16 and avx512vl
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_reduce_add_phExperimentalavx512fp16 and avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
_mm256_reduce_max_phExperimentalavx512fp16 and avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
_mm256_reduce_min_phExperimentalavx512fp16 and avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
_mm256_reduce_mul_phExperimentalavx512fp16 and avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
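The four horizontal reductions collapse a vector to one scalar. A scalar sketch follows, with f32 in place of half precision and hypothetical names. The order in which a real implementation combines lanes is not specified here, so floating-point rounding may differ from this left-to-right model, and NaN handling in the max and min models follows Rust's f32::max and f32::min.

```rust
const LANES: usize = 16;

fn reduce_add_model(a: [f32; LANES]) -> f32 {
    a.iter().sum()
}

fn reduce_mul_model(a: [f32; LANES]) -> f32 {
    a.iter().product()
}

fn reduce_max_model(a: [f32; LANES]) -> f32 {
    a.iter().copied().fold(f32::NEG_INFINITY, f32::max)
}

fn reduce_min_model(a: [f32; LANES]) -> f32 {
    a.iter().copied().fold(f32::INFINITY, f32::min)
}
```

For a vector of ones with one lane set to 4 and another to -2, the models return a sum of 16, a product of -8, a maximum of 4, and a minimum of -2.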
_mm256_reduce_phExperimentalavx512fp16 and avx512vl
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
_mm256_roundscale_phExperimentalavx512fp16 and avx512vl
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
_mm256_rsqrt_phExperimentalavx512fp16 and avx512vl
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_scalef_phExperimentalavx512fp16 and avx512vl
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
_mm256_set1_phExperimentalavx512fp16
Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
_mm256_set_phExperimentalavx512fp16
Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
_mm256_setr_phExperimentalavx512fp16
Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
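The argument order of the three constructors can be sketched on small arrays. The model assumes Intel's usual convention: set takes its arguments from the highest lane down to lane 0, setr takes them from lane 0 upward, and set1 broadcasts. Arrays are indexed by lane number, f32 stands in for half precision, and the names are hypothetical.

```rust
/// `args` is in call order (highest lane first); the result is indexed by lane.
fn set_model<const N: usize>(mut args: [f32; N]) -> [f32; N] {
    args.reverse();
    args
}

/// `args` is already in lane order (lane 0 first).
fn setr_model<const N: usize>(args: [f32; N]) -> [f32; N] {
    args
}

/// Broadcast one value to every lane.
fn set1_model<const N: usize>(a: f32) -> [f32; N] {
    [a; N]
}
```

Under this model set_model([4.0, 3.0, 2.0, 1.0]) and setr_model([1.0, 2.0, 3.0, 4.0]) produce the same lanes.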
_mm256_setzero_phExperimentalavx512fp16 and avx512vl
Return vector of type __m256h with all elements set to zero.
_mm256_sqrt_phExperimentalavx512fp16 and avx512vl
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
_mm256_store_ph⚠Experimentalavx512fp16 and avx512vl
Store 256-bits (composed of 16 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 32 bytes or a general-protection exception may be generated.
_mm256_storeu_ph⚠Experimentalavx512fp16 and avx512vl
Store 256-bits (composed of 16 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
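The 32-byte alignment requirement of the aligned store can be met and checked as sketched below. The wrapper type and helper are hypothetical; lanes are kept as u16 bit patterns so the example does not depend on a half-precision type.

```rust
/// A 32-byte buffer for 16 half-precision bit patterns, aligned to 32 bytes.
#[repr(align(32))]
struct Aligned32([u16; 16]);

/// True when the pointer satisfies the alignment the aligned store expects.
fn is_aligned_32<T>(p: *const T) -> bool {
    (p as usize) % 32 == 0
}
```

A pointer obtained from an Aligned32 value passes the check, whereas a pointer one element into the same buffer does not and would call for the unaligned store.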
_mm256_sub_phExperimentalavx512fp16 and avx512vl
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
_mm256_undefined_phExperimentalavx512fp16 and avx512vl
Return vector of type __m256h with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
_mm256_zextph128_ph256Experimentalavx512fp16
Cast vector of type __m128h to type __m256h. The upper 8 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
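As the name zextph128_ph256 suggests, the operation widens a 128-bit vector to 256 bits by zero extension. A scalar sketch on u16 bit patterns, with a hypothetical function name:

```rust
/// Copy the 8 source lanes into the low half; the upper 8 lanes are zero.
fn zext128_to_256_model(a: [u16; 8]) -> [u16; 16] {
    let mut dst = [0u16; 16];
    dst[..8].copy_from_slice(&a);
    dst
}
```

The low 8 lanes of the result equal the input and lanes 8 through 15 are zero.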
_mm512_abs_phExperimentalavx512fp16
Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the result in dst.
_mm512_add_phExperimentalavx512fp16
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm512_add_round_phExperimentalavx512fp16
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
_mm512_castpd_phExperimentalavx512fp16
Cast vector of type __m512d to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castph128_ph512Experimentalavx512fp16
Cast vector of type __m128h to type __m512h. The upper 24 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
_mm512_castph256_ph512Experimentalavx512fp16
Cast vector of type __m256h to type __m512h. The upper 16 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
_mm512_castph512_ph128Experimentalavx512fp16
Cast vector of type __m512h to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castph512_ph256Experimentalavx512fp16
Cast vector of type __m512h to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castph_pdExperimentalavx512fp16
Cast vector of type __m512h to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castph_psExperimentalavx512fp16
Cast vector of type __m512h to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castph_si512Experimentalavx512fp16
Cast vector of type __m512h to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castps_phExperimentalavx512fp16
Cast vector of type __m512 to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm512_castsi512_phExperimentalavx512fp16
Cast vector of type __m512i to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
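The same-width casts reinterpret bits and convert no values. The sketch below models a cast from 16 single-precision lanes to 32 16-bit lanes by splitting each f32 bit pattern into its low and high halves (little-endian lane order). The function name is hypothetical and the result is shown as raw u16 patterns.

```rust
/// Reinterpret 16 f32 lanes as 32 raw 16-bit lanes without changing any bits.
fn cast_ps_to_ph_bits(a: [f32; 16]) -> [u16; 32] {
    let mut out = [0u16; 32];
    for (i, x) in a.iter().enumerate() {
        let bits = x.to_bits();
        out[2 * i] = bits as u16; // low 16 bits
        out[2 * i + 1] = (bits >> 16) as u16; // high 16 bits
    }
    out
}
```

For the input 1.0f32 (bit pattern 0x3F80_0000), the two resulting lanes are 0x0000 and 0x3F80, which is unrelated to the half-precision encoding of 1.0.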
_mm512_cmp_ph_maskExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
_mm512_cmp_round_ph_maskExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
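The result of a packed comparison is a bitmask with one bit per lane. The sketch below models this for 32 lanes, passing the predicate as a closure instead of decoding imm8. The closure parameter and function name are illustrative assumptions, and f32 stands in for half precision.

```rust
/// Build a 32-bit mask: bit i is set when pred(a[i], b[i]) holds.
fn cmp_mask_model(a: &[f32; 32], b: &[f32; 32], pred: impl Fn(f32, f32) -> bool) -> u32 {
    let mut k = 0u32;
    for i in 0..32 {
        if pred(a[i], b[i]) {
            k |= 1u32 << i;
        }
    }
    k
}
```

With a holding 0, 1, ..., 31 and b holding 10 in every lane, a less-than predicate sets exactly the low ten bits of the mask.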
_mm512_cmul_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_cmul_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_conj_pchExperimentalavx512fp16
Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_cvt_roundepi16_phExperimentalavx512fp16
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundepi32_phExperimentalavx512fp16
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundepi64_phExperimentalavx512fp16
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundepu16_phExperimentalavx512fp16
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundepu32_phExperimentalavx512fp16
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundepu64_phExperimentalavx512fp16
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundpd_phExperimentalavx512fp16
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvt_roundph_epi16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
_mm512_cvt_roundph_epi32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
_mm512_cvt_roundph_epi64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
_mm512_cvt_roundph_epu16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
_mm512_cvt_roundph_epu32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
_mm512_cvt_roundph_epu64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
_mm512_cvt_roundph_pdExperimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
_mm512_cvtepi16_phExperimentalavx512fp16
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtepi32_phExperimentalavx512fp16
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtepi64_phExperimentalavx512fp16
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtepu16_phExperimentalavx512fp16
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtepu32_phExperimentalavx512fp16
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtepu64_phExperimentalavx512fp16
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtpd_phExperimentalavx512fp16
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtph_epi16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
_mm512_cvtph_epi32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
_mm512_cvtph_epi64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
_mm512_cvtph_epu16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
_mm512_cvtph_epu32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
_mm512_cvtph_epu64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
_mm512_cvtph_pdExperimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
_mm512_cvtsh_hExperimentalavx512fp16
Copy the lower half-precision (16-bit) floating-point element from a to dst.
_mm512_cvtt_roundph_epi16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
_mm512_cvtt_roundph_epi32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
_mm512_cvtt_roundph_epi64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
_mm512_cvtt_roundph_epu16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
_mm512_cvtt_roundph_epu32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
_mm512_cvtt_roundph_epu64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
_mm512_cvttph_epi16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
_mm512_cvttph_epi32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
_mm512_cvttph_epi64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
_mm512_cvttph_epu16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
_mm512_cvttph_epu32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
_mm512_cvttph_epu64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
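The difference between the truncating `cvtt*` conversions and the plain `cvt*` conversions can be sketched per lane in scalar Rust. This is an illustrative model only: `f32` stands in for a half-precision lane, the function names are hypothetical, round-to-nearest-even is assumed as the active rounding mode for the non-truncating case, and out-of-range inputs follow Rust's saturating `as` cast rather than hardware behaviour.

```rust
/// Truncating conversion (round toward zero), modelling one `cvtt*` lane.
/// Hypothetical helper; f32 stands in for a half-precision element.
fn convert_truncate_model(x: f32) -> i32 {
    x.trunc() as i32
}

/// Round-to-nearest, ties-to-even conversion, modelling one `cvt*` lane
/// under the assumption that nearest-even is the active rounding mode.
fn convert_nearest_even_model(x: f32) -> i32 {
    let rounded = x.round(); // `round` breaks ties away from zero
    let is_tie = (x - x.trunc()).abs() == 0.5;
    if is_tie && rounded % 2.0 != 0.0 {
        // Step back toward zero so the tie lands on the even neighbour.
        (rounded - x.signum()) as i32
    } else {
        rounded as i32
    }
}
```

For an input of 2.5 the truncating model and the nearest-even model both give 2, while 3.5 gives 3 and 4 respectively.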
_mm512_cvtx_roundph_psExperimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
_mm512_cvtx_roundps_phExperimentalavx512fp16
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_cvtxph_psExperimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
_mm512_cvtxps_phExperimentalavx512fp16
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm512_div_phExperimentalavx512fp16
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
_mm512_div_round_phExperimentalavx512fp16
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst. Rounding is done according to the rounding parameter.
_mm512_fcmadd_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_fcmadd_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_fcmul_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_fcmul_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
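The conjugate multiply described by the `fcmul` and `fcmadd` entries can be written out for a single complex element. This is a scalar sketch, not the intrinsic: `f32` stands in for half precision, the names `Complex`, `fcmul_model` and `fcmadd_model` are hypothetical, and each operation rounds separately rather than being fused.

```rust
/// One complex number as (real, imaginary); hypothetical alias,
/// with f32 standing in for a half-precision element.
type Complex = (f32, f32);

/// a * conj(b) = (ar + i*ai) * (br - i*bi)
fn fcmul_model(a: Complex, b: Complex) -> Complex {
    let (ar, ai) = a;
    let (br, bi) = b;
    (ar * br + ai * bi, ai * br - ar * bi)
}

/// a * conj(b) + c, accumulating into the matching complex element of c.
fn fcmadd_model(a: Complex, b: Complex, c: Complex) -> Complex {
    let (re, im) = fcmul_model(a, b);
    (re + c.0, im + c.1)
}
```

As a check, (1 + 2i) times the conjugate of (3 + 4i) is (1 + 2i)(3 - 4i) = 11 + 2i.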
_mm512_fmadd_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_fmadd_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
_mm512_fmadd_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
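The "two adjacent elements" layout means a packed vector interleaves real and imaginary parts. A minimal scalar sketch of the complex multiply-accumulate over such a layout follows; the function name is hypothetical, `f32` stands in for half precision, and slices stand in for the vector registers.

```rust
/// Complex multiply-accumulate over interleaved [re0, im0, re1, im1, ...] data.
/// Hypothetical scalar model; the operations are not fused here.
fn fmadd_pch_model(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    assert!(a.len() % 2 == 0 && a.len() == b.len() && a.len() == c.len());
    let mut dst = Vec::with_capacity(a.len());
    for i in (0..a.len()).step_by(2) {
        let (ar, ai) = (a[i], a[i + 1]);
        let (br, bi) = (b[i], b[i + 1]);
        // (ar + i*ai) * (br + i*bi) + (cr + i*ci)
        dst.push(ar * br - ai * bi + c[i]);
        dst.push(ar * bi + ai * br + c[i + 1]);
    }
    dst
}
```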
_mm512_fmadd_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
_mm512_fmaddsub_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
_mm512_fmaddsub_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
_mm512_fmsub_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
_mm512_fmsub_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
_mm512_fmsubadd_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
_mm512_fmsubadd_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
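The alternating pattern of `fmaddsub` and `fmsubadd` is easiest to see lane by lane. The sketch below assumes the usual x86 convention that `fmaddsub` subtracts c in even-indexed lanes and adds it in odd-indexed lanes, with `fmsubadd` doing the opposite. The function names are hypothetical, `f32` stands in for half precision, and the product and the add/subtract are rounded separately here rather than fused.

```rust
/// Even lanes: a*b - c, odd lanes: a*b + c (hypothetical scalar model).
fn fmaddsub_model(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    (0..a.len())
        .map(|i| {
            let p = a[i] * b[i];
            if i % 2 == 0 { p - c[i] } else { p + c[i] }
        })
        .collect()
}

/// Even lanes: a*b + c, odd lanes: a*b - c (hypothetical scalar model).
fn fmsubadd_model(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    (0..a.len())
        .map(|i| {
            let p = a[i] * b[i];
            if i % 2 == 0 { p + c[i] } else { p - c[i] }
        })
        .collect()
}
```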
_mm512_fmul_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_fmul_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter.
_mm512_fnmadd_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
_mm512_fnmadd_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
_mm512_fnmsub_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
_mm512_fnmsub_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
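The four fused multiply-add variants above differ only in the signs applied to the product and to c. A per-lane sketch using the standard library's fused `f32::mul_add` makes the sign conventions explicit; the function names are hypothetical and `f32` stands in for half precision.

```rust
/// a*b + c
fn fmadd_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// a*b - c
fn fmsub_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c)
}

/// -(a*b) + c, i.e. the product is subtracted from c
fn fnmadd_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c)
}

/// -(a*b) - c, i.e. c is subtracted from the negated product
fn fnmsub_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c)
}
```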
_mm512_fpclass_ph_maskExperimentalavx512fp16
Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. The categories selected by imm8 can be combined.
_mm512_getexp_phExperimentalavx512fp16
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
_mm512_getexp_round_phExperimentalavx512fp16
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
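The floor(log2(x)) description corresponds to reading the unbiased exponent field of the value. A scalar sketch on `f32` (standing in for half precision) follows; the function name is hypothetical, and the model only covers normal, finite, nonzero inputs, leaving zeros, denormals, infinities and NaNs out of scope.

```rust
/// Returns the unbiased exponent of `x` as a float, i.e. floor(log2(|x|)).
/// Hypothetical model; valid for normal, finite, nonzero f32 inputs only.
fn getexp_model(x: f32) -> f32 {
    // f32 layout: 1 sign bit, 8 exponent bits (bias 127), 23 fraction bits.
    let biased = ((x.to_bits() >> 23) & 0xff) as i32;
    (biased - 127) as f32
}
```

For example, 10.0 lies between 2^3 and 2^4, so the model returns 3.0.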
_mm512_getmant_phExperimentalavx512fp16
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm512_getmant_round_phExperimentalavx512fp16
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
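Mantissa normalization can be sketched as replacing the exponent field while keeping the sign and fraction bits. The model below assumes one particular configuration, the interval [1, 2) with the sign taken from the source, and works on `f32` as a stand-in for half precision; the function name is hypothetical and only normal, finite, nonzero inputs are considered.

```rust
/// Normalizes `x` into [1, 2) while keeping its sign.
/// Hypothetical model of a single interval/sign configuration.
fn getmant_model(x: f32) -> f32 {
    let bits = x.to_bits();
    // Keep the sign bit and the 23 fraction bits, force the exponent to 0 (biased 127).
    f32::from_bits((bits & 0x807f_ffff) | (127u32 << 23))
}
```

Since 10.0 = 1.25 * 2^3, the model maps 10.0 to 1.25.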
_mm512_load_ph⚠Experimentalavx512fp16
Load 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 64 bytes or a general-protection exception may be generated.
_mm512_loadu_ph⚠Experimentalavx512fp16
Load 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
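The 64-byte alignment requirement of the aligned load can be checked before choosing between the two loads. The sketch below is an assumption-laden illustration: `Aligned64` and `is_aligned_to_64` are hypothetical names, and `u16` is used to hold the raw bits of a half-precision element.

```rust
/// A 64-byte buffer of 32 half-precision bit patterns, forced to 64-byte alignment.
/// Hypothetical helper type; u16 holds the raw 16-bit pattern of each element.
#[repr(C, align(64))]
struct Aligned64([u16; 32]);

/// True when `ptr` satisfies the 64-byte alignment an aligned 512-bit load expects.
fn is_aligned_to_64(ptr: *const u16) -> bool {
    (ptr as usize) % 64 == 0
}
```

A pointer to the start of an `Aligned64` passes the check, while a pointer one element further in does not and would call for the unaligned load.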
_mm512_mask3_fcmadd_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask3_fcmadd_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c using writemask k (the element is copied from c when the corresponding mask bit is not set), and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask3_fmadd_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask3_fmadd_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmadd_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask3_fmadd_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmaddsub_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmaddsub_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmsub_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmsub_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmsubadd_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fmsubadd_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fnmadd_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fnmadd_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fnmsub_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm512_mask3_fnmsub_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
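In the `mask3` variants the fallback value for an inactive lane comes from c, the third operand, rather than from a separate src vector. A scalar sketch of that rule for the multiply-add case follows; the function name is hypothetical, `f32` stands in for half precision, and bit i of k is assumed to control lane i.

```rust
/// Lane i is a*b + c when bit i of `k` is set, otherwise it is copied from c.
/// Hypothetical scalar model; supports up to 32 lanes.
fn mask3_fmadd_model(a: &[f32], b: &[f32], c: &[f32], k: u32) -> Vec<f32> {
    assert!(a.len() <= 32 && a.len() == b.len() && a.len() == c.len());
    (0..a.len())
        .map(|i| {
            if (k >> i) & 1 == 1 {
                a[i].mul_add(b[i], c[i])
            } else {
                c[i]
            }
        })
        .collect()
}
```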
_mm512_mask_add_phExperimentalavx512fp16
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_add_round_phExperimentalavx512fp16
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
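Writemask semantics with a src operand can be sketched as a per-lane select between the computed result and src. The model below is illustrative only: the function name is hypothetical, `f32` stands in for half precision, and bit i of k is assumed to control lane i.

```rust
/// Lane i is a + b when bit i of `k` is set, otherwise it is copied from src.
/// Hypothetical scalar model; supports up to 32 lanes.
fn mask_add_model(src: &[f32], k: u32, a: &[f32], b: &[f32]) -> Vec<f32> {
    assert!(a.len() <= 32 && a.len() == b.len() && a.len() == src.len());
    (0..a.len())
        .map(|i| if (k >> i) & 1 == 1 { a[i] + b[i] } else { src[i] })
        .collect()
}
```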
_mm512_mask_blend_phExperimentalavx512fp16
Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
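A blend is a pure per-lane select with no arithmetic. The sketch assumes the usual convention that a set mask bit picks the lane from b and a clear bit picks it from a; the function name is hypothetical and `f32` stands in for half precision.

```rust
/// Lane i comes from b when bit i of `k` is set, otherwise from a.
/// Hypothetical scalar model; supports up to 32 lanes.
fn mask_blend_model(k: u32, a: &[f32], b: &[f32]) -> Vec<f32> {
    assert!(a.len() <= 32 && a.len() == b.len());
    (0..a.len())
        .map(|i| if (k >> i) & 1 == 1 { b[i] } else { a[i] })
        .collect()
}
```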
_mm512_mask_cmp_ph_maskExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmp_round_ph_maskExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
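A masked compare produces a bitmask rather than a vector, and the zeromask clears result bits for inactive lanes. In the sketch below the comparison predicate is passed as a closure instead of an imm8 code; that closure, the function name and the input mask name `k1` are all hypothetical, and `f32` stands in for half precision.

```rust
/// Bit i of the result is set only when bit i of `k1` is set and `pred(a[i], b[i])` holds.
/// Hypothetical scalar model; supports up to 32 lanes.
fn mask_cmp_model(k1: u32, a: &[f32], b: &[f32], pred: impl Fn(f32, f32) -> bool) -> u32 {
    assert!(a.len() <= 32 && a.len() == b.len());
    let mut result = 0u32;
    for i in 0..a.len() {
        if (k1 >> i) & 1 == 1 && pred(a[i], b[i]) {
            result |= 1 << i;
        }
    }
    result
}
```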
_mm512_mask_cmul_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_cmul_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_conj_pchExperimentalavx512fp16
Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
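Conjugation negates only the imaginary half of each pair. The sketch below assumes one mask bit per complex number (not per 16-bit element), which is an assumption about the mask granularity; the function name is hypothetical and `f32` stands in for half precision.

```rust
/// Complex element i becomes conj(a[i]) when bit i of `k` is set,
/// otherwise it is copied from src. Hypothetical scalar model, up to 16 complex elements.
fn mask_conj_model(src: &[(f32, f32)], k: u16, a: &[(f32, f32)]) -> Vec<(f32, f32)> {
    assert!(a.len() <= 16 && a.len() == src.len());
    (0..a.len())
        .map(|i| {
            if (k >> i) & 1 == 1 {
                (a[i].0, -a[i].1) // keep the real part, negate the imaginary part
            } else {
                src[i]
            }
        })
        .collect()
}
```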
_mm512_mask_cvt_roundepi16_phExperimentalavx512fp16
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepi32_phExperimentalavx512fp16
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepi64_phExperimentalavx512fp16
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepu16_phExperimentalavx512fp16
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepu32_phExperimentalavx512fp16
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepu64_phExperimentalavx512fp16
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundpd_phExperimentalavx512fp16
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epi16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epi32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epi64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epu16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epu32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_epu64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_pdExperimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepi16_phExperimentalavx512fp16
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepi32_phExperimentalavx512fp16
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepi64_phExperimentalavx512fp16
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepu16_phExperimentalavx512fp16
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepu32_phExperimentalavx512fp16
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtepu64_phExperimentalavx512fp16
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtpd_phExperimentalavx512fp16
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtph_epi16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_epi32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_epi64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_epu16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_epu32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_epu64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtph_pdExperimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epi16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epi32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epi64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epu16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epu32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundph_epu64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epi16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epi32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epi64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epu16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epu32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttph_epu64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
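The write-masked truncating conversions above can be pictured with a small scalar model. This is only a sketch under stated assumptions: `mask_cvtt_model` is a hypothetical name, `f32` stands in for the half-precision inputs, only 8 lanes are modelled, and only in-range, non-NaN inputs are considered (the hardware result for out-of-range values is not modelled here).

```rust
/// Hypothetical scalar model of a write-masked truncating conversion.
/// `f32` stands in for half-precision; 8 lanes are used for brevity.
fn mask_cvtt_model(src: [u16; 8], k: u8, a: [f32; 8]) -> [u16; 8] {
    // Start from `src`: lanes whose mask bit is clear keep the `src` value.
    let mut dst = src;
    for lane in 0..8 {
        if (k >> lane) & 1 == 1 {
            // Truncation rounds toward zero before the integer conversion.
            dst[lane] = a[lane].trunc() as u16;
        }
    }
    dst
}
```

With `k = 0b0000_0101`, only lanes 0 and 2 are converted; every other lane is taken from `src`.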
_mm512_mask_cvtx_roundph_psExperimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtx_roundps_phExperimentalavx512fp16
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtxph_psExperimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_cvtxps_phExperimentalavx512fp16
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm512_mask_div_phExperimentalavx512fp16
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_div_round_phExperimentalavx512fp16
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_mask_fcmadd_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_fcmadd_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_fcmul_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_mask_fcmul_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
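The conjugate multiplication described above can be sketched in scalar code. This is an illustrative model, not the intrinsic itself: `Complex`, `mul_conj`, and `mask_fcmul_model` are hypothetical names, `f32` stands in for half-precision, only 4 complex lanes are modelled, and hardware rounding of the intermediate products may differ. Each mask bit is assumed to cover one whole complex number (a pair of adjacent elements).

```rust
/// (real, imaginary); `f32` stands in for half-precision in this sketch.
type Complex = (f32, f32);

/// a * conj(b), where conj(b) = b.0 - i * b.1.
fn mul_conj(a: Complex, b: Complex) -> Complex {
    (a.0 * b.0 + a.1 * b.1, a.1 * b.0 - a.0 * b.1)
}

/// Hypothetical write-masked model: one mask bit per complex number.
fn mask_fcmul_model(src: [Complex; 4], k: u8, a: [Complex; 4], b: [Complex; 4]) -> [Complex; 4] {
    let mut dst = src; // unselected complex numbers are copied from `src`
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = mul_conj(a[i], b[i]);
        }
    }
    dst
}
```

For example, (1 + 2i) times the conjugate of (3 + 4i) is (1 + 2i)(3 - 4i) = 11 + 2i.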
_mm512_mask_fmadd_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask_fmadd_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmadd_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask_fmadd_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmaddsub_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmaddsub_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsub_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsub_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsubadd_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsubadd_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
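The alternating pattern of the fmaddsub and fmsubadd forms can be made concrete with a scalar sketch. The function names are hypothetical, `f32` stands in for half-precision, and 8 lanes are used for brevity. The lane assignment assumed here is the usual x86 convention: fmaddsub subtracts c in even-indexed lanes and adds it in odd-indexed lanes, and fmsubadd does the opposite.

```rust
/// Hypothetical model of the write-masked fmaddsub form.
/// Unselected lanes keep the value from `a`, as the summaries state.
fn mask_fmaddsub_model(a: [f32; 8], k: u8, b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut dst = a;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Even lanes: a*b - c; odd lanes: a*b + c (assumed convention).
            let addend = if i % 2 == 0 { -c[i] } else { c[i] };
            dst[i] = a[i].mul_add(b[i], addend);
        }
    }
    dst
}

/// Hypothetical model of the write-masked fmsubadd form (signs swapped).
fn mask_fmsubadd_model(a: [f32; 8], k: u8, b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut dst = a;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Even lanes: a*b + c; odd lanes: a*b - c (assumed convention).
            let addend = if i % 2 == 0 { c[i] } else { -c[i] };
            dst[i] = a[i].mul_add(b[i], addend);
        }
    }
    dst
}
```

`mul_add` is used so that the product and the addition are rounded once, which mirrors the fused behaviour the summaries describe.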
_mm512_mask_fmul_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask_fmul_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter, which can be one of:
_mm512_mask_fnmadd_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmadd_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmsub_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmsub_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm512_mask_fpclass_ph_maskExperimentalavx512fp16
Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of:
_mm512_mask_getexp_phExperimentalavx512fp16
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm512_mask_getexp_round_phExperimentalavx512fp16
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
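The floor(log2(x)) description can be illustrated by reading the exponent field directly. The sketch below is an assumption-laden model: `getexp_model` and `mask_getexp_model` are hypothetical names, `f32` stands in for half-precision (so the field width and bias are those of `f32`), and only normal, finite, non-zero inputs are handled.

```rust
/// Unbiased exponent of a normal, finite, non-zero `f32`, returned as a float.
/// The sign bit is ignored, so the result corresponds to floor(log2(|x|)).
fn getexp_model(x: f32) -> f32 {
    let biased = ((x.to_bits() >> 23) & 0xff) as i32; // 8-bit exponent field
    (biased - 127) as f32 // remove the f32 exponent bias
}

/// Hypothetical write-masked wrapper over 4 lanes.
fn mask_getexp_model(src: [f32; 4], k: u8, a: [f32; 4]) -> [f32; 4] {
    let mut dst = src; // unselected lanes are copied from `src`
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = getexp_model(a[i]);
        }
    }
    dst
}
```

For instance, 10.0 lies between 2^3 and 2^4, so its modelled result is 3.0, and 0.75 = 1.5 * 2^-1 gives -1.0.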
_mm512_mask_getmant_phExperimentalavx512fp16
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm512_mask_getmant_round_phExperimentalavx512fp16
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm512_mask_max_phExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_mask_max_round_phExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_mask_min_phExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm512_mask_min_round_phExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
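The departure from IEEE 754 noted above can be sketched per element. On x86, packed max/min instructions conventionally return the second operand when either input is NaN or when both inputs are zeros of either sign; the model below assumes the half-precision forms follow the same rule. `max_model` and `min_model` are hypothetical names, and `f32` stands in for half-precision.

```rust
/// Per-element model: returns `a` only when the comparison is strictly true.
/// Any comparison involving NaN is false, and 0.0 > -0.0 is false,
/// so in those cases the second operand `b` is returned.
fn max_model(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// Same shape for the minimum: `b` is returned for NaN or equal zeros.
fn min_model(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

One consequence is that the operation is not commutative: `max_model(NaN, 1.0)` yields 1.0, while `max_model(1.0, NaN)` yields NaN.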
_mm512_mask_mul_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask_mul_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_mul_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mask_mul_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_mask_rcp_phExperimentalavx512fp16
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_mask_reduce_phExperimentalavx512fp16
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_reduce_round_phExperimentalavx512fp16
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_roundscale_phExperimentalavx512fp16
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_roundscale_round_phExperimentalavx512fp16
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
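Rounding "to a number of fraction bits" amounts to scaling by a power of two, rounding to an integer, and scaling back. The sketch below shows that idea for a single rounding direction only (toward negative infinity), and it takes the fraction-bit count `m` as a plain argument rather than decoding it from imm8. `roundscale_floor_model` is a hypothetical name and `f32` stands in for half-precision.

```rust
/// Keep `m` fraction bits of `x`, rounding toward negative infinity.
/// Computes floor(x * 2^m) / 2^m; `m` is assumed to be small (below 16 here).
fn roundscale_floor_model(x: f32, m: u32) -> f32 {
    let scale = (1u32 << m) as f32; // 2^m, exact for small m
    (x * scale).floor() / scale
}
```

With `m = 2` the result is a multiple of 0.25, so 1.37 becomes 1.25 and -1.37 becomes -1.5; with `m = 0` the value is rounded to an integer.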
_mm512_mask_rsqrt_phExperimentalavx512fp16
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_mask_scalef_phExperimentalavx512fp16
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_scalef_round_phExperimentalavx512fp16
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
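The summaries above do not spell out what "scale" means. On x86 the scalef operation is conventionally a * 2^floor(b), and the sketch below assumes that definition. `scalef_model` and `mask_scalef_model` are hypothetical names, `f32` stands in for half-precision, and special inputs (NaN, infinities, denormals, overflow) are not modelled.

```rust
/// Assumed per-element definition: a * 2^floor(b).
fn scalef_model(a: f32, b: f32) -> f32 {
    a * 2f32.powi(b.floor() as i32)
}

/// Hypothetical write-masked wrapper over 4 lanes.
fn mask_scalef_model(src: [f32; 4], k: u8, a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut dst = src; // unselected lanes are copied from `src`
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = scalef_model(a[i], b[i]);
        }
    }
    dst
}
```

Under this definition `scalef_model(3.0, 2.5)` multiplies 3.0 by 2^2, since floor(2.5) = 2.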
_mm512_mask_sqrt_phExperimentalavx512fp16
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_sqrt_round_phExperimentalavx512fp16
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_mask_sub_phExperimentalavx512fp16
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_sub_round_phExperimentalavx512fp16
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_add_phExperimentalavx512fp16
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_add_round_phExperimentalavx512fp16
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
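The zeromask forms differ from the writemask forms only in what an unselected lane receives: zero instead of a value from another vector. A minimal scalar sketch, with the hypothetical name `maskz_add_model`, `f32` standing in for half-precision, and 8 lanes for brevity:

```rust
/// Hypothetical model of a zero-masked packed add.
fn maskz_add_model(k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    // Start from all zeros: lanes whose mask bit is clear stay zero.
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] + b[i];
        }
    }
    dst
}
```

Because there is no `src` argument, the result depends only on `k`, `a`, and `b`.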
_mm512_maskz_cmul_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_cmul_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_conj_pchExperimentalavx512fp16
Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
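Taking the conjugate only negates the imaginary half of each pair. The sketch below models the zero-masked form with one mask bit per complex number, which is an assumption of this example. `Complex` and `maskz_conj_model` are hypothetical names, `f32` stands in for half-precision, and only 2 complex lanes are shown.

```rust
/// (real, imaginary); `f32` stands in for half-precision in this sketch.
type Complex = (f32, f32);

/// Hypothetical zero-masked conjugate: unselected complex numbers become 0 + 0i.
fn maskz_conj_model(k: u8, a: [Complex; 2]) -> [Complex; 2] {
    let mut dst = [(0.0f32, 0.0f32); 2];
    for i in 0..2 {
        if (k >> i) & 1 == 1 {
            // conj(re + i*im) = re - i*im
            dst[i] = (a[i].0, -a[i].1);
        }
    }
    dst
}
```

With `k = 0b01`, the first pair is conjugated and the second is zeroed.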
_mm512_maskz_cvt_roundepi16_phExperimentalavx512fp16
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundepi32_phExperimentalavx512fp16
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundepi64_phExperimentalavx512fp16
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundepu16_phExperimentalavx512fp16
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundepu32_phExperimentalavx512fp16
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundepu64_phExperimentalavx512fp16
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundpd_phExperimentalavx512fp16
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epi16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epi32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epi64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epu16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epu32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_epu64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvt_roundph_pdExperimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepi16_phExperimentalavx512fp16
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepi32_phExperimentalavx512fp16
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepi64_phExperimentalavx512fp16
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepu16_phExperimentalavx512fp16
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepu32_phExperimentalavx512fp16
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtepu64_phExperimentalavx512fp16
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtpd_phExperimentalavx512fp16
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epi16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epi32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epi64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epu16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epu32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_epu64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtph_pdExperimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epi16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epi32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epi64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epu16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epu32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtt_roundph_epu64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epi16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epi32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epi64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epu16Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epu32Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvttph_epu64Experimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtx_roundph_psExperimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtx_roundps_phExperimentalavx512fp16
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtxph_psExperimentalavx512fp16
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_cvtxps_phExperimentalavx512fp16
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_div_phExperimentalavx512fp16
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_div_round_phExperimentalavx512fp16
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_fcmadd_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_fcmadd_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c using zeromask k (the element is zeroed out when the corresponding mask bit is not set), and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_fcmul_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm512_maskz_fcmul_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
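As a rough illustration of the conjugate multiply, the following scalar sketch expands (a.re + i·a.im)·(b.re − i·b.im). The helper names are hypothetical, `f32` stands in for half-precision, and the model assumes one mask bit per complex number (per pair of elements), which is an assumption about the mask granularity rather than something stated in the summaries above.

```rust
/// Scalar model (NOT the real intrinsic): `a` times the conjugate of `b`.
/// A complex number is (re, im), i.e. fp16[0] and fp16[1] of one pair.
fn cmul_model(a: (f32, f32), b: (f32, f32)) -> (f32, f32) {
    let (ar, ai) = a;
    let (br, bi) = b;
    // (ar + i*ai) * (br - i*bi) = (ar*br + ai*bi) + i*(ai*br - ar*bi)
    (ar * br + ai * bi, ai * br - ar * bi)
}

/// Zeromasked variant over two complex lanes; one mask bit per complex
/// number is an assumption of this sketch.
fn maskz_cmul_model(k: u8, a: [(f32, f32); 2], b: [(f32, f32); 2]) -> [(f32, f32); 2] {
    let mut dst = [(0.0f32, 0.0f32); 2];
    for i in 0..2 {
        if (k >> i) & 1 == 1 {
            dst[i] = cmul_model(a[i], b[i]);
        }
    }
    dst
}
```

For example, (1 + 2i) times the conjugate of (3 + 4i) is (1 + 2i)(3 − 4i) = 11 + 2i.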
_mm512_maskz_fmadd_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_maskz_fmadd_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmadd_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_maskz_fmadd_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmaddsub_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmaddsub_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmsub_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmsub_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmsubadd_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fmsubadd_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
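The alternating pattern of the fmaddsub and fmsubadd families is easy to get backwards, so here is a scalar sketch. It assumes the usual x86 convention that fmaddsub subtracts in even-indexed lanes and adds in odd-indexed lanes, with fmsubadd the opposite. The function names are hypothetical, `f32` stands in for half-precision, and masking is left out.

```rust
/// Scalar model (NOT the real intrinsic): even lanes a*b - c, odd lanes a*b + c.
fn fmaddsub_model(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut dst = [0.0f32; 4];
    for i in 0..4 {
        // `mul_add` is a fused multiply-add: a single rounding step.
        let addend = if i % 2 == 0 { -c[i] } else { c[i] };
        dst[i] = a[i].mul_add(b[i], addend);
    }
    dst
}

/// Scalar model: even lanes a*b + c, odd lanes a*b - c.
fn fmsubadd_model(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut dst = [0.0f32; 4];
    for i in 0..4 {
        let addend = if i % 2 == 0 { c[i] } else { -c[i] };
        dst[i] = a[i].mul_add(b[i], addend);
    }
    dst
}
```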
_mm512_maskz_fmul_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_maskz_fmul_round_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_fnmadd_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fnmadd_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fnmsub_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm512_maskz_fnmsub_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
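The four fused variants differ only in the signs applied to the product and to c. A per-lane scalar sketch of those sign conventions, as described in the summaries above (hypothetical function names, `f32` in place of half-precision, no masking):

```rust
/// Scalar per-lane models (NOT the real intrinsics) of the fused sign conventions.
fn fmadd_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c) // (a*b) + c
}

fn fmsub_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c) // (a*b) - c
}

fn fnmadd_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c) // -(a*b) + c, i.e. the product subtracted from c
}

fn fnmsub_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c) // -(a*b) - c, i.e. c subtracted from the negated product
}
```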
_mm512_maskz_getexp_phExperimentalavx512fp16
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm512_maskz_getexp_round_phExperimentalavx512fp16
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
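To make the floor(log2(x)) description concrete, this sketch reads the unbiased exponent straight out of the bit pattern. It is a scalar model only: `getexp_model` is a hypothetical name, `f32` (8 exponent bits, bias 127) stands in for half-precision, the sign is ignored, and only normal finite non-zero inputs are handled.

```rust
/// Scalar model (NOT the real intrinsic): exponent of `x` returned as a float.
/// Handles normal, finite, non-zero inputs only; other inputs panic.
fn getexp_model(x: f32) -> f32 {
    // Bits 23..=30 of an f32 hold the biased exponent; the sign bit is masked off.
    let biased = ((x.to_bits() >> 23) & 0xff) as i32;
    assert!(biased != 0 && biased != 0xff, "sketch handles normal finite values only");
    (biased - 127) as f32
}
```

For instance, 10.0 = 1.25 × 2^3 gives 3.0, and 0.5 = 1.0 × 2^-1 gives -1.0.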
_mm512_maskz_getmant_phExperimentalavx512fp16
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm512_maskz_getmant_round_phExperimentalavx512fp16
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm512_maskz_max_phExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_maskz_max_round_phExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_maskz_min_phExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm512_maskz_min_round_phExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm512_maskz_mul_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_maskz_mul_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_mul_round_pchExperimentalavx512fp16
Multiply the packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_maskz_mul_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_rcp_phExperimentalavx512fp16
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_maskz_reduce_phExperimentalavx512fp16
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_reduce_round_phExperimentalavx512fp16
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_roundscale_phExperimentalavx512fp16
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_roundscale_round_phExperimentalavx512fp16
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
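"Rounding to a number of fraction bits" can be read as: scale by 2^M, round to an integer, scale back. The sketch below shows that idea per lane. `roundscale_model` and its parameters are hypothetical; the real intrinsic packs both the fraction-bit count and the rounding mode into imm8, and that encoding is not modeled here. `f32` stands in for half-precision.

```rust
/// Scalar model (NOT the real intrinsic): round `x` to `fraction_bits` binary
/// fraction digits using the supplied rounding function.
fn roundscale_model(x: f32, fraction_bits: u32, round: fn(f32) -> f32) -> f32 {
    let scale = (1u32 << fraction_bits) as f32; // 2^M, exact for small M
    round(x * scale) / scale
}
```

With two fraction bits and rounding down, 1.37 becomes 1.25 (the nearest multiple of 0.25 below it). The reduce_ph family listed just above can then be pictured as `x - roundscale_model(x, ...)`, under the same assumptions.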
_mm512_maskz_rsqrt_phExperimentalavx512fp16
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_maskz_scalef_phExperimentalavx512fp16
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_scalef_round_phExperimentalavx512fp16
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
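The summaries do not spell out what "scale" means. As I understand Intel's description, each lane computes a × 2^floor(b). A scalar sketch under that assumption (hypothetical `scalef_model`, `f32` in place of half-precision, exponent restricted to the normal `f32` range, special values not handled):

```rust
/// Scalar model (NOT the real intrinsic): a * 2^floor(b).
/// Assumes floor(b) lies in -126..=127 so 2^floor(b) is a normal f32.
fn scalef_model(a: f32, b: f32) -> f32 {
    let e = b.floor() as i32;
    assert!((-126..=127).contains(&e), "sketch handles normal exponents only");
    // Build 2^e exactly by writing the biased exponent field directly.
    let pow2 = f32::from_bits(((e + 127) as u32) << 23);
    a * pow2
}
```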
_mm512_maskz_sqrt_phExperimentalavx512fp16
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_sqrt_round_phExperimentalavx512fp16
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_maskz_sub_phExperimentalavx512fp16
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_sub_round_phExperimentalavx512fp16
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm512_max_phExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_max_round_phExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm512_min_phExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm512_min_round_phExperimentalavx512fp16
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
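The note about not following IEEE 754 for NaN and signed zeros is easiest to see in code. The sketch assumes the conventional x86 rule that the second operand is returned whenever the comparison is false, which covers NaN in either operand and two zeros of any sign. The names are hypothetical and `f32` stands in for half-precision.

```rust
/// Scalar model (NOT the real intrinsic): x86-style max, second operand wins
/// whenever `a > b` is false (NaN involved, or both zero).
fn max_model(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// Scalar model: x86-style min with the same "second operand wins" rule.
fn min_model(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

A consequence is that the operation is not commutative: `max_model(NaN, 1.0)` yields 1.0, while `max_model(1.0, NaN)` yields NaN.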
_mm512_mul_pchExperimentalavx512fp16
Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mul_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm512_mul_round_pchExperimentalavx512fp16
Multiply the packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm512_mul_round_phExperimentalavx512fp16
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
_mm512_permutex2var_phExperimentalavx512fp16
Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
_mm512_permutexvar_phExperimentalavx512fp16
Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
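A scalar picture of the two shuffles, shrunk to 4 lanes. In this sketch the low 2 bits of each index pick a lane and, for the two-source form, the next bit selects between a and b; the assumption is that the 32-lane intrinsics do the same with 5 index bits and bit 5 as selector. The function names are hypothetical and `f32` stands in for half-precision.

```rust
/// Scalar model (NOT the real intrinsic): dst[i] = a[idx[i]], 4 lanes.
fn permutexvar_model(idx: [u16; 4], a: [f32; 4]) -> [f32; 4] {
    let mut dst = [0.0f32; 4];
    for i in 0..4 {
        dst[i] = a[(idx[i] & 0b11) as usize];
    }
    dst
}

/// Scalar model: bit 2 of each index selects the source (0 = a, 1 = b),
/// bits 0..=1 select the lane within that source.
fn permutex2var_model(a: [f32; 4], idx: [u16; 4], b: [f32; 4]) -> [f32; 4] {
    let mut dst = [0.0f32; 4];
    for i in 0..4 {
        let lane = (idx[i] & 0b11) as usize;
        let from_b = (idx[i] >> 2) & 1 == 1;
        dst[i] = if from_b { b[lane] } else { a[lane] };
    }
    dst
}
```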
_mm512_rcp_phExperimentalavx512fp16
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_reduce_add_phExperimentalavx512fp16
Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
_mm512_reduce_max_phExperimentalavx512fp16
Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
_mm512_reduce_min_phExperimentalavx512fp16
Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
_mm512_reduce_mul_ph⚠Experimentalavx512fp16
Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
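The four horizontal reductions collapse a whole vector to one scalar. A simple model using sequential folds follows. The names are hypothetical, `f32` stands in for half-precision, and the real intrinsics may combine lanes in a different order, which can change the rounding of sums and products.

```rust
/// Scalar models (NOT the real intrinsics) of the horizontal reductions.
fn reduce_add_model(a: &[f32]) -> f32 {
    a.iter().sum()
}

fn reduce_mul_model(a: &[f32]) -> f32 {
    a.iter().product()
}

fn reduce_max_model(a: &[f32]) -> f32 {
    a.iter().copied().fold(f32::NEG_INFINITY, f32::max)
}

fn reduce_min_model(a: &[f32]) -> f32 {
    a.iter().copied().fold(f32::INFINITY, f32::min)
}
```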
_mm512_reduce_phExperimentalavx512fp16
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
_mm512_reduce_round_phExperimentalavx512fp16
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
_mm512_roundscale_phExperimentalavx512fp16
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
_mm512_roundscale_round_phExperimentalavx512fp16
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm512_rsqrt_phExperimentalavx512fp16
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_scalef_phExperimentalavx512fp16
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
_mm512_scalef_round_phExperimentalavx512fp16
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
_mm512_set1_phExperimentalavx512fp16
Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
_mm512_set_phExperimentalavx512fp16
Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
_mm512_setr_phExperimentalavx512fp16
Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
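The difference between the set and setr forms is purely argument order. The 4-lane sketch below assumes the usual Intel convention: set takes the highest element first, setr takes element 0 first. The names are hypothetical and `f32` stands in for half-precision.

```rust
/// Scalar model (NOT the real intrinsic): arguments run from the highest lane
/// down to lane 0; the returned array is indexed by lane.
fn set_model(e3: f32, e2: f32, e1: f32, e0: f32) -> [f32; 4] {
    [e0, e1, e2, e3]
}

/// Scalar model: arguments are given in lane order, lane 0 first.
fn setr_model(e0: f32, e1: f32, e2: f32, e3: f32) -> [f32; 4] {
    [e0, e1, e2, e3]
}
```

So `set_model(3.0, 2.0, 1.0, 0.0)` and `setr_model(0.0, 1.0, 2.0, 3.0)` describe the same vector.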
_mm512_setzero_phExperimentalavx512fp16
Return vector of type __m512h with all elements set to zero.
_mm512_sqrt_phExperimentalavx512fp16
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
_mm512_sqrt_round_phExperimentalavx512fp16
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
_mm512_store_ph⚠Experimentalavx512fp16
Store 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 64 bytes or a general-protection exception may be generated.
_mm512_storeu_ph⚠Experimentalavx512fp16
Store 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
_mm512_sub_phExperimentalavx512fp16
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
_mm512_sub_round_phExperimentalavx512fp16
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
_mm512_undefined_phExperimentalavx512fp16
Return vector of type __m512h with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
_mm512_zextph128_ph512Experimentalavx512fp16
Cast vector of type __m128h to type __m512h. The upper 24 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
_mm512_zextph256_ph512Experimentalavx512fp16
Cast vector of type __m256h to type __m512h. The upper 16 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
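The element counts in the two zero-extending casts follow from the lane counts: 8, 16, and 32 half-precision elements for 128, 256, and 512 bits. A scalar sketch of the 128-to-512 case (hypothetical name, `f32` standing in for half-precision):

```rust
/// Scalar model (NOT the real intrinsic): keep the low 8 lanes, zero the upper 24.
fn zext128_to_512_model(a: [f32; 8]) -> [f32; 32] {
    let mut dst = [0.0f32; 32];
    dst[..8].copy_from_slice(&a);
    dst
}
```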
_mm_abs_phExperimentalavx512fp16 and avx512vl
Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the results in dst.
_mm_add_phExperimentalavx512fp16 and avx512vl
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm_add_round_shExperimentalavx512fp16
Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_add_shExperimentalavx512fp16
Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
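The scalar ("sh") forms touch only lane 0 and pass the remaining lanes of a through unchanged. A sketch of that shape, with a hypothetical name and `f32` standing in for half-precision across the 8 lanes of a 128-bit vector:

```rust
/// Scalar model (NOT the real intrinsic): lane 0 is a[0] + b[0];
/// lanes 1..=7 are copied from `a`.
fn add_sh_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = a; // start from `a` so the upper 7 lanes are preserved
    dst[0] = a[0] + b[0];
    dst
}
```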
_mm_castpd_phExperimentalavx512fp16
Cast vector of type __m128d to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_castph_pdExperimentalavx512fp16
Cast vector of type __m128h to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_castph_psExperimentalavx512fp16
Cast vector of type __m128h to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_castph_si128Experimentalavx512fp16
Cast vector of type __m128h to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_castps_phExperimentalavx512fp16
Cast vector of type __m128 to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_castsi128_phExperimentalavx512fp16
Cast vector of type __m128i to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm_cmp_ph_maskExperimentalavx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
_mm_cmp_round_sh_maskExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_cmp_sh_maskExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k.
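The mask-producing compares return one bit per lane rather than a vector. The sketch below models the packed form over 8 lanes, taking the predicate as a plain function instead of decoding imm8. The name `cmp_mask_model` is hypothetical, the imm8 predicate encoding is not modeled, and `f32` stands in for half-precision.

```rust
/// Scalar model (NOT the real intrinsic): bit i of the result is set when
/// `pred(a[i], b[i])` holds.
fn cmp_mask_model(a: [f32; 8], b: [f32; 8], pred: fn(f32, f32) -> bool) -> u8 {
    let mut mask = 0u8;
    for i in 0..8 {
        if pred(a[i], b[i]) {
            mask |= 1u8 << i;
        }
    }
    mask
}
```

The scalar ("sh") compares can be pictured as the same operation restricted to lane 0, so only bit 0 of the mask is meaningful.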
_mm_cmul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_cmul_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_cmul_schExperimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_comi_round_shExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_comi_shExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1).
_mm_comieq_shExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for equality, and return the boolean result (0 or 1).
_mm_comige_shExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than-or-equal, and return the boolean result (0 or 1).
_mm_comigt_shExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than, and return the boolean result (0 or 1).
_mm_comile_shExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than-or-equal, and return the boolean result (0 or 1).
_mm_comilt_shExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than, and return the boolean result (0 or 1).
_mm_comineq_shExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for not-equal, and return the boolean result (0 or 1).
_mm_conj_pchExperimentalavx512fp16 and avx512vl
Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_cvt_roundi32_shExperimentalavx512fp16
Convert the signed 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvt_roundsd_shExperimentalavx512fp16
Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvt_roundsh_i32Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
_mm_cvt_roundsh_sdExperimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
_mm_cvt_roundsh_ssExperimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_cvt_roundsh_u32Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer, and store the result in dst.
_mm_cvt_roundss_shExperimentalavx512fp16
Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvt_roundu32_shExperimentalavx512fp16
Convert the unsigned 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvtepi16_phExperimentalavx512fp16 and avx512vl
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm_cvtepi32_phExperimentalavx512fp16 and avx512vl
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
_mm_cvtepi64_phExperimentalavx512fp16 and avx512vl
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
_mm_cvtepu16_phExperimentalavx512fp16 and avx512vl
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm_cvtepu32_phExperimentalavx512fp16 and avx512vl
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
_mm_cvtepu64_phExperimentalavx512fp16 and avx512vl
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
_mm_cvti32_shExperimentalavx512fp16
Convert the signed 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvtpd_phExperimentalavx512fp16 and avx512vl
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
_mm_cvtph_epi16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
_mm_cvtph_epi32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
_mm_cvtph_epi64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
_mm_cvtph_epu16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
_mm_cvtph_epu32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
_mm_cvtph_epu64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
_mm_cvtph_pdExperimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
_mm_cvtsd_shExperimentalavx512fp16
Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvtsh_hExperimentalavx512fp16
Copy the lower half-precision (16-bit) floating-point element from a to dst.
_mm_cvtsh_i32Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
_mm_cvtsh_sdExperimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
_mm_cvtsh_ssExperimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_cvtsh_u32Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer, and store the result in dst.
_mm_cvtsi16_si128Experimentalavx512fp16
Copy 16-bit integer a to the lower elements of dst, and zero the upper elements of dst.
_mm_cvtsi128_si16Experimentalavx512fp16
Copy the lower 16-bit integer in a to dst.
_mm_cvtss_shExperimentalavx512fp16
Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvtt_roundsh_i32Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
_mm_cvtt_roundsh_u32Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer with truncation, and store the result in dst.
_mm_cvttph_epi16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
_mm_cvttph_epi32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
_mm_cvttph_epi64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
_mm_cvttph_epu16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
_mm_cvttph_epu32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
_mm_cvttph_epu64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
_mm_cvttsh_i32Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
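The difference between the truncating conversions and their non-truncating counterparts can be sketched in scalar form. This is only a model: both helper names are hypothetical, `f32` stands in for the half-precision input, and the non-truncating path is assumed to run under the default round-to-nearest-even mode.

```rust
/// Truncating conversion: the fractional part is discarded (round toward zero).
fn to_i32_truncate(x: f32) -> i32 {
    x.trunc() as i32
}

/// Non-truncating conversion, assuming round-to-nearest with ties to even.
fn to_i32_nearest_even(x: f32) -> i32 {
    let r = x.round(); // rounds ties away from zero
    let is_tie = (x - x.trunc()).abs() == 0.5;
    if is_tie && r % 2.0 != 0.0 {
        // A tie that landed on an odd integer moves back to the even neighbour.
        (r - x.signum()) as i32
    } else {
        r as i32
    }
}
```

For an input of 2.7 the two helpers disagree (2 versus 3); for 2.5 both give 2, because the tie resolves to the even integer.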
_mm_cvttsh_u32Experimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer with truncation, and store the result in dst.
_mm_cvtu32_shExperimentalavx512fp16
Convert the unsigned 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_cvtxph_psExperimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
_mm_cvtxps_phExperimentalavx512fp16 and avx512vl
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
_mm_div_phExperimentalavx512fp16 and avx512vl
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
_mm_div_round_shExperimentalavx512fp16
Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
_mm_div_shExperimentalavx512fp16
Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
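The "lower element only" pattern shared by the `_sh` operations can be modelled as below. `div_lower` is a hypothetical helper, and `f32` lanes stand in for the eight half-precision lanes of a 128-bit vector.

```rust
/// Scalar model of a lower-lane operation: lane 0 is computed,
/// lanes 1..=7 are passed through from `a` unchanged.
fn div_lower(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = a; // start from `a` so the upper 7 lanes are copied
    dst[0] = a[0] / b[0];
    dst
}
```

Only `b[0]` is ever read; the remaining lanes of `b` have no effect on the result.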
_mm_fcmadd_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_fcmadd_round_schExperimentalavx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_fcmadd_schExperimentalavx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
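The arithmetic behind the conjugate multiply-accumulate can be written out for a single complex number. This is a sketch under stated assumptions: `fcmadd` is a hypothetical scalar helper, tuples are `(re, im)`, `f32` replaces the half-precision type, and the intermediate rounding of the real instruction is not modelled.

```rust
/// Computes a * conj(b) + c for complex numbers stored as (re, im).
fn fcmadd(a: (f32, f32), b: (f32, f32), c: (f32, f32)) -> (f32, f32) {
    // (ar + i*ai) * (br - i*bi) = (ar*br + ai*bi) + i*(ai*br - ar*bi)
    let re = a.0 * b.0 + a.1 * b.1 + c.0;
    let im = a.1 * b.0 - a.0 * b.1 + c.1;
    (re, im)
}
```

With a = 1 + 2i, b = 3 + 4i and c = 0.5 - 0.5i the result is 11.5 + 1.5i.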
_mm_fcmul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_fcmul_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_fcmul_schExperimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_fmadd_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fmadd_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
_mm_fmadd_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fmadd_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fmadd_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fmadd_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fmaddsub_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
_mm_fmsub_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
_mm_fmsub_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fmsub_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fmsubadd_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
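The alternating add/subtract pattern of the `fmaddsub` and `fmsubadd` families can be sketched lane by lane. The helpers below are hypothetical scalar models with `f32` lanes. The lane assignment (even lanes subtract for `fmaddsub`, even lanes add for `fmsubadd`) is an assumption based on how this instruction family is commonly documented; the summaries above do not spell it out.

```rust
/// Model of fmaddsub: even lanes compute a*b - c, odd lanes compute a*b + c.
/// Assumption: this even/odd assignment matches the hardware.
fn fmaddsub(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        let p = a[i] * b[i];
        dst[i] = if i % 2 == 0 { p - c[i] } else { p + c[i] };
    }
    dst
}

/// Model of fmsubadd: the mirror image, even lanes add and odd lanes subtract.
fn fmsubadd(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        let p = a[i] * b[i];
        dst[i] = if i % 2 == 0 { p + c[i] } else { p - c[i] };
    }
    dst
}
```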
_mm_fmul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fmul_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fmul_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_fnmadd_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
_mm_fnmadd_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fnmadd_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fnmsub_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
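The four fused multiply-add flavours (`fmadd`, `fmsub`, `fnmadd`, `fnmsub`) differ only in the signs applied to the product and to c. A per-element sketch, using hypothetical helper names and `f32::mul_add` in place of the half-precision hardware operation:

```rust
/// a*b + c
fn fmadd(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// a*b - c
fn fmsub(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c)
}

/// -(a*b) + c, i.e. the product is subtracted from c
fn fnmadd(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c)
}

/// -(a*b) - c, i.e. c is subtracted from the negated product
fn fnmsub(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c)
}
```

For a = 2, b = 3, c = 1 the four helpers return 7, 5, -5 and -7 respectively.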
_mm_fnmsub_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fnmsub_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_fpclass_ph_maskExperimentalavx512fp16 and avx512vl
Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k.
_mm_fpclass_sh_maskExperimentalavx512fp16
Test the lower half-precision (16-bit) floating-point element in a for special categories specified by imm8, and store the result in mask vector k.
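A category test of this kind can be sketched on the raw 16-bit encoding of a half-precision value (1 sign bit, 5 exponent bits, 10 mantissa bits). Everything named here is hypothetical. The bit values assigned to each category, and the reading of "negative" as negative, finite and nonzero, are assumptions modelled on the usual `fpclass` immediate layout and should be checked against the full function documentation.

```rust
// Assumed category bits for the immediate operand.
const QNAN: u8 = 0x01;
const POS_ZERO: u8 = 0x02;
const NEG_ZERO: u8 = 0x04;
const POS_INF: u8 = 0x08;
const NEG_INF: u8 = 0x10;
const DENORMAL: u8 = 0x20;
const NEGATIVE: u8 = 0x40; // assumed: negative, finite, nonzero
const SNAN: u8 = 0x80;

/// Returns the set of categories a raw binary16 bit pattern belongs to.
fn fpclass_bits(bits: u16) -> u8 {
    let sign = bits >> 15 != 0;
    let exp = (bits >> 10) & 0x1f;
    let mant = bits & 0x3ff;
    let is_zero = exp == 0 && mant == 0;
    let mut cat = 0u8;
    if exp == 0x1f && mant != 0 {
        // The top mantissa bit distinguishes quiet from signaling NaNs.
        cat |= if mant & 0x200 != 0 { QNAN } else { SNAN };
    }
    if is_zero {
        cat |= if sign { NEG_ZERO } else { POS_ZERO };
    }
    if exp == 0x1f && mant == 0 {
        cat |= if sign { NEG_INF } else { POS_INF };
    }
    if exp == 0 && mant != 0 {
        cat |= DENORMAL;
    }
    if sign && exp != 0x1f && !is_zero {
        cat |= NEGATIVE;
    }
    cat
}

/// True when the value falls in at least one category selected by `imm8`.
fn fpclass_test(bits: u16, imm8: u8) -> bool {
    fpclass_bits(bits) & imm8 != 0
}
```

For example, `fpclass_test(0x7C00, POS_INF | NEG_INF)` asks whether positive infinity is an infinity of either sign.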
_mm_getexp_phExperimentalavx512fp16 and avx512vl
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
_mm_getexp_round_shExperimentalavx512fp16
Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_getexp_shExperimentalavx512fp16
Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
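The floor(log2(x)) behaviour can be illustrated by reading the exponent field directly. This is a sketch only: `getexp` is a hypothetical helper, it works on `f32` rather than the half-precision type, and it is valid for normal, finite, nonzero inputs only (zeros, denormals, infinities and NaNs are not modelled).

```rust
/// Returns the unbiased exponent of a normal f32 as a float,
/// which equals floor(log2(|x|)) for such inputs.
fn getexp(x: f32) -> f32 {
    let biased = ((x.to_bits() >> 23) & 0xff) as i32; // 8-bit exponent field
    (biased - 127) as f32 // remove the f32 exponent bias
}
```

Both 8.0 and 10.0 map to 3.0, since each lies in the interval [8, 16); the sign of the input is ignored.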
_mm_getmant_phExperimentalavx512fp16 and avx512vl
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm_getmant_round_shExperimentalavx512fp16
Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_getmant_shExperimentalavx512fp16
Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
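Mantissa normalization can be sketched for one particular choice of parameters: the interval [1, 2) with the source sign kept. `getmant_1_2` is a hypothetical helper operating on `f32`; other intervals and sign controls, as well as zeros, denormals, infinities and NaNs, are outside this model.

```rust
/// Keeps the sign and mantissa bits of a normal f32 and forces the exponent
/// to zero (biased 127), so the magnitude lands in [1, 2).
fn getmant_1_2(x: f32) -> f32 {
    let bits = x.to_bits();
    f32::from_bits((bits & 0x807f_ffff) | (127 << 23))
}
```

Since 10.0 = 1.25 * 2^3, the helper returns 1.25 for an input of 10.0.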
_mm_load_ph⚠Experimentalavx512fp16 and avx512vl
Load 128 bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 16 bytes or a general-protection exception may be generated.
_mm_load_sh⚠Experimentalavx512fp16
Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector, and zero the upper elements.
_mm_loadu_ph⚠Experimentalavx512fp16 and avx512vl
Load 128 bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
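The 16-byte alignment requirement of the aligned load, as opposed to the unaligned one, can be checked ahead of time. In this sketch `Aligned16` and `is_aligned_16` are hypothetical names, and `u16` stands in for the raw storage of one half-precision element.

```rust
/// A buffer of eight 16-bit elements whose start is guaranteed 16-byte aligned.
#[repr(C, align(16))]
struct Aligned16([u16; 8]);

/// True when `ptr` satisfies the 16-byte alignment an aligned 128-bit load needs.
fn is_aligned_16<T>(ptr: *const T) -> bool {
    (ptr as usize) % 16 == 0
}
```

A pointer to the start of an `Aligned16` passes the check; a pointer to its second element sits 2 bytes further along and would only be suitable for the unaligned load.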
_mm_mask3_fcmadd_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask3_fcmadd_round_schExperimentalavx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from c when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask3_fcmadd_schExperimentalavx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from c when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask3_fmadd_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask3_fmadd_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
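The `mask3` convention, where unselected lanes fall back to c rather than to a separate src operand, can be modelled per lane. `mask3_fmadd` is a hypothetical scalar helper with `f32` lanes, and bit i of `k` is assumed to control lane i.

```rust
/// Lane i receives a[i]*b[i] + c[i] when bit i of `k` is set,
/// and is copied from c[i] otherwise.
fn mask3_fmadd(a: [f32; 8], b: [f32; 8], c: [f32; 8], k: u8) -> [f32; 8] {
    let mut dst = c; // default: every lane copied from c
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].mul_add(b[i], c[i]);
        }
    }
    dst
}
```

With `k = 0b0000_0101` only lanes 0 and 2 are recomputed.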
_mm_mask3_fmadd_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from c when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask3_fmadd_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fmadd_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from c when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask3_fmadd_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fmaddsub_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fmsub_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fmsub_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fmsub_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fmsubadd_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fnmadd_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fnmadd_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fnmadd_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fnmsub_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
_mm_mask3_fnmsub_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask3_fnmsub_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
_mm_mask_add_phExperimentalavx512fp16 and avx512vl
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
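Writemask behaviour with a separate src operand can be sketched in the same per-lane style. `mask_add` is a hypothetical helper with `f32` lanes, and bit i of `k` is assumed to control lane i.

```rust
/// Lane i receives a[i] + b[i] when bit i of `k` is set,
/// and is copied from src[i] otherwise.
fn mask_add(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = src; // default: every lane copied from src
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] + b[i];
        }
    }
    dst
}
```

With `k = 0b1111_0000` the lower four lanes keep the values from `src` and the upper four hold the sums.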
_mm_mask_add_round_shExperimentalavx512fp16
Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter.
_mm_mask_add_shExperimentalavx512fp16
Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
_mm_mask_blend_phExperimentalavx512fp16 and avx512vl
Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
_mm_mask_cmp_ph_maskExperimentalavx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
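A masked comparison produces one result bit per lane and then clears the bits that the input mask does not select. The sketch below fixes the predicate to "less than" purely for illustration; the real intrinsic selects the predicate through imm8. `mask_cmp_lt` is a hypothetical helper with `f32` lanes.

```rust
/// Builds a bitmask where bit i is set when a[i] < b[i],
/// then zeroes every bit not selected by `k1`.
fn mask_cmp_lt(k1: u8, a: [f32; 8], b: [f32; 8]) -> u8 {
    let mut raw = 0u8;
    for i in 0..8 {
        if a[i] < b[i] {
            raw |= 1 << i;
        }
    }
    raw & k1 // zeromask: unselected lanes report 0
}
```

Lanes that satisfy the predicate but are excluded by `k1` contribute nothing to the result.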
_mm_mask_cmp_round_sh_maskExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_mask_cmp_sh_maskExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1.
_mm_mask_cmul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_cmul_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_cmul_schExperimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_conj_pchExperimentalavx512fp16 and avx512vl
Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_cvt_roundsd_shExperimentalavx512fp16
Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_cvt_roundsh_sdExperimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
_mm_mask_cvt_roundsh_ssExperimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_mask_cvt_roundss_shExperimentalavx512fp16
Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_cvtepi16_phExperimentalavx512fp16 and avx512vl
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm_mask_cvtepi32_phExperimentalavx512fp16 and avx512vl
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_mask_cvtepi64_phExperimentalavx512fp16 and avx512vl
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_mask_cvtepu16_phExperimentalavx512fp16 and avx512vl
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm_mask_cvtepu32_phExperimentalavx512fp16 and avx512vl
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_mask_cvtepu64_phExperimentalavx512fp16 and avx512vl
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_mask_cvtpd_phExperimentalavx512fp16 and avx512vl
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
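The "upper bits of dst are zeroed out" notes on the 128-bit conversions above follow from simple lane arithmetic. The helper below is a hypothetical illustration of that arithmetic only; it performs no conversion.

```rust
/// Number of upper bits of a 128-bit `dst` that hold no converted result
/// when a 128-bit source with `src_elem_bits`-wide elements is converted
/// to 16-bit half-precision elements.
fn zeroed_upper_bits(src_elem_bits: u32) -> u32 {
    let lanes = 128 / src_elem_bits; // elements in the 128-bit source
    128 - lanes * 16 // each converted result occupies 16 bits
}
```

A 16-bit source fills dst completely, a 32-bit source leaves the upper 64 bits unused, and a 64-bit source leaves the upper 96 bits unused, matching the descriptions above.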
_mm_mask_cvtph_epi16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_epi32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_epi64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_epu16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_epu32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_epu64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvtph_pdExperimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm_mask_cvtsd_shExperimentalavx512fp16
Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_cvtsh_sdExperimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
_mm_mask_cvtsh_ssExperimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_mask_cvtss_shExperimentalavx512fp16
Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_cvttph_epi16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvttph_epi32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvttph_epi64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvttph_epu16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvttph_epu32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_cvttph_epu64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
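A scalar sketch may help to see what "with truncation" plus a writemask means for the conversions above. `mask_cvtt_epi16_model` is a hypothetical helper, `f32` stands in for half-precision, and only in-range inputs are modelled (NaN and out-of-range behaviour of the real instruction is not covered).

```rust
/// Hypothetical scalar model of a writemasked truncating conversion.
/// `f32` stands in for half-precision. Only in-range inputs are modelled.
fn mask_cvtt_epi16_model(src: [i16; 8], k: u8, a: [f32; 8]) -> [i16; 8] {
    let mut dst = src; // masked-off lanes keep the value from src
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Truncation discards the fraction, i.e. rounds toward zero.
            dst[i] = a[i].trunc() as i16;
        }
    }
    dst
}
```

Truncation rounds toward zero, so 1.9 becomes 1 and -1.9 becomes -1.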
_mm_mask_cvtxph_psExperimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
_mm_mask_cvtxps_phExperimentalavx512fp16 and avx512vl
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_mask_div_phExperimentalavx512fp16 and avx512vl
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_div_round_shExperimentalavx512fp16
Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_mask_div_shExperimentalavx512fp16
Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fcmadd_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fcmadd_round_schExperimentalavx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from a when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fcmadd_schExperimentalavx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from a when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fcmul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fcmul_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fcmul_schExperimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_mask_fmadd_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fmadd_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fmadd_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from a when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fmadd_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fmadd_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from a when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
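The scalar complex multiply-accumulate described above touches only the lowest complex number. The following is a rough model under stated assumptions: `mask_fmadd_sch_model` and its argument order are hypothetical, `f32` stands in for half-precision, and intermediate rounding of the real instruction is not reproduced.

```rust
/// Hypothetical scalar model of a writemasked complex multiply-accumulate
/// on the lowest complex number (lanes 0 and 1).
fn mask_fmadd_sch_model(a: [f32; 8], k: u8, b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    // The upper 6 lanes, and the result when mask bit 0 is clear, come from a.
    let mut dst = a;
    if k & 1 == 1 {
        // (a0 + i*a1) * (b0 + i*b1) + (c0 + i*c1)
        dst[0] = a[0] * b[0] - a[1] * b[1] + c[0];
        dst[1] = a[0] * b[1] + a[1] * b[0] + c[1];
    }
    dst
}
```

With a = 1 + 2i, b = 3 + 4i and c = 5 + 6i, the product is -5 + 10i and the accumulated result is 0 + 16i.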
_mm_mask_fmadd_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fmaddsub_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fmsub_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fmsub_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fmsub_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fmsubadd_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
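The difference between the fmaddsub and fmsubadd forms is only which lanes add and which subtract. The sketch below assumes the usual x86 convention (fmaddsub subtracts in even lanes and adds in odd lanes, fmsubadd does the opposite); the helper names are hypothetical and `f32` stands in for half-precision.

```rust
/// Hypothetical scalar model of a writemasked fmaddsub.
/// Assumption: even lanes compute a*b - c, odd lanes compute a*b + c.
fn mask_fmaddsub_model(a: [f32; 8], k: u8, b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut dst = a; // masked-off lanes keep the value from a
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            let p = a[i] * b[i];
            dst[i] = if i % 2 == 0 { p - c[i] } else { p + c[i] };
        }
    }
    dst
}

/// Hypothetical scalar model of a writemasked fmsubadd.
/// Assumption: even lanes compute a*b + c, odd lanes compute a*b - c.
fn mask_fmsubadd_model(a: [f32; 8], k: u8, b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut dst = a;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            let p = a[i] * b[i];
            dst[i] = if i % 2 == 0 { p + c[i] } else { p - c[i] };
        }
    }
    dst
}
```

Note that the fallback for masked-off lanes is a, not a separate src operand.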
_mm_mask_fmul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fmul_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fmul_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_fnmadd_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fnmadd_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fnmadd_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fnmsub_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
_mm_mask_fnmsub_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_fnmsub_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
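The four fused scalar forms (fmadd, fmsub, fnmadd, fnmsub) differ only in the signs applied to the product and to c, following the packed `_ph` descriptions. A compact model, with the hypothetical names `Fused` and `mask_fused_sh_model` and with `f32` standing in for half-precision:

```rust
/// Which fused operation to model (hypothetical enum, for illustration).
#[derive(Clone, Copy)]
enum Fused {
    Fmadd,  //  (a*b) + c
    Fmsub,  //  (a*b) - c
    Fnmadd, // -(a*b) + c
    Fnmsub, // -(a*b) - c
}

/// Hypothetical scalar model of the writemasked `_sh` fused operations.
fn mask_fused_sh_model(op: Fused, a: [f32; 8], k: u8, b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    // The upper 7 lanes, and lane 0 when mask bit 0 is clear, come from a.
    let mut dst = a;
    if k & 1 == 1 {
        let (x, y, z) = (a[0], b[0], c[0]);
        // `mul_add` computes x*y + z with a single rounding.
        dst[0] = match op {
            Fused::Fmadd => x.mul_add(y, z),
            Fused::Fmsub => x.mul_add(y, -z),
            Fused::Fnmadd => (-x).mul_add(y, z),
            Fused::Fnmsub => (-x).mul_add(y, -z),
        };
    }
    dst
}
```

With a = 2, b = 3 and c = 1 in lane 0, the four forms give 7, 5, -5 and -7.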
_mm_mask_fpclass_ph_maskExperimentalavx512fp16 and avx512vl
Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of:
_mm_mask_fpclass_sh_maskExperimentalavx512fp16
Test the lower half-precision (16-bit) floating-point element in a for special categories specified by imm8, and store the result in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of:
_mm_mask_getexp_phExperimentalavx512fp16 and avx512vl
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm_mask_getexp_round_shExperimentalavx512fp16
Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_mask_getexp_shExperimentalavx512fp16
Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
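The floor(log2(x)) description of getexp amounts to reading the exponent field of the value. The sketch below shows this on `f32` as a stand-in for half-precision; `getexp_model` is a hypothetical helper and only handles normal, finite, non-zero inputs (zeros, denormals, infinities and NaN are not modelled).

```rust
/// Unbiased exponent of a normal, finite, non-zero `f32`, returned as a
/// float. This equals floor(log2(|x|)) for such inputs.
fn getexp_model(x: f32) -> f32 {
    let biased = (x.to_bits() >> 23) & 0xff; // 8-bit exponent field of f32
    (biased as i32 - 127) as f32 // remove the f32 exponent bias
}
```

Both 8.0 and 10.0 map to 3.0, and 0.5 maps to -1.0. The sign of the input is ignored.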
_mm_mask_getmant_phExperimentalavx512fp16 and avx512vl
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm_mask_getmant_round_shExperimentalavx512fp16
Normalize the mantissa of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_mask_getmant_shExperimentalavx512fp16
Normalize the mantissa of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
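Mantissa normalization can be illustrated for one particular choice of parameters. The sketch assumes the interval [1, 2) (so the 2^k factor is 1) and that the result keeps the source sign; other norm and sign selections are not modelled. `getmant_1_2_model` is a hypothetical helper, `f32` stands in for half-precision, and inputs must be normal, finite and non-zero.

```rust
/// Significand of a normal, finite, non-zero `f32`, normalized to [1, 2)
/// and keeping the sign of the source.
fn getmant_1_2_model(x: f32) -> f32 {
    let bits = x.to_bits();
    // Keep sign and fraction bits, force the biased exponent to 127 (2^0).
    f32::from_bits((bits & 0x807f_ffff) | (127 << 23))
}
```

For example, 10.0 = 1.25 * 2^3 normalizes to 1.25, and -6.0 = -1.5 * 2^2 normalizes to -1.5.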
_mm_mask_load_sh⚠Experimentalavx512fp16
Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector using writemask k (the element is copied from src when mask bit 0 is not set), and zero the upper elements.
_mm_mask_max_phExperimentalavx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_mask_max_round_shExperimentalavx512fp16 and avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_mask_max_shExperimentalavx512fp16 and avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_mask_min_phExperimentalavx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_mask_min_round_shExperimentalavx512fp16 and avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_mask_min_shExperimentalavx512fp16 and avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
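The note that max and min do not follow IEEE 754 for NaN or signed-zero inputs can be made concrete. The sketch assumes the half-precision forms use the same convention as the older SSE/AVX max and min instructions, where the second operand is returned whenever the comparison is false (which includes NaN inputs and equal zeros of either sign). The helper names are hypothetical and `f32` stands in for half-precision.

```rust
/// Assumed x86-style maximum: returns `b` unless `a > b` holds.
/// A NaN in either operand, or two zeros of any sign, yields `b`.
fn x86_style_max(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// Assumed x86-style minimum: returns `b` unless `a < b` holds.
fn x86_style_min(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

Under this model the result depends on operand order: the maximum of (NaN, 1.0) is 1.0 while the maximum of (1.0, NaN) is NaN, whereas Rust's `f32::max` returns 1.0 in both cases.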
_mm_mask_move_shExperimentalavx512fp16
Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
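The three sources involved in a masked scalar move (src, a and b) are easy to mix up, so a lane-level sketch may help. `mask_move_sh_model` is a hypothetical helper and `f32` stands in for half-precision.

```rust
/// Hypothetical scalar model of a writemasked lower-element move.
fn mask_move_sh_model(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = a; // upper 7 lanes always come from a
    // Lane 0 comes from b when mask bit 0 is set, otherwise from src.
    dst[0] = if k & 1 == 1 { b[0] } else { src[0] };
    dst
}
```

Only bit 0 of the mask matters; higher mask bits have no effect on the result.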
_mm_mask_mul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_mul_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_mul_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_mul_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_mask_mul_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mask_mul_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_rcp_phExperimentalavx512fp16 and avx512vl
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_mask_rcp_shExperimentalavx512fp16
Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
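The stated error bound can be turned into a small check, for instance when validating results against a reference. `rcp_within_bound` is a hypothetical helper that works on `f32` values; it does not call the intrinsic.

```rust
/// Returns true when `approx` is within the documented relative error
/// bound (1.5 * 2^-12) of the exact reciprocal of `x`.
fn rcp_within_bound(x: f32, approx: f32) -> bool {
    let exact = 1.0 / x;
    let bound = 1.5 * 2f32.powi(-12); // about 3.66e-4
    ((approx - exact) / exact).abs() < bound
}
```

For x = 4.0, an approximation of 0.25005 (relative error 2e-4) passes, while 0.2501 (relative error 4e-4) does not.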
_mm_mask_reduce_phExperimentalavx512fp16 and avx512vl
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_reduce_round_shExperimentalavx512fp16
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_reduce_shExperimentalavx512fp16
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_roundscale_phExperimentalavx512fp16 and avx512vl
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_roundscale_round_shExperimentalavx512fp16
Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_roundscale_shExperimentalavx512fp16
Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
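The relationship between the roundscale and reduce entries above can be sketched as follows. The function names are hypothetical, `f32` stands in for fp16, and the sketch models only one rounding choice (round toward negative infinity); the real intrinsics select the rounding behavior through imm8.

```rust
/// Round `x` to `fraction_bits` binary fraction bits.
/// Assumption: rounding toward negative infinity; other modes are not modeled.
fn roundscale_model(x: f32, fraction_bits: u32) -> f32 {
    let scale = (1u32 << fraction_bits) as f32; // 2^M
    (x * scale).floor() / scale
}

/// The reduced argument is what remains after removing the rounded part.
fn reduce_model(x: f32, fraction_bits: u32) -> f32 {
    x - roundscale_model(x, fraction_bits)
}
```

For example, `2.71875` with 2 fraction bits rounds down to `2.5`, leaving a reduced argument of `0.21875`.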
_mm_mask_rsqrt_phExperimentalavx512fp16 and avx512vl
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_mask_rsqrt_shExperimentalavx512fp16
Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_mask_scalef_phExperimentalavx512fp16 and avx512vl
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
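As a rough per-element illustration, the scalef operation multiplies an element of a by two raised to the floor of the matching element of b. The sketch below uses a hypothetical helper name and `f32` in place of fp16, and it ignores the special cases (NaN, infinities, denormals) that the hardware handles.

```rust
/// Per-element model of scalef: a * 2^floor(b).
/// Assumption: finite, moderate inputs only; special values are not modeled.
fn scalef_model(a: f32, b: f32) -> f32 {
    a * 2.0f32.powi(b.floor() as i32)
}
```

So `scalef_model(3.0, 2.5)` scales by `2^2`, and `scalef_model(3.0, -1.5)` scales by `2^-2`.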
_mm_mask_scalef_round_shExperimentalavx512fp16
Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_scalef_shExperimentalavx512fp16
Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_mask_sqrt_phExperimentalavx512fp16 and avx512vl
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_sqrt_round_shExperimentalavx512fp16
Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
_mm_mask_sqrt_shExperimentalavx512fp16
Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
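Many of the scalar (`_sh`) entries share the data flow seen here: the computed value comes from the lower element of b, the fallback comes from src, and the upper 7 elements come from a. A minimal model, with a hypothetical function name and `f32` standing in for fp16:

```rust
/// Hypothetical model of the `_mm_mask_sqrt_sh` description.
fn mask_sqrt_sh_model(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    // Upper 7 elements are copied from `a`.
    let mut dst = a;
    // Lower element: sqrt(b[0]) if mask bit 0 is set, otherwise src[0].
    dst[0] = if k & 1 != 0 { b[0].sqrt() } else { src[0] };
    dst
}
```

Only bit 0 of `k` matters for the scalar forms; the remaining bits are ignored in this model.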
_mm_mask_store_sh⚠Experimentalavx512fp16
Store the lower half-precision (16-bit) floating-point element from a into memory using writemask k.
_mm_mask_sub_phExperimentalavx512fp16 and avx512vl
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_sub_round_shExperimentalavx512fp16
Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm_mask_sub_shExperimentalavx512fp16
Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
_mm_maskz_add_phExperimentalavx512fp16 and avx512vl
Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
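The difference between the `_mask_` (writemask) and `_maskz_` (zeromask) packed forms can be illustrated with a small model. The function names are hypothetical and `f32` stands in for fp16; the sketch treats a zeromask as a writemask whose fallback vector is all zeros.

```rust
/// Writemask model: lane i is a[i] + b[i] when bit i of k is set, else src[i].
fn mask_add_ph_model(src: [f32; 8], k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0; 8];
    for i in 0..8 {
        dst[i] = if (k >> i) & 1 != 0 { a[i] + b[i] } else { src[i] };
    }
    dst
}

/// Zeromask model: same selection, but unselected lanes become zero.
fn maskz_add_ph_model(k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    mask_add_ph_model([0.0; 8], k, a, b)
}
```

With `k = 0b0000_0101`, only lanes 0 and 2 receive the sum; the other lanes hold `src` values or zeros respectively.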
_mm_maskz_add_round_shExperimentalavx512fp16
Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm_maskz_add_shExperimentalavx512fp16
Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
_mm_maskz_cmul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_cmul_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_cmul_schExperimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_conj_pchExperimentalavx512fp16 and avx512vl
Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
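The conjugate-multiply entries above can be modeled per complex lane. In this sketch the function name is hypothetical, `f32` stands in for fp16, and it assumes a 128-bit vector holds four complex numbers with one mask bit per complex number.

```rust
/// Hypothetical model of the `_mm_maskz_cmul_pch` description:
/// dst = a * conj(b) per complex lane, zeroed where the mask bit is clear.
fn maskz_cmul_pch_model(k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for c in 0..4 {
        if (k >> c) & 1 != 0 {
            let (ar, ai) = (a[2 * c], a[2 * c + 1]);
            let (br, bi) = (b[2 * c], b[2 * c + 1]);
            // (ar + i*ai) * (br - i*bi)
            dst[2 * c] = ar * br + ai * bi;
            dst[2 * c + 1] = ai * br - ar * bi;
        }
    }
    dst
}
```

For `a = 1 + 2i` and `b = 3 + 4i` the product with the conjugate of `b` is `11 + 2i`.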
_mm_maskz_cvt_roundsd_shExperimentalavx512fp16
Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_cvt_roundsh_sdExperimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
_mm_maskz_cvt_roundsh_ssExperimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_maskz_cvt_roundss_shExperimentalavx512fp16
Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_cvtepi16_phExperimentalavx512fp16 and avx512vl
Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtepi32_phExperimentalavx512fp16 and avx512vl
Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_maskz_cvtepi64_phExperimentalavx512fp16 and avx512vl
Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_maskz_cvtepu16_phExperimentalavx512fp16 and avx512vl
Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtepu32_phExperimentalavx512fp16 and avx512vl
Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_maskz_cvtepu64_phExperimentalavx512fp16 and avx512vl
Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
_mm_maskz_cvtpd_phExperimentalavx512fp16 and avx512vl
Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
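The "upper N bits of dst are zeroed out" notes in the conversion entries above follow from lane arithmetic: a 128-bit source holds fewer wide elements than the 128-bit destination has 16-bit lanes. A small sketch with a hypothetical helper name:

```rust
/// Number of zeroed upper bits in a 128-bit destination when converting
/// 128 bits of `src_elem_bits`-wide elements to 16-bit results.
/// Assumption: `src_elem_bits` is 16, 32, or 64.
fn zeroed_upper_bits(src_elem_bits: u32) -> u32 {
    let lanes = 128 / src_elem_bits; // elements in the source vector
    128 - lanes * 16 // each result occupies 16 bits
}
```

This reproduces the figures quoted above: 64 bits for 32-bit sources and 96 bits for 64-bit sources, while 16-bit sources fill the whole destination.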
_mm_maskz_cvtph_epi16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_epi32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_epi64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_epu16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_epu32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_epu64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtph_pdExperimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtsd_shExperimentalavx512fp16
Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_cvtsh_sdExperimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
_mm_maskz_cvtsh_ssExperimentalavx512fp16
Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
_mm_maskz_cvtss_shExperimentalavx512fp16
Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_cvttph_epi16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvttph_epi32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvttph_epi64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvttph_epu16Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvttph_epu32Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvttph_epu64Experimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
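The `cvttph` entries differ from the `cvtph` entries only in how the fractional part is handled. The sketch below contrasts the two per lane, using hypothetical function names and `f32` in place of fp16. It assumes the non-truncating forms round to nearest with ties to even (the default rounding mode), and it does not model out-of-range inputs, where Rust's saturating `as` cast differs from the hardware result.

```rust
/// Round to nearest, ties to even, written out by hand.
fn round_half_even(x: f32) -> f32 {
    let r = x.round(); // ties away from zero
    let is_tie = (x - x.trunc()).abs() == 0.5;
    if is_tie && r % 2.0 != 0.0 { r - x.signum() } else { r }
}

/// Model of one lane of a truncating conversion (`cvttph_*`).
fn cvttph_lane_model(x: f32) -> i32 {
    x.trunc() as i32
}

/// Model of one lane of a rounding conversion (`cvtph_*`), default mode assumed.
fn cvtph_lane_model(x: f32) -> i32 {
    round_half_even(x) as i32
}
```

For `2.7` the truncating form yields `2` and the rounding form yields `3`; for the tie `2.5` the rounding form yields `2`.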
_mm_maskz_cvtxph_psExperimentalavx512fp16 and avx512vl
Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_cvtxps_phExperimentalavx512fp16 and avx512vl
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
_mm_maskz_div_phExperimentalavx512fp16 and avx512vl
Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_div_round_shExperimentalavx512fp16
Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
_mm_maskz_div_shExperimentalavx512fp16
Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
_mm_maskz_fcmadd_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fcmadd_round_schExperimentalavx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fcmadd_schExperimentalavx512fp16
Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fcmul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fcmul_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fcmul_schExperimentalavx512fp16
Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
_mm_maskz_fmadd_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fmadd_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fmadd_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using zeromask k (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fmadd_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fmadd_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using zeromask k (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fmadd_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fmaddsub_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fmsub_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fmsub_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fmsub_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fmsubadd_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
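The alternating pattern in the fmaddsub and fmsubadd entries depends on lane parity. The following sketch uses hypothetical function names and `f32` in place of fp16, and it assumes the usual x86 convention: fmaddsub subtracts c in even-indexed lanes and adds it in odd-indexed lanes, while fmsubadd does the opposite. Masking is left out to keep the pattern visible.

```rust
/// fmaddsub model: even lanes a*b - c, odd lanes a*b + c.
fn fmaddsub_model(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0; 8];
    for i in 0..8 {
        let addend = if i % 2 == 0 { -c[i] } else { c[i] };
        dst[i] = a[i].mul_add(b[i], addend); // fused multiply-add
    }
    dst
}

/// fmsubadd model: even lanes a*b + c, odd lanes a*b - c.
fn fmsubadd_model(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0; 8];
    for i in 0..8 {
        let addend = if i % 2 == 0 { c[i] } else { -c[i] };
        dst[i] = a[i].mul_add(b[i], addend);
    }
    dst
}
```

With every lane set to `a = 2`, `b = 3`, `c = 1`, fmaddsub alternates `5, 7, 5, 7, ...` and fmsubadd alternates `7, 5, 7, 5, ...`.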
_mm_maskz_fmul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fmul_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fmul_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_fnmadd_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fnmadd_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fnmadd_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fnmsub_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
_mm_maskz_fnmsub_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_fnmsub_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
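The four fused multiply-add families (fmadd, fmsub, fnmadd, fnmsub) differ only in the signs applied to the product and to c. A per-element sketch with hypothetical function names and `f32` standing in for fp16; the fnmsub case follows the packed `_mm_maskz_fnmsub_ph` wording (c subtracted from the negated product).

```rust
/// a*b + c
fn fmadd_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// a*b - c
fn fmsub_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c)
}

/// -(a*b) + c, i.e. the product subtracted from c
fn fnmadd_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c)
}

/// -(a*b) - c, i.e. c subtracted from the negated product
fn fnmsub_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c)
}
```

With `a = 2`, `b = 3`, `c = 1` the four models give `7`, `5`, `-5`, and `-7`.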
_mm_maskz_getexp_phExperimentalavx512fp16 and avx512vl
Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm_maskz_getexp_round_shExperimentalavx512fp16
Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_maskz_getexp_shExperimentalavx512fp16
Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
_mm_maskz_getmant_phExperimentalavx512fp16 and avx512vl
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
_mm_maskz_getmant_round_shExperimentalavx512fp16
Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm_maskz_getmant_shExperimentalavx512fp16
Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
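As a rough illustration of mantissa normalization, the sketch below models one configuration only: the interval [1, 2) with the sign taken from the source. `getmant_model` is a hypothetical name, it operates on `f32` instead of half-precision lanes, and it assumes a finite, non-zero, normal input.

```rust
/// Scalar model of "getmant" for the [1, 2) interval with the source sign kept.
/// Assumption: `x` is a finite, non-zero, normal f32. Other interval and sign
/// selections of the real intrinsic are not modelled here.
fn getmant_model(x: f32) -> f32 {
    let bits = x.to_bits();
    // Keep the sign bit and the 23 fraction bits, then force the biased
    // exponent to 127 so the magnitude lands in [1, 2).
    f32::from_bits((bits & 0x807f_ffff) | (127u32 << 23))
}
```

Under these assumptions `getmant_model(10.0)` gives `1.25` and `getmant_model(-0.75)` gives `-1.5`.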
_mm_maskz_load_sh⚠Experimentalavx512fp16
Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector using zeromask k (the element is zeroed out when mask bit 0 is not set), and zero the upper elements.
_mm_maskz_max_phExperimentalavx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_maskz_max_round_shExperimentalavx512fp16 and avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_maskz_max_shExperimentalavx512fp16 and avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_maskz_min_phExperimentalavx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_maskz_min_round_shExperimentalavx512fp16 and avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_maskz_min_shExperimentalavx512fp16 and avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_maskz_move_shExperimentalavx512fp16
Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_mul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_mul_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
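The zeromask behaviour shared by the `maskz` variants can be sketched with plain arrays. This is a scalar model, not the intrinsic: `maskz_mul_model` is a hypothetical name, and `[f32; 8]` stands in for the eight half-precision lanes of a 128-bit vector.

```rust
/// Scalar model of a zero-masked packed multiply over 8 lanes.
/// Assumption: bit `i` of `k` controls lane `i`; `f32` stands in for f16.
fn maskz_mul_model(k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    // Lanes whose mask bit is clear stay zero.
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] * b[i];
        }
    }
    dst
}
```

With `k = 0b0000_0101`, only lanes 0 and 2 receive a product and every other lane is zero.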
_mm_maskz_mul_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, store the result in the lower elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_mul_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
_mm_maskz_mul_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, store the result in the lower elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_maskz_mul_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_rcp_phExperimentalavx512fp16 and avx512vl
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_maskz_rcp_shExperimentalavx512fp16
Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_maskz_reduce_phExperimentalavx512fp16 and avx512vl
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_reduce_round_shExperimentalavx512fp16
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_reduce_shExperimentalavx512fp16
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_roundscale_phExperimentalavx512fp16 and avx512vl
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_roundscale_round_shExperimentalavx512fp16
Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_roundscale_shExperimentalavx512fp16
Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_rsqrt_phExperimentalavx512fp16 and avx512vl
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_maskz_rsqrt_shExperimentalavx512fp16
Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_maskz_scalef_phExperimentalavx512fp16 and avx512vl
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_scalef_round_shExperimentalavx512fp16
Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_scalef_shExperimentalavx512fp16
Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_sqrt_phExperimentalavx512fp16 and avx512vl
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_sqrt_round_shExperimentalavx512fp16
Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
_mm_maskz_sqrt_shExperimentalavx512fp16
Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_maskz_sub_phExperimentalavx512fp16 and avx512vl
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_sub_round_shExperimentalavx512fp16
Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
_mm_maskz_sub_shExperimentalavx512fp16
Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_max_phExperimentalavx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_max_round_shExperimentalavx512fp16 and avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_max_shExperimentalavx512fp16 and avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
_mm_min_phExperimentalavx512fp16 and avx512vl
Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_min_round_shExperimentalavx512fp16 and avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
_mm_min_shExperimentalavx512fp16 and avx512vl
Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
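The note that max and min do not follow IEEE 754 for NaN and signed-zero inputs can be illustrated with a scalar model. The sketch assumes the half-precision forms behave like the long-standing x86 MAX/MIN instructions, which return the second operand whenever the comparison is false (including NaN inputs and two zeros of either sign). `max_model` and `min_model` are hypothetical names, and `f32` stands in for f16.

```rust
/// Scalar model of x86-style maximum: a plain "greater than" select.
/// Assumption: as with legacy MAX instructions, the second operand is returned
/// when either input is NaN or when both inputs are zero (of either sign).
fn max_model(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// Scalar model of x86-style minimum, with the same second-operand fallback.
fn min_model(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

The result is therefore not symmetric in its operands: `max_model(f32::NAN, 1.0)` is `1.0`, while `max_model(1.0, f32::NAN)` is NaN. Rust's own `f32::max` returns the non-NaN operand in both cases.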
_mm_move_shExperimentalavx512fp16
Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
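The lane movement described for the move operations can be sketched with arrays. `move_sh_model` and `maskz_move_sh_model` are hypothetical names for a scalar model in which `[f32; 8]` stands in for the eight half-precision lanes and index 0 is the lower element.

```rust
/// Scalar model of the unmasked move: lane 0 comes from `b`, lanes 1..=7 from `a`.
fn move_sh_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = a;
    dst[0] = b[0];
    dst
}

/// Scalar model of the zero-masked move: lane 0 is `b[0]` when mask bit 0 is
/// set and zero otherwise; lanes 1..=7 still come from `a`.
fn maskz_move_sh_model(k: u8, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = a;
    dst[0] = if k & 1 == 1 { b[0] } else { 0.0 };
    dst
}
```

Only bit 0 of the mask matters in this model; the upper lanes are never masked.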
_mm_mul_pchExperimentalavx512fp16 and avx512vl
Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
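The pairing of adjacent elements into complex numbers can be sketched as follows. This is a scalar model under stated assumptions: `mul_pch_model` is a hypothetical name, `[f32; 8]` stands in for eight half-precision lanes (four complex numbers), and plain multiplies and adds are used, so the rounding of the real instruction is not reproduced.

```rust
/// Scalar model of a packed complex multiply over 4 complex numbers.
/// Even lanes hold real parts and odd lanes hold imaginary parts, matching
/// complex = vec.fp16[0] + i * vec.fp16[1].
fn mul_pch_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..4 {
        let (ar, ai) = (a[2 * i], a[2 * i + 1]);
        let (br, bi) = (b[2 * i], b[2 * i + 1]);
        // (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)
        dst[2 * i] = ar * br - ai * bi;
        dst[2 * i + 1] = ar * bi + ai * br;
    }
    dst
}
```

For instance, the pair (1, 2) multiplied by the pair (3, 4) gives (-5, 10), that is (1 + 2i)(3 + 4i) = -5 + 10i.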
_mm_mul_phExperimentalavx512fp16 and avx512vl
Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
_mm_mul_round_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mul_round_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
_mm_mul_schExperimentalavx512fp16
Multiply the lower complex numbers in a and b, store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
_mm_mul_shExperimentalavx512fp16
Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_permutex2var_phExperimentalavx512fp16 and avx512vl
Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
_mm_permutexvar_phExperimentalavx512fp16 and avx512vl
Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
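The two shuffles can be sketched with index arrays. The bit layout used here is an assumption carried over from the 16-bit integer permutes of the same shape: for 8 lanes, bits 2:0 of each index pick the lane, and in the two-source form bit 3 selects between a (0) and b (1). `permutexvar_model` and `permutex2var_model` are hypothetical names, and `f32` stands in for f16.

```rust
/// Scalar model of a single-source shuffle: dst[i] = a[idx[i] mod 8].
/// Assumption: only the low 3 bits of each index are used.
fn permutexvar_model(idx: [u16; 8], a: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        dst[i] = a[(idx[i] & 0b111) as usize];
    }
    dst
}

/// Scalar model of a two-source shuffle.
/// Assumption: bits 2:0 of idx[i] are the lane offset and bit 3 is the
/// selector (0 picks from `a`, 1 picks from `b`).
fn permutex2var_model(a: [f32; 8], idx: [u16; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        let off = (idx[i] & 0b111) as usize;
        dst[i] = if idx[i] & 0b1000 != 0 { b[off] } else { a[off] };
    }
    dst
}
```

An index array of `[7, 6, 5, 4, 3, 2, 1, 0]` reverses the lanes in the single-source model.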
_mm_rcp_phExperimentalavx512fp16 and avx512vl
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_rcp_shExperimentalavx512fp16
Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
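The stated error bound can be turned into a small check, for example when validating an approximate result against an exact reference. `rcp_within_bound` is a hypothetical helper, not part of this module, and it uses `f64` so that the reference computation itself does not add meaningful error.

```rust
/// Hypothetical helper: true when `approx` is within the documented relative
/// error bound (1.5 * 2^-12) of the exact reciprocal of `x`.
/// Assumption: `x` is finite and non-zero.
fn rcp_within_bound(x: f64, approx: f64) -> bool {
    let exact = 1.0 / x;
    // 2^12 = 4096, so the bound is 1.5 / 4096.
    let bound = 1.5 / 4096.0;
    ((approx - exact) / exact).abs() < bound
}
```

The same shape of check applies to the reciprocal square root, with `1.0 / x.sqrt()` as the exact reference.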
_mm_reduce_add_phExperimentalavx512fp16 and avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
_mm_reduce_max_phExperimentalavx512fp16 and avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
_mm_reduce_min_phExperimentalavx512fp16 and avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
_mm_reduce_mul_phExperimentalavx512fp16 and avx512vl
Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
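The four horizontal reductions collapse a vector to one scalar. The sketch below models them over `[f32; 8]`; the function names are hypothetical, a simple left-to-right order is assumed (the real intrinsics may combine lanes in a different order, which can change rounding), and NaN handling is not modelled.

```rust
/// Scalar model of reduction by addition: sum of all lanes.
fn reduce_add_model(a: [f32; 8]) -> f32 {
    a.iter().sum()
}

/// Scalar model of reduction by multiplication: product of all lanes.
fn reduce_mul_model(a: [f32; 8]) -> f32 {
    a.iter().product()
}

/// Scalar model of reduction by maximum (NaN inputs not modelled).
fn reduce_max_model(a: [f32; 8]) -> f32 {
    a.iter().copied().fold(a[0], |acc, x| if x > acc { x } else { acc })
}

/// Scalar model of reduction by minimum (NaN inputs not modelled).
fn reduce_min_model(a: [f32; 8]) -> f32 {
    a.iter().copied().fold(a[0], |acc, x| if x < acc { x } else { acc })
}
```

These are unrelated to the "reduced argument" operations that share the `reduce` prefix.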
_mm_reduce_phExperimentalavx512fp16 and avx512vl
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
_mm_reduce_round_shExperimentalavx512fp16
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_reduce_shExperimentalavx512fp16
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
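The "reduced argument" is what remains after subtracting the input rounded to a given number of fraction bits. The sketch below is a scalar model for one rounding mode only (round toward negative infinity); `reduce_floor_model` is a hypothetical name, `fraction_bits` stands for the bit count carried in imm8, and the exact imm8 encoding is deliberately not modelled.

```rust
/// Scalar model of the reduced argument with round-down:
/// x - floor(x * 2^M) / 2^M, where M is `fraction_bits`.
/// Assumption: `fraction_bits` is small enough that 2^M fits in a u32 and
/// `x` is finite.
fn reduce_floor_model(x: f32, fraction_bits: u32) -> f32 {
    let scale = (1u32 << fraction_bits) as f32;
    x - (x * scale).floor() / scale
}
```

With zero fraction bits this is the fractional part of a positive input: `reduce_floor_model(5.75, 0)` is `0.75`.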
_mm_roundscale_phExperimentalavx512fp16 and avx512vl
Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
_mm_roundscale_round_shExperimentalavx512fp16
Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_roundscale_shExperimentalavx512fp16
Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
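Rounding to a number of fraction bits means rounding to a multiple of 2^-M. The sketch below is a scalar model for one rounding mode only (round toward negative infinity); `roundscale_floor_model` is a hypothetical name, `fraction_bits` stands for the bit count carried in imm8, and the exact imm8 encoding is not modelled.

```rust
/// Scalar model of rounding to `fraction_bits` fraction bits with round-down:
/// floor(x * 2^M) / 2^M.
/// Assumption: `fraction_bits` is small enough that 2^M fits in a u32 and
/// `x` is finite.
fn roundscale_floor_model(x: f32, fraction_bits: u32) -> f32 {
    let scale = (1u32 << fraction_bits) as f32;
    (x * scale).floor() / scale
}
```

For example, with two fraction bits the result is a multiple of 0.25, so `roundscale_floor_model(2.71875, 2)` is `2.5`.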
_mm_rsqrt_phExperimentalavx512fp16 and avx512vl
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_rsqrt_shExperimentalavx512fp16
Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm_scalef_phExperimentalavx512fp16 and avx512vl
Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
_mm_scalef_round_shExperimentalavx512fp16
Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_scalef_shExperimentalavx512fp16
Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
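"Scale" here is taken to mean multiplying by a power of two derived from the second operand, that is a * 2^floor(b); that reading is an assumption based on the scalef family in general, since the summaries above do not spell it out. `scalef_model` is a hypothetical name and `f32` stands in for f16.

```rust
/// Scalar model of scalef: a * 2^floor(b).
/// Assumption: floor(b) lies in -126..=127 so that 2^floor(b) is a normal
/// f32; overflow, underflow, NaN and denormal handling are not modelled.
fn scalef_model(a: f32, b: f32) -> f32 {
    let n = b.floor() as i32;
    assert!((-126..=127).contains(&n), "sketch only covers normal powers of two");
    // Build 2^n exactly by writing the biased exponent into the bit pattern.
    let pow2 = f32::from_bits(((n + 127) as u32) << 23);
    a * pow2
}
```

Because the exponent is floored, `scalef_model(3.0, 2.5)` scales by 2^2 and returns `12.0`.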
_mm_set1_phExperimentalavx512fp16
Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
_mm_set_phExperimentalavx512fp16
Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
_mm_set_shExperimentalavx512fp16
Copy the half-precision (16-bit) floating-point value a to the lower element of dst and zero the upper 7 elements.
_mm_setr_phExperimentalavx512fp16
Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
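The difference between the two setters is only argument order. The sketch below assumes the usual Intel convention, in which the last argument of the "set" form lands in the lowest lane and the "setr" form takes arguments in lane order. `set_model` and `setr_model` are hypothetical names, and `[f32; 8]` (index 0 is the lowest lane) stands in for the vector type.

```rust
/// Scalar model of the "set" ordering: arguments run from the highest lane
/// (e7) down to the lowest lane (e0).
fn set_model(e7: f32, e6: f32, e5: f32, e4: f32, e3: f32, e2: f32, e1: f32, e0: f32) -> [f32; 8] {
    [e0, e1, e2, e3, e4, e5, e6, e7]
}

/// Scalar model of the "setr" (reversed) ordering: arguments run from the
/// lowest lane (e0) up to the highest lane (e7).
fn setr_model(e0: f32, e1: f32, e2: f32, e3: f32, e4: f32, e5: f32, e6: f32, e7: f32) -> [f32; 8] {
    [e0, e1, e2, e3, e4, e5, e6, e7]
}
```

Under this convention, `set_model(8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0)` and `setr_model(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)` build the same lanes.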
_mm_setzero_phExperimentalavx512fp16 and avx512vl
Return vector of type __m128h with all elements set to zero.
_mm_sqrt_phExperimentalavx512fp16 and avx512vl
Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
_mm_sqrt_round_shExperimentalavx512fp16
Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
_mm_sqrt_shExperimentalavx512fp16
Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_store_ph⚠Experimentalavx512fp16 and avx512vl
Store 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 16 bytes or a general-protection exception may be generated.
_mm_store_sh⚠Experimentalavx512fp16
Store the lower half-precision (16-bit) floating-point element from a into memory.
_mm_storeu_ph⚠Experimentalavx512fp16 and avx512vl
Store 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
_mm_sub_phExperimentalavx512fp16 and avx512vl
Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
_mm_sub_round_shExperimentalavx512fp16
Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
_mm_sub_shExperimentalavx512fp16
Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
_mm_ucomieq_shExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_ucomige_shExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_ucomigt_shExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_ucomile_shExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_ucomilt_shExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_ucomineq_shExperimentalavx512fp16
Compare the lower half-precision (16-bit) floating-point elements in a and b for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
_mm_undefined_phExperimentalavx512fp16 and avx512vl
Return vector of type __m128h with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.