Module avx

Available on x86 or x86-64 only.

Advanced Vector Extensions (AVX)

The references are:

- Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 2: Instruction Set Reference, A-Z.
- AMD64 Architecture Programmer’s Manual, Volume 3: General-Purpose and System Instructions.

Wikipedia provides a quick overview of the instructions available.
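
Because these intrinsics are only valid on CPUs that actually support AVX, callers usually check for the feature at run time before using them. The following is a minimal sketch of that pattern, assuming a scalar fallback is acceptable; the names `add4` and `add4_avx` are hypothetical helpers invented for this illustration and are not part of the module.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: adds two arrays of four f64 values using AVX.
/// Callers must ensure the CPU supports AVX before calling this.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn add4_avx(a: &[f64; 4], b: &[f64; 4]) -> [f64; 4] {
    // Unaligned loads: the arrays are not guaranteed to be 32-byte aligned.
    let va = _mm256_loadu_pd(a.as_ptr());
    let vb = _mm256_loadu_pd(b.as_ptr());
    let mut out = [0.0f64; 4];
    _mm256_storeu_pd(out.as_mut_ptr(), _mm256_add_pd(va, vb));
    out
}

/// Hypothetical safe wrapper: uses AVX when detected, scalar code otherwise.
pub fn add4(a: &[f64; 4], b: &[f64; 4]) -> [f64; 4] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // Sound because the feature check above succeeded.
            return unsafe { add4_avx(a, b) };
        }
    }
    // Scalar fallback for other architectures or CPUs without AVX.
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Calling `add4(&[1.0, 2.0, 3.0, 4.0], &[10.0, 20.0, 30.0, 40.0])` gives the same result on either path, so the wrapper can be used without knowing which one ran.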

Constants

_CMP_EQ_OQ: Equal (ordered, non-signaling)
_CMP_EQ_OS: Equal (ordered, signaling)
_CMP_EQ_UQ: Equal (unordered, non-signaling)
_CMP_EQ_US: Equal (unordered, signaling)
_CMP_FALSE_OQ: False (ordered, non-signaling)
_CMP_FALSE_OS: False (ordered, signaling)
_CMP_GE_OQ: Greater-than-or-equal (ordered, non-signaling)
_CMP_GE_OS: Greater-than-or-equal (ordered, signaling)
_CMP_GT_OQ: Greater-than (ordered, non-signaling)
_CMP_GT_OS: Greater-than (ordered, signaling)
_CMP_LE_OQ: Less-than-or-equal (ordered, non-signaling)
_CMP_LE_OS: Less-than-or-equal (ordered, signaling)
_CMP_LT_OQ: Less-than (ordered, non-signaling)
_CMP_LT_OS: Less-than (ordered, signaling)
_CMP_NEQ_OQ: Not-equal (ordered, non-signaling)
_CMP_NEQ_OS: Not-equal (ordered, signaling)
_CMP_NEQ_UQ: Not-equal (unordered, non-signaling)
_CMP_NEQ_US: Not-equal (unordered, signaling)
_CMP_NGE_UQ: Not-greater-than-or-equal (unordered, non-signaling)
_CMP_NGE_US: Not-greater-than-or-equal (unordered, signaling)
_CMP_NGT_UQ: Not-greater-than (unordered, non-signaling)
_CMP_NGT_US: Not-greater-than (unordered, signaling)
_CMP_NLE_UQ: Not-less-than-or-equal (unordered, non-signaling)
_CMP_NLE_US: Not-less-than-or-equal (unordered, signaling)
_CMP_NLT_UQ: Not-less-than (unordered, non-signaling)
_CMP_NLT_US: Not-less-than (unordered, signaling)
_CMP_ORD_Q: Ordered (non-signaling)
_CMP_ORD_S: Ordered (signaling)
_CMP_TRUE_UQ: True (unordered, non-signaling)
_CMP_TRUE_US: True (unordered, signaling)
_CMP_UNORD_Q: Unordered (non-signaling)
_CMP_UNORD_S: Unordered (signaling)
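
The ordered and unordered predicates differ only when a lane contains NaN: ordered predicates yield false for such a lane, unordered ones yield true. The sketch below illustrates this by comparing the same inputs with `_CMP_LT_OQ` and `_CMP_NGE_UQ` and collecting each result with `_mm256_movemask_pd`. The helper names `lt_masks` and `lt_masks_avx` are hypothetical, and the scalar fallback is an assumption added so the example also runs on machines without AVX.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: returns (ordered mask, unordered mask) for `a < b`.
/// Bit i of each mask corresponds to lane i.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn lt_masks_avx(a: &[f64; 4], b: &[f64; 4]) -> (i32, i32) {
    let va = _mm256_loadu_pd(a.as_ptr());
    let vb = _mm256_loadu_pd(b.as_ptr());
    // Less-than, ordered: a lane holding NaN compares as false.
    let ordered = _mm256_cmp_pd::<_CMP_LT_OQ>(va, vb);
    // Not-greater-than-or-equal, unordered: a lane holding NaN compares as true.
    let unordered = _mm256_cmp_pd::<_CMP_NGE_UQ>(va, vb);
    // Each comparison lane is all ones or all zeros; movemask keeps the sign bits.
    (_mm256_movemask_pd(ordered), _mm256_movemask_pd(unordered))
}

/// Hypothetical safe wrapper with a scalar fallback that mirrors the predicates.
pub fn lt_masks(a: &[f64; 4], b: &[f64; 4]) -> (i32, i32) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            return unsafe { lt_masks_avx(a, b) };
        }
    }
    let mut ordered = 0;
    let mut unordered = 0;
    for i in 0..4 {
        if a[i] < b[i] {
            ordered |= 1 << i;
        }
        // `!(x >= y)` is true when either operand is NaN.
        if !(a[i] >= b[i]) {
            unordered |= 1 << i;
        }
    }
    (ordered, unordered)
}
```

For `a = [1.0, 5.0, NaN, 2.0]` and `b = [2.0, 3.0, 1.0, 2.0]`, the ordered mask has only bit 0 set, while the unordered mask also has bit 2 set for the NaN lane.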

Functions

_mm256_add_pdavx: Adds packed double-precision (64-bit) floating-point elements in a and b.
_mm256_add_psavx: Adds packed single-precision (32-bit) floating-point elements in a and b.
_mm256_addsub_pdavx: Alternatively adds and subtracts packed double-precision (64-bit) floating-point elements in a to/from packed elements in b.
_mm256_addsub_psavx: Alternatively adds and subtracts packed single-precision (32-bit) floating-point elements in a to/from packed elements in b.
_mm256_and_pdavx: Computes the bitwise AND of packed double-precision (64-bit) floating-point elements in a and b.
_mm256_and_psavx: Computes the bitwise AND of packed single-precision (32-bit) floating-point elements in a and b.
_mm256_andnot_pdavx: Computes the bitwise NOT of packed double-precision (64-bit) floating-point elements in a, and then AND with b.
_mm256_andnot_psavx: Computes the bitwise NOT of packed single-precision (32-bit) floating-point elements in a and then AND with b.
_mm256_blend_pdavx: Blends packed double-precision (64-bit) floating-point elements from a and b using control mask imm8.
_mm256_blend_psavx: Blends packed single-precision (32-bit) floating-point elements from a and b using control mask imm8.
_mm256_blendv_pdavx: Blends packed double-precision (64-bit) floating-point elements from a and b using c as a mask.
_mm256_blendv_psavx: Blends packed single-precision (32-bit) floating-point elements from a and b using c as a mask.
_mm256_broadcast_pdavx: Broadcasts 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of the returned vector.
_mm256_broadcast_psavx: Broadcasts 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of the returned vector.
_mm256_broadcast_sdavx: Broadcasts a double-precision (64-bit) floating-point element from memory to all elements of the returned vector.
_mm256_broadcast_ssavx: Broadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
_mm256_castpd128_pd256avx: Casts vector of type __m128d to type __m256d; the upper 128 bits of the result are indeterminate.
_mm256_castpd256_pd128avx: Casts vector of type __m256d to type __m128d.
_mm256_castpd_psavx: Casts vector of type __m256d to type __m256.
_mm256_castpd_si256avx: Casts vector of type __m256d to type __m256i.
_mm256_castps128_ps256avx: Casts vector of type __m128 to type __m256; the upper 128 bits of the result are indeterminate.
_mm256_castps256_ps128avx: Casts vector of type __m256 to type __m128.
_mm256_castps_pdavx: Casts vector of type __m256 to type __m256d.
_mm256_castps_si256avx: Casts vector of type __m256 to type __m256i.
_mm256_castsi128_si256avx: Casts vector of type __m128i to type __m256i; the upper 128 bits of the result are indeterminate.
_mm256_castsi256_pdavx: Casts vector of type __m256i to type __m256d.
_mm256_castsi256_psavx: Casts vector of type __m256i to type __m256.
_mm256_castsi256_si128avx: Casts vector of type __m256i to type __m128i.
_mm256_ceil_pdavx: Rounds packed double-precision (64-bit) floating point elements in a toward positive infinity.
_mm256_ceil_psavx: Rounds packed single-precision (32-bit) floating point elements in a toward positive infinity.
_mm256_cmp_pdavx: Compares packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
_mm256_cmp_psavx: Compares packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
_mm256_cvtepi32_pdavx: Converts packed 32-bit integers in a to packed double-precision (64-bit) floating-point elements.
_mm256_cvtepi32_psavx: Converts packed 32-bit integers in a to packed single-precision (32-bit) floating-point elements.
_mm256_cvtpd_epi32avx: Converts packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers.
_mm256_cvtpd_psavx: Converts packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements.
_mm256_cvtps_epi32avx: Converts packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers.
_mm256_cvtps_pdavx: Converts packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements.
_mm256_cvtsd_f64avx: Returns the first element of the input vector of [4 x double].
_mm256_cvtsi256_si32avx: Returns the first element of the input vector of [8 x i32].
_mm256_cvtss_f32avx: Returns the first element of the input vector of [8 x float].
_mm256_cvttpd_epi32avx: Converts packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation.
_mm256_cvttps_epi32avx: Converts packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation.
_mm256_div_pdavx: Computes the division of each of the 4 packed 64-bit floating-point elements in a by the corresponding packed elements in b.
_mm256_div_psavx: Computes the division of each of the 8 packed 32-bit floating-point elements in a by the corresponding packed elements in b.
_mm256_dp_psavx: Conditionally multiplies the packed single-precision (32-bit) floating-point elements in a and b using the high 4 bits in imm8, sums the four products, and conditionally returns the sum using the low 4 bits of imm8.
_mm256_extract_epi32avx: Extracts a 32-bit integer from a, selected with INDEX.
_mm256_extractf128_pdavx: Extracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm8.
_mm256_extractf128_psavx: Extracts 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8.
_mm256_extractf128_si256avx: Extracts 128 bits (composed of integer data) from a, selected with imm8.
_mm256_floor_pdavx: Rounds packed double-precision (64-bit) floating point elements in a toward negative infinity.
_mm256_floor_psavx: Rounds packed single-precision (32-bit) floating point elements in a toward negative infinity.
_mm256_hadd_pdavx: Horizontal addition of adjacent pairs in the two packed vectors of 4 64-bit floating points a and b. In the result, sums of elements from a are returned in even locations, while sums of elements from b are returned in odd locations.
_mm256_hadd_psavx: Horizontal addition of adjacent pairs in the two packed vectors of 8 32-bit floating points a and b. In the result, sums of elements from a are returned in locations of indices 0, 1, 4, 5; while sums of elements from b are locations 2, 3, 6, 7.
_mm256_hsub_pdavx: Horizontal subtraction of adjacent pairs in the two packed vectors of 4 64-bit floating points a and b. In the result, differences of elements from a are returned in even locations, while differences of elements from b are returned in odd locations.
_mm256_hsub_psavx: Horizontal subtraction of adjacent pairs in the two packed vectors of 8 32-bit floating points a and b. In the result, differences of elements from a are returned in locations of indices 0, 1, 4, 5; while differences of elements from b are returned in locations 2, 3, 6, 7.
_mm256_insert_epi8avx: Copies a to result, and inserts the 8-bit integer i into result at the location specified by index.
_mm256_insert_epi16avx: Copies a to result, and inserts the 16-bit integer i into result at the location specified by index.
_mm256_insert_epi32avx: Copies a to result, and inserts the 32-bit integer i into result at the location specified by index.
_mm256_insertf128_pdavx: Copies a to result, then inserts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into result at the location specified by imm8.
_mm256_insertf128_psavx: Copies a to result, then inserts 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into result at the location specified by imm8.
_mm256_insertf128_si256avx: Copies a to result, then inserts 128 bits from b into result at the location specified by imm8.
_mm256_lddqu_si256^⚠avx: Loads 256-bits of integer data from unaligned memory into result. This intrinsic may perform better than _mm256_loadu_si256 when the data crosses a cache line boundary.
_mm256_load_pd^⚠avx: Loads 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_load_ps^⚠avx: Loads 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_load_si256^⚠avx: Loads 256-bits of integer data from memory into result. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_loadu2_m128^⚠avx: Loads two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combines them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.
_mm256_loadu2_m128d^⚠avx: Loads two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combines them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.
_mm256_loadu2_m128i^⚠avx: Loads two 128-bit values (composed of integer data) from memory, and combines them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.
_mm256_loadu_pd^⚠avx: Loads 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result. mem_addr does not need to be aligned on any particular boundary.
_mm256_loadu_ps^⚠avx: Loads 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result. mem_addr does not need to be aligned on any particular boundary.
_mm256_loadu_si256^⚠avx: Loads 256-bits of integer data from memory into result. mem_addr does not need to be aligned on any particular boundary.
_mm256_maskload_pd^⚠avx: Loads packed double-precision (64-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
_mm256_maskload_ps^⚠avx: Loads packed single-precision (32-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
_mm256_maskstore_pd^⚠avx: Stores packed double-precision (64-bit) floating-point elements from a into memory using mask.
_mm256_maskstore_ps^⚠avx: Stores packed single-precision (32-bit) floating-point elements from a into memory using mask.
_mm256_max_pdavx: Compares packed double-precision (64-bit) floating-point elements in a and b, and returns packed maximum values.
_mm256_max_psavx: Compares packed single-precision (32-bit) floating-point elements in a and b, and returns packed maximum values.
_mm256_min_pdavx: Compares packed double-precision (64-bit) floating-point elements in a and b, and returns packed minimum values.
_mm256_min_psavx: Compares packed single-precision (32-bit) floating-point elements in a and b, and returns packed minimum values.
_mm256_movedup_pdavx: Duplicates even-indexed double-precision (64-bit) floating-point elements from a, and returns the results.
_mm256_movehdup_psavx: Duplicates odd-indexed single-precision (32-bit) floating-point elements from a, and returns the results.
_mm256_moveldup_psavx: Duplicates even-indexed single-precision (32-bit) floating-point elements from a, and returns the results.
_mm256_movemask_pdavx: Sets each bit of the returned mask based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in a.
_mm256_movemask_psavx: Sets each bit of the returned mask based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in a.
_mm256_mul_pdavx: Multiplies packed double-precision (64-bit) floating-point elements in a and b.
_mm256_mul_psavx: Multiplies packed single-precision (32-bit) floating-point elements in a and b.
_mm256_or_pdavx: Computes the bitwise OR of packed double-precision (64-bit) floating-point elements in a and b.
_mm256_or_psavx: Computes the bitwise OR of packed single-precision (32-bit) floating-point elements in a and b.
_mm256_permute2f128_pdavx: Shuffles 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) selected by imm8 from a and b.
_mm256_permute2f128_psavx: Shuffles 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) selected by imm8 from a and b.
_mm256_permute2f128_si256avx: Shuffles 128-bit halves (composed of integer data) selected by imm8 from a and b.
_mm256_permute_pdavx: Shuffles double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8.
_mm256_permute_psavx: Shuffles single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8.
_mm256_permutevar_pdavx: Shuffles double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b.
_mm256_permutevar_psavx: Shuffles single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b.
_mm256_rcp_psavx: Computes the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and returns the results. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_round_pdavx: Rounds packed double-precision (64-bit) floating point elements in a according to the flag ROUNDING.
_mm256_round_psavx: Rounds packed single-precision (32-bit) floating point elements in a according to the flag ROUNDING.
_mm256_rsqrt_psavx: Computes the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and returns the results. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_set1_epi8avx: Broadcasts 8-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastb instruction.
_mm256_set1_epi16avx: Broadcasts 16-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastw instruction.
_mm256_set1_epi32avx: Broadcasts 32-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastd instruction.
_mm256_set1_epi64xavx: Broadcasts 64-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastq instruction.
_mm256_set1_pdavx: Broadcasts double-precision (64-bit) floating-point value a to all elements of returned vector.
_mm256_set1_psavx: Broadcasts single-precision (32-bit) floating-point value a to all elements of returned vector.
_mm256_set_epi8avx: Sets packed 8-bit integers in returned vector with the supplied values.
_mm256_set_epi16avx: Sets packed 16-bit integers in returned vector with the supplied values.
_mm256_set_epi32avx: Sets packed 32-bit integers in returned vector with the supplied values.
_mm256_set_epi64xavx: Sets packed 64-bit integers in returned vector with the supplied values.
_mm256_set_m128avx: Sets packed __m256 returned vector with the supplied values.
_mm256_set_m128davx: Sets packed __m256d returned vector with the supplied values.
_mm256_set_m128iavx: Sets packed __m256i returned vector with the supplied values.
_mm256_set_pdavx: Sets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values.
_mm256_set_psavx: Sets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values.
_mm256_setr_epi8avx: Sets packed 8-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_epi16avx: Sets packed 16-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_epi32avx: Sets packed 32-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_epi64xavx: Sets packed 64-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_m128avx: Sets packed __m256 returned vector with the supplied values in reverse order.
_mm256_setr_m128davx: Sets packed __m256d returned vector with the supplied values in reverse order.
_mm256_setr_m128iavx: Sets packed __m256i returned vector with the supplied values in reverse order.
_mm256_setr_pdavx: Sets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values in reverse order.
_mm256_setr_psavx: Sets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values in reverse order.
_mm256_setzero_pdavx: Returns vector of type __m256d with all elements set to zero.
_mm256_setzero_psavx: Returns vector of type __m256 with all elements set to zero.
_mm256_setzero_si256avx: Returns vector of type __m256i with all elements set to zero.
_mm256_shuffle_pdavx: Shuffles double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8.
_mm256_shuffle_psavx: Shuffles single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8.
_mm256_sqrt_pdavx: Returns the square root of packed double-precision (64-bit) floating point elements in a.
_mm256_sqrt_psavx: Returns the square root of packed single-precision (32-bit) floating point elements in a.
_mm256_store_pd^⚠avx: Stores 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_store_ps^⚠avx: Stores 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_store_si256^⚠avx: Stores 256-bits of integer data from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_storeu2_m128^⚠avx: Stores the high and low 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
_mm256_storeu2_m128d^⚠avx: Stores the high and low 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
_mm256_storeu2_m128i^⚠avx: Stores the high and low 128-bit halves (each composed of integer data) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
_mm256_storeu_pd^⚠avx: Stores 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
_mm256_storeu_ps^⚠avx: Stores 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
_mm256_storeu_si256^⚠avx: Stores 256-bits of integer data from a into memory. mem_addr does not need to be aligned on any particular boundary.
_mm256_stream_pd^⚠avx: Moves double-precision values from a 256-bit vector of [4 x double] to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
_mm256_stream_ps^⚠avx: Moves single-precision floating point values from a 256-bit vector of [8 x float] to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
_mm256_stream_si256^⚠avx: Moves integer data from a 256-bit integer vector to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
_mm256_sub_pdavx: Subtracts packed double-precision (64-bit) floating-point elements in b from packed elements in a.
_mm256_sub_psavx: Subtracts packed single-precision (32-bit) floating-point elements in b from packed elements in a.
_mm256_testc_pdavx: Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value.
_mm256_testc_psavx: Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value.
_mm256_testc_si256avx: Computes the bitwise AND of 256 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Computes the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return the CF value.
_mm256_testnzc_pdavx: Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
_mm256_testnzc_psavx: Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
_mm256_testnzc_si256avx: Computes the bitwise AND of 256 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Computes the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
_mm256_testz_pdavx: Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value.
_mm256_testz_psavx: Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value.
_mm256_testz_si256avx: Computes the bitwise AND of 256 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Computes the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return the ZF value.
_mm256_undefined_pdavx: Returns vector of type __m256d with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
_mm256_undefined_psavx: Returns vector of type __m256 with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
_mm256_undefined_si256avx: Returns vector of type __m256i with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
_mm256_unpackhi_pdavx: Unpacks and interleaves double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b.
_mm256_unpackhi_psavx: Unpacks and interleaves single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b.
_mm256_unpacklo_pdavx: Unpacks and interleaves double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b.
_mm256_unpacklo_psavx: Unpacks and interleaves single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b.
_mm256_xor_pdavx: Computes the bitwise XOR of packed double-precision (64-bit) floating-point elements in a and b.
_mm256_xor_psavx: Computes the bitwise XOR of packed single-precision (32-bit) floating-point elements in a and b.
_mm256_zeroallavx: Zeroes the contents of all XMM or YMM registers.
_mm256_zeroupperavx: Zeroes the upper 128 bits of all YMM registers; the lower 128-bits of the registers are unmodified.
_mm256_zextpd128_pd256avx: Constructs a 256-bit floating-point vector of [4 x double] from a 128-bit floating-point vector of [2 x double]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
_mm256_zextps128_ps256avx: Constructs a 256-bit floating-point vector of [8 x float] from a 128-bit floating-point vector of [4 x float]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
_mm256_zextsi128_si256avx: Constructs a 256-bit integer vector from a 128-bit integer vector. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
_mm_broadcast_ssavx: Broadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
_mm_cmp_pdavx: Compares packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
_mm_cmp_psavx: Compares packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
_mm_cmp_sdavx: Compares the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by IMM5, stores the result in the lower element of returned vector, and copies the upper element from a to the upper element of returned vector.
_mm_cmp_ssavx: Compares the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by IMM5, stores the result in the lower element of returned vector, and copies the upper 3 packed elements from a to the upper elements of returned vector.
_mm_maskload_pd^⚠avx: Loads packed double-precision (64-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
_mm_maskload_ps^⚠avx: Loads packed single-precision (32-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
_mm_maskstore_pd^⚠avx: Stores packed double-precision (64-bit) floating-point elements from a into memory using mask.
_mm_maskstore_ps^⚠avx: Stores packed single-precision (32-bit) floating-point elements from a into memory using mask.
_mm_permute_pdavx: Shuffles double-precision (64-bit) floating-point elements in a using the control in imm8.
_mm_permute_psavx: Shuffles single-precision (32-bit) floating-point elements in a using the control in imm8.
_mm_permutevar_pdavx: Shuffles double-precision (64-bit) floating-point elements in a using the control in b.
_mm_permutevar_psavx: Shuffles single-precision (32-bit) floating-point elements in a using the control in b.
_mm_testc_pdavx: Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value.
_mm_testc_psavx: Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value.
_mm_testnzc_pdavx: Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
_mm_testnzc_psavx: Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
_mm_testz_pdavx: Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value.
_mm_testz_psavx: Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value.
ptestnzc256 🔒 ^⚠
roundpd256 🔒 ^⚠
roundps256 🔒 ^⚠
vcmppd 🔒 ^⚠
vcmppd256 🔒 ^⚠
vcmpps 🔒 ^⚠
vcmpps256 🔒 ^⚠
vcmpsd 🔒 ^⚠
vcmpss 🔒 ^⚠
vcvtpd2dq 🔒 ^⚠
vcvtps2dq 🔒 ^⚠
vcvttpd2dq 🔒 ^⚠
vcvttps2dq 🔒 ^⚠
vdpps 🔒 ^⚠
vlddqu 🔒 ^⚠
vmaxpd 🔒 ^⚠
vmaxps 🔒 ^⚠
vminpd 🔒 ^⚠
vminps 🔒 ^⚠
vpermilpd 🔒 ^⚠
vpermilpd256 🔒 ^⚠
vpermilps 🔒 ^⚠
vpermilps256 🔒 ^⚠
vrcpps 🔒 ^⚠
vrsqrtps 🔒 ^⚠
vtestcpd256 🔒 ^⚠
vtestcps256 🔒 ^⚠
vtestnzcpd 🔒 ^⚠
vtestnzcpd256 🔒 ^⚠
vtestnzcps 🔒 ^⚠
vtestnzcps256 🔒 ^⚠
vtestzpd256 🔒 ^⚠
vtestzps256 🔒 ^⚠
vzeroall 🔒 ^⚠
vzeroupper 🔒 ^⚠
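
Several of the functions above are typically combined in a single loop. As a rough sketch, the code below sums an `f32` slice using `_mm256_loadu_ps` and `_mm256_add_ps` for full 8-element chunks, and `_mm256_maskload_ps` with a mask built by `_mm256_setr_epi32` for the remaining tail. The names `sum_f32` and `sum_avx` are hypothetical, and the scalar fallback is an assumption added for portability. Note that the vector path adds elements in a different order than a sequential loop, so results can differ slightly for inputs that are not exactly representable.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: sums a slice with AVX, 8 lanes at a time.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn sum_avx(data: &[f32]) -> f32 {
    let mut acc = _mm256_setzero_ps();
    let mut chunks = data.chunks_exact(8);
    for chunk in &mut chunks {
        // Unaligned load: slices carry no 32-byte alignment guarantee.
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(chunk.as_ptr()));
    }
    let tail = chunks.remainder();
    if !tail.is_empty() {
        // A lane is loaded only if the high bit of its mask element is set;
        // the other lanes are zeroed and their memory is not accessed.
        let n = tail.len() as i32;
        let m = |i: i32| if i < n { -1 } else { 0 };
        let mask = _mm256_setr_epi32(m(0), m(1), m(2), m(3), m(4), m(5), m(6), m(7));
        acc = _mm256_add_ps(acc, _mm256_maskload_ps(tail.as_ptr(), mask));
    }
    // Spill the eight partial sums and add them up in scalar code.
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    lanes.iter().sum()
}

/// Hypothetical safe wrapper: uses AVX when detected, scalar code otherwise.
pub fn sum_f32(data: &[f32]) -> f32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            return unsafe { sum_avx(data) };
        }
    }
    data.iter().sum()
}
```

The masked load avoids reading past the end of the slice, which a plain `_mm256_loadu_ps` on the tail would do.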