1#![doc = "SIMD and vendor intrinsics module.\n\nThis module is intended to be the gateway to architecture-specific\nintrinsic functions, typically related to SIMD (but not always!). Each\narchitecture that Rust compiles to may contain a submodule here, which\nmeans that this is not a portable module! If you\'re writing a portable\nlibrary take care when using these APIs!\n\nUnder this module you\'ll find an architecture-named module, such as\n`x86_64`. Each `#[cfg(target_arch)]` that Rust can compile to may have a\nmodule entry here, only present on that particular target. For example the\n`i686-pc-windows-msvc` target will have an `x86` module here, whereas\n`x86_64-pc-windows-msvc` has `x86_64`.\n\n[rfc]: https://github.com/rust-lang/rfcs/pull/2325\n[tracked]: https://github.com/rust-lang/rust/issues/48556\n\n# Overview\n\nThis module exposes vendor-specific intrinsics that typically correspond to\na single machine instruction. These intrinsics are not portable: their\navailability is architecture-dependent, and not all machines of that\narchitecture might provide the intrinsic.\n\nThe `arch` module is intended to be a low-level implementation detail for\nhigher-level APIs. Using it correctly can be quite tricky as you need to\nensure at least a few guarantees are upheld:\n\n* The correct architecture\'s module is used. For example the `arm` module\n isn\'t available on the `x86_64-unknown-linux-gnu` target. This is\n typically done by ensuring that `#[cfg]` is used appropriately when using\n this module.\n* The CPU the program is currently running on supports the function being\n called. For example it is unsafe to call an AVX2 function on a CPU that\n doesn\'t actually support AVX2.\n\nAs a result of the latter of these guarantees all intrinsics in this module\nare `unsafe` and extra care needs to be taken when calling them!\n\n# CPU Feature Detection\n\nIn order to call these APIs in a safe fashion there\'s a number of\nmechanisms available to ensure that the correct CPU feature is available\nto call an intrinsic. Let\'s consider, for example, the `_mm256_add_epi64`\nintrinsics on the `x86` and `x86_64` architectures. This function requires\nthe AVX2 feature as [documented by Intel][intel-dox] so to correctly call\nthis function we need to (a) guarantee we only call it on `x86`/`x86_64`\nand (b) ensure that the CPU feature is available\n\n[intel-dox]: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_add_epi64&expand=100\n\n## Static CPU Feature Detection\n\nThe first option available to us is to conditionally compile code via the\n`#[cfg]` attribute. CPU features correspond to the `target_feature` cfg\navailable, and can be used like so:\n\n```ignore\n#[cfg(\n all(\n any(target_arch = \"x86\", target_arch = \"x86_64\"),\n target_feature = \"avx2\"\n )\n)]\nfn foo() {\n #[cfg(target_arch = \"x86\")]\n use std::arch::x86::_mm256_add_epi64;\n #[cfg(target_arch = \"x86_64\")]\n use std::arch::x86_64::_mm256_add_epi64;\n\n unsafe {\n _mm256_add_epi64(...);\n }\n}\n```\n\nHere we\'re using `#[cfg(target_feature = \"avx2\")]` to conditionally compile\nthis function into our module. This means that if the `avx2` feature is\n*enabled statically* then we\'ll use the `_mm256_add_epi64` function at\nruntime. The `unsafe` block here can be justified through the usage of\n`#[cfg]` to only compile the code in situations where the safety guarantees\nare upheld.\n\nStatically enabling a feature is typically done with the `-C\ntarget-feature` or `-C target-cpu` flags to the compiler. For example if\nyour local CPU supports AVX2 then you can compile the above function with:\n\n```sh\n$ RUSTFLAGS=\'-C target-cpu=native\' cargo build\n```\n\nOr otherwise you can specifically enable just the AVX2 feature:\n\n```sh\n$ RUSTFLAGS=\'-C target-feature=+avx2\' cargo build\n```\n\nNote that when you compile a binary with a particular feature enabled it\'s\nimportant to ensure that you only run the binary on systems which satisfy\nthe required feature set.\n\n## Dynamic CPU Feature Detection\n\nSometimes statically dispatching isn\'t quite what you want. Instead you\nmight want to build a portable binary that runs across a variety of CPUs,\nbut at runtime it selects the most optimized implementation available. This\nallows you to build a \"least common denominator\" binary which has certain\nsections more optimized for different CPUs.\n\nTaking our previous example from before, we\'re going to compile our binary\n*without* AVX2 support, but we\'d like to enable it for just one function.\nWe can do that in a manner like:\n\n```ignore\nfn foo() {\n #[cfg(any(target_arch = \"x86\", target_arch = \"x86_64\"))]\n {\n if is_x86_feature_detected!(\"avx2\") {\n return unsafe { foo_avx2() };\n }\n }\n\n // fallback implementation without using AVX2\n}\n\n#[cfg(any(target_arch = \"x86\", target_arch = \"x86_64\"))]\n#[target_feature(enable = \"avx2\")]\nunsafe fn foo_avx2() {\n #[cfg(target_arch = \"x86\")]\n use std::arch::x86::_mm256_add_epi64;\n #[cfg(target_arch = \"x86_64\")]\n use std::arch::x86_64::_mm256_add_epi64;\n\n unsafe { _mm256_add_epi64(...); }\n}\n```\n\nThere\'s a couple of components in play here, so let\'s go through them in\ndetail!\n\n* First up we notice the `is_x86_feature_detected!` macro. Provided by\n the standard library, this macro will perform necessary runtime detection\n to determine whether the CPU the program is running on supports the\n specified feature. In this case the macro will expand to a boolean\n expression evaluating to whether the local CPU has the AVX2 feature or\n not.\n\n Note that this macro, like the `arch` module, is platform-specific. For\n example calling `is_x86_feature_detected!(\"avx2\")` on ARM will be a\n compile time error. To ensure we don\'t hit this error a statement level\n `#[cfg]` is used to only compile usage of the macro on `x86`/`x86_64`.\n\n* Next up we see our AVX2-enabled function, `foo_avx2`. This function is\n decorated with the `#[target_feature]` attribute which enables a CPU\n feature for just this one function. Using a compiler flag like `-C\n target-feature=+avx2` will enable AVX2 for the entire program, but using\n an attribute will only enable it for the one function. Usage of the\n `#[target_feature]` attribute currently requires the function to also be\n `unsafe`, as we see here. This is because the function can only be\n correctly called on systems which have the AVX2 (like the intrinsics\n themselves).\n\nAnd with all that we should have a working program! This program will run\nacross all machines and it\'ll use the optimized AVX2 implementation on\nmachines where support is detected.\n\n# Ergonomics\n\nIt\'s important to note that using the `arch` module is not the easiest\nthing in the world, so if you\'re curious to try it out you may want to\nbrace yourself for some wordiness!\n\nThe primary purpose of this module is to enable stable crates on crates.io\nto build up much more ergonomic abstractions which end up using SIMD under\nthe hood. Over time these abstractions may also move into the standard\nlibrary itself, but for now this module is tasked with providing the bare\nminimum necessary to use vendor intrinsics on stable Rust.\n\n# Other architectures\n\nThis documentation is only for one particular architecture, you can find\nothers at:\n\n* [`x86`]\n* [`x86_64`]\n* [`arm`]\n* [`aarch64`]\n* [`amdgpu`]\n* [`hexagon`]\n* [`riscv32`]\n* [`riscv64`]\n* [`mips`]\n* [`mips64`]\n* [`powerpc`]\n* [`powerpc64`]\n* [`nvptx`]\n* [`wasm32`]\n* [`loongarch32`]\n* [`loongarch64`]\n* [`s390x`]\n\n[`x86`]: ../../core/arch/x86/index.html\n[`x86_64`]: ../../core/arch/x86_64/index.html\n[`arm`]: ../../core/arch/arm/index.html\n[`aarch64`]: ../../core/arch/aarch64/index.html\n[`amdgpu`]: ../../core/arch/amdgpu/index.html\n[`hexagon`]: ../../core/arch/hexagon/index.html\n[`riscv32`]: ../../core/arch/riscv32/index.html\n[`riscv64`]: ../../core/arch/riscv64/index.html\n[`mips`]: ../../core/arch/mips/index.html\n[`mips64`]: ../../core/arch/mips64/index.html\n[`powerpc`]: ../../core/arch/powerpc/index.html\n[`powerpc64`]: ../../core/arch/powerpc64/index.html\n[`nvptx`]: ../../core/arch/nvptx/index.html\n[`wasm32`]: ../../core/arch/wasm32/index.html\n[`loongarch32`]: ../../core/arch/loongarch32/index.html\n[`loongarch64`]: ../../core/arch/loongarch64/index.html\n[`s390x`]: ../../core/arch/s390x/index.html\n\n# Examples\n\nFirst let\'s take a look at not actually using any intrinsics but instead\nusing LLVM\'s auto-vectorization to produce optimized vectorized code for\nAVX2 and also for the default platform.\n\n```rust\nfn main() {\n let mut dst = [0];\n add_quickly(&[1], &[2], &mut dst);\n assert_eq!(dst[0], 3);\n}\n\nfn add_quickly(a: &[u8], b: &[u8], c: &mut [u8]) {\n #[cfg(any(target_arch = \"x86\", target_arch = \"x86_64\"))]\n {\n // Note that this `unsafe` block is safe because we\'re testing\n // that the `avx2` feature is indeed available on our CPU.\n if is_x86_feature_detected!(\"avx2\") {\n return unsafe { add_quickly_avx2(a, b, c) };\n }\n }\n\n add_quickly_fallback(a, b, c)\n}\n\n#[cfg(any(target_arch = \"x86\", target_arch = \"x86_64\"))]\n#[target_feature(enable = \"avx2\")]\nunsafe fn add_quickly_avx2(a: &[u8], b: &[u8], c: &mut [u8]) {\n add_quickly_fallback(a, b, c) // the function below is inlined here\n}\n\nfn add_quickly_fallback(a: &[u8], b: &[u8], c: &mut [u8]) {\n for ((a, b), c) in a.iter().zip(b).zip(c) {\n *c = *a + *b;\n }\n}\n```\n\nNext up let\'s take a look at an example of manually using intrinsics. Here\nwe\'ll be using SSE4.1 features to implement hex encoding.\n\n```\nfn main() {\n let mut dst = [0; 32];\n hex_encode(b\"\\x01\\x02\\x03\", &mut dst);\n assert_eq!(&dst[..6], b\"010203\");\n\n let mut src = [0; 16];\n for i in 0..16 {\n src[i] = (i + 1) as u8;\n }\n hex_encode(&src, &mut dst);\n assert_eq!(&dst, b\"0102030405060708090a0b0c0d0e0f10\");\n}\n\npub fn hex_encode(src: &[u8], dst: &mut [u8]) {\n let len = src.len().checked_mul(2).unwrap();\n assert!(dst.len() >= len);\n\n #[cfg(any(target_arch = \"x86\", target_arch = \"x86_64\"))]\n {\n if is_x86_feature_detected!(\"sse4.1\") {\n return unsafe { hex_encode_sse41(src, dst) };\n }\n }\n\n hex_encode_fallback(src, dst)\n}\n\n// translated from\n// <https://github.com/Matherunner/bin2hex-sse/blob/master/base16_sse4.cpp>\n#[target_feature(enable = \"sse4.1\")]\n#[cfg(any(target_arch = \"x86\", target_arch = \"x86_64\"))]\nunsafe fn hex_encode_sse41(mut src: &[u8], dst: &mut [u8]) {\n #[cfg(target_arch = \"x86\")]\n use std::arch::x86::*;\n #[cfg(target_arch = \"x86_64\")]\n use std::arch::x86_64::*;\n\n unsafe {\n let ascii_zero = _mm_set1_epi8(b\'0\' as i8);\n let nines = _mm_set1_epi8(9);\n let ascii_a = _mm_set1_epi8((b\'a\' - 9 - 1) as i8);\n let and4bits = _mm_set1_epi8(0xf);\n\n let mut i = 0_isize;\n while src.len() >= 16 {\n let invec = _mm_loadu_si128(src.as_ptr() as *const _);\n\n let masked1 = _mm_and_si128(invec, and4bits);\n let masked2 = _mm_and_si128(_mm_srli_epi64(invec, 4), and4bits);\n\n // return 0xff corresponding to the elements > 9, or 0x00 otherwise\n let cmpmask1 = _mm_cmpgt_epi8(masked1, nines);\n let cmpmask2 = _mm_cmpgt_epi8(masked2, nines);\n\n // add \'0\' or the offset depending on the masks\n let masked1 = _mm_add_epi8(\n masked1,\n _mm_blendv_epi8(ascii_zero, ascii_a, cmpmask1),\n );\n let masked2 = _mm_add_epi8(\n masked2,\n _mm_blendv_epi8(ascii_zero, ascii_a, cmpmask2),\n );\n\n // interleave masked1 and masked2 bytes\n let res1 = _mm_unpacklo_epi8(masked2, masked1);\n let res2 = _mm_unpackhi_epi8(masked2, masked1);\n\n _mm_storeu_si128(dst.as_mut_ptr().offset(i * 2) as *mut _, res1);\n _mm_storeu_si128(\n dst.as_mut_ptr().offset(i * 2 + 16) as *mut _,\n res2,\n );\n src = &src[16..];\n i += 16;\n }\n\n let i = i as usize;\n hex_encode_fallback(src, &mut dst[i * 2..]);\n }\n}\n\nfn hex_encode_fallback(src: &[u8], dst: &mut [u8]) {\n fn hex(byte: u8) -> u8 {\n static TABLE: &[u8] = b\"0123456789abcdef\";\n TABLE[byte as usize]\n }\n\n for (byte, slots) in src.iter().zip(dst.chunks_mut(2)) {\n slots[0] = hex((*byte >> 4) & 0xf);\n slots[1] = hex(*byte & 0xf);\n }\n}\n```\n"include_str!("../../stdarch/crates/core_arch/src/core_arch_docs.md")]
23#[allow(
4// some targets don't have anything to reexport, which
5 // makes the `pub use` unused and unreachable, allow
6 // both lints as to not have `#[cfg]`s
7 //
8 // cf. https://github.com/rust-lang/rust/pull/116033#issuecomment-1760085575
9unused_imports,
10 unreachable_pub
11)]
12#[stable(feature = "simd_arch", since = "1.27.0")]
13pub use crate::core_arch::arch::*;
1415/// Inline assembly.
16///
17/// Refer to [Rust By Example] for a usage guide and the [reference] for
18/// detailed information about the syntax and available options.
19///
20/// [Rust By Example]: https://doc.rust-lang.org/nightly/rust-by-example/unsafe/asm.html
21/// [reference]: https://doc.rust-lang.org/nightly/reference/inline-assembly.html
22#[stable(feature = "asm", since = "1.59.0")]
23#[rustc_builtin_macro]
24pub macro asm("assembly template", $(operands,)* $(options($(option),*))?) {
25/* compiler built-in */
26}
2728/// Inline assembly used in combination with `#[naked]` functions.
29///
30/// Refer to [Rust By Example] for a usage guide and the [reference] for
31/// detailed information about the syntax and available options.
32///
33/// [Rust By Example]: https://doc.rust-lang.org/nightly/rust-by-example/unsafe/asm.html
34/// [reference]: https://doc.rust-lang.org/nightly/reference/inline-assembly.html
35#[stable(feature = "naked_functions", since = "1.88.0")]
36#[rustc_builtin_macro]
37pub macro naked_asm("assembly template", $(operands,)* $(options($(option),*))?) {
38/* compiler built-in */
39}
4041/// Module-level inline assembly.
42///
43/// Refer to [Rust By Example] for a usage guide and the [reference] for
44/// detailed information about the syntax and available options.
45///
46/// [Rust By Example]: https://doc.rust-lang.org/nightly/rust-by-example/unsafe/asm.html
47/// [reference]: https://doc.rust-lang.org/nightly/reference/inline-assembly.html
48#[stable(feature = "global_asm", since = "1.59.0")]
49#[rustc_builtin_macro]
50pub macro global_asm("assembly template", $(operands,)* $(options($(option),*))?) {
51/* compiler built-in */
52}
5354/// Compiles to a target-specific software breakpoint instruction or equivalent.
55///
56/// This will typically abort the program. It may result in a core dump, and/or the system logging
57/// debug information. Additional target-specific capabilities may be possible depending on
58/// debuggers or other tooling; in particular, a debugger may be able to resume execution.
59///
60/// If possible, this will produce an instruction sequence that allows a debugger to resume *after*
61/// the breakpoint, rather than resuming *at* the breakpoint; however, the exact behavior is
62/// target-specific and debugger-specific, and not guaranteed.
63///
64/// If the target platform does not have any kind of debug breakpoint instruction, this may compile
65/// to a trapping instruction (e.g. an undefined instruction) instead, or to some other form of
66/// target-specific abort that may or may not support convenient resumption.
67///
68/// The precise behavior and the precise instruction generated are not guaranteed, except that in
69/// normal execution with no debug tooling involved this will not continue executing.
70///
71/// - On x86 targets, this produces an `int3` instruction.
72/// - On aarch64 targets, this produces a `brk #0xf000` instruction.
73// When stabilizing this, update the comment on `core::intrinsics::breakpoint`.
74#[unstable(feature = "breakpoint", issue = "133724")]
75#[inline(always)]
76pub fn breakpoint() {
77 core::intrinsics::breakpoint();
78}