core/num/imp/dec2flt/mod.rs
1//! Converting decimal strings into IEEE 754 binary floating point numbers.
2//!
3//! # Problem statement
4//!
5//! We are given a decimal string such as `12.34e56`. This string consists of integral (`12`),
6//! fractional (`34`), and exponent (`56`) parts. All parts are optional and interpreted as a
7//! default value (1 or 0) when missing.
8//!
9//! We seek the IEEE 754 floating point number that is closest to the exact value of the decimal
10//! string. It is well-known that many decimal strings do not have terminating representations in
11//! base two, so we round to 0.5 units in the last place (in other words, as well as possible).
12//! Ties, decimal values exactly half-way between two consecutive floats, are resolved with the
13//! half-to-even strategy, also known as banker's rounding.
14//!
15//! Needless to say, this is quite hard, both in terms of implementation complexity and in terms
16//! of CPU cycles taken.
17//!
18//! # Implementation
19//!
20//! First, we ignore signs. Or rather, we remove it at the very beginning of the conversion
21//! process and re-apply it at the very end. This is correct in all edge cases since IEEE
22//! floats are symmetric around zero, negating one simply flips the first bit.
23//!
24//! Then we remove the decimal point by adjusting the exponent: Conceptually, `12.34e56` turns
25//! into `1234e54`, which we describe with a positive integer `f = 1234` and an integer `e = 54`.
26//! The `(f, e)` representation is used by almost all code past the parsing stage.
27//!
28//! We then try a long chain of progressively more general and expensive special cases using
29//! machine-sized integers and small, fixed-sized floating point numbers (first `f32`/`f64`, then
30//! a type with 64 bit significand). The extended-precision algorithm
31//! uses the Eisel-Lemire algorithm, which uses a 128-bit (or 192-bit)
32//! representation that can accurately and quickly compute the vast majority
33//! of floats. When all these fail, we bite the bullet and resort to using
34//! a large-decimal representation, shifting the digits into range, calculating
35//! the upper significant bits and exactly round to the nearest representation.
36//!
37//! Another aspect that needs attention is the ``RawFloat`` trait by which almost all functions
38//! are parametrized. One might think that it's enough to parse to `f64` and cast the result to
39//! `f32`. Unfortunately this is not the world we live in, and this has nothing to do with using
40//! base two or half-to-even rounding.
41//!
42//! Consider for example two types `d2` and `d4` representing a decimal type with two decimal
43//! digits and four decimal digits each and take "0.01499" as input. Let's use half-up rounding.
44//! Going directly to two decimal digits gives `0.01`, but if we round to four digits first,
45//! we get `0.0150`, which is then rounded up to `0.02`. The same principle applies to other
46//! operations as well, if you want 0.5 ULP accuracy you need to do *everything* in full precision
47//! and round *exactly once, at the end*, by considering all truncated bits at once.
48//!
49//! Primarily, this module and its children implement the algorithms described in:
50//! "Number Parsing at a Gigabyte per Second", available online:
51//! <https://arxiv.org/abs/2101.11408>.
52//!
53//! # Other
54//!
55//! The conversion should *never* panic. There are assertions and explicit panics in the code,
56//! but they should never be triggered and only serve as internal sanity checks. Any panics should
57//! be considered a bug.
58//!
59//! There are unit tests but they are woefully inadequate at ensuring correctness, they only cover
60//! a small percentage of possible errors. Far more extensive tests are located in the directory
61//! `src/tools/test-float-parse` as a Rust program.
62//!
63//! A note on integer overflow: Many parts of this file perform arithmetic with the decimal
64//! exponent `e`. Primarily, we shift the decimal point around: Before the first decimal digit,
65//! after the last decimal digit, and so on. This could overflow if done carelessly. We rely on
66//! the parsing submodule to only hand out sufficiently small exponents, where "sufficient" means
67//! "such that the exponent +/- the number of decimal digits fits into a 64 bit integer".
68//! Larger exponents are accepted, but we don't do arithmetic with them, they are immediately
69//! turned into {positive,negative} {zero,infinity}.
70//!
71//! # Notation
72//!
73//! This module uses the same notation as the Lemire paper:
74//!
75//! - `m`: binary mantissa; always nonnegative
76//! - `p`: binary exponent; a signed integer
77//! - `w`: decimal significand; always nonnegative
78//! - `q`: decimal exponent; a signed integer
79//!
80//! This gives `m * 2^p` for the binary floating-point number, with `w * 10^q` as the decimal
81//! equivalent.
82
83#![doc(hidden)]
84#![unstable(
85 feature = "dec2flt",
86 reason = "internal routines only exposed for testing",
87 issue = "none"
88)]
89
90use common::BiasedFp;
91use lemire::compute_float;
92use parse::{parse_inf_nan, parse_number};
93use slow::parse_long_mantissa;
94
95use crate::f64;
96use crate::num::ParseFloatError;
97use crate::num::float_parse::FloatErrorKind;
98use crate::num::imp::FloatExt;
99
100mod common;
101pub mod decimal;
102pub mod decimal_seq;
103mod fpu;
104pub mod lemire;
105pub mod parse;
106mod slow;
107mod table;
108
109/// Extension to `Float` that are necessary for parsing using the Lemire method.
110///
111/// See the parent module's doc comment for why this is necessary.
112///
113/// Not intended for use outside of the `dec2flt` module.
114#[doc(hidden)]
115pub trait Lemire: FloatExt {
116 /// Maximum exponent for a fast path case, or `⌊(SIG_BITS+1)/log2(5)⌋`
117 // assuming FLT_EVAL_METHOD = 0
118 const MAX_EXPONENT_FAST_PATH: i64 = {
119 let log2_5 = f64::consts::LOG2_10 - 1.0;
120 (Self::SIG_TOTAL_BITS as f64 / log2_5) as i64
121 };
122
123 /// Minimum exponent for a fast path case, or `-⌊(SIG_BITS+1)/log2(5)⌋`
124 const MIN_EXPONENT_FAST_PATH: i64 = -Self::MAX_EXPONENT_FAST_PATH;
125
126 /// Maximum exponent that can be represented for a disguised-fast path case.
127 /// This is `MAX_EXPONENT_FAST_PATH + ⌊(SIG_BITS+1)/log2(10)⌋`
128 const MAX_EXPONENT_DISGUISED_FAST_PATH: i64 =
129 Self::MAX_EXPONENT_FAST_PATH + (Self::SIG_TOTAL_BITS as f64 / f64::consts::LOG2_10) as i64;
130
131 /// Maximum mantissa for the fast-path (`1 << 53` for f64).
132 const MAX_MANTISSA_FAST_PATH: u64 = 1 << Self::SIG_TOTAL_BITS;
133
134 /// Gets a small power-of-ten for fast-path multiplication.
135 fn pow10_fast_path(exponent: usize) -> Self;
136
137 /// Converts integer into float through an as cast.
138 /// This is only called in the fast-path algorithm, and therefore
139 /// will not lose precision, since the value will always have
140 /// only if the value is <= Self::MAX_MANTISSA_FAST_PATH.
141 fn from_u64(v: u64) -> Self;
142}
143
144#[cfg(target_has_reliable_f16)]
145impl Lemire for f16 {
146 fn pow10_fast_path(exponent: usize) -> Self {
147 #[allow(clippy::use_self)]
148 const TABLE: [f16; 8] = [1e0, 1e1, 1e2, 1e3, 1e4, 0.0, 0.0, 0.];
149 TABLE[exponent & 7]
150 }
151
152 #[inline]
153 fn from_u64(v: u64) -> Self {
154 debug_assert!(v <= Self::MAX_MANTISSA_FAST_PATH);
155 v as _
156 }
157}
158
159impl Lemire for f32 {
160 fn pow10_fast_path(exponent: usize) -> Self {
161 #[allow(clippy::use_self)]
162 const TABLE: [f32; 16] =
163 [1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 0., 0., 0., 0., 0.];
164 TABLE[exponent & 15]
165 }
166
167 #[inline]
168 fn from_u64(v: u64) -> Self {
169 debug_assert!(v <= Self::MAX_MANTISSA_FAST_PATH);
170 v as _
171 }
172}
173
174impl Lemire for f64 {
175 fn pow10_fast_path(exponent: usize) -> Self {
176 const TABLE: [f64; 32] = [
177 1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
178 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22, 0., 0., 0., 0., 0., 0., 0., 0., 0.,
179 ];
180 TABLE[exponent & 31]
181 }
182
183 #[inline]
184 fn from_u64(v: u64) -> Self {
185 debug_assert!(v <= Self::MAX_MANTISSA_FAST_PATH);
186 v as _
187 }
188}
189
190#[inline]
191pub(super) fn pfe_empty() -> ParseFloatError {
192 ParseFloatError { kind: FloatErrorKind::Empty }
193}
194
195// Used in unit tests, keep public.
196// This is much better than making FloatErrorKind and ParseFloatError::kind public.
197#[inline]
198pub fn pfe_invalid() -> ParseFloatError {
199 ParseFloatError { kind: FloatErrorKind::Invalid }
200}
201
202/// Converts a `BiasedFp` to the closest machine float type.
203fn biased_fp_to_float<F: FloatExt>(x: BiasedFp) -> F {
204 let mut word = x.m;
205 word |= (x.p_biased as u64) << F::SIG_BITS;
206 F::from_u64_bits(word)
207}
208
209/// Converts a decimal string into a floating point number.
210#[inline(always)] // Will be inlined into a function with `#[inline(never)]`, see above
211pub fn dec2flt<F: Lemire>(s: &str) -> Result<F, ParseFloatError> {
212 let mut s = s.as_bytes();
213 let Some(&c) = s.first() else { return Err(pfe_empty()) };
214 let negative = c == b'-';
215 if c == b'-' || c == b'+' {
216 s = &s[1..];
217 }
218 if s.is_empty() {
219 return Err(pfe_invalid());
220 }
221
222 let mut num = match parse_number(s) {
223 Some(r) => r,
224 None if let Some(value) = parse_inf_nan(s, negative) => return Ok(value),
225 None => return Err(pfe_invalid()),
226 };
227 num.negative = negative;
228 if !cfg!(feature = "optimize_for_size") {
229 if let Some(value) = num.try_fast_path::<F>() {
230 return Ok(value);
231 }
232 }
233
234 // If significant digits were truncated, then we can have rounding error
235 // only if `mantissa + 1` produces a different result. We also avoid
236 // redundantly using the Eisel-Lemire algorithm if it was unable to
237 // correctly round on the first pass.
238 let mut fp = compute_float::<F>(num.exponent, num.mantissa);
239 if num.many_digits
240 && fp.p_biased >= 0
241 && fp != compute_float::<F>(num.exponent, num.mantissa + 1)
242 {
243 fp.p_biased = -1;
244 }
245 // Unable to correctly round the float using the Eisel-Lemire algorithm.
246 // Fallback to a slower, but always correct algorithm.
247 if fp.p_biased < 0 {
248 fp = parse_long_mantissa::<F>(s);
249 }
250
251 let mut float = biased_fp_to_float::<F>(fp);
252 if num.negative {
253 float = -float;
254 }
255 Ok(float)
256}