math: New generic fma implementation

The current implementation obtains correctly rounded results by switching
the rounding mode between calculations (first to FE_TONEAREST and then to
FE_TOWARDZERO).  On most CPUs this adds significant performance overhead:
getting and setting the floating-point status typically requires a slow
instruction, forces a pipeline flush, and breaks some compiler
assumptions and optimizations.

This patch introduces a new implementation, originally written by Szabolcs
Nagy for musl, that uses mostly integer arithmetic.  Floating-point
arithmetic is used to raise the expected exceptions, without the need for
fenv.h operations.
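
As a rough illustration of what "mostly integer arithmetic" can look like,
the sketch below splits a finite double into sign, integer significand, and
exponent using only integer operations.  It is not the code from this patch
or from musl: `struct parts` and `split_double` are hypothetical names, and
infinities and NaNs are assumed to be filtered out by the caller.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: value == (-1)^sign * mant * 2^exp.  */
struct parts
{
  uint64_t mant;
  int exp;
  int sign;
};

/* Split a finite binary64 value into integer fields.  Assumes the
   caller has already handled infinities and NaNs.  */
static struct parts
split_double (double x)
{
  uint64_t bits;
  struct parts p;

  /* Reinterpret the bit pattern without violating aliasing rules.  */
  memcpy (&bits, &x, sizeof bits);

  int biased = (int) ((bits >> 52) & 0x7ff);
  uint64_t frac = bits & ((UINT64_C (1) << 52) - 1);

  p.sign = (int) (bits >> 63);
  if (biased == 0)
    {
      /* Zero or subnormal: no implicit leading bit.  */
      p.mant = frac;
      p.exp = 1 - 1023 - 52;
    }
  else
    {
      /* Normal number: restore the implicit leading bit.  */
      p.mant = frac | (UINT64_C (1) << 52);
      p.exp = biased - 1023 - 52;
    }
  return p;
}
```

With the operands in this form, the product and sum can be formed on the
integer significands, leaving a final floating-point step to deliver the
result and the exception flags.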

I added some changes compared to the original code:

  * Fixed some signaling NaN issues when the third argument is NaN.

  * Use math_uint128.h for the 64-bit multiplication operation.  It allows
    the compiler to use 128-bit types where available, which enables some
    optimizations on certain targets (for instance, MIPS64).

  * Fixed an arm32 issue where the libgcc routine might not respect the
    rounding mode [1].  This can also be used on other targets to optimize
    the conversion from int64_t to double.

  * Use -fexcess-precision=standard on i686.
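
The 64-bit multiplication mentioned in the list above needs the full 128-bit
product of two 64-bit values.  The sketch below shows the general idea under
stated assumptions: it uses the compiler's `unsigned __int128` when
`__SIZEOF_INT128__` is defined and otherwise falls back to four 32x32
partial products.  The names `mul64_wide` and `mul64_portable` are
hypothetical and do not reflect the actual interface of math_uint128.h.

```c
#include <stdint.h>

/* Portable 64x64 -> 128 multiply built from 32-bit halves.  */
static void
mul64_portable (uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
  uint64_t a_lo = (uint32_t) a, a_hi = a >> 32;
  uint64_t b_lo = (uint32_t) b, b_hi = b >> 32;

  uint64_t p0 = a_lo * b_lo;
  uint64_t p1 = a_lo * b_hi;
  uint64_t p2 = a_hi * b_lo;
  uint64_t p3 = a_hi * b_hi;

  /* Terms that land in bits 32..63 of the product.  The sum of three
     32-bit values is below 2^34, so it cannot overflow.  */
  uint64_t mid = (p0 >> 32) + (uint32_t) p1 + (uint32_t) p2;

  *lo = (mid << 32) | (uint32_t) p0;
  *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* Use a native 128-bit type where the compiler provides one.  */
static void
mul64_wide (uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
#ifdef __SIZEOF_INT128__
  unsigned __int128 p = (unsigned __int128) a * b;
  *hi = (uint64_t) (p >> 64);
  *lo = (uint64_t) p;
#else
  mul64_portable (a, b, hi, lo);
#endif
}
```

Keeping the choice behind one helper lets the compiler emit a single
widening multiply on targets that have one, while other targets still get
a correct result from the 32-bit path.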

I tested this implementation on various targets (x86_64, i686, arm, aarch64,
powerpc), including some by manually disabling the compiler instructions.

Performance-wise, it shows large improvements:

reciprocal-throughput       master       patched       improvement
x86_64 [2]                289.4640       22.4396            12.90x
i686 [2]                  636.8660      169.3640             3.76x
aarch64 [3]                46.0020       11.3281             4.06x
armhf [3]                   63.989       26.5056             2.41x
powerpc [4]                23.9332       6.40205             3.74x

latency                     master       patched       improvement
x86_64                    293.7360       38.1478             7.70x
i686                      658.4160      187.9940             3.50x
aarch64                    44.5166       14.7157             3.03x
armhf                      63.7678       28.4116             2.24x
powerpc                    23.8561       11.4250             2.09x

Checked on x86_64-linux-gnu and i686-linux-gnu with --disable-multi-arch,
and on arm-linux-gnueabihf.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970
[2] gcc 15.2.1, Zen3
[3] gcc 15.2.1, Neoverse N1
[4] gcc 15.2.1, POWER10

Signed-off-by: Szabolcs Nagy <nsz@gcc.gnu.org>
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

author	Adhemerval Zanella <adhemerval.zanella@linaro.org>
	Thu, 13 Nov 2025 12:58:20 +0000 (09:58 -0300)
committer	Adhemerval Zanella <adhemerval.zanella@linaro.org>
	Wed, 26 Nov 2025 13:10:06 +0000 (10:10 -0300)
commit	bf211c34993921eccbc074f82cfbb8e9a16d850c
tree	a3001196793825d55e5ec28961d57aadaeacbe88
parent	5dab2a31954b0e0ff220cb28fa2f3fc79b8781df

sysdeps/arm/fpu/math_private.h	[new file with mode: 0644]
sysdeps/i386/Makefile
sysdeps/ieee754/dbl-64/math_config.h
sysdeps/ieee754/dbl-64/s_fma.c