Quadruple-precision floating-point format

In computing, quadruple precision (or quad precision) is a binary floating-point–based computer number format that occupies 16 bytes (128 bits) and offers at least twice the precision of 53-bit double precision.

This 128-bit quadruple precision is designed not only for applications requiring results in higher than double precision,[1] but also, as a primary function, to allow double-precision results to be computed more reliably and accurately by minimising overflow and round-off errors in intermediate calculations and scratch variables. William Kahan, primary architect of the original IEEE 754 floating-point standard, noted: "For now the 10-byte Extended format is a tolerable compromise between the value of extra-precise arithmetic and the price of implementing it to run fast; very soon two more bytes of precision will become tolerable, and ultimately a 16-byte format ... That kind of gradual evolution towards wider precision was already in view when IEEE Standard 754 for Floating-Point Arithmetic was framed."[2]
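
The benefit of wider intermediates for double-precision results can be illustrated with a minimal sketch. Python's standard library has no binary128 type, so the example below assumes a 34-significant-digit decimal context as a stand-in for quadruple-precision scratch variables (binary128 carries roughly 34 significant decimal digits). The function names hypot_naive and hypot_wide are hypothetical, chosen only for this illustration.

```python
import math
from decimal import Decimal, localcontext


def hypot_naive(x: float, y: float) -> float:
    # Every intermediate is a double: x * x overflows to infinity
    # once |x| exceeds roughly 1.3e154.
    return math.sqrt(x * x + y * y)


def hypot_wide(x: float, y: float) -> float:
    # Assumption: a 34-digit decimal context stands in for
    # quadruple-precision intermediates; it is not binary128 itself.
    with localcontext() as ctx:
        ctx.prec = 34
        dx, dy = Decimal(x), Decimal(y)  # exact conversion from double
        wide = (dx * dx + dy * dy).sqrt()
    return float(wide)  # round once, back to double precision


x, y = 3e200, 4e200
naive = hypot_naive(x, y)  # the squared terms overflow in double
wide = hypot_wide(x, y)    # the wider range keeps the squares finite
print(naive)
print(wide)
```

Both inputs and the true result are representable as doubles; only the squared intermediates exceed the double range, which is the situation the wider format is meant to handle.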

In IEEE 754-2008 the 128-bit base-2 format is officially referred to as binary128.

[1] Bailey, David H.; Borwein, Jonathan M. (July 6, 2009). "High-Precision Computation and Mathematical Physics" (PDF).
[2] Higham, Nicholas (2002). "Designing stable algorithms" in Accuracy and Stability of Numerical Algorithms (2nd ed.). SIAM. p. 43.
