Template:Floating-point/doc
This is a documentation subpage for Template:Floating-point. It may contain usage information, categories and other content that is not part of the original template page.
Floating-point formats
IEEE 754:
  16-bit: Half (binary16)
  32-bit: Single (binary32), decimal32
  64-bit: Double (binary64), decimal64
  128-bit: Quadruple (binary128), decimal128
  256-bit: Octuple (binary256)
  Extended precision
Other: Minifloat, bfloat16, TensorFloat-32, Microsoft Binary Format, IBM floating-point architecture, PMBus Linear-11, G.711 8-bit floats
Alternatives: Arbitrary precision, Tapered floating point, Posit