Minifloat

In computing, minifloats are floating-point values represented with very few bits. This reduced precision makes them ill-suited for general-purpose numerical calculations, but they are useful for special purposes such as:

Computer graphics, where iterations are small and precision has aesthetic effects.^[1]
Machine learning, which can be relatively insensitive to numeric precision. bfloat16 and fp8 are common formats.^[2]

Additionally, they are frequently encountered as a pedagogical tool in computer-science courses to demonstrate the properties and structures of floating-point arithmetic and IEEE 754 numbers.

Minifloats with 16 bits are half-precision numbers (as opposed to single and double precision). There are also minifloats with 8 bits or even fewer.^[2]
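As a small illustration of the precision lost at 16 bits, the following sketch round-trips a value through half precision using the `'e'` format code of Python's standard `struct` module, which packs a number as IEEE 754 binary16. The helper names `to_half_bits` and `through_half` are hypothetical and chosen only for this example.

```python
import struct

def to_half_bits(x):
    """Return the 16-bit pattern of x rounded to half precision."""
    return int.from_bytes(struct.pack("<e", x), "little")

def through_half(x):
    """Round x to half precision and convert it back to a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(hex(to_half_bits(1.0)))   # 0x3c00
print(through_half(1.0))        # 1.0, exactly representable
print(through_half(0.1))        # 0.0999755859375, nearest half-precision value
```

Values such as 1.0 survive unchanged, while 0.1 is rounded to the nearest number representable with a 10-bit stored significand.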

Minifloats can be designed following the principles of the IEEE 754 standard. In this case they must obey the (not explicitly written) rules for the boundary between subnormal and normal numbers and must have special bit patterns for infinity and NaN. Normalized numbers are stored with a biased exponent. The 2008 revision of the standard, IEEE 754-2008, includes a 16-bit binary format (binary16).
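The rules above can be sketched as a small decoder. The layout used here (1 sign bit, 4 exponent bits, 3 significand bits, bias 7) is an assumption made for illustration, not a layout prescribed by the text, and the function name `decode_minifloat` is likewise hypothetical. It follows IEEE 754 conventions: an all-ones exponent field encodes infinity or NaN, an all-zeros exponent field encodes zero and subnormals, and every other exponent field is biased and carries an implicit leading 1.

```python
import math

def decode_minifloat(bits, exp_bits=4, man_bits=3, bias=None):
    """Decode an IEEE 754-style minifloat given as an unsigned integer.

    The default 1-4-3 layout and the default bias are illustrative assumptions.
    """
    if bias is None:
        bias = (1 << (exp_bits - 1)) - 1          # 7 for a 4-bit exponent

    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    fraction = mantissa / (1 << man_bits)

    if exponent == (1 << exp_bits) - 1:           # all ones: infinity or NaN
        return sign * math.inf if mantissa == 0 else math.nan
    if exponent == 0:                             # zero and subnormals: no implicit 1
        return sign * fraction * 2.0 ** (1 - bias)
    return sign * (1.0 + fraction) * 2.0 ** (exponent - bias)

print(decode_minifloat(0b0_0111_000))  # 1.0
print(decode_minifloat(0b0_0000_111))  # 0.013671875, largest subnormal
print(decode_minifloat(0b0_0001_000))  # 0.015625, smallest normal
print(decode_minifloat(0b0_1110_111))  # 240.0, largest finite value
print(decode_minifloat(0b0_1111_000))  # inf
```

Under these assumptions the step from the largest subnormal (7/512) to the smallest normal (8/512) equals the subnormal spacing of 1/512, which is the continuity that the subnormal/normal boundary rule is meant to guarantee.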

[1] Mocerino, Luca; Calimera, Andrea (24 November 2021). "AxP: A HW-SW Co-Design Pipeline for Energy-Efficient Approximated ConvNets via Associative Matching". Applied Sciences. 11 (23): 11164. doi:10.3390/app112311164.
[2] https://developer.nvidia.com/blog/nvidia-arm-and-intel-publish-fp8-specification-for-standardization-as-an-interchange-format-for-ai/ (joint announcement by Intel, NVIDIA, and Arm); https://arxiv.org/abs/2209.05433 (preprint jointly written by researchers from the same three companies)

