SSE4

SSE4 (Streaming SIMD Extensions 4) is a SIMD CPU instruction set used in the Intel Core microarchitecture and AMD K10 (K8L). It was announced on September 27, 2006, at the Fall 2006 Intel Developer Forum, with vague details in a white paper;[1] more precise details of 47 instructions became available at the Spring 2007 Intel Developer Forum in Beijing, in the presentation.[2] SSE4 extended the SSE3 instruction set which was released in early 2004. All software using previous Intel SIMD instructions (ex. SSE3) are compatible with modern microprocessors supporting SSE4 instructions. All existing software continues to run correctly without modification on microprocessors that incorporate SSE4, as well as in the presence of existing and new applications that incorporate SSE4.[3]

Like other previous generation CPU SIMD instruction sets, SSE4 supports up to 16 registers, each 128-bits wide which can load four 32-bit integers, four 32-bit single precision floating point numbers, or two 64-bit double precision floating point numbers.[1] SIMD operations, such as vector element-wise addition/multiplication and vector scalar addition/multiplication, process multiple bytes of data in a single CPU instruction. The parallel operation packs noticeable increases in performance. SSE4.2 introduced new SIMD string operations, including an instruction to compare two string fragments of up to 16 bytes each.[1] SSE4.2 is a subset of SSE4 and it was released a few years after the initial release of SSE4.

  1. ^ a b c Intel Streaming SIMD Extensions 4 (SSE4) Instruction Set Innovation Archived May 30, 2009, at the Wayback Machine, Intel.
  2. ^ Tuning for Intel SSE4 for the 45nm Next Generation Intel Core Microarchitecture Archived March 8, 2021, at the Wayback Machine, Intel.
  3. ^ "Intel SSE4 Programming Reference" (PDF). Archived (PDF) from the original on February 15, 2020. Retrieved December 26, 2014.