This article needs additional citations for verification. (November 2011) |
In electronics and computing, a soft error is a type of error where a signal or datum is wrong. Errors may be caused by a defect, usually understood either to be a mistake in design or construction, or a broken component. A soft error is also a signal or datum which is wrong, but is not assumed to imply such a mistake or breakage. After observing a soft error, there is no implication that the system is any less reliable than before. One cause of soft errors is single event upsets from cosmic rays.
In a computer's memory system, a soft error changes an instruction in a program or a data value. Soft errors typically can be remedied by cold booting the computer. A soft error will not damage a system's hardware; the only damage is to the data that is being processed.
There are two types of soft errors, chip-level soft error and system-level soft error. Chip-level soft errors occur when particles hit the chip, e.g., when secondary particles from cosmic rays land on the silicon die. If a particle with certain properties hits a memory cell it can cause the cell to change state to a different value. The atomic reaction in this example is so tiny that it does not damage the physical structure of the chip. System-level soft errors occur when the data being processed is hit with a noise phenomenon, typically when the data is on a data bus. The computer tries to interpret the noise as a data bit, which can cause errors in addressing or processing program code. The bad data bit can even be saved in memory and cause problems at a later time.
If detected, a soft error may be corrected by rewriting correct data in place of erroneous data. Highly reliable systems use error correction to correct soft errors on the fly. However, in many systems, it may be impossible to determine the correct data, or even to discover that an error is present at all. In addition, before the correction can occur, the system may have crashed, in which case the recovery procedure must include a reboot. Soft errors involve changes to data—the electrons in a storage circuit, for example—but not changes to the physical circuit itself, the atoms. If the data is rewritten, the circuit will work perfectly again. Soft errors can occur on transmission lines, in digital logic, analog circuits, magnetic storage, and elsewhere, but are most commonly known in semiconductor storage.