Fault-tolerant design
In engineering, fault-tolerant design, also known as fail-safe design, is a design that enables a system to continue operating, possibly at a reduced level (known as graceful degradation), rather than failing completely when some part of the system fails. The term is most commonly used to describe computer-based systems designed to remain more or less fully operational, perhaps with a reduction in throughput or an increase in response time, in the event of a partial failure. That is, the system as a whole is not stopped by problems in either the hardware or the software. An example from another field is a motor vehicle designed so that it remains drivable if one of its tires is punctured. Similarly, a fault-tolerant structure retains its integrity in the presence of damage due to causes such as fatigue, corrosion, manufacturing flaws, or impact.
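As a rough illustration of graceful degradation in software, the sketch below tries a full-service path first and falls back to a reduced one when it fails. The function names (`run_with_degradation`, `live_lookup`, `cached_lookup`) and the simulated failure are hypothetical, chosen only for this example.

```python
from typing import Callable, Sequence, Tuple


def run_with_degradation(
    providers: Sequence[Tuple[str, Callable[[], str]]],
) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, result).

    The first provider is the full-service path; later ones are
    progressively reduced fallbacks. Only if every provider fails
    does the system as a whole fail.
    """
    errors = []
    for name, provider in providers:
        try:
            return name, provider()
        except Exception as exc:  # a partial failure: record it and fall through
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def live_lookup() -> str:
    # Hypothetical full-service path; here it simulates a failed component.
    raise ConnectionError("primary database unreachable")


def cached_lookup() -> str:
    # Hypothetical reduced-service path: possibly stale, but still usable.
    return "cached answer"


level, answer = run_with_degradation(
    [("full", live_lookup), ("degraded", cached_lookup)]
)
print(level, answer)
```

The caller receives the service level along with the result, so it can tell that the answer came from the reduced path rather than mistaking it for full service.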
Components
If each component, in turn, can continue to function when one of its subcomponents fails, the total system can continue to operate as well. Using a passenger vehicle as an example, a car can have "run-flat" tires, each of which contains a solid rubber core, allowing it to be used even if it is punctured. A punctured "run-flat" tire may be used for a limited time at a reduced speed.
Redundancy
Redundancy means having backup components that automatically "kick in" should one component fail. For example, large cargo trucks can lose a tire without any major consequences. They have so many tires that no single tire is critical (with the exception of the front tires, which are used to steer).
When to use
Providing fault-tolerant design for every component is normally not an option. In such cases, the following criteria may be used to determine which components should be fault-tolerant: how critical the component is, how likely it is to fail, and how expensive it is to make it fault-tolerant.
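The three tests (criticality, likelihood of failure, cost of redundancy) can be sketched as a simple checklist. This is only an illustration: the `Component` fields, the thresholds, and every number below are assumptions made up for the example, not values from the article.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    is_critical: bool           # test 1: does safe operation depend on it?
    failure_probability: float  # test 2: estimated chance of failure (0 to 1)
    redundancy_cost: float      # test 3: cost of adding a backup


def should_be_fault_tolerant(
    component: Component,
    min_failure_probability: float = 0.01,  # hypothetical threshold
    budget: float = 100.0,                  # hypothetical cost limit
) -> bool:
    """Return True only if the component passes all three tests."""
    return (
        component.is_critical
        and component.failure_probability >= min_failure_probability
        and component.redundancy_cost <= budget
    )


# Illustrative figures only.
seat_belts = Component("occupant restraint", True, 0.05, 20.0)
radio = Component("radio", False, 0.05, 20.0)

print(should_be_fault_tolerant(seat_belts))
print(should_be_fault_tolerant(radio))
```

A real assessment would weigh the three factors against each other rather than apply fixed cut-offs, which is why a more expensive measure can still pass "by a smaller margin".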
An example of a component that passes all the tests is a car's occupant restraint system. We do not normally think of it this way, but the primary occupant restraint system is gravity. If the vehicle rolls over or undergoes severe g-forces, this primary method of occupant restraint may fail. Restraining the occupants during such an accident is absolutely critical to safety, so the system passes the first test. Accidents causing occupant ejection were quite common before seat belts, so it passes the second test. The cost of a redundant restraint method like seat belts is quite low, both economically and in terms of weight and space, so it passes the third test. Therefore, adding seat belts to all vehicles is an excellent idea. Other "supplemental restraint systems", such as airbags, are more expensive and so pass that test by a smaller margin. This is why inexpensive vehicles typically have fewer airbags than expensive vehicles.
Examples
Hardware fault tolerance sometimes requires that broken parts can be swapped out for new ones while the system is still operational (in computing, this is known as hot swapping). Such a system implemented with a single backup is known as single point tolerant, and represents the vast majority of fault-tolerant systems. In such systems the mean time between failures should be long enough for the operators to fix the broken device before the backup also fails. It helps if the time between failures is as long as possible, but this is not specifically required in a fault-tolerant system. Fault tolerance is notably successful in computer applications. Tandem Computers built their entire business on such machines, which used single point tolerance to create their NonStop systems with uptimes measured in years. Fail-safe architectures may also encompass the computer software, for example by process replication.
Disadvantages
Fault-tolerant design's advantages are obvious, while many of its disadvantages are not:
Related terms
There is a difference between fault tolerance and systems that rarely have problems. For instance, the Western Electric crossbar systems had failure rates of two hours per forty years, and were therefore highly fault resistant. But when a fault did occur they still stopped operating completely, and therefore were not fault-tolerant.
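To put the crossbar figure in perspective, the short calculation below converts it to an availability fraction. It assumes that "two hours per forty years" means two hours of total downtime over a forty-year period, which is an interpretation rather than something the text states outright; the `availability` helper is hypothetical.

```python
HOURS_PER_YEAR = 365.25 * 24  # average year length, including leap years


def availability(downtime_hours: float, period_years: float) -> float:
    """Fraction of the period during which the system was operating."""
    period_hours = period_years * HOURS_PER_YEAR
    return 1.0 - downtime_hours / period_hours


crossbar = availability(downtime_hours=2, period_years=40)
print(f"{crossbar:.5%}")
```

Under that assumption the result is above 99.999 percent. Even so, the number describes only how rarely faults occur, not what happens when one does, which is the distinction between fault resistance and fault tolerance.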
Source: Wikipedia | The above article is available under the GNU FDL.