|
|
S.M.A.R.T. basics
S.M.A.R.T. stands for "Self-Monitoring, Analysis and Reporting Technology". It is a standard
interface protocol and a set of disk features that allow a disk to
check its own status and report it to a host system. S.M.A.R.T. information consists
of "attributes", each describing one particular aspect of the drive's
condition. Some attributes may be designated "life-critical", which
means that the corresponding parameters matter more than the others.
Three values are associated with each S.M.A.R.T. attribute:
- "Normalized value", commonly referred to as just "value".
This is the most universal measurement, on a scale from 0 (bad) to some
maximum (good) value. Maximum values are typically 100, 200 or 253. The rule of
thumb is: high values are good, low values are bad.
- "Threshold" - the minimum normalized value limit for the
attribute. If the normalized value falls below the threshold, the disk is
considered defective and should be replaced under warranty. This situation
is called "T.E.C." (Threshold Exceeded Condition).
- "Raw value" - the value of the attribute as it is tracked by the
device, before any normalization takes place. Some raw numbers provide
valuable insight when properly interpreted. These cases will be discussed
later on. Raw values are typically listed as hexadecimal numbers.
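The relationship between the three values can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SmartAttribute` class, its field names and the sample numbers are hypothetical and do not come from any particular tool. The check follows the wording above (a value that falls below the threshold); some tools also treat a value equal to the threshold as failed.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One S.M.A.R.T. attribute (hypothetical container for illustration)."""
    name: str
    value: int       # normalized value, higher is better
    threshold: int   # minimum acceptable normalized value
    raw_hex: str     # raw value as a hexadecimal string

    @property
    def raw(self) -> int:
        """Raw value converted from hexadecimal to an integer."""
        return int(self.raw_hex, 16)

    def is_tec(self) -> bool:
        """True when the normalized value has fallen below the threshold."""
        return self.value < self.threshold


# Made-up sample readings, not taken from a real drive.
attrs = [
    SmartAttribute("Reallocated sectors count", 100, 36, "00000000002A"),
    SmartAttribute("Spin retry count", 90, 97, "000000000005"),
]
for a in attrs:
    status = "T.E.C." if a.is_tec() else "OK"
    print(f"{a.name}: value={a.value} threshold={a.threshold} raw={a.raw} -> {status}")
```

In this sample the first attribute is healthy (100 is above 36) while the second is in T.E.C. (90 is below 97).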
Most common S.M.A.R.T. attributes reference
Note that not all of the attributes are present on all drives. Some
attributes are of similar meaning (just counted differently), so only one of
them will normally be monitored by the drive. Some require special sensors (e.g.
temperature or G-loads monitoring). The decision about which attributes should
be implemented is up to the drive vendor. Likewise, the interpretation of raw values depends
heavily on the manufacturer.
| Critical device status attributes |
| Reallocated sectors count |
Indicates how many defective sectors were discovered on the drive
and remapped to sectors from the spare pool. A low value, in the absence of
other fault indications, points to a disk surface problem. The raw value
indicates the exact number of such sectors. |
| Current pending sectors count |
Indicates how many suspected defective sectors are pending
"investigation". These will not necessarily be remapped. In fact, such
sectors may not be defective at all (e.g. if some transient condition
prevented reading of the sector, it will be marked "pending"). They
will then be re-tested by the device's off-line scan [1]
procedure and, if found healthy, returned to the pool of serviceable sectors. The raw value
indicates the exact number of such sectors. |
| Off-line uncorrectable sectors count |
Similar to "Reallocated sectors count". Indicates how many defective
sectors were found during the off-line scan [1]. |
| Read error rate, Read error retry rate, Write error rate, Seek error rate |
These attributes log the rate at which the specified events (errors) occur.
A lower value indicates more events (errors). Retries do not necessarily
indicate a persistent problem, but one should proceed with caution if
any of these attributes is degraded. |
| Recalibration retries |
Indicates how often the drive is unable to recalibrate at the first
attempt. Raw value may show the exact number of recalibration events (at
least with some vendors) but this should be taken with a grain of salt. |
| Spin up time |
Low value indicates that a drive takes longer than expected to spin
up to its rated speed. Might indicate either a controller or a spindle
bearing problem. |
| Spin retry count |
A spin retry event is logged each time the drive fails to spin
its platters up to the rated rotation speed in due time, so that the spin-up
attempt is aborted and retried. This typically indicates a severe
controller or bearing problem, but may sometimes be caused by power
supply problems. |
| Drive lifetime information |
| Drive start/stop count, Power off/retract cycle count |
These two attributes provide an estimate of drive wear.
The vendor estimates the expected device lifetime and number of cycles,
and the normalized value of each attribute is then computed from this
estimate. A T.E.C. on one of these attributes does not
necessarily indicate a drive failure, but rather suggests that the drive
should be considered unreliable due to wear and tear. Raw values are
typically just the count of events. |
| Power on hours count, Head flying hours count |
Normalized values are computed similarly to the above.
Despite what the names suggest, the raw value of these attributes is stored
in a variety of measurement units (hours, half-hours, or ten-minute
intervals, to name a few) depending on the manufacturer of the device. |
| Operating conditions information |
| Temperature |
Indicates the device temperature, if the appropriate sensor is
fitted. The lowest byte of the raw value contains the exact temperature
(in degrees Celsius). |
| Ultra DMA CRC error rate |
A low value of this attribute typically indicates that something is
wrong with the connectors and/or cables. Disk-to-host transfers are
protected by a CRC error detection code when Ultra-DMA 66 or 100 is used, so if the
data gets garbled between the disk and the host machine, the receiving
controller detects this and initiates a retransmission. Such a
situation is called a "UDMA CRC error". Once the problem is rectified
(typically by replacing a cable), the attribute value returns to
normal levels fairly quickly. |
| G-sense error rate |
Indicates whether errors are occurring as a result of mechanical
shock to the drive (due either to environmental factors or to improper
installation). The hard drive must be fitted with the appropriate sensor
to get information about the G-loads. This attribute is mainly limited to
notebook (2.5") drives. Once the operating conditions are corrected,
the attribute value will return to normal. |
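Two of the raw-value interpretations described in the table (the temperature stored in the lowest byte, and the vendor-dependent units of the power-on count) can be sketched as follows. This is a minimal illustration, not a vendor-accurate decoder: the function names, the `MINUTES_PER_UNIT` table and the sample raw values are assumptions made for the example, and the correct unit for a given drive has to come from the manufacturer's documentation.

```python
def temperature_celsius(raw: int) -> int:
    """Return the lowest byte of the raw value, read as degrees Celsius."""
    return raw & 0xFF


# Hypothetical unit table: minutes represented by one raw count.
MINUTES_PER_UNIT = {"hours": 60, "half-hours": 30, "ten-minutes": 10}


def power_on_hours(raw: int, unit: str) -> float:
    """Convert a raw power-on count to hours, given the vendor's unit."""
    return raw * MINUTES_PER_UNIT[unit] / 60


raw_temp = int("002D0014001F", 16)   # made-up raw value; lowest byte is 0x1F
print(temperature_celsius(raw_temp))         # 31
print(power_on_hours(12000, "half-hours"))   # 6000.0
```

The same raw count of 12000 would mean 12000 hours on a drive that counts hours and 2000 hours on one that counts ten-minute intervals, which is why the unit must be known before the number is trusted.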
[1] Off-line scan procedure - When the device is
idle for some period of time, it may start various self-diagnostic procedures,
including simply wandering over the surface to read sectors here and there. You
may observe this by letting your system idle (provided that no background
process requests disk access) and listening to the hard drive. It will soon
start to operate "on its own" - this is an off-line scan under way. This is done
to detect any growing defects before they become critical.
|