S.M.A.R.T. basics

S.M.A.R.T. stands for "Self-Monitoring, Analysis and Reporting Technology". It is a standard interface protocol and a set of disk features that allow a disk to check its own status and report it to the host system. S.M.A.R.T. information consists of "attributes", each describing a particular aspect of the drive's condition. Some attributes may be designated "life-critical", which means that the corresponding parameters are more important than the others.

Three values are associated with each S.M.A.R.T. attribute:

  • "Normalized value", commonly referred to as just "value". This is the most universal measurement, on a scale from 0 (bad) to some maximum (good) value. Maximum values are typically 100, 200 or 253. The rule of thumb is: high values are good, low values are bad.
  • "Threshold" - the minimum acceptable normalized value for the attribute. If the normalized value falls below the threshold, the disk is considered defective and should be replaced under warranty. This situation is called "T.E.C." (Threshold Exceeded Condition).
  • "Raw value" - the value of the attribute as it is tracked by the device, before any normalization takes place. Some raw numbers provide valuable insight when properly interpreted; these cases are discussed later on. Raw values are typically displayed as hexadecimal numbers.
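
The relationship between the three values can be sketched in Python. This is only an illustration under stated assumptions, not the code of any particular tool: the `SmartAttribute` class, its field names, the sample numbers, and the 12-digit hexadecimal display width are all hypothetical. The T.E.C. check follows the rule given above (normalized value below the threshold).

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One S.M.A.R.T. attribute (hypothetical container for illustration)."""
    name: str
    value: int       # normalized value; higher is better
    threshold: int   # minimum acceptable normalized value
    raw: int         # raw value as tracked by the device

    def is_tec(self) -> bool:
        # T.E.C. as described above: normalized value below the threshold.
        return self.value < self.threshold

    def raw_hex(self) -> str:
        # Assumption: show the raw value as 12 zero-padded hex digits.
        return format(self.raw, "012X")


# Hypothetical readings, not taken from a real drive.
healthy = SmartAttribute("Reallocated sectors count", value=100, threshold=36, raw=0)
worn = SmartAttribute("Reallocated sectors count", value=20, threshold=36, raw=0x5DC)

print(healthy.is_tec())  # False
print(worn.is_tec())     # True
```

Only the normalized value and the threshold take part in the T.E.C. decision; the raw value is kept alongside for interpretation.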

Most common S.M.A.R.T. attributes reference

Note that not all of the attributes are present on all drives. Some attributes have similar meanings (they are just counted differently), so only one of them is normally monitored by a given drive. Some require special sensors (e.g. for temperature or G-load monitoring). The decision about which attributes to implement is up to the drive vendor. Likewise, the interpretation of raw values depends heavily on the manufacturer.

Critical device status attributes

  • Reallocated sectors count - Indicates how many defective sectors were discovered on the drive and remapped using the spare sector pool. A low value, in the absence of other fault indications, points to a disk surface problem. The raw value indicates the exact number of such sectors.
  • Current pending sectors count - Indicates how many suspected defective sectors are pending "investigation". These will not necessarily be remapped. In fact, such sectors may not be defective at all (e.g. if some transient condition prevented a sector from being read, it is marked "pending"); they are then re-tested by the device's off-line scan [1] procedure and returned to the pool of serviceable sectors. The raw value indicates the exact number of such sectors.
  • Off-line uncorrectable sectors count - Similar to "Reallocated sectors count". Indicates how many defective sectors were found during the off-line scan [1].
  • Read error rate, Read error retry rate, Write error rate, Seek error rate - Each logs the rate at which the specified events (errors) occur. A lower value indicates more events (errors). Retries do not necessarily indicate a persistent problem, but one should proceed with caution if any of these attributes is degraded.
  • Recalibration retries - Indicates how often the drive is unable to recalibrate at the first attempt. The raw value may show the exact number of recalibration events (at least with some vendors), but this should be taken with a grain of salt.
  • Spin up time - A low value indicates that the drive takes longer than expected to spin up to its rated speed. This might indicate either a controller or a spindle bearing problem.
  • Spin retry count - A spin retry event is logged each time the drive is unable to spin its platters up to the rated rotation speed in due time, so that the spin-up attempt is aborted and retried. This typically indicates a severe controller or bearing problem, but may sometimes be caused by power supply problems.
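
For the sector-count attributes above, the raw value is the exact count, but it is usually shown in hexadecimal. A minimal sketch of converting such readings to decimal follows. The function name `sector_counts` and the sample readings are hypothetical, and the sketch assumes the raw values are available as hexadecimal strings.

```python
from typing import Dict


def sector_counts(raw_values: Dict[str, str]) -> Dict[str, int]:
    """Convert hexadecimal raw values (given as strings) to decimal counts."""
    return {name: int(raw, 16) for name, raw in raw_values.items()}


# Hypothetical readings, not taken from a real drive.
sample = {
    "Reallocated sectors count": "00000000001A",
    "Current pending sectors count": "000000000002",
    "Off-line uncorrectable sectors count": "000000000000",
}

for name, count in sector_counts(sample).items():
    print(f"{name}: {count}")
```

Here the hexadecimal reading 1A corresponds to 26 reallocated sectors.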

Drive lifetime information

  • Drive start/stop count, Power off/retract cycle count - These two provide an estimate of drive wear. The vendor estimates the expected device lifetime and the number of cycles, and the value of these attributes is then computed from that estimate. A T.E.C. on one of these attributes does not necessarily indicate a drive failure, but rather suggests that the drive should be considered unreliable due to wear and tear. Raw values are typically just the count of events.
  • Power on hours count, Head flying hours count - Normalized values are computed in a similar way to the above. Despite what the name suggests, the raw value of the attribute is stored in all sorts of measurement units (hours, half-hours, or ten-minute intervals, to name a few), depending on the manufacturer of the device.
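
Because the unit of the raw power-on value varies by manufacturer, it has to be converted before drives can be compared. The sketch below covers only the three units named above. The function name, the unit labels, and the sample reading are hypothetical, and which unit a given drive uses is something you must find out separately.

```python
# Minutes represented by one raw count, for the units mentioned above.
MINUTES_PER_UNIT = {
    "hours": 60,
    "half-hours": 30,
    "ten-minute intervals": 10,
}


def power_on_hours(raw: int, unit: str) -> float:
    """Convert a raw power-on count to hours, given the unit the drive uses."""
    if unit not in MINUTES_PER_UNIT:
        raise ValueError(f"unknown unit: {unit!r}")
    return raw * MINUTES_PER_UNIT[unit] / 60


# The same hypothetical raw reading interpreted under each unit.
raw_reading = 52560
for unit in MINUTES_PER_UNIT:
    print(f"{unit}: {power_on_hours(raw_reading, unit)} hours")
```

The same raw number can differ by a factor of six in real hours, so a raw power-on count means little without knowing the unit.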

Operating conditions information

  • Temperature - Indicates the device temperature, if the appropriate sensor is fitted. The lowest byte of the raw value contains the exact temperature in degrees Celsius.
  • Ultra DMA CRC error rate - A low value of this attribute typically indicates that something is wrong with the connectors and/or cables. Disk-to-host transfers are protected by a CRC error detection code when Ultra-DMA 66 or 100 is used. If the data gets garbled between the disk and the host machine, the receiving controller detects this and a retransmission is initiated. Such a situation is called a "UDMA CRC error". Once the problem is rectified (typically by replacing a cable), the attribute value returns to normal levels fairly quickly.
  • G-sense error rate - Indicates whether errors are occurring as a result of shocks to the drive (due either to environmental factors or to improper installation). The hard drive must be fitted with the appropriate sensor to get information about G-loads. This attribute is mainly limited to notebook (2.5") drives. Once the operating conditions are corrected, the attribute value returns to normal.
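
Extracting the temperature from the lowest byte of the raw value, as described for the Temperature attribute above, can be sketched as a simple bit mask. The function name and the sample raw value are hypothetical; the upper bytes are simply ignored here, since their meaning is not covered in this text.

```python
def temperature_celsius(raw: int) -> int:
    """Return the temperature stored in the lowest byte of the raw value."""
    return raw & 0xFF


# Hypothetical raw value; only the lowest byte (0x1F) is used.
sample_raw = 0x002D0014001F
print(temperature_celsius(sample_raw))  # 31
```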

[1] Off-line scan procedure - When the device has been idle for some period of time, it may start various self-diagnostic procedures, including simply wandering over the surface and reading sectors here and there. You can observe this by letting your system idle (provided that no background process requests disk access) and listening to the hard drive. It will soon start to operate "on its own": this is an off-line scan under way. It is done to detect any growing defects before they become critical.

Copyright © 2001 - 2024 Alexey V. Gubin.