Performance of Fitbit Devices as Tools for Assessing Sleep Patterns and Associated Factors
Abstract
As sleep duration and quality are associated with various diseases, many devices have been developed to assess sleep. We reviewed the devices used to assess sleep and their performance, focusing on Fitbit, one of the most popular devices for tracking sleep. Although previous studies have reported conflicting results regarding Fitbit’s estimation of the sleep index, most studies have shown that it has acceptable accuracy and high sensitivity. Its performance could be affected by various factors such as participants’ sleep characteristics, sleep hours and stages, analysis methods, devices compared, device models and settings, and the dominant/non-dominant wrist wearing the device. Therefore, these factors should be considered when using Fitbit to assess sleep patterns.
INTRODUCTION
Sleep duration has been reported to be associated with the risk of various diseases, including cognitive disorders [1], cardiovascular diseases [2], and cancer [3]. Since appropriate sleep duration affects physical and mental health status [4], it is important to assess sleep duration precisely. Although polysomnography (PSG) is considered the gold standard for assessing sleep, it has some limitations because it is time-consuming, expensive, and intrusive [5].
Actigraphy, a technique used to record and analyze activities, has been used as an alternative to PSG. Actigraphy infers wakefulness and sleep based on the movements detected using body-worn devices [6]. Unlike PSG, which relies on changes in brain electrical activity patterns, actigraphy uses small motion sensor detectors, called accelerometers, to detect immobility at sleep onset [7]. Accelerometers also assess sleep-related changes based on heart rate and heart rate variability [8]. Combining heart rate variability with accelerometer data may help identify sleep stages [9]. Actigraphy measures activity over extended periods and monitors rhythm changes, making it more accessible and cost-effective than PSG. Additionally, actigraphy is relatively inexpensive and well received by patients [10].
However, the validity of measuring sleep onset latency (SOL) and daytime sleep using actigraphy is limited [11]. A systematic review indicated the unreliability of actigraphy in detecting periodic limb movements in adult and pediatric patients [10]. Furthermore, actigraphy overestimated the total sleep time (TST) and sleep efficiency (SE) compared with PSG [12]. Many types of devices, including smartwatches [13,14] and rings [15], have been validated in previous studies; however, their performances differ according to the type of device and remain controversial. Although actigraphy has been reported as a valid sleep assessment method, its performance can differ according to the device type. For example, a study comparing the ActiGraph Link with the Actiwatch 2 from Philips, a validated accelerometer, reported that the two devices performed similarly in measuring the sleep period [16]. However, another study showed that the ActiGraph GT9X Link deviated from the sleep diary on most occasions despite performing similarly to the Actiwatch 2 [17].
We reviewed the devices used to assess sleep performance, particularly focusing on Fitbit, one of the most popular wearable devices used to track activities, including sleep. Fitbit holds the second largest share in the worldwide market of fitness trackers at 9%, following Apple at 19% [18]. Feehan et al. [19] reported the accuracy of Fitbit devices in a systematic review; however, it included only three studies comparing Fitbit and PSG for sleep assessment. In addition, previous studies have not thoroughly reviewed the accuracy of Fitbit as it may vary depending on factors such as sleep stages [20] and the wrist wearing it [21]. Therefore, a review based on more recent studies and factors influencing performance is necessary to assess the accuracy of Fitbit as a tool for measuring sleep. In this study, we aimed to examine the accuracy of Fitbit in assessing sleep and review the factors affecting its performance.
RESEARCH-GRADE AND CONSUMER-GRADE DEVICES
Previous studies investigating alternatives to PSG for sleep monitoring have used research- and consumer-grade devices. Research-grade devices include actigraphy devices, such as BodyMedia SenseWear [22], ActiGraph GT3X+ [22], Motionlogger sleep watch [23], Motionlogger® Micro Watch Actigraph [24], Philips-Respironics Mini-Mitter Spectrum [23], and Actiwatch Spectrum [13].
Consumer-grade devices are typically divided into wearable and nonwearable devices. Wearable devices include the Jawbone UP [22], Fatigue Science Readiband [25], Garmin series [25], Basis Health Tracker [13], Withings Pulse O2 [13], Misfit Shine [22], Nike Fuelband [22], EEG-based eye mask Neuroon [26], and Fitbit series [23,27-30]. Nonwearable devices, which commonly detect physiological and behavioral signals remotely from under a mattress or on a bedside table, include EarlySense Live, ResMed S+, SleepScore Max [25], Withings Sleep, Withings Aura, HugOne Sleep Tracking System, and Sleepace Reston (Supplement Table 1 in the online-only Data Supplement) [31].
The validity of consumer-grade devices has been challenged by concerns that they are inferior to research-grade devices, as previous studies have reported poor detection of wakefulness [32]. However, a recent study demonstrated that consumer-grade wearable devices can measure sleep duration as accurately as research-grade devices [20].
VALIDITY OF WEARABLE DEVICES FOR SLEEP ASSESSMENT
Chandel et al. [33] reviewed the evolution of smartwatches in the biomedical sector. They reported that smartwatches are used for various purposes, including detecting activity and sleep, as well as checking physiological data, such as heart rate and skin temperature. Only a few of the many types of wearable devices for sleep detection have been validated [34]. In the study by Peake et al. [34], the UP™ device, Fitbit Charge 2™, and OURA were the only three validated devices among the 21 devices for monitoring sleep, while the others, including Fitbit Flex™, DREAM device, Plex® Sleep Scanner, Sleep Profiler PSG2, Zmachine®, Somté PSG, Sleep Shepherd, Re-Timer, AYO, Illumy Sleep Smart Mask, HUSH, Kokoon, Dreampad, NightWave Sleep Assistant, Withings Aura and REM Sleep Tracker, Circadia sleep tracker, Beddit 3.0 Sleep Tracker, and ResMed+, were not validated.
However, many devices have shown adequate performance as sleep assessment tools. One study reported a strong correlation of consumer-grade activity monitors, such as Fitbit and Nike Fuelband, with research-grade devices for sleep-time monitoring [22]. A systematic review assessing the accuracy of Fitbit wristbands reported that Fitbit models showed promising performance, in spite of their limited specificity and inability to substitute for PSG [35]. A recent systematic review of wearable devices for estimating sleep onset reported that estimations made using actigraphy devices were not significantly different from those made using PSG [36].
The range of bias for the sleep indices differed according to the device type. Peake et al. [34] reviewed wearable devices for monitoring sleep and reported that the UP device overestimated TST and SOL, whereas it underestimated wake after sleep onset (WASO). Another study evaluating five wearable devices and PSG demonstrated no difference in TST among most devices (Withings, Misfit, Fitbit, and Basis); however, the Actiwatch differed from PSG in terms of SE [13]. Compared to the sleep log, mean absolute percent errors for TST were 4%, 8.8%, 10.2%, 12.9%, and 21.6% for Garmin VivoFit, Fitbit Flex, Jawbone UP, Misfit Shine, and SenseWear Armband Mini, respectively [14]. The Garmin overestimated TST by 43–55 min and SE by 10%, and underestimated WASO by 35 min [24,25]. Many devices, including Fatigue Science, EarlySense, and SleepScore, tend to overestimate the TST and SE, whereas other types of devices, such as ResMed, underestimate the TST [25].
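The mean absolute percent error quoted above can be illustrated with a minimal sketch. The function name and the nightly values below are hypothetical and are not data from any cited study; the sketch assumes the reference (here, a sleep log) is never zero.

```python
def mean_absolute_percent_error(device_values, reference_values):
    """Return the mean absolute percent error (%) of device estimates
    relative to reference values (e.g., a sleep log or PSG)."""
    if len(device_values) != len(reference_values) or not device_values:
        raise ValueError("inputs must be non-empty and of equal length")
    # Absolute error of each night, expressed as a percentage of the reference.
    errors = [
        abs(d - r) / r * 100.0
        for d, r in zip(device_values, reference_values)
    ]
    return sum(errors) / len(errors)


# Hypothetical nightly total sleep time (TST) in minutes.
device_tst = [420.0, 380.0, 450.0]
sleep_log_tst = [400.0, 400.0, 500.0]

mape = mean_absolute_percent_error(device_tst, sleep_log_tst)
print(f"MAPE for TST: {mape:.1f}%")
```

Because the errors are taken as absolute values, this metric describes the size of the disagreement but not its direction, which is why studies also report over- or underestimation separately.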
PERFORMANCE OF FITBIT DEVICES
Previous studies investigated various Fitbit devices for sleep assessment. Two systematic reviews on the accuracy of Fitbit devices reported that these devices provided estimates similar to those of research-grade devices or PSG for sleep hours [19,35] despite having limited specificity [35]. In addition, Benedetti et al. [37] indicated that Fitbit devices are suitable for sleep-related research applications because of their high accuracy in pulse-rate detection during sleep.
Although systematic reviews [19,35] and several studies have reported acceptable accuracy of Fitbit devices [26,38], other studies have shown that these devices have some bias in measuring TST, SE, WASO, SOL, light sleep, and deep sleep, but not rapid eye movement (REM) sleep [23,39,40]. Several studies, including two systematic reviews [19,24], have reported that Fitbit overestimated SE [29] by 1%–15% [23,25,40]. However, other studies have reported underestimation of SE by 2.5%–4.9% [41,42]. SOL assessments using Fitbit have shown results similar to those of PSG [40] or underestimation by 1.8–23 min [25,42-44]. The results regarding WASO assessment using Fitbit have also been conflicting, showing no significant difference [43], underestimation [29] by 5.6–44 min [38,40,44], or overestimation by 2.8–41 min [20,41,42,45].
A systematic review of Fitbit wristbands reported that Fitbit overestimated TST (7–67 min) and SE (2%–15%), underestimated WASO (6–44 min), and showed no significant difference in SOL with respect to PSG, with 81%–90% accuracy, 87%–99% sensitivity, and 10%–52% specificity [35]. However, previous studies reported conflicting results. While most studies assessing Fitbit performance compared to PSG showed similar TST [29], or overestimation of TST [46] by 2.6–60 min [23,25,38-40,43-45], some studies reported contradictory results (underestimation by 11–47 min) [20,41,42].
Previous studies have shown varying results depending on the device model in terms of REM sleep data. Compared to PSG, Fitbit showed no statistically significant difference in REM sleep [43], or measured 2.7 min more [45] or 4.7 min less [42]. Fitbit devices have been reported to overestimate REM latency by 29 min [45] and light sleep by 10.4–37.7 min [42,43,45]. Three studies reported underestimation of deep sleep by 11.2–41.4 min [42,43,45], while one study reported overestimation of deep sleep [46].
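To make the direction of these biases concrete, the following sketch derives TST, SOL, WASO, and SE from an epoch-scored night and subtracts the reference values from the device values. It is an illustration under stated assumptions, not the procedure of any cited study: the function name, the 30-second epoch length, the two-label coding ("S" for sleep, "W" for wake), and both example nights are hypothetical, and WASO here counts all wake epochs after sleep onset up to the end of the recording.

```python
def sleep_indices(epochs, epoch_seconds=30):
    """Summarize a night scored as 'S' (sleep) or 'W' (wake) per epoch.

    `epochs` is assumed to span the full time in bed (lights-off to lights-on).
    Returns TST, SOL, and WASO in minutes and SE in percent.
    """
    minutes_per_epoch = epoch_seconds / 60.0
    time_in_bed = len(epochs) * minutes_per_epoch
    tst = epochs.count("S") * minutes_per_epoch
    if "S" in epochs:
        onset = epochs.index("S")  # first epoch scored as sleep
        sol = onset * minutes_per_epoch
        waso = epochs[onset:].count("W") * minutes_per_epoch
    else:
        # No sleep recorded: latency equals the whole time in bed.
        sol = time_in_bed
        waso = 0.0
    se = 100.0 * tst / time_in_bed if time_in_bed else 0.0
    return {"TST": tst, "SOL": sol, "WASO": waso, "SE": se}


# Hypothetical 100-minute recordings (200 epochs of 30 s each).
psg = ["W"] * 20 + ["S"] * 100 + ["W"] * 10 + ["S"] * 70
device = ["W"] * 10 + ["S"] * 190

psg_idx = sleep_indices(psg)
dev_idx = sleep_indices(device)

# Positive bias means the device overestimates relative to PSG.
bias = {name: dev_idx[name] - psg_idx[name] for name in psg_idx}
print(bias)
```

In this made-up night the device misses the mid-night awakening and detects sleep onset early, so TST and SE come out higher and SOL and WASO lower than the reference, which is the pattern reported in the systematic review above.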
The overall sensitivity of Fitbit for sleep assessment has been high in previous studies (88%–99%) [20,29,40,42-44], whereas its sensitivity to each sleep stage has varied. The human sleep cycle is divided into REM and non-REM sleep, and non-REM sleep comprises three progressively deeper stages, N1 to N3. N1 represents the lightest stage of sleep, N2 represents a deeper sleep, and N3 represents the deepest stage of sleep. This classification is based on variations in muscle tone, brainwave patterns, and eye movements [47]. The sensitivity of Fitbit has been reported as 68% for wakefulness [41], 53.4%–78% for light sleep [41,45], 27.9%–59% for deep sleep [41,45], 54.8%–69% for REM [41,45], and 42.8% for WASO [45]. Several studies have reported low specificity of Fitbit (10%–43.9%) [29,40,42,44], while other studies have reported a specificity of 61% [43] or higher than 88% [20]. The overall accuracy of Fitbit is 86.5%–88% [40,42], while it is 81% for light sleep, 49% for deep sleep, and 74% for REM sleep [43].
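In sleep/wake validation, sensitivity conventionally refers to the proportion of reference sleep epochs that the device also scores as sleep, and specificity to the proportion of reference wake epochs that the device scores as wake. A minimal sketch of this epoch-by-epoch comparison follows; the function name, the "S"/"W" coding, and the example epochs are hypothetical and are not taken from any cited study.

```python
def epoch_agreement(device, reference):
    """Compare two equally long epoch sequences coded 'S' (sleep) or 'W' (wake).

    Sleep is treated as the positive class. Returns a tuple of
    (sensitivity, specificity, accuracy) as fractions between 0 and 1;
    sensitivity or specificity is None if the reference has no
    sleep or no wake epochs, respectively.
    """
    if len(device) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(device, reference))
    tp = sum(1 for d, r in pairs if r == "S" and d == "S")  # sleep scored as sleep
    fn = sum(1 for d, r in pairs if r == "S" and d == "W")  # sleep scored as wake
    tn = sum(1 for d, r in pairs if r == "W" and d == "W")  # wake scored as wake
    fp = sum(1 for d, r in pairs if r == "W" and d == "S")  # wake scored as sleep
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Hypothetical night: 20 reference wake epochs followed by 80 sleep epochs.
reference = ["W"] * 20 + ["S"] * 80
# The device scores only 8 of the 20 wake epochs as wake and misses 4 sleep epochs.
device = ["W"] * 8 + ["S"] * 12 + ["S"] * 76 + ["W"] * 4

sensitivity, specificity, accuracy = epoch_agreement(device, reference)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, accuracy={accuracy:.2f}")
```

Because most epochs in a night are sleep, accuracy is dominated by sensitivity. This is how a device can combine high overall accuracy with low specificity, as several of the studies above report.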
COMPARISON BETWEEN FITBIT AND OTHER ACTIGRAPHY DEVICES
Although the Actiwatch overestimated TST and WASO and underestimated SE and SOL compared to Fitbit in some studies [42,48,49], other studies have reported contrary results [27,28,50]. Several studies showed no significant differences between Fitbit and other actigraphy devices, such as Galaxy [30] and Motionlogger® Micro Watch Actigraph (MMWA) [24].
FACTORS AFFECTING THE PERFORMANCE OF FITBIT DEVICES
Participants’ sleep characteristics
The performance of smartwatches seemed to differ according to the participants’ sleep characteristics. A study comparing Fitbit and PSG reported that the sensitivity and specificity of Fitbit varied according to the participants’ characteristics, showing 89.3% sensitivity for normal PSG, 87.6% for obstructive sleep apnea, and 88.2% for obstructive sleep apnea under continuous positive airway pressure treatment [44].
Several studies have shown that sleep monitoring using a Fitbit device is acceptable only in non-clinical situations [26,41,42]. Liang et al. [26] compared Fitbit and a wearable electroencephalogram-based eye mask (Neuroon) with a medical sleep monitor. They reported that sleep tracking using consumer-grade devices was reasonably satisfactory only for general purposes and non-clinical use because the performance of these devices in measuring sleep structure was not satisfactory [26]. Dong et al. [42] reported that Fitbit cannot replace PSG in measuring sleep variables and determining sleep stage classification in patients with chronic insomnia.
Menghini et al. [41] reported that although Fitbit underestimated TST and SE and overestimated WASO in participants with and without insomnia, the bias was two-fold in participants with insomnia compared with healthy participants. However, Baroni et al. [51] reported that Fitbit Flex is not an appropriate tool to measure sleep in non-clinical situations because only 14% of the Fitbit Flex devices recorded a significant amount of sleep data.
Sleep hours and sleep stages
The performance of Fitbit in estimating sleep indices differs according to the sleep stage. Lee et al. [20] reported that the overall sensitivity of Fitbit was >90%, but it was >68% during light sleep, 50% during deep sleep, and 72% during REM sleep. Another study reported variations in Fitbit sensitivity according to sleep stage, with 68%, 78%, 59%, and 69% sensitivity during wakefulness, light sleep, deep sleep, and REM sleep, respectively [41]. Thus, Fitbit showed better sensitivity during light and REM sleep but lower sensitivity during deep sleep and wakefulness [41].
The performance of Fitbit devices can also differ according to sleep duration. One study showed that Fitbit overestimated N1+N2 light sleep by an average of 9.9 min during 5-hour sleep, while it underestimated light sleep by an average of 20.7 min during 9-hour sleep [20]. Menghini et al. [41] suggested that the discrepancies between Fitbit and PSG data were higher in patients with longer sleep durations (N1+N2 duration >225 min and N3 duration >80 min). However, Fitbit showed a higher sensitivity for longer REM sleep (43.2% sensitivity for REM sleep <120 min and 57.0% for REM sleep >120 min) in another study [45]. Burkart et al. [52] reported that Fitbit and ActiGraph underestimated WASO and SE at higher values for each index.
Analysis methods and comparison
The equivalence between sleep assessment devices may differ according to the analytical method used. For example, the TST difference between Fitbit and Actiwatch was -5.37 min for the manually merged data, but it was 26.97 min for the automatically merged data in a study by Castner et al. [48]. Moreover, the coding of the data could change their predictive value: coding missing data as “wakefulness data” increased the specificity but decreased the sensitivity of Fitbit [48]. The raw signals obtained using different methods and devices are preprocessed in different ways, leading to significant differences in activity signals [53]. Pini et al. [54] reported that a heart rate-based algorithm for sleep stage classification was feasible and accurate in their investigation. Regarding sleep stage classification, one systematic review indicated that the methods for grouping sleep stages and metrics used to evaluate performance were largely heterogeneous [55].
Although many studies have evaluated the performance of Fitbit using PSG as the gold standard, several have compared it with self-reporting for sleep assessment [14,56]. Sleep status was coded and analyzed epoch-by-epoch in the comparison between Fitbit and PSG [29]. Thus, comparison with PSG can provide a more accurate measure of the performance of Fitbit devices than comparison with a sleep diary. Liu et al. [56] compared the Fitbit and a consensus sleep diary to assess sleep. They showed that the disagreement between the sleep diary and Fitbit data was sufficiently small, except for awakenings per night. Fitbit recorded 2.15 times more awakenings per night than the sleep diary [56].
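The effect of coding missing epochs as wakefulness can be illustrated with a small sketch. It compares two handling choices, excluding missing device epochs versus recoding them as wake, on made-up data. The function name, the use of `None` for a missing epoch, and all values are hypothetical; this is not the procedure used by Castner et al. [48], and it assumes the reference contains both sleep and wake epochs.

```python
def sensitivity_specificity(device, reference):
    """Epoch-by-epoch sensitivity and specificity for sleep ('S') vs. wake ('W').

    Epochs where the device value is None (missing) are excluded.
    Assumes at least one reference sleep epoch and one reference wake
    epoch remain after exclusion.
    """
    pairs = [(d, r) for d, r in zip(device, reference) if d is not None]
    tp = sum(1 for d, r in pairs if r == "S" and d == "S")
    fn = sum(1 for d, r in pairs if r == "S" and d == "W")
    tn = sum(1 for d, r in pairs if r == "W" and d == "W")
    fp = sum(1 for d, r in pairs if r == "W" and d == "S")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical reference: 80 sleep epochs followed by 20 wake epochs.
reference = ["S"] * 80 + ["W"] * 20
# Hypothetical device output with 9 missing epochs (None).
device = (
    ["S"] * 70 + ["W"] * 5 + [None] * 5   # during reference sleep
    + ["W"] * 8 + ["S"] * 8 + [None] * 4  # during reference wake
)

# Option 1: drop missing epochs from the comparison.
sens_excluded, spec_excluded = sensitivity_specificity(device, reference)

# Option 2: recode missing epochs as wakefulness.
device_as_wake = ["W" if d is None else d for d in device]
sens_as_wake, spec_as_wake = sensitivity_specificity(device_as_wake, reference)

print(f"excluded: sensitivity={sens_excluded:.3f}, specificity={spec_excluded:.3f}")
print(f"as wake:  sensitivity={sens_as_wake:.3f}, specificity={spec_as_wake:.3f}")
```

Missing epochs that fall in reference sleep become false negatives when recoded as wake, which lowers sensitivity, while those that fall in reference wake become true negatives, which raises specificity.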
Device models and settings
Brooke et al. [14] compared eight devices with a sleep log and reported that Fitbit Flex and Fitbit Charge HR showed mean absolute percent errors of 8.8% and 11.5%, respectively, for TST measurement. In another study including adults with obstructive sleep apnea, sensitivity and specificity were 87.4% and 35.0%, respectively, for Fitbit Charge 3, while they were 88.1% and 51.9%, respectively, for Fitbit Alta HR [44].
The performance of Fitbit devices differed according to device settings. One systematic review reported that Fitbit overestimated TST and SE by more than 10% in the normal mode but it underestimated TST and SE by more than 15% in the sensitive mode [19]. Other studies have shown that the Fitbit overestimated the TST in the normal mode and underestimated it in the sensitive mode [23,40].
Dominant/non-dominant wrist wearing a device
Although one study reported no difference in the accuracy of sleep-stage classification between the dominant and non-dominant wrists [57], another study showed that the data from the dominant and non-dominant wrists were significantly different [21]. While Clevenger et al. [21] reported that Fitbit devices worn on the non-dominant wrist recorded greater average activity levels, another study showed contrary results, demonstrating greater movement of the dominant wrist than of the non-dominant wrist [58].
CONCLUSION
Although several devices have been developed and used to assess sleep quality, only a few have been validated. Fitbit, one of the most popular devices for assessing activity and sleep, has shown acceptable accuracy and high sensitivity in previous studies. It can be used to monitor sleep in free-living conditions; however, the evidence to support its use in clinical settings is insufficient because of the conflicting biases shown in the measurement of TST, REM sleep, deep sleep, SE, and WASO, and as it overestimated REM latency and light sleep and underestimated SOL. Fitbit performance can differ according to various factors, such as participants’ sleep characteristics, sleep hours and stages, analysis methods, devices used for comparison, device models, and device settings. These factors should be considered when assessing sleep using activity-tracking devices, such as Fitbit, in research and everyday life settings.
Supplementary Materials
The online-only Data Supplement is available with this article at https://doi.org/10.13078/jsm.240010.
Notes
The authors have no potential conflicts of interest to disclose.
Author Contributions
Conceptualization: Ji-Eun Park. Data curation: Ji-Eun Park, Eun Kyoung Ahn. Formal analysis: Ji-Eun Park. Funding acquisition: Ji-Eun Park. Investigation: Ji-Eun Park, Eun Kyoung Ahn. Methodology: Ji-Eun Park, Kyuhyun Yoon, Jayeun Kim. Validation: Kyuhyun Yoon, Jayeun Kim. Writing—original draft: Ji-Eun Park. Writing—review & editing: Eun Kyoung Ahn, Kyuhyun Yoon, Jayeun Kim.
Funding Statement
This research was supported by the Korea Institute of Oriental Medicine (KSN20234113 and KSN1732121).
Acknowledgements
None