Processing Physical Activity to a Normalized Quantile • activerse

This report follows the structure of the original project page at johnmuschelli.com/upper_limb_gt3x_prosthesis and the source document at muschellij2/upper_limb_gt3x_prosthesis/index.Rmd. The goal is the same: show a fully reproducible pipeline for going from raw GT3X+ data to a summary that can be mapped with participant demographics.

The report is written so that every download is stored on disk under data/ instead of tempfile(). That keeps the workflow repeatable across knits and prevents the slow network work from running over and over.

Setup

We first load the packages used by the analysis and tell reticulate which Python environment to manage. The Python stepcount module is imported explicitly so the step-counting part of the pipeline is reproducible too.

has_reticulate = requireNamespace("reticulate", quietly = TRUE)
if (has_reticulate) {
  Sys.setenv(
    RETICULATE_PYTHON = "managed",
    PATH = paste(path.expand("~/.local/bin"), Sys.getenv("PATH"), sep = ":"),
    # R_USER_CACHE_DIR = file.path("data", "r-cache"),
    UV_CACHE_DIR = file.path("data", "uv-cache")
    # UV_PYTHON_PREFERENCE = "only-system",
    # UV_OFFLINE = "1",
    # UV_PRERELEASE = "allow"
  )
  library(reticulate)
  reticulate::py_require("stepcount==3.11.0", python_version = "3.10")
  sc <- reticulate::import("stepcount")
  try({sc$stepcount})
  if (requireNamespace("stepcount", quietly = TRUE)) {
    stepcount::stepcount_check()
  }
}
#> Downloading uv...Done!
#> Warning in reticulate::py_require("stepcount>=3.11.0", python_version =
#> "3.10"): After Python has initialized, only `action = 'add'` with new packages
#> is supported. You tried to add `stepcount>=3.11.0` but requirements contain
#> `stepcount==3.11.0` already.
#> [1] TRUE

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> 
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> 
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(janitor)
#> 
#> Attaching package: 'janitor'
#> 
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test
if (requireNamespace("readxl", quietly = TRUE)) {
  library(readxl)
}
library(activerse)
#> Attaching activerse packages
#> - actibase 0.0.1
#> - actiread 0.0.2
#> - actimetrics 0.0.1
#> - actisensorlog 0.0.1
#> - actiwalkability 0.0.1
#> - actiquantiles 0.0.1

The report keeps downloaded files in a small directory structure so that each artifact only has to be fetched once.

raw_dir <- file.path("data", "raw")
derived_dir <- file.path("data", "derived")
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(derived_dir, recursive = TRUE, showWarnings = FALSE)

download_if_missing <- function(url, dest) {
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, quiet = FALSE, mode = "wb")
  }
  dest
}

Goal

The data come from a publicly released study of upper-limb activity in people with and without upper-limb prostheses. The original project pairs raw GT3X+ files with a metadata spreadsheet, then uses those data to build an analysis that can be checked end-to-end with open-source tooling.

This report keeps the same logic, but it emphasizes why each step exists:

Download and cache the raw accelerometer file.
Read the file with an open-source GT3X reader.
Process the raw samples into minute-level and daily summaries.
Rebuild the demographic table from the spreadsheet used in demog_mapping.R.
Combine the daily activity summary with the demographics and generate the mapping display.

Download Raw GT3X+

The sample GT3X+ file is downloaded once and kept on disk. That makes the report self-contained after the first run and avoids repeated network access.

gt3x_url <- "https://ndownloader.figshare.com/files/21855555"
gt3x_path <- download_if_missing(
  gt3x_url,
  file.path(raw_dir, "AI1_NEO1B41100255_2016-10-17.gt3x.gz")
)
gt3x_path
#> [1] "data/raw/AI1_NEO1B41100255_2016-10-17.gt3x.gz"

Read the File

The raw file is read directly from the gzipped GT3X+ archive. No temporary copy is needed because the reader can work with the file in place.

data <- acti_read_gt3x(gt3x_path, cleanup = TRUE)
#> ℹ Filling zeros in data
#> ✔ Filled zeros in data
#> ℹ Timezone not applied to data
data
#> # A tibble: 18,143,952 × 4
#>    time                     X      Y      Z
#>    <dttm>               <dbl>  <dbl>  <dbl>
#>  1 2016-10-17 16:00:00 -0.569 -0.106 -0.68 
#>  2 2016-10-17 16:00:00 -0.584 -0.106 -0.695
#>  3 2016-10-17 16:00:00 -0.786 -0.106 -0.768
#>  4 2016-10-17 16:00:00 -0.695 -0.082 -0.812
#>  5 2016-10-17 16:00:00 -0.78  -0.018 -0.783
#>  6 2016-10-17 16:00:00 -0.883  0.003 -0.865
#>  7 2016-10-17 16:00:00 -0.894  0.067 -0.883
#>  8 2016-10-17 16:00:00 -0.938  0.155 -0.897
#>  9 2016-10-17 16:00:00 -1.05   0.217 -0.818
#> 10 2016-10-17 16:00:00 -1.14   0.282 -0.768
#> # ℹ 18,143,942 more rows

Process the Accelerometry

The helper below mirrors the original script, but the comments explain the purpose of each stage. Optional calibration is kept as an argument so the same function can be reused if a later analysis needs it.

checker = function(data) {
  data = data %>%
    dplyr::mutate(
      date = as_date_safe(time),
      minute = lubridate::floor_date(time, unit = "minutes"),
      hourtime = hms::as_hms(time))
  
  data = data %>%
    dplyr::distinct(date, minute, hourtime, wear)
  
  # Check for duplicates - should not be there
  suppressMessages({
    dupes = janitor::get_dupes(data, date, minute)
  })
  if (nrow(dupes) > 0) {
    stop("There are duplicate rows of date and minute!")
  }
}
acti_run <- function(data, calibrate = FALSE,
                     run_stepcount = TRUE,
                     run_forest = FALSE) {
  if (calibrate) {
    data <- acti_calibrate(data)
  }
  
  message("Processing Data")
  # The reader already returns resampled data, so we can start processing.
  processed <- acti_process(data)
  
  if (run_stepcount) {
    if (requireNamespace("stepcount", quietly = TRUE) ||
        !stepcount::have_stepcount()) {
      run_stepcount = FALSE
      cli::cli_warn(paste0("`run_stepcount` = TRUE, but 
                    {.help [have_stepcount](stepcount::have_stepcount)} ", 
                    "came back `FALSE`"))
    }
  }
  if (run_stepcount) {
    message("Running Stepcount")
    # Step counts come from the Python-backed stepcount implementation.
    steps <- acti_calculate_stepcount(data) %>% 
      rename(steps_stepcount_ssl = steps) 
    
    processed <- processed %>%
      dplyr::full_join(steps)
    
    steps <- acti_calculate_stepcount(data, model_type = "rf") %>% 
      rename(steps_stepcount_rf = steps) %>% 
      select(-any_of("walking"))
    processed <- processed %>%
      dplyr::full_join(steps)
  }
  
  # Verisense steps  
  steps <- acti_calculate_verisense(data, resample_to_15hz = TRUE,
                                    method = "original") %>% 
    rename(steps_vs_original = steps)
  processed <- processed %>%
    dplyr::full_join(steps)
  
  # Verisense steps Revised
  steps <- acti_calculate_verisense(data, resample_to_15hz = TRUE,
                                    method = "revised") %>% 
    rename(steps_vs_revised = steps)
  processed <- processed %>%
    dplyr::full_join(steps)
  
  steps <- acti_calculate_sdt(data) %>% 
    rename(steps_sdt = steps)
  processed <- processed %>%
    dplyr::full_join(steps)
  
  if (run_forest) {
    if (!reticulate::py_module_available("forest")) {
      run_forest = FALSE
      cli::cli_warn(paste0("`run_forest` = TRUE, but 
                    {.help [have_forest](walking::have_forest)} ", 
                    "came back `FALSE`"))
    }
  }
  if (run_forest) {  
    steps <- acti_calculate_forest(data) %>% 
      rename(steps_forest = steps)
    processed <- processed %>%
      dplyr::full_join(steps)  
  }
  
  
  
  # Flag which days have enough valid wear time to count as included.
  inclusion <- create_day_inclusion(processed, min_required = 1368L)
  processed <- processed %>%
    mutate(date = as.Date(time)) %>%
    left_join(inclusion)
  
  # A small helper for daily totals that ignore missing values.
  na_sum <- function(x) sum(x, na.rm = TRUE)
  
  # Collapse the minute-level data to daily totals for the measures we care about.
  daily <- processed %>%
    acti_create_date() %>%
    group_by(date) %>%
    summarise(
      dplyr::across(
        c(any_of(c("counts", "counts_log10", "MIMS", "mims")),
          starts_with("steps")
        ),
        na_sum
      )
    )
  
  daily <- daily %>%
    full_join(inclusion)
  
  list(
    minute_level = processed,
    inclusion = inclusion,
    daily = daily
  )
}

out <- acti_run(data)
#> Processing Data
#> [1] "Creating Downsampled Data"
#> [1] "Filtering Data"
#> [1] "Trimming Data"
#> [1] "Getting data back to 10Hz for accumulation"
#> [1] "Summing epochs"
#> Joining with `by = join_by(timestamp)`
#> Warning: `run_stepcount` = TRUE, but have_stepcount (`?stepcount::have_stepcount()`)
#> came back `FALSE`
#> Finding Peak Locations
#> Thresholding Peak by VM Threshold
#> Thresholding Peak by Periodicity
#> Thresholding Peak by Similarity
#> Thresholding Peak by Continuity
#> Joining with `by = join_by(time)`
#> Finding Peak Locations
#> Thresholding Peak by VM Threshold
#> Thresholding Peak by Periodicity
#> Thresholding Peak by Similarity
#> Thresholding Peak by Continuity
#> Joining with `by = join_by(time)`
#> Registered S3 methods overwritten by 'signal': method from print.freqs gsignal
#> print.freqz gsignal print.grpdelay gsignal plot.grpdelay gsignal print.impz
#> gsignal print.specgram gsignal plot.specgram gsignal
#> sdt completed
#> Joining with `by = join_by(time)`
#> Joining with `by = join_by(date)`
#> Joining with `by = join_by(date)`

The minute-level output shows the raw time series after step counts, wear-time classification, and the log-count transform have been added.

out$minute_level %>%
  select(time, counts, starts_with("steps"), counts_log10, date, is_included) %>%
  slice_head(n = 8) %>%
  knitr::kable()

time	counts	steps_vs_original	steps_vs_revised	steps_sdt	counts_log10	date	is_included
2016-10-17 16:00:00	3939	0	1	25	3.595496	2016-10-17	FALSE
2016-10-17 16:01:00	3835	2	5	27	3.583879	2016-10-17	FALSE
2016-10-17 16:02:00	1954	1	4	15	3.291147	2016-10-17	FALSE
2016-10-17 16:03:00	1771	2	1	9	3.248464	2016-10-17	FALSE
2016-10-17 16:04:00	283	0	0	1	2.453318	2016-10-17	FALSE
2016-10-17 16:05:00	333	0	0	1	2.523747	2016-10-17	FALSE
2016-10-17 16:06:00	78	0	0	1	1.897627	2016-10-17	FALSE
2016-10-17 16:07:00	20	0	0	0	1.322219	2016-10-17	FALSE

The inclusion table records which days met the wear-time threshold. The threshold is study-specific, but the key idea is that downstream summaries should only use days with enough valid data.

out$inclusion %>%
  slice_head(n = 8) %>%
  knitr::kable()

date	n_minutes_wear	n_minutes_observed	prop_minutes_wear	prop_minutes_wear_from_observed	is_included
2016-10-17	382	1440	0.2652778	0.2652778	FALSE
2016-10-18	1440	1440	1.0000000	1.0000000	TRUE
2016-10-19	1440	1440	1.0000000	1.0000000	TRUE
2016-10-20	1440	1440	1.0000000	1.0000000	TRUE
2016-10-21	1300	1440	0.9027778	0.9027778	FALSE
2016-10-22	1440	1440	1.0000000	1.0000000	TRUE
2016-10-23	1440	1440	1.0000000	1.0000000	TRUE
2016-10-24	959	1440	0.6659722	0.6659722	FALSE

The daily table is the compact summary that gets paired with the demographic data.

out$daily %>%
  slice_head(n = 8) %>%
  knitr::kable()

date	counts	counts_log10	steps_vs_original	steps_vs_revised	steps_sdt	n_minutes_wear	n_minutes_observed	prop_minutes_wear	prop_minutes_wear_from_observed	is_included
2016-10-17	861018	1038.614	3013	3123	8807	382	1440	0.2652778	0.2652778	FALSE
2016-10-18	2752651	2996.724	9130	9412	28916	1440	1440	1.0000000	1.0000000	TRUE
2016-10-19	1932515	2554.070	8703	9251	20835	1440	1440	1.0000000	1.0000000	TRUE
2016-10-20	1654903	2131.413	5838	6279	16212	1440	1440	1.0000000	1.0000000	TRUE
2016-10-21	2261210	2814.163	7546	7627	22837	1300	1440	0.9027778	0.9027778	FALSE
2016-10-22	1640473	2321.432	7863	7162	21173	1440	1440	1.0000000	1.0000000	TRUE
2016-10-23	1742237	2408.970	3806	2831	17726	1440	1440	1.0000000	1.0000000	TRUE
2016-10-24	1027828	1481.536	5047	5104	11712	959	1440	0.6659722	0.6659722	FALSE

Reproduce the Demographic Data

The next section reproduces the cleaned demographic table from demog_mapping.R. It is wrapped in a collapsible block so the report stays readable, but the code still runs when the document is rendered. The chunk is cached and the spreadsheet is stored locally, so the expensive work only happens once unless the inputs change.

Show how the demographic table is rebuilt

if (requireNamespace("readxl", quietly = TRUE)) {
  meta_url <- "https://ndownloader.figshare.com/files/21881742"
  meta_path <- download_if_missing(meta_url, file.path(raw_dir, "Metadata.xlsx"))
  
  meta <- readxl::read_excel(meta_path)
  
  # Excel sometimes creates placeholder columns with names like "..1".
  bad_col <- grepl("^\\.\\.", colnames(meta))
  colnames(meta)[bad_col] <- NA
  
  # The first couple of rows contain extra header information, so we stitch the
  # real column names together before dropping those rows.
  potential_headers <- rbind(colnames(meta), meta[1:2, ])
  potential_headers <- apply(potential_headers, 2, function(x) {
    x <- paste(na.omit(x), collapse = "")
    x <- sub(" .csv", ".csv", x)
    x <- sub(" .wav", ".wav", x)
    x <- gsub(" ", "_", x)
    x
  })
  colnames(meta) <- potential_headers
  
  # The imported table is easiest to use after dropping the extra header rows and
  # normalizing the names to snake_case.
  meta <- meta[-c(1:2), ] %>%
    janitor::clean_names()
  
  # Standardize the participant identifier and clean obvious formatting artifacts.
  meta <- meta %>%
    rename(id = participant_identifier) %>%
    filter(!is.na(id), id != "Participant Identifier") %>%
    mutate(across(where(is.character), ~ gsub("ü", "yes", .x)))
  
  # Convert the fields that should behave like numbers.
  meta <- meta %>%
    mutate(
      age = readr::parse_number(as.character(age)),
      time_since_prescription_of_a_myoelectric_prosthesis_years =
        readr::parse_number(as.character(
          time_since_prescription_of_a_myoelectric_prosthesis_years
        ))
    )
  
  # "Congenital" means there is no time since limb loss, so we encode that as 0.
  meta <- meta %>%
    mutate(
      time_since_limb_loss_years = ifelse(
        time_since_limb_loss_years == "Congenital",
        0,
        time_since_limb_loss_years
      )
    )
  
  demog_lookup <- meta %>%
    filter(id == "AI1") %>%
    select(id, gender, age, right_sensor)
  
  demog_lookup
  
  # The mapping function only needs sex and age, so we convert the cleaned table
  # into the compact input expected by the quantile map.
  demog <- demog_lookup %>%
    transmute(sex = gender, age = age)
  
  demog
  
  saveRDS(demog_lookup, file.path(derived_dir, "demog_lookup_AI1.rds"))
  saveRDS(demog, file.path(derived_dir, "demog_AI1_for_map.rds"))
}
#> New names:
#> • `` -> `...5`
#> • `` -> `...11`
#> • `` -> `...12`
#> • `` -> `...13`
#> • `` -> `...14`
#> • `` -> `...15`
#> • `` -> `...16`
#> • `` -> `...17`
#> • `` -> `...18`
#> • `` -> `...19`
#> • `` -> `...20`

Combine Activity and Demographics

Now we keep only included days, average the daily summaries, and attach the demographic fields. This is the final data frame passed to the map function.

map_data <- out$daily %>%
  filter(is_included) %>%
  summarise(
    dplyr::across(starts_with("steps"), mean),
    ac = mean(counts)
  ) %>%
  bind_cols(demog) %>%
  pivot_longer(
    cols = c(starts_with("steps"), ac),
    names_to = "measure",
    values_to = "value"
  )

map_data
#> # A tibble: 4 × 4
#>   sex     age measure              value
#>   <chr> <dbl> <chr>                <dbl>
#> 1 F        27 steps_vs_original    7068 
#> 2 F        27 steps_vs_revised     6987 
#> 3 F        27 steps_sdt           20972.
#> 4 F        27 ac                1944556.

The map itself is a compact visual summary of the mean steps and accelerometer counts for the included days, positioned relative to the demographic inputs.

actiquantiles::acti_map_nhanes(map_data)
#> # A tibble: 4 × 5
#>   sex     age measure              value acti_pa_quantile
#>   <chr> <dbl> <chr>                <dbl>            <dbl>
#> 1 F        27 steps_vs_original    7068            0.212 
#> 2 F        27 steps_vs_revised     6987            0.250 
#> 3 F        27 steps_sdt           20972.          NA     
#> 4 F        27 ac                1944556.           0.0977

Why This Structure Helps

The report separates the analysis into small, named steps so each part is easy to verify:

The raw files are downloaded to persistent paths, so future knits do not spend time on repeat downloads.
The GT3X+ reader works directly on the saved file, which keeps the raw data untouched.
The processing function isolates the transformation logic, making it easier to test or reuse.
The demographic reconstruction is hidden by default but fully documented and cached.
The final mapping step uses the cleaned, joined data rather than a manually assembled example table.

That makes the document suitable both as a narrative report and as a precise reproduction of the original analysis workflow.