Example of the script we will produce during the workshop

Setup

Here we will load the libraries we will use to facilitate preparing and analyzing the data.

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.2     ✔ purrr   1.0.2
✔ tibble  3.2.1     ✔ dplyr   1.1.1
✔ tidyr   1.2.0     ✔ stringr 1.4.1
✔ readr   2.1.2     ✔ forcats 1.0.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(broom)
library(easystats)
# Attaching packages: easystats 0.6.0 (red = needs update)
✖ bayestestR  0.13.1   ✖ correlation 0.8.4 
✖ datawizard  0.8.0    ✖ effectsize  0.8.3 
✖ insight     0.19.7   ✖ modelbased  0.8.6 
✖ performance 0.10.4   ✖ parameters  0.21.1
✖ report      0.5.7    ✖ see         0.8.0 

Restart the R-Session and update packages in red with `easystats::easystats_update()`.
library(effectsize)

Read data

Let’s read the data from a URL (online) using the tidyverse function read_csv.

data <- read_csv("https://raw.githubusercontent.com/mario-bermonti/talks/refs/heads/main/intro_r_cognition/data.csv")
Rows: 100 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): id, group
dbl (2): digit_span_forward, digit_span_backward

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
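
The message above suggests specifying the column types. A minimal sketch of what that could look like, assuming we would like group stored as a factor. To run without an internet connection, it uses three of the rows shown by head(data) as inline text (wrapped in I()) instead of the URL:

```r
library(readr)

# Three rows of the workshop data, typed inline so the example runs offline.
csv_text <- I("id,group,digit_span_forward,digit_span_backward
S001,healthy,5,6
S002,healthy,4,4
S003,healthy,8,5")

# Declaring the column types up front silences the message and
# stores `group` as a factor instead of plain text.
data_typed <- read_csv(
    csv_text,
    col_types = cols(
        id = col_character(),
        group = col_factor(),
        digit_span_forward = col_double(),
        digit_span_backward = col_double()
    )
)

data_typed
```

The same col_types argument can be passed when reading from the URL. The rest of this script keeps the default types guessed by read_csv.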

Inspect the data

The first step when we get data is to inspect it to understand its structure and content.

First few rows

head(data)
id    group    digit_span_forward  digit_span_backward
S001  healthy  5                   6
S002  healthy  4                   4
S003  healthy  8                   5
S004  healthy  2                   4
S005  healthy  8                   5
S006  healthy  5                   5

Number of rows and columns

dim(data)
[1] 100   4

Variable types

str(data)
spc_tbl_ [100 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ id                 : chr [1:100] "S001" "S002" "S003" "S004" ...
 $ group              : chr [1:100] "healthy" "healthy" "healthy" "healthy" ...
 $ digit_span_forward : num [1:100] 5 4 8 2 8 5 7 6 9 6 ...
 $ digit_span_backward: num [1:100] 6 4 5 4 5 5 5 2 7 5 ...
 - attr(*, "spec")=
  .. cols(
  ..   id = col_character(),
  ..   group = col_character(),
  ..   digit_span_forward = col_double(),
  ..   digit_span_backward = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
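
Two further checks are often useful at this stage: how many participants each group has and whether any values are missing. The sketch below uses a small hypothetical data set (toy), not the workshop data, so the values are for illustration only:

```r
library(dplyr)

# Hypothetical toy data with the same columns as the workshop data.
toy <- tibble(
    id = c("S001", "S002", "S003", "S004"),
    group = c("healthy", "healthy", "depression", "depression"),
    digit_span_forward = c(5, 4, 6, NA),
    digit_span_backward = c(6, 4, 5, 3)
)

# How many participants are in each group?
group_sizes <- count(toy, group)

# How many missing values does each column have?
missing_per_column <- colSums(is.na(toy))

group_sizes
missing_per_column
```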

Effect of MDD on digit span forward

We will examine the effect of Major Depressive Disorder (MDD) on the digit span forward task.

We will begin by visualizing the data, then we will calculate descriptive statistics, and finally, we will perform inferential statistics.

Viz

Box plot

We will start by creating a box plot to visualize the data.

We will use ggplot2 to create the plot. This is one of the most popular packages for data visualization in R.

Plots with ggplot2 are built in layers, where each layer adds elements on top of the previous one. We add these layers using +, which is convenient because addition is an intuitive operation.

ggplot(
    data,
    aes(x = group, y = digit_span_forward)
) +
geom_boxplot()
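
To make the idea of layers more concrete, here is a sketch that stacks a second layer, axis labels, and a theme on top of a box plot. The toy data frame is hypothetical and only serves to make the example self-contained:

```r
library(ggplot2)

# Hypothetical toy data, for illustration only.
toy <- data.frame(
    group = rep(c("depression", "healthy"), each = 4),
    digit_span_forward = c(3, 4, 5, 6, 5, 6, 7, 8)
)

# Each `+` adds a layer or setting on top of the previous ones.
p <- ggplot(toy, aes(x = group, y = digit_span_forward)) +
    geom_boxplot() +                               # layer 1: box plot
    geom_jitter(width = 0.1) +                     # layer 2: individual observations
    labs(x = "Group", y = "Digit span forward") +  # axis labels
    theme_minimal()                                # overall look

p
```

Removing any of the lines after the first + still gives a valid plot, which makes it easy to build a figure step by step.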

Bar plot

Bar plots are a little bit different and require us to calculate the mean before plotting.

Let’s calculate the means we will plot.

Notice that we are now using the |> symbol, called the pipe operator. It is a convenient way to perform several operations on the data in sequence because it “chains” the operations together: the result on its left is passed as the first argument of the function on its right. You should read it as:

“do operation 1”

“and then (|>)”

“do operation 2”

If you think about it, this is a natural way to express the steps below:

  • take the data
  • and then group it by the group variable
  • and then calculate the mean of the digit_span_forward variable
  • and then calculate the mean of the digit_span_backward variable

means <- data |>
    group_by(group) |>
    summarise(
        mean_dsf = mean(digit_span_forward),
        mean_dsb = mean(digit_span_backward)
    )

means
group       mean_dsf  mean_dsb
depression  4.88      4.22
healthy     5.78      5.40
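
To see what the pipe does, it can help to compare a piped expression with the equivalent nested function calls. A minimal sketch using only base R and a made-up vector of numbers:

```r
# The same computation written two ways.
values <- c(4, 9, 16)

# Nested calls: read from the inside out.
nested <- sum(sqrt(values))

# Piped calls: read from top to bottom.
piped <- values |>
    sqrt() |>
    sum()

identical(nested, piped)  # TRUE
```

Both versions take the square roots and then add them up. The piped version lists the steps in the order they happen.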

Now, let’s build the bar plot.

ggplot(
    data = means,
    aes(x = group, y = mean_dsf)
) + 
geom_bar(stat = "identity")
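
As a side note, ggplot2 also offers geom_col(), which plots the values exactly as they are given and should produce the same figure as geom_bar(stat = "identity"). A sketch, with the group means typed in by hand from the means table above:

```r
library(ggplot2)

# Group means copied from the `means` table above.
means_dsf <- data.frame(
    group = c("depression", "healthy"),
    mean_dsf = c(4.88, 5.78)
)

# geom_col() uses the y values as bar heights without counting rows.
p_col <- ggplot(means_dsf, aes(x = group, y = mean_dsf)) +
    geom_col()

p_col
```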

Violin plot

One great thing about R is that it can create all sorts of beautiful and informative plots.

Violin and dot plots are great ways to visualize the distribution of the data. We will combine them into a single plot using ggplot2 together with geom_violindot from easystats’s see package.

ggplot(
    data,
    aes(x = group, y = digit_span_forward)
) + 
geom_violindot()

Descriptive stats

As we know, summarizing the data is a crucial step in data analysis. It helps us understand the data by identifying patterns, checking for errors, and verifying assumptions.

Let’s calculate the mean, standard deviation, minimum, and maximum of the digit_span_forward variable.

This is easily achieved in R using the tidyverse’s summarise function (from the dplyr package).

General

Let’s first summarize the data without considering the groups.

data |>
    summarise(
        mean_dsf = mean(digit_span_forward),
        sd_dsf = sd(digit_span_forward),
        min_dsf = min(digit_span_forward),
        max_dsf = max(digit_span_forward)
    )
mean_dsf  sd_dsf   min_dsf  max_dsf
5.33      1.98507  2        10

By group

Now, let’s summarize the data for each group to allow group comparisons.

data |>
    group_by(group) |>
    summarise(
        mean_dsf = mean(digit_span_forward),
        sd_dsf = sd(digit_span_forward),
        min_dsf = min(digit_span_forward),
        max_dsf = max(digit_span_forward)
    )
group       mean_dsf  sd_dsf    min_dsf  max_dsf
depression  4.88      1.814229  2        9
healthy     5.78      2.063186  2        10
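
When comparing groups it is often helpful to also report the group sizes and the standard error of the mean. The sketch below shows one way to add them inside summarise. The toy data are hypothetical and the column names n and se_dsf are just illustrative choices:

```r
library(dplyr)

# Hypothetical toy data, for illustration only.
toy <- tibble(
    group = rep(c("depression", "healthy"), each = 3),
    digit_span_forward = c(4, 5, 6, 5, 6, 7)
)

summary_dsf <- toy |>
    group_by(group) |>
    summarise(
        n = n(),                              # participants per group
        mean_dsf = mean(digit_span_forward),
        sd_dsf = sd(digit_span_forward),
        se_dsf = sd_dsf / sqrt(n)             # standard error of the mean
    )

summary_dsf
```

Columns created earlier in the same summarise call (here n and sd_dsf) can be reused in later lines, which is why se_dsf can be computed in the same step.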

Inferential

Now that we understand the data better, we can perform inferential statistics to test if the differences we observed are statistically significant.

We will use a t-test because we are comparing the means of two groups.

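R provides the t.test function, which performs a t-test. By default it runs Welch’s t-test, which does not assume equal variances in the two groups. Pay attention to the formula syntax: the outcome goes on the left of ~ and the predictor (the grouping variable) goes on the right.

The final step, tidy (from the broom package), formats the results as a table.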

results_ttest <- t.test(
    digit_span_forward ~ group,
    data = data
)

tidy(results_ttest)
estimate  estimate1  estimate2  statistic  p.value   parameter  conf.low   conf.high   method                   alternative
-0.9      4.88       5.78       -2.316364  0.022656  96.42301   -1.671202  -0.1287975  Welch Two Sample t-test  two.sided
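
The output above reports a Welch test and a negative estimate. The sketch below, which uses hypothetical toy data and only base R, illustrates where both come from: t.test runs Welch’s test unless we ask otherwise, and the estimate is the mean of the first group minus the mean of the second, with groups taken in alphabetical order.

```r
# Hypothetical toy data, for illustration only.
toy <- data.frame(
    group = rep(c("depression", "healthy"), each = 5),
    score = c(3, 4, 5, 4, 6, 5, 6, 7, 6, 8)
)

# Formula syntax: outcome on the left of `~`, grouping variable on the right.
welch <- t.test(score ~ group, data = toy)

# Student's t-test, which additionally assumes equal variances.
student <- t.test(score ~ group, data = toy, var.equal = TRUE)

welch$method
student$method

# Difference in means: mean(depression) - mean(healthy).
mean_difference <- unname(welch$estimate[1] - welch$estimate[2])
mean_difference
```

A negative difference therefore means that the depression group scored lower than the healthy group.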

Let’s also calculate the standardized effect size, Hedges’ g, to understand the magnitude of the effect.

Again, notice the formula syntax used to specify the predictor and outcome.

hedges_g(
  digit_span_forward ~ group,
  data = data
)
Hedges_g    CI    CI_low      CI_high
-0.4597168  0.95  -0.8528339  -0.0643165
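
To see what this number means, we can approximate it by hand from the group summaries reported earlier: Hedges’ g is the mean difference divided by the pooled standard deviation, multiplied by a small-sample correction. This sketch assumes 50 participants per group, which is not stated in the output but is consistent with the overall mean (5.33) lying exactly midway between the two group means. It also uses a common approximation of the correction factor, so the last decimals may differ slightly from hedges_g.

```r
# Group summaries copied from the descriptive statistics above.
mean_dep <- 4.88
sd_dep <- 1.814229
mean_hea <- 5.78
sd_hea <- 2.063186

# Assumption: 50 participants per group (not stated in the output above).
n_dep <- 50
n_hea <- 50

# Pooled standard deviation.
df <- n_dep + n_hea - 2
sd_pooled <- sqrt(((n_dep - 1) * sd_dep^2 + (n_hea - 1) * sd_hea^2) / df)

# Cohen's d: mean difference in units of the pooled SD.
cohens_d <- (mean_dep - mean_hea) / sd_pooled

# Hedges' small-sample correction (common approximation).
correction <- 1 - 3 / (4 * df - 1)
hedges_g_manual <- cohens_d * correction

round(hedges_g_manual, 2)  # -0.46
```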

Effect of MDD on digit span backward

Let’s now complete a second exercise, this time focusing on the digit span backward task.

You will not find extensive descriptions because the code is mostly the same, except for the variable names.
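
Because the steps differ only in the variable name, an optional way to avoid repeating code is to summarize both tasks in a single call with dplyr’s across. This is a sketch with hypothetical toy data, not part of the workshop script:

```r
library(dplyr)

# Hypothetical toy data, for illustration only.
toy <- tibble(
    group = rep(c("depression", "healthy"), each = 3),
    digit_span_forward = c(4, 5, 6, 5, 6, 7),
    digit_span_backward = c(3, 4, 5, 4, 5, 6)
)

# Apply the same summary functions to both task columns at once.
summary_both <- toy |>
    group_by(group) |>
    summarise(
        across(
            c(digit_span_forward, digit_span_backward),
            list(mean = mean, sd = sd)
        )
    )

names(summary_both)
```

The new columns are named after the original column and the function, for example digit_span_backward_mean.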

Viz

Box plot

ggplot(
    data,
    aes(x = group, y = digit_span_backward)
) +
geom_boxplot()

Bar plot

Let’s calculate the means we will plot.

means <- data |>
    group_by(group) |>
    summarise(
        mean_dsb = mean(digit_span_backward)
    )

means
group       mean_dsb
depression  4.22
healthy     5.40

Let’s build the visualization.

ggplot(
    data = means,
    aes(x = group, y = mean_dsb)
) + 
geom_bar(stat = "identity")

Violin plot

ggplot(
    data,
    aes(x = group, y = digit_span_backward)
) + 
geom_violindot()

Descriptive stats

General

data |>
    summarise(
        mean_dsb = mean(digit_span_backward),
        sd_dsb = sd(digit_span_backward),
        min_dsb = min(digit_span_backward),
        max_dsb = max(digit_span_backward)
    )
mean_dsb  sd_dsb    min_dsb  max_dsb
4.81      1.567956  2        9

By group

data |>
    group_by(group) |>
    summarise(
        mean_dsb = mean(digit_span_backward),
        sd_dsb = sd(digit_span_backward),
        min_dsb = min(digit_span_backward),
        max_dsb = max(digit_span_backward)
    )
group       mean_dsb  sd_dsb    min_dsb  max_dsb
depression  4.22      1.374550  2        7
healthy     5.40      1.538618  2        9

Inferential

results_ttest <- t.test(
    digit_span_backward ~ group,
    data = data
)

tidy(results_ttest)
estimate  estimate1  estimate2  statistic  p.value    parameter  conf.low   conf.high   method                   alternative
-1.18     4.22       5.4        -4.044164  0.0001056  96.77985   -1.759116  -0.6008838  Welch Two Sample t-test  two.sided

hedges_g(
  digit_span_backward ~ group,
  data = data
)
Hedges_g    CI    CI_low     CI_high
-0.8026242  0.95  -1.205577  -0.395881
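
To put the two effect sizes side by side, one option is to attach conventional verbal labels with interpret_hedges_g from the effectsize package. The sketch below copies the values from the outputs in this script and assumes Cohen’s (1988) rules of thumb. These labels are only a rough guide and do not replace judgment about what counts as a meaningful difference in a given field.

```r
library(effectsize)

# Effect sizes copied from the hedges_g() outputs in this script.
g_forward <- -0.4597168
g_backward <- -0.8026242

# Label each effect using Cohen's (1988) conventions.
label_forward <- interpret_hedges_g(g_forward, rules = "cohen1988")
label_backward <- interpret_hedges_g(g_backward, rules = "cohen1988")

label_forward
label_backward
```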