Continuous - can take any value within a range (measurement data)
Discrete - takes on separate, distinct values (countable data)
Create a (random) daily-life dataset
library(tidyverse) # let's load in everything
set.seed(7) # we are going to create something random
# the 'seed' is so my random looks like your random
N <- 60 # number of observations
lifelog <- tibble(
  id = 1:N, # identifier (categorical by meaning)
  age = sample(18:35, N, replace = TRUE), # numeric (discrete integer)
  height_cm = round(rnorm(N, mean = 170, sd = 10), 1), # continuous
  commute_mode = sample(c("Walk", "Bike", "Transit", "Car"), N, replace = TRUE), # categorical (nominal)
  coffee_cups = sample(0:5, N, replace = TRUE), # counts (discrete numeric)
  coffee_today = coffee_cups > 0, # logical/binary (categorical by meaning)
  study_hours = round(runif(N, 0, 6), 1), # continuous
  mood = factor(sample(c("Low", "Medium", "High"), N, replace = TRUE),
                levels = c("Low", "Medium", "High"), ordered = TRUE), # categorical (ordinal)
  zip_code = sample(c("20001", "20002", "20037", "20052"), N, replace = TRUE) # numeric-looking categorical
)
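A quick base-R check of what `set.seed()` buys us, as a minimal sketch: resetting the same seed before the same `sample()` call reproduces the draws exactly. The names `first_draw` and `second_draw` are illustrative only. One caveat: R 3.6.0 changed the default algorithm behind `sample()`, so the same seed can give different draws on much older R versions.

```r
# Draw 5 ages with seed 7 ...
set.seed(7)
first_draw <- sample(18:35, 5, replace = TRUE)

# ... then reset to the same seed and repeat the identical call
set.seed(7)
second_draw <- sample(18:35, 5, replace = TRUE)

# Same seed + same call -> the same "random" values
identical(first_draw, second_draw)
```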
The Two Big Families
| Family | Meaning | R classes you’ll see | Typical summaries | Typical visuals |
|---|---|---|---|---|
| Categorical | labels/groups (nominal or ordered) | factor, ordered, character, logical | counts, proportions | bar charts, stacked bars |
| Continuous | measurements in a range | numeric, double, integer | mean/median, sd/IQR, quantiles | histogram, line plot, scatterplot, boxplot |
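As a rough, class-based first pass at the table above, the sketch below sorts a few small vectors into the two families. `variable_family()` is a hypothetical helper written for this example, not a function from base R or the tidyverse.

```r
# Hypothetical helper: map an R vector to one of the two families by its class
variable_family <- function(x) {
  if (is.factor(x) || is.character(x) || is.logical(x)) {
    "Categorical"   # factor (including ordered), character, logical
  } else if (is.numeric(x)) {
    "Continuous"    # double or integer
  } else {
    "Other"
  }
}

# Small made-up vectors mirroring some lifelog columns
examples <- list(
  mood         = factor(c("Low", "High"), levels = c("Low", "Medium", "High"), ordered = TRUE),
  zip_code     = c("20001", "20052"),
  coffee_today = c(TRUE, FALSE),
  height_cm    = c(168.2, 181.5),
  coffee_cups  = c(0L, 3L)
)

sapply(examples, variable_family)
```

Class is only a starting point: `id` in `lifelog` is an integer but categorical by meaning, and `zip_code` counts as categorical partly because it was stored as character. The meaning of the variable decides the family; the class tells you how R will treat it by default.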
Categorical Data
Classify
What counts as categorical?
Nominal: commute_mode, zip_code
Ordinal: mood (Low < Medium < High)
Binary: coffee_today (TRUE/FALSE)
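The three kinds of categorical variable behave differently in R. A minimal base-R sketch with small made-up vectors (not the `lifelog` columns):

```r
# Nominal: an unordered factor; the levels have no ranking
commute <- factor(c("Walk", "Car", "Bike"))
is.ordered(commute)   # FALSE

# Ordinal: an ordered factor; comparisons follow the level order
mood <- factor(c("High", "Low", "Medium"),
               levels = c("Low", "Medium", "High"), ordered = TRUE)
is.ordered(mood)      # TRUE
mood[1] > mood[2]     # TRUE, because High > Low
max(mood)             # High

# Binary: a logical vector; sum() counts TRUEs, mean() gives their proportion
coffee_today <- c(TRUE, FALSE, TRUE, TRUE)
sum(coffee_today)     # 3
mean(coffee_today)    # 0.75
```

Comparing two values of an unordered factor with `<` or `>` returns `NA` with a warning, which is R's way of saying nominal categories have no order.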
Summarize one categorical variable
# counts & proportions
lifelog |>
  count(commute_mode) |>
  mutate(prop = n / sum(n)) |>
  arrange(desc(n))
# A tibble: 4 × 3
commute_mode n prop
<chr> <int> <dbl>
1 Car 17 0.283
2 Walk 17 0.283
3 Transit 15 0.25
4 Bike 11 0.183
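The same counts-and-proportions summary can be done in base R with `table()` and `prop.table()`. This is a sketch on a small hypothetical vector, not the `lifelog` data, so the numbers differ from the output above.

```r
# A small hypothetical sample of commute modes
commute_mode <- c("Walk", "Car", "Car", "Bike", "Walk", "Car")

counts <- table(commute_mode)              # count per category
counts <- sort(counts, decreasing = TRUE)  # largest group first, like arrange(desc(n))
props  <- prop.table(counts)               # n / sum(n)

counts           # Car 3, Walk 2, Bike 1
round(props, 3)  # Car 0.5, Walk 0.333, Bike 0.167
```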
Visualize
Single categorical variable: use a bar chart
ggplot(lifelog, aes(x = commute_mode)) +
  geom_bar(fill = "chartreuse4") +
  labs(x = "Commute mode", y = "Count")
Visualize
Two categorical variables: dodged bars
ggplot(lifelog, aes(x = commute_mode, fill = mood)) +
  geom_bar(position = "dodge") +
  labs(x = "Commute mode", y = "Count", fill = "Mood (ordinal)")
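Under the hood, a dodged bar chart draws one bar per cell of the two-way count table. A base-R sketch on a small hypothetical sample (not `lifelog`) makes that table explicit:

```r
# Small hypothetical sample: commute mode and mood for 6 people
commute_mode <- c("Walk", "Walk", "Car", "Car", "Car", "Bike")
mood <- factor(c("Low", "High", "High", "High", "Medium", "Low"),
               levels = c("Low", "Medium", "High"), ordered = TRUE)

# Rows = commute mode, columns = mood; each cell is the height of one dodged bar
two_way <- table(commute_mode, mood)
two_way
```

Cells with a count of 0 (for example Bike with High mood here) produce no bar at all in `geom_bar()`, because there are no rows for that combination.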
Visualize
Two categorical variables: 100% stacked bars (proportions)
ggplot(lifelog, aes(x = commute_mode, fill = mood)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Commute mode", y = "Proportion", fill = "Mood")
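With `position = "fill"`, each bar is rescaled to a total height of 1, so the segments show the share of each mood within a commute mode, i.e. the row proportions of the two-way table. A base-R sketch using a small hypothetical sample (not `lifelog`):

```r
# Small hypothetical sample: commute mode and mood for 6 people
commute_mode <- c("Walk", "Walk", "Car", "Car", "Car", "Bike")
mood <- factor(c("Low", "High", "High", "High", "Medium", "Low"),
               levels = c("Low", "Medium", "High"), ordered = TRUE)

# margin = 1 divides each row by its row total, so every commute mode sums to 1
within_mode <- prop.table(table(commute_mode, mood), margin = 1)
round(within_mode, 2)
```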
Continuous Data
Classify
What counts as continuous?
Measurements: height_cm, study_hours
Counts (discrete numeric): coffee_cups (summarized with the same numeric tools, but integer-valued)
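The typical continuous summaries from the family table (mean/median, sd/IQR, quantiles) are all base-R functions. A minimal sketch on made-up values, not the `lifelog` columns:

```r
# Hypothetical heights (cm) and daily coffee counts
height_cm   <- c(158.4, 165.0, 170.2, 171.8, 176.5, 189.1)
coffee_cups <- c(0L, 1L, 1L, 2L, 3L, 4L)

# Center
mean(height_cm)
median(height_cm)   # average of the two middle values: 171

# Spread
sd(height_cm)
IQR(height_cm)
quantile(height_cm, probs = c(0.25, 0.5, 0.75))

# Counts take the same numeric summaries but are integer-valued
is.integer(coffee_cups)  # TRUE
mean(coffee_cups)        # a mean of counts need not be a whole number
```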