[1] '2.0.0'
Sarah Cassie Burnett
September 4, 2025
or install it during lecture:
install.packages("tidyverse")
Scan and fill out for participation credit
(these take a whole vector and return a single summary value)
mean(): mean of a set of numbers
median(): median of a set of numbers
sd(): standard deviation of a set of numbers
sum(): sum of a set of numbers
length(): length of a vector
max() / min(): maximum and minimum values of a vector

(these apply a transformation to each element of a vector)
round(): round to a specified number of decimal places
sqrt(): square root
log(): natural logarithm
exp(): exponential
abs(): absolute value

read_csv() is a function
sample() is a function
readr is a package that contains the read_csv() function
ggplot2 is a package that contains the ggplot() function

install.packages() to install packages
library() to load packages

readr for reading data
tidyr for data tidying
dplyr for data manipulation
ggplot2 for data visualization

library(ggplot2)
library(tidyverse)
::, e.g. ggplot2::ggplot()

What is this?
name <- c("Cars", "WALL-E", "The Lego Movie", "PAW Patrol: The Movie")
lead_person <- c("Lightning McQueen (Owen Wilson)",
                 "WALL-E (Ben Burtt)",
                 "Emmet Brickowski (Chris Pratt)",
                 "Ryder (Will Brisbin)")
length_minutes <- c(120, 97, 101, 86)
award <- c(TRUE, TRUE, TRUE, FALSE)
df <- data.frame(
  name,
  lead_person,
  length_minutes,
  award
)
Download some data: Download the ZIP version
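Looking back at the vectors that went into df, the summary and element-wise functions listed earlier can be tried on them directly. This is a minimal sketch using only base R; the result names (mean_length, n_awards, and so on) are made up for illustration, and the two-column df_small is a cut-down stand-in for the four-column df from the slide.

```r
# Rebuild two of the slide's vectors so this block runs on its own
length_minutes <- c(120, 97, 101, 86)
award <- c(TRUE, TRUE, TRUE, FALSE)

# Summary functions: the whole vector goes in, one value comes out
mean_length   <- mean(length_minutes)
median_length <- median(length_minutes)
n_awards      <- sum(award)      # TRUE counts as 1 and FALSE as 0

# Element-wise functions: one result per element of the vector
length_hours <- round(length_minutes / 60, 1)

# A data frame stores equal-length vectors as columns
# (df_small is a hypothetical two-column version of the slide's df)
df_small <- data.frame(length_minutes, award)
class(df_small)
nrow(df_small)
```

Each column of a data frame is itself a vector, so df_small$length_minutes can be passed to mean() or round() in the same way.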
Let’s use the readr package to read in a dataset
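As a rough sketch of what reading a CSV with readr looks like: the location of the downloaded course file is not given here, so this example writes a small stand-in file to a temporary path and reads that instead. The object name films and the six-column layout are assumptions for illustration; the values are copied from the first rows of the output shown in these slides, and the real file has more rows and columns.

```r
library(readr)

# Stand-in CSV: three rows copied from the slide output, six of the nine columns
csv_lines <- c(
  "Year,Length,Title,Genre,Popularity,Awards",
  "1990,111,Tie Me Up! Tie Me Down!,Comedy,68,FALSE",
  "1991,113,High Heels,Comedy,68,FALSE",
  "1983,104,\"Dead Zone, The\",Horror,79,FALSE"
)

# Write to a temporary file; with the real download, pass its path instead
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# read_csv() guesses a type for each column and returns a tibble
films <- read_csv(tmp, show_col_types = FALSE)
films
```

The quotes around "Dead Zone, The" keep the comma inside the title from being treated as a column separator.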
One way to do this is with the base R head() function
# A tibble: 6 × 9
Year Length Title Genre `Lead Man` `Lead Woman` Director Popularity Awards
<dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <lgl>
1 1990 111 Tie Me … Come… Banderas,… Abril, Vict… Almodóv… 68 FALSE
2 1991 113 High He… Come… Bosé, Mig… Abril, Vict… Almodóv… 68 FALSE
3 1983 104 Dead Zo… Horr… Walken, C… Adams, Broo… Cronenb… 79 FALSE
4 1979 122 Cuba Acti… Connery, … Adams, Broo… Lester,… 6 FALSE
5 1978 94 Days of… Drama Gere, Ric… Adams, Broo… Malick,… 14 FALSE
6 1983 140 Octopus… Acti… Moore, Ro… Adams, Maud Glen, J… 68 FALSE
View()
Another way to look at the data is with View(), or click on the name of the data frame in the Environment pane.
glimpse() from dplyr
Another way to look at the data is with glimpse() from the dplyr package.
Rows: 1,659
Columns: 9
$ Year <dbl> 1990, 1991, 1983, 1979, 1978, 1983, 1984, 1989, 1985, 199…
$ Length <dbl> 111, 113, 104, 122, 94, 140, 101, 99, 104, 149, 188, 117,…
$ Title <chr> "Tie Me Up! Tie Me Down!", "High Heels", "Dead Zone, The"…
$ Genre <chr> "Comedy", "Comedy", "Horror", "Action", "Drama", "Action"…
$ `Lead Man` <chr> "Banderas, Antonio", "Bosé, Miguel", "Walken, Christopher…
$ `Lead Woman` <chr> "Abril, Victoria", "Abril, Victoria", "Adams, Brooke", "A…
$ Director <chr> "Almodóvar, Pedro", "Almodóvar, Pedro", "Cronenberg, Davi…
$ Popularity <dbl> 68, 68, 79, 6, 14, 68, 14, 28, 6, 32, 81, 17, 46, 49, 6, …
$ Awards <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
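The three ways of looking at a data frame described above can be compared side by side. This sketch builds a small tibble by hand (the name films and the choice of five columns are assumptions; the values come from the output above) so that it runs without the course file.

```r
library(dplyr)

# Hand-built stand-in for the film data: three rows, five columns
films <- tibble(
  Year   = c(1990, 1991, 1983),
  Length = c(111, 113, 104),
  Title  = c("Tie Me Up! Tie Me Down!", "High Heels", "Dead Zone, The"),
  Genre  = c("Comedy", "Comedy", "Horror"),
  Awards = c(FALSE, FALSE, FALSE)
)

# head(): the first rows, laid out as a table (6 by default, 2 requested here)
head(films, 2)

# glimpse(): one line per column, with its type and first few values
glimpse(films)

# View() opens a spreadsheet-style viewer, so only call it in an interactive session
if (interactive()) View(films)
```

head() is convenient for checking values row by row, while glimpse() tends to be easier to read when a data frame has many columns.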
film_cleanish.csv file
dplyr Functions
Use select() to choose columns.
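A minimal sketch of select(), assuming a small hand-built tibble named films with values taken from the output shown earlier. The result names films_small and films_no_genre are hypothetical, and the native pipe |> needs R 4.1 or later.

```r
library(dplyr)

films <- tibble(
  Year   = c(1990, 1991, 1983),
  Length = c(111, 113, 104),
  Title  = c("Tie Me Up! Tie Me Down!", "High Heels", "Dead Zone, The"),
  Genre  = c("Comedy", "Comedy", "Horror")
)

# Keep only the named columns, in the order they are listed
films_small <- films |> select(Title, Year, Length)

# A minus sign drops a column and keeps the rest
films_no_genre <- films |> select(-Genre)

names(films_small)
names(films_no_genre)
```

select() changes which columns are present but never which rows, so both results still have three rows.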
dplyr Functions
Use filter() to choose rows.
Note
Using the same name for the data frame results in overwriting the original data frame. If you want to keep the original data frame, use a different name.
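The difference between saving to a new name and overwriting can be seen in a short filter() sketch. The tibble films is a hand-built stand-in with values from the output shown earlier, and comedies and long_comedies are hypothetical names chosen for this example.

```r
library(dplyr)

films <- tibble(
  Year   = c(1990, 1991, 1983, 1979),
  Length = c(111, 113, 104, 122),
  Title  = c("Tie Me Up! Tie Me Down!", "High Heels", "Dead Zone, The", "Cuba"),
  Genre  = c("Comedy", "Comedy", "Horror", "Action")
)

# New name: films itself is left unchanged
comedies <- films |> filter(Genre == "Comedy")

# Conditions separated by commas must all be TRUE for a row to be kept
long_comedies <- films |> filter(Genre == "Comedy", Length > 111)

# Same name: the four-row films is replaced by the filtered version
films <- films |> filter(Year >= 1980)

nrow(comedies)
nrow(films)
```

After the last assignment the 1979 row is gone from films for the rest of the session; getting it back means re-creating or re-reading the data.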
dplyr Functions
Use mutate() to create new columns.
Rows: 1,659
Columns: 10
$ Year <dbl> 1990, 1991, 1983, 1979, 1978, 1983, 1984, 1989, 1985, 199…
$ Length <dbl> 111, 113, 104, 122, 94, 140, 101, 99, 104, 149, 188, 117,…
$ Title <chr> "Tie Me Up! Tie Me Down!", "High Heels", "Dead Zone, The"…
$ Genre <chr> "Comedy", "Comedy", "Horror", "Action", "Drama", "Action"…
$ `Lead Man` <chr> "Banderas, Antonio", "Bosé, Miguel", "Walken, Christopher…
$ `Lead Woman` <chr> "Abril, Victoria", "Abril, Victoria", "Adams, Brooke", "A…
$ Director <chr> "Almodóvar, Pedro", "Almodóvar, Pedro", "Cronenberg, Davi…
$ Popularity <dbl> 68, 68, 79, 6, 14, 68, 14, 28, 6, 32, 81, 17, 46, 49, 6, …
$ Awards <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
$ Length_hours <dbl> 1.850000, 1.883333, 1.733333, 2.033333, 1.566667, 2.33333…
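The Length_hours values in the output above are consistent with dividing Length by 60 (111 / 60 = 1.85), so a plausible sketch of the mutate() call is shown below. The exact code behind the output is not given in these slides, so treat this as an assumption; films is a small hand-built stand-in.

```r
library(dplyr)

films <- tibble(
  Title  = c("Tie Me Up! Tie Me Down!", "High Heels", "Dead Zone, The"),
  Length = c(111, 113, 104)
)

# mutate() adds a column computed from existing ones; all rows are kept
films <- films |> mutate(Length_hours = Length / 60)

glimpse(films)
```

The new column is appended after the existing ones, which matches the column count in the output going from 9 to 10.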
dplyr verbs to manipulate the data
ggplot2
ggplot2 is a powerful data visualization package
ggplot2 to make a simple column chart
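One possible shape for a simple column chart is sketched here. The choice of Title on the x-axis and Length on the y-axis is an assumption, not the exercise solution; the four rows are taken from the output shown earlier, and the "minutes" axis label assumes Length is measured in minutes.

```r
library(ggplot2)

# Four films from the slide output (hand-built stand-in data)
films <- data.frame(
  Title  = c("Tie Me Up! Tie Me Down!", "High Heels", "Dead Zone, The", "Cuba"),
  Length = c(111, 113, 104, 122)
)

# geom_col() draws one bar per row, with bar height taken from the y variable
p <- ggplot(films, aes(x = Title, y = Length)) +
  geom_col() +
  labs(x = "Title", y = "Length (minutes)")

p
```

The same plot can be built without library(ggplot2) by prefixing each function with the package name, as in ggplot2::ggplot() and ggplot2::geom_col().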