Getting Started with ggplot2 in R

By Kelvin Kiprono in ggplot2

September 15, 2024

ggplot2 is one of the most popular data visualization packages in R, known for its versatility and ease of creating beautiful, complex graphics. It’s built on the “grammar of graphics” concept, which provides a logical structure for building a plot in layers. Whether you’re new to data visualization or experienced in R, ggplot2 offers an intuitive and powerful approach to creating plots.

Key Components of ggplot2

The grammar of graphics in ggplot2 consists of several components:

  • Data: The dataset you want to plot.
  • Aesthetics (aes): Mappings that define how data variables are visualized, such as x and y coordinates, colors, shapes, and sizes.
  • Geometries (geom): The type of plot (e.g., points, lines, bars).
  • Facets: Split the data into subsets and draw one panel per subset (e.g., one panel for each group).
  • Statistics (stat): Transformations that summarise the data before it is drawn (e.g., counts, averages).
  • Coordinates (coord): The coordinate system that maps data values onto the plotting space.
  • Themes: Adjust the non-data elements, such as text, axes, backgrounds, and colors.
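
As a rough illustration of how these components fit together, the sketch below builds one plot with a layer for each of them. It uses the built-in mtcars data purely for convenience (an assumption, not part of this tutorial's dataset), and the column choices are arbitrary.

```r
library(ggplot2)

# Each line adds one grammar component to the plot.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +  # data + aesthetics
  geom_point() +                                                   # geometry
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +        # statistic
  facet_wrap(~ am) +                                               # facets
  coord_cartesian(xlim = c(1, 6)) +                                # coordinates
  theme_minimal()                                                  # theme

p
```

Because every component is added with `+`, any one of them can be swapped or dropped without touching the rest.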

We are going to explore the penguins data from the palmerpenguins package.

Loading the dataset

library(palmerpenguins)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
head(penguins, 5)
## # A tibble: 5 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## 4 Adelie  Torgersen           NA            NA                  NA          NA
## 5 Adelie  Torgersen           36.7          19.3               193        3450
## # ℹ 2 more variables: sex <fct>, year <int>
tail(penguins, 5)
## # A tibble: 5 × 8
##   species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>     <fct>           <dbl>         <dbl>             <int>       <int>
## 1 Chinstrap Dream            55.8          19.8               207        4000
## 2 Chinstrap Dream            43.5          18.1               202        3400
## 3 Chinstrap Dream            49.6          18.2               193        3775
## 4 Chinstrap Dream            50.8          19                 210        4100
## 5 Chinstrap Dream            50.2          18.7               198        3775
## # ℹ 2 more variables: sex <fct>, year <int>
levels(penguins$species)
## [1] "Adelie"    "Chinstrap" "Gentoo"
levels(penguins$sex)
## [1] "female" "male"
count(penguins, sex)
## # A tibble: 3 × 2
##   sex        n
##   <fct>  <int>
## 1 female   165
## 2 male     168
## 3 <NA>      11

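ggplot2 is already attached as part of the tidyverse, so the explicit library(ggplot2) call is optional here. If the packages are not installed yet, install them first with install.packages(c("ggplot2", "ggthemes")). The first plot maps flipper length to the x-axis, body mass to the y-axis, and sex to colour, then adds a linear fit for each group.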

library(ggplot2)

# Scatter plot with one linear fit per sex
ggplot(penguins, aes(flipper_length_mm, body_mass_g, colour = sex)) +
  geom_point() +
  geom_smooth(method = "lm") +
  # theme_tufte() is a complete theme; it would override any theme added before it
  ggthemes::theme_tufte() +
  ggtitle("Relationship between flipper length and body mass by sex")
## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_smooth()`).

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

  • The plot shows the relationship between flipper length and body mass, with points and fitted lines coloured by sex.

  • ggplot2 allows for extensive customization through themes. Use theme_minimal(), theme_classic(), or other complete themes to change the overall appearance, or customize individual elements with theme().

  • ggthemes::theme_tufte() removes the background fill and grid lines. Because it is a complete theme, it replaces any complete theme added earlier in the chain, so only the last one takes effect.
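
To make the theme() point concrete, here is a minimal sketch that starts from a complete theme and then adjusts a few individual elements. The specific tweaks (legend position, bold title, no minor grid) and the axis labels are illustrative choices, not settings taken from the plots in this post.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(flipper_length_mm, body_mass_g, colour = sex)) +
  geom_point(na.rm = TRUE) +                    # drop missing rows silently
  theme_minimal() +                             # complete theme goes first
  theme(
    legend.position = "bottom",                 # move the legend below the plot
    plot.title = element_text(face = "bold"),   # bold title
    panel.grid.minor = element_blank()          # remove minor grid lines
  ) +
  labs(
    title = "Flipper length and body mass by sex",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    colour = "Sex"
  )

p
```

The theme() call has to come after the complete theme, otherwise the complete theme would reset the individual tweaks.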

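Faceting shows the same relationship in a separate panel for each sex:

# One panel per sex; penguins with missing sex get their own NA panel
ggplot(penguins, aes(flipper_length_mm, body_mass_g)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~ sex) +
  ggthemes::theme_tufte() +
  ggtitle("Relationship between flipper length and body mass, faceted by sex")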
## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_smooth()`).

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
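
The "Removed 2 rows" warnings above come from penguins with missing measurements, and the rows with missing sex end up in a panel of their own. One possible way to handle both, sketched here under the assumption that dropping incomplete rows is acceptable for your analysis, is to filter the data before plotting. The name penguins_complete is introduced only for this example.

```r
library(ggplot2)
library(palmerpenguins)

# Keep only rows where all three plotted variables are present.
vars <- c("flipper_length_mm", "body_mass_g", "sex")
penguins_complete <- penguins[complete.cases(penguins[, vars]), ]

faceted <- ggplot(penguins_complete, aes(flipper_length_mm, body_mass_g)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +  # explicit formula avoids the message
  facet_wrap(~ sex)

faceted
```

Filtering up front makes the exclusion explicit instead of leaving it to a warning.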

library(plotly)
## Warning: package 'plotly' was built under R version 4.4.2

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

plotly is an R package for creating interactive, web-ready visualizations. Its ggplotly() function converts an existing ggplot2 object into an interactive plot with hover tooltips, zooming, and panning, which is particularly useful for exploring data, sharing findings, and building dashboards.

# Build the static plot first, then convert it to an interactive one
relationship <- ggplot(penguins, aes(flipper_length_mm, body_mass_g, colour = sex)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggthemes::theme_tufte() +
  ggtitle("Relationship between flipper length and body mass by sex")
ggplotly(relationship)
## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_smooth()`).
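
By default the hover tooltip lists every mapped aesthetic. As a small sketch, the tooltip argument of ggplotly() can restrict it to the aesthetics you care about. The object names p and interactive_plot are made up for this example, and the choice of aesthetics is only illustrative.

```r
library(ggplot2)
library(plotly)
library(palmerpenguins)

p <- ggplot(penguins, aes(flipper_length_mm, body_mass_g, colour = sex)) +
  geom_point(na.rm = TRUE)

# Show only the mapped x, y and colour values when hovering over a point.
interactive_plot <- ggplotly(p, tooltip = c("x", "y", "colour"))

interactive_plot
```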
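
A histogram shows how flipper length is distributed within each species. With position = "identity" the bars overlap instead of stacking, so alpha = 0.5 keeps all three species visible.

ggplot(data = penguins, aes(x = flipper_length_mm)) +
  geom_histogram(aes(fill = species), alpha = 0.5, position = "identity") +
  scale_fill_manual(values = c("darkorange", "darkorchid", "cyan4"))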
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_bin()`).

The same histogram can be stored in an object and passed to ggplotly() to make it interactive:

Flipper <- ggplot(data = penguins, aes(x = flipper_length_mm)) +
  geom_histogram(aes(fill = species), alpha = 0.5, position = "identity") +
  scale_fill_manual(values = c("darkorange", "darkorchid", "cyan4"))
ggplotly(Flipper)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_bin()`).
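
The `stat_bin()` message is a reminder that the default of 30 bins is rarely the best choice. A minimal sketch of setting the bin width explicitly follows. The value of 5 mm is an arbitrary assumption chosen for illustration, and flipper_hist is a name introduced only here.

```r
library(ggplot2)
library(palmerpenguins)

flipper_hist <- ggplot(penguins, aes(x = flipper_length_mm)) +
  geom_histogram(
    aes(fill = species),
    binwidth = 5,           # each bar spans 5 mm of flipper length
    alpha = 0.5,
    position = "identity",  # overlap the species instead of stacking them
    na.rm = TRUE            # drop missing values without a warning
  ) +
  scale_fill_manual(values = c("darkorange", "darkorchid", "cyan4")) +
  labs(x = "Flipper length (mm)", y = "Count", fill = "Species")

flipper_hist
```

Try a few widths: bins that are too narrow produce a noisy plot, while bins that are too wide hide the differences between species.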

Conclusion

With its layered grammar approach, ggplot2 is ideal for creating both basic and complex visualizations. It’s an essential package for anyone working with data in R and is highly flexible, enabling users to convey insights effectively with minimal code. Try experimenting with different geoms, facets, and themes to unlock the full potential of your data visualizations!
