9 Advanced Visualizations

9.1 Preliminaries

9.1.1 Install Packages

For this lecture, you need to install a lot of packages. Please do this before our lecture as it will take a long time to install them. Furthermore, there might be issues in installing, such as needing to install other packages. Please let the professor know if you encounter any issues.

# Install necessary packages
if (!require(gganimate)) install.packages("gganimate")
## Loading required package: gganimate
if (!require(ggplot2)) install.packages("ggplot2")
if (!require(dplyr)) install.packages("dplyr")
if (!require(sf)) install.packages("sf")
## Loading required package: sf
## Linking to GEOS 3.13.1, GDAL 3.11.4, PROJ 9.7.0; sf_use_s2()
## is TRUE
if (!require(rnaturalearth)) install.packages("rnaturalearth")
## Loading required package: rnaturalearth
if (!require(readr)) install.packages("readr")
if (!require(tidyr)) install.packages("tidyr")
if (!require(gifski)) install.packages("gifski")
## Loading required package: gifski
if (!require(WDI)) install.packages("WDI")
## Loading required package: WDI
if (!require(leaflet)) install.packages("leaflet")
## Loading required package: leaflet
if (!require(rnaturalearthdata)) install.packages("rnaturalearthdata")
## Loading required package: rnaturalearthdata
## 
## Attaching package: 'rnaturalearthdata'
## The following object is masked from 'package:rnaturalearth':
## 
##     countries110
#Load packages
library(gganimate)
library(ggplot2)
library(dplyr)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
library(readr)
library(tidyr)
library(gifski)
library(WDI)
library(leaflet)

9.2 Animated Visualizations

gganimate includes animation to ggplot2; It adds some classes to the plot object in order to customise how it should change with time.

  • transition_*() defines how the data should be spread out and how it relates to itself across time.

  • view_*() defines how the positional scales should change along the animation.

  • shadow_*() defines how data from other points in time should be presented in the given point in time.

  • enter_*()/exit_*() defines how new data should appear and how old data should disappear during the course of the animation.

  • ease_aes() defines how different aesthetics should be eased during transitions

    For this lecture, we will use the quotas dataset and fetch some World Bank Development Indicators from the WDI package.

9.2.1 Animated Bar Chart

At the start, you need to do the same steps as that of when doing the visualizations without animations.

rm(list = ls())
gc()
##           used  (Mb) gc trigger  (Mb) max used  (Mb)
## Ncells 3486903 186.3    5173005 276.3  5173005 276.3
## Vcells 6874057  52.5   14786712 112.9 10624508  81.1
# Load the dataset
quotas <- read.csv("quotas.csv")

ch5.1<-quotas %>% 
  ggplot(aes(x=factor(AC_type_noST)))+
  geom_bar(fill="darkgreen", color="black")+
  labs(title = "Animated Bar Chart: Assembly Constituencies",
       x="Reservation Status", y="Count")+
  theme_minimal()+
  transition_states(AC_type_noST, transition_length = 2, state_length = 1)

The transition_states animates transitions between different categorical states. The AC_type_noST is the categorical variable that defines different animation states. The transition_length controls how long the transition between states lasts, measured in animation frames. While the state_length defines how long each state remains static before transitioning to the next one.

This is very important; it is different for Quarto where the gif is automatically rendered; however, later, we will find out how to save in Quarto.

anim_file <- "bar_chart.gif"
animate(ch5.1, duration = 4, fps = 10, renderer = gifski_renderer(anim_file), 
        preview = TRUE)  # Preview the animation before rendering  
# Save animation as a GIF
knitr::include_graphics(anim_file)

The animate function generates the animation of the ggplot object ch5.1. duration = 4: Specifies that the animation should run for 4 seconds in total. fps = 10: Defines the frame rate, meaning the animation will show 10 frames per second. renderer = gifski_renderer(anim_file): Uses the gifski_renderer to save the animation as a GIF file named bar_chart.gif. preview = TRUE: Allows you to view the animation immediately in the RStudio Viewer before saving it as a file.

9.2.2 Animated Scatter Plot

ch5.2<-quotas %>% 
  ggplot(aes(x=Plit71, y=P_W71))+
  geom_point(aes(color=factor(AC_type_noST)))+
  labs(title="Animated Scatter Plot: Literacy vs. Employment",
       x="Literacy Rate (1971)", y="Employment Rate (1971)", color="Reservation Status")+
  theme_minimal()+
  transition_reveal(Plit71)

Here, the transition_reveal animates the points to be revealed over literacy rates

anim_file2<-"scatter_plot.gif"
animate(ch5.2, duration=20, fps=10, renderer=gifski_renderer(anim_file2),
        preview=TRUE)
knitr::include_graphics(anim_file2)

To slow down the animation, increase the duration and decrease fps.

9.2.3 Animated Faceted Scatter Plot

9.2.3.1 Fetching WDI Data

For this portion, we will make use of the WDI database. To search which indicator you wish to work with, type WDIsearch("keyword")

ch5<-WDI(
  country = c("USA","CHN", "IND", "BRA","NLD", "JPN"),
  indicator = c("NY.GDP.PCAP.CD", "SP.DYN.LE00.IN"),
  start = 2000,
  end = 2020,
  extra = TRUE
)

This code chunk pulls data from the World Bank WDI database of 6 countries and two indicators (GDP per capita (current US$) and Life Expectancy at Birth (years). It fetches data from 2000 to 2020 and includes extra metadata such as region names and income levels.

#Cleaning the dataset
ch5 <- ch5 %>%
  rename(GDPpc = NY.GDP.PCAP.CD, Life_Exp = SP.DYN.LE00.IN)
ch5<-ch5 %>% 
  filter(!is.na(GDPpc), !is.na(Life_Exp))
head(ch5)
##   country iso2c iso3c year status lastupdated    GDPpc
## 1  Brazil    BR   BRA 2000         2026-01-28 3766.548
## 2  Brazil    BR   BRA 2001         2026-01-28 3176.290
## 3  Brazil    BR   BRA 2002         2026-01-28 2855.940
## 4  Brazil    BR   BRA 2003         2026-01-28 3090.607
## 5  Brazil    BR   BRA 2004         2026-01-28 3663.823
## 6  Brazil    BR   BRA 2005         2026-01-28 4827.782
##   Life_Exp                    region  capital longitude
## 1   69.584 Latin America & Caribbean Brasilia  -47.9292
## 2   69.980 Latin America & Caribbean Brasilia  -47.9292
## 3   70.396 Latin America & Caribbean Brasilia  -47.9292
## 4   70.884 Latin America & Caribbean Brasilia  -47.9292
## 5   71.361 Latin America & Caribbean Brasilia  -47.9292
## 6   71.832 Latin America & Caribbean Brasilia  -47.9292
##   latitude              income lending
## 1 -15.7801 Upper middle income    IBRD
## 2 -15.7801 Upper middle income    IBRD
## 3 -15.7801 Upper middle income    IBRD
## 4 -15.7801 Upper middle income    IBRD
## 5 -15.7801 Upper middle income    IBRD
## 6 -15.7801 Upper middle income    IBRD
ch5.3<-ch5 %>% 
  ggplot(aes(x=GDPpc, y=Life_Exp))+
  geom_point(aes(color=region), alpha=0.7, size=3)+
  labs(title="Faceted Scatter Plot: GDP vs. Life Expectancy",
       subtitle = "Year: 2000-2020",
       x="GDP per Capita (USD)",
       y="Life Expectancy (Years)",
       color="Region")+
  theme_minimal()+
  facet_wrap(~country, ncol=3)+
  scale_x_log10()+ #log scale for better visualization
  transition_states(year, transition_length = 2, state_length = 1)

There are additional things here like the size=3 which changes the size of points. the scale_x_log10() was added because it applies a logarithmic scale to the x-axis since GDP per capital values vary widely, and a log scale makes comparisons clearer. The transition_states has year since each frame would represent a different year.

anim_file3<-"faceted_scatter_plot.gif"
animate(ch5.3, duration=10, fps=15, renderer=gifski_renderer(anim_file3),
        preview=TRUE)
knitr::include_graphics(anim_file3)

This does not really provide any information. Let us try visualizing in a different way.

9.3 Animated Time Series

Let us just choose USA for this.

ch5_1<-ch5 %>% 
  filter(iso3c=="USA")
ch5.4<-ch5_1 %>% 
  ggplot(aes(x=year, y=GDPpc))+
  geom_line(color="green", size = 1.2)+
  geom_point(color="purple", size=2)+
  labs(title = "US GDP per Capita Growth over Time",
       subtitle = "Year:2000-2020",
       x="Year",
       y="GDP per Capita (Current US$)")+
  theme_minimal()+
  transition_reveal(year)
anim_file4<-"us_timeseries.gif"
animate(ch5.4, duration = 10, fps = 15, renderer = gifski_renderer(anim_file4), 
        preview = TRUE)
knitr::include_graphics(anim_file4)

9.4 Animated Faceted Time Series

ch5.5<-ch5 %>% 
  ggplot(aes(x=year, y=GDPpc, group=country))+
  geom_line(aes(color=country), size=1.2)+
  labs(title = "Faceted Time Series: GDP Growth Over Time",
       subtitle = "Year: 2000-2020",
       x="Year",
       y="GDP (Current US$)",
       color="Country")+
  theme_minimal()+
  facet_wrap(~country, scales = "free_y")+ #allows free-scaling in the y-axis
  transition_reveal(year)
anim_file5<-"timeseries.gif"
animate(ch5.5, duration = 10, fps = 15, renderer = gifski_renderer(anim_file5), 
        preview = TRUE)
knitr::include_graphics(anim_file5)

9.4.1 Saving in Quarto

For Data Presentation, you need to save interactive visualizations so that you can place them in your presentations.

# Create an animated plot
p <- ggplot(DATA, aes(MAPPINGS)) +
  geom_function +
  transition_states(gear, transition_length = 2, state_length = 1)

# Save as GIF
anim_save("animation.gif", p)

# Save as MP4
anim_save("animation.mp4", p)

9.5 Maps

9.5.1 Where to get shapefiles?

The rnaturalearth provides some shapefiles you can use. A suggested site to find shapefiles of different countries: https://gadm.org/index.html . You can also locate shapefiles from the government sites.

We will make use of the packages like WDI, rnaturalearth,sf, gganimate, and leaflet. Do note however, that gganimate and leaflet does not work for PDF, thus, it can only be used for your Data Story Presentation.

9.6 Static Maps

9.6.1 Fetching GDP Data from WDI

rm(list=ls())
gc()
##           used  (Mb) gc trigger  (Mb) max used  (Mb)
## Ncells 3571300 190.8    6548841 349.8  5173005 276.3
## Vcells 7735268  59.1   14786712 112.9 11551640  88.2
ch5_2<-WDI(country = "all", 
           indicator = "NY.GDP.MKTP.CD",
           start = 2000, 
           end = 2022,
           extra=TRUE)

#clean the dataset
ch5_2<-ch5_2 %>% 
  rename(gdp=NY.GDP.MKTP.CD,
         iso_a3=iso2c) %>% 
  drop_na(gdp) #dropping missing values

We are retrieving GDP data for all countries from 2000 to 2022 and we are also renaming columns to match with map data found in the rnaturalearth. We also remove missing GDP values

ch5map<-ne_countries(scale = "medium", returnclass = "sf")

This fetches the world map with country boundaries in sf (simple features) format to visualize the GDP data. We fetch medium-scale country boundaries in spatial data format and the returnclass="sf" ensures it can be used with ggplot2

#Merge
wgdp<-ch5map %>% 
  left_join(ch5_2, by="iso_a3")

We need to match the GDP data with the corresponding country for visualization. You can opt to retrieve the world map to find out how to merge both.

ggplot(data = wgdp) +
  geom_sf(aes(fill = gdp_md)) +
  scale_fill_viridis_c(option = "plasma", trans = "log", na.value = "grey") +
  theme_minimal() +
  labs(title = "GDP by Country",
       subtitle = "Data from WDI",
       fill = "GDP (log scale)")

We use the merged map and GDP data and geom_sf(aes(fill=gdp_md)) filles countries based on GDP. The scale_fill_viridis_c uses “plasma” which is best used for maps. The log scale transformation is used to better differentiate large economies and small economies and grey fill for missing data.

9.6.1.1 Animated World Map for GDP

Now we create an animated version of the map. You need to check that your time variable does not have any NA. It would be better to choose the years that are available consecutively so we will filter the data to only include from 2009-2019.

unique(wgdp$gdp_year)
##  [1] 2019 2014 2018 2017 2016 2013 2010 2009 2012 2006 2015
## [12] 2003 2007 2011
# Filter for years 2009-2019 
wgdp_filtered <- wgdp %>%    
  filter(gdp_year %in% 2009:2019)
ch5.6 <- ggplot(data = wgdp_filtered) +   
  geom_sf(aes(fill = gdp_md)) +   
  scale_fill_viridis_c(option = "plasma", trans = "log", na.value = "grey50") +   
  theme_minimal() +   
  labs(title = "World GDP by Country: {closest_state}",        
       subtitle = "Data from World Development Indicators",        
       fill = "GDP (log scale)") +   
  transition_states(gdp_year) +     
  ease_aes('linear')
anim_file6<-"worldgdp.gif"
animate(ch5.6, duration = 10, fps = 15, renderer = gifski_renderer(anim_file6), 
        preview = TRUE)
knitr::include_graphics(anim_file6)

9.7 Philippine Regional Map

We will use Ch6.xlsx containing Regional GDP and use the Philippine Regions shapefile.

rm(list=ls())
gc()
##           used  (Mb) gc trigger  (Mb) max used  (Mb)
## Ncells 3642159 194.6    6548841 349.8  6009573 321.0
## Vcells 8148717  62.2   14786712 112.9 14770706 112.7
library(sf)
layers<-st_layers("PH_Adm1_Regions.shp")
print(layers)
## Driver: ESRI Shapefile 
## Available layers:
##            layer_name geometry_type features fields
## 1 PH_Adm1_Regions.shp       Polygon       17      7
## 2         Regions.shp       Polygon       17     19
##                crs_name
## 1 WGS 84 / UTM zone 51N
## 2                WGS 84
sp_df = sf::st_read( 
  dsn= 'PH_Adm1_Regions.shp', 
  layer="Regions.shp")
## Reading layer `Regions.shp' from data source 
##   `C:\Users\jemma\OneDrive\Documents\DLSU\LBOMETR_Book\PH_Adm1_Regions.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 17 features and 19 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 114.2779 ymin: 4.587294 xmax: 126.605 ymax: 21.12189
## Geodetic CRS:  WGS 84

The code loads the sf package then st_layers checks all available layers in the shapefile because some shapefiles contains multiple layers like administrative boundaries, water bodies, etc. so we display the available layers in the console. The st_read(dsn "PH_Adm1_Regions.shp") specifies the data source and the layer = "Regions.shp" specifies the layer name to read (the one we will use and need).

Let us check if the shapefile was loaded correctly.

print(sp_df) #check structure
## Simple feature collection with 17 features and 19 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 114.2779 ymin: 4.587294 xmax: 126.605 ymax: 21.12189
## Geodetic CRS:  WGS 84
## First 10 features:
##     psgc_code                            name corr_code
## 1   100000000        Region I (Ilocos Region)  10000000
## 2   200000000      Region II (Cagayan Valley)  20000000
## 3   300000000      Region III (Central Luzon)  30000000
## 4   400000000        Region IV-A (CALABARZON)  40000000
## 5   500000000         Region V (Bicol Region)  50000000
## 6   600000000     Region VI (Western Visayas)  60000000
## 7   700000000    Region VII (Central Visayas)  70000000
## 8   800000000   Region VIII (Eastern Visayas)  80000000
## 9   900000000 Region IX (Zamboanga Peninsula)  90000000
## 10 1000000000    Region X (Northern Mindanao) 100000000
##    geo_level city_class inc_class urb_rur pop_2015 pop_2020
## 1        Reg       <NA>      <NA>    <NA>     <NA>     <NA>
## 2        Reg       <NA>      <NA>    <NA>     <NA>     <NA>
## 3        Reg       <NA>      <NA>    <NA>     <NA>     <NA>
## 4        Reg       <NA>      <NA>    <NA>     <NA>     <NA>
## 5        Reg       <NA>      <NA>    <NA>     <NA>     <NA>
## 6        Reg       <NA>      <NA>    <NA>     <NA>     <NA>
## 7        Reg       <NA>      <NA>    <NA>     <NA>     <NA>
## 8        Reg       <NA>      <NA>    <NA>     <NA>     <NA>
## 9        Reg       <NA>      <NA>    <NA>     <NA>     <NA>
## 10       Reg       <NA>      <NA>    <NA>     <NA>     <NA>
##    status adm1_pcode                         adm1_en
## 1    <NA>       PH01        Region I (Ilocos Region)
## 2    <NA>       PH02      Region II (Cagayan Valley)
## 3    <NA>       PH03      Region III (Central Luzon)
## 4    <NA>       PH04        Region IV-A (Calabarzon)
## 5    <NA>       PH05         Region V (Bicol Region)
## 6    <NA>       PH06     Region VI (Western Visayas)
## 7    <NA>       PH07    Region VII (Central Visayas)
## 8    <NA>       PH08   Region VIII (Eastern Visayas)
## 9    <NA>       PH09 Region IX (Zamboanga Peninsula)
## 10   <NA>       PH10    Region X (Northern Mindanao)
##               adm1_alt adm0_pcode           adm0_en
## 1        Ilocos Region         PH Philippines (the)
## 2       Cagayan Valley         PH Philippines (the)
## 3        Central Luzon         PH Philippines (the)
## 4           Calabarzon         PH Philippines (the)
## 5         Bicol Region         PH Philippines (the)
## 6      Western Visayas         PH Philippines (the)
## 7      Central Visayas         PH Philippines (the)
## 8      Eastern Visayas         PH Philippines (the)
## 9  Zamboanga Peninsula         PH Philippines (the)
## 10   Northern Mindanao         PH Philippines (the)
##          date shape_len shape_area shape_sqkm
## 1  2022-11-09  14.99505   1.043983   12307.35
## 2  2022-11-09  19.13905   2.241812   26387.73
## 3  2022-11-09  15.94956   1.793513   21304.16
## 4  2022-11-09  27.62549   1.326720   15846.63
## 5  2022-11-09  44.92324   1.446324   17338.38
## 6  2022-11-09  27.77415   1.657591   20047.63
## 7  2022-11-09  29.11311   1.178431   14293.66
## 8  2022-11-09  42.02116   1.726804   20835.68
## 9  2022-11-09  23.18144   1.196677   14596.05
## 10 2022-11-09  15.00295   1.435115   17489.29
##                          geometry
## 1  MULTIPOLYGON (((120.9714 18...
## 2  MULTIPOLYGON (((121.9488 21...
## 3  MULTIPOLYGON (((122.2342 16...
## 4  MULTIPOLYGON (((122.3079 14...
## 5  MULTIPOLYGON (((122.9882 11...
## 6  MULTIPOLYGON (((121.4341 12...
## 7  MULTIPOLYGON (((124.093 11....
## 8  MULTIPOLYGON (((124.3678 12...
## 9  MULTIPOLYGON (((123.4129 8....
## 10 MULTIPOLYGON (((124.6987 9....

We then check available attributes and also check the region names so that we can merge the shapefile with our GDP data.

colnames(sp_df)
##  [1] "psgc_code"  "name"       "corr_code"  "geo_level" 
##  [5] "city_class" "inc_class"  "urb_rur"    "pop_2015"  
##  [9] "pop_2020"   "status"     "adm1_pcode" "adm1_en"   
## [13] "adm1_alt"   "adm0_pcode" "adm0_en"    "date"      
## [17] "shape_len"  "shape_area" "shape_sqkm" "geometry"
head(sp_df)
## Simple feature collection with 6 features and 19 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 119.7497 ymin: 9.444193 xmax: 124.4249 ymax: 21.12189
## Geodetic CRS:  WGS 84
##   psgc_code                        name corr_code geo_level
## 1 100000000    Region I (Ilocos Region)  10000000       Reg
## 2 200000000  Region II (Cagayan Valley)  20000000       Reg
## 3 300000000  Region III (Central Luzon)  30000000       Reg
## 4 400000000    Region IV-A (CALABARZON)  40000000       Reg
## 5 500000000     Region V (Bicol Region)  50000000       Reg
## 6 600000000 Region VI (Western Visayas)  60000000       Reg
##   city_class inc_class urb_rur pop_2015 pop_2020 status
## 1       <NA>      <NA>    <NA>     <NA>     <NA>   <NA>
## 2       <NA>      <NA>    <NA>     <NA>     <NA>   <NA>
## 3       <NA>      <NA>    <NA>     <NA>     <NA>   <NA>
## 4       <NA>      <NA>    <NA>     <NA>     <NA>   <NA>
## 5       <NA>      <NA>    <NA>     <NA>     <NA>   <NA>
## 6       <NA>      <NA>    <NA>     <NA>     <NA>   <NA>
##   adm1_pcode                     adm1_en        adm1_alt
## 1       PH01    Region I (Ilocos Region)   Ilocos Region
## 2       PH02  Region II (Cagayan Valley)  Cagayan Valley
## 3       PH03  Region III (Central Luzon)   Central Luzon
## 4       PH04    Region IV-A (Calabarzon)      Calabarzon
## 5       PH05     Region V (Bicol Region)    Bicol Region
## 6       PH06 Region VI (Western Visayas) Western Visayas
##   adm0_pcode           adm0_en       date shape_len
## 1         PH Philippines (the) 2022-11-09  14.99505
## 2         PH Philippines (the) 2022-11-09  19.13905
## 3         PH Philippines (the) 2022-11-09  15.94956
## 4         PH Philippines (the) 2022-11-09  27.62549
## 5         PH Philippines (the) 2022-11-09  44.92324
## 6         PH Philippines (the) 2022-11-09  27.77415
##   shape_area shape_sqkm                       geometry
## 1   1.043983   12307.35 MULTIPOLYGON (((120.9714 18...
## 2   2.241812   26387.73 MULTIPOLYGON (((121.9488 21...
## 3   1.793513   21304.16 MULTIPOLYGON (((122.2342 16...
## 4   1.326720   15846.63 MULTIPOLYGON (((122.3079 14...
## 5   1.446324   17338.38 MULTIPOLYGON (((122.9882 11...
## 6   1.657591   20047.63 MULTIPOLYGON (((121.4341 12...
library(dplyr)
unique(sp_df$name)
##  [1] "Region I (Ilocos Region)"                               
##  [2] "Region II (Cagayan Valley)"                             
##  [3] "Region III (Central Luzon)"                             
##  [4] "Region IV-A (CALABARZON)"                               
##  [5] "Region V (Bicol Region)"                                
##  [6] "Region VI (Western Visayas)"                            
##  [7] "Region VII (Central Visayas)"                           
##  [8] "Region VIII (Eastern Visayas)"                          
##  [9] "Region IX (Zamboanga Peninsula)"                        
## [10] "Region X (Northern Mindanao)"                           
## [11] "Region XI (Davao Region)"                               
## [12] "Region XII (SOCCSKSARGEN)"                              
## [13] "National Capital Region (NCR)"                          
## [14] "Cordillera Administrative Region (CAR)"                 
## [15] "Region XIII (Caraga)"                                   
## [16] "MIMAROPA Region"                                        
## [17] "Bangsamoro Autonomous Region In Muslim Mindanao (BARMM)"
sp_df <- sp_df %>%
  rename(region = name)

Now, we load our Regional GDP package;

library(readxl)
rgdp<-read_excel("Ch5PracticeB.xlsx",
                 sheet = "Sheet1")
## New names:
## • `` -> `...1`
head(rgdp)
## # A tibble: 6 × 26
##   ...1  region      `2000` `2001` `2002` `2003` `2004` `2005`
##   <chr> <chr>        <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 NCR   National C… 2.42e9 2.48e9 2.51e9 2.62e9 2.84e9 2.99e9
## 2 CAR   Cordillera… 1.44e8 1.49e8 1.56e8 1.64e8 1.73e8 1.76e8
## 3 I     Ilocos Reg… 2.40e8 2.45e8 2.53e8 2.64e8 2.78e8 2.91e8
## 4 II    Cagayan Va… 1.58e8 1.64e8 1.66e8 1.71e8 1.86e8 1.83e8
## 5 III   Central Lu… 7.07e8 7.49e8 7.92e8 8.26e8 8.52e8 8.86e8
## 6 IVA   CALABARZON  1.05e9 1.07e9 1.12e9 1.18e9 1.24e9 1.31e9
## # ℹ 18 more variables: `2006` <dbl>, `2007` <dbl>,
## #   `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>,
## #   `2012` <dbl>, `2013` <dbl>, `2014` <dbl>, `2015` <dbl>,
## #   `2016` <dbl>, `2017` <dbl>, `2018` <dbl>, `2019` <dbl>,
## #   `2020` <dbl>, `2021` <dbl>, `2022` <dbl>, `2023` <dbl>

We need to convert the columns to have the same format first so that we can edit all of them later on when we transform to long format since different formats cannot be combined when modifying to long format.

colnames(rgdp)<-as.character(colnames(rgdp))
rgdp<-rgdp %>% select(-"...1")

Now, we can modify

library(tidyverse)
rgdpl<-rgdp %>% 
  pivot_longer(cols = -region, names_to = "Year", values_to = "GDP") %>% 
  mutate(Year=as.numeric(Year),
         GDP=as.numeric(GDP))
head(rgdpl)
## # A tibble: 6 × 3
##   region                   Year         GDP
##   <chr>                   <dbl>       <dbl>
## 1 National Capital Region  2000 2416391870.
## 2 National Capital Region  2001 2483504980.
## 3 National Capital Region  2002 2507171644.
## 4 National Capital Region  2003 2624052475.
## 5 National Capital Region  2004 2841837836.
## 6 National Capital Region  2005 2988546218.
unique(rgdpl$region)
##  [1] "National Capital Region"         
##  [2] "Cordillera Administrative Region"
##  [3] "Ilocos Region"                   
##  [4] "Cagayan Valley"                  
##  [5] "Central Luzon"                   
##  [6] "CALABARZON"                      
##  [7] "MIMAROPA Region"                 
##  [8] "Bicol Region"                    
##  [9] "Western Visayas"                 
## [10] "Central Visayas"                 
## [11] "Eastern Visayas"                 
## [12] "Zamboanga Peninsula"             
## [13] "Northern Mindanao"               
## [14] "Davao Region"                    
## [15] "SOCCSKSARGEN"                    
## [16] "Caraga"                          
## [17] "BARMM"

Since we know that the names are different in the shapefile, we rename the regions so that they match through mapping.

region_mapping <- c(
  "Ilocos Region" = "Region I (Ilocos Region)",
  "Cagayan Valley" = "Region II (Cagayan Valley)",
  "Central Luzon" = "Region III (Central Luzon)",
  "CALABARZON" = "Region IV-A (CALABARZON)",
  "Bicol Region" = "Region V (Bicol Region)",
  "Western Visayas" = "Region VI (Western Visayas)",
  "Central Visayas" = "Region VII (Central Visayas)",
  "Eastern Visayas" = "Region VIII (Eastern Visayas)",
  "Zamboanga Peninsula" = "Region IX (Zamboanga Peninsula)",
  "Northern Mindanao" = "Region X (Northern Mindanao)",
  "Davao Region" = "Region XI (Davao Region)",
  "SOCCSKSARGEN" = "Region XII (SOCCSKSARGEN)",
  "National Capital Region" = "National Capital Region (NCR)",
  "Cordillera Administrative Region" = "Cordillera Administrative Region (CAR)",
  "Caraga" = "Region XIII (Caraga)",
  "BARMM" = "Bangsamoro Autonomous Region In Muslim Mindanao (BARMM)",
  "MIMAROPA Region" = "MIMAROPA Region"
)
rgdpl$region <- dplyr::recode(rgdpl$region, !!!region_mapping)
setdiff(sp_df$region, rgdpl$region)  # Should be (0)
## character(0)

Now, we can merge.

rgdp_map<-sp_df %>% 
  left_join(rgdpl, by = "region")
setdiff(rgdpl$region, sp_df$region)
## character(0)
#remove unnecessary columns and mutate GDP to billions
rgdp_map <- rgdp_map %>%
  select(region, Year, GDP, geometry) %>%
  mutate(GDP = GDP / 1e9)

9.7.0.1 Choropleth Map

A Choropleth Map is a type of thematic map where regions are color-coded based on a statistical value (e.g., GDP, population, unemployment rate, etc.). It is commonly used in geographic data visualization to show spatial variations across different regions. For the static map, let us choose the latest year, 2023;

rgdp2023<-rgdp_map %>% 
  filter(Year==2023)
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
# Modify the Choropleth Map
ggplot(rgdp2023) +
  geom_sf(aes(fill = GDP), color = "white") +  # Fill regions based on GDP
  scale_fill_viridis_c(
    option = "magma", 
    name = "GDP (2023)", 
    labels = label_number(accuracy = 0.1, suffix = "B", big.mark = ",")  # Fix rounding issue
  ) +  
  labs(title = "Regional GDP of the Philippines (2023)") +
  theme_minimal()

#Saving the images
#ggsave("rgdp2023.jpg", plot = last_plot(), width = 10, height = 8, dpi = 300)
#or plot = nameofplot, width and height are in inches and for high quality image, dpi = 300

9.7.0.2 Bubble Map

A Bubble Map is a type of thematic map that represents data values using circles (bubbles) placed over specific locations. The size of the bubble corresponds to the magnitude of the variable being visualized (e.g., GDP, population, sales, etc.). We use 2023 again.

The first step that we need to do is calculate the centroids using st_centroid from the sf package because this calculates the geometric center of each polygon region in the shapefile. These will be used as our bubble locations and to ensure that there will be no overlapping with borders.

phr_centroids<-st_centroid(rgdp2023)
## Warning: st_centroid assumes attributes are constant over
## geometries
print(phr_centroids)
## Simple feature collection with 17 features and 3 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 119.8902 ymin: 6.580654 xmax: 125.8263 ymax: 17.35385
## Geodetic CRS:  WGS 84
## First 10 features:
##                             region Year       GDP
## 1         Region I (Ilocos Region) 2023 0.7014687
## 2       Region II (Cagayan Valley) 2023 0.4470732
## 3       Region III (Central Luzon) 2023 2.3184283
## 4         Region IV-A (CALABARZON) 2023 3.0952557
## 5          Region V (Bicol Region) 2023 0.6103399
## 6      Region VI (Western Visayas) 2023 1.0242699
## 7     Region VII (Central Visayas) 2023 1.3811721
## 8    Region VIII (Eastern Visayas) 2023 0.5235611
## 9  Region IX (Zamboanga Peninsula) 2023 0.4473210
## 10    Region X (Northern Mindanao) 2023 0.9847503
##                     geometry
## 1  POINT (120.4827 16.90283)
## 2  POINT (121.7316 17.20396)
## 3   POINT (120.8223 15.3919)
## 4   POINT (121.568 14.16225)
## 5  POINT (123.4734 13.27216)
## 6   POINT (122.652 10.84319)
## 7  POINT (123.6147 9.921139)
## 8  POINT (124.9534 11.53574)
## 9  POINT (122.8477 7.825902)
## 10 POINT (124.6742 8.187355)

We now create our Bubble Map

ggplot()+
  geom_sf(data=rgdp2023, fill = "gray", color="white")+#base map
  geom_point(data = phr_centroids, 
             aes(x = st_coordinates(geometry)[,1], 
                 y = st_coordinates(geometry)[,2], 
                 size = GDP, 
                 color = GDP), 
             alpha = 0.7) + 
  scale_size(range = c(2, 15), name = "GDP (2023)") +  # Adjust bubble size
  scale_color_viridis_c(option = "magma", name = "GDP (2023)", labels = label_number(accuracy = 0.1, suffix = "B", big.mark = ",")) + 
  labs(title = "Regional GDP of the Philippines (2023)",
       subtitle = "Bubble Size Represents GDP",
       x="Coordinates",
       y="Coordinates")+
  theme_minimal()

As you can see, the base map is drawn using geom_sf() then the bubbles represent the GDP values. The bubble size and color represent GDP magnitude.

You will notice that ggplot() is empty, this creates an empty ggplot canvas where the map layers will be added using the geom_sf.

In the aes there’s x = st_coordinates(geometry)[,1] same for y. These extract the longitude and latitude of each centroid.

The scale_size() range sets the bubble sizes between 2 and 15 units but you can edit this.

Now, scale_viridis_c has more things, the name sets the legend title for GDP colors. the labels ensures decimal precision and adds suffix B, plus, adds comma separators since we do not want the legend to contain scientific notations.

9.8 Animated Map

9.8.0.1 Animated Philippine Regional GDP Choropleth Map

We will now make an animated version of the Regional GDP Choropleth Map since we have data from 2000 to 2023. Let’s convert to billions like what we did for the 2023.

It takes really long to render (it depends on the memory you have in your laptops) so, I will place the best strategy, that is creating static maps and combining them to form a video. Notice that when cleaning the environment, I chose to keep rgdp_map since we will still use this.

rm(list = setdiff(ls(), "rgdp_map"))
gc()
##            used  (Mb) gc trigger   (Mb)  max used   (Mb)
## Ncells  3699371 197.6    6548841  349.8   6548841  349.8
## Vcells 34287981 261.6  155411355 1185.7 202265969 1543.2
#for mp4
if (!require(av)) install.packages("av")
## Loading required package: av
library(av)

We need to create a directory within our working directory where the images will be saved.

dir.create("ch_gdp_frames", showWarnings = FALSE)

We now generate static maps; this requires us to create a function/loop. When we use the av, the default is a black setting so it’s better to have a white background set for the plot and the entire canvas.

for(year in 2000:2023){
  #filter data for the specific year
  data_year<-rgdp_map %>% filter(Year==year)
  
  #Generate the plot
  ip<-ggplot(data_year) +
  geom_sf(aes(fill = GDP), color = "white") +
  scale_fill_viridis_c(
    option = "magma",
    name = "GDP",
    labels = scales::label_number(accuracy = 0.1, suffix = "B", big.mark = ",")
  ) +
  labs(
    title = paste("Regional GDP of the Philippines"),
    subtitle = paste("Year:", year),
    fill = "GDP (Billion $)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    plot.subtitle = element_text(size = 14),
    panel.background = element_rect(fill = "white", color = NA),  # Set white background
    plot.background = element_rect(fill = "white", color = NA)   # Set white background
  )

  
  # Save the plot as an image
  ggsave(filename = paste0("ch_gdp_frames/gdp_", year, ".png"), plot = ip, width = 8, height = 6, dpi = 300)
}

You need to edit some parts, such as the 2000:2023 with whatever years you have; you need to filter data depending on how you called the time column. Then edit the fill and other things.

We now combine our static maps into MP4. If you want the transition for each year to be seen clearly, the framerate is set at 1 to have 1 frame per second and the frame is repeated twice. the vfilter makes the video HD

file_list <- list.files("ch_gdp_frames", pattern = "gdp_.*\\.png", full.names = TRUE)
rep_files <- rep(file_list, each = 2)  # Repeat each frame twice
av::av_encode_video(
  input = rep_files,
  output = "ch_rgdp_time.mp4",
  framerate = 1,
  vfilter = "scale=1280:720"
)

9.9 Practical: Advanced Visualizations

For the practical, you need to search and clean the dataset on your own. Please search in PSA OpenStat: Number of Registered Live Births by Sex and by Usual Residence of Mother (Region, Province and Highly Urbanized City), Philippines: January - December 2013-2022. Only gather regional data since the shapefile that is given for this practical is for Regions only. Please save each animation using anim_save

  1. Using PSA OpenStat (2013-2022) data, create an animated bar chart showing the total number of live births per year. Differentiate Male vs. Female births using fill
  2. Create an animated bar chart that compares the number of live births per region over time (2013-2022). Use transition_states(year, wrap = FALSE) to animate regional birth counts changing over time. Determine the region with the highest live births in each frame.
  3. Select 3-5 regions and visualize their live birth trends (2013-2022) using an animated time series plot.
  4. Using PSA OpenStat (2013-2022) data and a Philippines regional shapefile, create a static choropleth map of the recent number of live births per region. Color the regions based on birth count
  5. Using PSA OpenStat (2013-2022) data and a Philippines regional shapefile, create an animated choropleth map showing how the number of live births per region changes over time. Color the regions based on birth count
  6. How do birth trends differ across Philippine regions?
  7. Which regions have experienced the highest increase or decrease in live births from 2013 to 2022?
  8. How does using gganimate with sf enhance your ability to analyze spatial(and time) data trends?