In this take-home exercise 2, I apply relevant interactivity and animation methods to create an interactive data visualisation with RStudio.
This take-home exercise aims to explore the use of interactivity and animation methods to create an interactive data visualisation based on an age-sex pyramid with R. By making the graphics interactive, the changes in the demographic structure of Singapore by age cohort and gender at planning area level can be shown clearly to the viewers.
For this exercise, the data sets used are Singapore Residents by Planning Area / Subzone, Age Group, Sex and Type of Dwelling, June 2000-2010 and June 2011-2020, which can be downloaded from the Department of Statistics home page.
Before proceeding to the next section, please use the code chunk below to install and launch the required packages in RStudio.
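# 'cowplot' is included because cowplot::theme_cowplot() is used for the plot theme later on
packages <- c('ggiraph', 'plotly',
              'DT', 'patchwork',
              'gganimate', 'tidyverse',
              'readxl', 'gifski', 'gapminder',
              'cowplot')
for (p in packages) {
  # install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}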
To achieve data visualisation at the planning area level, a proposed sketch is created as below.

Using read_csv() of the readr package, we first read the two data sets used for this exercise (namely, Singapore Residents by Planning Area / Subzone, Age Group, Sex and Type of Dwelling, June 2000-2010 and June 2011-2020, both in CSV format) into tibbles.
population_1 <- read_csv("Data/respopagesextod2000to2010.csv")
population_2 <- read_csv("Data/respopagesextod2011to2020.csv")
tbl_df(population_1)
# A tibble: 1,040,592 × 7
PA SZ AG Sex TOD Pop Time
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 Ang Mo Kio Cheng San 0_to_4 Males HDB 1- and 2-Room … 20 2000
2 Ang Mo Kio Cheng San 0_to_4 Males HDB 3-Room Flats 480 2000
3 Ang Mo Kio Cheng San 0_to_4 Males HDB 4-Room Flats 220 2000
4 Ang Mo Kio Cheng San 0_to_4 Males HDB 5-Room and Exe… 80 2000
5 Ang Mo Kio Cheng San 0_to_4 Males HUDC Flats (exclud… 0 2000
6 Ang Mo Kio Cheng San 0_to_4 Males Landed Properties 0 2000
7 Ang Mo Kio Cheng San 0_to_4 Males Condominiums and O… 0 2000
8 Ang Mo Kio Cheng San 0_to_4 Males Others 0 2000
9 Ang Mo Kio Cheng San 0_to_4 Females HDB 1- and 2-Room … 20 2000
10 Ang Mo Kio Cheng San 0_to_4 Females HDB 3-Room Flats 390 2000
# … with 1,040,582 more rows
tbl_df(population_2)
# A tibble: 984,656 × 7
PA SZ AG Sex TOD Pop Time
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males HDB 1… 0 2011
2 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males HDB 3… 10 2011
3 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males HDB 4… 30 2011
4 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males HDB 5… 50 2011
5 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males HUDC … 0 2011
6 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males Lande… 0 2011
7 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males Condo… 40 2011
8 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males Others 0 2011
9 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Females HDB 1… 0 2011
10 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Females HDB 3… 10 2011
# … with 984,646 more rows
As both data sets have the same number of columns and the same column names, we can merge the two data sets into one by using the rbind() function. This function stacks the two data frames on top of each other, appending the rows of the second data frame to the first.
population <- rbind(population_1, population_2)
tbl_df(population)
# A tibble: 2,025,248 × 7
PA SZ AG Sex TOD Pop Time
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 Ang Mo Kio Cheng San 0_to_4 Males HDB 1- and 2-Room … 20 2000
2 Ang Mo Kio Cheng San 0_to_4 Males HDB 3-Room Flats 480 2000
3 Ang Mo Kio Cheng San 0_to_4 Males HDB 4-Room Flats 220 2000
4 Ang Mo Kio Cheng San 0_to_4 Males HDB 5-Room and Exe… 80 2000
5 Ang Mo Kio Cheng San 0_to_4 Males HUDC Flats (exclud… 0 2000
6 Ang Mo Kio Cheng San 0_to_4 Males Landed Properties 0 2000
7 Ang Mo Kio Cheng San 0_to_4 Males Condominiums and O… 0 2000
8 Ang Mo Kio Cheng San 0_to_4 Males Others 0 2000
9 Ang Mo Kio Cheng San 0_to_4 Females HDB 1- and 2-Room … 20 2000
10 Ang Mo Kio Cheng San 0_to_4 Females HDB 3-Room Flats 390 2000
# … with 2,025,238 more rows
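The stacking behaviour can be seen on a small scale with two toy tibbles. This is only a sketch: the names pop_a, pop_b and pop_all and all the values are made up for illustration and are not part of the exercise data.

```r
library(tibble)

# Two toy tibbles with identical column names and types (hypothetical values)
pop_a <- tibble(PA = c("Bedok", "Bedok"), Pop = c(10, 20), Time = c(2010, 2010))
pop_b <- tibble(PA = c("Bedok", "Bedok"), Pop = c(15, 25), Time = c(2011, 2011))

# rbind() appends the rows of pop_b below the rows of pop_a
pop_all <- rbind(pop_a, pop_b)

# The row counts add up and the columns stay the same
nrow(pop_all) == nrow(pop_a) + nrow(pop_b)
identical(names(pop_all), names(pop_a))
```

The same check holds for the real data above: 1,040,592 + 984,656 = 2,025,248 rows. dplyr::bind_rows() would be a possible alternative to rbind() for this step.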
To illustrate the change in the demographic structure of Singapore by age cohort and gender at planning area level across years, five columns are kept by using the select() function: Planning Area, Age Group, Sex, Population and Time. We then fix the order of the age groups with the factor() function, preserving their order of appearance (from youngest to oldest), because otherwise the labels would be sorted alphabetically.
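The code for the select() and factor() step is not shown in this write-up, so the following is only a minimal sketch of how it could look, run on a made-up toy tibble. The names toy and toy_kept and all values are assumptions. Setting ordered = TRUE is also an assumption, chosen because the AG column appears as <ord> in the output further down.

```r
library(dplyr)

# Toy data in the same layout as the source files (values are made up)
toy <- tibble(
  PA   = "Bedok",
  SZ   = "Bedok North",
  AG   = c("0_to_4", "5_to_9", "10_to_14"),
  Sex  = "Males",
  TOD  = "Others",
  Pop  = c(10, 20, 30),
  Time = 2000
)

toy_kept <- toy %>%
  # keep only the five columns needed for the pyramid
  select(PA, AG, Sex, Pop, Time) %>%
  # levels = unique(AG) keeps the age groups in their order of first appearance
  mutate(AG = factor(AG, levels = unique(AG), ordered = TRUE))

levels(toy_kept$AG)
```

Without this step, plain character sorting would place "10_to_14" before "5_to_9", and the bars of the pyramid would appear in the wrong order.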
The data set is then grouped by Planning Area, Age Group, Sex and Time, and summarise() sums the population count within each group, which collapses the separate rows for subzones and types of dwelling. Finally, we arrange the data by PA, AG and Sex and ungroup it so that later operations are not affected by the grouping variables.
population_sorted <-
population %>%
group_by(PA, AG, Sex, Time) %>%
summarise(Count = sum(Pop)) %>%
arrange(PA, AG, Sex) %>%
ungroup()
tbl_df(population_sorted)
# A tibble: 43,890 × 5
PA AG Sex Time Count
<chr> <ord> <chr> <dbl> <dbl>
1 Ang Mo Kio 0_to_4 Females 2000 4460
2 Ang Mo Kio 0_to_4 Females 2001 4320
3 Ang Mo Kio 0_to_4 Females 2002 4220
4 Ang Mo Kio 0_to_4 Females 2003 4010
5 Ang Mo Kio 0_to_4 Females 2004 3870
6 Ang Mo Kio 0_to_4 Females 2005 3640
7 Ang Mo Kio 0_to_4 Females 2006 3600
8 Ang Mo Kio 0_to_4 Females 2007 3670
9 Ang Mo Kio 0_to_4 Females 2008 3850
10 Ang Mo Kio 0_to_4 Females 2009 4120
# … with 43,880 more rows
To better illustrate the demographic change at planning area level, a total of five planning areas (Bedok, Clementi, Punggol, Novena and Woodlands) are selected by using the filter() function, one from each of the five regions (East, West, North-East, Central and North respectively).
population_filtered <- population_sorted %>%
  filter(PA %in% c("Bedok", "Clementi", "Novena",
                   "Punggol", "Woodlands"))
The last step is to use the mutate() function with if_else() to change the sign of the values in the ‘Count’ column: male counts become negative (-) and female counts stay positive (+). Once the coordinates are flipped, this places males on the left side and females on the right side of the plot.
population_filtered <- population_filtered %>%
mutate(Count = if_else(Sex == 'Males', -Count, Count))
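The effect of the sign flip is easy to see on two rows. This is a small sketch with made-up numbers, and the names toy and flipped are hypothetical.

```r
library(dplyr)

# One age group, one row per sex (hypothetical counts)
toy <- tibble(
  AG    = c("0_to_4", "0_to_4"),
  Sex   = c("Males", "Females"),
  Count = c(120, 110)
)

# Male counts become negative, female counts are left unchanged
flipped <- toy %>%
  mutate(Count = if_else(Sex == "Males", -Count, Count))

flipped$Count        # values used to position the bars
abs(flipped$Count)   # values a reader should see on the axis labels
```

Because the negative values only serve to place the male bars on the opposite side of zero, the axis labels are later passed through abs() so that both sides show positive population counts.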
In this case, we use geom_bar() with stat = "identity" to draw bars whose lengths are taken from the ‘Count’ column, which is initially mapped to the y-axis. The rest of the functions are explained below for your reference:
coord_flip(): to flip the Cartesian coordinates so that horizontal becomes vertical, and vertical becomes horizontal.
scale_fill_manual(): to change legend labels and colors.
scale_y_continuous(): to customize the y-axis (which is now at the bottom of the graph) breaks and labels.
labs(): to set x and y-axis labels, and to add titles and subtitles.
theme_cowplot(): to set a cowplot theme from cowplot package.
As our data set comprises population counts from five planning areas, it is helpful to use the faceting functions of ggplot2 to generate small multiples (trellis plots), each displaying a different subset of the data.
facet_wrap(), which wraps a 1d sequence of panels into 2d, is used instead of facet_grid() in this case because it makes better use of screen space, as the population pyramid panels are rectangular in shape. ‘PA’ is the variable that splits the data into different subplots, and the ‘nrow’ parameter specifies the number of rows in the facetted plot.
g <- ggplot(population_filtered, aes(x = AG, y = Count, fill = Sex)) +
geom_bar(stat = "identity", width = 0.5) +
coord_flip() +
scale_y_continuous(n.breaks=12, labels=abs) +
labs(x = "Age", y = "Population", title = 'Age-Sex Pyramid of Singapore',
subtitle = 'Year: {closest_state}',
caption='Source: Singapore Department of Statistics') +
scale_fill_manual(values=c("darkred", "steelblue"),
name='',
breaks=c("Males", "Females"),
labels=c("Males", "Females")) +
cowplot::theme_cowplot() +
theme(axis.text=element_text(vjust=0.5, size = 8),
panel.grid.major.y = element_line(color='lightgray',linetype='dashed'),
legend.position = 'top',
legend.justification = 'center') +
facet_wrap(~ PA, nrow = 3)
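As a quick check of the resulting layout, five panels wrapped into three rows need two columns, since ceiling(5 / 3) = 2. The sketch below assumes the wrap_dims() helper exported by ggplot2 returns the grid as c(rows, columns). The names n_panels and dims are made up for illustration.

```r
library(ggplot2)

n_panels <- 5   # Bedok, Clementi, Novena, Punggol, Woodlands

# Grid dimensions for a wrapped layout with three rows
dims <- wrap_dims(n_panels, nrow = 3)
dims

# The same column count worked out by hand
ceiling(n_panels / 3)
```

A grid of three rows by two columns has six cells, so one cell stays empty when the five planning areas are drawn.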
To animate the age-sex population pyramid, we can use the transition_states() function of gganimate to show the transition between several distinct states of the data, in this case the change in demographic structure across the years. ‘Time’ is the column which holds the state levels in the data. The ‘transition_length’ parameter sets the relative length of each transition, and the ‘state_length’ parameter sets the relative length of the pause at each state. Both are set to 1, so the animation spends as much time moving between two years as it does resting on each year, and both values are recycled to match the number of states (i.e. years) in the data.
The enter_*() and exit_*() functions define how new data should appear and how old data should disappear during the animation. Here enter_fade() and exit_fade() are used: entering elements start with alpha (the transparency of the bar) at zero and exiting elements end with alpha at zero, so the bars fade in and out gradually over the transition instead of appearing or vanishing abruptly.
The ease_aes() function defines how different aesthetics should be eased during transitions. For a smoother appearance, the ‘cubic-in’ easing is used here, which starts each transition slowly and then accelerates.
Last but not least, the animate() function takes a gganim object and renders it into an animation. The nature of the animation depends on the renderer, which defaults to gifski and produces a GIF. Its parameters control further details: height and width, fps (the number of frames per second), duration (the length of the animation in seconds), end_pause (the number of times the last frame is repeated), rewind (set to TRUE so that the animation plays in reverse after reaching the end) and res (the resolution).
g <- g +
transition_states(Time, transition_length = 1, state_length = 1) +
enter_fade() +
exit_fade() +
ease_aes('cubic-in')
animate(g, height = 1000, width = 1000,
fps = 20, duration = 15,
end_pause = 15, rewind = TRUE,
renderer = gifski_renderer('age_sex_pyramid.gif'), res = 100)
