
How to make population pyramids

Last updated on 28th August 2019

The purpose of this page is to explain how to make population pyramids easily with free software. I basically explain the procedures in a Windows environment, but all the free software used here runs not only on Windows but also on Mac OS X and Linux, so the explanation should also apply to Mac and Linux environments.

Required software and how to install it

The following three pieces of software are needed.

  1. LibreOffice: an office suite (available from here). The components needed to make population pyramids are Calc (spreadsheet) and, optionally, Draw (vector graphics editor). If you already have Microsoft Office on your computer, you can of course use Excel. If you don't like LibreOffice, other free spreadsheet software such as Gnumeric Portable or EtherCalc can be used.
  2. R: now widely known as a statistical analysis environment and scripting language, R also has powerful functions for building graphs by combining low-level graphic elements (available from CRAN). In addition, graphs made in R can easily be exported in raster formats such as JPEG and PNG, and in vector formats such as PDF and Windows metafile. To install it, download the version of R suitable for your OS from CRAN and run the installer. Further explanation is given on this site (in Japanese).
  3. The pyramid package: I developed this R package specifically for making population pyramids. The development process is described here (in Japanese) (you can download pyramid_1.5.zip here). To install the pyramid package, if your computer is connected to the internet, just type install.packages("pyramid") at the R console prompt and press Enter. If your computer is not connected to the internet, copy pyramid_1.5.zip somewhere on your computer, type utils:::menuInstallLocal() at the prompt, and then select pyramid_1.5.zip.
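The export formats mentioned in item 2 correspond to R's graphics devices. A minimal base-R sketch, where the file names and the placeholder bar chart (standing in for a pyramid) are illustrative assumptions, that writes the same graph once as a raster PNG and once as a vector PDF:

```r
# Draw the same graph on two devices: raster (PNG) and vector (PDF).
draw_something <- function() {
  # Placeholder graph; replace with a pyramid() call once the package is installed
  barplot(c(3, 5, 2), horiz = TRUE, names.arg = c("A", "B", "C"))
}

out_dir  <- tempdir()                        # write into a temporary directory
png_file <- file.path(out_dir, "example.png")
pdf_file <- file.path(out_dir, "example.pdf")

png(png_file, width = 480, height = 480)     # raster device, size in pixels
draw_something()
dev.off()                                    # closing the device writes the file

pdf(pdf_file, width = 7, height = 7)         # vector device, size in inches
draw_something()
dev.off()
```

On Windows, win.metafile() opens a Windows metafile device in the same way; it is not available on other platforms.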

Data Sources

You may use your own data from a hand-made census, or aggregated data provided by governments or other official institutions. In either case, the data needed to make a population pyramid are always sex- and age-specific population counts.

Basic procedures for making population pyramids (1)

First of all, I explain how to make a population pyramid from already aggregated official statistics of Japan, using the function pyramid() or pyramids().

You need the sex/age-class-specific population, the labels (periods) of the age classes, and axis information (if the axis is not specified, it is calculated automatically from the maximum value). After downloading the file from e-Stat or another web site, you have to make a text file containing that information extracted from the original file. One easy way to do this is described below, using Nagano prefecture, Japan, as an example.

  1. The census data for Nagano prefecture in 2015 are available from Census 2015 > Basic summary tables > Prefecture result > 20 Nagano. Download table 3-1, population by age (single years) and sex, in CSV format. Start LibreOffice Calc from the task tray, then drag and drop the downloaded CSV file onto it. A dialogue for import options may appear: if Japanese characters are displayed improperly, change the character set (code page), and specify comma instead of tab as the field separator. After reading, column B gives the population classification, where the combination of sex category (both sexes, males, females) and nationality (total or Japanese only) is given as a 3-digit code. For total males use the row with code 201, and for total females the row with code 301. Column E gives the regional classification, with four kinds [whole prefecture, DID (densely inhabited district) area only, town area, rural area]; for the whole prefecture, select the rows for the whole prefecture. Row 12 gives the ages: column G is age 0, column DM is 110 years or older, and column DN is for people of unknown age. How to treat unknown ages is a difficult issue, but here I simply omit them from the drawing.
  2. Next, create a new spreadsheet from [File (F)]. Type "Age" into A1, "0" into A2 and "1" into A3, then select A2 and A3; a small black square appears at the bottom right of A3. Drag it down to A112, and A4 to A112 are filled automatically with sequential integers. Next, type "Males" into B1, go back to the original worksheet, select G21 to DM21 and copy them to the clipboard. Then go to the new worksheet again, right-click B2 and use Paste Special with the transpose option. Next, type "Females" into C1, go back to the original worksheet, select G29 to DM29 and copy them, then go to the new worksheet, right-click C2 and again use Paste Special with transpose. In these data, several cells with no individuals contain "-", which must be replaced by "0". If you forget this, when R reads the worksheet as a text file, both Males and Females become factor variables (character variables in R 4.0.0 or later), and drawing the pyramid fails.
  3. Select [File (F)] > [Save], set "Text CSV" as the file type, and save the file with an appropriate name. When a dialogue asks for the field options, select {Tab} as the field separator. This gives a text file for R, such as nagano-table-2015.csv. After that, you can close Calc.
  4. Run R and type library(pyramid) at the prompt (the pyramid package must, of course, be installed beforehand). Next, read the data from the file, for example:
    x<-read.delim("e:/work/nagano-table-2015.csv")
    If you skip making the text file from the e-Stat data, you can simply execute
    x<-read.delim("http://minato.sip21c.org/demography/nagano-table-2015.csv")
    After reading the data into the object x, you can draw the population pyramid of Nagano prefecture by executing
    pyramids(Left=x[,2], Right=x[,3], Center=x[,1], Cstep=10)
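Steps 1 to 3 (picking the total-males and total-females rows, transposing them into columns, and replacing "-" by 0) can also be done in R itself. The sketch below assumes a tiny made-up wide table with the same layout idea (one row per population category, one column per age); the real e-Stat CSV has extra header rows and 111 age columns, so the row codes and column names here are hypothetical and must be adapted.

```r
# Hypothetical miniature of the wide e-Stat layout (one row per population
# category, one column per single year of age); "-" means no persons.
wide <- data.frame(
  code = c("201", "301"),          # 201 = total males, 301 = total females
  a0   = c("12", "11"),
  a1   = c("-", "10"),
  a2   = c("9", "-"),
  stringsAsFactors = FALSE
)

# Turn one row of cells into numbers, treating "-" as 0
to_counts <- function(cells) {
  cells <- as.character(unlist(cells))
  cells[cells == "-"] <- "0"
  as.numeric(cells)
}

# Pick the two rows and "transpose" them into columns (the Paste Special step)
age_cols <- c("a0", "a1", "a2")
tab <- data.frame(
  Age     = 0:2,
  Males   = to_counts(wide[wide$code == "201", age_cols]),
  Females = to_counts(wide[wide$code == "301", age_cols])
)

# Save as a tab-delimited file (step 3) and read it back as in step 4
out <- file.path(tempdir(), "nagano-table-example.csv")
write.table(tab, out, sep = "\t", row.names = FALSE, quote = FALSE)
x <- read.delim(out)
str(x)   # all three columns should be numeric, not factor/character
```

Because "-" is converted before the file is written, read.delim() returns numeric Males and Females columns, which is what pyramids() needs.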

If you just want to make the population pyramid, that's all. If you need a more polished graph, you can edit it in LibreOffice Draw as follows.

  1. Right-click anywhere in the window where the pyramid is shown and select "Copy as metafile"; the graph is stored in the clipboard as a Windows metafile (.emf format). After that, you can close R by typing q() at the R console. When asked "Save workspace image?", you can select "No".
  2. Then start Draw from the tray icon in the taskbar. A new slide appears. Paste the pyramid from the clipboard (menu [Edit] > [Paste]). Right-click the graph and select "Split"; you can then edit each graphic element of the pyramid: move any element, add text, and remove unnecessary components. To distribute several elements evenly, use "Distribute" in the context menu (right-click) (I'm not sure of this item's name in English, because I use the Japanese version of LibreOffice).
  3. After editing, you can export the graph (File > Export) in many graphic formats such as PNG, JPEG, and PDF.

Without additional editing, it is possible to make a nice-looking pyramid using only R with the pyramid package. For example, type the following (nagano-pyramid-2015.R).

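# Read the tab-delimited table (or directly from the web, as in the commented line)
# x <- read.delim("http://minato.sip21c.org/demography/nagano-table-2015.csv")
x <- read.delim("./nagano-table-2015.csv")
library(pyramid)
# Open a 672 x 672 pixel PNG device; the graph is written to this file
png("nagano-pyramid-2015-en.png", width=672, height=672)
par(family="sans", mar=c(2,3,2,2))
# Columns of x: 1 = age, 2 = males, 3 = females; age labels every 10 years
pyramid(data.frame(M=x[,2], F=x[,3], A=x[,1]), 
 Clab="", Llab="Males", Rlab="Females", Cstep=10, AxisFM="d",
 main="Population structure of Nagano, Japan\n (Census 2015)")
# Close the device to finish writing the PNG file
dev.off()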

Then you get the following graph.

Population structure of Nagano, Japan (Census 2015)

Basic procedures for making population pyramids (2)

Next, I will explain how to make population pyramid from the raw data of small scale population census.
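Individual-level census records first have to be aggregated into sex- and age-class-specific counts. A minimal base-R sketch, assuming a hypothetical data frame with one row per person and columns sex ("M"/"F") and age in completed years; the 5-year classes with an open-ended 85+ class are an illustrative choice, not necessarily the one used in the original survey.

```r
# Hypothetical individual records (one row per person)
persons <- data.frame(
  sex = c("M", "F", "F", "M", "F", "M", "F"),
  age = c(3, 7, 34, 36, 88, 91, 2)
)

# 5-year age classes 0-4, ..., 80-84, plus an open-ended 85+ class;
# right = FALSE makes the intervals [0,5), [5,10), ..., [85,Inf)
breaks <- c(seq(0, 85, by = 5), Inf)
labels <- c(paste(seq(0, 80, by = 5), seq(4, 84, by = 5), sep = "-"), "85+")
persons$ageclass <- cut(persons$age, breaks = breaks, labels = labels,
                        right = FALSE)

# Cross-tabulate; empty classes are kept because ageclass is a factor
counts <- table(persons$ageclass, factor(persons$sex, levels = c("M", "F")))

# Three columns in the order left (males), right (females), centre (ages)
pyr <- data.frame(Males   = as.vector(counts[, "M"]),
                  Females = as.vector(counts[, "F"]),
                  Age     = rownames(counts))
pyr
```

pyr has the same column order as the data frame passed to pyramid() in the Nagano example, so it could be drawn with pyramid(pyr).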

Two population pyramids from the small scale population's individual data

Draw many population pyramids in a single device

The example above drew two population pyramids in a single device. Here let's see how to draw eight population pyramids of Gunma prefecture, Japan, from 1980 to 2015 at 5-year intervals. The Excel file da03.xls can be downloaded from e-Stat. This file contains each prefecture in a separate worksheet; the 10th sheet is for Gunma prefecture. Of course, LibreOffice Calc can read Excel-format files, so you could open da03.xls in Calc, delete unnecessary rows and columns (you also need to delete commas and replace "-" by 0), save it as a text file and read it from R. However, an easier way is to use the read_xls() function in the readxl package developed by Dr. Hadley Wickham and others.

In the worksheet, row 7 is "0-4 years old" and row 24 is "85 years old and older"; columns E and F are males and females in 1920, and columns L and M are males and females in 1925. Columns CK, CL, CR, CS, CY, CZ, DF, DG, DM, DN, DT, DU, EA, EB, EH, and EI are males and females in 1980 to 2015. By specifying the options sheet=, range=, col_names= and col_types= properly, the read_xls() function can read the data with the required ranges and formats.

As far as I understand, the read_xls() function cannot read a file directly from the internet, so you have to put the downloaded da03.xls in your working directory.
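The column letters listed above follow a regular pattern: each census year occupies 7 columns (E for 1920, L for 1925, CK for 1980, and so on). A small sketch converting Excel column letters into column numbers makes this explicit; col2num() is a hypothetical helper, and the read_xls() part only runs if readxl is installed and da03.xls is in the working directory. Reading every cell as text and converting afterwards is an assumption made to cope with "-" cells and commas.

```r
# Convert Excel column letters (e.g. "CK") to 1-based column numbers
col2num <- function(letters_vec) {
  vapply(toupper(letters_vec), function(s) {
    digits <- match(strsplit(s, "")[[1]], LETTERS)
    sum(digits * 26^rev(seq_along(digits) - 1))
  }, numeric(1), USE.NAMES = FALSE)
}

# Male/female columns for 1980 to 2015, as listed in the text
cols <- c("CK", "CL", "CR", "CS", "CY", "CZ", "DF", "DG",
          "DM", "DN", "DT", "DU", "EA", "EB", "EH", "EI")
idx <- col2num(cols)
diff(idx[c(TRUE, FALSE)])   # male columns are 7 apart

if (requireNamespace("readxl", quietly = TRUE) && file.exists("da03.xls")) {
  # Rows 7-24 hold the 18 age classes; the range starts at column A,
  # so the positions in 'raw' equal the numbers from col2num()
  raw <- readxl::read_xls("da03.xls", sheet = 10, range = "A7:EI24",
                          col_names = FALSE, col_types = "text")
  gunma <- as.data.frame(raw[, idx])
  gunma[] <- lapply(gunma, function(v) {
    v[v == "-"] <- "0"                 # "-" means no persons
    as.numeric(gsub(",", "", v))       # drop thousands separators, if any
  })
}
```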

I don't explain the code in detail, but when you run gunmapyramids.R in R, you get the graph below in PNG format.
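One common way to put several graphs in a single device is par(mfrow = ...). The sketch below shows that layout idea with a simple base-graphics stand-in for a pyramid (draw_pyramid() is a hypothetical helper, not the pyramid package and not the code in gunmapyramids.R); the counts are placeholders to be replaced by the male/female columns for each year.

```r
# Base-graphics stand-in for a population pyramid:
# males as negative bars to the left, females to the right.
draw_pyramid <- function(males, females, labels, main = "") {
  at <- pretty(c(0, max(c(males, females))))   # round tick values
  xl <- max(at)
  barplot(-males, horiz = TRUE, xlim = c(-xl, xl), names.arg = labels,
          col = "#bfe4ff", las = 1, main = main, axes = FALSE, space = 0)
  barplot(females, horiz = TRUE, add = TRUE, col = "#ffcabf",
          axes = FALSE, space = 0)
  # Label both sides with positive numbers
  axis(1, at = c(-rev(at), at[-1]), labels = c(rev(at), at[-1]))
}

years <- seq(1980, 2015, by = 5)                  # 8 panels
out <- file.path(tempdir(), "eight-pyramids.png")
png(out, width = 1200, height = 700)
op <- par(mfrow = c(2, 4), mar = c(3, 4, 2, 1))   # 2 rows x 4 columns
for (y in years) {
  # Placeholder counts (hypothetical); use the data for year y in practice
  m <- c(5, 4, 3, 2, 1)
  f <- c(5, 4, 3, 3, 2)
  draw_pyramid(m, f, c("0-19", "20-39", "40-59", "60-79", "80+"),
               main = as.character(y))
}
par(op)
dev.off()
```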

Population pyramids in Gunma, Japan from 1980 to 2015.

Applied usage: comparing two populations

As written above, in a population pyramid the age-specific populations of the males and females of one population are usually drawn on the left and right in opposite directions. However, it is also possible to place two different populations (or age-specific numbers of deaths) on the left and right to compare them.

For instance, let's look at the example in Preston SH et al. (2001) Demography. Blackwell Pub. p.22, which compares the age-specific populations and deaths of females in 1992 in Sweden and Kazakhstan.

Age group | Population in Sweden | Deaths in Sweden | Population in Kazakhstan | Deaths in Kazakhstan
0 | 59727 | 279 | 174078 | 3720
1-4 | 229775 | 42 | 754758 | 1220
5-9 | 245172 | 31 | 879129 | 396
10-14 | 240110 | 33 | 808510 | 298
15-19 | 264957 | 61 | 720161 | 561
20-24 | 287176 | 87 | 622988 | 673
25-29 | 311111 | 98 | 733057 | 752
30-34 | 280991 | 140 | 732312 | 965
35-39 | 286899 | 197 | 612825 | 1113
40-44 | 308238 | 362 | 487996 | 1405
45-49 | 320172 | 643 | 284799 | 1226
50-54 | 242230 | 738 | 503608 | 2878
55-59 | 210785 | 972 | 301879 | 3266
60-64 | 216058 | 1640 | 374317 | 5212
65-69 | 224479 | 2752 | 256247 | 6866
70-74 | 222578 | 4509 | 154623 | 6182
75-79 | 184102 | 6745 | 149917 | 8199
80-84 | 140667 | 9587 | 88716 | 9013
85+ | 110242 | 17340 | 58940 | 10627

The tab-delimited text file of this table is deaths.txt. Using comp2pop.R, you can read these data and draw the population, the number of deaths, and the age-specific mortality of Sweden in the left panel and of Kazakhstan in the right panel. The graph is written to a PNG file, as below.
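The age-specific mortality drawn in the third pair of panels is simply deaths divided by population in each age group. A minimal sketch with the table above typed in directly (instead of reading deaths.txt; the vector names are hypothetical), using the population counts as an approximation of person-years of exposure:

```r
# Female population and deaths in 1992, copied from the table above
age <- c("0", "1-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34",
         "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69",
         "70-74", "75-79", "80-84", "85+")
popSW <- c(59727, 229775, 245172, 240110, 264957, 287176, 311111, 280991,
           286899, 308238, 320172, 242230, 210785, 216058, 224479, 222578,
           184102, 140667, 110242)
dthSW <- c(279, 42, 31, 33, 61, 87, 98, 140, 197, 362, 643, 738, 972, 1640,
           2752, 4509, 6745, 9587, 17340)
popKZ <- c(174078, 754758, 879129, 808510, 720161, 622988, 733057, 732312,
           612825, 487996, 284799, 503608, 301879, 374317, 256247, 154623,
           149917, 88716, 58940)
dthKZ <- c(3720, 1220, 396, 298, 561, 673, 752, 965, 1113, 1405, 1226, 2878,
           3266, 5212, 6866, 6182, 8199, 9013, 10627)

# Age-specific death rates per 1,000 = deaths / population * 1000
mx <- data.frame(age = age,
                 Sweden     = 1000 * dthSW / popSW,
                 Kazakhstan = 1000 * dthKZ / popKZ)
print(mx, digits = 3)

# The rates span several orders of magnitude, so use a log scale
png(file.path(tempdir(), "mx-compare.png"), width = 600, height = 480)
matplot(seq_along(age), mx[, -1], type = "b", log = "y", pch = 1:2,
        lty = 1, xaxt = "n", xlab = "Age group", ylab = "Deaths per 1,000")
axis(1, at = seq_along(age), labels = age, las = 2)
legend("topleft", legend = c("Sweden", "Kazakhstan"), pch = 1:2, col = 1:2)
dev.off()
```

With these figures, female mortality in Kazakhstan is higher than in Sweden in every age group.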

Compare age structures between Sweden and Kazakhstan

Draw population pyramids with specific ages highlighted in different colors

The fmsb package (another package I developed and registered on CRAN) includes the data frames Jpop and Jpopl, extracted from Japan's population censuses (in Jpop the highest, open-ended age class is 85+, and in Jpopl it is 110+). The first column is Age (caution! It is not an integer but a factor, because the open-ended age is given as "85+" or "110+"), and the second to last columns are M1888 to F2015 (each variable name consists of a first character M/F indicating sex, followed by 4 digits indicating the year). With these data frames, the national population pyramids of Japan for all years in which a national population census has been conducted can be drawn easily. Japan's national population is so large that the numbers shown on the axis become too large; I recommend dividing the populations by 10000 and noting "(Unit: 10,000)" in the title.

In pyramid(), the options Lcol and Rcol, which specify the fill colors of the left and right bars respectively, can be given as vectors. This way, the bars of specific age classes can be filled with different colors.
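A minimal sketch of building such color vectors from the age-class labels. The helpers class_start() and make_cols() are hypothetical, and the highlighted ages (0-4 for both sexes, 15-49 for females) are an illustrative assumption, not necessarily the definition used in pyramid2mch.R. The drawing part runs only if fmsb and pyramid are installed, and divides by 10000 as recommended above.

```r
# Lower bound of each age class from labels such as "0-4", "85+" or "110+"
class_start <- function(labels) {
  as.numeric(sub("[-+].*$", "", as.character(labels)))
}

# Base colour everywhere, highlight colour where highlight(start age) is TRUE
make_cols <- function(labels, highlight, base, hl) {
  ifelse(highlight(class_start(labels)), hl, base)
}

ages <- c(paste(seq(0, 80, 5), seq(4, 84, 5), sep = "-"), "85+")
lcol <- make_cols(ages, function(a) a < 5, "#bfe4ff", "blue")
rcol <- make_cols(ages, function(a) a < 5 | (a >= 15 & a < 50),
                  "#ffcabf", "red")

if (requireNamespace("fmsb", quietly = TRUE) &&
    requireNamespace("pyramid", quietly = TRUE)) {
  data("Jpop", package = "fmsb")
  jl <- make_cols(Jpop$Age, function(a) a < 5, "#bfe4ff", "blue")
  jr <- make_cols(Jpop$Age, function(a) a < 5 | (a >= 15 & a < 50),
                  "#ffcabf", "red")
  png(file.path(tempdir(), "jpop-2015-highlight.png"), width = 600, height = 600)
  pyramid::pyramid(data.frame(Jpop$M2015 / 10000, Jpop$F2015 / 10000, Jpop$Age),
                   Lcol = jl, Rcol = jr,
                   main = "Japan, Census 2015 (Unit: 10,000)")
  dev.off()
}
```

Deriving the colors from the labels rather than from row positions means the same helper works for both Jpop (85+) and Jpopl (110+).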

Highlighting the ages targeted by Maternal and Child Health (MCH) in different colors, I aligned two population pyramids, one from just after WWII and one recent, with the code pyramid2mch.R. You get the graph below.

Comparison of MCH targets in Japan between just after WWII and recent one.

Make animation GIF of Japan's population pyramids

Other required software

To make an animated GIF, Giam can be used to combine the PNG files made by R.

Making PNG files of a fixed size

A for () loop can be used to make population pyramids for all years automatically. To give titles that include the year, the sprintf() function is useful. The code make-all-pyramids-japan.R reads the data and automatically makes PNG files of the population pyramids for all years (when you execute this code in R, many PNG files are generated in the working directory).
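A minimal sketch of such a loop, not the contents of make-all-pyramids-japan.R: make_year_pngs() is a hypothetical helper that writes one fixed-size PNG per year with a sprintf() title. It draws with pyramid() on fmsb's Jpop columns (M<year>, F<year>) when both packages are installed and the columns exist, and otherwise writes a titled placeholder frame.

```r
# One fixed-size PNG per year; identical sizes keep the GIF frames aligned
make_year_pngs <- function(years, out_dir = tempdir(),
                           width = 480, height = 480) {
  jp <- NULL
  if (requireNamespace("fmsb", quietly = TRUE) &&
      requireNamespace("pyramid", quietly = TRUE)) {
    env <- new.env()
    data("Jpop", package = "fmsb", envir = env)
    jp <- env$Jpop
  }
  files <- file.path(out_dir, sprintf("pyramid-japan-%d.png", years))
  for (i in seq_along(years)) {
    main <- sprintf("Population pyramid of Japan, %d (Unit: 10,000)", years[i])
    m <- paste0("M", years[i])
    f <- paste0("F", years[i])
    png(files[i], width = width, height = height)
    if (!is.null(jp) && all(c(m, f) %in% names(jp))) {
      pyramid::pyramid(data.frame(jp[[m]] / 10000, jp[[f]] / 10000, jp$Age),
                       main = main)
    } else {
      plot.new()                 # placeholder when the data are unavailable
      title(main = main)
    }
    dev.off()
  }
  invisible(files)
}

files <- make_year_pngs(c(1920, 1925, 1930))
```

Because the file names contain the year, selecting all of them in Giam lists the frames in chronological order.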

Integrate png files to animation GIF using Giam

Run Giam and select [File] > [Insert scenes]. Then select all the PNG files and click [OK]. There are many options, but I only changed the waiting time from 10/100 sec to 20/100 sec, and then saved the result as a GIF. You get the file below.

Changes of Japan's population pyramids

Nonetheless, it is much better to use the animation package now. Please see my presentation at the Japanese R users meeting in 2012 (in Japanese).

Adding a population pyramid within a specified frame

As I wrote in my journal on 4th September 2014 (in Japanese), I added a new function, pyramidf(), which draws a population pyramid within a specified frame. It is used almost the same way as pyramid(), but has a new option "frame=" that specifies the bottom-left and top-right coordinates in the order c(left-x, right-x, bottom-y, top-y). The default is frame=c(-1.15, 1.15, -0.05, 1.1), which is the same as given by pyramid(data.frame, Cgap=0.3, ...). You can see an example of two population pyramids of Gunma prefecture, Japan (in 2005 and 2010) overlaid, by typing library(pyramid); example(pyramidf).
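When pyramids are placed on a map, it is often easier to think in terms of a centre point and a size than in terms of the four frame coordinates. make_frame() below is a hypothetical helper that returns the vector in the c(left-x, right-x, bottom-y, top-y) order described above; the two pyramids use placeholder counts and are drawn only if the pyramid package is installed.

```r
# Build a frame c(left-x, right-x, bottom-y, top-y) from centre and size
make_frame <- function(cx, cy, width, height) {
  c(cx - width / 2, cx + width / 2, cy - height / 2, cy + height / 2)
}

# The default frame c(-1.15, 1.15, -0.05, 1.1) is centred at (0, 0.525),
# 2.3 units wide and 1.15 units high
default_frame <- make_frame(0, 0.525, 2.3, 1.15)

if (requireNamespace("pyramid", quietly = TRUE)) {
  # Placeholder counts (hypothetical), just to show two frames side by side
  d <- data.frame(M = c(30, 25, 20), F = c(28, 26, 24),
                  A = c("0-29", "30-59", "60+"))
  png(file.path(tempdir(), "two-frames.png"), width = 800, height = 400)
  plot(0, 0, type = "n", xlim = c(0, 10), ylim = c(0, 5),
       axes = FALSE, xlab = "", ylab = "")   # existing plot to draw on
  pyramid::pyramidf(d, frame = make_frame(2.5, 2.5, 4, 4))
  pyramid::pyramidf(d, frame = make_frame(7.5, 2.5, 4, 4))
  dev.off()
}
```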

Applied usage 1: Overlaying population pyramids of each ward on the map of Kobe city

(Note: the explanation below is not sophisticated. I should revise it to use only R packages, without EpiInfo. Suggestions on how to do this are welcome.)

The R code kobe-plot.R overlays the population pyramids of each ward on the map of Kobe city. It is somewhat complicated. You have to install the maptools package. Then download the shapefile of Hyogo prefecture, N03-140401_28_GML.zip, from the GIS homepage (chiiki_hani.zip from Kobe city's web site, or the shapefile of Japan from ESRI Japan, may also be usable, but I don't know how to use them for this purpose).

In the case of N03-140401_28_GML.zip, after extracting all files from the zip archive, you can use the EpiMap software (included in the free software EpiInfo for Windows provided by the CDC) to read the shapefile, use its AddLayerPartial function to limit the area to Kobe city, and then save the geographic information as kobe-city.shp.

In the same directory as the shapefile, place the 2010 population census data by ward for Kobe city, obtained from the census result page of Kobe city, manually edited and saved as a tab-delimited text file. Then place this code in the same directory, set the working directory there with setwd(), and type source("./kobe-plot.R"). You get the map below (the map is saved as low-resolution raster graphics, so I think special permission from MLIT is unnecessary).

Population pyramids of each ward in 2010 overlaid on the map of Kobe city

Applied usage 2: Overlay prefecture-specific population pyramids on the map of Kansai area, Japan

The mapdata package (available from CRAN) includes prefecture-level geographic information for Japan. Therefore, once you have prepared each prefecture's sex/age-specific population data, overlaying prefecture-specific population pyramids on the map of the Kansai area is relatively easy with this code (note: it includes Japanese characters, so you may need to specify the appropriate code page). Of course, you have to run install.packages("mapdata", dep=TRUE) in advance.

Prefecture-specific population pyramids in 2015 on the map of Kansai

Applied usage 3: Overlaying population pyramids of Aomori, Tokyo, Nagano, and Okinawa prefectures on the map of Japan and recording them as an animated GIF

From the Excel data of prefecture-specific population estimates for 5-year age groups (explained above) and Japan's map data in the mapdata package ("japan"), it is possible to overlay prefecture-specific population pyramids on the map of Japan. However, drawing the pyramids of all 47 prefectures is too cluttered, so I draw only those of four prefectures: Aomori, Tokyo, Nagano and Okinawa. In addition, the data include the census-based population in 2015 and projected populations every 5 years until 2045, so I show the change in the shapes of the pyramids as an animation.

When you use the animation package, you have to pay attention to several points. First, the size of the picture has to be specified with the ani.options() function. Second, if you would like to make an MP4 movie file, you can use the saveVideo() function; in that case you need to install ffmpeg in advance and give its full path as an option of saveVideo(). Third, and most important, only complete plots, each drawn on the whole device with its axes (coordinate system) set up, are recorded as animation frames.

Therefore, suppose you first draw the map with map("japan"), insert the prefecture-specific population pyramids with pyramidf(), each followed by a short pause with ani.pause(), and then, for every year except 2045, cover the pyramids with white rectangles using rect() and draw the next ones. In that case only the last graph (the map with four pyramids in 2045) is recorded in the animation file.

Thus, for each frame you have to repeat the whole process of drawing the map and overlaying every pyramid, inside an animation recording function such as saveGIF() or saveVideo().
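The structure this implies can be sketched as follows: one function draws a complete frame (background plus every overlay), and the recording loop calls it once per year. In this sketch the empty plot and rectangle are placeholders for map("japan") and the pyramidf() calls, so it is not the author's code; saveGIF() runs only if the animation package and an ImageMagick command are found, because it needs ImageMagick to assemble the frames.

```r
# Each call draws a complete picture, so each year becomes its own frame
draw_frame <- function(year) {
  plot(0, 0, type = "n", xlim = c(0, 10), ylim = c(0, 10),
       axes = FALSE, xlab = "", ylab = "")   # stand-in for map("japan")
  rect(1, 1, 9, 9, border = "grey")          # stand-in for the map outline
  text(5, 9.5, sprintf("Year %d", year))
  # pyramidf(<data for this year>, frame = ...) for each prefecture goes here
}

years <- seq(2015, 2045, by = 5)

has_magick <- nzchar(Sys.which("magick")) || nzchar(Sys.which("convert"))
if (requireNamespace("animation", quietly = TRUE) && has_magick) {
  animation::ani.options(ani.width = 640, ani.height = 640, interval = 0.5)
  animation::saveGIF(
    for (y in years) draw_frame(y),          # whole picture redrawn per year
    movie.name = "pyramids-2015-2045.gif")
}
```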

This code gives the drawing below (MP4 movie is also available).

Population pyramids of Aomori, Tokyo, Nagano, and Okinawa on the map of Japan from 2015 to 2045

Draw population pyramids of world countries/regions

As noted in my journal, the 2019 revision of World Population Prospects has been compiled as the package "wpp2019", which is available from CRAN. So I wrote R code to select a country/region and a year from wpp2019 and draw the population pyramid.

As the next step, I enabled multiple selection (Ctrl+click) of countries/regions and years (1950 to 2020) from wpp2019 with this code. At first, the automatic axis setting was ugly, but Ura (back side) RjpWiki taught me about the function pretty() (this entry by hoxo_m is excellent information about pretty()), so I changed the algorithm for setting the axis automatically in pyramid() from simple division into 5 equal parts to pretty()'s setting. Of course, when you specify the axis manually with Laxis= (and Raxis=), the axis is still drawn as you set it. pyramid_1.5.zip or pyramid_1.5.tar.gz can be downloaded from this site, or you can install the package from CRAN.
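The difference between the two axis algorithms is easy to see in base R. In this sketch the maximum bar length 4567 is an arbitrary illustrative value: dividing the range into 5 equal parts gives awkward tick values, while pretty() chooses round ones.

```r
max_pop <- 4567                               # illustrative maximum bar length

equal5 <- seq(0, max_pop, length.out = 6)     # 5 equal parts: 0, 913.4, 1826.8, ...
nice   <- pretty(c(0, max_pop))               # round values covering the range

equal5
nice
```

Here pretty() returns 0, 1000, ..., 5000, extending slightly beyond the data maximum so that the last tick is a round number.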

Here is an example: the population pyramids of Brazil, China, France, and Japan in 1950, 1985, and 2020.

Population pyramids of Brazil, China, France, Japan in 1950, 1985, 2020

In addition, after version 1.5 of the pyramid package was registered on CRAN, Prof. Okumura of Mie Univ. advised me on the colors of statistical graphs. Prof. Okumura suggested that the default colors of graphs should be selected from the base colors of universal design. I had not paid much attention to the default colors (actually, the color of the grid lines is fixed as "blue" and cannot be changed), but I agree with Prof. Okumura's suggestion, so I will incorporate a change of the default colors and more flexible color settings in version 1.6. As base colors of universal design, males should be filled with "#bfe4ff" and females with "#ffcabf". I made the same graph as above with these color settings by this code. The graph is shown below.
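A short sketch of passing these two colors through the Lcol and Rcol options mentioned earlier. col2rgb() only confirms the RGB components of the hex codes; the three-row data frame is a placeholder, and the pyramid() call runs only if the package is installed.

```r
# Universal-design base colours quoted above
male_col   <- "#bfe4ff"
female_col <- "#ffcabf"
rgb_vals <- col2rgb(c(Males = male_col, Females = female_col))
rgb_vals   # rows: red, green, blue (0-255)

if (requireNamespace("pyramid", quietly = TRUE)) {
  # Placeholder counts (hypothetical)
  d <- data.frame(M = c(30, 25, 20), F = c(28, 26, 24),
                  A = c("0-29", "30-59", "60+"))
  png(file.path(tempdir(), "ud-colours.png"), width = 480, height = 480)
  pyramid::pyramid(d, Lcol = male_col, Rcol = female_col,
                   main = "Universal-design base colours")
  dev.off()
}
```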

Population pyramids of Brazil, China, France, Japan in 1950, 1985, 2020 (filled by the base colors of universal design)

Overlay the population pyramids on the world regional map

The combination of the mapdata package, the wpp2019 package, and the frame version of the population pyramid (pyramidf()) in the pyramid package makes it possible to overlay population pyramids on the world map without any external shapefile or population data.

A possible problem is that country/region names differ between the mapdata package and the wpp2019 package. For instance, Vietnam is named "Vietnam" in the mapdata package but "Viet Nam" in the wpp2019 package. When you write the R code, you have to pay attention to such differences. Apart from this, the algorithm is relatively simple. The code to overlay population pyramids on the map of South-East Asian countries gives the figure below.
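One simple way to handle such differences is a small lookup table from wpp2019 names to map names. In this sketch only the Vietnam pair comes from the text; wpp_to_map and to_map_name() are hypothetical names, and any other mismatches would have to be found and added by hand.

```r
# wpp2019 name -> map database name (extend as mismatches are found)
wpp_to_map <- c("Viet Nam" = "Vietnam")

to_map_name <- function(wpp_names) {
  out <- unname(wpp_to_map[wpp_names])   # NA where no special mapping exists
  ifelse(is.na(out), wpp_names, out)     # otherwise keep the name unchanged
}

to_map_name(c("Viet Nam", "Thailand", "Cambodia"))
```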

Population pyramids on the map of South-East Asian countries

Overlay population pyramids of US states on the map

The US state map is included in the maps package. Sex/age-specific data for each US state (CSV format) from the US Census Bureau can be read with R's read.csv() function, so you can easily combine them. By running us-states.R, you get the graph below.

Population pyramids of 7 US states on the map

Correspondence to: minato-nakazawa[atmark]umin.net.

For linking to and citing this page, see here (in Japanese).