PS 3780 Data Literacy & Visualization

Assignment 4
PS 3780 Data Literacy & Visualization, Fall 2019
Due Date: Thursday, October 31, 2019 at 11:59 p.m.
Please answer these questions in complete sentences and include the R commands
you used in a single .pdf file (use the “save as” function in most word processors). Be
sure to include your name, your teammate’s name if you have one, and the assignment
number. Submit the file to Carmen by the due date.
Part I: Hans Rosling Boxplot
Find the health-wealth.csv file on Carmen (the file we used before, which contains the
variables examined by Hans Rosling in his talk: per-capita GDP, life expectancy, total
population, and region for every country in the world in 2010) and load it into R. Produce
a figure of five boxplots to show the variation in life expectancy across 5 of the 7 regions
in the data: East Asia & the Pacific, Europe & Central Asia, Latin America &
the Caribbean, Middle East & North Africa, and North America (make sure to display
region names below the horizontal axis). Describe the variation across regions in
terms of the mean, maximum, minimum, and quartiles (3 pts).
Some hints:
1. Begin by subsetting the Rosling data to include only the 5 regions listed above,
using the subset() function. Create a new data frame with the subsetted data,
and use this new data frame to create your boxplot.
2. Create a boxplot using the function boxplot(). Use the "~" symbol to divide the
boxplots up by region number, e.g. "life.expectancy ~ region".
3. You can add your own axis labels to the x and y axes by setting "axes=FALSE" in
the plot command (boxplot() in this case), and then drawing the x and y axes
with the axis() command. The most important steps here are to specify the axis
(either 1 or 2), to list the positions of the tick marks (at=c(1,2,3,etc.)), and then to
list the labels by name (labels=c(), with the names of the regions in the
parentheses).
4. Your final boxplot should have 5 boxes, one for each of the regions, and each of these
boxes should be labeled by region name (not number) along the x-axis. Your graph
should also have a descriptive title, and informative labels for the x and y axes.
Finally, it should include a horizontal line indicating the median life expectancy
across the 5 regions included in the plot.
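
The hints above can be sketched roughly as follows. This is only an illustration on made-up numbers: the data frame rosling, its column names (region, life.expectancy), the region spellings, and the shortened tick labels are all assumptions, so check them against the actual health-wealth.csv file before reusing anything.

```r
# Toy stand-in for health-wealth.csv. Column names, region spellings,
# and values are assumptions made for illustration only.
set.seed(1)
regions <- c("East Asia & Pacific", "Europe & Central Asia",
             "Latin America & Caribbean", "Middle East & North Africa",
             "North America", "South Asia", "Sub-Saharan Africa")
rosling <- data.frame(
  region = rep(regions, each = 6),
  life.expectancy = round(runif(42, min = 50, max = 83), 1),
  stringsAsFactors = FALSE
)

# Hint 1: keep only the five regions of interest in a new data frame.
keep <- regions[1:5]
five <- subset(rosling, region %in% keep)
# Fix the order of the boxes so the tick labels line up with them.
five$region <- factor(five$region, levels = keep)

# Hints 2 and 3: draw the boxes without default axes, then add axes by hand.
short_labels <- c("E. Asia & Pacific", "Europe & C. Asia",
                  "Lat. Am. & Carib.", "M. East & N. Africa", "N. America")
boxplot(life.expectancy ~ region, data = five, axes = FALSE,
        main = "Life expectancy by region (toy data)",
        xlab = "Region", ylab = "Life expectancy (years)")
axis(1, at = 1:5, labels = short_labels, cex.axis = 0.6)
axis(2)
box()

# Hint 4: dashed horizontal line at the median over all plotted countries.
overall_median <- median(five$life.expectancy)
abline(h = overall_median, lty = 2, col = "red")
```

Converting region to a factor with explicit levels is one way to make sure that box number k and label number k refer to the same region; without it, R orders the boxes alphabetically.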
Part II: API and World Bank
Use the World Bank API to extract female life expectancy data. Plot the data for all
countries from 1970 to 2015, drawing the United States in one color and the rest of the
world in another. Make sure that the plot has axis labels and a title, and write one short
paragraph describing the plot (4 pts).
Some hints:
1. This assignment follows Lecture 16a fairly closely.
2. Use the WDI() function from the WDI package to query the World Bank API, and set
indicator = "SE.SCH.LIFE.FE" in the parentheses. You can also restrict the years by
setting "start = " and "end = ".
3. Use the xyplot() function from the lattice package to display space-time variations.
4. Before calling xyplot(), set up a custom color scheme in which the United States is
assigned a different color from every other country.
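
One possible shape for these steps is sketched below. The WDI() call is shown only as a comment because it needs the WDI package and a network connection; the code then runs on a small made-up data frame laid out the way WDI() returns data (one row per country and year, with a column named after the indicator). The countries, values, and colors are assumptions for illustration, not the Lecture 16a solution.

```r
library(lattice)

# The real query would look like this (left commented out here):
# library(WDI)
# wb <- WDI(country = "all", indicator = "SE.SCH.LIFE.FE",
#           start = 1970, end = 2015)

# Toy stand-in: four countries, made-up values, same long layout.
set.seed(2)
countries <- c("United States", "Canada", "Mexico", "Japan")
wb <- expand.grid(year = 1970:2015, country = countries,
                  stringsAsFactors = FALSE)
wb$SE.SCH.LIFE.FE <- 60 + 0.3 * (wb$year - 1970) + rnorm(nrow(wb))

# Hint 4: build the color scheme first, one color per country,
# in the same order as the factor levels used for grouping.
wb$country <- factor(wb$country)
line_cols <- ifelse(levels(wb$country) == "United States", "red", "grey70")

# Hint 3: one line per country over time, colored by the scheme above.
p <- xyplot(SE.SCH.LIFE.FE ~ year, data = wb, groups = country,
            type = "l", col = line_cols,
            xlab = "Year", ylab = "Indicator value (toy data)",
            main = "United States vs. the rest of the world (toy data)")
print(p)
```

Because xyplot() assigns the entries of col to groups in factor-level order, deriving line_cols from levels(wb$country) keeps the highlight attached to the United States no matter how many countries the real data contain.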
Part III: Dreamland
Sam Quinones, the author of Dreamland: The True Tale of America’s Opiate Epidemic,
is coming to Ohio State to discuss his book and the topic of opiate addiction. News of
your mad data visualization skills has spread, and you have been asked to come up with
graphics for the poster that will be used to advertise the event.
Download the drug poisoning mortality data from Carmen and read it into R. Geographically
link it to the county name data from the maps library in R. Then create two
county-level maps of drug poisoning mortality in the United States, one for 2004 and the
other for 2014.
Some hints:
1. This part follows Lecture 14c fairly closely.
2. R sometimes reads data in as factors rather than text. Factors are vectors of integer
values with corresponding sets of character values to use when the factor
is displayed. They are also incredibly confusing because they look like text but
they don’t act like text. This dataset has variables that will be read in as factors
unless you set the argument stringsAsFactors = FALSE in your read.csv()
command. We highly recommend doing so.
3. The variable of interest has the annoyingly long name Estimated Age-adjusted
Death Rate, 16 Categories (in ranges). It also, as the name implies, contains
ranges (0-2, 2.1-4, etc.) rather than actual numbers. Keeping in mind that you’ll
want to plot colors later, you’ll want to create a new variable in the dataset that
takes a value of 1 when age-adjusted death rate is 0-2, 2 when it’s 2.1-4, and so on.
4. The death rates are measured as deaths per 100,000 population.
5. You’ll need to specify a color scheme for your map, and very few of the spectra
in RColorBrewer can handle 16 colors, which is what you’ll need if you want to
represent all of the categories in the data. You probably want your color scheme
to be a gradient from a lighter color to a darker color. The best way to do this is
to pick a lighter color and a darker color from an online color-to-hex converter and
then use colorRampPalette() to generate the gradient from one to the other.
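
Hints 2, 3, and 5 can be illustrated on a tiny made-up table. The column names (FIPS, Year, Rate.Range), the third range label, and the two hex colors are assumptions chosen for the sketch; the real file has a longer variable name and 16 range labels, which would all need to be listed in order.

```r
# Toy stand-in for the mortality file (hypothetical column names and rows).
csv_text <- paste(c("FIPS,Year,Rate.Range",
                    "1001,2004,0-2",
                    "1001,2014,2.1-4",
                    "39049,2004,2.1-4",
                    "39049,2014,4.1-6"), collapse = "\n")

# Hint 2: read text columns as character, not factor.
drug <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Hint 3: map each range label to its position in an ordered list of labels.
# Only the first three labels appear here; "4.1-6" is an assumed label.
range_levels <- c("0-2", "2.1-4", "4.1-6")
drug$category <- match(drug$Rate.Range, range_levels)

# Hint 5: a 16-step gradient from a light to a dark color.
shades <- colorRampPalette(c("#FEE5D9", "#99000D"))(16)

# Each row's color is then the palette entry for its category.
drug$color <- shades[drug$category]
print(drug)
```

Using match() against an ordered vector of labels avoids a long chain of ifelse() calls, and any label missing from range_levels shows up as NA rather than being silently miscoded.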
The final product of this part will be two maps of the United States, one for drug poisoning
mortality in 2004 and one for drug poisoning mortality in 2014. Each map should color
each county by its drug poisoning mortality rate for the relevant year. Also write a short
paragraph explaining what the different colors indicate in the two maps (5 pts).
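
As a rough sketch of how one such map could be drawn, the code below follows the county-matching pattern from the maps package documentation: look up a FIPS code for each county polygon, then look up each polygon's category in the data. The data frame toy, its column names, the three counties, and the colors are assumptions for illustration; the real data would first be subset to a single year. The drawing step runs only if the maps package is installed.

```r
# Hypothetical county data: FIPS code plus a category from 1 to 16.
toy <- data.frame(fips = c(39049, 39035, 6037),
                  category = c(3L, 9L, 16L))
shades <- colorRampPalette(c("#FEE5D9", "#99000D"))(16)

if (require("maps", quietly = TRUE)) {
  data(county.fips)  # lookup table shipped with maps: fips, polyname

  # Names of the county polygons, in the order map() draws them.
  poly_names <- map("county", plot = FALSE)$names
  # FIPS code for each polygon, then the category for each FIPS code.
  poly_fips <- county.fips$fips[match(poly_names, county.fips$polyname)]
  poly_cat <- toy$category[match(poly_fips, toy$fips)]

  # Counties without data get NA as their color and stay unfilled.
  map("county", fill = TRUE, col = shades[poly_cat], lty = 0)
  map("state", add = TRUE, col = "grey40")
  title("Drug poisoning mortality category (toy data)")
  legend("bottomright", legend = c("lowest", "highest"),
         fill = shades[c(1, 16)], bty = "n", cex = 0.8)
}
```

Running the same steps once on the 2004 rows and once on the 2014 rows, with the same palette, keeps the colors comparable between the two maps.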
