Tutorial 6 Answers

Problem Set 6: Questions and Answers

  1. In section B.1., why do
summer(x = 1, y = 2)
summer(x = 2, y = 1)

not yield the same output? Write in math what each one does.


Both calls compute \(x^y\), and exponentiation is not symmetric in its arguments, so swapping the values of x and y changes the result. The first call evaluates to \(1^2 = 1\). The second call evaluates to \(2^1 = 2\).
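
As a minimal sketch, assuming summer is a plain power function (the definition below is a stand-in based on the answer above, not copied from section B.1), the two calls can be checked directly:

```r
# Stand-in definition (assumption): summer raises x to the power y
summer <- function(x, y) {
  x^y
}

first  <- summer(x = 1, y = 2)  # computes 1^2
second <- summer(x = 2, y = 1)  # computes 2^1

# With named arguments, the order in which they are written does not matter;
# what matters is which value is bound to x and which to y
same_as_first <- summer(y = 2, x = 1)

print(c(first = first, second = second, same_as_first = same_as_first))
```

The third call shows that the difference comes from which value plays the role of base and which the role of exponent, not from the order the arguments are typed in.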

  1. In section C.1., why does the call summer3(x = 5, y = 3) return a value when summer2(x = 5, y = 3) does not?


The function summer2 requires a value for z, so summer2(x = 5, y = 3) stops with an error.

The function summer3 can accept a value for z, but R does not need one to evaluate the function. Therefore, summer3(x = 5, y = 3) works without error.
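
One common way to get this behavior is a default argument value. The sketch below assumes hypothetical bodies for summer2 and summer3 (the tutorial's own definitions in section C.1 may differ); only the presence or absence of a default for z matters here.

```r
# Hypothetical definitions (assumptions, not the tutorial's code)
summer2 <- function(x, y, z) {
  x^y + z          # z has no default, so it must be supplied
}
summer3 <- function(x, y, z = 0) {
  x^y + z          # z falls back to 0 when the caller omits it
}

# summer2 fails once R tries to use the missing z; capture the message
res2 <- tryCatch(
  summer2(x = 5, y = 3),
  error = function(e) conditionMessage(e)
)

# summer3 runs because the default fills in z
res3 <- summer3(x = 5, y = 3)

print(res2)
print(res3)
```

Because R evaluates arguments lazily, the error in summer2 is raised only at the point where z is actually used inside the function body.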

  1. In C.2., why does summer(x = "fred", y = "ted") yield an error?


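The function summer uses x and y as numeric values in an arithmetic expression. When you ask R to compute \(\text{fred}^{\text{ted}}\), it cannot apply the exponent operator to character strings, so it stops with the error "non-numeric argument to binary operator".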
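
The sketch below reproduces the failure with a stand-in definition of summer (an assumption based on the answers above) and then shows one possible way to guard against it. The name summer_checked is hypothetical and not part of the tutorial.

```r
# Stand-in definition (assumption): summer raises x to the power y
summer <- function(x, y) {
  x^y
}

# Capture the error message instead of halting the script
msg <- tryCatch(
  summer(x = "fred", y = "ted"),
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical variant that checks its inputs and fails with a clearer message
summer_checked <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("x and y must both be numeric")
  }
  x^y
}

ok <- summer_checked(x = 2, y = 3)  # computes 2^3
bad <- tryCatch(
  summer_checked(x = "fred", y = "ted"),
  error = function(e) conditionMessage(e)
)
print(bad)
```

Checking argument types at the top of a function is optional, but it makes the error message point at the caller's mistake rather than at an operator deep inside the function.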

  1. Fix the function in part F to remove the graph background. In ggplot we remove the grey background by setting panel.background inside theme():
plotto <- ggplot() +
  geom_whatever(data = df,
                mapping = aes(x = var, y = yvar)) +
  theme(panel.background = element_blank())

You can also get rid of the gridlines with

  theme(panel.background = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

Answer: The original function was

# load packages
library(ggplot2)

# load data
crashes <- read.csv("H:/pppa_data_viz/2023/tutorials/data/tutorial_06/20230307_Crashes_in_DC.csv")

# graphing function
graphit2 <- function(xvar, namer1){
  ggplot() + 
    geom_histogram(data = crashes, 
                   mapping = aes(x = {{xvar}})) +
    labs(title = paste0("Histogram of ", namer1),
         x = namer1)
}

# call the graphing function
graphit2(xvar = TOTAL_VEHICLES, namer1 = "total vehicles involved in crash")


We modify by adding theme elements:

graphit2 <- function(xvar, namer1){
  ggplot() + 
    geom_histogram(data = crashes, 
                   mapping = aes(x = {{xvar}})) +
    labs(title = paste0("Histogram of ", namer1),
         x = namer1) +
    theme(panel.background = element_blank(),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank())
}

# call the graphing function
graphit2(xvar = TOTAL_VEHICLES, namer1 = "total vehicles involved in crash")
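
To try the same theme changes without the crash data file, here is a small sketch that uses R's built-in mtcars data. The function name plain_hist and its df argument are hypothetical additions; passing the data frame as an argument is a design choice that lets one function serve several datasets, whereas graphit2 always reads from crashes.

```r
library(ggplot2)

# Hypothetical helper: histogram of one column of `df`,
# drawn without the grey panel background or gridlines
plain_hist <- function(df, xvar, namer1) {
  ggplot() +
    geom_histogram(data = df,
                   mapping = aes(x = {{ xvar }}),
                   bins = 30) +
    labs(title = paste0("Histogram of ", namer1),
         x = namer1) +
    theme(panel.background = element_blank(),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank())
}

# Build the plot from a built-in dataset, then print it
p <- plain_hist(df = mtcars, xvar = mpg, namer1 = "miles per gallon")
print(p)
```

Setting bins explicitly is optional; it only silences the message ggplot2 prints when it picks the default of 30 bins.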


  1. Make a function that automates a graphics operation of interest to you, using a dataset not from this tutorial.


Here is one example, using 311 service request data from the city of Los Angeles. A histogram of the full distribution is hard to read because a few very high values stretch the x-axis. I use a function to make a series of histograms that keep only values below the 99th, 95th, and 90th percentiles in turn.

# location of la's 311 data for 2022
# https://data.lacity.org/City-Infrastructure-Service-Requests/MyLA311-Service-Request-Data-2022/i5ke-k6by
three11 <- "https://data.lacity.org/resource/i5ke-k6by.csv"

# load packages: readr (read_csv), dplyr (filter), ggplot2 (plots)
library(tidyverse)

# load the data
la3 <- read_csv(three11)
# --- calculate the length of time from start to close
# start date
la3$start.date <- as.Date(x = substr(la3$CreatedDate, start = 1, stop = 10), format = "%m/%d/%Y")
# stop date
la3$stop.date <- as.Date(x = substr(la3$ClosedDate, start = 1, stop = 10), format = "%m/%d/%Y")
# number of days between these two
la3$days <- la3$stop.date - la3$start.date


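Frequency table of la3$days (in each pair of rows, the top row is the number of days and the bottom row is the count of requests):

  0   1   2   3   4   5   6   7   8   9  10  11  12  13  17  18  19  24  25  32 
 61  45 234 191 142 154  74  29   4  10   5   4   3   2   2   1   1   4   6   1 
 36  37  40  43  44  45  54  59  78  96  97 101 120 163 174 325 423 
  5   1   1   1   1   1   1   1   1   1   2   2   1   1   1   1   2 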
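
The date arithmetic above can be tried on a couple of made-up timestamps. The strings below are hypothetical and only imitate a month/day/year layout; they are not rows from the 311 data. The sketch also shows that subtracting two dates gives a difftime object, and that converting it to a plain number is one way to avoid the difftime scale message that ggplot2 prints.

```r
# Made-up timestamps for illustration (not taken from the dataset)
created <- c("01/03/2022 08:15:00 AM", "02/10/2022 01:30:00 PM")
closed  <- c("01/05/2022 09:00:00 AM", "02/10/2022 04:45:00 PM")

# Keep the first 10 characters (the date part) and parse as month/day/year
start.date <- as.Date(substr(created, start = 1, stop = 10), format = "%m/%d/%Y")
stop.date  <- as.Date(substr(closed,  start = 1, stop = 10), format = "%m/%d/%Y")

# Subtracting Date objects returns a difftime measured in days
days <- stop.date - start.date
print(class(days))

# Convert to an ordinary numeric vector of days
days.num <- as.numeric(days, units = "days")
print(days.num)
```
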
# -- function to see how the distribution looks when I cut off parts at the top 

histo <- function(topcoder){
   # find the value of the percentile in the function
   qer <- quantile(x = la3$days, probs = c(topcoder), na.rm = TRUE) 
   # keep only data below this value 
   la3.limit <- filter(la3, days < qer)
   # make a histogram
   la.hist <- ggplot() +
     geom_histogram(data = la3.limit,
                    mapping = aes(x = days)) +
     labs(subtitle = paste0("keep only values below ",topcoder*100," percentile"))
   # return the plot so that it prints when the function is called
   la.hist
}

# --- call the function for various top-coding values
# drop above 99th p
histo(topcoder = 0.99)
Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.


# drop above 95th p
histo(topcoder = 0.95)
Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.


# drop above 90th p
histo(topcoder = 0.90)
Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
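
The filtering step inside histo can be examined on its own, without ggplot2. The sketch below uses a made-up vector and a hypothetical helper, keep_below, and applies all three cutoffs in one pass with lapply instead of three separate calls.

```r
# Made-up values for illustration: mostly small, with one extreme value
days <- c(0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 400)

# Hypothetical helper: keep only values strictly below the p-th quantile
keep_below <- function(x, p) {
  cutoff <- quantile(x, probs = p, na.rm = TRUE)
  x[!is.na(x) & x < cutoff]
}

# Apply each top-coding value in turn and store the results in a named list
cutoffs <- c(0.99, 0.95, 0.90)
kept <- lapply(cutoffs, function(p) keep_below(days, p))
names(kept) <- paste0("p", cutoffs * 100)

# Largest surviving value and number of surviving values under each cutoff
print(sapply(kept, max))
print(sapply(kept, length))
```

The same lapply pattern could wrap histo itself to produce a list of plots, one per cutoff.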
