To get started with this assignment, run:
bio185::startAssignment("02-basic-plotting")
That will create a file named 02-basic-plotting.Rmd in your current working directory and update any data files you need in your data/ directory.
Load the ggplot2 package:
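A minimal sketch of this step, assuming ggplot2 is already installed (the commented-out install line is only needed otherwise):

```r
# Attach ggplot2 so its plotting functions and bundled data sets are available.
# install.packages("ggplot2")  # uncomment if the package is not installed yet
library(ggplot2)
```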
We’ll use the diamonds data set for the rest of the questions in this section. Remind yourself which variables are available by printing a preview of the table:
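One possible way to get such a preview is sketched here. It uses head() so that only ten rows are printed even if tibble's compact print method is not active in your session; typing diamonds on its own is an alternative when it is.

```r
library(ggplot2)

# Show the first ten rows of the data set that ships with ggplot2.
head(diamonds, 10)

# List just the column names as a quick reminder of the available variables.
names(diamonds)
```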
## # A tibble: 53,940 x 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39
## # … with 53,930 more rows
Make a histogram showing the distribution of the price variable:
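A hedged sketch of one way to do this; the object name price_hist is illustrative only. Leaving out bins and binwidth makes geom_histogram() fall back to 30 bins and print a message suggesting a better value.

```r
library(ggplot2)

# Map price to the x axis; geom_histogram() counts the diamonds in each bin.
price_hist <- ggplot(diamonds, aes(x = price)) +
  geom_histogram()

# Printing the object draws the plot.
price_hist
```

Passing an explicit width, for example geom_histogram(binwidth = 500), silences the message; 500 is an arbitrary choice rather than a recommended value.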
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Make a density plot showing the same distribution:
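As a sketch, the same mapping can be paired with geom_density(), which draws a smoothed estimate of the distribution instead of binned counts. The name price_density is illustrative.

```r
library(ggplot2)

# Same x mapping as the histogram, but with a kernel density estimate.
price_density <- ggplot(diamonds, aes(x = price)) +
  geom_density()

price_density
```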
Does diamond color affect price? Make the same plot as above, but group prices by the color of the diamond; optionally, add a little transparency so that overlapping groups remain visible:
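One possible approach, shown here for the density plot: mapping color to the fill aesthetic splits the data into one curve per color grade, and alpha sets the transparency. The value 0.3 and the name price_by_color are arbitrary choices for illustration.

```r
library(ggplot2)

# fill = color creates one density curve per diamond color grade;
# alpha = 0.3 makes the filled areas semi-transparent so they can overlap.
price_by_color <- ggplot(diamonds, aes(x = price, fill = color)) +
  geom_density(alpha = 0.3)

price_by_color
```

The same fill mapping also works with geom_histogram() if you prefer counts over densities.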
Let’s end with a bit more of a challenge; you’ll probably need to consult the ggplot2 documentation with help to do this one. Plot the price of diamonds as a function of carat (why does this relationship make more sense than the inverse?) with a scatter plot. Color the points based on the cut of the diamond. Use transparency to make it easier to see overlapping points.
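A minimal sketch of one way to build this plot, assuming geom_point() as the scatter-plot layer; its help page can be opened with ?geom_point. The alpha value of 0.2 and the name price_vs_carat are illustrative.

```r
library(ggplot2)

# carat on the x axis (the explanatory variable), price on the y axis,
# and one point color per cut category.
price_vs_carat <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.2)  # low alpha lets dense regions show through

price_vs_carat
```

Setting alpha outside aes() applies the same transparency to every point, whereas color sits inside aes() because it depends on the data.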