So far we’ve been working with data sets that are already tidy; we’ve loaded them from packages or sensibly formatted files. In the wild, however, it’s rare that you come across data that are ready to go like this. On data analysis projects, you usually have to spend a significant portion of your time wrangling your data into a usable state.
For today, let’s take a look at a data set tracking changes in global land and ocean temperatures, hosted at the Goddard Institute for Space Studies. I took a guess that NASA would be a good place to find a messy data file to work with and they didn’t disappoint!
The source file we’ll work with today lives here on the internet:
https://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt.
Open it in a web browser and have a look at it in all its messy glory. Make sure you understand what the numbers mean.
Normally, we’d have three options for fetching data from a file on the internet:
I like to keep all of my source data files separate from my script and output files. If you don’t have one already, make a folder named ‘data’ in your current working directory (“Files” tab -> “New Folder”). You don’t have to do it this way, but if you don’t, you’ll need to change the file paths below.
To download the file to your current directory in RStudio, open a Shell (this is a full bash shell), and run:
wget -O data/global-mean.txt https://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt
## --2019-09-05 08:50:23-- https://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt
## Resolving data.giss.nasa.gov (data.giss.nasa.gov)... 129.164.128.233, 2001:4d0:2310:230::233
## Connecting to data.giss.nasa.gov (data.giss.nasa.gov)|129.164.128.233|:443... connected.
## HTTP request sent, awaiting response... 200 OK
## Length: 16198 (16K) [text/plain]
## Saving to: ‘data/global-mean.txt’
##
## 0K .......... ..... 100% 596K=0.03s
##
## 2019-09-05 08:50:23 (596 KB/s) - ‘data/global-mean.txt’ saved [16198/16198]
Note the capital “-O”, which specifies our output file.
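If you would rather stay inside R, base R’s download.file can fetch the same file. This is only a sketch: the helper name fetch_temps is made up for illustration and is not part of the original notes.

```r
# Hypothetical helper (not from the original notes): fetch a file from within R.
fetch_temps <- function(url, dest = "data/global-mean.txt") {
  # Create the destination folder if it does not exist yet
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
  # download.file() returns 0 (invisibly) on success
  status <- download.file(url, destfile = dest, mode = "wb")
  if (status != 0) stop("Download failed with status ", status)
  invisible(dest)
}

# Usage (needs network access, so it is left commented out here):
# fetch_temps("https://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt")
```

The dest argument plays the same role as wget’s “-O”: it names the local file to write.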
Fiddle all you like with the data import wizard; you’re not going to get this file to parse correctly!
Take a moment to count all of the ways this file is messed up. Next week we’ll explore some text processing tools that would allow us to take a more automated approach to cleaning up input files (useful if you have many!), but for now we’ll hand-edit the file, recording all of the corrections we make.
Open up the text file in RStudio by navigating to your data directory on the “Files” tab and clicking on it.
Hand edit it to:
We’ll do the rest of the clean up in R.
We can now use R’s read.table function to load the file into a table in R:
raw_temps <- read.table("data/global-mean-clean.txt",
header = TRUE,
na.strings = "****"
)
As always, let’s sanity check what kinds of vectors we have in each of our columns:
summary(raw_temps)
## Year Jan Feb Mar
## Min. :1880 Min. :-70.0000 Min. :-61.000 Min. :-62.000
## 1st Qu.:1914 1st Qu.:-28.0000 1st Qu.:-24.000 1st Qu.:-24.000
## Median :1948 Median : -4.0000 Median : -6.000 Median : -1.000
## Mean :1948 Mean : 0.6934 Mean : 2.168 Mean : 3.781
## 3rd Qu.:1982 3rd Qu.: 27.0000 3rd Qu.: 30.000 3rd Qu.: 26.000
## Max. :2016 Max. :117.0000 Max. :135.000 Max. :130.000
##
## Apr May Jun Jul
## Min. :-59.000 Min. :-54.000 Min. :-52.0000 Min. :-48.000
## 1st Qu.:-26.000 1st Qu.:-25.000 1st Qu.:-25.0000 1st Qu.:-20.000
## Median : -5.000 Median : -6.000 Median : -7.0000 Median : -5.000
## Mean : 1.715 Mean : 1.248 Mean : -0.4161 Mean : 2.715
## 3rd Qu.: 25.000 3rd Qu.: 26.000 3rd Qu.: 16.0000 3rd Qu.: 15.000
## Max. :109.000 Max. : 93.000 Max. : 78.0000 Max. : 83.000
##
## Aug Sep Oct Nov
## Min. :-51.000 Min. :-47.000 Min. :-55.000 Min. :-56.00
## 1st Qu.:-20.000 1st Qu.:-17.000 1st Qu.:-19.000 1st Qu.:-19.00
## Median : -4.000 Median : -3.000 Median : -1.000 Median : -2.00
## Mean : 2.978 Mean : 4.701 Mean : 5.328 Mean : 3.81
## 3rd Qu.: 19.000 3rd Qu.: 20.000 3rd Qu.: 20.000 3rd Qu.: 15.00
## Max. : 98.000 Max. : 90.000 Max. :106.000 Max. :104.00
##
## Dec J.D D.N DJF
## Min. :-78.0000 Min. :-47.000 -9 : 7 Min. :-64.00
## 1st Qu.:-25.0000 1st Qu.:-21.000 -22 : 5 1st Qu.:-25.00
## Median : -8.0000 Median : -7.000 -10 : 4 Median : -8.50
## Mean : 0.5329 Mean : 2.438 -2 : 4 Mean : 1.11
## 3rd Qu.: 22.0000 3rd Qu.: 19.000 -25 : 4 3rd Qu.: 27.25
## Max. :111.0000 Max. : 99.000 -18 : 3 Max. :121.00
## (Other):110 NA's :1
## MAM JJA SON Year.1
## Min. :-56.000 Min. :-47.000 Min. :-47.000 Min. :1880
## 1st Qu.:-25.000 1st Qu.:-21.000 1st Qu.:-18.000 1st Qu.:1914
## Median : -6.000 Median : -6.000 Median : -3.000 Median :1948
## Mean : 2.255 Mean : 1.737 Mean : 4.628 Mean :1948
## 3rd Qu.: 27.000 3rd Qu.: 16.000 3rd Qu.: 17.000 3rd Qu.:1982
## Max. :111.000 Max. : 85.000 Max. : 97.000 Max. :2016
##
We can see D.N and DJF are messed up; why?
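If we needed them, we would need to convert *** and **** to NAs and then use as.numeric to convert these columns to numeric vectors (going through as.character first if a column was read in as a factor, because as.numeric on a factor returns the level codes rather than the values). But we’re going to erase them instead.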
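For reference, here is a minimal sketch of that conversion on a three-row toy table. The toy values are made up, and stringsAsFactors = TRUE is set explicitly as an assumption, to mimic the factor column seen in the summary above.

```r
# Toy stand-in for the real file: D.N uses "***" for missing, DJF uses "****"
toy <- read.table(
  text = "Year D.N DJF
1880 *** ****
1881 -12 -18
1882 -9 9",
  header = TRUE,
  na.strings = "****",        # only the four-star marker is treated as NA
  stringsAsFactors = TRUE     # assumption: mimic a factor column
)

class(toy$DJF)  # numeric (integer): "****" became NA
class(toy$D.N)  # factor: "***" was not recognised as missing

# Convert via character first, so we get the values and not the level codes
d_n <- as.character(toy$D.N)
d_n[d_n == "***"] <- NA
toy$D.N <- as.numeric(d_n)
```

Passing both markers at load time, na.strings = c("***", "****"), would avoid the problem in the first place.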
Let’s load the tidyverse:
library(tidyverse)
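The tidyverse’s definition of Tidy Data is a table of values where each variable has its own column, each observation has its own row, and each value has its own cell.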
If all of these things are true, your path to exploring that data set will be much easier than if they aren’t! It also means all of the tidyverse functions will “just work” with your table.
Now let’s take a moment to appreciate all of the ways in which this data set is not Tidy. It:
We have a few options for getting rid of the columns we don’t want (remember, assigning NULL to a column will erase it). But since we know we just want to keep the first 13 columns (year and months), we can easily index:
raw_temps[1:13]
## Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 1880 -30 -21 -18 -27 -14 -29 -24 -8 -17 -16 -19 -22
## 2 1881 -10 -14 1 -3 -4 -28 -7 -3 -9 -20 -26 -16
## 3 1882 9 8 1 -20 -18 -25 -11 3 -1 -23 -21 -25
## 4 1883 -34 -42 -18 -25 -26 -13 -9 -14 -19 -12 -21 -19
## 5 1884 -18 -13 -36 -36 -32 -38 -35 -27 -24 -22 -30 -30
## 6 1885 -66 -30 -24 -45 -42 -50 -29 -27 -19 -20 -22 -7
## 7 1886 -43 -46 -41 -29 -27 -39 -16 -31 -19 -25 -26 -25
## 8 1887 -66 -48 -32 -37 -33 -21 -19 -28 -19 -32 -25 -38
## 9 1888 -43 -43 -47 -28 -22 -20 -10 -11 -7 1 0 -12
## 10 1889 -21 14 4 4 -3 -12 -5 -18 -18 -22 -32 -31
## 11 1890 -48 -48 -41 -38 -48 -27 -30 -36 -36 -23 -37 -30
## 12 1891 -46 -49 -15 -25 -17 -22 -22 -21 -13 -24 -37 -3
## 13 1892 -26 -15 -36 -35 -25 -20 -28 -20 -25 -17 -49 -29
## 14 1893 -69 -51 -24 -32 -35 -24 -14 -24 -18 -16 -17 -38
## 15 1894 -55 -31 -20 -41 -30 -43 -32 -29 -23 -17 -25 -22
## 16 1895 -44 -42 -30 -23 -23 -25 -16 -16 -2 -11 -15 -12
## 17 1896 -23 -15 -29 -33 -19 -13 -6 -9 -5 4 -16 -12
## 18 1897 -22 -19 -12 -1 0 -12 -4 -3 -4 -10 -18 -26
## 19 1898 -6 -34 -55 -33 -35 -20 -22 -22 -19 -32 -35 -22
## 20 1899 -18 -39 -35 -21 -20 -26 -13 -4 0 0 12 -27
## 21 1900 -40 -8 2 -14 -6 -15 -9 -4 1 8 -13 -14
## 22 1901 -30 -5 5 -6 -18 -10 -9 -13 -17 -29 -17 -30
## 23 1902 -19 -3 -29 -27 -31 -34 -26 -28 -20 -27 -36 -46
## 24 1903 -27 -6 -23 -39 -41 -44 -30 -44 -43 -42 -38 -47
## 25 1904 -64 -55 -46 -50 -50 -49 -48 -43 -47 -35 -16 -29
## 26 1905 -38 -59 -25 -36 -33 -31 -25 -21 -15 -23 -8 -21
## 27 1906 -31 -34 -15 -2 -21 -22 -27 -19 -25 -20 -38 -18
## 28 1907 -44 -53 -25 -40 -46 -43 -35 -37 -32 -24 -51 -50
## 29 1908 -46 -36 -58 -46 -40 -39 -35 -45 -33 -43 -51 -50
## 30 1909 -70 -47 -52 -59 -54 -52 -43 -30 -37 -39 -31 -55
## 31 1910 -44 -43 -47 -39 -34 -36 -31 -34 -37 -39 -56 -69
## 32 1911 -64 -60 -62 -55 -51 -47 -41 -43 -38 -26 -20 -25
## 33 1912 -27 -13 -37 -20 -20 -26 -41 -51 -47 -55 -38 -42
## 34 1913 -41 -44 -44 -36 -45 -46 -34 -32 -32 -34 -18 -4
## 35 1914 2 -13 -23 -28 -19 -22 -24 -15 -13 -5 -20 -10
## 36 1915 -20 -1 -8 7 -1 -16 -3 -15 -12 -22 -12 -25
## 37 1916 -20 -23 -31 -25 -27 -44 -34 -27 -29 -28 -42 -78
## 38 1917 -46 -53 -47 -38 -48 -40 -23 -26 -18 -35 -29 -71
## 39 1918 -44 -33 -21 -40 -37 -28 -22 -26 -14 -3 -16 -30
## 40 1919 -21 -19 -25 -17 -20 -28 -21 -19 -17 -16 -29 -35
## 41 1920 -15 -22 -8 -26 -26 -33 -32 -29 -20 -29 -33 -47
## 42 1921 -4 -21 -28 -36 -36 -31 -16 -24 -16 -6 -16 -18
## 43 1922 -34 -44 -13 -22 -34 -32 -27 -31 -29 -33 -17 -17
## 44 1923 -27 -37 -32 -38 -33 -24 -29 -30 -28 -13 3 -6
## 45 1924 -24 -27 -12 -35 -19 -28 -27 -35 -30 -36 -23 -43
## 46 1925 -34 -35 -24 -25 -30 -34 -30 -19 -13 -17 3 11
## 47 1926 20 7 12 -15 -25 -25 -21 -11 -11 -11 -6 -30
## 48 1927 -28 -21 -39 -31 -25 -27 -15 -19 -6 -1 -4 -36
## 49 1928 -4 -12 -28 -29 -30 -41 -21 -25 -20 -19 -9 -20
## 50 1929 -47 -61 -34 -40 -39 -43 -33 -29 -23 -15 -14 -55
## 51 1930 -29 -24 -8 -26 -25 -19 -17 -11 -11 -8 14 -9
## 52 1931 -10 -22 -6 -21 -22 -6 1 0 -6 0 -12 -10
## 53 1932 13 -18 -20 -7 -22 -30 -24 -24 -11 -10 -26 -22
## 54 1933 -34 -32 -29 -23 -25 -32 -20 -23 -26 -24 -31 -47
## 55 1934 -27 -4 -31 -27 -11 -14 -11 -10 -16 -11 -1 -9
## 56 1935 -37 11 -13 -35 -26 -23 -19 -17 -17 -8 -29 -22
## 57 1936 -29 -39 -23 -20 -17 -19 -6 -12 -6 -4 -5 -4
## 58 1937 -11 5 -17 -17 -7 -8 -5 3 14 10 9 -12
## 59 1938 0 -4 5 5 -7 -17 -9 -4 3 11 1 -26
## 60 1939 -13 -12 -20 -12 -7 -8 -6 -5 0 -3 6 40
## 61 1940 -15 6 12 16 5 5 10 1 12 7 13 19
## 62 1941 13 23 6 11 10 4 15 14 2 24 12 14
## 63 1942 26 5 13 14 14 11 2 -3 0 6 13 12
## 64 1943 -1 22 1 13 10 -1 14 3 11 30 25 28
## 65 1944 41 31 34 27 26 22 23 23 31 27 12 5
## 66 1945 13 2 11 24 10 2 7 25 22 22 10 -10
## 67 1946 15 6 0 11 -4 -17 -9 -8 -2 -6 -2 -29
## 68 1947 -13 -8 5 4 -6 0 -6 -8 -14 6 -1 -18
## 69 1948 5 -13 -23 -9 8 -5 -13 -10 -10 -7 -8 -23
## 70 1949 9 -16 -1 -7 -9 -22 -13 -8 -8 -3 -8 -19
## 71 1950 -30 -26 -6 -21 -12 -6 -9 -18 -10 -20 -35 -20
## 72 1951 -35 -44 -19 -10 -2 -5 0 5 7 6 0 15
## 73 1952 16 12 -10 2 -5 -4 5 7 8 -4 -17 -2
## 74 1953 9 16 11 20 8 8 2 8 6 5 -5 3
## 75 1954 -28 -10 -12 -18 -20 -16 -16 -13 -7 -1 8 -18
## 76 1955 11 -21 -36 -23 -20 -8 -9 4 -13 -5 -28 -32
## [ reached 'max' / getOption("max.print") -- omitted 61 rows ]
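As a reminder of the NULL approach mentioned above, here is a small sketch on a toy table. The Year, Jan and Feb values are the first three rows shown above; the J.D and D.N values are made up for illustration.

```r
# Toy table with two columns we want to get rid of
toy <- data.frame(
  Year = 1880:1882,
  Jan  = c(-30, -10, 9),
  Feb  = c(-21, -14, 8),
  J.D  = c(-20, -12, -10),  # made-up values
  D.N  = c(NA, -13, -9)     # made-up values
)

# Option 1: erase unwanted columns one at a time by assigning NULL
dropped <- toy
dropped$J.D <- NULL
dropped$D.N <- NULL

# Option 2: keep the wanted columns by position, as in the text
kept <- toy[1:3]

names(dropped)
names(kept)
```

Both routes leave the same three columns; indexing is simply less typing when the columns to keep are contiguous.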
Alternatively, we could use the tidyverse select function (useful inside of a pipe chain):
raw_temps %>%
select(1:13)
## Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 1880 -30 -21 -18 -27 -14 -29 -24 -8 -17 -16 -19 -22
## 2 1881 -10 -14 1 -3 -4 -28 -7 -3 -9 -20 -26 -16
## 3 1882 9 8 1 -20 -18 -25 -11 3 -1 -23 -21 -25
## 4 1883 -34 -42 -18 -25 -26 -13 -9 -14 -19 -12 -21 -19
## 5 1884 -18 -13 -36 -36 -32 -38 -35 -27 -24 -22 -30 -30
## 6 1885 -66 -30 -24 -45 -42 -50 -29 -27 -19 -20 -22 -7
## 7 1886 -43 -46 -41 -29 -27 -39 -16 -31 -19 -25 -26 -25
## 8 1887 -66 -48 -32 -37 -33 -21 -19 -28 -19 -32 -25 -38
## 9 1888 -43 -43 -47 -28 -22 -20 -10 -11 -7 1 0 -12
## 10 1889 -21 14 4 4 -3 -12 -5 -18 -18 -22 -32 -31
## 11 1890 -48 -48 -41 -38 -48 -27 -30 -36 -36 -23 -37 -30
## 12 1891 -46 -49 -15 -25 -17 -22 -22 -21 -13 -24 -37 -3
## 13 1892 -26 -15 -36 -35 -25 -20 -28 -20 -25 -17 -49 -29
## 14 1893 -69 -51 -24 -32 -35 -24 -14 -24 -18 -16 -17 -38
## 15 1894 -55 -31 -20 -41 -30 -43 -32 -29 -23 -17 -25 -22
## 16 1895 -44 -42 -30 -23 -23 -25 -16 -16 -2 -11 -15 -12
## 17 1896 -23 -15 -29 -33 -19 -13 -6 -9 -5 4 -16 -12
## 18 1897 -22 -19 -12 -1 0 -12 -4 -3 -4 -10 -18 -26
## 19 1898 -6 -34 -55 -33 -35 -20 -22 -22 -19 -32 -35 -22
## 20 1899 -18 -39 -35 -21 -20 -26 -13 -4 0 0 12 -27
## 21 1900 -40 -8 2 -14 -6 -15 -9 -4 1 8 -13 -14
## 22 1901 -30 -5 5 -6 -18 -10 -9 -13 -17 -29 -17 -30
## 23 1902 -19 -3 -29 -27 -31 -34 -26 -28 -20 -27 -36 -46
## 24 1903 -27 -6 -23 -39 -41 -44 -30 -44 -43 -42 -38 -47
## 25 1904 -64 -55 -46 -50 -50 -49 -48 -43 -47 -35 -16 -29
## 26 1905 -38 -59 -25 -36 -33 -31 -25 -21 -15 -23 -8 -21
## 27 1906 -31 -34 -15 -2 -21 -22 -27 -19 -25 -20 -38 -18
## 28 1907 -44 -53 -25 -40 -46 -43 -35 -37 -32 -24 -51 -50
## 29 1908 -46 -36 -58 -46 -40 -39 -35 -45 -33 -43 -51 -50
## 30 1909 -70 -47 -52 -59 -54 -52 -43 -30 -37 -39 -31 -55
## 31 1910 -44 -43 -47 -39 -34 -36 -31 -34 -37 -39 -56 -69
## 32 1911 -64 -60 -62 -55 -51 -47 -41 -43 -38 -26 -20 -25
## 33 1912 -27 -13 -37 -20 -20 -26 -41 -51 -47 -55 -38 -42
## 34 1913 -41 -44 -44 -36 -45 -46 -34 -32 -32 -34 -18 -4
## 35 1914 2 -13 -23 -28 -19 -22 -24 -15 -13 -5 -20 -10
## 36 1915 -20 -1 -8 7 -1 -16 -3 -15 -12 -22 -12 -25
## 37 1916 -20 -23 -31 -25 -27 -44 -34 -27 -29 -28 -42 -78
## 38 1917 -46 -53 -47 -38 -48 -40 -23 -26 -18 -35 -29 -71
## 39 1918 -44 -33 -21 -40 -37 -28 -22 -26 -14 -3 -16 -30
## 40 1919 -21 -19 -25 -17 -20 -28 -21 -19 -17 -16 -29 -35
## 41 1920 -15 -22 -8 -26 -26 -33 -32 -29 -20 -29 -33 -47
## 42 1921 -4 -21 -28 -36 -36 -31 -16 -24 -16 -6 -16 -18
## 43 1922 -34 -44 -13 -22 -34 -32 -27 -31 -29 -33 -17 -17
## 44 1923 -27 -37 -32 -38 -33 -24 -29 -30 -28 -13 3 -6
## 45 1924 -24 -27 -12 -35 -19 -28 -27 -35 -30 -36 -23 -43
## 46 1925 -34 -35 -24 -25 -30 -34 -30 -19 -13 -17 3 11
## 47 1926 20 7 12 -15 -25 -25 -21 -11 -11 -11 -6 -30
## 48 1927 -28 -21 -39 -31 -25 -27 -15 -19 -6 -1 -4 -36
## 49 1928 -4 -12 -28 -29 -30 -41 -21 -25 -20 -19 -9 -20
## 50 1929 -47 -61 -34 -40 -39 -43 -33 -29 -23 -15 -14 -55
## 51 1930 -29 -24 -8 -26 -25 -19 -17 -11 -11 -8 14 -9
## 52 1931 -10 -22 -6 -21 -22 -6 1 0 -6 0 -12 -10
## 53 1932 13 -18 -20 -7 -22 -30 -24 -24 -11 -10 -26 -22
## 54 1933 -34 -32 -29 -23 -25 -32 -20 -23 -26 -24 -31 -47
## 55 1934 -27 -4 -31 -27 -11 -14 -11 -10 -16 -11 -1 -9
## 56 1935 -37 11 -13 -35 -26 -23 -19 -17 -17 -8 -29 -22
## 57 1936 -29 -39 -23 -20 -17 -19 -6 -12 -6 -4 -5 -4
## 58 1937 -11 5 -17 -17 -7 -8 -5 3 14 10 9 -12
## 59 1938 0 -4 5 5 -7 -17 -9 -4 3 11 1 -26
## 60 1939 -13 -12 -20 -12 -7 -8 -6 -5 0 -3 6 40
## 61 1940 -15 6 12 16 5 5 10 1 12 7 13 19
## 62 1941 13 23 6 11 10 4 15 14 2 24 12 14
## 63 1942 26 5 13 14 14 11 2 -3 0 6 13 12
## 64 1943 -1 22 1 13 10 -1 14 3 11 30 25 28
## 65 1944 41 31 34 27 26 22 23 23 31 27 12 5
## 66 1945 13 2 11 24 10 2 7 25 22 22 10 -10
## 67 1946 15 6 0 11 -4 -17 -9 -8 -2 -6 -2 -29
## 68 1947 -13 -8 5 4 -6 0 -6 -8 -14 6 -1 -18
## 69 1948 5 -13 -23 -9 8 -5 -13 -10 -10 -7 -8 -23
## 70 1949 9 -16 -1 -7 -9 -22 -13 -8 -8 -3 -8 -19
## 71 1950 -30 -26 -6 -21 -12 -6 -9 -18 -10 -20 -35 -20
## 72 1951 -35 -44 -19 -10 -2 -5 0 5 7 6 0 15
## 73 1952 16 12 -10 2 -5 -4 5 7 8 -4 -17 -2
## 74 1953 9 16 11 20 8 8 2 8 6 5 -5 3
## 75 1954 -28 -10 -12 -18 -20 -16 -16 -13 -7 -1 8 -18
## 76 1955 11 -21 -36 -23 -20 -8 -9 4 -13 -5 -28 -32
## [ reached 'max' / getOption("max.print") -- omitted 61 rows ]
The data set still isn’t tidy: Month should be a variable, but it’s spread across columns. In tidyverse lingo, we need to gather them up!
I’d like to take a moment here to point out a REALLY useful resource to use when you’re trying to figure out which tidyverse function you need to help you wrangle data: the Data Wrangling Cheatsheet.
You can find this file in RStudio: Help -> Cheatsheets -> Data Manipulation with dplyr and tidyr
The picture that matches what we need to do tells us it’s a gather:
global_temps <- gather(raw_temps[1:13], key = "month", value = "index", Jan:Dec)
What happened there? Take a look at the table.
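To see what gather does at a glance, here is a sketch on a two-year, two-month toy table (values taken from the first rows above). It also shows pivot_longer, the newer tidyr function for the same reshaping; that comparison is an addition and assumes tidyr 1.0.0 or later is installed.

```r
library(tidyr)

toy <- data.frame(Year = c(1880, 1881), Jan = c(-30, -10), Feb = c(-21, -14))

# gather stacks the month columns one after another: all Jan rows, then all Feb rows
long_gather <- gather(toy, key = "month", value = "index", Jan:Feb)

# pivot_longer produces the same values but walks the table row by row:
# 1880 Jan, 1880 Feb, 1881 Jan, 1881 Feb
long_pivot <- pivot_longer(toy, cols = Jan:Feb,
                           names_to = "month", values_to = "index")

long_gather$month
long_pivot$month
```

The two results hold the same observations in a different row order, which matters if you later build a column by position rather than by lookup.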
Last, but not least, it should bother you that the “Year” column is capitalized and the two others aren’t. I’d recommend always keeping your variable names in lower case for consistency.
We can use colnames to update the names of our columns and tolower to switch strings in character vectors to all lower case:
colnames(global_temps) <- tolower(colnames(global_temps))
The last problem we have to solve is to create dates from our year (numeric) and month (character) vectors. To do that, we’d first like to convert months from arbitrary strings into numbers.
In operations like this, where you want to translate one set of values one-to-one into another set of values, one robust solution is to create a vector of values you want to translate to and then add names to the vector containing the values you want to translate from. Confused? It’s easier to understand with an example!
Let’s make a vector holding month numbers:
month_n <- 1:12
Remember what this does:
month_n[3]
## [1] 3
We can set the “names” of the numbers in this vector with the names function. I could type out my month strings by hand, but I’ll be lazy and use the fact that they’re already in columns in our original table:
colnames(raw_temps[2:13])
## [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov"
## [12] "Dec"
So we can do this:
names(month_n) <- colnames(raw_temps[2:13])
Once you have a named vector, you can index elements by name in addition to using numeric indexes. For example:
month_n["Apr"]
## Apr
## 4
month_n[c("Apr", "Aug")]
## Apr Aug
## 4 8
month_n[c("Apr", "Aug", "Apr")]
## Apr Aug Apr
## 4 8 4
Making our month number column then becomes as easy as indexing on our months column:
global_temps$month_n <- month_n[global_temps$month]
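The same translation can be sketched without the original table, using R’s built-in month.abb constant (the abbreviated English month names). The months_seen vector is a made-up stand-in for global_temps$month.

```r
# Made-up stand-in for a column of month abbreviations
months_seen <- c("Apr", "Aug", "Apr", "Dec")

# Named-vector lookup, as in the text, but built from month.abb
lookup <- setNames(1:12, month.abb)
by_name <- unname(lookup[months_seen])

# match() returns the position of each value in month.abb directly
by_match <- match(months_seen, month.abb)

by_name
by_match
```

Both give 4 8 4 12. Note that the named-vector lookup relies on the months being a character vector; if they were a factor, R would index by the underlying integer codes instead.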
The rep (“repeat”) function
The solution above is robust in that it will work even if there isn’t a regular pattern in the repetition of months down the rows of the table. But a regular pattern actually is the case for our table: each of the 12 months appears once for each of the 137 years contained in the data set.
If you need a new column that contains a set of values that regularly repeats, an alternative solution is to use the rep (“repeat”) function:
rep(1:12, each = 137)
The each argument says repeat each value 137 times.
Alternatively, we can specify a times argument:
rep(1:12, times = 137)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11
## [24] 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10
## [47] 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9
## [70] 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8
## [93] 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7
## [116] 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6
## [139] 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5
## [162] 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4
## [185] 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3
## [208] 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2
## [231] 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1
## [254] 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
## [277] 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11
## [300] 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10
## [323] 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9
## [346] 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8
## [369] 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7
## [392] 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6
## [415] 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5
## [438] 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4
## [461] 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3
## [484] 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2
## [507] 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1
## [530] 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
## [553] 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11
## [576] 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10
## [599] 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9
## [622] 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8
## [645] 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7
## [668] 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6
## [691] 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5
## [714] 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4
## [737] 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3
## [760] 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2
## [783] 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1
## [806] 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
## [829] 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11
## [852] 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10
## [875] 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9
## [898] 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8
## [921] 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7
## [944] 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6
## [967] 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5
## [990] 6 7 8 9 10 11 12 1 2 3 4
## [ reached getOption("max.print") -- omitted 644 entries ]
Note the difference!
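The difference is easier to see on a short vector. A minimal sketch:

```r
# each: repeat every element in place before moving on to the next
by_each <- rep(1:3, each = 2)
by_each
## [1] 1 1 2 2 3 3

# times: repeat the whole vector end to end
by_times <- rep(1:3, times = 2)
by_times
## [1] 1 2 3 1 2 3
```

Which form lines up with your table depends on its row order. gather stacks the month columns one after another (all the Jan rows, then all the Feb rows, and so on), so check the month column before relying on either form.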
Finally, we’ll use a helper function from a new package called lubridate to easily create a date column:
library(lubridate)
global_temps$date <- make_date(year = global_temps$year, month = global_temps$month_n)
We’ll explore Dates and Times in more depth next week.
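If lubridate isn’t available, a base R sketch of the same idea is below. It is a swapped-in alternative rather than what make_date does internally; like make_date with its default day of 1, it pins each date to the first of the month. The year and month_n vectors are toy stand-ins for the table columns.

```r
# Toy stand-ins for global_temps$year and global_temps$month_n
year    <- c(1880, 1880, 2016)
month_n <- c(1, 4, 12)

# Build ISO-style "YYYY-MM-01" strings and parse them as Dates
dates <- as.Date(sprintf("%04d-%02d-01", as.integer(year), as.integer(month_n)))

format(dates)
class(dates)
```

The result is a Date vector, so it sorts and plots chronologically, unlike the month strings we started with.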
Now, on your own:
geom_smooth to add a kernel trend line (why is this a nice approach given these data?)
lm