Tidy(ing) Data

Data

The source file we’ll work with today lives here on the internet:

https://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt

Open it in a web browser and have a look at it in all its messy glory. Make sure you understand what the numbers mean.

Normally, we’d have a few options for fetching data from a file on the internet, including:

  • Download the file to your computer and then upload it to your RStudio account using the “Upload” button on the “Files” tab.
  • Skip the middle man and download the file directly. You can do this using the “Shell” found in the “Tools” menu.

To download the file to your current directory in RStudio, open a Shell (this is a full bash shell), and run:

wget -O data/global-mean.txt https://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt

--2019-09-05 08:50:22--  https://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt
Resolving data.giss.nasa.gov (data.giss.nasa.gov)... 129.164.128.233, 2001:4d0:2310:230::233
Connecting to data.giss.nasa.gov (data.giss.nasa.gov)|129.164.128.233|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16198 (16K) [text/plain]
Saving to: ‘data/global-mean.txt’

     0K .......... .....                                      100%  592K=0.03s

2019-09-05 08:50:22 (592 KB/s) - ‘data/global-mean.txt’ saved [16198/16198]

Note the capital “-O”, which specifies our output file.

Fiddle all you like with the data import wizard; you’re not going to get this file to parse correctly!

Cleaning up the input file

Open up the text file in RStudio by navigating to your data directory on the “Files” tab and clicking on it.

Hand edit it to:

  • Get rid of all of the header lines (1-7)
  • Get rid of all of the blank lines and repeats of column names (e.g. 23-24)
  • Get rid of all of the footer lines (last 7)
  • Leave a single blank line at the end of the file (it should be line 139).
  • Save it AS A NEW FILE: “global-mean-clean.txt”

We’ll do the rest of the clean up in R.

Reading the table

We can now use R’s read.table function to load the file into a table in R:
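The call itself isn't shown here, so what follows is only a sketch of what it might look like. It assumes the cleaned file still starts with its row of column names (hence header = TRUE). So that it runs on its own, it reads a three-row, four-column stand-in (values copied from the table printed further down) from a temporary file. The name temps is hypothetical. On the real data you would pass "data/global-mean-clean.txt" instead of the temporary path.

```r
# Stand-in for data/global-mean-clean.txt: a header row plus three data rows,
# whitespace-separated like the cleaned file (first few columns only).
sample_lines <- c(
  "Year Jan Feb Mar",
  "1880 -30 -21 -18",
  "1881 -10 -14   1",
  "1882   9   8   1"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# header = TRUE takes the column names from the first line; the default
# separator (sep = "") splits fields on any run of whitespace.
temps <- read.table(path, header = TRUE)

str(temps)      # one vector per column, with its type
summary(temps)  # per-column summary, handy for spotting odd columns
```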

As always, let’s sanity check what kinds of vectors we have in each of our columns:

      Year           Jan                Feb               Mar         
 Min.   :1880   Min.   :-70.0000   Min.   :-61.000   Min.   :-62.000  
 1st Qu.:1914   1st Qu.:-28.0000   1st Qu.:-24.000   1st Qu.:-24.000  
 Median :1948   Median : -4.0000   Median : -6.000   Median : -1.000  
 Mean   :1948   Mean   :  0.6934   Mean   :  2.168   Mean   :  3.781  
 3rd Qu.:1982   3rd Qu.: 27.0000   3rd Qu.: 30.000   3rd Qu.: 26.000  
 Max.   :2016   Max.   :117.0000   Max.   :135.000   Max.   :130.000  
                                                                      
      Apr               May               Jun                Jul         
 Min.   :-59.000   Min.   :-54.000   Min.   :-52.0000   Min.   :-48.000  
 1st Qu.:-26.000   1st Qu.:-25.000   1st Qu.:-25.0000   1st Qu.:-20.000  
 Median : -5.000   Median : -6.000   Median : -7.0000   Median : -5.000  
 Mean   :  1.715   Mean   :  1.248   Mean   : -0.4161   Mean   :  2.715  
 3rd Qu.: 25.000   3rd Qu.: 26.000   3rd Qu.: 16.0000   3rd Qu.: 15.000  
 Max.   :109.000   Max.   : 93.000   Max.   : 78.0000   Max.   : 83.000  
                                                                         
      Aug               Sep               Oct               Nov        
 Min.   :-51.000   Min.   :-47.000   Min.   :-55.000   Min.   :-56.00  
 1st Qu.:-20.000   1st Qu.:-17.000   1st Qu.:-19.000   1st Qu.:-19.00  
 Median : -4.000   Median : -3.000   Median : -1.000   Median : -2.00  
 Mean   :  2.978   Mean   :  4.701   Mean   :  5.328   Mean   :  3.81  
 3rd Qu.: 19.000   3rd Qu.: 20.000   3rd Qu.: 20.000   3rd Qu.: 15.00  
 Max.   : 98.000   Max.   : 90.000   Max.   :106.000   Max.   :104.00  
                                                                       
      Dec                J.D               D.N           DJF        
 Min.   :-78.0000   Min.   :-47.000   -9     :  7   Min.   :-64.00  
 1st Qu.:-25.0000   1st Qu.:-21.000   -22    :  5   1st Qu.:-25.00  
 Median : -8.0000   Median : -7.000   -10    :  4   Median : -8.50  
 Mean   :  0.5329   Mean   :  2.438   -2     :  4   Mean   :  1.11  
 3rd Qu.: 22.0000   3rd Qu.: 19.000   -25    :  4   3rd Qu.: 27.25  
 Max.   :111.0000   Max.   : 99.000   -18    :  3   Max.   :121.00  
                                      (Other):110   NA's   :1       
      MAM               JJA               SON              Year.1    
 Min.   :-56.000   Min.   :-47.000   Min.   :-47.000   Min.   :1880  
 1st Qu.:-25.000   1st Qu.:-21.000   1st Qu.:-18.000   1st Qu.:1914  
 Median : -6.000   Median : -6.000   Median : -3.000   Median :1948  
 Mean   :  2.255   Mean   :  1.737   Mean   :  4.628   Mean   :1948  
 3rd Qu.: 27.000   3rd Qu.: 16.000   3rd Qu.: 17.000   3rd Qu.:1982  
 Max.   :111.000   Max.   : 85.000   Max.   : 97.000   Max.   :2016  
                                                                     

We can see D.N and DJF are messed up; why?
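One way to chase down the answer is to look at how R reads a column that contains something other than numbers. The sketch below is a toy illustration, not the real file: it assumes missing values are written as a run of asterisks (check this against the source file), the numeric values are made up, and the names raw, toy and toy_fixed are hypothetical. A column with even one non-numeric token cannot be read as numbers, so read.table falls back to text (a factor in R versions before 4.0, a character vector since). Declaring the token in na.strings turns it into NA instead.

```r
# Toy input with made-up values: the D-N column holds a "***" placeholder.
raw <- c(
  "Year J-D D-N",
  "1880 -20 ***",
  "1881 -11 -12",
  "1882  -9  -9"
)

# Default settings: "***" is not a number, so D.N is read as text.
# (read.table also rewrites "J-D" and "D-N" to the valid names J.D and D.N.)
toy <- read.table(text = raw, header = TRUE)
class(toy$D.N)

# Telling read.table that "***" means "missing" keeps the column numeric.
toy_fixed <- read.table(text = raw, header = TRUE, na.strings = "***")
class(toy_fixed$D.N)
toy_fixed$D.N
```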

Tidy Data

Let’s load the tidyverse with library(tidyverse).

The tidyverse’s definition of Tidy Data is a table of values where:

  • Each row represents one and only one observation
  • Each column represents one and only one variable
  • ALL variables in the design get one column

Now let’s take a moment to appreciate all of the ways in which this data set is not Tidy. It:

  • Uses column names to store values of a variable (month of observation)
  • Uses columns differently: some are summaries of others
  • Inexplicably, has two Year columns

Removing unwanted columns
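The table below keeps only Year and the twelve monthly columns. The code that produced it isn't shown; one plausible base R way to get there is positional indexing, sketched here on a small stand-in data frame (the names temps and monthly are hypothetical). On the full table the same idea would be temps[, 1:13].

```r
# Stand-in with the same layout as the real table, cut down to two months
# plus one summary column (J.D) that we want to drop.
temps <- data.frame(
  Year = c(1880, 1881, 1882),
  Jan  = c(-30, -10, 9),
  Feb  = c(-21, -14, 8),
  J.D  = c(NA, NA, NA)   # placeholder: the annual means are not needed here
)

# Keep columns by position: Year plus the monthly columns.
monthly <- temps[, 1:3]

# The same selection by name, which survives a change in column order.
monthly_by_name <- temps[, c("Year", "Jan", "Feb")]

monthly
```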

   Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1  1880 -30 -21 -18 -27 -14 -29 -24  -8 -17 -16 -19 -22
2  1881 -10 -14   1  -3  -4 -28  -7  -3  -9 -20 -26 -16
3  1882   9   8   1 -20 -18 -25 -11   3  -1 -23 -21 -25
4  1883 -34 -42 -18 -25 -26 -13  -9 -14 -19 -12 -21 -19
5  1884 -18 -13 -36 -36 -32 -38 -35 -27 -24 -22 -30 -30
6  1885 -66 -30 -24 -45 -42 -50 -29 -27 -19 -20 -22  -7
7  1886 -43 -46 -41 -29 -27 -39 -16 -31 -19 -25 -26 -25
8  1887 -66 -48 -32 -37 -33 -21 -19 -28 -19 -32 -25 -38
9  1888 -43 -43 -47 -28 -22 -20 -10 -11  -7   1   0 -12
10 1889 -21  14   4   4  -3 -12  -5 -18 -18 -22 -32 -31
11 1890 -48 -48 -41 -38 -48 -27 -30 -36 -36 -23 -37 -30
12 1891 -46 -49 -15 -25 -17 -22 -22 -21 -13 -24 -37  -3
13 1892 -26 -15 -36 -35 -25 -20 -28 -20 -25 -17 -49 -29
14 1893 -69 -51 -24 -32 -35 -24 -14 -24 -18 -16 -17 -38
15 1894 -55 -31 -20 -41 -30 -43 -32 -29 -23 -17 -25 -22
16 1895 -44 -42 -30 -23 -23 -25 -16 -16  -2 -11 -15 -12
17 1896 -23 -15 -29 -33 -19 -13  -6  -9  -5   4 -16 -12
18 1897 -22 -19 -12  -1   0 -12  -4  -3  -4 -10 -18 -26
19 1898  -6 -34 -55 -33 -35 -20 -22 -22 -19 -32 -35 -22
20 1899 -18 -39 -35 -21 -20 -26 -13  -4   0   0  12 -27
21 1900 -40  -8   2 -14  -6 -15  -9  -4   1   8 -13 -14
22 1901 -30  -5   5  -6 -18 -10  -9 -13 -17 -29 -17 -30
23 1902 -19  -3 -29 -27 -31 -34 -26 -28 -20 -27 -36 -46
24 1903 -27  -6 -23 -39 -41 -44 -30 -44 -43 -42 -38 -47
25 1904 -64 -55 -46 -50 -50 -49 -48 -43 -47 -35 -16 -29
26 1905 -38 -59 -25 -36 -33 -31 -25 -21 -15 -23  -8 -21
27 1906 -31 -34 -15  -2 -21 -22 -27 -19 -25 -20 -38 -18
28 1907 -44 -53 -25 -40 -46 -43 -35 -37 -32 -24 -51 -50
29 1908 -46 -36 -58 -46 -40 -39 -35 -45 -33 -43 -51 -50
30 1909 -70 -47 -52 -59 -54 -52 -43 -30 -37 -39 -31 -55
31 1910 -44 -43 -47 -39 -34 -36 -31 -34 -37 -39 -56 -69
32 1911 -64 -60 -62 -55 -51 -47 -41 -43 -38 -26 -20 -25
33 1912 -27 -13 -37 -20 -20 -26 -41 -51 -47 -55 -38 -42
34 1913 -41 -44 -44 -36 -45 -46 -34 -32 -32 -34 -18  -4
35 1914   2 -13 -23 -28 -19 -22 -24 -15 -13  -5 -20 -10
36 1915 -20  -1  -8   7  -1 -16  -3 -15 -12 -22 -12 -25
37 1916 -20 -23 -31 -25 -27 -44 -34 -27 -29 -28 -42 -78
38 1917 -46 -53 -47 -38 -48 -40 -23 -26 -18 -35 -29 -71
39 1918 -44 -33 -21 -40 -37 -28 -22 -26 -14  -3 -16 -30
40 1919 -21 -19 -25 -17 -20 -28 -21 -19 -17 -16 -29 -35
41 1920 -15 -22  -8 -26 -26 -33 -32 -29 -20 -29 -33 -47
42 1921  -4 -21 -28 -36 -36 -31 -16 -24 -16  -6 -16 -18
43 1922 -34 -44 -13 -22 -34 -32 -27 -31 -29 -33 -17 -17
44 1923 -27 -37 -32 -38 -33 -24 -29 -30 -28 -13   3  -6
45 1924 -24 -27 -12 -35 -19 -28 -27 -35 -30 -36 -23 -43
46 1925 -34 -35 -24 -25 -30 -34 -30 -19 -13 -17   3  11
47 1926  20   7  12 -15 -25 -25 -21 -11 -11 -11  -6 -30
48 1927 -28 -21 -39 -31 -25 -27 -15 -19  -6  -1  -4 -36
49 1928  -4 -12 -28 -29 -30 -41 -21 -25 -20 -19  -9 -20
50 1929 -47 -61 -34 -40 -39 -43 -33 -29 -23 -15 -14 -55
51 1930 -29 -24  -8 -26 -25 -19 -17 -11 -11  -8  14  -9
52 1931 -10 -22  -6 -21 -22  -6   1   0  -6   0 -12 -10
53 1932  13 -18 -20  -7 -22 -30 -24 -24 -11 -10 -26 -22
54 1933 -34 -32 -29 -23 -25 -32 -20 -23 -26 -24 -31 -47
55 1934 -27  -4 -31 -27 -11 -14 -11 -10 -16 -11  -1  -9
56 1935 -37  11 -13 -35 -26 -23 -19 -17 -17  -8 -29 -22
57 1936 -29 -39 -23 -20 -17 -19  -6 -12  -6  -4  -5  -4
58 1937 -11   5 -17 -17  -7  -8  -5   3  14  10   9 -12
59 1938   0  -4   5   5  -7 -17  -9  -4   3  11   1 -26
60 1939 -13 -12 -20 -12  -7  -8  -6  -5   0  -3   6  40
61 1940 -15   6  12  16   5   5  10   1  12   7  13  19
62 1941  13  23   6  11  10   4  15  14   2  24  12  14
63 1942  26   5  13  14  14  11   2  -3   0   6  13  12
64 1943  -1  22   1  13  10  -1  14   3  11  30  25  28
65 1944  41  31  34  27  26  22  23  23  31  27  12   5
66 1945  13   2  11  24  10   2   7  25  22  22  10 -10
67 1946  15   6   0  11  -4 -17  -9  -8  -2  -6  -2 -29
68 1947 -13  -8   5   4  -6   0  -6  -8 -14   6  -1 -18
69 1948   5 -13 -23  -9   8  -5 -13 -10 -10  -7  -8 -23
70 1949   9 -16  -1  -7  -9 -22 -13  -8  -8  -3  -8 -19
71 1950 -30 -26  -6 -21 -12  -6  -9 -18 -10 -20 -35 -20
72 1951 -35 -44 -19 -10  -2  -5   0   5   7   6   0  15
73 1952  16  12 -10   2  -5  -4   5   7   8  -4 -17  -2
74 1953   9  16  11  20   8   8   2   8   6   5  -5   3
75 1954 -28 -10 -12 -18 -20 -16 -16 -13  -7  -1   8 -18
76 1955  11 -21 -36 -23 -20  -8  -9   4 -13  -5 -28 -32
 [ reached 'max' / getOption("max.print") -- omitted 61 rows ]

Alternatively, we could use the tidyverse select function (useful inside of a pipe chain):
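A minimal sketch of the select version, on a hypothetical stand-in frame named temps. select comes from dplyr, which library(tidyverse) attaches; the Year:Feb range here would be Year:Dec on the full table.

```r
library(dplyr)

temps <- data.frame(
  Year = c(1880, 1881, 1882),
  Jan  = c(-30, -10, 9),
  Feb  = c(-21, -14, 8),
  J.D  = c(NA, NA, NA)   # placeholder summary column to be dropped
)

# Select a contiguous run of columns by name, inside a pipe chain.
monthly <- temps %>%
  select(Year:Feb)

monthly
```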

This returns the same 13-column table as the one printed above.

Tidy-ifying the columns

You can find the relevant cheatsheet in RStudio: Help -> Cheatsheets -> Data Manipulation with dplyr and tidyr

The picture that matches what we need to do tells us it’s a gather:
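A minimal sketch of that gather, on three years and two months taken from the table above (the names monthly, long, Month and Temp are hypothetical, not necessarily the ones used in class). gather comes from tidyr; newer tidyr releases recommend pivot_longer for the same reshaping, but gather still works.

```r
library(tidyr)

monthly <- data.frame(
  Year = c(1880, 1881, 1882),
  Jan  = c(-30, -10, 9),
  Feb  = c(-21, -14, 8)
)

# Collapse every column except Year into key/value pairs:
# the old column names go into Month, the cell values into Temp.
long <- gather(monthly, key = "Month", value = "Temp", -Year)

# Rows are stacked one original column at a time: all the Jan rows,
# then all the Feb rows.
long
```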

What happened there? Take a look at the table.

We can use colnames to update the names of our columns and tolower to switch strings in character vectors to all lower case:
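A small sketch of both functions, on a hypothetical long-format frame named long. Whether the class version typed the new names by hand or simply lower-cased the existing ones isn't stated, so both are shown; temp_index is an invented name.

```r
long <- data.frame(
  Year  = c(1880, 1881),
  Month = c("Jan", "Jan"),
  Temp  = c(-30, -10)
)

# tolower works element-wise on any character vector ...
tolower(c("Year", "Month", "Temp"))

# ... so it can be applied to the existing column names in one step.
colnames(long) <- tolower(colnames(long))

# colnames<- also accepts hand-written names, here for the third column only.
colnames(long)[3] <- "temp_index"

colnames(long)
```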

Creating dates

Using a named vector

Let’s make a vector holding month numbers:
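For instance (the name month_nums is an assumption; the variable name used in class isn't shown):

```r
# The integers 1 through 12, one per calendar month.
month_nums <- 1:12
month_nums
```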

Remember what this does:
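The two results below are what you would get from indexing the third element of such a vector and from printing month.abb, a constant built into R. A sketch, with month_nums as a hypothetical name:

```r
month_nums <- 1:12

# Square brackets with a number pick elements by position.
month_nums[3]

# month.abb is a built-in character vector of three-letter month names.
month.abb
```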

[1] 3

 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov"
[12] "Dec"

So we can do this:
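That is, attach the abbreviations to the numbers as names. A sketch (month_nums is a hypothetical name):

```r
month_nums <- 1:12

# names<- labels each element; element i gets the i-th abbreviation.
names(month_nums) <- month.abb

month_nums
```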

Once you have a named vector, you can index elements by name in addition to using numeric indexes. For example:
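The three results below correspond to lookups like these (month_nums is a hypothetical name, built here with setNames so the block runs on its own):

```r
month_nums <- setNames(1:12, month.abb)

month_nums["Apr"]                    # a single name
month_nums[c("Apr", "Aug")]          # several names at once
month_nums[c("Apr", "Aug", "Apr")]   # names may repeat
```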

Apr 
  4 
Apr Aug 
  4   8 
Apr Aug Apr 
  4   8   4 

Making our month number column then becomes as easy as indexing on our months column:
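A sketch of that lookup on a hypothetical long-format frame (the names long, month and month_num are assumptions). It relies on the month column being a character vector holding exactly the abbreviations in month.abb. If the column were a factor, R would index by the factor's integer codes rather than by name, so wrap it in as.character first.

```r
month_nums <- setNames(1:12, month.abb)

long <- data.frame(
  year  = c(1880, 1881, 1880, 1881),
  month = c("Jan", "Jan", "Feb", "Feb"),
  temp  = c(-30, -10, -21, -14),
  stringsAsFactors = FALSE   # keep month as character in older R versions too
)

# Index the lookup vector with the whole column: one number per row.
# unname() drops the "Jan", "Feb", ... labels from the result.
long$month_num <- unname(month_nums[long$month])

long
```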

Using the rep function

If you need a new column that contains a set of values that regularly repeats, an alternative solution is to use the rep function:
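A sketch with small numbers so the result fits on screen (by_each is a hypothetical name). With 137 years of data the call would presumably be rep(1:12, each = 137).

```r
# each = 3: every value is repeated three times before moving on.
by_each <- rep(1:4, each = 3)
by_each
```

This ordering lines up with a table that is stacked one month at a time, which is how gather arranges its rows.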

The each argument says to repeat each value that many times in a row (here 137, once per year of data).

Alternatively, we can specify a times:
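The long output below has 12 × 137 = 1644 entries (1000 printed plus 644 omitted), which is consistent with rep(1:12, times = 137). The same idea with small numbers (by_times is a hypothetical name):

```r
# times = 3: the whole sequence is repeated three times.
by_times <- rep(1:4, times = 3)
by_times
```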

   [1]  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11
  [24] 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10
  [47] 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9
  [70] 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8
  [93]  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7
 [116]  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6
 [139]  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5
 [162]  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4
 [185]  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3
 [208]  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2
 [231]  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1
 [254]  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12
 [277]  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11
 [300] 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10
 [323] 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9
 [346] 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8
 [369]  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7
 [392]  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6
 [415]  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5
 [438]  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4
 [461]  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3
 [484]  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2
 [507]  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1
 [530]  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12
 [553]  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11
 [576] 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10
 [599] 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9
 [622] 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8
 [645]  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7
 [668]  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6
 [691]  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5
 [714]  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4
 [737]  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3
 [760]  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2
 [783]  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1
 [806]  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12
 [829]  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11
 [852] 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10
 [875] 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9
 [898] 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8
 [921]  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7
 [944]  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6
 [967]  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5
 [990]  6  7  8  9 10 11 12  1  2  3  4
 [ reached getOption("max.print") -- omitted 644 entries ]

Note the difference!

Lubridate

Finally, we’ll use a helper function from a new package called lubridate to easily create a date column:
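The helper isn't named here; make_date is one lubridate function that fits the description, building a Date from a numeric year, month and day. A sketch on a hypothetical frame (long, year, month_num and date are assumed names), pinning every observation to the first of its month:

```r
library(lubridate)

long <- data.frame(
  year      = c(1880, 1880, 1881),
  month_num = c(1, 2, 1),
  temp      = c(-30, -21, -10)
)

# make_date(year, month, day) is vectorised over its arguments;
# day = 1 is an arbitrary choice since the data are monthly values.
long$date <- make_date(long$year, long$month_num, 1)

long$date
class(long$date)
```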

We’ll explore Dates and Times in more depth next week.

Analysis

Now, on your own:

  • Plot temperature indexes as a function of time
  • Play with geom_smooth to add a smoothed trend line (why is this a nice approach given these data?)
  • If you’re feeling ambitious, try fitting a linear model to (some of) the data using lm
  • Make it interactive! Allow zooming on the x-axis.
  • Go back to the source data page and compare northern and southern hemisphere trends.