I’m going to start class today by drawing on the whiteboard to try to illustrate the key concepts behind Git, the tool that we’ll be using to collaborate on our group projects. By the time we’re done, the following terms and definitions should make sense to you:
Now would be an excellent time to ask questions if you’re confused about anything!
We’ll first walk through the steps necessary to get your projects set up on GitHub and in RStudio. Then we’ll walk through an example workflow for committing and sharing changes to a project.
We’ll do all of the steps that follow today in class. You only have to do them ONCE. We’ll cover the day-to-day workflow with Git in the next section.
Confer with your group members to pick a short, informative name to use for your project. Use “-” instead of spaces (ex. “our-cool-project”), and otherwise only alphanumeric characters. Pay attention to capitalization because it WILL MATTER.
In all of the steps below replace PROJECTNAME with your actual project name (please don’t actually use all caps).
You will have just gotten an invitation to join our class’s GitHub Organization in your email. Click the link in the email or go to the page above and join the organization.
This step should be done by ONE PERSON per group.
In this step, we’ll create a repository to hold the source code for your Shiny app on GitHub, under our class’s organization. Log in to GitHub and go to the WL-Biol185-ShinyProjects organization page.
Then:
This step should be done by EVERYONE in your group.
You should see the same set of files listed on your “Files” tab that you had in your GitHub repository.
From here on out you need to be careful about which project you are in on RStudio. When you want to work on your Shiny app, switch into this project from the Project menu (upper right). When you want to sketch out class work, or work on assignments, make sure you are in a different project or no project!
This step should be done by EVERYONE in your group.
In RStudio from the menu bar select “Tools” -> “Shell…”
In the shell enter these two commands (with your actual email and name; duh.)
git config --global user.email "your-email@mail.wlu.edu"
git config --global user.name "My Name"
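To double-check what Git recorded, you can read the values back. The sketch below points HOME at a throwaway folder (an assumption made only so the demo leaves your real ~/.gitconfig alone); in RStudio's Shell you would run just the read-back commands.

```shell
#!/usr/bin/env bash
# Read back the identity Git will attach to your commits.
# Demo only: HOME points at a throwaway folder so your real ~/.gitconfig is untouched.
set -euo pipefail
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME   # keep Git from reading or writing a real XDG config file

git config --global user.email "your-email@mail.wlu.edu"
git config --global user.name "My Name"

# Each prints the stored value on its own line.
git config --global user.email
git config --global user.name

# Or show every global setting at once.
git config --global --list
```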
From the files pane, open up your “.gitignore” file and add the following line:
*.Rproj*
This will prevent you or your groupmates from including RStudio configuration files in the code that you push to GitHub.
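If you want to confirm the rule is working, git check-ignore reports which paths an ignore rule matches. This is a minimal sketch in a throwaway repository; PROJECTNAME.Rproj stands in for your real project file.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the *.Rproj* pattern keeps RStudio files out of Git.
# Assumes a throwaway repository and a placeholder PROJECTNAME.Rproj file.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init --quiet

printf '*.Rproj*\n' > .gitignore
touch PROJECTNAME.Rproj ui.R

# -v prints the matching rule (file:line:pattern) followed by the ignored path.
git check-ignore -v PROJECTNAME.Rproj

# The .Rproj file is absent from the untracked list; .gitignore and ui.R remain.
git status --porcelain
```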
This step should be done by EVERYONE in your group.
Let’s test to make sure everything is working so far.
Hopefully that all worked and nothing caught fire.
If you get tired of having to enter your GitHub account and password each time you push changes (and are feeling adventurous), you can set up what’s called an SSH key pair with GitHub that will allow you to connect securely. It’s a two-step process:
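A sketch of those two steps from the Shell, assuming OpenSSH's ssh-keygen is available and that the key lives at ~/.ssh/id_ed25519 (both assumptions; GitHub's own SSH documentation is the authority here). The use_ssh_remote helper is hypothetical, just a name for the one git remote set-url command that switches a project from the HTTPS address to the SSH one.

```shell
#!/usr/bin/env bash
# Sketch of the two SSH steps. Assumptions: OpenSSH's ssh-keygen is installed and
# the key lives at ~/.ssh/id_ed25519 (set KEYFILE to use another path).
set -euo pipefail

KEYFILE="${KEYFILE:-$HOME/.ssh/id_ed25519}"
mkdir -p "$(dirname "$KEYFILE")"
chmod 700 "$(dirname "$KEYFILE")"

# Step 1: make a key pair, unless one already exists (never overwrite a key).
# -N "" means no passphrase so this runs unattended; drop -N to be prompted for one.
if [ ! -f "$KEYFILE" ]; then
  ssh-keygen -q -t ed25519 -C "your-email@mail.wlu.edu" -f "$KEYFILE" -N ""
fi

# Copy this public key into GitHub: Settings -> SSH and GPG keys -> New SSH key.
cat "$KEYFILE.pub"

# Step 2: point a project's "origin" at GitHub's SSH address instead of HTTPS.
# use_ssh_remote is a hypothetical helper; from your project folder run:
#   use_ssh_remote . PROJECTNAME
use_ssh_remote() {
  git -C "$1" remote set-url origin "git@github.com:WL-Biol185-ShinyProjects/$2.git"
}
```

After adding the key on GitHub, running ssh -T git@github.com should greet you by your GitHub username.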
All of the “Bash” commands shown in these tutorials can be entered using the Shell in RStudio (Tools menu).
Now that we’ve got local projects set up on each of your accounts, and GitHub repositories for you to use to collaborate, let’s create the two required files for a full-fledged Shiny web application.
From the File menu choose “New File” -> “R Script”. Enter this example code:
library(shiny)
# Define UI for application that draws a histogram
fluidPage(
# Application title
titlePanel("Old Faithful Geyser Data"),
# Sidebar with a slider input for number of bins
sidebarLayout(
sidebarPanel(
sliderInput("bins",
"Number of bins:",
min = 1,
max = 50,
value = 30)
),
# Show a plot of the generated distribution
mainPanel(
plotOutput("distPlot")
)
)
)
…and save it as “ui.R”. This code describes the layout of the input and output widgets on the webpage.
Create a new script file as above and enter this example code:
library(shiny)
# Define server logic required to draw a histogram
function(input, output) {
output$distPlot <- renderPlot({
# generate bins based on input$bins from ui.R
x <- faithful[, 2]
bins <- seq(min(x), max(x), length.out = input$bins + 1)
# draw the histogram with the specified number of bins
hist(x, breaks = bins, col = 'darkgray', border = 'white')
})
}
…and save it as “server.R”. This code connects inputs to outputs.
You’ll see we need a bit more structure to describe standalone Shiny apps than we did when we were embedding Shiny widgets in R Markdown documents. In particular, we need to explicitly connect render* functions to specific outputs in the UI.
Notice that sliderInput('bins', ...) hooks this slider up to input$bins in server.R, and plotOutput("distPlot") hooks this plot in the layout up to output$distPlot in server.R.
To get ideas for what you might want to include in your Shiny apps:
We’ll walk through the steps of making changes to your apps in RStudio, staging and committing those changes, and then pushing your changes up to GitHub to share with your group.
This initial pass is going to be a little bit out-of-order because there isn’t anything new to pull from GitHub yet; we’ll go over the normal order for your workflow at the end.
Let’s make some example changes to your app files. Assign each person in your group to do one of the following: add an actionButton after the sliderInput in ui.R; change the max number of bins for the sliderInput in ui.R to 100; or add a comment (#) somewhere in ui.R.
On your Git panel, you should all see “?” next to the new files we created above. That’s because we haven’t yet told Git to track these files. Click the checkbox under “Staged” to add them to the list of files that Git tracks; the “?” becomes an “A” for Add. Also check “Staged” for any other files that have been changed.
When you hit “Commit” and then select “ui.R” you’ll each see the change you made highlighted.
Committing changes only creates a local “save point” in your code. If you want to share all of your changes (the series of commits you’ve made), you’ll need to Push them to GitHub. Note the message at the top of the “Git” panel in RStudio: you’ve made two local commits, which put your local repository (branch) “ahead” of the GitHub repository “origin/master” by 2 commits.
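RStudio's Git pane is a front end to ordinary git commands, so the same stage and commit steps can be sketched from the Shell. To keep the example self-contained, a temporary bare repository plays the role of GitHub (an assumption; in your project, origin is the real GitHub repository), and the file contents are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: stage, commit, and check status from the Shell.
# Assumption: a temporary bare repository stands in for GitHub ("origin").
set -euo pipefail

demo=$(mktemp -d)
git -c init.defaultBranch=master init --quiet --bare "$demo/origin.git"
git -c init.defaultBranch=master clone --quiet "$demo/origin.git" "$demo/me" 2>/dev/null
cd "$demo/me"
git config user.name "Person One"
git config user.email "person1@example.com"

# A shared starting point on "GitHub".
echo "# PROJECTNAME" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git push --quiet -u origin master

# Commit 1: new files are untracked ("?") until staged ("A").
echo "library(shiny)" > ui.R
echo "library(shiny)" > server.R
git add ui.R server.R
git commit --quiet -m "Add ui.R and server.R"

# Commit 2: a modified file ("M") is staged and committed the same way.
echo "# TODO: add an actionButton" >> ui.R
git add ui.R
git commit --quiet -m "Plan an actionButton in ui.R"

# Both commits are local only; prints: ## master...origin/master [ahead 2]
git status --short --branch
```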
Pick one person in your group to go first. Hit the “Push” button in the Commit window or on the “Git” pane.
Then have the second person in your group hit “Push”.
You’ll see a message like:
To https://github.com/WL-Biol185-ShinyProjects/sample-project.git
! [rejected] master -> master (fetch first)
error: failed to push some refs to 'https://github.com/WL-Biol185-ShinyProjects/sample-project.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
If you take a moment to digest this somewhat cryptic message, you’ll see that the problem is that person #1 pushed changes to the project that you don’t have. Rather than erasing their work and replacing it with yours, Git is going to help you merge the two versions (“branches”) of the project.
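This sketch reproduces the same rejection locally, assuming a temporary bare repository in place of GitHub and two clones standing in for person #1 and person #2. The file contents are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: reproduce "! [rejected] master -> master (fetch first)" with two clones.
# Assumption: a temporary bare repository stands in for GitHub ("origin").
set -euo pipefail

demo=$(mktemp -d)
git -c init.defaultBranch=master init --quiet --bare "$demo/origin.git"

# Seed the shared repository with one commit so both people start equal.
git -c init.defaultBranch=master clone --quiet "$demo/origin.git" "$demo/seed" 2>/dev/null
git -C "$demo/seed" -c user.name=Seed -c user.email=seed@example.com \
  commit --quiet --allow-empty -m "Initial commit"
git -C "$demo/seed" push --quiet origin master

# Each group member clones the same starting point.
for who in person1 person2; do
  git clone --quiet "$demo/origin.git" "$demo/$who"
  git -C "$demo/$who" config user.name "$who"
  git -C "$demo/$who" config user.email "$who@example.com"
done

# Both make and commit a change locally.
echo 'max = 100' > "$demo/person1/ui.R"
git -C "$demo/person1" add ui.R
git -C "$demo/person1" commit --quiet -m "Raise the slider max"

echo '# TODO: add an actionButton' > "$demo/person2/server.R"
git -C "$demo/person2" add server.R
git -C "$demo/person2" commit --quiet -m "Plan an actionButton"

# Person #1 pushes first, which succeeds.
git -C "$demo/person1" push --quiet origin master

# Person #2's push is refused: origin now has a commit they don't have.
if git -C "$demo/person2" push origin master 2> "$demo/push.log"; then
  push_status=accepted
else
  push_status=rejected
fi
echo "person #2's push was $push_status"
cat "$demo/push.log"
```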
Following the message’s advice, you should click “Pull” to download person #1’s files.
Depending on how you each edited your files one of two things might happen:
Automerge succeeds without any help from you; this works when Git’s merging algorithm can be pretty sure it knows how to merge the files without losing anything.
You see a message like this:
From https://github.com/WL-Biol185-ShinyProjects/sample-project
bcba6e0..d1b8b44 master -> origin/master
Auto-merging ui.R
CONFLICT (content): Merge conflict in ui.R
Automatic merge failed; fix conflicts and then commit the result.
This message is telling us that Git’s merge algorithm can’t automatically merge your changes with the changes on GitHub, so a human (you) is going to have to get involved. If this happens you’ll also notice big orange “U”s next to the afflicted files on the “Git” pane.
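Here is a sketch that produces this kind of conflict locally, assuming a temporary bare repository in place of GitHub and a shortened placeholder ui.R. One hedge about newer versions of Git: they may refuse a plain git pull on diverged branches until you choose between merging and rebasing. Running git config pull.rebase false chooses merging, the behavior this tutorial describes, and the --no-rebase flag used below does the same for a single pull.

```shell
#!/usr/bin/env bash
# Sketch: two people change the same line of ui.R, so the second Pull stops on
# a CONFLICT. Assumption: a temporary bare repository stands in for GitHub.
set -euo pipefail

demo=$(mktemp -d)
git -c init.defaultBranch=master init --quiet --bare "$demo/origin.git"
git -c init.defaultBranch=master clone --quiet "$demo/origin.git" "$demo/person1" 2>/dev/null
cd "$demo/person1"
git config user.name "Person One"
git config user.email "person1@example.com"
printf 'min = 1,\nmax = 50,\nvalue = 30)\n' > ui.R
git add ui.R
git commit --quiet -m "Add slider settings"
git push --quiet -u origin master

git clone --quiet "$demo/origin.git" "$demo/person2"
git -C "$demo/person2" config user.name "Person Two"
git -C "$demo/person2" config user.email "person2@example.com"

# Person #1 changes max to 300 and pushes first.
printf 'min = 1,\nmax = 300,\nvalue = 30)\n' > ui.R
git commit --quiet -am "Allow up to 300 bins"
git push --quiet

# Person #2 changes the same line to 100 and commits locally.
cd "$demo/person2"
printf 'min = 1,\nmax = 100,\nvalue = 30)\n' > ui.R
git commit --quiet -am "Allow up to 100 bins"

# --no-rebase requests a merge; git pull exits non-zero when the merge
# stops on a conflict, so report that instead of aborting the script.
git pull --no-rebase --no-edit || echo "Pull stopped: fix the conflicts, then commit."

# Files that still need a human (RStudio marks them "U").
git diff --name-only --diff-filter=U
```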
When you open up a file where there’s a conflict from a merge, you’ll see something like this at each place in the file where you need to make an edit:
<<<<<<< HEAD
max = 100,
value = 30), sliderInput("blarg"),
=======
max = 300,
value = 30)
>>>>>>> d1b8b44153b3817104302a5d04458f25e61260bd
Obviously the stuff Git just added isn’t valid R code! That’s a good thing: your local copy of your app won’t run until you’ve fixed all of the conflicts.
Your version of the code is above the ======= line and their version of the code is below it. You’ll need to pick which one you want, or write a new block that integrates the two. You also need to delete the <<<<<<< HEAD, =======, and >>>>>>> bigscarynumber lines.
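Before staging a file you’ve hand-merged, it can help to search it for leftover markers. The has_conflict_markers function below is a hypothetical helper built on grep, not a Git feature. It is run on a trimmed copy of the conflict above and on a resolved version of the same spot.

```shell
#!/usr/bin/env bash
# Sketch: list leftover conflict-marker lines (with line numbers) before staging.
# has_conflict_markers is a hypothetical helper, not part of Git.
set -euo pipefail

has_conflict_markers() {
  # Succeeds (exit 0) if the file still has a <<<<<<<, =======, or >>>>>>> line.
  grep -nE '^(<<<<<<< |=======$|>>>>>>> )' "$1"
}

demo=$(mktemp -d)

# A file still in the conflicted state.
cat > "$demo/conflicted.R" <<'EOF'
<<<<<<< HEAD
            max = 100,
=======
            max = 300,
>>>>>>> d1b8b44153b3817104302a5d04458f25e61260bd
EOF

# The same spot after a human picked one version and deleted the markers.
cat > "$demo/resolved.R" <<'EOF'
            max = 300,
EOF

for f in "$demo/conflicted.R" "$demo/resolved.R"; do
  if has_conflict_markers "$f"; then
    echo "$(basename "$f"): still has conflict markers"
  else
    echo "$(basename "$f"): clean, OK to stage"
  fi
done
```

Once a file reports clean, stage it and commit as usual.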
Do you believe me now that it will be important for everyone in your group to write well-formatted, easy-to-read code?
Once you’ve fixed the conflicts, stage the files, make a new commit (with a message like “fixed your stupid conflicts, person #1”), and then you should be able to push!
So, you’ve probably guessed that it’s a good idea to start with a Pull, to get your group members’ changes from GitHub, before you make modifications. So the pattern that you’ll generally want to follow is:
In general, you shouldn’t Push a set of changes that leaves your project in a broken state. Finish the feature/bug fix/major task that you’re working on before committing.
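As a rough Shell sketch of that pattern, assuming the usual order of pull, edit and test, stage, commit, and push: start_session and finish_session are hypothetical helper names, and a temporary bare repository stands in for GitHub.

```shell
#!/usr/bin/env bash
# Sketch of one working session: pull, edit, stage, commit, push.
# start_session and finish_session are hypothetical helper names.
set -euo pipefail

start_session() {
  # 1. Get your group members' latest work before touching anything.
  git pull --no-rebase --no-edit
}

finish_session() {
  # 3. Stage every change (new, modified, and deleted files).
  git add -A
  # 4. Save a local commit with a descriptive message.
  git commit --quiet -m "$1"
  # 5. Share it with your group on GitHub.
  git push --quiet
}

# --- Demo in a throwaway setup; the bare repository stands in for GitHub. ---
demo=$(mktemp -d)
git -c init.defaultBranch=master init --quiet --bare "$demo/origin.git"
git -c init.defaultBranch=master clone --quiet "$demo/origin.git" "$demo/me" 2>/dev/null
cd "$demo/me"
git config user.name "Person One"
git config user.email "person1@example.com"
echo "library(shiny)" > ui.R
git add ui.R
git commit --quiet -m "Initial commit"
git push --quiet -u origin master

start_session
# 2. Edit and test your app (here, a stand-in edit to ui.R).
echo "# Sidebar with a slider input for number of bins" >> ui.R
finish_session "Outline the sidebar in ui.R"

git log --oneline -n 2
```

Splitting the session into two helpers leaves room for step 2, and finish_session is only meant to run once the app works again.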
When you’re ready to publish a live version of your app in a publicly available location, you can run:
bio185::publishApp("PROJECTNAME")
That will fetch the current version of your code from GitHub and update the app linked from our course project’s page:
So fixing merge conflicts is annoying, but in a good way: you won’t accidentally throw out someone else’s work. It’s also less annoying than having to hand integrate lots of people’s edits to a Word document, because Git merge is generally smart enough to only make you change what can’t be fixed automatically.
Fortunately, there are some simple things you can do to avoid lots of merge conflicts; they also happen to be good project management practices for coding on a team (which is nearly always the case in the “real world”).
Git can usually automerge if changes occur in two different files or in two different places within a file. So if you take the time with your group to map out the structure of your project, in terms of what features will be implemented in which files, you can save yourselves a lot of headaches. Here are a couple of practical examples.
Within a script file: Map out your goals using comments. For example, my ui.R script above might have started its life looking like this:
# Load the packages we'll need
# Define UI for an application that draws a histogram
# Application title
# Sidebar with a slider input for number of bins
# Show a plot of the generated distribution
No need to write any implementation code; just start by sketching out your goals.
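To see why an outline like this helps, the sketch below has two people fill in different commented sections of the same ui.R. Their edits are separated by unchanged lines, so the Pull merges them without a conflict. A temporary bare repository stands in for GitHub (an assumption), and the code filled in is abbreviated.

```shell
#!/usr/bin/env bash
# Sketch: two people fill in different sections of a commented ui.R outline;
# unchanged lines separate their edits, so Git merges them automatically.
# Assumption: a temporary bare repository stands in for GitHub.
set -euo pipefail

demo=$(mktemp -d)
git -c init.defaultBranch=master init --quiet --bare "$demo/origin.git"
git -c init.defaultBranch=master clone --quiet "$demo/origin.git" "$demo/person1" 2>/dev/null
cd "$demo/person1"
git config user.name "Person One"
git config user.email "person1@example.com"

# The shared outline: goals only, no implementation yet.
cat > ui.R <<'EOF'
# Load the packages we'll need
# Define UI for an application that draws a histogram
# Application title
# Sidebar with a slider input for number of bins
# Show a plot of the generated distribution
EOF
git add ui.R
git commit --quiet -m "Outline ui.R with comments"
git push --quiet origin master

git clone --quiet "$demo/origin.git" "$demo/person2"
git -C "$demo/person2" config user.name "Person Two"
git -C "$demo/person2" config user.email "person2@example.com"

# Person #1 fills in the first section and pushes.
cat > ui.R <<'EOF'
# Load the packages we'll need
library(shiny)
# Define UI for an application that draws a histogram
# Application title
# Sidebar with a slider input for number of bins
# Show a plot of the generated distribution
EOF
git commit --quiet -am "Load shiny"
git push --quiet origin master

# Person #2 fills in the last section, then pulls: no conflict this time.
cd "$demo/person2"
cat > ui.R <<'EOF'
# Load the packages we'll need
# Define UI for an application that draws a histogram
# Application title
# Sidebar with a slider input for number of bins
# Show a plot of the generated distribution
mainPanel(plotOutput("distPlot"))
EOF
git commit --quiet -am "Add the main panel"
git pull --quiet --no-rebase --no-edit

# The merged file contains both people's additions.
cat ui.R
```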
Use multiple script files: you aren’t limited to putting ALL of your code in ui.R and server.R; in fact, on big projects that’d be a terrible idea! You can use the source() function to load any script file from another.
Consider refactoring our ui.R script:
library(shiny)
# Define UI for application that draws a histogram
fluidPage(
# Application title
titlePanel("Old Faithful Geyser Data"),
# Sidebar with a slider input for number of bins
sidebarLayout(
sidebarPanel(
sliderInput("bins",
"Number of bins:",
min = 1,
max = 300,
value = 30)
),
# Show a plot of the generated distribution
mainPanel(
plotOutput("distPlot")
)
)
)
Here we’ll split out the code that makes the major UI elements (sidebar and main panel) into their own script files:
sidebar.R:
library(shiny)
sidebar <- sidebarPanel(
sliderInput("bins",
"Number of bins:",
min = 1,
max = 300,
value = 30)
)
main-panel.R:
library(shiny)
main_panel <- mainPanel(plotOutput("distPlot"))
ui.R:
library(shiny)
source("sidebar.R")
source("main-panel.R")
# Define UI for application that draws a histogram
fluidPage(
# Application title
titlePanel("Old Faithful Geyser Data"),
# Sidebar with a slider input for number of bins
sidebarLayout(sidebar, main_panel)
)
In a couple of weeks your group will pitch your “big picture” features for the Shiny app that you’re going to build (source data, visualizations, analyses, etc.). The basic workflow on a software project that takes a feature from dream to reality is:
Each of these steps can, in turn, be broken down into a set of tasks. On a collaborative project, it’s essential that everyone own their own set of tasks. We’re going to use a tool on GitHub to help us keep track of tasks on these projects: Issues.
Open up your group’s repository page on GitHub and click on the “Issues” tab. Create a new Issue like “Project Design Goals.” Assign everyone in your group to it.
You’ll see that issues give you a nice conversation thread. Use them to track each person’s set of tasks on the project and bugs that you find in other people’s tasks. These text blocks support Markdown, let you reference commits, insert code, make checklists, and also use Twitter-style “mentions.” For example, if you’d like to get my attention you can add “@whitwort” to an issue comment.
Once you’ve finished with a task/fixed a bug/etc. you can “Close issue”. The record of that work will remain, but not be shown by default. I’ll use your commit history and participation in the discussion of Issues as evidence of your contribution to your group’s project when grading at the end of the term.
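If you’d like to look at your own commit history, Git can summarize it from the Shell. This sketch builds a throwaway repository with made-up authors; in your project you would run only the last two commands.

```shell
#!/usr/bin/env bash
# Sketch: summarize who has committed, using a throwaway repository with
# made-up authors ("Person One" and "Person Two" are placeholders).
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Hypothetical history: two commits by Person One, one by Person Two.
git -c user.name="Person One" -c user.email=p1@example.com commit --quiet --allow-empty -m "Outline ui.R"
git -c user.name="Person Two" -c user.email=p2@example.com commit --quiet --allow-empty -m "Add server.R"
git -c user.name="Person One" -c user.email=p1@example.com commit --quiet --allow-empty -m "Fix slider max"

# Commits per author, most active first. HEAD is given explicitly because,
# without a revision, shortlog reads a log from standard input when it isn't a terminal.
git shortlog -sn HEAD

# One author's commits, one line each.
git log --oneline --author="Person Two"
```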
As I’ve mentioned in class, we’re looking for data sets in the neighborhood of 10^5 - 10^9 data points. If there are lots of variables for each observation that might mean thousands of observations; if there are few, perhaps millions. We want big(ish) data that let you flex your Data Scientist muscles, but that are still small enough to fit into memory.
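For a rough sense of scale at both ends of that range, assume every data point is stored as an R double, which takes 8 bytes (other column types use different amounts of memory):

```latex
\[
10^{9}\ \text{values} \times 8\ \frac{\text{bytes}}{\text{value}} = 8 \times 10^{9}\ \text{bytes} \approx 8\ \text{GB},
\qquad
10^{5}\ \text{values} \times 8\ \frac{\text{bytes}}{\text{value}} = 8 \times 10^{5}\ \text{bytes} = 0.8\ \text{MB}.
\]
```

So under that assumption the upper end is roughly comparable to a typical laptop's memory, while the lower end fits comfortably.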
You must document the source for your data in your “README.md” file. If the license on the data set allows you to redistribute it, you should include it in your repository (as always, I recommend a data/ subfolder to keep things tidy). If not, you’ll need to provide information about how you downloaded it (URL, etc.).
You must also document all data wrangling and tidying steps that aren’t part of your Shiny app in one or more R Markdown documents. Include both the .Rmd source files and the knitted .html files in your repository. If you’d like to include that HTML in your web app itself, see the Shiny includeHTML function.
At a minimum, your “README.md” file should also document why you chose to focus on the data you did, why we should care about it, and why your app is awesome. It’d be great if you also included this text in your app itself.
After your project pitches next week, I’ll offer advice to help your group calibrate goals. But err on the side of ambitious. You can always cut features that aren’t going to work out down the road. I’d like to hear what you’re optimistic you can include.
It must run by 5pm on Friday of Finals week (as published on the RNA Shiny server).
It’d be nice to include a link to the project’s hosting page on RNA in your README.md: