bio185

Welcome to the course website for Biol-185: Data Science. This is where you will find tutorial pages, readings, assignments, and the class schedule along with grading information.

Schedule

All readings for this course will be from R for Data Science which is freely available on the web, but you might want to order a hard copy if you’d like to support the author. Please read the listed sections before class; assignments are meant to be done after class unless otherwise noted. Links in the table below will populate as the term progresses.

Week Topics Readings Assignments

0

Why R?

R 101

Friday

Introduction. You don't need to worry about installing anything; your RStudio accounts on RNA are already setup.

Explore: Introduction. A quick overview of a typical data analysis workflow.

Monday

Wednesday

1

Scripts and RMarkdowns and Packages, Oh My!

Visualizing data

Loading, Exploring and Transforming Data

Monday

Workflow: scripts. Read all of this short piece about R script files.

R Markdown. Look over sections 27.1 to 27.4, but only the introduction to 27.4 not all the details. Feel free to look over the rest if you're curious.

Wednesday

Workflow: projects. Read this entire section about R projects, environments, and file paths. Students often strugle to keep these things straight, so look over this section carefully and come to class with questions about where you got lost. For the section about file paths, when you are working in R Studio on rna.wlu.edu you're in a linux environment.

Data transformation. Read all of this section about how you can use tidyverse functions to filter, rearrange and calculate new values in tables of data.

Monday

Wednesday

2

Exploratory data analysis

Style Matters

Tidy(ing) data

Monday

Exploratory data analysis. Read over all of the topics in this excellent introduction to exploratory data analysis.

Wednesday

Tidy Data. Read over this whole section carefully. Your lives will be substantively improved by adhering to the principles of Tidy Data!

Monday

Wednesday

3

Data joins

Monday

Relational data. Read all of this section about how you can pull data from multiple sources.

Strings and Dates and times. These two sections are both a little long and a little dry; skim over them. It's likely that they contain information that will help you solve problems with strings and dates/time that you will run into working on your projects, and it will be good to have an idea of where to seek help when you do.

Wednesday

Factors. Read this section carefully as factors are a very powerful feature in R, but also very confusing at first.

Monday

Wednesday

4

Git, GitHub and Projects

Functions & Lexical scope

Understanding vectors and lists

Monday

Functions. Learn to write your very own functions! Doing it is easy. Doing it well is very hard. This chapter will give you a nice introduction.

Wednesday

Vectors. Vectors: we've been using them everywhere, without thinking too hard about what they are and how they work in R. This chapter is a nice codex on everything vector.

Iteration. We've also been iterating over things without thinking too hard about how it worked under the hood. This chapter introduces a number of very useful iteration design patterns.

Monday

Wednesday

5

Finding patterns: clustering

Finding patterns: machine learning

Monday

Wednesday

Reading Days

6

Leaflet & maps

Geospatial visualization

Monday

Wednesday

7

Deep learning

Monday

Shiny Gallery. Browse the Shiny app gallery to get an idea of what kinds of apps you'll be able to build using this R package.

Shiny Tutorial Video. I'd recommend watching at least Part 1 either before or after class. It is about 40 minutes long.

Wednesday

Project groups. Come to class ready to either give me the names of the people in your group and general project idea, or a list of possible projects ideas/types of data you're interested in.

8

Introducing Git and Github

[Project pitches]

Monday

Introduction to Git and RStudio. This is one of the best introductions I've found to Git, GitHub and using these tools in RStudio. Read the first part carefully; you can skim the Pull Request section for now.

Wednesday

Project Pitches. In a 5 minute project pitch, your group should introduce the data set(s) that you are working with and why you think the topic is an interesting candidate for your web app. You should also show mock-ups of what you want your app to look like and what features you'd like it to have; you can use pre-made slides or just draw on the whiteboard.

9

[Project workshop]

Monday

Deliverables Schedule. Before class Use your Github project site to create issues for each of the major tasks and features for your project and assign them to members of your group. I'll meet with each group individually durring class to discuss your anticipated timeline for completing each.

10

[Project workshop]

Wednesday

Deliverables check-in. I'll meet with each group again in class again to get a report on how well work has progressed on your major tasks. Your project should be at least in the alpha phase: this means a working Shiny app that you can show me, albeit likely still missing some/many features.

11

[Project workshop]

Wednesday

Deliverables check-in. By this point your project should be in the beta phase: this means a working app with most/all of the major features in place, but possibly with bugs and minor features still missing.

12

[Project workshop]

[Project Demonstrations & Feedback]

Wednesday

5:30 - 7pm Biology “Poster” Session

Wednesday

Project Demonstrations. In class today, you'll walk the class through a demonstration of your Shiny app. We'll provide feedback and polish suggestions; the final version of your app is due by the end of the day on Friday of finals week.

Course Description

We live in the era of Big Data. Major discoveries in science and medicine are being made by exploring large data sets in novel ways using computational tools. The challenge in the biomedical sciences is the same as in Silicon Valley: knowing what computational tools are right for a project and where to get started when exploring large data sets. In this course you will learn to use R, a popular open-source programming language and data analysis environment, to interactively explore data. Case studies will be drawn from across the sciences and medicine. Topics will include data visualization, machine learning, image analysis, geospatial analysis, and statistical inference on large data sets. We will also emphasize best practices in coding, data handling and adherence to the principles of reproducible research. No prior programming experience is required.

Course components

  • Course participation (20%)
  • Coding assignments (25%)
  • Shiny Project: Project pitch (5%)
  • Shiny Project: Weekly deliverables check-ins (5% x 2)
  • Shiny Project: Project demonstration (15%)
  • Shiny Project: Final submission (25%)

Course Participation: This portion of your grade will reflect (1) how actively you participate in the classroom and (2) the thoughtfulness and insightfulness of your contributions. If you have any questions about your participation in the course, I’d be happy to give you feedback at any point in the term.

Coding and data analysis assignments. Many of the exercises and case studies we explore in class will be followed by a coding or data analysis assignment for you to work on outside of class. These assignments will be graded on a binary basis: either your solution meets all of the requirements (full credit) or it doesn’t. You should try to complete these assignments after class on the day they are listed on the schedule above. It’s OK if you get stuck, just be prepared to ask questions when we meet next. Completed assignments are formally due on the last day of the term (Friday of finals week).

Shiny Project. Your final project in this course will be to design and implement a web application that gives the user an opportunity to explore a data set of your choosing in a novel way. We’ll spend lots of time in-class exploring the tools you’ll need to accomplish this task. These projects will be completed in groups of 2-3 people. We will use cool tools, task management, and in-class “check-ins” to track each group members’ contributions to the project. Grades related to these projects will be assigned individually. When grading project-related components of the course, I will use the following rubric: pieces of work can be evaluated on the basis of either their ambition or their level of perfection. A-level work distinguishes itself by being both ambitious (creative; insightful; pushes beyond the minimum requirements) and exhibiting perfection (fulfills the ambitions with precision and polish). B-level work achieves one of these goals (exhibits ambition or perfection), but not quite both. C-level work fulfills the minimum requirements of the assignment. At any point in the process please feel free to ask me where I see your project falling on these two spectrums.

Fig. 1 Grading schema

Learning Objectives

  • Effective scientific communication. In this course you will have many opportunities to practice effective scientific communication in both formal and informal settings, including: informal classroom discussions, a written project proposal, and through the code that you write describing statistical/computational approaches to data analysis problems.

  • Use and understanding of the scientific method. Your final data analysis project will challenge you to develop a clearly defined scientific question and to think deeply and creatively about computational approaches to answering that question.

  • Using the “tools and techniques” of science. This course is heavily focused on the computational and statistical tools commonly used to analyze large scale data sets in the sciences.

  • Scientific literacy. You will have ample opportunity in this course to increase your exposure to cutting-edge investigations and to practice critical analysis of experimental approaches and the interpretation of data.

Accessibility & Classroom Climate

Our goal is to make this course as accessible and inclusive to as diverse a range of participants as possible. If at any point during the term you have questions or comments concerning accessibility in our classroom, please feel free to discuss them with me. I am eager to consider ways to make the classroom more inviting and to ensure your full participation in the course.

I am committed to ensuring access to course content for all students. Reasonable accommodations are available for students with disabilities. Contact Lauren Kozak, Title IX Coordinator and Director of Disability Resources, to confidentially discuss your individual needs and the accommodation process. More information can be found at https://www.wlu.edu/disability-accommodations/undergraduate-accommodations.

If you have already been approved for accommodations, please meet with me within the first two weeks of the term so we can develop an implementation plan together. It is important to meet as early in the term as possible; this will ensure that your accommodations are implemented early on. If you have accommodations for test-taking, please remember that arrangements must be made at least a week before the date of the test or exam.

Attendance

If you miss class due to an event that you have previously discussed with me and that I have agreed constitutes a verified absence (examples include travel to interviews, academic conferences, or athletic competitions), you will not be penalized in the participation portion of your grade or for missed in-class work or projects. Because of the dynamic nature of this course, however, it will usually not be possible for you to make up missed classroom exercises or group work, so your final grade will simply be calculated from a smaller number of total possible points. If you know you are going to be missing class please talk to me about it as early in the term as possible. Also, please plan to remind me of your upcoming absence the week prior to the event.

Missing a class due to an un-verified absence will be reflected negatively in the class participation portion of your grade. Additionally, any missed in-class assignments or quizzes will be graded as a 0.