Welcome
This is the Syllabus for the Fall 2018 Computational Tools pilot class. This is an evolving document, please check back often.
About
No matter the discipline, scientific research and teaching in the 21st century relies heavily on computational tools. This class aims to introduce graduate students in the geological, geophysical and biogeochemical sciences to a wide range of commonly used concepts and open source data tools to empower them to find the right tool for their computational needs in research and teaching. Interested students should have a rough idea for a programming/modeling/data science project relevant to their research goals that they want to pursue as part of this class. Prior experience with at least one programming language is recommended.
Logistics
Schedule
Week 1
Topics (in class)
- Logistics
- Scheduling
- Class Format
- Intro to Git & GitHub
Materials & Resources
Homework
- Create a GitHub account
- Learn the basics of git & GitHub by completing the following steps:
- to get a visual of how git works, complete the first 3 lessons of the Introduction Sequence at Learning Git Branching (Introduction to Git Commits, Branching in Git, Merging in Git)
- follow the installation instructions for Git-it (Mac/Windows available), this includes the installation of a GitHub GUI with command line tools and a text editor of your choice (suggested are Atom or Sublime, both are excellent and will do the job, just read their websites and decide which resonates best with you)
- start Git-It on your desktop and complete the entire program to learn git configuration, working with repositories, commits, forks, branches, GitHub remotes, and an actual collaboration exercise.
- Join the Introductory Group exercise on GitHub Classroom by following this link and picking the already existing group
class
(we’ll all work together on this one)
- Complete the exercise outlined in the README of the group repository (only accessible after you’ve joined in the previous step)
- In preparation for next class, install the following software on your computer (you don’t have to use it yet, we just want you to be ready to dive straight in at the beginning of class):
Installing python and jupyter
- we recommend installing anaconda for python 3.6 (if you already have anaconda for python 2.7, no need to reinstall!)
- open a terminal and run
conda --version
to check the installed version (if lower than conda 4.4, recommend running conda update conda
)
- create a virtual environment for python 3.6 that we will use for the class (that way you can easily keep/run other python versions/configurations) by running
conda create -n class python=3.6 anaconda
- run
source activate class
(mac & linux) / activate class
(windows) to test-activate the new virtual environment
- check that running
python --version
gives Python 3.6.5 :: Anaconda, Inc.
- run
conda install -c conda-forge jupyterlab
to install jupyter lab
- run
source deactivate
(mac & linux) / deactivate
(windows) to leave the virtual environment again
- all done!
Installing R and RStudio
- install the newest version of R (should be 3.5.1), if you have older versions you want to keep you will still have access (see details on version switching)
- install/upgrade to the preview of the newest version of RStudio (should be >= 1.2.907)
- start RStudio
- in the console, run
install.packages(c("devtools", "tidyverse", "rmarkdown", "reticulate", "readxl", "latex2exp"))
- all done!
Week 2
Topics (in class)
- get your playground from GitHubClassroom (follow this link) and clone it to your system
- intro juypter notebooks & RMarkdown + \( \LaTeX \) math
- data types
Materials & Resources
Homework
Play with your playground-...
repository and the tutorials listed below. We recommend writing solutions down in Jupyter or RMarkdown notebooks for good practice. Also, git is your friend for keeping track of changes although branches are more optional if you’re the only user of the repository. Quick reminder on launching Jupyter and RStudio:
- Jupyter:
- either open a terminal in your repository folder and start jupyter from there, OR start it in any parent folder and navigate to your files manually
- run
source activate class
(mac & linux) / activate class
(windows) to activate the python environment (otherwise it will use your default environment)
- run
jupyter lab
to start the jupyter server (or jupyter notebook
if you prefer the single document version), navigate to your .ipynb
files from there (or create new ones)
- hit
Ctrl
+C
+C
to stop the notebook server when done
- run
source deactivate
(mac & linux) / deactivate
(windows) to leave the virtual environment again
- RStudio:
- double click the
project.Rproj
file in your repository folder or navigate to it from the project menu in the upper right corner of RStudio
- troubleshooting note on using python inside RMarkdown files:
- this does NOT work reliably with RStudio versions < 1.2.900, preview of the newest version of RStudio is recommended
- MacOS users may need to expose their path so the proper python version is found and used, do this by running the following command from your terminal:
echo -e "#expose bash PATH to R\nPATH=$PATH" >> ~/.Renviron
Python
- complete lessons Hello World through Functions in the Learn the basics section of learnpython.org
- create your own RMarkdown or Jupyter notebooks to complete the exercises
R
Complete one or both of the following lessons. Doing both is probably overkill if you’re already familiar with the materials but may be worth doing (or skimming) if you’d like to get some additional practice.
- Swirl (interactive exercises from within R but keeping a running tap of solutions in RMarkdown may still be useful)
- in your RStudio console, run/complete the following
install.packages("swirl")
swirl::install_course("R Programming")
swirl::swirl()
- select the R Programming course in the resulting menu and complete lessons Basic Building Blocks through Functions
- Data Carpentry workshop (create your own RMarkdown files to complete the challenges)
- go to the datacarpentry workshop (the data is for ecology but the approach is universal)
- complete chapters 1-3 (Before we start to Starting with data)
Week 3
Topics (in class)
- logic & operators
- functions
Materials & Resources
- Recommended books:
- Quick references (cheatsheets):
Homework
-
Add an item #5 to your profile in the group repository describing what you plan to do for your individual semester project. Be specific on the goal of your project and what aspects of/skills covered in this class you plan to make use of.
-
Additionally, continue to play with your playground-...
repository and the tutorials listed below.
Python
R
Again, complete one or both of the following lessons.
- Swirl
install.packages("swirl")
swirl::install_course("Getting and Cleaning Data")
swirl::install_course("Exploratory Data Analysis")
swirl::swirl()
- select the Getting and Cleaning Data course and complete lessons Manipulating Data with dplyr, Grouping and Chaining with dplyr and Tidying Data with tidyr
- select the Exploratory Data Analysis course and complete lesson 8: GGPlot2 Part2 (yes, part2, part1 is more about
qplot
which is not as powerful as ggplot
)
- Data Carpentry workshop
For some background and general reference on these tools, we recommend the online R for Data Science Book - an excellent resource for approaches to data science in R. Highly recommend bookmarking the link and using it as a reference whenever you want to learn more about the most up-to-date way of dealing with specific data manipulation and processing problems. For the tools in this homework assignment, the R for Data Science Book chapters on Tiday data and Visualization as well as the Introduction to aesthetics in Fundamentals of Data Visualization will be particularly valuable.
Week 4
Topics (in class)
- overview of individual projects
- function documentation
- analysis script examples
- data structures (matrices, frames, etc.)
- plotting
Homework
- set up a repository for your project - to start a private project repository in the Classroom organization, use this link
- if you want to use the project template we used earlier, you can download this repository and copy whichever parts you like into your new project
- create a branch for the week (
dev
or in-progress
or week-1
, whatever you prefer) and work on the project
- before next class, make a pull request to merge your week branch into
master
and request a review from your team partner (you will have to add them by their github user name as a Collaborator
in the repository Settings
–> Collaborators & Teams
section, with at least Read
access)
- review the pull request from your team partner adding comments for anything you are confused about or unclear why/what they are doing in their code
Week 5
Topics (in class)
- project progress updates
- loose ends on data structures
- numerical methods
Homework
- once again, create a branch for the week and work on your individual project
- before next class, make a pull request to merge your week branch into
master
and request a review from your team partner
- review the pull request from your team partner adding comments for anything you are confused about or unclear why/what they are doing in their code
Week 6
Topics (in class)
- problem set design:
- faculty present on their spring undergraduate classes
- group discussion, team selection & plan on problem sets implementation
- wrap up numerical integration
Homework
- continue individual projects (the usual weekly branch and review)
- decide which group project group you want to be in and think about ideas for problem set development
- in preparation for next class, accept the following assignments so we’re not waylaid by everyone doing this at the same time in class (no need to work on it yet!):
Week 7
Topics (in class)
- unit testing
- test-driven development
- package development
- error handling
Materials & Resources
Homework
- create a problem set repo for your problem set team using this template
- start on group problem sets
- continue individual projects
Week 8
Topics (in class)
- wrap up package development
- loose ends
- linting
Materials & Resources
Homework
Week 9
Topics (in class)
- microcontrollers intro
- microcontroller setup
- compile & flash code
- serial output
- reading analog data
- PWM output
Homework
- continue group problem sets
- continue individual projects
Materials & Resources
Week 10
Topics (in class)
Homework
- continue group problem sets
- continue individual projects
Week 11
Topics (in class)
Homework
- continue group problem sets
- continue individual projects
Materials & Resources
Week 12
Topics (in class)
- test the group problem sets
Homework
- finalize group problem sets
- Happy Thanksgiving!
Week 13
Topics (in class)
- present final group problem sets
- topic chosen by the class, some options:
- graphical user interfaces
- text processing & regular expressions
- databases
Homework
- finalize individual project
Week 14
Topics (in class)
- present final individual project
- wrap-up
Week 15
Party at AGU!