Queen's University, Sir John A. MacDonald Hall Room 2

Feb 17-19, 2016

9:00 am - 4:00 pm

Instructors: Jeff Stafford, Hartmut Schmider, Robert Colautti

Helpers: Gang Liu

General Information

Software Carpentry's mission is to help scientists and engineers get more research done in less time and with less pain by teaching them basic lab skills for scientific computing. This hands-on workshop will cover basic concepts and tools, including program design, version control, and task automation. It also gives a basic introduction to using HPC resources such as the clusters at HPCVL. After this workshop, students will be able to use UNIX systems effectively, access and use supercomputing resources, and write basic programs for use in their research.

HPCVL is a consortium of four universities, led by Queen's University, that also includes Carleton University, the University of Ottawa, and the Royal Military College of Canada. We specialize in secure, advanced computing resources and support for academic and medical clients. HPCVL operates a high performance data centre as part of the Compute Canada family.

For more information on what we teach and why, please see our paper "Best Practices for Scientific Computing".

Who: This course is aimed at graduate students and other researchers who want a basic introduction to scientific programming. The third day's content emphasizes biology applications and is optional. You don't need any previous knowledge of the tools that will be presented at the workshop.

Where: 128 Union St. W, Kingston, Ontario. Get directions with OpenStreetMap or Google Maps.

Requirements: Participants must bring a laptop with an ssh client installed. They are also required to abide by Software Carpentry's Code of Conduct.

Contact: Please email hartmut.schmider@queensu.ca for more information.


Schedule

Wednesday, February 17, 2016

09:00 The Unix shell
10:30 Coffee (10 min)
12:00 Lunch break
13:00 Automation with Makefiles
14:30 Version Control with Git
16:00 Wrap-up

Thursday, February 18, 2016

09:00 Programming with Python
10:30 Coffee (10 min)
12:00 Lunch break
13:00 Parallel Computing with Python
14:30 Using HPC Systems
16:00 Wrap-up

Friday, February 19, 2016

09:00 R for Reproducible Science
10:00 Coffee (10 min)
12:00 Lunch break
13:00 Elegant Graphics in R
15:00 Custom R packages
16:00 Wrap-up

Syllabus

The Unix Shell

  • Working with files and directories
  • Piping and redirecting input/output
  • Shell variables
  • Creating and running shell scripts
  • Looping over files
  • Useful UNIX tools
  • Reference...

Automation with Make

  • Makefiles
  • Automatic variables
  • Managing dependencies
  • Pattern rules
  • Variables and functions
  • Reference...

Version Control with Git

  • Creating a repository
  • Recording changes to files: add, commit, ...
  • Viewing changes: status, diff, ...
  • Ignoring files
  • Working on the web: clone, pull, push, ...
  • Resolving conflicts
  • Open licenses
  • Where to host work, and why
  • Reference...

Programming in Python

  • Using libraries
  • Working with arrays
  • Reading and plotting data
  • Creating and using functions
  • Loops and conditionals
  • Defensive programming
  • Using Python from the command line
  • Reference...

Programming in R

  • Getting started with R
  • Primitive types
  • Lists
  • Vectors, matrices and arrays
  • Data frames
  • Reading and plotting data
  • Statistics
  • Programming constructs
  • Functions
  • Reference...

Setup

To participate in this Software Carpentry workshop, you will need an ssh client to log into the cluster nodes that we provide. In addition, you will need an up-to-date web browser.

We maintain a list of common issues that occur during installation on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but may be useful to you as well.

SSH access to cluster nodes

SSH (Secure Shell) is a protocol that encrypts all data streams. It is available on almost any platform.

Windows

  1. Download the MobaXterm SSH client for Windows.
  2. Run the installer and follow the instructions.

This will provide you with an ssh command prompt. You can now connect following the instructions below.

Mac OS X

The default shell in all versions of Mac OS X is Bash, so no need to install anything. You access Bash from the Terminal (found in /Applications/Utilities). You may want to keep Terminal in your dock for this workshop.

Linux

The default shell is usually Bash, but if your machine is set up differently you can run it by opening a terminal and typing bash. There is no need to install anything.
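
Once a terminal is available on any of the platforms above, logging in comes down to a single ssh command. As a rough sketch, the Python snippet below assembles that command line. The user name `jdoe` and the host `login.example.org` are placeholders chosen for illustration, not actual HPCVL account details, and the helper `ssh_command` is hypothetical; substitute the details of your own account.

```python
import shlex


def ssh_command(user, host, port=22):
    """Build the ssh command line for logging in to a remote host.

    `user`, `host` and `port` are illustrative parameters; the values
    used below are placeholders, not real cluster addresses.
    """
    args = ["ssh"]
    if port != 22:
        # Only pass -p when the server listens on a non-default port.
        args += ["-p", str(port)]
    args.append(f"{user}@{host}")
    # shlex.join quotes each argument so the line can be pasted into a shell.
    return shlex.join(args)


print(ssh_command("jdoe", "login.example.org"))
```

Typing the printed line at the terminal prompt starts the login; ssh will then ask for your password.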

Git

Git is a version control system that lets you track who made changes to what, and when, and has options for easily updating a shared or public version of your code on github.com. Ever wanted to make backups of your files? Git allows you to effectively maintain every version of every file in a project. You will need a supported web browser to access GitHub (current versions of Chrome, Firefox or Safari, or Internet Explorer version 9 or above). You do not have to do any setup for using Git, as it is installed on the systems we will work on. If you wish, you can install Git on your personal computer as well.
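
If you do install Git on your own computer, a quick way to confirm that it worked is to ask Git for its version. The sketch below does this from Python; the helper name `git_version` is made up for this example, and it assumes only that a working installation puts a `git` executable on your PATH.

```python
import shutil
import subprocess


def git_version():
    """Return the output of `git --version`, or None if Git is not found."""
    if shutil.which("git") is None:
        # No git executable on PATH, so Git is not installed (or not visible).
        return None
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


version = git_version()
print(version if version else "Git does not appear to be installed")
```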

Text Editor

When you're writing code, it's nice to have a text editor that is optimized for writing code, with features like automatic color-coding of key words. You do not have to install a text editor, as one is already installed on the systems we will work on.

Python

Python is a popular language for scientific computing, and great for general-purpose programming as well. Installing all of its scientific packages individually can be a bit difficult, so we recommend Anaconda, an all-in-one installer.
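
After installing, you can check that the scientific packages are actually available. The snippet below is a minimal sketch of such a check. The package names `numpy`, `scipy` and `matplotlib` are an assumption about what the lessons use, based on the array and plotting topics in the syllabus, and `missing_packages` is a hypothetical helper.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed package list; adjust it to whatever your lessons require.
missing = missing_packages(["numpy", "scipy", "matplotlib"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All packages found")
```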

R

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we will use RStudio.

To set up R, you will need to install the base R distribution as well as the RStudio Release Preview. We recommend the RStudio Release Preview over the normal version, as it provides several extra coding tools that are super helpful (like the ability to automatically check for errors in your code). Windows users should install RTools as well, as it is required for building packages on that OS.