• Introduction to Remote Computing (Pilot)
  • Overview
    • Introductory skills
    • Intermediate skills
    • Advanced skills
    • Additional materials
  • 1 Introduction to the UNIX Command Line
    • 1.1 Introduction to UNIX
      • 1.1.1 Learning Goals
    • 1.2 Navigation
      • 1.2.1 Learning Goals
    • 1.3 Viewing & Searching
      • 1.3.1 Learning Goals
    • 1.4 File Manipulation
      • 1.4.1 Learning Goals
      • 1.4.2 Renaming a bunch of files
    • 1.5 Some final notes
  • 2 Creating and modifying text files on remote computers
    • 2.1 Text files vs other files
      • 2.1.1 OK, OK, what does this all mean in practice?
      • 2.1.2 Working with gzipped files
      • 2.1.3 Digression: file extensions are often meaningful (but don’t have to be)
      • 2.1.4 Let’s edit this file!
      • 2.1.5 Running the editor and exiting/saving
      • 2.1.6 Navigating in nano
      • 2.1.7 Long lines - note!
      • 2.1.8 Slightly more advanced features
      • 2.1.9 Getting help!
      • 2.1.10 Challenges:
    • 2.2 Big Powerful Editors
      • 2.2.1 Big Powerful Editor 1: vi
      • 2.2.2 Big Powerful Editor 2: emacs
      • 2.2.3 An opinion
    • 2.3 Remote vs local, and why editors?
    • 2.4 Editors that run locally on your laptop/desktop
    • 2.5 Thinking about editors as a means to an end
    • 2.6 Other ways to create, edit, filter, and modify files
      • 2.6.1 Redirection, appending, and piping.
      • 2.6.2 The simplest possible “editor” - echo
      • 2.6.3 Piping and filtering
    • 2.7 Working with CSV files
      • 2.7.1 Use csvtk when working with CSV files, maybe.
    • 2.8 A quick primer on compression.
      • 2.8.1 Gzip and .gz files.
      • 2.8.2 zip and compressing multiple files.
    • 2.9 Concluding thoughts
  • 3 Connecting to remote computers with ssh
    • 3.1 SSH and Clients
      • 3.1.1 Some security thoughts
      • 3.1.2 ssh as a protocol - many clients!
    • 3.2 Mac OS X: Using the Terminal program
    • 3.3 Windows: Connecting to remote computers with MobaXterm
    • 3.4 Logging out and logging back in.
    • 3.5 You’re logged on to a remote computer. Now what?
      • 3.5.1 Welcome to your account!
      • 3.5.2 Loading some files into your account
      • 3.5.3 Revisiting file and path manipulation
      • 3.5.4 Revisiting file editing
    • 3.6 Copying files to and from your local computer.
      • 3.6.1 Mac OS X: Copying files using ssh.
      • 3.6.2 Windows: Copying files using MobaXterm.
      • 3.6.3 View and change the file you just downloaded
      • 3.6.4 Copy the file back to farm.
      • 3.6.5 Digression: why do you need to log into/log out of farm on Mac OS X?
    • 3.7 Some commands are available! Others are not.
    • 3.8 Summing up file transfer - a challenge!
    • 3.9 Summing things up
  • 4 Running programs on remote computers and retrieving the results
    • 4.1 Using SSH private/public key pairs
    • 4.2 Mac OS X and Linux: Using ssh private keys to log in
    • 4.3 Windows/MobaXterm: Using ssh private keys to log in
    • 4.4 Some tips on your private key
    • 4.5 Working on farm
      • 4.5.1 First, download some files:
      • 4.5.2 Configuring your account on login
    • 4.6 Using multiple terminals
      • 4.6.1 Who am I and where am I running!?
      • 4.6.2 Looking at what’s running
      • 4.6.3 E-mailing the systems administrators
    • 4.7 File systems, directories, and shared systems
      • 4.7.1 Read and write permissions into other directories
      • 4.7.2 Listing directory and file permissions
      • 4.7.3 Files have the same permission options
      • 4.7.4 How do groups work?
      • 4.7.5 How can you use this?
      • 4.7.6 Things that regular users cannot do
    • 4.8 Disk space, file size, and temporary files
    • 4.9 Summing things up
  • 5 Installing software on remote computers with conda
    • 5.1 Why is software installation hard?
    • 5.2 Getting started with conda
      • 5.2.1 Installing conda
      • 5.2.2 Log into farm
      • 5.2.3 Creating your first environment & installing csvtk!
      • 5.2.4 Installation!
      • 5.2.5 csvtk in a bit more detail
      • 5.2.6 Where is the software coming from!?
      • 5.2.7 Digression: there are many ways to install software!
    • 5.3 Installing more software in your current environment
      • 5.3.1 Finding and specifying versions
      • 5.3.2 Making and using environment files
      • 5.3.3 Updating, removing, etc software
      • 5.3.4 Creating multiple environments
      • 5.3.5 Tech interlude: what is conda doing?
      • 5.3.6 Challenges with using one big environment
      • 5.3.7 How Titus uses conda
      • 5.3.8 Finding packages within conda
    • 5.4 Using the ‘bioconda’ and ‘conda-forge’ channels
      • 5.4.1 Mac OS X and Linux, but not Windows
      • 5.4.2 How to engage with conda-forge and bioconda
    • 5.5 Conda and data science: R and Python
      • 5.5.1 Conda and R
      • 5.5.2 Conda and Python
      • 5.5.3 Supporting interactive packages (RStudio and JupyterLab)
    • 5.6 Tricky things to think about with conda
      • 5.6.1 It can take a long time to install lots of software
      • 5.6.2 Explicit package listing
    • 5.7 Reference list of Conda Commands
    • 5.8 More Reading on Conda
    • 5.9 Discussion items:
    • 5.10 In summary
  • 6 Structuring your projects for current and future you
    • 6.1 Learning Objectives
      • 6.1.1 Lesson requirements
    • 6.2 Transferring files around efficiently
      • 6.2.1 recursive scp with -r
      • 6.2.2 sftp
      • 6.2.3 zip -r to create collections of files
      • 6.2.4 Working with .tar.gz files
      • 6.2.5 Probably the most useful advice: use a transfer directory
    • 6.3 Retrieving remote files from Web sites
    • 6.4 Dealing with files: some recommendations
    • 6.5 Farm vs cloud
    • 6.6 Thinking about data science projects!
    • 6.7 One example: a rough bioinformatics workflow
    • 6.8 Sending and Receiving Data
      • 6.8.1 Downloading data - is it correct?
    • 6.9 Storing data
      • 6.9.1 Bioinformatics: What do I back up?
      • 6.9.2 Bioinformatics: How big should I expect the files to be?
      • 6.9.3 How often should I backup my data?
      • 6.9.4 Where do I back up my data?
    • 6.10 Where do I work with large amounts of data?
      • 6.10.1 High Performance Computing Clusters
      • 6.10.2 Amazon Web Service
    • 6.11 Setting up your project
      • 6.11.1 Things to think about
    • 6.12 Naming files
    • 6.13 Looking forward to the next few workshops: techniques for doing data science on remote computers.
    • 6.14 Additional resources
  • 7 Automating your analyses and executing long-running analyses on remote computers
    • 7.1 What is a script?
    • 7.2 Getting started
    • 7.3 Automating commands by putting them in a text file
      • 7.3.1 Running scripts with bash
    • 7.4 for Loops
      • 7.4.1 Subsetting
      • 7.4.2 Variables
    • 7.5 Troubleshooting scripts
      • 7.5.1 Practicing set -e in bash scripts
    • 7.6 If statements
      • 7.6.1 Running scripts in a loop
    • 7.7 Persistent sessions with screen and tmux
    • 7.8 Concluding thoughts
    • 7.9 Appendix: exercise answers
  • 8 Keeping Track of Your Files with Version Control
    • 8.1 Learning Objectives
    • 8.2 What is git?
    • 8.3 What is GitHub?
      • 8.3.1 Create a GitHub Account
      • 8.3.2 Create a New Repository
    • 8.4 Using git
      • 8.4.1 Set up git on Farm
      • 8.4.2 Optional: Set up a Password Helper
      • 8.4.3 Clone the Repository
      • 8.4.4 Edit a File
      • 8.4.5 Commit a File
      • 8.4.6 View the Repository History on GitHub
    • 8.5 Challenge Question 1
    • 8.6 Revisiting the Workflow
    • 8.7 Undoing Changes
      • 8.7.1 Restoring a File
      • 8.7.2 Reverting a Commit
    • 8.8 Challenge Question 2
    • 8.9 Working Collaboratively
      • 8.9.1 Editing on GitHub
      • 8.9.2 Merge Conflicts
    • 8.10 Challenge Question 3
    • 8.11 Odds and Ends
      • 8.11.1 Ignoring Files with .gitignore
      • 8.11.2 Setting up a Repository without GitHub
    • 8.12 Additional Resources
  • 9 Automating your analyses with the snakemake workflow system
    • 9.1 What is a workflow and why use one?
    • 9.2 Snakemake: A workflow management system
      • 9.2.1 Fun fact
      • 9.2.2 The Snakefile
    • 9.3 Getting started - logging into farm!
    • 9.4 Installing snakemake
    • 9.5 More setup
      • 9.5.1 Create a working directory
      • 9.5.2 Download some data
    • 9.6 RNA-Seq workflow we will automate
    • 9.7 First step: quality control with FASTQC
      • 9.7.1 Create a Snakefile
    • 9.8 Some features of workflows
      • 9.8.1 What are these flags (-p, -j)?
      • 9.8.2 When you run snakemake, by default, it runs the first rule.
    • 9.9 Making the rules more generic
    • 9.10 Wildcards
    • 9.11 Adding more rules
      • 9.11.1 Downloading the reference genome
      • 9.11.2 Add the index genome command:
      • 9.11.3 Running Salmon quant
      • 9.11.4 One version of the final Snakefile
    • 9.12 Titus’ version of the final snakefile as created during the workshop
    • 9.13 Random aside: --dry-run or -n
    • 9.14 Advanced features
      • 9.14.1 Rule-specific conda environments with conda: and --use-conda
      • 9.14.2 parallelizing snakemake: -j
    • 9.15 Practical advice: How to build your workflow
      • 9.15.1 Approach 1: write down your shell commands
      • 9.15.2 Approach 2: automate one step that you run a lot
    • 9.16 Summary of what we did today.
    • 9.17 More Snakemake resources
    • 9.18 A quick checklist:
  • 10 Executing large analyses on HPC clusters with slurm
    • 10.1 What is a cluster?
    • 10.2 How do clusters work?
      • 10.2.1 Job Schedulers
      • 10.2.2 EITHER: run an interactive session with srun
      • 10.2.3 OR: Submit batch scripts with sbatch
      • 10.2.4 Flags to use when submitting jobs with sbatch or srun
      • 10.2.5 Repeatability through SBATCH variables in shell scripts
      • 10.2.6 Reprise: running HelloWorld.sh via srun
      • 10.2.7 Choosing between srun and sbatch
      • 10.2.8 A stock sbatch script that includes activating a conda environment
    • 10.3 Some useful tips and tricks for happy slurm-ing
      • 10.3.1 Trick 1: running srun inside of a screen.
      • 10.3.2 Trick 2: running snakemake inside of an sbatch script.
      • 10.3.3 Monitoring your jobs with squeue
      • 10.3.4 Canceling your jobs with scancel
    • 10.4 More on resources and queues and sharing
      • 10.4.1 Measuring your resource usage
      • 10.4.2 Nodes vs CPUs vs tasks
      • 10.4.3 Partitions
      • 10.4.4 How to share within your group
      • 10.4.5 How can you get an account on your HPC?
    • 10.5 What we’ve shown you today.
    • 10.6 Some final thoughts before departing farm and moving into the cloud.
  • 11 Making use of on-demand “cloud” computers from Amazon Web Services
    • 11.1 Workshop structure and plan
    • 11.2 Some background
      • 11.2.1 Costs and payment
    • 11.3 Amazon, terminology, and logging in!
      • 11.3.1 EC2
      • 11.3.2 Some features of AWS
    • 11.4 Let’s get started!
      • 11.4.1 “Spinning up” instances
      • 11.4.2 Connecting to instance
    • 11.5 Using your computer “in the cloud”
      • 11.5.1 Inspecting your computer
      • 11.5.2 You can do all the UNIX things
      • 11.5.3 Install conda
      • 11.5.4 Run a snakemake workflow
      • 11.5.5 Summing things up, round 1
    • 11.6 Configuring your instance differently.
    • 11.7 Shutting down instances
    • 11.8 Exercise
    • 11.9 Checklist of things you learned today!
      • 11.9.1 Additional Resources
    • 11.10 FAQs
      • 11.10.1 What are my data transfer costs?
      • 11.10.2 What are data storage costs?
      • 11.10.3 What are the advantages of using AWS over an academic HPC?
      • 11.10.4 Can you set up multiple instances at once
      • 11.10.5 Can you launch more than one instance with the same configurations?
      • 11.10.6 Can you copy an instance or share an instance with collaborators?
    • 11.11 Concluding thoughts on the cloud
  • Appendix
    • Previous offerings
      • Videos from August, 2021
    • Workshop Protocol

Introduction to Remote Computing

C. Titus Brown, Saranya Canchi, Amanda Charbonneau, Marisa Lim, Abhijna Parigi, Pamela Reynolds, Nick Ulle, and Shannon Joslin.

2021-09-08

Overview

Introductory skills

  • Workshop 1: Introduction to the UNIX Command Line
  • Workshop 2: Creating and modifying text files on remote computers

Intermediate skills

  • Workshop 3: Connecting to remote computers with ssh
  • Workshop 4: Running programs on remote computers and retrieving the results
  • Workshop 5: Installing software on remote computers with conda
  • Workshop 6: Structuring your projects for current and future you
  • Workshop 7: Automating your analyses and executing long-running analyses on remote computers
  • Workshop 8: Keeping track of your files with version control

Advanced skills

  • Workshop 9: Automating your analyses with the snakemake workflow system
  • Workshop 10: Executing large analyses on HPC clusters with slurm
  • Workshop 11: Making use of on-demand “cloud” computers from Amazon Web Services

Additional materials

Please see Previous offerings in the Appendix if you are interested in videos from past workshops.