Plotting and Programming in Python: Introduction, Variables, and DataFrames

Based on: Software Carpentry, Plotting and Programming in Python, Gapminder instructor materials: https://swcarpentry.github.io/python-novice-gapminder/instructor/index.html

Notebook format: Markdown notebook with explanatory text and Python code blocks. You can paste sections into a Jupyter Notebook, or use this as a teaching handout.

Learning Goals

By the end of this notebook, learners should be able to:

Explain what JupyterLab and Jupyter Notebooks are used for.
Create and run code cells and Markdown cells.
Store values in Python variables and use those variables in calculations.
Display output with print().
Understand common variable-related errors, especially NameError.
Use indexing and slicing on lists.
Import the Pandas library.
Read Gapminder CSV files into Pandas DataFrames.
Inspect DataFrame structure with .info(), .columns, .T, and .describe().
Select rows, columns, and individual values using .loc and .iloc.
Filter tabular data using Boolean conditions.

Required Setup

This lesson assumes that learners have Python 3 installed and have downloaded the Gapminder lesson data. The expected folder structure is:

project-folder/
├── data/
│   ├── gapminder_gdp_oceania.csv
│   ├── gapminder_gdp_europe.csv
│   ├── gapminder_gdp_americas.csv
│   └── gapminder_all.csv
└── notebook.ipynb

In a Jupyter Notebook, your notebook should be started from the same top-level folder that contains the data/ directory. That is why examples use paths such as:

'data/gapminder_gdp_oceania.csv'

Let’s start by using the same JupyterLite server as yesterday: ngs-docs.github.io/2026-summer-bcp-101-jupyterlite-demo/

Part 1 — Introduction to JupyterLite (Again), JupyterLab and Notebooks

1.1 The Difference between JupyterLite and JupyterLab

JupyterLite and JupyterLab are two different tools that we’ll be using throughout this workshop.

JupyterLab is the notebook workspace where we will write and run Python code. It lets us open notebooks, write notes, run code cells, view results, and work with data files all in one place.

JupyterLite is a browser-based version of Jupyter that lets us use JupyterLab without installing anything on our computer. Instead of setting up Python and Jupyter locally, we can open a link in a web browser and start working right away.

In this workshop, we will be using JupyterLab notebooks through JupyterLite. That means the notebook interface will look like JupyterLab, but it will run directly in the browser using JupyterLite.

1.2 What is JupyterLab?

JupyterLab is an interactive environment for writing, running, and documenting code. It lets you combine:

Python code
output from that code
tables
plots
explanatory text
Markdown notes

This makes it especially useful for data analysis, because you can keep the code, results, and interpretation together in one document.

A Jupyter Notebook is made of cells. The two main cell types are:

Cell type	Purpose
Code cell	Runs Python code
Markdown cell	Stores formatted notes, headings, lists, links, and explanations

1.3 Starting JupyterLab

We’ll use JupyterLite to open a JupyterLab notebook. We can use this link to directly access JupyterLite. It can also be found on our GitHub pages here and opened by clicking the link. From there, we’ll open a JupyterLab notebook using the first icon under Notebook.

[Screenshot: bcp101_jupyterlab]

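With JupyterLite, nothing is installed or launched on your computer: both the JupyterLab interface and the Python kernel that runs your code live inside the web browser. (In a locally installed JupyterLab, by contrast, the browser is only the interface, and the Python code runs through a Jupyter server and kernel.)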

1.4 Managing a Notebook

Once opened, we can see our notebook on the right and a few buttons on the left. The first button, shaped like a folder, opens the file browser for the directory where our notebook is stored. The notebook is currently named “Untitled.ipynb”, but we can rename it by right-clicking on the file and selecting “Rename”. Let’s rename it to something more intuitive, like:

bcp101_day2_morning_intro_python.ipynb

1.5 Command Mode vs. Edit Mode

Jupyter notebooks have two important modes:

Mode	How to enter	What it does
Command mode	Press `Esc`	Lets you manage cells
Edit mode	Press `Enter`/`Return`	Lets you type inside a cell

Useful shortcuts in Command mode:

Shortcut	Action
`a`	Add a cell above
`b`	Add a cell below
`x`	Delete selected cell
`z`	Undo last cell action
`m`	Convert cell to Markdown
`y`	Convert cell to Code
`Shift` + `Enter`	Run selected cell

1.6 Markdown Basics

Markdown is a lightweight way to format text. Try putting the following into a Markdown cell:

# My Gapminder Notebook

## Goals

- Learn basic Python syntax
- Store values in variables
- Read tabular data with Pandas
- Select rows and columns from a DataFrame

[Software Carpentry](https://software-carpentry.org)

When you run the Markdown cell, Jupyter renders it as formatted text.

Practice: Markdown Cell

Create a Markdown cell with:

A level-1 heading with the title of your notebook.
A short paragraph describing what Gapminder data contains.
A bullet list of three things you want to learn.

Part 2 — Variables and Assignment

2.1 What is a Variable?

A variable is a name that stores a value. In Python, the assignment operator = assigns the value on the right to the variable name on the left.

age = 42
first_name = 'Ahmed'

Here:

age stores the integer value 42
first_name stores the string value 'Ahmed'

There are some rules about what can and cannot be a variable. For example, Python variable names:

can contain letters, numbers, and underscores
cannot start with a number
are case-sensitive
should be meaningful

Valid examples:

age = 42
first_name = 'Ahmed'
gdp_per_capita = 10039.59
sample_count = 12

Invalid or poor examples:

# Invalid: starts with a number
1st_name = 'Ahmed'

# Valid, but not descriptive
x = 42

# Valid, but confusing because capitalization matters. These are two different variables. 
Age = 42
age = 55
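The naming rules above can be checked from Python itself. The following is a minimal sketch using the built-in string method .isidentifier(), which reports whether a piece of text is legal as a name. The variable names valid_name_ok and starts_with_digit_ok are made up for this illustration. Note that .isidentifier() also returns True for reserved words such as for, which still cannot be used as variable names.

```python
# Ask Python whether a piece of text would be a legal variable name
valid_name_ok = "first_name".isidentifier()
starts_with_digit_ok = "1st_name".isidentifier()

print("first_name is a legal name:", valid_name_ok)
print("1st_name is a legal name:", starts_with_digit_ok)

# Names are case-sensitive, so 'Age' and 'age' are different names
print("Age and age are the same name:", "Age" == "age")
```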

2.2 Variable Types

In Python, variables can store different kinds of information. The kind of information stored in a variable is called its type.

For example, a variable can store a number, a word, a sentence, or a true/false value. Python uses the value assigned to the variable to figure out what type it is.

Some common variable types are:

Type	Name	Description	Example
`int`	integer	A whole number	`year = 2024`
`float`	float	A number with a decimal point	`life_expectancy = 72.5`
`str`	string	Any amount of text made up of characters, such as letters, numbers, punctuation, or spaces	`country = "Canada"`, `long_sentence = "This is also a string."`
`bool`	boolean	A true or false value	`is_large = True`

Each variable stores a different type of information. The variable year stores a whole number, life_expectancy stores a decimal number, country and long_sentence store text, and is_large stores a true/false value.

We can check the type of a variable using the type() function:

type(country)

This will show us that country is a string, or str, because it stores text. Importantly, if we put a number inside quotation marks, it becomes a string and not an integer, or int, as shown below.

is_number = 1
type(is_number)

is_string = "1"
type(is_string)

Understanding variable types is useful because Python treats different types of information in different ways. For example, Python can do math with numbers, but text values need to be handled differently.
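To make that difference concrete, here is a small sketch that reuses is_number and is_string from above. The names number_sum, string_sum, and converted_sum are introduced only for this illustration. The + operator adds numbers but joins strings, and the built-in int() function converts text that looks like a whole number into an integer.

```python
is_number = 1
is_string = "1"

# Adding two integers does arithmetic
number_sum = is_number + is_number
print(number_sum)

# "Adding" two strings joins them end to end
string_sum = is_string + is_string
print(string_sum)

# int() converts text that looks like a whole number into an integer
converted_sum = int(is_string) + is_number
print(converted_sum)
```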

2.3 Displaying Values with `print()`

The print() function displays values as text.

age = 42
first_name = 'Ahmed'

print(first_name, 'is', age, 'years old.')

Expected output:

Ahmed is 42 years old.

print() automatically adds spaces between comma-separated items and moves to a new line at the end.

Storing Strings with Other Variables

In the printed statement above, we combined two variables with some text. But what if we wanted to store the whole sentence in another variable? Suppose we tried to assign it like this:

variable = first_name, 'is', age, 'years old.'

Python would store this as a tuple (an ordered group of separate elements, much like the lists introduced in section 2.6) instead of a single piece of text:

('Ahmed', 'is', 42, 'years old.')

If we want to keep these as one string, we can use an f-string, which inserts the value that each variable is storing instead of the variable’s name. An f-string starts with the letter f before the opening quotation mark, and each variable goes inside curly braces:

f"{first_name} is {age} years old."
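As a short sketch of the difference, the example below stores both versions in variables and inspects them. The names as_tuple and sentence are chosen only for this illustration.

```python
first_name = 'Ahmed'
age = 42

# Comma-separated values are stored as a tuple of separate elements
as_tuple = first_name, 'is', age, 'years old.'
print(type(as_tuple))
print(len(as_tuple))

# An f-string builds one single string from the variables' values
sentence = f"{first_name} is {age} years old."
print(type(sentence))
print(sentence)
```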

2.4 Variables Must Exist Before Use

If you try to use a variable before assigning it, Python raises a NameError.

print(last_name)

Expected error:

NameError: name 'last_name' is not defined

This often means one of two things:

The variable was never created.
The variable name was misspelled.
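If you would like to see the error message without stopping the rest of a cell, one option is Python’s try/except statement. This is not covered elsewhere in this lesson, so treat the following as an optional sketch. It assumes last_name has not been assigned anywhere in the notebook, and the name message is made up for the example.

```python
try:
    # last_name has deliberately not been assigned
    print(last_name)
except NameError as error:
    # Store the text of the error so we can look at it
    message = str(error)
    print("Caught a NameError:", message)
```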

Important Jupyter Note: Execution Order Matters

In a notebook, Python remembers cells that have already been run. The order in which cells are executed matters more than the order in which they appear.

For example, this cell will fail if run first:

print(myval)

But this cell creates the variable:

myval = 1

If you run the assignment cell first and then run print(myval), it works. To check that a notebook runs cleanly from top to bottom, use the toolbar icon showing two arrows. If you hover over it, it says “Restart the kernel and run all cells”. This ensures that everything is run in the correct order.

2.5 Updating Variables

Variables can be used in calculations.

age = 42
age = age + 3
print('Age in three years:', age)

Expected output:

Age in three years: 45

This line:

age = age + 3

means: take the current value of age, add 3, and assign the result back to age.
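Python also offers a shorthand for this pattern, the += operator. The sketch below shows both forms side by side; the second update is added only for illustration.

```python
age = 42

# Long form: compute age + 3 and store the result back in age
age = age + 3
print('After the long form:', age)

# Shorthand: += does the same update in one step
age += 3
print('After the shorthand:', age)
```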

2.6 Creating a List

What if we have multiple values? Could we assign them to a single variable? We can, using a list. Lists are useful when we want to keep related pieces of information together.

For example, we could store several country names in one list:

countries = ["Canada", "Mexico", "United States", "Brazil"]

This list contains four values. Each value in a list is called an element.

Lists are written using square brackets [], and each element is separated by a comma. The elements can be any of the types we just discussed (integer, float, string, boolean), and a single list can mix different types.

lucky_items = ["rabbit's foot", "four-leaf clover", 42]

However, most of the time we use lists to store values that are related to each other so it’s good practice to keep them as the same type.

2.7 Accessing Elements Through Indices

Each element in a list has a position, called an index.

In Python, counting starts at 0, not 1. This means the first element is at index 0, the second element is at index 1, and so on.

countries = ["Canada", "Mexico", "United States", "Brazil"]

countries[0]

This gives us:

"Canada"

We can use a different index to get a different element:

countries[2]

This gives us:

"United States"

Even though "United States" is the third item in the list, it has index 2 because Python starts counting at 0.

Negative Indexing

Python can also count from the end of a list using negative numbers.

countries[-1]

This gives us the last element:

"Brazil"

We can also get the second-to-last element:

countries[-2]

This gives us:

"United States"

Negative indexing is useful when we want to access values at the end of a list without knowing exactly how long the list is.
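Indexing picks out one element at a time. To pick out several neighboring elements at once, Python uses slices, written as start:stop inside the square brackets, where the element at the stop position is not included. The sketch below uses a made-up list called years so that it does not give away the practice exercises; all variable names are chosen only for illustration.

```python
years = [1952, 1957, 1962, 1967, 1972]

# Two ways to reach the last element
last_by_negative = years[-1]
last_by_length = years[len(years) - 1]
print(last_by_negative, last_by_length)

# A slice returns a new list; the stop position is excluded
first_two = years[0:2]
print(first_two)

# Leaving out the stop position means "through the end"
from_third = years[2:]
print(from_third)
```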

2.9 Adding and Removing Elements from a List

After we create a list, we can change it by adding or removing elements.

For these examples, let’s continue with our list of countries:

countries = ["Canada", "Mexico", "United States", "Brazil"]

Adding an Element

We can add a new element to the end of a list using .append().

countries.append("China")

Now the list contains five countries:

["Canada", "Mexico", "United States", "Brazil", "China"]

Note that the .append() method changes the original list. It does not create a new list.

Removing an Element

We can remove an element from a list using .remove().

countries.remove("Mexico")

Now the list no longer contains "Mexico":

countries

["Canada", "United States", "Brazil", "China"]

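When using .remove(), we give Python the value that we want to remove. Notice that we didn’t have to specify the location. We can also remove an element by position, using its index and .pop(). Had we run the following on the list while "Mexico" was still at index 1, instead of calling .remove(), it would have given the same result: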

countries.pop(1)
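To compare the two approaches safely, the sketch below works on two separate copies of the original list. The names by_value, by_index, and removed are made up for this example. One difference worth noticing is that .pop() hands back the element it removed.

```python
countries = ["Canada", "Mexico", "United States", "Brazil"]

# Remove by value, working on a copy of the original list
by_value = list(countries)
by_value.remove("Mexico")

# Remove by position, working on another copy
by_index = list(countries)
removed = by_index.pop(1)

print(by_value)
print(by_index)
print("pop() returned:", removed)
```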

2.10 Operations on Lists Using Functions

After we create a list, we often want to perform a calculation on it or modify its values. Fortunately, we can easily do this with functions, the special commands we’ve been calling to carry out each operation. Here, we’ll explore some that work on lists, but a bigger list is available here.

We will start with a list of life expectancy values:

life_expectancies = [72.5, 80.1, 67.9, 75.4]

This list contains four numbers of the data type float. Each number is one element in the list.

Get Length of List Using `len()`

We can use len() to count how many elements are in a list.

len(life_expectancies)

This gives us:

4

This list has four values.

Add Values Together Using `sum()`

We can use sum() to add all of the values in a list.

sum(life_expectancies)

This gives us:

295.9

This is the total of all values in the list.

Find the Smallest and Largest Values

We can use min() to find the smallest value in a list.

min(life_expectancies)

This gives us:

67.9

The smallest life expectancy value in the list is 67.9.

Similarly, we can use max() to find the largest value in a list.

max(life_expectancies)

This gives us:

80.1

This is the largest value in the list.

2.11 Additional Operations on Lists

Even if the function doesn’t exist, we can easily perform the calculation by combining existing functions with another operation. For example, Python does not have a built-in average() function, but we can calculate an average using sum() and len().

average_life_expectancy = sum(life_expectancies) / len(life_expectancies)
average_life_expectancy

This gives us:

73.975

This calculation adds all values together and then divides by the number of values.
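Although there is no built-in average() function, Python’s standard library does include a statistics module with a mean() function. The sketch below compares it with the manual calculation; the names manual_average and library_average are made up for the example. The two results may differ in the last decimal digits because of floating-point rounding, so they are compared with a small tolerance.

```python
import statistics

life_expectancies = [72.5, 80.1, 67.9, 75.4]

# Manual calculation: total divided by the number of values
manual_average = sum(life_expectancies) / len(life_expectancies)

# The same calculation using the standard library
library_average = statistics.mean(life_expectancies)

print(manual_average)
print(library_average)

# Compare with a tolerance rather than exact equality
print(abs(manual_average - library_average) < 1e-9)
```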

Modifying Each Value in a List Using `for` Loops

Sometimes we want to do the same calculation to every value in a list or modify it in a particular way. If there isn’t an existing function, we can perform this using a for loop, which loops through each element in our list and performs the same operation.

For example, we might print every element one at a time. To do this, we’ll need to follow a basic syntax to set up our for loop.

First, we need a temporary variable to represent each element in our for loop. Because we can’t perform the operation on the entire list, we’ll need to call each element one at a time. Here, we call this temporary variable value, and specify that we want every element in our list using:

for value in life_expectancies

Next, we need to apply the operation to every element. It uses the same structure as our other operations, but we have to indent it to show that it occurs inside the for loop. We also add a colon to the end of the first line.

for value in life_expectancies:
    print("I will live", value, "years.")

This should print:

I will live 72.5 years.
I will live 80.1 years.
I will live 67.9 years.
I will live 75.4 years.

Suppose we wanted to keep each of these sentences in another list. To do that, we’ll need to create an empty list and append each sentence to it. We’ll also use an f-string to insert the value that the variable holds, and not just the variable’s name.

life_expectancies_sentence = []

for value in life_expectancies:
    life_expectancies_sentence.append(f"I will live {value} years.")

If we looked at our new list:

life_expectancies_sentence

We would see it now contains our modified strings.

['I will live 72.5 years.',
 'I will live 80.1 years.',
 'I will live 67.9 years.',
 'I will live 75.4 years.']

The original list did not change. We created a new list with the results.
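The same pattern works for numeric calculations. As a sketch, the loop below builds a new list holding how far each value is from the average. The names average and differences are chosen only for this illustration, and the printed numbers may show small floating-point rounding in their last digits.

```python
life_expectancies = [72.5, 80.1, 67.9, 75.4]
average = sum(life_expectancies) / len(life_expectancies)

# Start with an empty list, then add one result per element
differences = []
for value in life_expectancies:
    differences.append(value - average)

print(differences)

# The original list is unchanged and both lists have the same length
print(life_expectancies)
print(len(differences) == len(life_expectancies))
```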

2.12 Practice: Variables

Whoa, we covered a lot!

Let’s try a few exercises and see how well we do.

Exercise 1 — Create and Print Variables

Create variables for:

your first name
your favorite organism
the number of samples in a small dataset

Then print a sentence using all three variables.

Exercise 2 — Working with Lists

Given this list of country names:

countries = ["Canada", "Mexico", "United States", "Brazil"]

Predict the result of each expression:

countries[0]
countries[2]
countries[-1]
countries[0:2]
countries[2:]
countries[:]

Part 3 — Reading Tabular Data into Pandas DataFrames

3.1 What is Pandas?

Pandas is a Python library for working with tabular data. It is especially useful for CSV files, spreadsheets, and datasets where observations are arranged in rows and variables are arranged in columns.

The standard import is:

import pandas as pd

The alias pd is a common convention that makes Pandas commands shorter to type.

3.2 What is a DataFrame?

A DataFrame is a two-dimensional table.

Rows usually represent observations.
Columns usually represent variables.
Each column has a name.
Different columns can contain different data types.

For Gapminder GDP data:

each row is a country
each column is GDP per capita for a particular year
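To get a feel for this layout before loading a real file, here is a minimal sketch that builds a tiny DataFrame by hand. The country names and numbers are made up for illustration and are not Gapminder values; only the column naming pattern follows the lesson data.

```python
import pandas as pd

# Made-up values: each key is a column, each list holds one value per row
toy = pd.DataFrame(
    {
        "gdpPercap_1952": [100.0, 200.0],
        "gdpPercap_1957": [110.0, 220.0],
    },
    index=["Country A", "Country B"],  # row labels
)

print(toy)

# .shape reports (number of rows, number of columns)
print(toy.shape)
```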

3.3 Discrete and Continuous Data

When we work with data, it is helpful to think about what kind of information each column contains.

Discrete data are values that represent separate categories or countable groups. In the Gapminder dataset, country is discrete because each value is a separate country name, such as Albania or Denmark. The continent column in gapminder_all.csv is also discrete because each value belongs to a category, such as Europe, Asia, or Americas.

Continuous data are numeric values that can be measured on a scale. In the Gapminder dataset, columns such as gdpPercap_1952 or gdpPercap_2007 are continuous because GDP per capita can take many possible numeric values, including values with decimals.

A simple way to remember the difference is:

Data type	Meaning	Gapminder example
Discrete	Separate categories or countable values	`country`, `continent`
Continuous	Measured numeric values	`gdpPercap_1952`, `gdpPercap_2007`

This distinction matters because we often summarize and plot discrete and continuous data in different ways.

3.4 Read a CSV File

Read the Oceania GDP data:

import pandas as pd

# Read the CSV file into a DataFrame
data_oceania = pd.read_csv('data/gapminder_gdp_oceania.csv')

# Display the DataFrame
print(data_oceania)

This creates a DataFrame variable named data_oceania.

3.5 File Paths and Common Errors

The path:

data/gapminder_gdp_oceania.csv

means:

Start in the current working directory.
Go into the data/ folder.
Open the file named gapminder_gdp_oceania.csv.

If the file path is wrong, you may see:

FileNotFoundError: [Errno 2] No such file or directory

To debug this:

import os
print(os.getcwd())
print(os.listdir())
print(os.listdir('data'))
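Another way to check a path before reading it is the standard library’s pathlib module. This is an optional sketch rather than part of the original lesson; the name csv_path is made up for the example.

```python
from pathlib import Path

# Build the path piece by piece: the data folder, then the file name
csv_path = Path("data") / "gapminder_gdp_oceania.csv"

if csv_path.exists():
    print("Found:", csv_path)
else:
    # Show where Python is looking so the path can be corrected
    print("Not found:", csv_path)
    print("Current working directory:", Path.cwd())
```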

3.6 Use `index_col` for Row Labels

By default, Pandas gives rows numeric labels: 0, 1, 2, and so on. These labels are called the index, and each row gets a unique one. However, numbers aren’t very informative for describing what’s in each row, so we can replace them with something more meaningful. For this data, it is more useful to use the country column as the row index. Although you can do this with any column, it’s good practice to choose one whose values are unique.

data_oceania_country = pd.read_csv(
    'data/gapminder_gdp_oceania.csv',
    index_col='country'
)

print(data_oceania_country)

Now the country names are row labels instead of regular values in a column.
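The effect of index_col can be seen without any data file. The sketch below reads a tiny piece of CSV text held in memory (using io.StringIO from the standard library) twice, once with the default index and once with index_col. The text in csv_text is made up for illustration and does not contain real Gapminder values.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file on disk
csv_text = """country,gdpPercap_1952,gdpPercap_1957
Country A,100.0,110.0
Country B,200.0,220.0
"""

# Default: rows are labeled 0, 1, ... and country is an ordinary column
default_index = pd.read_csv(io.StringIO(csv_text))
print(list(default_index.index))
print(list(default_index.columns))

# With index_col: country names become the row labels
country_index = pd.read_csv(io.StringIO(csv_text), index_col="country")
print(list(country_index.index))
print(list(country_index.columns))

# Check that every row label is unique
print(country_index.index.is_unique)
```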

3.7 Inspect a DataFrame with `.info()`

Use .info() to summarize the structure of a DataFrame.

data_oceania_country.info()

This tells you:

the object type
the number of rows
the number of columns
the column names
how many non-null values each column contains
the data type of each column
memory usage

3.8 Inspect Column Names with `.columns`

The .columns attribute stores the column labels.

print(data_oceania_country.columns)

Notice that .columns does not use parentheses. It is an attribute, not a function.

Compare:

# Attribute: no parentheses
print(data_oceania_country.columns)

# Function: uses parentheses
print(data_oceania_country.describe())

3.9 Summary Statistics with `.describe()`

Use .describe() to calculate summary statistics for numerical columns. Notice that it handles the different data types automatically: by default, only numeric columns are summarized when a DataFrame also contains other types. Neat!

data_oceania_country.describe()

This reports values such as:

count
mean
standard deviation
minimum
quartiles
maximum

3.10 Preview Rows with `.head()` and `.tail()`

Read the Americas dataset:

data_americas = pd.read_csv(
    'data/gapminder_gdp_americas.csv',
    index_col='country'
)

Show the first five rows:

data_americas.head()

Show the first three rows:

data_americas.head(n=3)

Show the last three rows:

data_americas.tail(n=3)

3.11 Writing a DataFrame to a CSV File

Use .to_csv() to save a DataFrame.

data_americas.to_csv('processed_americas.csv')

This writes the file to the directory where your notebook session is running.

3.12 Practice: Reading Data

Exercise 3 — Read Europe Data

Read the Europe Gapminder GDP data and use country as the row index.

Exercise 4 — Inspect the DataFrame

Use commands to answer:

How many rows are in the Europe dataset?
How many columns are in the Europe dataset?
What are the column names?
What data type is stored in the GDP columns?

Part 4 — Working with Pandas DataFrames

4.1 Load the Europe Dataset

import pandas as pd

data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')

Use .head() to confirm that the file loaded correctly:

data.head()

4.2 Select By Position with `.iloc`

Use .iloc when you want to select by integer position.

print(data.iloc[0, 0])

This means:

row position 0
column position 0

Because Python uses zero-based indexing, this selects the first row and first column.

4.3 Select by label with `.loc`

Use .loc when you want to select by row and column labels.

print(data.loc['Albania', 'gdpPercap_1952'])

This selects the value for Albania in 1952.

4.4 Select an Entire Row

Use : to mean “everything” along an axis.

print(data.loc['Albania', :])

This selects all columns for Albania.

The shorter version also works:

print(data.loc['Albania'])

4.5 Select an Entire Column

print(data.loc[:, 'gdpPercap_1952'])

This selects all rows for the column gdpPercap_1952.

A common shortcut for selecting a single column is:

print(data['gdpPercap_1952'])

4.6 Select Ranges of Rows and Columns

Select rows from Italy through Poland and columns from 1962 through 1972:

subset = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']
print(subset)

Warning: There’s an important distinction:

Selection method	Slice behavior
`.iloc`	stop position is excluded
`.loc`	stop label is included

For example:

print(data.iloc[0:2, 0:2])

includes row positions 0 and 1, and column positions 0 and 1.

But:

print(data.loc['Albania':'Belgium', 'gdpPercap_1952':'gdpPercap_1962'])

includes Belgium and gdpPercap_1962 because .loc includes the ending labels.
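The difference is easiest to see by comparing shapes. The sketch below uses a small hand-built DataFrame so that it runs without the data files; the numbers are made up and only the labels follow the lesson’s naming pattern.

```python
import pandas as pd

# Made-up values arranged like the Europe data
toy = pd.DataFrame(
    {
        "gdpPercap_1952": [1.0, 2.0, 3.0],
        "gdpPercap_1957": [4.0, 5.0, 6.0],
        "gdpPercap_1962": [7.0, 8.0, 9.0],
    },
    index=["Albania", "Austria", "Belgium"],
)

# .iloc: stop positions are excluded, so this is 2 rows by 2 columns
by_position = toy.iloc[0:2, 0:2]
print(by_position.shape)

# .loc: stop labels are included, so this is 3 rows by 3 columns
by_label = toy.loc["Albania":"Belgium", "gdpPercap_1952":"gdpPercap_1962"]
print(by_label.shape)
```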

4.7 Calculate Statistics on a Subset

Once you select a subset, you can analyze it.

subset = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']

print(subset.max())
print(subset.min())
print(subset.mean())

These calculations are applied column by column.

4.8 Use Comparisons to Create Boolean Masks

A comparison such as subset > 10000 is applied to every value in the DataFrame.

subset = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']

print(subset > 10000)

This returns a DataFrame of True and False values.

4.9 Use a Boolean Mask to Filter Values

You can use the Boolean DataFrame as a mask.

high_gdp = subset[subset > 10000]
print(high_gdp)

Values that meet the condition are kept. Values that do not meet the condition become NaN, which means “Not a Number.” Many summary operations ignore NaN values.

high_gdp.describe()
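To see how NaN values are skipped, here is a minimal sketch on a hand-built DataFrame with made-up numbers. The name toy_subset and its values are assumptions for illustration only.

```python
import pandas as pd

# Made-up values: one of the four falls at or below the threshold
toy_subset = pd.DataFrame(
    {
        "gdpPercap_1962": [8000.0, 12000.0],
        "gdpPercap_1967": [10500.0, 15000.0],
    },
    index=["Country A", "Country B"],
)

# Keep values above 10000; everything else becomes NaN
high = toy_subset[toy_subset > 10000]
print(high)

# .count() counts only the non-NaN values in each column
print(high.count())

# .mean() also skips NaN, so the 1962 mean uses a single value
print(high.mean())
```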

4.10 Filter Rows Using a Column Value

Load the combined Gapminder dataset:

all_data = pd.read_csv('data/gapminder_all.csv', index_col='country')

Select only countries in the Americas:

americas = all_data[all_data['continent'] == 'Americas']
americas.head()

Break the filtering expression into pieces:

is_americas = all_data['continent'] == 'Americas'
print(is_americas.head())

americas = all_data[is_americas]
americas.head()

This is often easier for learners to understand because the Boolean condition is stored in a named variable.

4.11 Drop Rows or Columns

Drop a row by label:

americas_without_puerto_rico = americas.drop('Puerto Rico')

Drop a column by label:

americas_gdp_only = americas_without_puerto_rico.drop('continent', axis=1)

Use axis=1 when dropping a column. Rows are the default axis.
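One detail worth checking is that .drop() returns a new DataFrame and leaves the original untouched. The sketch below strings together filtering and dropping on a hand-built DataFrame; the country names and numbers are made up so that the example runs without the data files.

```python
import pandas as pd

# Made-up data with a continent column, like gapminder_all.csv
toy_all = pd.DataFrame(
    {
        "continent": ["Americas", "Americas", "Europe"],
        "gdpPercap_2007": [100.0, 200.0, 300.0],
    },
    index=["Country A", "Country B", "Country C"],
)

# Filter rows with a Boolean condition on one column
toy_americas = toy_all[toy_all["continent"] == "Americas"]

# Drop a row by label, then a column by label
without_b = toy_americas.drop("Country B")
gdp_only = without_b.drop("continent", axis=1)

print(gdp_only)

# The filtered DataFrame still has both rows and both columns
print(toy_americas.shape)
```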

4.12 Save Results

Just like before, we’ll save this DataFrame as a CSV file in the directory where our notebook session is running.

americas_gdp_only.to_csv('americas_gdp_only.csv')

4.13 Practice: DataFrame Selection

Exercise 5 — Individual Value

Find the GDP per capita of Serbia in 2007.

Exercise 6 — Column Selection

Select GDP per capita for all European countries in 1982.

Exercise 7 — Row Selection

Select GDP per capita for Denmark for all years.

Exercise 8 — Range Selection

Select GDP per capita for all countries for years after 1985.

Exercise 9 — Growth Ratio

Calculate GDP per capita in 2007 divided by GDP per capita in 1952 for each European country.

Exercise 10 — Boolean Filtering

Create a subset of European countries where GDP per capita in 2007 is greater than 30,000.

Part 5 — Resources

Common Errors

Error	Likely cause	Fix
`NameError`	Variable was not created or was misspelled	Re-run assignment cell; check spelling
`FileNotFoundError`	Wrong path or notebook started in wrong folder	Check `os.getcwd()` and `os.listdir()`
`KeyError`	Row or column label does not exist	Check `data.index` or `data.columns`
Confusing `.loc` and `.iloc`	Mixing labels with positions	Use `.loc` for names; `.iloc` for numbers
Unexpected old value	Notebook cells run out of order	Restart kernel and run all cells

Part 6 — Answer Key

Exercise 1 Example

first_name = 'Cassie'
favorite_organism = 'Ralstonia solanacearum'
sample_count = 12

print(first_name, 'is analyzing', sample_count, 'samples of', favorite_organism)

Expected output:

Cassie is analyzing 12 samples of Ralstonia solanacearum

Exercise 2 Answer

countries = ["Canada", "Mexico", "United States", "Brazil"]

print(countries[0])
print(countries[2])
print(countries[-1])
print(countries[0:2])
print(countries[2:])
print(countries[:])

Expected output:

Canada
United States
Brazil
['Canada', 'Mexico']
['United States', 'Brazil']
['Canada', 'Mexico', 'United States', 'Brazil']

Exercise 3 Answer

import pandas as pd

data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
data_europe.head()

Exercise 4 Answer

data_europe.info()
print(data_europe.columns)
print(data_europe.shape)
print(data_europe.dtypes)

Exercise 5 Answer

print(data_europe.loc['Serbia', 'gdpPercap_2007'])

Exercise 6 Answer

data_europe['gdpPercap_1982']

Exercise 7 Answer

data_europe.loc['Denmark', :]

Exercise 8 Answer

data_europe.loc[:, 'gdpPercap_1987':]

Note: the dataset uses 5-year intervals. Because there is no exact gdpPercap_1985 column, gdpPercap_1987 is the first available column after 1985.

Exercise 9 Answer

data_europe['gdpPercap_2007'] / data_europe['gdpPercap_1952']

Exercise 10 Answer

high_2007 = data_europe[data_europe['gdpPercap_2007'] > 30000]
high_2007

Source and attribution

This notebook is adapted from the Software Carpentry lesson Plotting and Programming in Python, specifically the Gapminder instructor materials for:

Summary and Schedule: https://swcarpentry.github.io/python-novice-gapminder/instructor/index.html
Running and Quitting: https://swcarpentry.github.io/python-novice-gapminder/instructor/01-run-quit.html
Variables and Assignment: https://swcarpentry.github.io/python-novice-gapminder/instructor/02-variables.html
Reading Tabular Data into DataFrames: https://swcarpentry.github.io/python-novice-gapminder/instructor/07-reading-tabular.html
Pandas DataFrames: https://swcarpentry.github.io/python-novice-gapminder/instructor/08-data-frames.html

Original Carpentries materials are licensed under CC BY 4.0 by the authors. This adapted notebook should retain attribution when reused or modified.
