Introduction to programming in Python
February 14, 2023, 3:00-4:20pm EST
Presented by: Alex Razoumov
Duration: 80 minutes
Description: In this very short (1h20m) introduction to Python, we will cover the basics of running Python inside Jupyter notebooks, basic use of variables, lists, dictionaries, talk about defining functions and using external libraries. We will provide Python in the cloud, so there is no need to install it on your own computer for this session.
Register here
Le même séminaire en français.
Biographies
Alex Razoumov: Alex earned his PhD in computational astrophysics from the University of British Columbia and held postdoctoral positions in Urbana–Champaign, San Diego, Oak Ridge, and Halifax. He has worked on numerical models ranging from galaxy formation to core-collapse supernovae and stellar hydrodynamics, and has developed a number of computational fluid dynamics and radiative transfer codes and techniques. He spent five years as HPC Analyst in SHARCNET helping researchers from diverse backgrounds to use large clusters, and in 2014 moved back to Vancouver to focus on scientific visualization and training researchers to use advanced computing tools. He is now with Simon Fraser University.
Mohamed Jabir: Détenant une maîtrise en intelligence d’affaire, Mohamed a travaillé pendant plus de 20 ans dans un laboratoire de recherche où il offrait son support aux professeurs et étudiants de 2ème cycle. Il a rejoint récemment Calcul Québec où il fait partie des analystes de support aux utilisateurs des ressources de calcul.
Course notes
Disclaimer: These notes started few years ago from the official SWC lesson but then evolved quite a bit to include other topics.
Why Python?
Python is a free, open-source programming language developed in the 1980s and 90s that became really popular for scientific computing in the past 15 years.
Python vs. Excel
- Unlike Excel, Python can essentially read any type of data, both structured and unstructured.
- Python is free and open-source.
- Data manipulation is much easier in Python. There are thousands of data processing, machine learning, and visualization libraries.
- Python can handle much larger amounts of data: limited not by Python, but by your available computing resources. In addition, Python can run at scale on larger systems.
- Python is more reproducible (rerun / modify the script).
- Python can run on any platform (Windows, Mac, Linux).
Python vs. other programming languages
Python pros | Python cons |
---|---|
elegant scripting language | slow (interpreted, dynamically typed) |
powerful, compact constructs for many tasks | |
very popular across all fields | |
huge number of external libraries |
Starting Python
There are many ways to run Python commands:
- from a Unix shell you can start a Python shell and type commands there,
- you can launch Python scripts saved in plain text *.py files,
- you can execute Python cells inside Jupyter notebooks; the code is stored inside JSON files, displayed as HTML
Today we will be using a Jupyter notebook at https://jupyter.pyten.calculquebec.cloud (English) or https://jupyter.pytfr.calculquebec.cloud (French).
- we will distribute the usernames and password now
- please login with your unique username
- start a new Python 3 notebook
Local option for more advanced users: if you have Python + Jupyter installed locally on your computer, and
you know what you are doing, you can start a Jupyter notebook locally from your shell by typing jupyter notebook
.
Navigating Jupyter interface
- File | Save As - to rename your notebook
- File | Download - download the notebook to your computer
- File | New Launcher - to open a new launcher dashboard, e.g. to start a terminal
- File | Logout - to terminate your job (everything is running inside a Slurm job!)
Explain: tab completion, annotating code, displaying figures inside the notebook.
- Esc - leave the cell (border changes colour) to the control mode
- A - insert a cell above the current cell
- B - insert a cell below the current cell
- X - delete the current cell
- M - turn the current cell into the markdown cell
- H - to display help
- Enter - re-enter the cell (border becomes green) from the control mode
- can enter Latex equations in a markdown cell, e.g. $int_0^\infty f(x)dx$
print(1/2) # to run all commands in the cell, either use the Run button, or press shift+return
Variables and Assignment
- Python is a dynamically typed language: all variables have types, but types can change on the fly
- possible names for variables
- don’t use built-in function names for variables, e.g. declaring
sum
will prevent you from using sum(), same forprint
- don’t use built-in function names for variables, e.g. declaring
- Python is case-sensitive
age = 100
name = 'Jason'
print(name, 'is', age, 'years old')
a = 1; b = 2 # can use ; to separate multiple commands in one line
a, b = 1, 2 # assign variables in a tuple notation; same as last line
a = b = 10 # assign a value to multiple variables at the same time
b = "now I am a string" # variables can change their type on the fly
- variables persist between cells
- variables must be defined before use
- variables can be used in calculations
age = age + 3 # another syntax: age += 3
print('age in three years:', age)
Question 1
What is the final value of position
in the program below? (Try to predict the value without running the program, then
check your prediction.)
initial = 1
position = initial
initial = 2
print(position)
With simple variables in Python, assigning var2 = var1
will create a new object in memory var2
. Here we have two
distinct objects in memory: initial
and position
.
Note: With more complex objects, its name could be a pointer. E.g. when we study lists, we’ll see that
initial
andnew
below really point to the same list in memory:initial = [1,2,3] new = initial # create a pointer to the same object initial.append(4) # change the original list to [1, 2, 3, 4] print(new) # [1, 2, 3, 4] new = initial[:] # one way to create a new object in memory import copy new = copy.deepcopy(initial) # another way to create a new object in memory
Use square brackets to get a substring:
element = 'helium'
print(element[0]) # single character
print(element[0:3]) # a substring
Question 2
If you assigna=123
, what happens if you try to get the second digit of a
?
- Python is case-sensitive
- use meaningful variable names
Data Types and Type Conversion
print(type(52))
print(type(52.))
print(type('52'))
print(name+' Smith') # can add strings
print(name*10) # can replicate strings by mutliplying by a number
print(len(name)) # strings have lengths
print(1+'a') # cannot add strings and numbers
print(str(1)+'a') # this works
print(1+int('2')) # this works
Builtin libraries
- Python comes with a number of built-in functions
- a function may take zero or more arguments
print('hello')
print()
print(max(1,2,3,10))
print(min(5,2,10))
print(min('a', 'A', '0')) # works with characters, the order is (0-9, A-Z, a-z)
print(max(1, 'a')) # can't compare these
round(3.712) # to the nearest integer
round(3.712, 1) # can specify the number of decimal places
help(round)
round? # Jupyter Notebook's additional syntax
- every function returns something, whether it is a variable or None
result = print('example')
print('result of print is', result) # what happened here? Answer: print returns None
Conditionals
Python implements conditionals via if
, elif
(short for “else if”) and else
. Use an if
statement to
control whether some block of code is executed or not. Let’s consider the boundary between the Antiquity and
the Middle Ages:
year = 830
if year > 476:
print('year', year, 'falls into the medieval era')
Let’s modify the year:
year = 205
if year > 476:
print('year', year, 'falls into the medieval era')
Add an else
statement:
year = 205
if year > 476:
print('year', year, 'falls into the medieval era')
else:
print('year', year, 'falls into the classical antiquity period')
Add an elif
statement:
x = 5
if x > 0:
print(x, 'is positive')
elif x < 0:
print(x, 'is negative')
else:
print(x, 'is zero')
What is the problem with the following code?
grade = 85
if grade >= 70:
print('grade is C')
elif grade >= 80:
print('grade is B')
elif grade >= 90:
print('grade is A')
Lists
A list stores many values in a single structure.
T = [27.3, 27.5, 27.7, 27.5, 27.6] # array of temperature measurements
print('temperature:', T)
print('length:', len(T))
print('zeroth item of T is', T[0])
print('fourth item of T is', T[4])
T[0] = 21.3
print('temperature is now:', T)
primes = [2, 3, 5]
print('primes is initially', primes)
primes.append(7) # append at the end
primes.append(11)
print('primes has become', primes)
print('primes before', primes)
primes.pop(4) # remove element #4
print('primes after', primes)
primes.remove(2) # remove first value 2
a = [] # start with an empty list
a.append('Vancouver')
a.append('Toronto')
a.append('Kelowna')
print(a)
a[99] # will give an error message (past the end of the array)
a[-1] # display the last element; what's the other way?
a[:] # will display all elements
a[1:] # starting from #1
a[:1] # ending with but not including #1
Lists can be heterogeneous and nested:
a = [11, 21, 31]
b = ['Mercury', 'Venus', 'Earth']
c = 'hello'
nestedList = [a, b, c]
print(nestedList)
You can search inside a list:
'Venus' in b # returns True
'Mars' in b # returns False
b.index('Venus') # returns 1 (position index)
And you sort lists alphabetically:
b.sort()
b # returns ['Earth', 'Mercury', 'Venus']
To delete an item from a list:
b.pop(2) # you can use its index
b.remove('Earth') # or you can use its value
Question 2b
Write a script to find the second largest number in the list [77,9,23,67,73,21].Loops
For loops are very common in Python and are similar to for in other languages, but one nice twist with Python is that you can iterate over any collection, e.g., a list, a character string, etc.
for number in [2, 3, 5]: # number is the loop variable; [...] is a collection
print(number) # Python uses indentation to show the body of the loop
This is equivalent to:
print(2)
print(3)
print(5)
What will this do:
for number in [2, 3, 5]:
print(number)
print(number)
- the loop variable could be called anything
- the body of a loop can contain many statements
- use range to iterate over a sequence of numbers
for i in 'hello':
print(i)
for i in range(0,3):
print(i)
Let’s sum numbers 1 to 10:
total = 0
for number in range(10):
total = total + (number + 1) # what's the other way to sum numbers 1 to 10? how about range(1,11)?
print(total)
Question 3a
Write a Python code to revert a string, e.g. ‘computer’ should become ‘retupmoc’.Question 3b
Print a difference between two lists, e.g. [1, 2, 3, 4] and [1, 2, 5].Question 3c
Write a script to get the frequency of the elements in the lista = [77, 9, 23, 67, 73, 21, 23, 9]
. You can google
this problem :)
While loops
Since we talk about loops, we should also briefly mention while loops, e.g.
x = 2
while x > 1.:
x /= 1.1
print(x)
More on lists in loops
You can also form a zip object of tuples from two lists of the same length:
for i, j in zip(a,b):
print(i,j)
And you can create an enumerate object from a list:
for i, j in enumerate(b): # creates a list of tuples with an iterator as the first element
print(i,j)
List comprehensions
It’s a compact way to create new lists based on existing lists/collections. Let’s list squares of numbers from 1 to 10:
[x**2 for x in range(1,11)]
Of these, list only odd squares:
[x**2 for x in range(1,11) if x%2==1]
You can also use list comprehensions to combine information from two or more lists:
week = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
weekend = ['Sat', 'Sun']
print([day for day in week]) # the entire week
print([day for day in week if day not in weekend]) # only the weekdays
print([day for day in week if day in weekend]) # in both lists
The syntax is:
[something(i) for i in list1 if i [not] in list2 if i [not] in list3 ...]
Question 4a
Write a one-line code to sum up the squares of numbers from 1 to 100.Question 4b
Write a script to build a list of words that are shorter thann
from a given list of words
['red', 'green', 'white', 'black', 'pink', 'yellow']
.
Dictionaries
Lists in Python are ordered sets of objects that you access via their position/index. Dictionaries are unordered sets in which the objects are accessed via their keys. In other words, dictionaries are unordered key-value pairs.
favs = {'mary': 'orange', 'john': 'green', 'eric': 'blue'}
favs
favs['john'] # returns 'green'
favs['mary'] # returns 'orange'
list(favs.values()).index('blue') # will return the index of the first value 'blue'
for key in favs:
print(key) # will print the names (keys)
print(favs[key]) # will print the colours (values)
for k in favs.keys():
print(k, favs[k]) # the same as above
for v in favs.values():
print(v) # cycle through the values
for i, j in favs.items():
print(i,j) # both the names and the colours
Now let’s see how to add items to a dictionary:
concepts = {}
concepts['list'] = 'an ordered collection of values'
concepts['dictionary'] = 'a collection of key-value pairs'
concepts
Let’s modify values:
concepts['list'] = 'simple: ' + concepts['list']
concepts['dictionary'] = 'complex: ' + concepts['dictionary']
concepts
Deleting dictionary items:
concepts.pop('list') # remove the key 'list' and its value
Values can also be numerical:
grades = {}
grades['mary'] = 5
grades['john'] = 4.5
grades
And so can be the keys:
grades[1] = 2
grades
Sorting dictionary items:
favs = {'mary': 'orange', 'john': 'green', 'eric': 'blue', 'jane': 'orange'}
sorted(favs) # returns the sorted list of keys
sorted(favs.keys()) # the same
for k in sorted(favs):
print(k, favs[k]) # full dictionary sorted by key
sorted(favs.values()) # returns the sorted list of values
Question 4c
Write a script to print the full dictionary sorted by the value.
Hint: create a list comprehension looping through all (key,value) pairs and then try sorting the result.
Similar to list comprehensions, we can form a dictionary comprehension:
{k:'.'*k for k in range(10)}
{k:v*2 for (k,v) in zip(range(10),range(10))}
{j:c for j,c in enumerate('computer')}
Functions
- functions encapsulate complexity so that we can treat it as a single thing
- functions enable re-use: write one time, use many times
First define:
def greeting():
print('Hello!')
and then we can run it:
greeting()
def printDate(year, month, day):
joined = str(year) + '/' + str(month) + '/' + str(day)
print(joined)
printDate(1871, 3, 19)
Every function returns something, even if it’s None.
a = printDate(1871, 3, 19)
print(a)
How do we actually return a value from a function?
def average(values): # the argument is a list
if len(values) == 0:
return None
return sum(values) / len(values)
print('average of actual values:', average([1, 3, 4]))
Here is an example of a more complex calendar function returning an alphabetical day of the week:
def dayOfTheWeek(year, month, day):
import datetime
week = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
return week[datetime.datetime(year, month, day).weekday()]
dayOfTheWeek(2022, 11, 10) # 'Thu'
Question 5
Write a function to convert from Fahrenheit to Celsius, e.g. typingcelsius(77)
would produce 25.
Question 6
Write a function to convert from Celsius to Fahrenheit. Test it with celcius(), e.g. by converting Fahrenheit → Celsius → Fahrenheit, or Celsius → Fahrenheit → Celsius.Question 7
Now modify celsius() to take a list of Fahrenheit temperatures, e.g.,celcius([70,80,90,100])
, to return a list of
Celsius temperatures.
Function arguments in Python can take default values becoming optional:
def addNumber(a, b=1):
return a+b
print(addNumber(5))
print(addNumber(5,3))
With several optional arguments it is important to be able to differentiate them:
def modify(a, b=1, coef=1):
return a*coef + b
print(modify(10))
print(modify(10, 1)) # which argument did we add?
print(modify(10, coef=2))
print(modify(10, coef=2, b=5))
Any complex python function will have many optional arguments, for example:
?print
Variable scope
The scope of a variable is the part of a program that can see that variable.
a = 5
def adjust(b):
sum = a + b
return sum
adjust(10) # what will be the outcome?
a
is the global variable ⇨ visible everywhereb
andsum
are local variables ⇨ visible only inside the function
Inside a function we can access methods of global variables:
a = []
def add():
a.append(5) # modify global `a`
add()
print(a) # [5]
However, from a local scope we cannot assign to a global variable directly:
a = []
def add():
a = [1,2,3] # this will create a local copy of `a` inside the function
print(a) # [1,2,3]
add()
print(a) # []
If we have time
How would you explain the following:
1 + 2 == 3 # returns True (makes sense!)
0.1 + 0.2 == 0.3 # returns False -- be aware of this when you use conditionals
abs(0.1+0.2 - 0.3) < 1.e-8 # compare floats for almost equality
import numpy as np
np.isclose(0.1+0.2, 0.3, atol=1e-8)
Libraries
Most of the power of a programming language is in its libraries. This is especially true for Python which is an interpreted language and is therefore very slow (compared to compiled languages). However, the libraries are often compiled (can be written in compiled languages such as C/C++) and therefore offer much faster performance than native Python code.
A library is a collection of functions that can be used by other programs. Python’s standard library includes many functions we worked with before (print, int, round, …) and is included with Python. There are many other additional modules in the standard library such as math:
print('pi is', pi)
import math
print('pi is', math.pi)
You can also import math’s items directly:
from math import pi, sin
print('pi is', pi)
sin(pi/6)
cos(pi)
help(math) # help for libraries works just like help for functions
from math import *
You can also create an alias from the library:
import math as m
print m.pi
Question 8
What function from the math library can you use to calculate a square root without usingsqrt
?
Question 9
You want to select a random character from the stringbases='ACTTGCTTGAC'
. What standard library would you most expect
to help? Which function would you select from that library? Are there alternatives?
Question 10
A colleague of yours typeshelp(math)
and gets an error: NameError: name 'math' is not defined
. What has your
colleague forgotten to do?
Question 11
Convert the angle 0.3 rad to degrees using the math library.Virtual environments and packaging
To install a package into the current Python environment from inside a Jupyter notebook, simply do (you will probably need to restart the kernel before you can use the package):
%pip install packageName # e.g. try bson
In Python you can create an isolated environment for each project, into which all of its dependencies will be installed. This could be useful if your several projects have very different sets of dependencies. On the computer running your Jupyter notebooks, open the terminal and type:
(Important: on a cluster you must do this on the login node, not inside the JupyterLab terminal.)
module load python/3.9.6 # specific to HPC clusters
pip install virtualenv
virtualenv --no-download climate # create a new virtual environment in your current directory
source climate/bin/activate
which python && which pip
pip install --no-index netcdf4 ...
pip install --no-index ipykernel # install ipykernel (IPython kernel for Jupyter) into this environment
python -m ipykernel install --user --name=climate --display-name "My climate project" # add your environment to Jupyter
...
deactivate
Quit all your currently running Jupyter notebooks and the Jupyter dashboard. If running on syzygy.ca, logout from your session and then log back in.
Whether running locally or on syzygy.ca, open the notebook dashboard, and one of the options in New
below Python 3
should be climate
.
To delete the environment, in the terminal type:
jupyter kernelspec list # `climate` should be one of them
jupyter kernelspec uninstall climate # remove your environment from Jupyter
/bin/rm -rf climate
Quick overview of some of the libraries
pandas
is a library for working with 2D tables / spreadsheetsnumpy
is a library for working with large, multi-dimensional arrays, along with a large collection of linear algebra functions- provides missing uniform collections (arrays) in Python, along with a large number of ways to quickly process these collections ⮕ great for speeding up calculations in Python
matplotlib
andplotly
are two plotting packages for Pythonscikit-image
is a collection of algorithms for image processingxarray
is a library for working with labelled multi-dimensional arrays and datasets in Python- “
pandas
for multi-dimensional arrays” - great for large scientific datasets; writes into NetCDF files
- “
Coming up: 2-part Scientific Python course on Feb-23, Mar-02 (not part of the Winter HSS Series) will cover most of these libraries.
Pandas quick example
Let’s try reading some public-domain data about Jeopardy games with pandas
:
import pandas as pd
data = pd.read_csv("https://raw.githubusercontent.com/razoumov/publish/master/jeopardy.csv")
data.shape # 216930 rows, 7 columns
data.head(10) # first 10 rows
data.columns # names of the columns
data.loc[data['Category']=='HISTORY'].shape # 349 matches
data.loc[data['Category']=='HISTORY'].to_csv("history.csv") # write to a file