CPSC 340/540 - Machine Learning and Data Mining (2025W2)

Lecture Sections (beginning Jan 5th): Instructors: Frank Wood and Mi Jung Park

Instructor office hours: On days when Dr. Park teaches, she will hold office hours from 2:50 to 3:50, i.e. starting right after the second class ends. She will walk back to her office with anyone who wants to chat along the way, and office hours will conclude there. Students can also go straight to the office and wait for her to return. Professor Park's office is ICICS X539. This schedule tells you which instructor is teaching which day (⚠ subject to change). It also has other important dates.

Tutorials (beginning Jan 12):

Starting in the second week of classes, we will have weekly tutorials run by the TAs. These often cover the most commonly misunderstood material from the past week's lectures and backfill knowledge from prerequisite courses that people may be rusty on but that is helpful to be fluent in. They may also go through provided assignment code, review background material and big concepts, and/or work through exercises. You can register in a particular tutorial section if you want to save a seat at a particular time, but you do not need to register in a tutorial section and you can attend whichever one you like. Attendance is optional but recommended.

Teaching assistants:

TA Office Hours:

Frequently Answered Questions

Midterm information



Synopsis: We introduce basic principles and techniques in the fields of data mining and machine learning. These techniques are now running behind the scenes to discover patterns and make predictions in various applications in our daily lives. We will focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.

Registration: Undergraduate and graduate students from any department are welcome to take the course. Undergraduate students should enroll in CPSC 340 while graduate students should enroll in CPSC 540 (when it is offered; CPSC 540 also has an extra small project component). Below are more details on registration for each course:

Prerequisites: Students who do not meet these requirements should consider taking CPSC 330 ("Applied Machine Learning").

Textbook: There is no required textbook for the class. An introductory book that covers many (but not all) of the topics we will discuss is the Artificial Intelligence book of Russell and Norvig (AI:AMA) or the Artificial Intelligence book of Poole and Mackworth (you may need these for other classes). More advanced books include The Elements of Statistical Learning (ESL) by Hastie et al., Murphy's Machine Learning: A Probabilistic Perspective (ML:APP) which can be accessed through the library here, and Bishop's Pattern Recognition and Machine Learning (PRML). For books with a bigger focus on data mining, see Introduction to Data Mining (IDM) and Mining of Massive Datasets.

Related Courses: The most closely related course is CPSC 330: Applied Machine Learning. This course has fewer prerequisites and covers some of the same material, but focuses more on applications than on understanding ML ideas in depth. A discussion of the difference between CPSC 340 and similar courses in statistics, written by a former student (Geoff Roeder), is available here (this was written in 2016, so it may be out of date).

Grading (tentative):

Assignments: There are a total of 6 written assignments for this course. Please follow the instructions linked here to submit your assignments.

List of topics

We will roughly cover the following topics:

Lectures, Assignments, Related Readings, and Links

Date Slides Related Readings and Links Homework and Notes
Jan 5 Motivation and Syllabus What is Machine Learning? Machine Learning
Rise of the Machines Talking Machine Episode 1
Mathematics for Machine Learning
Assignment 1 (pdf)
Assignment 1 (tex/code/data)
Jan 7 Exploratory Data Analysis Gotta Catch'em all Why Not to Trust Statistics
Visualization Types Google Chart Gallery Other Tools
Jan 9 Decision Trees A Visual Introduction to Machine Learning, Decision Trees Entropy What is Big O Notation?
AI:AMA 19.2-3, ESL: 9.2, ML:APP 16.2
Big-O Notes
Jan 12 Fundamentals of Learning 7 Steps of Machine Learning IID Cross-validation Bias-variance No Free Lunch
AI:AMA 19.4-5, ESL 7.1-7.4, 7.10, ML:APP 1.4, 6.5
Course Notation Guide
Jan 14 Probabilistic Classifiers Conditional probability (demo) Naive Bayes Probabilities and Battleship
AI:AMA 12.6, ESL 4.3, ML:APP 2.2, 3.5, 4.1-4.2
Probability Notes Probability Slides
Jan 16 Non-Parametric Models K-nearest neighbours Decision Theory for Darts Norms
AI:AMA 19.7, ESL 13.3, ML:APP 1.4
Assignment 1 due
Withdrawal deadline
Assignment 2 (pdf)
Assignment 2 (tex/code/data)
Jan 19 Ensemble Methods Ensemble Methods Random Forests Empirical Study Kinect
AI:AMA 19.8, ESL: 7.11, 8.2, 15, 16.3, ML:APP 6.2.1, 16.2.5, 16.6
Jan 21 Clustering Clustering K-means clustering (demo) K-Means++ (demo)
IDM 8.1-8.2, ESL: 14.3
Jan 23 More Clustering DBSCAN (video, demo) Hierarchical Clustering Phylogenetic Trees
IDM 8.4
Jan 26 Outlier Detection Empirical Study
IDM 8.3, ESL 14.3.12, ML:APP 25.5
Jan 28 Least Squares Linear Regression (demo, 2D data, 2D video) Least Squares Essence of Calculus Partial Derivative Gradient
ESL 3.1-2, ML:APP 7.1-3, AI:AMA 19.6
Jan 30 Nonlinear Regression Why should one learn machine learning from scratch? Essence of Linear Algebra Matrix Differentiation Fluid Simulation (video)
ESL 5.1, 6.3
Linear Algebra Notes
Linear/Quadratic Gradients
Assignment 2 due
Assignment 3 (pdf)
Assignment 3 (tex/code/data)
Feb 2 Gradient Descent Gradient Descent Convex Functions
Feb 4 Robust Regression
ML:APP 7.4
Feb 6 Feature Selection Genome-Wide Association Studies AIC, BIC
ESL 3.3, 7.5-7
Feb 9 Regularization
ESL 3.4, ML:APP 7.5, AI:AMA 19.4
Feb 11 More Regularization RBF video RBF and Regularization video
ESL 6.7, ML:APP 13.3-4
Feb 13 Linear Classifiers Perceptron
ESL 4.5, ML:APP 8.5
Assignment 3 due
Feb 16 No Class: Midterm Break
Feb 18 No Class: Midterm Break
Feb 20 No Class: Midterm Break
Feb 23 More Linear Classifiers Support Vector Machines
ESL 4.4, 12.1-2, ML:APP 8.1-3, 9.5, 14.5, AI:AMA 19.6
Feb 25 Feature Engineering Gmail Priority Inbox
Feb 27 Kernel Trick
ESL 12.3, ML:APP 14.1-4
Assignment 4 (pdf)
Assignment 4 (tex/code/data)
Mar 2 Stochastic Gradient Stochastic Gradient Descent, Theory and Practice
ML:APP 8.5
Mar 4 Boosting, Start of MLE AdaBoost (video) XGBoost (video)
ML:APP 16.4
Mar 6 MLE and MAP Maximum Likelihood Estimation
ML:APP 9.3-4
Mar 9 PCA Principal Component Analysis
ESL 14.5, IDM B.1, ML:APP 12.2
Mar 11 More PCA Making Sense of PCA SVD Eigenfaces Max and Argmax Notes
Mar 13 Sparse Matrix Factorization Non-Negative Matrix Factorization (original - access from UBC)
ESL 14.6, ML:APP 13.8
Assignment 4 due
Assignment 5 (pdf)
Assignment 5 (tex/code/data)
Mar 16 Recommender Systems & MDS Recommender Systems Netflix Prize
Mar 18 Neural Networks Google Video What is a Neural Network? Interactive Guide
ML:APP 16.5, ESL 11.1-4, AI:AMA 21.1
Mar 20 Guest Lecture
Mar 23 Deep Learning Fortune Article Deep Learning References Alchemy
ML:APP 28.3, ESL 11.5, AI:AMA 21.2 and 21.4-5
Mar 25 Deep Learning But what is a convolution?
Mar 27 Convolutions Convolutional Neural Networks
ML:APP 28.4, ESL 11.7, AI:AMA 21.3
Assignment 5 due
Assignment 6 (pdf)
Assignment 6 (tex/code/data)
Mar 30 Guest Lecture
Apr 1 CNNs and RNNs
Apr 3 No Class: Easter Friday
Apr 6 No Class: Easter Monday
Apr 8 LSTMs, Attention, and Transformers
Apr 10 Generative Sampling + Conclusion
Assignment 6 due
Apr 25 Final

Frank's Recorded Lectures

During the COVID-19 pandemic, Frank recorded video lectures for the course. These videos are available here. Note that these videos may not exactly match the current semester's course; however, in most cases the content will be very similar, if not identical.

Mike's Demos

In semesters where Mike Gelbart taught the course, he included Jupyter notebooks associated with most lectures. These notebooks are available here (note that the lecture numbers may not exactly match the current semester's course).

Future Homeworks and Lectures

It is possible to search for, and potentially find, future homeworks and lectures from previous offerings of the course. However, be aware that both can change from year to year and that you are responsible for this year's materials.

Related courses that have online notes