CPSC 340/540 - Machine Learning and Data Mining (2025W2)
Lecture Sections (beginning Jan 5th):
- 12pm - 1pm (Monday/Wednesday/Friday in DMP 310)
- 2pm - 3pm (Monday/Wednesday/Friday in DMP 310)
Instructors: Frank Wood and Mi Jung Park
Instructor office hours: On days when Dr. Park teaches, she holds office hours from 2:50 to 3:50pm, starting right after the second class ends. She will walk back to her office with anyone who wants to chat along the way, and office hours conclude in her office (ICICS X539). Students can also go straight to the office and wait for her to return. This schedule tells you which instructor is teaching which day (⚠ subject to change). It also has other important dates.
Tutorials (beginning Jan 12):
Starting in the second week of classes, we will have weekly tutorials run by the TAs. These often cover the most commonly misunderstood material from the past week's lectures and backfill knowledge from prerequisite courses that people may be rusty on but that is helpful to be fluent in. They may also go through provided assignment code, review background material and big concepts, and/or work through exercises. You can register for a particular tutorial section if you want to save a seat at a particular time, but you do not need to register in a tutorial section and can attend whichever one you like. Attendance is optional but recommended.
- T2B: Mon, 3-4pm - Shenran Wang
- T2F: Mon, 4-5pm - Frederick Shpilevskiy
- T2G: Tues, 12-1pm - Pierre Lardet
- T2H: Tues, 1-2pm - George Cao
- T2C: Tues, 2-3pm - Andrew Ahn
- T2E: Wed, 4-5pm - George Cao
- T2D: Wed, 5-6pm - Nima Norouzi
- T2A: Fri, 4-5pm - Minuk Ma
Teaching assistants:
TA Office Hours:
- Minuk Ma: Monday, 1-2pm (in-person). ICCS x150 - DLC Table 1
- Nima Norouzi: Monday, 6-7pm (online). Zoom link
- Shenran Wang: Tuesday, 2-3pm (in-person and online). ICCS X139, Zoom link
- Pierre Lardet: Tuesday, 4-5pm (in-person). ICCS x150 - DLC Table 5
- Fengzhe Shi: Wednesday, 1-2pm (online). Zoom link
- Fengzhe Shi: Wednesday, 3-4pm (in-person). ICCS x150 - DLC Table 1
- Mohammad Moshtaghifar: Thursday, 11am-12pm (in-person and online). ICCS x337, Zoom link
- Frederick Shpilevskiy: Thursday, 1-2pm (in-person). ICCS X337
- Andrew Ahn: Friday, 11am-12pm (in-person). ICCS x150 - DLC Table 5
- Bowen Cui: Friday, 1-2pm (in-person). ICCS X139
- Ali Mehrabian: Friday, 3-4pm (online). Zoom link
Frequently Answered Questions
Midterm information
Synopsis: We introduce basic principles and techniques in the fields of data mining and machine learning. These techniques are now running behind the scenes to discover patterns and make predictions in various applications in our daily lives. We will focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.
Registration: Undergraduate and graduate students from any department are welcome to take the course. Undergraduate students should enroll in CPSC 340 while graduate students should enroll in CPSC 540 (when it is offered; CPSC 540 also has an extra small project component). Below are more details on registration for each course:
- The majority of the seats in 340 are reserved for undergraduate computer science majors. Other students need to sign up for the wait list to enroll in the course, and if 340 becomes full then the wait list is the only way to enroll. In previous years, all students on the wait list were ultimately accepted into the course.
Prerequisites:
- Basic algorithms and data structures (CPSC 221, or both of CPSC 260 and EECE 320 as well as one of CPSC 210, EECE 201, or EECE 309).
- Linear algebra (one of MATH 152, 221, or 223).
- Probability (one of STAT 241, STAT 251, ECON 325, ECON 327, MATH 302, STAT 302, or MATH 318).
- Multivariate calculus (one of MATH 200, 217, 226, 253, or 263).
Students who do not meet these requirements should consider taking CPSC 330 ("Applied Machine Learning").
Textbook: There is no required textbook for the class. Introductory books that cover many (but not all) of the topics we will discuss are the Artificial Intelligence book of Russell and Norvig (AI:AMA) and the Artificial Intelligence book of Poole and Mackworth (you may need these for other classes). More advanced books include The Elements of Statistical Learning (ESL) by Hastie et al., Murphy's Machine Learning: A Probabilistic Perspective (ML:APP) which can be accessed through the library here, and Bishop's Pattern Recognition and Machine Learning (PRML). For books with a bigger focus on data mining, see Introduction to Data Mining (IDM) and Mining of Massive Datasets.
Related Courses:
The most closely related course is CPSC 330: Applied Machine Learning. That course has fewer prerequisites and covers some of the same material, but focuses more on applications than on understanding ML ideas in depth. A discussion of the difference between CPSC 340 and similar courses in statistics, written by a former student (Geoff Roeder), is available here (this was written in 2016 so may be out of date).
Grading (tentative):
- 340: Assignments 30%, Midterm 20%, Final 50%.
- 540: Assignments 25%, Midterm 15%, Final 40%, Project 20%.
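As a rough illustration of how the tentative weights above combine, here is a minimal Python sketch. It assumes the course grade is a plain weighted average of component scores out of 100, which the grading scheme does not state explicitly; any additional rules (for example, a requirement to pass the final) are not modelled. The names `WEIGHTS` and `course_grade` and the example scores are hypothetical.

```python
# Tentative weights copied from the grading scheme above (percent of course grade).
WEIGHTS = {
    "340": {"assignments": 30, "midterm": 20, "final": 50},
    "540": {"assignments": 25, "midterm": 15, "final": 40, "project": 20},
}


def course_grade(course, scores):
    """Return the weighted course grade (0-100) from component scores (0-100).

    Assumption: a plain weighted average with no extra pass/fail rules.
    """
    weights = WEIGHTS[course]
    if set(scores) != set(weights):
        raise ValueError(f"expected scores for exactly: {sorted(weights)}")
    return sum(weights[name] * scores[name] for name in weights) / 100


# Hypothetical scores for illustration only, not real data.
example = course_grade("340", {"assignments": 90, "midterm": 70, "final": 80})
print(example)  # 81.0
```

Because the weights are tentative, treat the numbers in `WEIGHTS` as placeholders to update if the scheme changes.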
Assignments:
There are a total of 6 written assignments for this course. Please follow the instructions linked here to submit your assignments.
List of topics
We will roughly cover the following topics:
- Supervised learning with frequencies and distances.
- Data clustering, outlier detection, and association rules.
- Linear prediction, regularization, and kernels.
- Latent-factor models and collaborative filtering.
- Neural networks and deep learning.
Lectures, Assignments, Related Readings, and Links
| Date | Slides | Related Readings and Links | Homework and Notes |
| --- | --- | --- | --- |
| Jan 5 | Motivation and Syllabus | What is Machine Learning?; Machine Learning; Rise of the Machines; Talking Machine Episode 1; Mathematics for Machine Learning | Assignment 1 (pdf); Assignment 1 (tex/code/data) |
| Jan 7 | Exploratory Data Analysis | Gotta Catch'em all; Why Not to Trust Statistics; Visualization Types; Google Chart Gallery; Other Tools | |
| Jan 9 | Decision Trees | A Visual Introduction to Machine Learning; Decision Trees; Entropy; What is Big O Notation?; AI:AMA 19.2-3, ESL 9.2, ML:APP 16.2 | Big-O Notes |
| Jan 12 | Fundamentals of Learning | 7 Steps of Machine Learning; IID; Cross-validation; Bias-variance; No Free Lunch; AI:AMA 19.4-5, ESL 7.1-7.4, 7.10, ML:APP 1.4, 6.5 | Course Notation Guide |
| Jan 14 | Probabilistic Classifiers | Conditional probability (demo); Naive Bayes; Probabilities and Battleship; AI:AMA 12.6, ESL 4.3, ML:APP 2.2, 3.5, 4.1-4.2 | Probability Notes; Probability Slides |
| Jan 16 | Non-Parametric Models | K-nearest neighbours; Decision Theory for Darts; Norms; AI:AMA 19.7, ESL 13.3, ML:APP 1.4 | Assignment 1 due; Withdrawal deadline; Assignment 2 (pdf); Assignment 2 (tex/code/data) |
| Jan 19 | Ensemble Methods | Ensemble Methods; Random Forests; Empirical Study; Kinect; AI:AMA 19.8, ESL 7.11, 8.2, 15, 16.3, ML:APP 6.2.1, 16.2.5, 16.6 | |
| Jan 21 | Clustering | Clustering; K-means clustering (demo); K-Means++ (demo); IDM 8.1-8.2, ESL 14.3 | |
| Jan 23 | More Clustering | DBSCAN (video, demo); Hierarchical Clustering; Phylogenetic Trees; IDM 8.4 | |
| Jan 26 | Outlier Detection | Empirical Study; IDM 8.3, ESL 14.3.12, ML:APP 25.5 | |
| Jan 28 | Least Squares | Linear Regression (demo, 2D data, 2D video); Least Squares; Essence of Calculus; Partial Derivative; Gradient; ESL 3.1-2, ML:APP 7.1-3, AI:AMA 19.6 | |
| Jan 30 | Nonlinear Regression | Why should one learn machine learning from scratch?; Essence of Linear Algebra; Matrix Differentiation; Fluid Simulation (video); ESL 5.1, 6.3 | Linear Algebra Notes; Linear/Quadratic Gradients; Assignment 2 due; Assignment 3 (pdf); Assignment 3 (tex/code/data) |
| Feb 2 | Gradient Descent | Gradient Descent; Convex Functions | |
| Feb 4 | Robust Regression | ML:APP 7.4 | |
| Feb 6 | Feature Selection | Genome-Wide Association Studies; AIC, BIC; ESL 3.3, 7.5-7 | |
| Feb 9 | Regularization | ESL 3.4, ML:APP 7.5, AI:AMA 19.4 | |
| Feb 11 | More Regularization | RBF video; RBF and Regularization video; ESL 6.7, ML:APP 13.3-4 | |
| Feb 13 | Linear Classifiers | Perceptron; ESL 4.5, ML:APP 8.5 | Assignment 3 due |
| Feb 16 | No Class: Midterm Break | | |
| Feb 18 | No Class: Midterm Break | | |
| Feb 20 | No Class: Midterm Break | | |
| Feb 23 | More Linear Classifiers | Support Vector Machines; ESL 4.4, 12.1-2, ML:APP 8.1-3, 9.5, 14.5, AI:AMA 19.6 | |
| Feb 25 | Feature Engineering | Gmail Priority Inbox | |
| Feb 27 | Kernel Trick | ESL 12.3, ML:APP 14.1-4 | Assignment 4 (pdf); Assignment 4 (tex/code/data) |
| Mar 2 | Stochastic Gradient | Stochastic Gradient Descent, Theory and Practice; ML:APP 8.5 | |
| Mar 4 | Boosting, Start of MLE | AdaBoost (video); XGBoost (video); ML:APP 16.4 | |
| Mar 6 | MLE and MAP | Maximum Likelihood Estimation; ML:APP 9.3-4 | |
| Mar 9 | PCA | Principal Component Analysis; ESL 14.5, IDM B.1, ML:APP 12.2 | |
| Mar 11 | More PCA | Making Sense of PCA; SVD; Eigenfaces | Max and Argmax Notes |
| Mar 13 | Sparse Matrix Factorization | Non-Negative Matrix Factorization (original - access from UBC); ESL 14.6, ML:APP 13.8 | Assignment 4 due; Assignment 5 (pdf); Assignment 5 (tex/code/data) |
| Mar 16 | Recommender Systems & MDS | Recommender Systems; Netflix Prize | |
| Mar 18 | Neural Networks | Google Video; What is a Neural Network?; Interactive Guide; ML:APP 16.5, ESL 11.1-4, AI:AMA 21.1 | |
| Mar 20 | Guest Lecture | | |
| Mar 23 | Deep Learning | Fortune Article; Deep Learning References; Alchemy; ML:APP 28.3, ESL 11.5, AI:AMA 21.2 and 21.4-5 | |
| Mar 25 | Deep Learning | But what is a convolution? | |
| Mar 27 | Convolutions | Convolutional Neural Networks; ML:APP 28.4, ESL 11.7, AI:AMA 21.3 | Assignment 5 due; Assignment 6 (pdf); Assignment 6 (tex/code/data) |
| Mar 30 | Guest Lecture | | |
| Apr 1 | CNNs and RNNs | | |
| Apr 3 | No Class: Easter Friday | | |
| Apr 6 | No Class: Easter Monday | | |
| Apr 8 | LSTMs, Attention, and Transformers | | |
| Apr 10 | Generative Sampling + Conclusion | | Assignment 6 due |
| Apr 25 | Final | | |
Frank's Recorded Lectures
During the COVID-19 pandemic, Frank recorded video lectures for the course. These videos are available here. Note that these videos may not exactly match the current semester's course; however, in most cases the content will be very similar if not identical.
Mike's Demos
In semesters where Mike Gelbart taught the course, he included Jupyter notebooks associated with most lectures. These notebooks are available here (note that the lecture numbers may not exactly match the current semester's course).
Future Homeworks and Lectures
It is possible to search for and potentially find future homeworks and lectures from previous offerings of the course. However, be aware that both of these can change from year to year and that you are responsible for this year's materials.
Related courses that have online notes