Research Project
During this course, students will define and execute a small research project, from writing a proposal, through doing the research and implementation, to submitting it to the UBC Conference of Operating Systems (UBCOS). This gives students the opportunity to conduct systems research.
For this project, you need to pose a question, design a framework in which to answer the question, conduct the research, and write up your experience and results. There will be five deliverables for this project.
Note, we would like you to succeed in this project. We encourage you to reach out to us to discuss your projects and any questions you may have.
A sample history of a paper from its initial submission to the final paper may include:
- The original (rejected) submission (extended abstract) and reviews;
- Another (accepted) submission and reviews;
- The final paper (camera ready)
This collection should give you an idea of how to give and respond to constructive criticism. It will also give you a sense of what we mean by "conference paper."
Tasks Summary and Grading Scheme
- Find a project partner
- Select a project and write a proposal with research plan (15%)
- Submit the first draft to UBCOS (25%)
- Write reviews for the UBCOS program committee (10%)
- Present your project at UBCOS (10%)
- Submit your final paper (40%)
Teams
Generally, the final research project will be completed by teams of two students. A team of three students may be allowed in exceptional cases, e.g., if the number of students taking the course is odd, or if the project is sufficiently large to warrant more people. In that case, come and talk with the instructors.
Research is a lot about collaboration and teamwork. A lot of great ideas happen through interactions with others. Therefore, we want the final project to be a team effort.
Already formed a team? Send us an e-mail with names and CWLs of your team members. We will then create a GitHub repository for you and assign you to the project team.
Task: Send an e-mail with the team members to the instructors
Project Proposal and Research Plan
After you have formed a team, it is time to select a research project and write a proposal with a research plan.
Project Selection
The size of the project can vary, but thinking of it as a workshop paper is probably a good model. We do have a few project suggestions at the end of the page. Of course, you are welcome to come up with your own project idea.
Projects overlapping other courses/research/...
Projects may also be undertaken in cooperation with other graduate courses, but any such project must be approved by the professors of both courses. Not surprisingly, we expect more depth and work for a project that is satisfying two class requirements. Similarly, if you wish to undertake a project related to your own research, we will permit it, but you must demonstrate how what you've learned in CPSC 508 influences your work and/or ways in which your research would have been different had you not also been conducting a project in CPSC 508. In other words, your project in CPSC 508 must extend work you would normally have done in some new and/or different way. Either way, please come and talk to us if your project will overlap with other courses or research.
Deliverable Part 1: The Proposal
Recall, research is very much about storytelling. Thus the proposal covers the story you would like to tell and the research question that you are asking:
- Start the proposal with a single-paragraph fairy tale: what is the story you would like to tell? Recall the structure of the fairy tale: Once upon a time (context), there was a problem (villain), then we did our research (hero), and now look at all the great things we can do (happily ever after).
- Then describe your project more formally or seriously. Clearly motivate and state the research question you are investigating. You should provide a few sentences of explanation about why you find this to be an interesting question, why it is important, and how it qualifies as research.
Task: Write the proposal for your project covering the two points above.
Deliverable Part 2: The Research Plan
Compared to the proposal, the research plan is a more comprehensive document outlining how you want to execute your project, including the time frame. It should include the following components. (The numbers in parentheses estimate the number of pages you might need for each section; in practice, write exactly as much as you need to answer the questions we pose.)
- Related Work (~1 page)
Broadly speaking, related work falls into two categories: background and contextual work.
- Background: Provides the required information for the reader to understand your work. Write about the background readings you have identified for your project. You should be able to explain the background in sufficient detail to your classmates so they can understand what you are doing and why.
- Contextual: Provides the state-of-the-art upon which you are building. This includes others solving the same problem you are planning to tackle and work in adjacent areas that may influence what you are doing. The purpose of this part is to draw the research landscape and place your work within it. Ideally you have done some literature search, but at a minimum you should have identified what areas you need to cover. Include a list of papers that you plan to read. Have at least a couple of specific comparisons between prior work and what you are doing (write down a few sentences).
You may have spotted a flaw in related work and want to correct it. This section is where you can describe that.
Hint: you may be able to reuse a lot of your related work section in the final paper!
- Conclusion Format (~1 paragraph) Write down the expected conclusions you will obtain from your research project. Think of this as a sentence that may appear in your abstract or conclusion. Obviously, you do not have the results yet, but you should know the format of them. For example, if you compare your system A with system B, you would be able to say: "Our system outperforms System B in scenarios X and Y, while System B outperforms our system in scenario Z."
- Experimental Setup (~1/2 page)
Describe the set of experiments you plan to conduct. You may think of experiments you want to execute conditionally, depending on the outcome of earlier ones. For each experiment list:
- Purpose: Why do you want to conduct the experiment? What question do you want to answer and what are you hoping to learn from the experiment?
- Execution: How are you going to conduct your experiment and what tools are you planning on using? How do you know whether your measurements are accurate?
- Results: Give a brief (1-2 sentence) description of the results you expect to see.
- Resources Needed (~1/4 page) You may need access to specific hardware and software to successfully execute your project. List all your equipment needs here, including hardware, software and tools. This is of high importance so we can ensure that you have the right resources available to do your work. If you have doubts about the availability of some of your needed resources, come and talk to us before you submit your research plan! You do not want to find out in the middle of the term that some of the resources you need are not available.
- Schedule (~1/4 page) Have a look at the course schedule and come up with concrete milestones and dates when you want to finish them. Don't forget to account for time to write the paper.
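The Execution point under Experimental Setup asks how you know whether your measurements are accurate. One common way to approach this is to repeat each measurement several times and report the spread alongside the mean. The following is a minimal sketch of that idea, not a required methodology; `workload` is a hypothetical stand-in for whatever operation your experiment times, and the repeat and warm-up counts are arbitrary assumptions.

```python
import statistics
import time


def measure(fn, repeats=10, warmup=2):
    """Run fn `warmup` times untimed, then return `repeats` timings in seconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # needs at least two samples
    return mean, stdev, stdev / mean


def workload():
    # Hypothetical stand-in for the operation under test.
    sum(range(100_000))


mean, stdev, cv = summarize(measure(workload))
print(f"mean={mean:.6f}s stdev={stdev:.6f}s cv={cv:.1%}")
```

A large coefficient of variation suggests the setup is noisy (for example, due to background load), which is worth investigating before drawing conclusions from the numbers.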
Task: Write the research plan for your project
Submission
Prepare the submission of your assignment.
- The PDF document must be no longer than 3 pages (roughly 0.5 for the proposal and 2.5 for the research plan) including figures and tables, plus as many pages as needed for references (see Formatting Guidelines).
- Submit the PDF following the submission instructions
Task: Submit the proposal and research plan.
Status Meeting
At a minimum, there will be two status meetings with each group before the paper is due. These meetings are intended for your benefit; you decide what is of most value to you. Give a brief overview: what have you done, what next steps have you planned? Do you have any questions about things you've already done, or would you like to brainstorm what you could do next?
Generally, we encourage you to come to talk to us about your project! We can schedule meetings as the need arises. We are happy to advise you so you can succeed in your project.
Task: Schedule the first status meeting (see schedule)
Task: Schedule the second status meeting (see schedule)
Paper Submission (First Draft)
Awesome, you finished a draft of your paper and are ready to submit it to UBCOS where your peers will review it and discuss it at the Mock Program Committee.
In contrast to a real paper, you may not have all the results yet. This is fine (see below). However, you should have at least completed some part of your research by this point.
Paper Parts
Your draft should contain all the parts of a real paper:
- Abstract (~0.25 page): Follow the high-level structure of the fairy tale (you should already have written this)
- Introduction (~3/4 page): Tell the story of your paper (you should already have written parts of this)
- Background / Related Work (~1/2 page): Explain the necessary background and related work (you may already have most of this). You may also move the background later in the paper, before the conclusion. Compare and contrast your work with others: how are you similar, how are you different?
- Design / Implementation (~2 pages): Describe your high-level design and provide the important details on your implementation.
- Evaluation (~1.25 pages):
Describe your evaluation and results (you should already have written parts of this).
For each experiment clearly state:
- the purpose of the experiment (why are we doing this experiment),
- how you are conducting the experiment,
- a description of the results, and
- discussion / interpretation of the results.
For the last two points above: you may have only preliminary results, or none at all. This is fine. For any results you do not have yet, think about what you expect to see and make them up. However, clearly state that they are made up! Then write the description and discussion of the experiment based on those made-up results.
The reason we do that is because when you get actual results, you then have something against which to compare them. When you get your actual results later, do they match your predictions? If not, why not?
- Conclusion (~0.25 pages): This conclusion is also based on your results.
- References: Have a list of references to background and related work.
Hint: You should be able to write significant parts of this immediately after your project proposal is turned in. Don't write this all the night before it is due.
Submission
Prepare the submission of your first draft.
- The PDF document must be no longer than 5 pages including figures and tables, plus as many pages as needed for references (see Formatting Guidelines).
- Submit the PDF following the submission instructions
Task: Submit the first draft of your paper. Notify us by e-mail once you have done so.
Conference Presentation
Congratulations! Your paper got accepted at UBCOS! Now it's up to you to present your work.
Each group will present a short talk on their research project during normal class hours. We will allocate about 12 minutes for each group. This means 10 minutes for the presentation and about 2 minutes for questions.
This is short, about 10-12 slides. You probably cannot go into the details of your work. Instead, focus on the story (motivation), the high-level picture and the most-important results of your work.
Task: Present your research in class.
Task: Add a PDF of your slides to the hand-in repository.
Camera Ready (Final Submission)
Congratulations again on your accepted paper! You have gotten plenty of feedback from your program committee that you can use to improve your paper! To do so, you will get an additional page.
Ideally, you want to complete your writing early enough that you have time to reread and critique it with the rigor that you have applied to Assignment 1. Be honest. State shortcomings in your work. Discuss follow-on projects.
Several of your final papers will be suitable for submitting to an actual conference or workshop (maybe with some additional work). We are happy to work with you to turn them into submissions.
Hint: You have received feedback that your shepherd (i.e., your instructors) expects you to incorporate into your camera-ready version. Thus, part of your final grade will depend on how well you addressed the comments. So don't ignore them!
Submission
Prepare the submission of your assignment.
- The PDF document must be no longer than 6 pages including figures and tables, plus as many pages as needed for references (see Formatting Guidelines).
- Submit the PDF following the submission instructions
Task: Submit the final version of your paper.
Task: Make sure that you've committed everything that is related to the project to your course GitHub repository.
Project Suggestions
Here are some topics and/or more concrete project proposals that students may take as inspiration. Of course, you are free to define your own.
Isolation Model
Determining what is isolated and what is not is a subtle and often challenging task due to the number
of different hardware and software constructs that provide isolation. We developed a model
that can express this and are in the process of formalizing it. We are interested in extracting
protection domain information from applications and kernels -- ideally automatically. Tools
like KSplit can help with analyzing this. Can we use KSplit or similar tools to automatically extract
the set of protection domains of the Linux kernel (or other applications)?
Contact Sid Agrawal / Reto Achermann.
Formally specify the behavior of a concurrent / distributed / interesting system
Many complex, critical pieces of systems tooling are understood using prose specifications and diagrams. While systems built like this are a majority, there are limits to this development style's reliability. Real outages often stem from unanticipated interactions between protocols and subsystems, and formal verification can help us find and understand counter-intuitive behaviors implied by our system designs (some of which may be bugs). For example, specifying the behavior of the production key-value store Cosmos DB led to the documentation of several surprising behaviors.
One way to do this is with TLA+. TLA+ is a formal modeling tool that can check properties of concurrent or distributed systems. It works as an extension of set theory, so rather than grabbing your system's source code you would be reading its description and trying to qualify the exact rules and properties involved. Its most common verification method is model checking, where it enumerates all possible behaviors in your system up to a bounded depth.
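To make the idea of bounded model checking concrete, here is a minimal sketch in Python rather than TLA+. It is not how TLA+ or its tools are implemented; it only illustrates enumerating every behavior of a small system up to a depth bound and checking an invariant in each state. The toy system (two processes performing a non-atomic increment), the state encoding, and all function names are assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy system: two processes each do a non-atomic increment
# (read the shared counter into a local, then write local + 1 back).
# State: (counter, (pc0, pc1), (local0, local1)); pc is "read", "write", or "done".
INIT = (0, ("read", "read"), (0, 0))


def next_states(state):
    """Yield every state reachable in one step (one process takes one action)."""
    counter, pcs, locals_ = state
    for i in (0, 1):
        pcs2, locals2 = list(pcs), list(locals_)
        if pcs[i] == "read":
            locals2[i] = counter
            pcs2[i] = "write"
            yield (counter, tuple(pcs2), tuple(locals2))
        elif pcs[i] == "write":
            pcs2[i] = "done"
            yield (locals_[i] + 1, tuple(pcs2), tuple(locals2))


def invariant(state):
    """Once both processes are done, the counter must equal 2."""
    counter, pcs, _ = state
    return pcs != ("done", "done") or counter == 2


def check(init, max_depth):
    """Breadth-first search over all behaviors of at most max_depth steps.

    Returns a shortest counterexample trace (list of states), or None if the
    invariant holds in every state reachable within the bound.
    """
    seen = {init}
    queue = deque([(init, [init])])
    while queue:
        state, trace = queue.popleft()
        if not invariant(state):
            return trace
        if len(trace) - 1 >= max_depth:
            continue  # depth bound reached; do not expand further
        for nxt in next_states(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + [nxt]))
    return None


counterexample = check(INIT, max_depth=10)
for step, state in enumerate(counterexample or []):
    print(step, state)
```

The search finds the classic lost-update interleaving, in which both processes read the counter before either writes. With a bound too small to reach a final state, the same check reports no violation, which is why the choice of bound matters.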
You can use your own "system of interest", but as a starting point you can try formally specifying Practical Byzantine Fault Tolerance and checking that its properties hold.
See the Cosmos DB paper above and accompanying GitHub repo for an example of a larger-scale TLA+ specification. The PGo project and its related work are about taking a similar type of specification and implementing a running system from it, and include larger-scale things like Raft-based key-value stores.
Contact Finn Hackett / Ivan Beschastnikh
Securing Containerized Computation
Leveraging the recent development of Kernel Runtime Security
Instrumentation by Google and looking at security namespacing,
can you propose a solution that allows containers to load Linux Security Modules and Berkeley Packet Filter programs on a
per-container (a.k.a. namespace) basis? You may want to look at existing cgroup-ed BPF programs and understand
the necessary requirements to enable such a feature in the eBPF-LSM context. Can you reproduce the use cases
presented here? Can you present new interesting use cases?
Contact Thomas Pasquier.
Execution Grammars
Expanding on prior work, Thomas Pasquier's group has
started to build a tool that creates a "graph grammar" that describes the provenance subgraph
corresponding to a given system call. Given code snippets or malware binaries, it should be possible to build a
grammar representing their execution and therefore to detect the subgraph they generate. How could one build
such a grammar? Can this be done at scale (e.g., given a malware library)? Can you automatically extract a
generalization that matches a given malware family to keep detection high in the case of a new malware variant?
Could this solution be used to remove the need for human expertise compared to POIROT?
Contact: Thomas Pasquier.
Real End-to-end Provenance
Data provenance is metadata that describes how a digital artifact came to be in its
present state. One problem with existing provenance capture systems is that they capture only local provenance:
provenance from a particular language (e.g., R, Python) or from a particular workflow system (e.g., VisTrails). However, once you copy files, use
multiple languages, or connect different programs together in a script, you run the risk of breaking the
provenance chain. We believe that whole system provenance (e.g., CamFlow) could
provide the glue that connects different provenance systems. Your goal is to demonstrate some application
that uses provenance from multiple different collection sources to do something interesting. For example, given
a shell script that calls both R and Python programs, can you automatically build a container or VM that
precisely and exactly reproduces the experiment? Alternately, could you use provenance to build a debugging
tool?
Contact: Margo Seltzer
Using Provenance to Solve OS Problems
There exist many systems papers of the form, "We wanted to solve some problem, so we modified the kernel to produce a bunch of data, and then we used that data to do something." I'd like to see how many of these projects could be done via a single provenance capture system. CamFlow is a selective whole-system provenance capture system. It also has a lovely front-end display engine. How many special-purpose systems could be replaced by scripts running over CamFlow data? I could imagine doing this dynamically over streaming data (using CamQuery) or statically over collected data.
For example, prefetching files requires that you know what files are likely to be accessed before programs actually access them -- CamFlow captures much of that data. So, see if you can replicate the work in "An Analytical Approach to File Prefetching (1997 USENIX)" using CamFlow. Here are other papers on file prefetching to examine: Marginal Cost-Benefit Analysis for Predictive File Prefetching (ACSME 2003) and Design and Implementation of Predictive File Prefetching (USENIX 2002).
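As a starting point for thinking about prediction from access data, the sketch below implements a simple "most frequent successor" heuristic over a file access trace. This is not the method of any of the papers listed above, and the trace format (a plain list of file names in access order) is a hypothetical simplification, not CamFlow's actual output.

```python
from collections import Counter, defaultdict


def build_successor_model(trace):
    """Count, for each file, which file was accessed immediately after it."""
    successors = defaultdict(Counter)
    for current, following in zip(trace, trace[1:]):
        successors[current][following] += 1
    return successors


def predict_next(successors, current):
    """Return the most frequent successor of `current`, or None if unseen."""
    if current not in successors:
        return None
    return successors[current].most_common(1)[0][0]


def hit_rate(successors, trace):
    """Fraction of accesses in `trace` that the model would have prefetched."""
    pairs = list(zip(trace, trace[1:]))
    if not pairs:
        return 0.0
    hits = sum(predict_next(successors, cur) == nxt for cur, nxt in pairs)
    return hits / len(pairs)


# Hypothetical access traces; real ones would come from captured provenance.
training = ["a", "b", "c", "a", "b", "d", "a", "b"]
model = build_successor_model(training)
print(predict_next(model, "a"))
print(hit_rate(model, ["a", "b", "a", "b"]))
```

Evaluating the hit rate on a trace separate from the one used to build the model gives a first, rough indication of whether the captured data carries enough signal to make prefetching worthwhile.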
Another area where provenance might be useful is in cache replacement
algorithms -- if you knew what you might need again soon, you would keep
it in the cache. Papers to examine include
A study of integrated prefetching and caching strategies (Sigmetrics PER 1995),
Informed prefetching and caching (SOSP 1995), and
Application controlled prefetching and caching (USENIX 2002).
The Coda file system was designed to help users work in a disconnected
mode. One component of that system was a hoarding mechanism where the
system would try to figure out what files you were going to need
to function while disconnected. It seems that one could exploit provenance
to perform better hoarding. Do it!
Contact: Margo Seltzer
Prove that LSM-based provenance capture is guaranteed to detect a security breach.
This is a two-step process. First, using the methodology used in this paper,
show that the current LSM interface captures all security-related flows in the kernel.
Next, given provenance captured at these points, prove (or disprove) that a
security violation must show up as an anomaly in the provenance graph.
Contact: Margo Seltzer
Device Generation
As mentioned in previous project descriptions, we have an ongoing
project to synthesize device drivers from specifications.
This project is a dual: synthesizing emulated devices.
The question that we would like to ask is, "To what extent
can we generate an emulated device for the QEMU virtual machine
and/or the Arm FastModels simulator from the specification we use
to synthesize device drivers?"
Contact: Reto Achermann
Specifying System Calls
System calls are used by applications to interact with the kernel. Like a function call,
arguments are prepared and then the system call instruction is executed, which triggers a
control transfer into the corresponding entry function in the kernel, including a protection
mode switch. There are a few things that are important here: 1) how to pack the arguments,
2) how the kernel checks which handler function to call, and 3) validation of the arguments.
What systems use some form of specification of system calls and what guarantees do they
provide? Can we improve upon them? There are two directions: 1) specify the semantics of
the system calls, or 2) specify the arguments and generate handlers for possibly different
architectures.
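The three concerns listed above (argument packing, handler lookup, and argument validation) can be sketched as a table-driven dispatcher. The following is an illustrative user-space model in Python, not real kernel code: the specification format, the error codes, the wire layout (little-endian 64-bit words), and the `add` system call are all assumptions made up for this example.

```python
import struct
from dataclasses import dataclass
from typing import Callable

# Hypothetical error codes for this sketch.
E_NOSYS = -1  # unknown system call number
E_INVAL = -2  # malformed buffer or argument out of range


@dataclass(frozen=True)
class ArgSpec:
    name: str
    lo: int  # smallest allowed value
    hi: int  # largest allowed value


@dataclass(frozen=True)
class SyscallSpec:
    number: int
    name: str
    args: tuple
    handler: Callable[..., int]


def sys_add(a, b):
    """Hypothetical system call: return the sum of two small integers."""
    return a + b


SPECS = [
    SyscallSpec(0, "add", (ArgSpec("a", 0, 1000), ArgSpec("b", 0, 1000)), sys_add),
]
TABLE = {spec.number: spec for spec in SPECS}


def pack_call(number, *args):
    """1) Pack the call number and arguments as little-endian 64-bit words."""
    return struct.pack(f"<{1 + len(args)}Q", number, *args)


def dispatch(buf):
    """2) Look up the handler and 3) validate arguments before calling it."""
    if len(buf) < 8 or len(buf) % 8 != 0:
        return E_INVAL
    number, *args = struct.unpack(f"<{len(buf) // 8}Q", buf)
    spec = TABLE.get(number)
    if spec is None:
        return E_NOSYS
    if len(args) != len(spec.args):
        return E_INVAL
    for value, arg in zip(args, spec.args):
        if not arg.lo <= value <= arg.hi:
            return E_INVAL
    return spec.handler(*args)


print(dispatch(pack_call(0, 2, 3)))
```

Because the argument checks are derived from the specification table rather than written by hand in each handler, the same table could in principle drive code generation for different architectures, which is the second direction mentioned above.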
Contact: Reto Achermann
Specifying the Boot Sequence
When a computer starts, hardware needs to be initialized. Often, there is also a
chain of software components that are executed after each other even before we
jump into the kernel. Can we specify some contract between the stages including the
desired hardware state, and use this to generate code that establishes the requirements?
Contact: Reto Achermann