Distributed information systems are increasingly integral to modern life, powering the large-scale cloud services we rely on as well as critical infrastructure systems. The goal of this course is to understand how to create distributed systems that can operate correctly and with high performance despite failures and attacks.
We will study the design of fault- and intrusion-tolerant distributed systems, as well as their formal verification and testing, and explore how these techniques apply to both modern cloud systems and critical infrastructure (e.g., energy systems). Students will read recent research papers and will design and carry out a project using the techniques studied. Students will leave the course with an understanding of how to design systems with provable correctness and performance guarantees, and with hands-on experience working with such systems.
Prerequisites: CS 2510 (Computer Operating Systems) and CS 2550 (Principles of Database Systems), or instructor permission.