Course:CICS525/Notes/Introduction
Introduction
- CICS 525 is about building distributed and real-time systems
- big ideas, useful techniques, implementation, case studies
- lectures on ideas, papers for case studies
- why study the principles of distributed systems?
- synthesize many different areas in order to build working systems
- Internet access has made the area much more attractive and timely
- hard/unsolved: few sophisticated distributed systems have been deployed
What is a distributed system?
- multiple connected computers
- cooperate to provide some service
- Examples: Internet E-Mail, Athena file server, Google MapReduce
What is the point of distributing a system?
- communication, among distant components
- Web
- reliability, from unreliable components
- performance
- sum of many components
- aggregate cycles+memory
- aggregate data transfer rates
- aggregate storage units
- isolation, to increase security and failure tolerance
- authentication server
- backup server
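The "reliability, from unreliable components" and "sum of many components" points can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes replicas fail independently (rarely true in practice) and that work divides perfectly across nodes, and the function names are hypothetical.

```python
def availability(p_up: float, n_replicas: int) -> float:
    """Probability that at least one of n replicas is up.

    Assumes each replica is up with probability p_up, independently
    of the others (an idealization; real failures are often correlated).
    """
    if not 0.0 <= p_up <= 1.0:
        raise ValueError("p_up must be between 0 and 1")
    if n_replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - (1.0 - p_up) ** n_replicas


def aggregate_rate(per_node_rate: float, n_nodes: int) -> float:
    """Upper bound on combined rate: the sum of the per-node rates.

    Assumes perfectly divisible work and no coordination overhead.
    """
    return per_node_rate * n_nodes


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "replicas ->", round(availability(0.9, n), 6))
```

Under these assumptions each extra replica multiplies the chance of total outage by (1 - p_up), which is why replication is attractive even when individual machines are unreliable.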
Challenges
- system design
- how to split functions among computers? (e.g. clients, servers)
- who talks to whom? what info do they exchange?
- performance
- how to divide work to maximize parallelism and minimize interaction?
- load balance
- avoid bottlenecks
- fault tolerance
- replication is usually used to cope with machine and network failures
- how to tell which replicas are live, or whether the network is down?
- which replica has the freshest data?
- consistency
- how to keep replicas identical?
- how to coordinate many concurrent clients using shared data on a server?
- security
- adversary may compromise machines or manipulate messages
- network properties
- clusters -- high bandwidth, low latency, high reliability
- wide-area -- low bandwidth, high latency, low reliability
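The replication questions above (which replicas are live? which has the freshest data?) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a real protocol: the `Replica` class, the per-replica version number, and the `reachable` flag are all hypothetical, and a `None` reply stands in for a request that timed out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Replica:
    name: str
    version: int            # assumed scheme: higher version = more recent write
    value: str
    reachable: bool = True  # False models a crash OR a network failure

    def read(self) -> Optional[Tuple[int, str]]:
        """Return (version, value), or None to model a timed-out request."""
        if not self.reachable:
            return None
        return (self.version, self.value)


def read_freshest(replicas: List[Replica]) -> Optional[Tuple[int, str]]:
    """Ask every replica and keep the reply with the highest version."""
    replies = [reply for reply in (r.read() for r in replicas) if reply is not None]
    if not replies:
        return None  # nobody answered: all crashed, or the network is down
    return max(replies, key=lambda reply: reply[0])


if __name__ == "__main__":
    replicas = [
        Replica("a", version=3, value="old"),
        Replica("b", version=5, value="newest", reachable=False),
        Replica("c", version=4, value="newer"),
    ]
    print(read_freshest(replicas))  # (4, 'newer')
```

The client gets version 4 even though replica b holds version 5: from the caller's side a timeout looks the same whether b crashed or the network dropped the request, so "freshest among those that answered" is not necessarily the freshest data in the system.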
Advice: solutions are complex and hard to get right
- easy to make a distributed system slower and less reliable than a centralized one
- Lamport's definition: a distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable
- use a central system if you can
- build a distributed system only if you're forced to
Major themes
- infrastructure for building servers and clients
- threads+RPC
- distributed programming
- consistency
- fault tolerance
- peer-to-peer/decentralized
- case studies
- [continuous thread: system design]