Course:CICS525/Notes/Introduction

Introduction

  • CICS 525 is about building distributed and real-time systems
    • big ideas, useful techniques, implementation, case studies
    • lectures on ideas, papers for case studies
  • why study the principles of distributed systems?
    • synthesize many different areas in order to build working systems
    • Internet access has made the area much more attractive and timely
    • hard and largely unsolved: few sophisticated distributed systems have been deployed

What is a distributed system?

  • multiple connected computers
  • cooperate to provide some service
  • Examples: Internet E-Mail, Athena file server, Google MapReduce

What is the point of distributing a system?

  • communication, among distant components
    • Web
  • reliability, from unreliable components
  • performance
    • sum of many components
    • aggregate cycles+memory
    • aggregate data transfer rates
    • aggregate storage units
  • isolation, to increase security and failure tolerance
    • authentication server
    • backup server
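
As a rough illustration of "reliability, from unreliable components", the sketch below computes the availability of a replicated service. It rests on two simplifying assumptions that are not in the notes: replicas fail independently, and the service is up whenever at least one replica is up. The function name service_availability and the 0.99 figure are illustrative only.

```python
def service_availability(replica_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent replicas is up.

    Assumes independent failures, which real systems often violate
    (shared power, shared network, correlated software bugs).
    """
    if not 0.0 <= replica_availability <= 1.0:
        raise ValueError("replica_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # The service is down only if every replica is down at the same time.
    all_down = (1.0 - replica_availability) ** replicas
    return 1.0 - all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "replica(s):", service_availability(0.99, n))
```

Under these assumptions each extra replica multiplies the probability of a total outage by the per-replica failure probability, which is why a few unreliable machines can add up to a fairly reliable service.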

Challenges

  • system design
    • how to split functions among computers? (e.g. clients, servers)
    • who talks to whom? what info do they exchange?
  • performance
    • how to divide work? to maximize parallelism, minimize interaction
    • load balance
    • avoid bottlenecks, e.g. a single overloaded server or network link
  • failures: replication is usually used to cope with them
    • how to tell which replicas are live, or whether the network is down?
    • which replica has the freshest data?
  • consistency
    • how to keep replicas identical?
    • how to sort out many concurrent clients using shared data in server?
  • security
    • adversary may compromise machines or manipulate messages
  • network properties
    • clusters -- high bandwidth, low latency, high reliability
    • wide-area -- low bandwidth, high latency, low reliability
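
The replication questions above ("which replicas are live?", "which replica has the freshest data?") can be made concrete with a minimal sketch. It assumes a heartbeat timeout for liveness and a per-replica version counter for freshness. Neither mechanism is prescribed by these notes, and the names ReplicaStatus, presumed_live and freshest are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ReplicaStatus:
    name: str
    last_heartbeat: float  # time (seconds) the last heartbeat was received
    version: int           # counter incremented on every applied write


def presumed_live(replicas: Iterable[ReplicaStatus], now: float,
                  timeout: float) -> List[ReplicaStatus]:
    """Replicas heard from within `timeout` seconds.

    A silent replica may have crashed, or the network may be down;
    a timeout alone cannot tell the two cases apart.
    """
    return [r for r in replicas if now - r.last_heartbeat <= timeout]


def freshest(replicas: Iterable[ReplicaStatus]) -> Optional[ReplicaStatus]:
    """Replica with the highest version number, or None if there are none."""
    return max(replicas, key=lambda r: r.version, default=None)


if __name__ == "__main__":
    replicas = [
        ReplicaStatus("a", last_heartbeat=100.0, version=7),
        ReplicaStatus("b", last_heartbeat=95.0, version=9),
        ReplicaStatus("c", last_heartbeat=99.5, version=8),
    ]
    live = presumed_live(replicas, now=101.0, timeout=2.0)
    print([r.name for r in live], freshest(live))
```

In this example replica "b" holds the newest version but has missed its heartbeat deadline, so the freshest replica among those presumed live is not the freshest overall. That gap is one reason failure handling and consistency are hard to separate.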

Advice: solutions are complex and hard to get right

  • easy to make a distributed system slower and less reliable than a centralized one
  • Lamport's definition: a distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.
  • use a central system if you can
  • build a distributed system only if you're forced to

Major themes

  • infrastructure for building servers and clients
    • threads+RPC
    • distributed programming
  • consistency
  • fault tolerance
  • peer-to-peer/decentralized
  • case studies
  • [continuous thread: system design]
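
To give a feel for the "threads+RPC" theme, here is a minimal sketch using Python's standard xmlrpc modules. The course may well use a different RPC system, so this is only an illustration. The procedure add and the helpers start_server and call_add are hypothetical names. The server runs in a background thread and handles one request at a time.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a: int, b: int) -> int:
    """The procedure exposed to remote callers."""
    return a + b


def start_server() -> SimpleXMLRPCServer:
    """Start an RPC server on a free local port in a background thread."""
    # Port 0 asks the OS to pick any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_add(server: SimpleXMLRPCServer, a: int, b: int) -> int:
    """Call `add` over the network as if it were a local function."""
    host, port = server.server_address
    with ServerProxy(f"http://{host}:{port}") as proxy:
        return proxy.add(a, b)


if __name__ == "__main__":
    srv = start_server()
    try:
        print(call_add(srv, 2, 3))
    finally:
        srv.shutdown()
        srv.server_close()
```

The call looks local, but unlike a local call it can fail or stall because of the network or the remote machine, which is where the consistency and fault-tolerance themes come in.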