Course:CICS525/Notes/Infrastructure-1

Introduction to RPCs

  • goal: easy-to-program network communication
  • hides most details of client/server communication
  • makes call look much like ordinary procedure call
  • server handlers also look much like ordinary procedures

Alternatives

  • directly programming with sockets
  • distributed shared memory
  • MapReduce
  • MPI (message passing)

RPC is widely used

  • XML-RPC
  • Java RMI
  • Sun RPC

RPC structure

  • client application
  • client stubs
  • RPC library
  • network
  • server RPC library
  • dispatch
  • server application handlers

Example: lock server

  • a server that manages locks on objects (e.g., files)
  • server handles grants and releases of locks
  • want client app code to look like
acquire(lid)     
...
release(lid)     
  • much like a local call, very convenient
  • actually, the call goes through a client object:
lc->acquire(lid)
...
lc->release(lid)     
  • lc indicates which lock server we want to talk to
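The layers listed under "RPC structure" can be sketched for this lock server as follows. This is a minimal illustration, not the course's lab code: the names lock_server, lock_client, dispatch, call, and the OK/RETRY status values are all assumptions, and the "network" is replaced by a direct function call so the example stays self-contained.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <stdexcept>

using lock_id = std::uint64_t;
enum proc : int { ACQUIRE = 1, RELEASE = 2 };   // hypothetical procedure numbers
enum status : int { OK = 0, RETRY = 1 };        // hypothetical return codes

// Server side: application handlers plus a dispatch table keyed by procedure number.
class lock_server {
public:
    lock_server() {
        handlers_[ACQUIRE] = [this](lock_id lid) { return do_acquire(lid); };
        handlers_[RELEASE] = [this](lock_id lid) { return do_release(lid); };
    }
    lock_server(const lock_server&) = delete;            // handlers capture `this`
    lock_server& operator=(const lock_server&) = delete;

    // Stands in for "receive request from the network, dispatch, send reply".
    int dispatch(int procno, lock_id lid) {
        auto it = handlers_.find(procno);
        if (it == handlers_.end()) throw std::invalid_argument("unknown procedure");
        return it->second(lid);
    }

private:
    // Grant the lock if it is free; otherwise ask the caller to retry later.
    int do_acquire(lock_id lid) { return held_.insert(lid).second ? OK : RETRY; }
    int do_release(lock_id lid) { held_.erase(lid); return OK; }

    std::map<int, std::function<int(lock_id)>> handlers_;
    std::set<lock_id> held_;
};

// Client side: stubs that make the remote call look like a local one.
class lock_client {
public:
    explicit lock_client(lock_server& srv) : srv_(srv) {}
    int acquire(lock_id lid) { return call(ACQUIRE, lid); }
    int release(lock_id lid) { return call(RELEASE, lid); }

private:
    // A real RPC library would marshal, send, wait for the reply, and unmarshal here.
    int call(int procno, lock_id lid) { return srv_.dispatch(procno, lid); }
    lock_server& srv_;
};
```

With a pointer lc to a lock_client, application code reads exactly like the snippet above: lc->acquire(lid), do some work, lc->release(lid). Everything below the stub is hidden from the caller.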

Easy challenges

  • how client indicates server and procedure
  • automatic marshaling/unmarshaling of arguments/return value
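As a rough sketch of what marshaling involves, the code below flattens a request (procedure number plus lock id) into bytes in big-endian order and reads it back. The request layout and the helper names (put_u32, get_u64, marshal, unmarshal) are assumptions made for illustration; real RPC systems generate this code from an interface description.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Append a 32-bit value in big-endian (network) byte order.
void put_u32(bytes& buf, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        buf.push_back(static_cast<std::uint8_t>((v >> shift) & 0xff));
}

// Append a 64-bit value as two big-endian 32-bit halves, high half first.
void put_u64(bytes& buf, std::uint64_t v) {
    put_u32(buf, static_cast<std::uint32_t>(v >> 32));
    put_u32(buf, static_cast<std::uint32_t>(v & 0xffffffffu));
}

// Read a 32-bit big-endian value at pos and advance pos past it.
std::uint32_t get_u32(const bytes& buf, std::size_t& pos) {
    if (pos > buf.size() || buf.size() - pos < 4)
        throw std::out_of_range("short message");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v = (v << 8) | buf[pos++];
    return v;
}

std::uint64_t get_u64(const bytes& buf, std::size_t& pos) {
    std::uint64_t hi = get_u32(buf, pos);
    std::uint64_t lo = get_u32(buf, pos);
    return (hi << 32) | lo;
}

// Hypothetical wire format: 4-byte procedure number, then 8-byte lock id.
struct request {
    std::uint32_t procno;
    std::uint64_t lid;
};

bytes marshal(const request& r) {
    bytes b;
    put_u32(b, r.procno);
    put_u64(b, r.lid);
    return b;
}

request unmarshal(const bytes& b) {
    std::size_t pos = 0;
    request r;
    r.procno = get_u32(b, pos);
    r.lid = get_u64(b, pos);
    return r;
}
```

Fixing the byte order on the wire is what lets a client and server with different native representations agree on the argument values.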

Hard challenge

  • failures
  • network may drop, delay, duplicate, re-order messages
  • network might break altogether, and maybe recover
  • server might crash, and maybe re-start
  • how to provide easy-to-use behavior to clients?

The original RPC mechanism (Birrell and Nelson)

  • from Xerox PARC, which pioneered LANs and workstations in the 1970s
  • main concerns
    • naming
    • minimize number of packets (slow CPUs -> slow packet handling)
    • failures
  • Naming RPC servers
    • Used Grapevine, a name service (a little like DNS)
    • Export(service name, server host)
    • Import(service name) -> server host
    • level of indirection
      • clients need not hard-code server names
      • multiple servers (use closest)
      • replacement of servers
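The Export/Import indirection can be sketched with a small in-memory table. This is only a stand-in for a name service such as Grapevine, not its actual interface: the class name_service and its methods are hypothetical, the methods are called export_service/import_service because export is a C++ keyword, and "use closest" is simplified to "use the first host exported".

```cpp
#include <algorithm>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a name service.
class name_service {
public:
    // Export: a server announces that it offers a service.
    void export_service(const std::string& service, const std::string& host) {
        table_[service].push_back(host);
    }

    // Withdraw a host, e.g. when it is being replaced by another server.
    void withdraw(const std::string& service, const std::string& host) {
        auto it = table_.find(service);
        if (it == table_.end()) return;
        auto& hosts = it->second;
        hosts.erase(std::remove(hosts.begin(), hosts.end(), host), hosts.end());
    }

    // Import: a client asks which host serves a name.
    // Returns std::nullopt if no host currently exports the service.
    std::optional<std::string> import_service(const std::string& service) const {
        auto it = table_.find(service);
        if (it == table_.end() || it->second.empty()) return std::nullopt;
        return it->second.front();  // simplification: first exporter, not closest
    }

private:
    std::map<std::string, std::vector<std::string>> table_;
};
```

Because clients only ever name the service, a server can be replaced by exporting the new host and withdrawing the old one, with no change to client code.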

How does one handle failures?

  • client sends a request
  • suppose network discards the request packet
    • what will client observe?
    • what should the client do?
    • how long should client wait before retransmitting?
  • Now suppose the network delivered request, but discarded response
    • what will client observe?
    • what should the client do?
    • Simple retransmission leads to "at-least-once" behavior
      • Are there any situations where at-least-once is OK? yes: when the operation has no side effects (e.g., read-only operations), so repeating it is harmless
  • How can RPC system provide better behavior?
    • remember the RPC requests it has seen, detect duplicates
    • requests need unique IDs, ID repeated on retransmit
    • what to do if server sees a duplicate?
      • client still needs the reply
      • so server remembers replies to previously executed RPCs
      • this yields "exactly-once" behavior, as long as the server never crashes
  • Exactly-once is difficult
    • Why?
    • the hard case: server crashes just as it receives request
      • did it execute, and crash before sending reply?
      • or crash before executing?
      • should server re-execute procedure call after restart?
    • Birrell RPC protocol provides "at-most-once"
      • server says "ok" -> executed once
      • server says "???" -> zero or one times, unknown which
      • trouble arises if the server restarts and forgets its replies[] table of completed RPCs
  • Key remaining problem with at-most-once
    • client sends request
    • server crashes before sending reply
    • server restarts
    • client re-sends request
    • how does server realize it is a duplicate?
  • What exact situation do we need to detect?
    • retransmitted request
    • server might have seen earlier transmission before crash
    • How to detect cross-crash retransmission?
    • server has a number that uniquely identifies restarts
    • Birrell calls it an ID; the more common term now is a nonce (server nonce)
    • client obtains server's ID when it first connects during "bind"
    • client sends server ID in every RPC request
    • server checks whether ID in request == current ID
      • if equal, then any previous transmission will be in server's replies[] table
      • if not equal, the server has restarted since the client bound, so the request may be a cross-crash retransmission
  • What to do when server detects cross-crash retransmission?
    • might have been executed already, might not have been
    • send error back to the client and hope it knows how to deal with it
    • this situation is pretty rare
  • How to ensure server never reuses a nonce?
    • server could store ID on disk (if it has a disk)
    • or use boot time (if it has access to a clock)
    • or use a big random number (if it has a source of randomness)
  • When can server discard old saved return values?
    • after e.g. five seconds? no!
    • server can discard if client will never retransmit
    • have client tell server which replies it has received
    • streamlined version:
      • client gives requests ascending numbers, called xids
      • includes xid in every request
      • server includes xid in reply
      • client tells server the highest xid for which it has received all previous replies
      • includes this in every request
      • server can then discard saved replies with xids at or below that value
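The pieces above (duplicate detection with a replies table, a server nonce checked on every request, and xid acknowledgments that let the server prune saved replies) can be combined in one sketch. This is an illustration under stated assumptions, not Birrell and Nelson's protocol: the struct layouts, the client_id field, the status names, and the rule that xids start at 1 are all hypothetical, and the handler is an ordinary function rather than a networked service.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

enum class rpc_status { OK, SERVER_RESTARTED, STALE_DUPLICATE };

struct rpc_request {
    std::uint64_t server_nonce;  // nonce the client learned at bind time
    std::uint32_t client_id;     // hypothetical per-client identifier
    std::uint64_t xid;           // ascending per-client number, assumed to start at 1
    std::uint64_t acked_xid;     // client has every reply with xid <= this (0 = none)
    int arg;
};

struct rpc_reply {
    rpc_status status;
    int value;
};

class at_most_once_server {
public:
    at_most_once_server(std::uint64_t nonce, std::function<int(int)> handler)
        : nonce_(nonce), handler_(std::move(handler)) {}

    // "bind": the client learns the nonce of this server incarnation.
    std::uint64_t bind() const { return nonce_; }

    rpc_reply handle(const rpc_request& req) {
        // Nonce mismatch: the client bound to an earlier incarnation, so an
        // earlier transmission may or may not have executed. Report an error.
        if (req.server_nonce != nonce_) return {rpc_status::SERVER_RESTARTED, 0};

        client_state& cs = clients_[req.client_id];
        // The client will never retransmit acknowledged xids: drop their replies.
        if (req.acked_xid > cs.acked) {
            cs.acked = req.acked_xid;
            cs.replies.erase(cs.replies.begin(), cs.replies.upper_bound(cs.acked));
        }
        // A delayed copy of an already acknowledged request must not re-execute.
        if (req.xid <= cs.acked) return {rpc_status::STALE_DUPLICATE, 0};

        // Duplicate of a request still in the table: replay the saved reply.
        auto it = cs.replies.find(req.xid);
        if (it != cs.replies.end()) return {rpc_status::OK, it->second};

        // First time this xid is seen: execute once and remember the result.
        int value = handler_(req.arg);
        cs.replies.emplace(req.xid, value);
        return {rpc_status::OK, value};
    }

    std::size_t saved_replies(std::uint32_t client_id) const {
        auto it = clients_.find(client_id);
        return it == clients_.end() ? 0 : it->second.replies.size();
    }

private:
    struct client_state {
        std::uint64_t acked = 0;               // highest acknowledged xid seen
        std::map<std::uint64_t, int> replies;  // xid -> saved return value
    };

    std::uint64_t nonce_;
    std::function<int(int)> handler_;
    std::map<std::uint32_t, client_state> clients_;
};
```

A retransmitted request with the same xid gets the saved reply without running the handler again, while a request carrying the nonce of a previous incarnation gets SERVER_RESTARTED and the client must decide what to do. The table is kept in memory on purpose: losing it on a crash is exactly the case the nonce check is there to detect.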