Course:CICS525/Notes/Infrastructure-1
Introduction to RPCs
- goal: easy-to-program network communication
- hides most details of client/server communication
- makes a remote call look much like an ordinary procedure call
- server handlers also look much like ordinary procedures
Alternatives
- directly programming with sockets
- distributed shared memory
- map/reduce
- MPI (message passing)
RPC is widely used
- XML-RPC
- Java RMI
- Sun RPC
RPC structure
- client application
- client stubs
- RPC library
- network
- server RPC library
- dispatch
- server application handlers
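A minimal in-process sketch of the layers listed above, assuming a toy text encoding for arguments. The names `Proc`, `kAdd`, `Server`, `rpc_call`, `add`, and `add_handler` are hypothetical, and `rpc_call` stands in for both RPC libraries and the network (no real sockets), so only the division of labor between stub, dispatch, and handler is shown.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical procedure number shared by the client stub and server dispatch.
enum Proc : int { kAdd = 1 };

// Server application handlers take marshaled args and return a marshaled result.
using Handler = std::function<std::string(const std::string&)>;

// Server RPC library + dispatch: map procedure numbers to handlers.
class Server {
 public:
  void register_handler(int proc, Handler h) { handlers_[proc] = std::move(h); }
  std::string dispatch(int proc, const std::string& args) {
    auto it = handlers_.find(proc);
    if (it == handlers_.end()) return "ERR_NO_SUCH_PROC";
    return it->second(args);
  }
 private:
  std::map<int, Handler> handlers_;
};

// Stand-in for the client RPC library and the network: a real library would
// send the bytes over a socket and block until the reply arrives.
std::string rpc_call(Server& srv, int proc, const std::string& args) {
  return srv.dispatch(proc, args);
}

// Client stub: an ordinary-looking function that hides marshaling and transport.
int add(Server& srv, int a, int b) {
  std::string args = std::to_string(a) + " " + std::to_string(b);  // marshal
  return std::stoi(rpc_call(srv, kAdd, args));                      // unmarshal
}

// Server application handler: unmarshal args, do the work, marshal the result.
std::string add_handler(const std::string& args) {
  auto sp = args.find(' ');
  int a = std::stoi(args.substr(0, sp));
  int b = std::stoi(args.substr(sp + 1));
  return std::to_string(a + b);
}
```

The client application only ever calls `add(srv, 2, 3)`; everything below that call could be swapped for a real transport without changing application code.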
Example: lock server
- a server that manages locks on objects (e.g., files)
- server handles grants and releases of locks
- want client app code to look like
acquire(lid) ... release(lid)
- much like a local call, very convenient
- actually, the code looks like
lc->acquire(lid) ... lc->release(lid)
- lc indicates which lock server we want to talk to
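A sketch of what `lc` might point at, assuming a non-blocking protocol in which `acquire` returns a hypothetical `RETRY` status when the lock is held (a real lock server would more likely make the caller wait). The `lock_server` and `lock_client` classes and the `lock_status` values are illustrative; the client calls the server directly here, where a real client would go through RPC stubs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using lock_id = std::uint64_t;
enum class lock_status { OK, RETRY, NOENT };

// Server: tracks which locks are currently held.
class lock_server {
 public:
  lock_status acquire(lock_id lid) {
    bool& held = held_[lid];  // locks not seen before start out free
    if (held) return lock_status::RETRY;
    held = true;
    return lock_status::OK;
  }
  lock_status release(lock_id lid) {
    auto it = held_.find(lid);
    if (it == held_.end() || !it->second) return lock_status::NOENT;
    it->second = false;
    return lock_status::OK;
  }
 private:
  std::map<lock_id, bool> held_;
};

// Client: bound to one particular lock server. In a real system each
// method would be an RPC stub; in this sketch it calls the server directly.
class lock_client {
 public:
  explicit lock_client(lock_server* srv) : srv_(srv) {}
  lock_status acquire(lock_id lid) { return srv_->acquire(lid); }
  lock_status release(lock_id lid) { return srv_->release(lid); }
 private:
  lock_server* srv_;
};
```

Holding a pointer to a specific server is what lets the same application code talk to different lock servers.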
Easy challenges
- how client indicates server and procedure
- automatic marshaling/unmarshaling of arguments/return value
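Marshaling is the "easy" part: flatten typed values into bytes in an agreed order and read them back. A minimal sketch, assuming a hypothetical fixed request layout of a procedure number followed by one 64-bit argument, both encoded big-endian (network byte order) so that machines with different native endianness agree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Append v to buf in big-endian byte order.
void put_u64(std::vector<std::uint8_t>& buf, std::uint64_t v) {
  for (int shift = 56; shift >= 0; shift -= 8)
    buf.push_back(static_cast<std::uint8_t>(v >> shift));
}

// Read a big-endian uint64 starting at pos, advancing pos past it.
std::uint64_t get_u64(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
  if (pos + 8 > buf.size()) throw std::runtime_error("truncated message");
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i) v = (v << 8) | buf[pos++];
  return v;
}

// Hypothetical request layout: [procedure number][argument].
struct Request {
  std::uint64_t proc;
  std::uint64_t arg;
};

std::vector<std::uint8_t> marshal(const Request& r) {
  std::vector<std::uint8_t> buf;
  put_u64(buf, r.proc);
  put_u64(buf, r.arg);
  return buf;
}

Request unmarshal(const std::vector<std::uint8_t>& buf) {
  std::size_t pos = 0;
  Request r;
  r.proc = get_u64(buf, pos);
  r.arg = get_u64(buf, pos);
  return r;
}
```

In practice a stub generator emits this code from an interface description, which is what makes it "automatic".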
Hard challenge
- failures
- network may drop, delay, duplicate, re-order messages
- network might break altogether, and maybe recover
- server might crash, and maybe re-start
- how to provide easy-to-use behavior to clients?
The original RPC mechanism (Birrell and Nelson)
- from Xerox PARC, which pioneered Ethernet LANs and personal workstations (the Alto) in the 1970s
- main concerns
- naming
- minimize # of packets (slow CPUs -> slow pkt handling)
- failures
- Naming RPC servers
- Used Grapevine, a name service (a little like DNS)
- Export(service name, server host)
- Import(service name) -> server host
- level of indirection
- clients need not hard-code server names
- multiple servers (use closest)
- replacement of servers
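The level of indirection can be sketched as a toy registry, assuming a hypothetical per-server `distance` metric that `Import` uses to choose the closest server. This is not Grapevine's actual interface; `NameService`, `Entry`, and `Unexport` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// One registered server for a service; distance is a hypothetical metric.
struct Entry {
  std::string host;
  int distance;
};

class NameService {
 public:
  // Server side: announce that host provides service.
  void Export(const std::string& service, const std::string& host, int distance) {
    table_[service].push_back({host, distance});
  }
  // Retire a server, e.g. when it is being replaced.
  void Unexport(const std::string& service, const std::string& host) {
    auto& v = table_[service];
    v.erase(std::remove_if(v.begin(), v.end(),
                           [&](const Entry& e) { return e.host == host; }),
            v.end());
  }
  // Client side: find the closest server, without hard-coding its name.
  std::optional<std::string> Import(const std::string& service) const {
    auto it = table_.find(service);
    if (it == table_.end() || it->second.empty()) return std::nullopt;
    const Entry* best = &it->second.front();
    for (const Entry& e : it->second)
      if (e.distance < best->distance) best = &e;
    return best->host;
  }
 private:
  std::map<std::string, std::vector<Entry>> table_;
};
```

Because clients ask `Import` at bind time, a server can be replaced by exporting the new host and unexporting the old one, with no client changes.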
How does one handle failures?
- client sends a request
- suppose network discards the request packet
- what will client observe?
- what should the client do?
- how long should the client wait before retransmitting?
- Now suppose the network delivered request, but discarded response
- what will client observe?
- what should the client do?
- Simple retransmission leads to "at-least-once" behavior
- Are there any situations where at-least-once is OK? yes: operations without side effects, such as read-only operations (more generally, idempotent operations)
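A small simulation of why plain retransmission gives at-least-once behavior, assuming a hypothetical `LossyNetwork` that always delivers the request but drops the first `drop_replies` responses. The server's handler has a side effect (incrementing a counter), so every retransmission executes it again.

```cpp
#include <cassert>
#include <optional>

// Server whose handler has a side effect: each execution bumps a counter.
struct CounterServer {
  int value = 0;
  int handle_increment() { return ++value; }
};

// Simulated network: the request always arrives and is executed,
// but the first drop_replies responses are lost on the way back.
struct LossyNetwork {
  int drop_replies = 0;
  std::optional<int> send_increment(CounterServer& s) {
    int reply = s.handle_increment();
    if (drop_replies > 0) {
      --drop_replies;
      return std::nullopt;  // client sees this as a timeout
    }
    return reply;
  }
};

// Client: on timeout, simply retransmit the same request.
std::optional<int> call_with_retry(LossyNetwork& net, CounterServer& s, int max_tries) {
  for (int i = 0; i < max_tries; ++i)
    if (auto r = net.send_increment(s)) return r;
  return std::nullopt;  // gave up; the client cannot tell how often it ran
}
```

With one lost reply, the client gets a single answer but the increment ran twice; a read-only handler would be unaffected.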
- How can RPC system provide better behavior?
- remember the RPC requests it has seen, detect duplicates
- requests need unique IDs, ID repeated on retransmit
- what to do if server sees a duplicate?
- client still needs the reply
- so server remembers replies to previously executed RPCs
- this yields "exactly-once" behavior
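A sketch of the duplicate-detecting server, assuming each request carries a unique 64-bit ID that the client reuses on retransmission (in practice often a client ID combined with a sequence number). `DedupServer` is a hypothetical name; the `replies_` map plays the role of the replies[] table.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Remembers the reply for every request ID it has executed. A retransmission
// carries the same ID and gets the saved reply instead of re-executing.
// This is only exactly-once while the server does not crash.
class DedupServer {
 public:
  int handle_increment(std::uint64_t req_id) {
    auto it = replies_.find(req_id);
    if (it != replies_.end()) return it->second;  // duplicate: resend old reply
    int reply = ++value_;                         // first time: execute
    replies_[req_id] = reply;
    return reply;
  }
  int value() const { return value_; }
 private:
  int value_ = 0;
  std::map<std::uint64_t, int> replies_;  // the replies[] table
};
```

Note that the table grows without bound; when entries can be discarded is taken up at the end of these notes.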
- Exactly-once is difficult
- Why?
- the hard case: server crashes just as it receives request
- did it execute, and crash before sending reply?
- or crash before executing?
- should server re-execute procedure call after restart?
- Birrell RPC protocol provides "at-most-once"
- server says "ok" -> executed once
- server says "???" -> zero or one times, unknown which
- e.g., the server restarted and forgot its replies[] table of completed RPCs
- Key remaining problem w/ at-most-once
- client sends request
- server crashes before sending reply
- server restarts
- client re-sends request
- how does server realize it is a duplicate?
- What exact situation do we need to detect?
- retransmitted request
- server might have seen earlier transmission before crash
- How to detect cross-crash retransmission?
- server has a number that uniquely identifies restarts
- Birrell calls it an ID; the more common term now is a (server) nonce
- client obtains server's ID when it first connects during "bind"
- client sends server ID in every RPC request
- server checks whether ID in request == current ID
- if equal, then any previous transmission will be in server's replies[] table
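- if not equal, the request was meant for an earlier incarnation of the server, whose replies[] table is gone, so the server cannot tell whether it already executed the request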
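A sketch of the nonce check combined with the replies[] table. It assumes the application state (`value_`) survives a crash, e.g. because it lives on disk, while the replies[] table is in memory and is lost; `NonceServer`, `Status::MAYBE_EXECUTED`, and `restart` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

enum class Status { OK, MAYBE_EXECUTED };
struct Reply {
  Status status;
  int value;
};

class NonceServer {
 public:
  explicit NonceServer(std::uint64_t boot_nonce) : nonce_(boot_nonce) {}

  // "bind": the client learns the current server nonce.
  std::uint64_t bind() const { return nonce_; }

  Reply handle_increment(std::uint64_t nonce, std::uint64_t req_id) {
    if (nonce != nonce_)  // sent to an earlier incarnation: outcome unknown
      return {Status::MAYBE_EXECUTED, 0};
    auto it = replies_.find(req_id);
    if (it != replies_.end()) return {Status::OK, it->second};  // duplicate
    int v = ++value_;
    replies_[req_id] = v;
    return {Status::OK, v};
  }

  // Simulated crash + restart: the in-memory replies[] table is lost and a
  // fresh nonce is chosen; value_ is assumed to persist (e.g. on disk).
  void restart(std::uint64_t new_nonce) {
    replies_.clear();
    nonce_ = new_nonce;
  }

 private:
  std::uint64_t nonce_;
  int value_ = 0;
  std::map<std::uint64_t, int> replies_;
};
```

A client holding a stale nonce gets `MAYBE_EXECUTED` rather than a silent re-execution, which is the at-most-once guarantee; it must re-bind to obtain the new nonce.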
- What to do when server detects cross-crash retransmission?
- might have been executed already, might not have been
- send an error back to the client and hope it knows how to deal with it
- this situation is pretty rare
- How to ensure server never reuses a nonce?
- server could store ID on disk (if it has a disk)
- or use boot time (if it has access to a clock)
- or use a big random number (if it has a source of randomness)
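One way to combine the last two options, assuming the server has both a sane clock and `std::random_device`: put the boot time in seconds in the high 32 bits and random bits in the low 32 bits, so two restarts within the same second still differ with high probability. `make_server_nonce` is a hypothetical helper, not a standard API.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Choose a nonce for this server incarnation (call once per boot).
// High 32 bits: boot time in seconds since the epoch.
// Low 32 bits: random, in case two boots fall in the same second.
std::uint64_t make_server_nonce() {
  auto now = std::chrono::system_clock::now().time_since_epoch();
  auto secs = std::chrono::duration_cast<std::chrono::seconds>(now).count();
  std::random_device rd;
  std::uint64_t rnd = rd();  // random_device yields unsigned int values
  return (static_cast<std::uint64_t>(secs) << 32) | (rnd & 0xffffffffULL);
}
```

If the clock can be reset backwards, the time component alone is not safe, which is one reason to keep the random bits (or a counter stored on disk).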
- When can server discard old saved return values?
- after, e.g., five seconds? no! a delayed or retransmitted request may still arrive later
- server can discard if client will never retransmit
- have client tell server which replies it has received
- streamlined version:
- client gives requests ascending numbers, called xids
- includes xid in every request
- server includes xid in reply
- client tells server the highest xid for which it has received all previous replies
- includes this in every request
- server can then discard saved replies with xid <= that number
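A sketch of the streamlined scheme for a single client, assuming xids start at 1 and each request carries `(xid, acked)`, where `acked` is the highest xid for which the client has all replies. A real server would keep one such table per client; `XidServer` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

class XidServer {
 public:
  // Returns the reply, or nullopt for a stale duplicate whose reply the
  // client has already acknowledged (so it will ignore any answer anyway).
  std::optional<int> handle(std::uint64_t xid, std::uint64_t acked) {
    if (acked > acked_) {
      acked_ = acked;
      // The client will never retransmit xids <= acked: drop their replies.
      replies_.erase(replies_.begin(), replies_.upper_bound(acked));
    }
    if (xid <= acked_) return std::nullopt;       // stale duplicate
    auto it = replies_.find(xid);
    if (it != replies_.end()) return it->second;  // retransmission: resend reply
    int v = ++value_;                             // new request: execute once
    replies_[xid] = v;
    return v;
  }
  std::size_t saved_replies() const { return replies_.size(); }

 private:
  std::uint64_t acked_ = 0;
  int value_ = 0;
  std::map<std::uint64_t, int> replies_;  // replies[] keyed by xid
};
```

Keeping `acked_` matters: without it, a very late duplicate of an already-discarded xid would look new and be executed a second time.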