Course:CICS525/Notes/Infrastructure-1

Introduction to RPCs

goal: easy-to-program network communication
hides most details of client/server communication
makes call look much like ordinary procedure call
server handlers also look much like ordinary procedures

Alternatives

directly programming with sockets
distributed-shared memory
map/reduce
MPI (message passing)

RPC is widely used

XML RPC
Java RMI
Sun RPC

RPC structure

client application
client stubs
RPC library
network
server RPC library
dispatch
server application handlers
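
The layering above can be sketched in a few lines. This is a minimal in-process illustration, not any particular RPC library: `Dispatcher`, `SquareStub`, and the `Request`/`Reply` structs are hypothetical names, and a direct function call stands in for the network.

```cpp
#include <cassert>
#include <functional>
#include <map>

// Hypothetical wire format: a procedure number plus one int argument.
struct Request { int proc; int arg; };
struct Reply { bool ok; int value; };

// Server side: dispatch maps procedure numbers to application handlers.
class Dispatcher {
public:
    void reg(int proc, std::function<int(int)> handler) {
        assert(static_cast<bool>(handler));   // refuse empty handlers
        handlers_[proc] = handler;
    }
    Reply dispatch(const Request& req) const {
        auto it = handlers_.find(req.proc);
        if (it == handlers_.end()) return Reply{false, 0};   // unknown procedure
        return Reply{true, it->second(req.arg)};
    }
private:
    std::map<int, std::function<int(int)>> handlers_;
};

// Client side: the stub hides request construction behind an ordinary call.
class SquareStub {
public:
    enum { kSquare = 1 };   // procedure number agreed on by both sides
    explicit SquareStub(const Dispatcher& server) : server_(server) {}
    int square(int x) const {
        // Stands in for "send request over the network, wait for reply".
        Reply r = server_.dispatch(Request{kSquare, x});
        return r.ok ? r.value : -1;
    }
private:
    const Dispatcher& server_;
};
```

The application registers a handler with `reg(SquareStub::kSquare, ...)` on the server side and calls `stub.square(7)` on the client side; neither piece of application code sees the `Request` or `Reply` structs.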

Example: lock server

a server that manages locks on objects (e.g., files)
server handles grants and releases of locks
want client app code to look like

acquire(lid)     
...
release(lid)

much like a local call, very convenient
actually

lc->acquire(lid)
...
lc->release(lid)

lc indicates which lock server we want to talk to
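
A minimal sketch of what might sit behind `lc`, assuming a `lock_client` stub class that forwards to a `lock_server`. Both class bodies are illustrative assumptions: a real client would marshal `lid` and send it over the network, and a real server would probably make `acquire` wait rather than return false when the lock is held.

```cpp
#include <cassert>
#include <set>

typedef unsigned long long lock_id;   // hypothetical lock identifier type

// Server-side state: the set of locks currently held.
class lock_server {
public:
    // Returns false if the lock is already held (simplification: no waiting).
    bool acquire(lock_id lid) { return held_.insert(lid).second; }
    // Returns false if the lock was not held.
    bool release(lock_id lid) { return held_.erase(lid) == 1; }
private:
    std::set<lock_id> held_;
};

// Client-side stub: each method stands in for marshal + send + wait for reply.
class lock_client {
public:
    explicit lock_client(lock_server* srv) : srv_(srv) { assert(srv != nullptr); }
    bool acquire(lock_id lid) { return srv_->acquire(lid); }
    bool release(lock_id lid) { return srv_->release(lid); }
private:
    lock_server* srv_;   // stands in for the lock server's network address
};
```

With `lock_client* lc` pointing at such a stub, the application code is exactly the `lc->acquire(lid) ... lc->release(lid)` pattern shown above.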

Easy challenges

how client indicates server and procedure
automatic marshaling/unmarshaling of arguments/return value
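
As an illustration of marshaling, the sketch below packs a request into bytes and unpacks it again. The layout (a 32-bit procedure number followed by a 32-bit lock id, both big-endian) and the function names are assumptions made for this example, not the format of any specific RPC system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append a 32-bit value in big-endian (network) byte order.
void put_u32(std::string& buf, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        buf.push_back(static_cast<char>((v >> shift) & 0xff));
}

// Read a 32-bit big-endian value at offset pos and advance pos past it.
std::uint32_t get_u32(const std::string& buf, std::size_t& pos) {
    assert(pos + 4 <= buf.size());   // caller must supply a complete message
    std::uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v = (v << 8) | static_cast<unsigned char>(buf[pos++]);
    return v;
}

// Assumed request layout: procedure number, then lock id.
std::string marshal_request(std::uint32_t proc, std::uint32_t lid) {
    std::string buf;
    put_u32(buf, proc);
    put_u32(buf, lid);
    return buf;
}

void unmarshal_request(const std::string& buf, std::uint32_t& proc, std::uint32_t& lid) {
    std::size_t pos = 0;
    proc = get_u32(buf, pos);
    lid = get_u32(buf, pos);
}
```

Fixing the byte order in the wire format is what lets client and server run on machines with different native representations.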

Hard challenge

failures
network may drop, delay, duplicate, re-order messages
network might break altogether, and maybe recover
server might crash, and maybe re-start
how to provide easy-to-use behavior to clients?

The original RPC mechanism (Birrell and Nelson)

from Xerox PARC, which invented LANs and workstations in the 1970s
main concerns
- naming
- minimize number of packets (slow CPUs -> slow packet handling)
- failures
Naming RPC servers
- Used Grapevine, a name service (a little like DNS)
- Export(service name, server host)
- Import(service name) -> server host
- level of indirection
  - clients need not hard-code server names
  - multiple servers (use closest)
  - replacement of servers
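
The Export/Import indirection can be sketched as a small in-memory table. This is not Grapevine's interface: `NameService`, `export_service`, `import_service`, and `withdraw` are hypothetical names, and returning the first registered host is a placeholder for a real "use closest" policy.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

class NameService {
public:
    // Export: a server announces that it offers the named service.
    void export_service(const std::string& service, const std::string& host) {
        assert(!host.empty());
        hosts_[service].push_back(host);
    }
    // Import: a client asks for a host offering the service ("" if none).
    std::string import_service(const std::string& service) const {
        auto it = hosts_.find(service);
        if (it == hosts_.end() || it->second.empty()) return "";
        return it->second.front();   // placeholder for "pick the closest"
    }
    // Withdraw a host, e.g. when a server is being replaced.
    void withdraw(const std::string& service, const std::string& host) {
        std::vector<std::string>& v = hosts_[service];
        v.erase(std::remove(v.begin(), v.end(), host), v.end());
    }
private:
    std::map<std::string, std::vector<std::string>> hosts_;
};
```

Because clients only ever name the service, a server can be swapped out by withdrawing one host and exporting another, with no change to client code.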

How does one handle failures?

client sends a request
suppose network discards the request packet
- what will client observe?
- what should the client do?
- how long should client wait before retransmitting?
Now suppose the network delivered request, but discarded response
- what will client observe?
- what should the client do?
- Simple retransmission leads to "at-least-once" behavior
  - Are there any situations where at-least-once is OK? yes: if the operation has no side effects, e.g. read-only operations
How can RPC system provide better behavior?
- remember the RPC requests it has seen, detect duplicates
- requests need unique IDs, ID repeated on retransmit
- what to do if server sees a duplicate?
  - client still needs the reply
  - so server remembers replies to previously executed RPCs
  - this yields "exactly-once" behavior, as long as the server never crashes and the client keeps retransmitting until it gets a reply
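
The difference between plain retransmission and duplicate detection can be seen in a small simulation. Everything here is an assumption made for illustration: the `Server` class, the `call` helper, and the `lost_replies` parameter, which makes the simulated network drop the first N replies. Requests always arrive in this sketch, and the server never crashes.

```cpp
#include <cassert>
#include <map>

struct Request { unsigned id; int amount; };   // id is repeated on retransmit

// Server whose handler has a side effect: it adds to a running total.
class Server {
public:
    explicit Server(bool filter_duplicates) : filter_(filter_duplicates), total_(0) {}
    int handle(const Request& req) {
        if (filter_) {
            auto it = replies_.find(req.id);
            if (it != replies_.end()) return it->second;   // duplicate: resend saved reply
        }
        total_ += req.amount;                              // execute the procedure
        if (filter_) replies_[req.id] = total_;            // remember the reply
        return total_;
    }
    int total() const { return total_; }
private:
    bool filter_;
    int total_;
    std::map<unsigned, int> replies_;
};

// Client that retransmits the same request until a reply gets through.
int call(Server& srv, const Request& req, int lost_replies) {
    assert(lost_replies >= 0);
    for (;;) {
        int reply = srv.handle(req);        // request delivered and handled
        if (lost_replies-- > 0) continue;   // reply lost: time out, retransmit
        return reply;
    }
}
```

With two lost replies, `call` on `Server(false)` runs the handler three times and the total ends at three times the amount (at-least-once). On `Server(true)` the handler runs once and the two retransmissions are answered from the saved reply.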

Exactly-once is difficult
- Why?
- the hard case: server crashes just as it receives request
  - did it execute, and crash before sending reply?
  - or crash before executing?
  - should server re-execute procedure call after restart?
- Birrell RPC protocol provides "at-most-once"
  - server says "ok" -> executed once
  - server says "???" -> zero or one times, unknown which
  - "???" is the answer after a server restart, since the restart wipes the replies[] table of completed RPCs

Key remaining problem w/ at-most-once
- client sends request
- server crashes before sending reply
- server restarts
- client re-sends request
- how does server realize it is a duplicate?

What exact situation do we need to detect?
- retransmitted request
- server might have seen earlier transmission before crash
- How to detect cross-crash retransmission?
- server has a number that uniquely identifies restarts
- Birrell calls it an ID; the more common term today is a nonce (server nonce)
- client obtains server's ID when it first connects during "bind"
- client sends server ID in every RPC request
- server checks whether ID in request == current ID
  - if equal, then any previous transmission will be in server's replies[] table
  - if not equal, the server has restarted since the client bound, so its replies[] table may be missing an earlier transmission
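
The server ID check can be sketched as follows. This is an illustration under assumptions, not the Birrell and Nelson packet format: the `Request`/`Reply` layouts, the `RPC_OK`/`RPC_UNKNOWN` status names, and modelling a restart as constructing a new `Server` with a different ID are all hypothetical.

```cpp
#include <cassert>
#include <map>

// RPC_UNKNOWN is the "???" outcome: executed zero or one times.
enum Status { RPC_OK, RPC_UNKNOWN };

struct Request { unsigned server_id; unsigned xid; int arg; };
struct Reply { Status status; int value; };

class Server {
public:
    explicit Server(unsigned id) : id_(id), counter_(0) {}
    // Called at bind time: the client learns the current server ID.
    unsigned bind() const { return id_; }
    Reply handle(const Request& req) {
        if (req.server_id != id_)              // client bound before a restart
            return Reply{RPC_UNKNOWN, 0};      // cannot tell if it ran already
        auto it = replies_.find(req.xid);
        if (it != replies_.end())              // duplicate within this incarnation
            return Reply{RPC_OK, it->second};
        counter_ += req.arg;                   // execute the procedure
        replies_[req.xid] = counter_;          // remember the reply
        return Reply{RPC_OK, counter_};
    }
private:
    unsigned id_;
    int counter_;
    std::map<unsigned, int> replies_;          // lost on restart
};
```

A client that receives `RPC_UNKNOWN` has to bind again to get the new ID before its requests will be executed.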

What to do when server detects cross-crash retransmission?
- might have been executed already, might not have been
- send error back to the client and hope it knows how to deal with it
- this situation is pretty rare

How to ensure server never reuses a nonce?
- server could store ID on disk (if it has a disk)
- or use boot time (if it has access to a clock)
- or use a big random number (if it has a source of randomness)
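
The three options might look like the sketch below. The function names and the counter-file format are assumptions for illustration. Only the disk counter rules out reuse outright, and only if the write reaches stable storage before the server starts serving, which this sketch does not enforce. The clock variant relies on the clock never going backwards, and the random variant relies on collisions being improbable.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <fstream>
#include <random>
#include <string>

// Option 1: persistent counter. Read the last ID, store and return last + 1.
std::uint64_t nonce_from_disk(const std::string& path) {
    std::uint64_t last = 0;
    std::ifstream in(path);
    if (in) in >> last;            // a missing file is treated as first boot
    in.close();
    std::ofstream out(path, std::ios::trunc);
    out << (last + 1);
    out.flush();
    assert(out.good());            // a real server should also force it to disk
    return last + 1;
}

// Option 2: boot time in microseconds since the epoch.
std::uint64_t nonce_from_boot_time() {
    auto now = std::chrono::system_clock::now().time_since_epoch();
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::microseconds>(now).count());
}

// Option 3: a 64-bit random number built from two draws.
std::uint64_t nonce_from_randomness() {
    std::random_device rd;
    std::uint64_t hi = rd();
    std::uint64_t lo = rd();
    return (hi << 32) | (lo & 0xffffffffu);
}
```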

When can server discard old saved return values?
- after e.g. five seconds? no! a delayed retransmission could arrive later and be executed again
- server can discard if client will never retransmit
- have client tell server which replies it has received
- streamlined version:
  - client gives requests ascending numbers, called xids
  - includes xid in every request
  - server includes xid in reply
  - client tells server the highest xid for which it has received all previous replies
  - client includes this in every request
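
A minimal sketch of the streamlined scheme for a single client, with xids starting at 1. The `acked` field name, the `stale` flag, and the doubling handler are assumptions for illustration. The server also remembers the highest acknowledged xid so that a delayed duplicate of an already discarded request is not executed a second time.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

struct Request {
    unsigned xid;     // ascending per-client request number
    unsigned acked;   // client has received replies for all xids <= acked
    int arg;
};
struct Reply { bool stale; int value; };

class Server {
public:
    Server() : acked_(0) {}
    Reply handle(const Request& req) {
        assert(req.acked < req.xid);   // cannot acknowledge the current request
        if (req.acked > acked_) {
            acked_ = req.acked;
            // Client will never retransmit these, so drop their saved replies.
            replies_.erase(replies_.begin(), replies_.upper_bound(acked_));
        }
        if (req.xid <= acked_) return Reply{true, 0};   // delayed duplicate
        auto it = replies_.find(req.xid);
        if (it != replies_.end()) return Reply{false, it->second};
        int result = req.arg * 2;      // stand-in for the real procedure
        replies_[req.xid] = result;
        return Reply{false, result};
    }
    std::size_t saved() const { return replies_.size(); }
private:
    unsigned acked_;
    std::map<unsigned, int> replies_;
};
```

The saved-reply table stays bounded by the number of requests the client has outstanding, with no timer involved.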
