id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	launchpad_bug
877	reentrant confusion in mutable publish causes incorrect UCWE	warner		"The incident reports in #786 reveal a problem in the mutable-file publish
control loop which is likely to trigger an inappropriate Uncoordinated Write
Error when a server goes away between mapupdate and publish. Here's the
setup:

 * the user modifies an existing healthy mutable file, 3-of-10 encoding
 * Mapupdate identifies a bunch of shares on their servers, Retrieve gets the
   data, Publish prepares to upload a new version. Let's say each share N
   goes to server N (sh1 to server 1, etc), and that there's an extra copy of
   share 3 on server 11. And let's say that server 3 disconnected while the
   Retrieve was running (so it's still in the sharemap but the
   {{{RemoteReference}}} is now dead).
 * {{{loop()}}} is entered the first time
  * it calls {{{update_goal()}}} to figure out where all the shares want to
    live. Since mapupdate found all shares, no new shares need to be placed:
    each known share will be updated, including the doubled share 3.
  * it then calls {{{_send_shares()}}} to deliver messages to the 11 servers
    that we're using
  * {{{_send_shares()}}} builds a dictionary named {{{all_tw_vectors}}} that
    contains all of the messages it will send, batched by serverid (so it can
    send exactly one message to each server, even if there are multiple
    shares per server). Then it loops through this dictionary (let's pretend
    in sorted order even though {{{.items()}}} is random), and for each
    peerid, it updates {{{self.outstanding}}} to remember that the message is
    in flight (so our next pass through {{{loop()}}} won't send duplicate
    messages), and then sends the message. The relevant code looks like this:

{{{
        for (peerid, tw_vectors) in all_tw_vectors.items():
            for shnum in shnums:
                self.outstanding.add( (peerid, shnum) )
            d = self._do_testreadwrite(peerid, secrets, tw_vectors, read_vector)
            d.addCallbacks(self._got_write_answer, self._got_write_error)
            d.addCallback(self.loop)
}}}

  * when we've processed peerid 1 and 2, and we're working on 3, the
    {{{_do_testreadwrite}}} call will see the disconnected server and errback
    with {{{DeadReferenceError}}}. This immediately fires
    {{{_got_write_error}}}, which log.UNUSUALs the event, removes the share
    from {{{self.outstanding}}}, and adds the peer to {{{self.bad_peers}}} so
    it becomes ineligible for other shares. Then the Deferred chain moves
    onto the next call, and invokes {{{self.loop}}}. This all happens while
    the for loop is on the stack: the rest of the server messages have not
    been processed yet.
   * now {{{loop()}}} is entered the second time, reentrantly. It updates the
     goal, which removes the sh3-on-X mapping but doesn't add any new shares
     because sh3 is already happily living on server3. It computes the
     ""needed"" list of shares to send, which does not include 1 or 2 or 3
     (because those messages are in {{{self.outstanding}}}, but does include
     all of the others (because they haven't been added to
     {{{self.outstanding}}} yet, even though they will be as soon as the
     suspended call to {{{_send_shares}}} gets a chance).
   * Then it (re)enters {{{_send_shares()}}}, which dutifully sends requests
     to the needed list: 4, 5, etc, up to 10. The inner call to
     {{{_send_shares}}} exits, as does the inner call to {{{loop()}}}
  * now the outer call to {{{_send_shares()}}} regains control, in the middle
    of its for loop. It moves to server 4, adds the messages to the
    {{{self.outstanding}}} set (which is ignored, because it was already
    there from the inner call), sends the (duplicate) message, and continues
    onwards.
  * The result is that we get duplicate identical messages for the later
    servers.
 * The first message to server 4 comes back with a successful result: the
   test vector matches, so the write was performed. That message is removed
   from {{{self.outstanding}}} with ""discard"". The servermap is updated with
   the new share we know to be in place on this server.
 * Later, the second message to server 4 comes back. It is removed from
   {{{self.outstanding}}}, but it was already gone, and ""discard"" ignores
   this. This message has failed, because the test vector it provided did not
   match, because the first message had been processed before the second
   message arrived.
  * this log.WEIRDs an event (and triggers an Incident). The log messages say
    things like: somebody modified the share on us: shnum=8: I thought they
    had #1832:R=lzzf, but testv reported #1832:R=lzzf . The first version in
    this message comes from the servermap, and indicates that it had been
    updated upon receipt of the first response. The second version in this
    message come from the server, and indicates that the first response was
    successful.
  * it also marks the upload as being ""surprised"", since it appears that
    someone else has modified the share since our mapupdate was done
 * the surprise causes this Publish to fail with
   {{{UncoordinatedWriteError}}}

The actual numbers of shares in Zooko's #786 104230-vyc6byy incident were
different than this simplified example. There were actually 20 shares
scattered among 12 servers due to lots of servers being offline during
previous updates. The server which was lost only held one share (sh0), and
that share was also stored on a different server. The order in which the for
loop was traversed is unknown. There were two server messages still
outstanding when the duplicate response was received, ending the Publish
process. There may have been lots of duplicate messages sent, for which the
responses didn't arrive until after the Incident log was closed.

The upshot is that the reentrant call to {{{loop()}}} can cause trouble. The
best fix is probably to change the code in {{{_send_shares}}} to fire
{{{loop()}}} via an eventual-send, specifically with foolscap's
{{{fireEventually}}} Deferred-creating function. Since this is the only place
in Publish that invokes {{{loop()}}} (other than the initial priming call),
this should be sufficient to prevent reentrancy.

"	defect	closed	critical	1.6.0	code-mutable	1.5.0	fixed	availability upload