﻿id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	launchpad_bug
758	maximum recursion depth exceeded in Tahoe2PeerSelector	zooko		"I just got this traceback from a node using the volunteergrid:

{{{
/usr/local/lib/python2.6/dist-packages/Twisted-8.2.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py, line 328 in _runCallbacks
326                    self._runningCallbacks = True
327                    try:
328                        self.result = callback(self.result, *args, **kw)
329                    finally:
Locals
callback	<bound method Tahoe2PeerSelector._got_response of <Tahoe2PeerSelector for upload nztp5>>
self	<Deferred at 0x4d93a70 current result: None>
args	(<PeerTracker for peer xjy2clbq and SI nztp5>, set([19, 20]), [<PeerTracker for peer gapnio7p and SI nztp5>])
kw	{}
/home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 384 in _got_response
382
383        # now loop
384        return self._loop()
385
Locals
self	<Tahoe2PeerSelector for upload nztp5>
/home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 284 in _loop
282            self.contacted_peers.extend(self.contacted_peers2)
283            self.contacted_peers[:] = []
284            return self._loop()
285        else:
Locals
self	<Tahoe2PeerSelector for upload nztp5>
/home/volunteergrid/src/tahoe/src/allmydata/immutable/upload.py, line 284 in _loop
282            self.contacted_peers.extend(self.contacted_peers2)
283            self.contacted_peers[:] = []
284            return self._loop()
285        else:
Locals
self	<Tahoe2PeerSelector for upload nztp5>
}}}

(And so forth until the maximum recursion depth is exceeded.)

There are only 15 servers on the volunteergrid right now.  The clause shown, around [source:src/allmydata/immutable/upload.py#L279 line 279 of upload.py], handles the case where every server has already been asked to hold a share and then asked to hold a second share; it iterates so that the servers can be asked to hold a third-or-greater share.

It appears that this loop never terminated before the recursion depth was exceeded.  We have [source:src/allmydata/tahoe/test/test_upload.py@20090625021809-4233b-9cdbf53c54025466fea8ab97bed668cd0017b142#L483 tests of this case], but...  Hey waitaminute!  That code in upload.py says:

{{{
elif self.contacted_peers2:
    # we've finished the second-or-later pass. Move all the remaining
    # peers back into self.contacted_peers for the next pass
    self.contacted_peers.extend(self.contacted_peers2)
    self.contacted_peers[:] = []
    return self._loop()
}}}

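That can't be right.  It extends self.contacted_peers and then immediately empties it, leaving self.contacted_peers2 untouched, so the elif condition is still true on the next call and _loop() keeps calling itself.  It probably means to say: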

{{{
    self.contacted_peers.extend(self.contacted_peers2)
    del self.contacted_peers2[:]
}}}
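
The difference between the two versions can be checked with a toy model. This is only a sketch under stated assumptions, not the real Tahoe2PeerSelector: ToySelector, run and the buggy flag are made-up names, and the first branch simply pops a peer instead of sending a real query.

```python
# Toy model of the second-or-later-pass clause. All names here are
# hypothetical and exist only to illustrate the suspected bug.

class ToySelector(object):
    def __init__(self, peers, buggy):
        self.contacted_peers = []
        self.contacted_peers2 = list(peers)
        self.buggy = buggy
        self.passes = 0

    def _loop(self):
        if self.contacted_peers:
            # Stand-in for asking the next peer to hold another share.
            self.contacted_peers.pop(0)
            return self._loop()
        elif self.contacted_peers2:
            # Finished a pass: move the remaining peers back for the next one.
            self.passes += 1
            self.contacted_peers.extend(self.contacted_peers2)
            if self.buggy:
                # Original line: empties the list that was just extended,
                # so contacted_peers2 never shrinks and this branch repeats.
                self.contacted_peers[:] = []
            else:
                # Proposed fix: empty the list the peers were moved out of.
                del self.contacted_peers2[:]
            return self._loop()
        else:
            return self.passes


def run(buggy):
    selector = ToySelector(['peer%d' % i for i in range(15)], buggy)
    try:
        return selector._loop()
    except RuntimeError:
        # RecursionError (Python 3) is a subclass of RuntimeError.
        return 'recursion limit hit'
```

With 15 peers, run(True) hits the recursion limit, while run(False) finishes after a single pass.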

Why doesn't that test catch this bug?

But it is too late at night for me to be messing with such stuff.

If someone in a different timezone or a different sleep schedule wants to fix the test to catch this bug while I sleep, that would be great!  :-)"	defect	closed	major	1.5.0	code-peerselection	1.4.1	fixed			
