id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	launchpad_bug
2017	non-deterministic test hang on OpenBSD	zooko	sickness	"sickness's !OpenBSD buildslave showed a test timeout:

{{{
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: twisted.internet.defer.TimeoutError: <allmydata.test.test_runner.RunNode testMethod=test_client_no_noise> (test_client_no_noise) still running at 240.0 secs

allmydata.test.test_runner.RunNode.test_client_no_noise
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug)
<DelayedCall 0x816eb82c [0.00169348716736s] called=0 cancelled=0 LoopingCall<0.01>(RunNode._poll, *(<function _node_has_started at 0x7ff29ed4>, 1373030506.664452), **{})()>

allmydata.test.test_runner.RunNode.test_client_no_noise
-------------------------------------------------------------------------------
Ran 1139 tests in 1784.336s

FAILED (skips=15, expectedFailures=3, errors=2, successes=1120)
}}}

(from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/27)
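
The leftover {{{DelayedCall}}} hints at how the test waits for the node. It is a {{{LoopingCall}}} that calls {{{RunNode._poll}}} every 0.01 seconds with a check function ({{{_node_has_started}}}) and what looks like an absolute cutoff timestamp. Below is a simplified, synchronous stand-in for that pattern in plain Python. It is not the actual Tahoe-LAFS code, and the function name {{{poll}}} and its parameters are hypothetical. If the check never succeeds and the cutoff lies later than the 240-second test timeout, the timeout fires first and the still-scheduled poll is reported as an unclean reactor. That would produce exactly the pair of errors above.

{{{
#!python
import time


def poll(check, interval=0.01, timeout=10.0):
    # Hypothetical synchronous analogue of a LoopingCall-based poller:
    # call check() every `interval` seconds until it returns a true value,
    # or raise TimeoutError once the cutoff (now + timeout) has passed.
    cutoff = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= cutoff:
            raise TimeoutError('condition not met within %.2f s' % timeout)
        time.sleep(interval)
}}}

For instance, {{{poll(lambda: True)}}} returns at once, while {{{poll(lambda: False, timeout=0.05)}}} raises {{{TimeoutError}}} after about 50 ms.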

Rerunning the tests on the exact same build (using Buildbot's ""force rebuild"" feature) succeeded:

https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28

In that run (build 28), those tests finished in under 20 seconds each, far below the 240-second timeout:

{{{
 19.917 seconds: allmydata.test.test_runner.RunNode.test_client
 13.758 seconds: allmydata.test.test_runner.RunNode.test_client_no_noise
}}}

(from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28/steps/test/logs/timings)

So there is a non-deterministic bug, manifesting on sickness's buildslave, that causes those two tests to hang.
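
A single passing rerun says little about how often the hang occurs. As a rough guide, assume each run hangs independently with some unknown probability p. Then n consecutive runs all pass with probability {{{(1 - p) ** n}}}, so a rare hang needs many repetitions before it shows up. The hypothetical helper below computes the smallest n for which at least one hang would be seen with a given confidence.

{{{
#!python
import math


def runs_needed(p, confidence=0.95):
    # Smallest n such that 1 - (1 - p) ** n >= confidence, i.e. enough
    # independent runs to see at least one hang with that probability.
    # Assumes 0 < p < 1 and 0 < confidence < 1.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))
}}}

For example, if one run in ten hung (p = 0.1), {{{runs_needed(0.1)}}} is 29. The same reasoning applies when checking whether an older revision is really free of the bug.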

Questions:

 1. Does this happen on any other buildslaves?
 2. Did this ever happen before the recent patches that changed the behavior of iputil ([b0883807361830c609dff1677c3cb34fd64d3ebb], [f97b8e5e1df75284aa9b89dd830f8728040eab67], [08590b1f6a880d51751fdcacea6a007ebc568f2e], [16b245563db2f6ca71b9332b06debbe3e1d734b4], [b31a4f6e870cb56efa40c785a868a944b964e8b9], [a493ee0bb641175ecf918e28fce4d25df15994b6], [6104950ed8a7a356eed2218f2df958d074022eea], [f77ec470d75f4b8fb81b1abca4ee3b73f1ad8b22], [8e31d66cd0b0821ccaa2c7c259e7d6f262ad4738], [6a445d73bc5253ec4ae0dec70af02e33bc869cf6])?

I suspect that those iputil patches caused this hang.
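
If the hang turns out to be reproducible, the suspect range can be narrowed to a single patch by bisecting the history, which is what {{{git bisect}}} automates. Because the failure is intermittent, each step has to run the tests several times before calling a revision good. A minimal sketch follows. It assumes the revisions are listed oldest first, with the first one known good and the last one known bad. It also assumes a hypothetical {{{is_bad(rev)}}} callback that checks out {{{rev}}}, runs the suspect tests repeatedly, and returns True if any run hung.

{{{
#!python
def first_bad(revisions, is_bad):
    # revisions: ordered oldest to newest; revisions[0] is known good and
    # revisions[-1] is known bad. is_bad(rev) is a hypothetical callback
    # that runs the flaky tests several times at rev and returns True if
    # any run failed. Assumes a single change introduced the bug.
    lo, hi = 0, len(revisions) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]
}}}

A revision misjudged as good sends the search in the wrong direction. So the repeat count per step matters more than the number of steps, which is only a handful for ten patches.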

sickness: could you please run the unit tests from the current trunk version repeatedly with trial's {{{--until-failure}}} option? {{{./bin/tahoe debug trial --until-failure allmydata.test}}} (See [wiki:HowToWriteTests] for more options.) If you can reliably reproduce the problem, then would you use git to rewind to before those patches and see if that makes the problem go away? Thanks!"	defect	new	normal	undecided	code	1.10.0		iputil heisenbug openbsd		
