﻿id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	launchpad_bug
3945	Retry moody GitHub Actions steps	sajith	sajith	"Some workflows fail on !GitHub Actions either because the tests are moody or !GitHub Actions itself is moody.  Example: https://github.com/tahoe-lafs/tahoe-lafs/actions/runs/3556042011/jobs/5973114477

{{{
2022-11-27T01:09:13.3236569Z [FAIL]
2022-11-27T01:09:13.3236873Z Traceback (most recent call last):
2022-11-27T01:09:13.3237795Z   File ""D:\a\tahoe-lafs\tahoe-lafs\.tox\py310-coverage\lib\site-packages\allmydata\util\pollmixin.py"", line 47, in _convert_done
2022-11-27T01:09:13.3238340Z     f.trap(PollComplete)
2022-11-27T01:09:13.3239166Z   File ""D:\a\tahoe-lafs\tahoe-lafs\.tox\py310-coverage\lib\site-packages\twisted\python\failure.py"", line 480, in trap
2022-11-27T01:09:13.3244610Z     self.raiseException()
2022-11-27T01:09:13.3245778Z   File ""D:\a\tahoe-lafs\tahoe-lafs\.tox\py310-coverage\lib\site-packages\twisted\python\failure.py"", line 504, in raiseException
2022-11-27T01:09:13.3259779Z     raise self.value.with_traceback(self.tb)
2022-11-27T01:09:13.3260719Z   File ""D:\a\tahoe-lafs\tahoe-lafs\.tox\py310-coverage\lib\site-packages\twisted\internet\defer.py"", line 206, in maybeDeferred
2022-11-27T01:09:13.3261254Z     result = f(*args, **kwargs)
2022-11-27T01:09:13.3261923Z   File ""D:\a\tahoe-lafs\tahoe-lafs\.tox\py310-coverage\lib\site-packages\allmydata\util\pollmixin.py"", line 69, in _poll
2022-11-27T01:09:13.3262457Z     self.fail(""Errors snooped, terminating early"")
2022-11-27T01:09:13.3262935Z twisted.trial.unittest.FailTest: Errors snooped, terminating early
2022-11-27T01:09:13.3263257Z 
2022-11-27T01:09:13.3263547Z allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
2022-11-27T01:09:13.3263989Z ===============================================================================
2022-11-27T01:09:13.3264288Z [ERROR]
2022-11-27T01:09:13.3264609Z Traceback (most recent call last):
2022-11-27T01:09:13.3265386Z   File ""D:\a\tahoe-lafs\tahoe-lafs\.tox\py310-coverage\lib\site-packages\allmydata\util\rrefutil.py"", line 26, in _no_get_version
2022-11-27T01:09:13.3268422Z     f.trap(Violation, RemoteException)
2022-11-27T01:09:13.3269217Z   File ""D:\a\tahoe-lafs\tahoe-lafs\.tox\py310-coverage\lib\site-packages\twisted\python\failure.py"", line 480, in trap
2022-11-27T01:09:13.3269711Z     self.raiseException()
2022-11-27T01:09:13.3270396Z   File ""D:\a\tahoe-lafs\tahoe-lafs\.tox\py310-coverage\lib\site-packages\twisted\python\failure.py"", line 504, in raiseException
2022-11-27T01:09:13.3270976Z     raise self.value.with_traceback(self.tb)
2022-11-27T01:09:13.3271553Z foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=4vg7) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
2022-11-27T01:09:13.3271977Z 
2022-11-27T01:09:13.3272448Z allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
2022-11-27T01:09:13.3272884Z ===============================================================================
2022-11-27T01:09:13.3273207Z [ERROR]
2022-11-27T01:09:13.3273530Z Traceback (most recent call last):
2022-11-27T01:09:13.3274088Z Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=4vg7) (during method=RIUploadHelper.tahoe.allmydata.com:upload)
2022-11-27T01:09:13.3274512Z 
2022-11-27T01:09:13.3274802Z allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
2022-11-27T01:09:13.3275437Z -------------------------------------------------------------------------------
2022-11-27T01:09:13.3275958Z Ran 1776 tests in 1302.475s
2022-11-27T01:09:13.3276195Z 
2022-11-27T01:09:13.3276435Z FAILED (skips=27, failures=1, errors=2, successes=1748)
}}}

That failure has nothing to do with the changes that triggered that workflow; it might be a good idea to retry that step.

Some other workflows take a long time to run. For example, on https://github.com/tahoe-lafs/tahoe-lafs/actions/runs/3556042011/jobs/5973114477, the jobs `coverage (ubuntu-latest, pypy-37)`, `integration (ubuntu-latest, 3.7)`, and `integration (ubuntu-latest, 3.9)` ran for a long time.  Although in this specific instance the integration tests are failing due to #3943, it might be a good idea to stop such jobs after a reasonable timeout, retry them, and give up altogether after a fixed number of tries instead of letting them spin for many hours on end.
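
To make the idea concrete, here is a minimal sketch of the retry policy described above: run a command, kill it if it exceeds a timeout, retry, and give up after a fixed number of attempts. It is only an illustration in plain Python, not the retry-step action or anything that exists in the repository; the function name `run_with_retries` and the parameters `max_attempts` and `timeout_seconds` are hypothetical.

```python
import subprocess
import sys


def run_with_retries(cmd, max_attempts=3, timeout_seconds=3600):
    '''Run cmd until it exits with status 0.

    Returns the 1-based number of the attempt that succeeded,
    or None if every attempt failed or timed out.
    (Hypothetical helper, for illustration only.)
    '''
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child process if the timeout
            # expires, then raises TimeoutExpired.
            result = subprocess.run(cmd, timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            print(f'attempt {attempt}: timed out after {timeout_seconds}s')
            continue
        if result.returncode == 0:
            return attempt
        print(f'attempt {attempt}: exit code {result.returncode}')
    # Give up instead of spinning for hours on end.
    return None


if __name__ == '__main__':
    # A trivially passing command, standing in for a test step.
    print(run_with_retries([sys.executable, '-c', 'pass']))
```

A CI step could wrap its test command this way, for example `run_with_retries(['tox', '-e', 'py310-coverage'], max_attempts=3, timeout_seconds=1800)`, where the attempt count and timeout are assumed values that would need tuning. Note that blind retries can also hide real intermittent bugs, so the number of attempts should stay small.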

This perhaps would be a good use of [https://github.com/marketplace/actions/retry-step actions/retry-step]?"	task	closed	normal	undecided	dev-infrastructure	n/a	wontfix			
