From dyioulos at yahoo.com Thu Oct 1 08:24:07 2009 From: dyioulos at yahoo.com (Dimitri Yioulos) Date: Thu, 1 Oct 2009 08:24:07 -0700 (PDT) Subject: [tahoe-dev] Error all of a sudden Message-ID: <108296.40749.qm@web33102.mail.mud.yahoo.com> Hello all. When trying to do a backup during the past couple of days, I get the following error(s): dyioulos at server:~$ pfexec tahoe backup /usr/local/doc tahoe:bpbackup Traceback (most recent call last): File "/usr/local/allmydata-tahoe-1.5.0/support/bin/tahoe", line 8, in load_entry_point('allmydata-tahoe==1.5.0', 'console_scripts', 'tahoe')() File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/runner.py", line 91, in run rc = runner(sys.argv[1:]) File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/runner.py", line 78, in runner rc = cli.dispatch[command](so) File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/cli.py", line 456, in backup rc = tahoe_backup.backup(options) File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/tahoe_backup.py", line 370, in backup return bu.run() File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/tahoe_backup.py", line 212, in run new_backup_dircap = self.process(options.from_dir, latest_backup_dircap) File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/tahoe_backup.py", line 265, in process newfilecap, metadata = self.upload(childpath) File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/tahoe_backup.py", line 352, in upload raiseHTTPError("Error during file PUT", resp) File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/tahoe_backup.py", line 18, in raiseHTTPError raise HTTPError(msg) allmydata.scripts.tahoe_backup.HTTPError: Error during file PUT: 500 Internal Server Error Exception

<class 'foolscap.tokens.RemoteException'>: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): File "/usr/lib/python2.5/site-packages/foolscap/call.py", line 667, in _done self.request.complete(res) File "/usr/lib/python2.5/site-packages/foolscap/call.py", line 53, in complete self.deferred.callback(res) File "/usr/lib/python2.5/site-packages/twisted/internet/defer.py", line 239, in callback self._startRunCallbacks(result) File "/usr/lib/python2.5/site-packages/twisted/internet/defer.py", line 304, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File "/usr/lib/python2.5/site-packages/twisted/internet/defer.py", line 317, in _runCallbacks self.result = callback(self.result, *args, **kw) File "/usr/lib/python2.5/site-packages/allmydata/immutable/upload.py", line 379, in _got_response return self._loop() File "/usr/lib/python2.5/site-packages/allmydata/immutable/upload.py", line 302, in _loop self.shares_of_happiness) allmydata.interfaces.NotEnoughSharesError: peer selection failed for <Tahoe2PeerSelector for upload jzhq3>: placed 0 shares out of 10 total (10 homeless), sent 106 queries to 106 peers, 0 queries placed some shares, 106 placed none, got 0 errors ]'>

<class 'foolscap.tokens.RemoteException'>: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): File "/usr/lib/python2.5/site-packages/foolscap/call.py", line 667, in _done self.request.complete(res) File "/usr/lib/python2.5/site-packages/foolscap/call.py", line 53, in complete self.deferred.callback(res) File "/usr/lib/python2.5/site-packages/twisted/internet/defer.py", line 239, in callback self._startRunCallbacks(result) File "/usr/lib/python2.5/site-packages/twisted/internet/defer.py", line 304, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File "/usr/lib/python2.5/site-packages/twisted/internet/defer.py", line 317, in _runCallbacks self.result = callback(self.result, *args, **kw) File "/usr/lib/python2.5/site-packages/allmydata/immutable/upload.py", line 379, in _got_response return self._loop() File "/usr/lib/python2.5/site-packages/allmydata/immutable/upload.py", line 302, in _loop self.shares_of_happiness) allmydata.interfaces.NotEnoughSharesError: peer selection failed for <Tahoe2PeerSelector for upload jzhq3>: placed 0 shares out of 10 total (10 homeless), sent 106 queries to 106 peers, 0 queries placed some shares, 106 placed none, got 0 errors ]'>

Sorry for the length. What's up? Thanks. Dimitri From zooko at zooko.com Thu Oct 1 09:10:26 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Thu, 1 Oct 2009 10:10:26 -0600 Subject: [tahoe-dev] Error all of a sudden In-Reply-To: <108296.40749.qm@web33102.mail.mud.yahoo.com> References: <108296.40749.qm@web33102.mail.mud.yahoo.com> Message-ID: Are you using the allmydata.com production grid? I didn't know that you are you a customer of allmydata.com . I thought you were using the OKFN grid. Anyway, the allmydata.com production grid is currently full, partially due to lots of servers being full of data and partially due to some of the servers that do have available disk being temporarily off-line. When a grid is full, writes (and backups and repairs) fail but reads works. allmydata.com is working on adding capacity and bringing up the offline servers. If you are using the OKFN grid then maybe *it* is full. Regards, Zooko On Thursday,2009-10-01, at 9:24 , Dimitri Yioulos wrote: > Hello all. > > When trying to do a backup during the past couple of days, I get > the following error(s): > > dyioulos at server:~$ pfexec tahoe backup /usr/local/doc tahoe:bpbackup > Traceback (most recent call last): > File "/usr/local/allmydata-tahoe-1.5.0/support/bin/tahoe", line > 8, in > load_entry_point('allmydata-tahoe==1.5.0', 'console_scripts', > 'tahoe')() > File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > runner.py", line 91, in run > rc = runner(sys.argv[1:]) > File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > runner.py", line 78, in runner > rc = cli.dispatch[command](so) > File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > cli.py", line 456, in backup > rc = tahoe_backup.backup(options) > File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > tahoe_backup.py", line 370, in backup > return bu.run() > File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > tahoe_backup.py", line 212, in run > new_backup_dircap = self.process(options.from_dir, > latest_backup_dircap) > File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > tahoe_backup.py", line 265, in process > newfilecap, metadata = self.upload(childpath) > File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > tahoe_backup.py", line 352, in upload > raiseHTTPError("Error during file PUT", resp) > File "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > tahoe_backup.py", line 18, in raiseHTTPError > raise HTTPError(msg) > allmydata.scripts.tahoe_backup.HTTPError: Error during file PUT: > 500 Internal Server Error Exception head>

<class > 'foolscap.tokens.RemoteException'>: <RemoteException around > '[CopiedFailure instance: Traceback from remote host -- Traceback > (most recent call last): > File "/usr/lib/python2.5/site-packages/foolscap/call.py", line > 667, in _done > self.request.complete(res) > File "/usr/lib/python2.5/site-packages/foolscap/call.py", line > 53, in complete > self.deferred.callback(res) > File "/usr/lib/python2.5/site-packages/twisted/internet/ > defer.py", line 239, in callback > self._startRunCallbacks(result) > File "/usr/lib/python2.5/site-packages/twisted/internet/ > defer.py", line 304, in _startRunCallbacks > self._runCallbacks() > --- <exception caught here> --- > File "/usr/lib/python2.5/site-packages/twisted/internet/ > defer.py", line 317, in _runCallbacks > self.result = callback(self.result, *args, **kw) > File "/usr/lib/python2.5/site-packages/allmydata/immutable/ > upload.py", line 379, in _got_response > return self._loop() > File "/usr/lib/python2.5/site-packages/allmydata/immutable/ > upload.py", line 302, in _loop > self.shares_of_happiness) > allmydata.interfaces.NotEnoughSharesError: peer selection failed > for <Tahoe2PeerSelector for upload jzhq3>: placed 0 shares > out of 10 total (10 homeless), sent 106 queries to 106 peers, 0 > queries placed some shares, 106 placed none, got 0 errors > ]'>

name="tracebackEnd">

<class > 'foolscap.tokens.RemoteException'>: <RemoteException around > '[CopiedFailure instance: Traceback from remote host -- Traceback > (most recent call last): > File "/usr/lib/python2.5/site-packages/foolscap/call.py", line > 667, in _done > self.request.complete(res) > File "/usr/lib/python2.5/site-packages/foolscap/call.py", line > 53, in complete > self.deferred.callback(res) > File "/usr/lib/python2.5/site-packages/twisted/internet/ > defer.py", line 239, in callback > self._startRunCallbacks(result) > File "/usr/lib/python2.5/site-packages/twisted/internet/ > defer.py", line 304, in _startRunCallbacks > self._runCallbacks() > --- <exception caught here> --- > File "/usr/lib/python2.5/site-packages/twisted/internet/ > defer.py", line 317, in _runCallbacks > self.result = callback(self.result, *args, **kw) > File "/usr/lib/python2.5/site-packages/allmydata/immutable/ > upload.py", line 379, in _got_response > return self._loop() > File "/usr/lib/python2.5/site-packages/allmydata/immutable/ > upload.py", line 302, in _loop > self.shares_of_happiness) > allmydata.interfaces.NotEnoughSharesError: peer selection failed > for <Tahoe2PeerSelector for upload jzhq3>: placed 0 shares > out of 10 total (10 homeless), sent 106 queries to 106 peers, 0 > queries placed some shares, 106 placed none, got 0 errors > ]'>

> > Sorry for the length. What's up? > > Thanks. > > Dimitri > > > > _______________________________________________ > tahoe-dev mailing list > tahoe-dev at allmydata.org > http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev From dyioulos at yahoo.com Thu Oct 1 10:12:48 2009 From: dyioulos at yahoo.com (Dimitri Yioulos) Date: Thu, 1 Oct 2009 10:12:48 -0700 (PDT) Subject: [tahoe-dev] Error all of a sudden In-Reply-To: Message-ID: <881961.59609.qm@web33107.mail.mud.yahoo.com> Zooko, I'm actually doing sysadmin work for someone who is a customer of allmydata.com. I'm using the OKFN grid partly for testing, and partly for use by my day job company. Sorry for any confusion, and thanks for the information. If there's no storage capacity available, wouldn't that pressage a need to 1) let users know ahead of time that no space is available, and: 2) to replace the error message I saw with a simple "No storage capacity available at this time", or something akin to that. Just a suggestion. Best, Dimitri --- On Thu, 10/1/09, Zooko Wilcox-O'Hearn wrote: > From: Zooko Wilcox-O'Hearn > Subject: Re: [tahoe-dev] Error all of a sudden > To: tahoe-dev at allmydata.org > Date: Thursday, October 1, 2009, 12:10 PM > Are you using the allmydata.com > production grid?? I didn't know that? > you are you a customer of allmydata.com .? I thought > you were using? > the OKFN grid.? Anyway, the allmydata.com production > grid is? > currently full, partially due to lots of servers being full > of data? > and partially due to some of the servers that do have > available disk? > being temporarily off-line.? When a grid is full, > writes (and backups? > and repairs) fail but reads works.? allmydata.com is > working on? > adding capacity and bringing up the offline servers. > > If you are using the OKFN grid then maybe *it* is full. > > Regards, > > Zooko > > On Thursday,2009-10-01, at 9:24 , Dimitri Yioulos wrote: > > > Hello all. > > > > When trying to do a backup during the past couple of > days, I get? > > the following error(s): > > > > dyioulos at server:~$ pfexec tahoe backup /usr/local/doc > tahoe:bpbackup > > Traceback (most recent call last): > >???File > "/usr/local/allmydata-tahoe-1.5.0/support/bin/tahoe", > line? > > 8, in > >? > ???load_entry_point('allmydata-tahoe==1.5.0', > 'console_scripts',? > > 'tahoe')() > >???File > "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > > runner.py", line 91, in run > >? ???rc = runner(sys.argv[1:]) > >???File > "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > > runner.py", line 78, in runner > >? ???rc = > cli.dispatch[command](so) > >???File > "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > > cli.py", line 456, in backup > >? ???rc = > tahoe_backup.backup(options) > >???File > "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > > tahoe_backup.py", line 370, in backup > >? ???return bu.run() > >???File > "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > > tahoe_backup.py", line 212, in run > >? ???new_backup_dircap = > self.process(options.from_dir,? > > latest_backup_dircap) > >???File > "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > > tahoe_backup.py", line 265, in process > >? ???newfilecap, metadata = > self.upload(childpath) > >???File > "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > > tahoe_backup.py", line 352, in upload > >? 
???raiseHTTPError("Error during > file PUT", resp) > >???File > "/usr/local/allmydata-tahoe-1.5.0/src/allmydata/scripts/ > > tahoe_backup.py", line 18, in raiseHTTPError > >? ???raise HTTPError(msg) > > allmydata.scripts.tahoe_backup.HTTPError: Error during > file PUT:? > > 500 Internal Server Error > Exception > > head>

class="error"><class? > > 'foolscap.tokens.RemoteException'>: > <RemoteException around? > > '[CopiedFailure instance: Traceback from remote host > -- Traceback? > > (most recent call last): > >???File > "/usr/lib/python2.5/site-packages/foolscap/call.py", > line? > > 667, in _done > >? ???self.request.complete(res) > >???File > "/usr/lib/python2.5/site-packages/foolscap/call.py", > line? > > 53, in complete > >? ???self.deferred.callback(res) > >???File > "/usr/lib/python2.5/site-packages/twisted/internet/ > > defer.py", line 239, in callback > >? > ???self._startRunCallbacks(result) > >???File > "/usr/lib/python2.5/site-packages/twisted/internet/ > > defer.py", line 304, in _startRunCallbacks > >? ???self._runCallbacks() > > --- <exception caught here> --- > >???File > "/usr/lib/python2.5/site-packages/twisted/internet/ > > defer.py", line 317, in _runCallbacks > >? ???self.result = > callback(self.result, *args, **kw) > >???File > "/usr/lib/python2.5/site-packages/allmydata/immutable/ > > upload.py", line 379, in _got_response > >? ???return self._loop() > >???File > "/usr/lib/python2.5/site-packages/allmydata/immutable/ > > upload.py", line 302, in _loop > >? ???self.shares_of_happiness) > > allmydata.interfaces.NotEnoughSharesError: peer > selection failed? > > for <Tahoe2PeerSelector for upload > jzhq3>: placed 0 shares? > > out of 10 total (10 homeless), sent 106 queries to 106 > peers, 0? > > queries placed some shares, 106 placed none, got 0 > errors > > ]'>

class="stackTrace">
> name="tracebackEnd">

class="error"><class? > > 'foolscap.tokens.RemoteException'>: > <RemoteException around? > > '[CopiedFailure instance: Traceback from remote host > -- Traceback? > > (most recent call last): > >???File > "/usr/lib/python2.5/site-packages/foolscap/call.py", > line? > > 667, in _done > >? ???self.request.complete(res) > >???File > "/usr/lib/python2.5/site-packages/foolscap/call.py", > line? > > 53, in complete > >? ???self.deferred.callback(res) > >???File > "/usr/lib/python2.5/site-packages/twisted/internet/ > > defer.py", line 239, in callback > >? > ???self._startRunCallbacks(result) > >???File > "/usr/lib/python2.5/site-packages/twisted/internet/ > > defer.py", line 304, in _startRunCallbacks > >? ???self._runCallbacks() > > --- <exception caught here> --- > >???File > "/usr/lib/python2.5/site-packages/twisted/internet/ > > defer.py", line 317, in _runCallbacks > >? ???self.result = > callback(self.result, *args, **kw) > >???File > "/usr/lib/python2.5/site-packages/allmydata/immutable/ > > upload.py", line 379, in _got_response > >? ???return self._loop() > >???File > "/usr/lib/python2.5/site-packages/allmydata/immutable/ > > upload.py", line 302, in _loop > >? ???self.shares_of_happiness) > > allmydata.interfaces.NotEnoughSharesError: peer > selection failed? > > for <Tahoe2PeerSelector for upload > jzhq3>: placed 0 shares? > > out of 10 total (10 homeless), sent 106 queries to 106 > peers, 0? > > queries placed some shares, 106 placed none, got 0 > errors > > > ]'>

> > > > Sorry for the length.? What's up? > > > > Thanks. > > > > Dimitri > > > > > > > > _______________________________________________ > > tahoe-dev mailing list > > tahoe-dev at allmydata.org > > http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev > > _______________________________________________ > tahoe-dev mailing list > tahoe-dev at allmydata.org > http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev > From francois at ctrlaltdel.ch Thu Oct 1 12:11:44 2009 From: francois at ctrlaltdel.ch (Francois Deppierraz) Date: Thu, 01 Oct 2009 21:11:44 +0200 Subject: [tahoe-dev] Android Tahoe-LAFS client Message-ID: <4AC4FEF0.9050206@ctrlaltdel.ch> Hi folks, An experimental version of a Tahoe-LAFS client running on the Android platform is now available for installation directly from the market (lookup "tahoe"). File browsing, download and upload is currently supported. It's pretty young right now so you have to expect bugs here and there. A running and accessible tahoe node is required because this application connects to the WAPI, it's not a Tahoe node. The TestGrid is configured by default until you modify the settings. The source code is available on github: http://github.com/ctrlaltdel/TahoeLAFS-android I'd be happy to receive feedback from interested users, bug reports, ideas, flames. Fran?ois From zookog at gmail.com Thu Oct 1 13:26:02 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Thu, 1 Oct 2009 14:26:02 -0600 Subject: [tahoe-dev] Android Tahoe-LAFS client In-Reply-To: <4AC4FEF0.9050206@ctrlaltdel.ch> References: <4AC4FEF0.9050206@ctrlaltdel.ch> Message-ID: Fran?ois: Sweet! Way to go! Will you please send this announcement to tahoe-announce at allmydata.org as well? By the way, this reminds me that we probably need to rename the thing which acts as a Tahoe-LAFS storage protocol client and a Tahoe-LAFS HTTP protocol server from "client" to "gateway" (or something). See how you had to struggle to explain whether it was just the *client* client or the LAFS-client-and-HTTP-server that runs on Android? The same problem arises when sketching on a napkin the Network (and Reliance) Topoogy Diagram: [1]. So unless someone has a better idea, I'm going to start calling the thing in the middle there the "gateway" and stop calling it "the client" (even though it *is* an LAFS/SSL/TCP/IP client). Regards, Zooko [1] http://testgrid.allmydata.org:3567/file/URI%3ACHK%3A6opfg3uvhrvixw7c5u7kye3bjq%3Aeiqvmup7yog2rlwa2acoom2jrmgciuwr5bw5vkemxuqyg2jfbsqq%3A3%3A10%3A50638/@@named=/network-and-reliance-topology-thumb512.png On Thu, Oct 1, 2009 at 1:11 PM, Francois Deppierraz wrote: > Hi folks, > > An experimental version of a Tahoe-LAFS client running on the Android > platform is now available for installation directly from the market > (lookup "tahoe"). File browsing, download and upload is currently supported. > > It's pretty young right now so you have to expect bugs here and there. > > A running and accessible tahoe node is required because this application > connects to the WAPI, it's not a Tahoe node. The TestGrid is configured > by default until you modify the settings. > > The source code is available on github: > > http://github.com/ctrlaltdel/TahoeLAFS-android > > I'd be happy to receive feedback from interested users, bug reports, > ideas, flames. 
> > Fran?ois > _______________________________________________ > tahoe-dev mailing list > tahoe-dev at allmydata.org > http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev > From trac at allmydata.org Fri Oct 2 09:24:57 2009 From: trac at allmydata.org (tahoe-lafs) Date: Fri, 02 Oct 2009 16:24:57 -0000 Subject: [tahoe-dev] [tahoe-lafs] #811: fix fuse impl_c a.k.a. blackmatch Message-ID: <037.4441bf2f35856bdde472d9ee51bef65a@allmydata.org> #811: fix fuse impl_c a.k.a. blackmatch ---------------------------+------------------------------------------------ Reporter: zooko | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend | Version: 1.5.0 Keywords: review fuse | Launchpad_bug: ---------------------------+------------------------------------------------ Thomas Delaet contributed a patch to make fuse impl_c a.k.a. blackmatch work: http://allmydata.org/pipermail/tahoe-dev/2009-September/002923.html I've downloaded his patch from that email and attached it to this ticket. Please review and apply! -- Ticket URL: tahoe-lafs secure decentralized file storage grid From shawn at willden.org Fri Oct 2 10:38:22 2009 From: shawn at willden.org (Shawn Willden) Date: Fri, 2 Oct 2009 11:38:22 -0600 Subject: [tahoe-dev] Removing the dependency of immutable read caps on UEB computation Message-ID: <200910021138.22641.shawn@willden.org> I'd like to have a little discussion on whether or not it makes sense in the new immutable cap design to remove the dependency on UEB computation. As background for any who aren't familiar with it, and to confirm my own understanding, the UEB, or URI Extension Block, is a block of hashes that provides strong, multi-way integrity verification of the immutable file. Specifically, it contains: 1. The root of a Merkle tree on the file plaintext 2. A flat hash of the file plaintext 3. The root of a Merkle tree on the file ciphertext 4. A flat hash of the file ciphertext 5. Roots of Merkle trees on each share of the FEC-encoded ciphertext That's a lot of hashes, and it provides strong integrity guarantees. It provides a way to verify the integrity of the plaintext, the ciphertext and each encoded share of the ciphertext. That's all very good. A copy of the UEB is stored with each share. The current immutable read cap design embeds a hash of the UEB in the URI. Indeed, this 32-byte hash is comprises most of the length of current immutable read caps. David-Sarah Hopwood's Elk Point design applies Zooko's ideas about how to combine security and integrity parameters to make the UEB hash 'implicit' in the read and verify caps, but it's still present. The disadvantage of including the UEB hash in the read and verify caps, whether explicitly or implicitly, is that it means that FEC coding must be completed before the caps can be generated. This is unfortunate, because without it, it would be possible to efficiently compute read caps separate from the upload process, and even long before the upload is performed. I can think of many applications for that. The larger issue, though, is that the present design binds a given read cap to a specific choice of encoding parameters. This makes it impossible to change those parameters later, to accommodate for changing reliability requirements or changing grid size/structure, without finding a way to update all extant copies of the original cap, wherever they may be held. 
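[Editor's illustration] To make the dependency concrete, here is a minimal sketch of why a current-style read cap cannot be produced until FEC has run: the UEB hash that the cap commits to covers the per-share Merkle roots, which only exist after the encoding parameters have been chosen and applied. The helper names, the use of plain SHA-256 instead of Tahoe's tagged hashes, and the serialization are all assumptions for illustration, not the actual cap format.

    import hashlib

    def sha256(data):
        return hashlib.sha256(data).digest()

    def make_ueb(plaintext_root, ciphertext_root, share_roots, k, n):
        # share_roots only exist after erasure coding, so the UEB -- and any
        # cap derived from it -- is bound to this particular (k, n) choice
        fields = [plaintext_root, ciphertext_root,
                  str(k).encode(), str(n).encode()] + list(share_roots)
        return b"|".join(fields)

    def make_readcap(encryption_key, ueb):
        # the cap commits to the whole UEB, hence to the encoding parameters
        return "URI:CHK:%s:%s" % (encryption_key.hex(), sha256(ueb).hex())
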
To address these issues, I propose splitting the UEB into two parts, one part that contains the plaintext and ciphertext hashes, and another that contains the share tree roots and the encoding parameters. Call them UEB1 and UEB2. UEB1 and any values derived from it can then be computed without doing FEC computations, and without choosing specific encoding parameters. Based on UEB1, a client with the verify cap can verify the assembled ciphertext and a client with the read cap can verify the decrypted plaintext. What they can't do is to verify the integrity of a specific share. Putting the UEB2 in the shares is the proximate solution to share validation, but raises the issue of how to validate the UEB2. Since it would be undesirable to allow anyone with read access to the file the ability to fake valid UEB2s, this requires introduction of an additional cap, a "share update" cap, which is not derivable from the read or verify caps. I suppose you could also call it a "repair cap". One way to do this, using the nomenclature from David-Sarah's Elk Point immutable diagram, is to add a W key, from which K1 is derived by hashing. In addition, an ECDSA key pair is derived from W. The UEB2 is signed with the ECDSA private key, and the signature is the UEB2 verifier, stored with each share. The "share update" cap would consist of the SID and the private key. W could also be used as a 'master' cap from which all others can be derived. Another possibility is to use the Elk Point mutable structure and fix the content by including the UEB1 data in the information hashed to produce T|U and signed to produce Sig_KR. To retain the idempotent-put characteristic of Tahoe immutable files, W can be a content hash, rather than a random value, and KD must be derived from W or omitted from the series of hashes that produces S. It may be valuable for both security analysis and code complexity to make mutable and immutable files be very similar in structure. The obvious downside of both of those approaches is that they introduce a need for asymmetric signatures, where immutable files previously required only hashing and symmetric encryption. I don't think there's any way to maintain share integrity while removing the dependency of the caps on FEC parameters. Personally, I think being able to re-structure the encoding without updating all of the caps is sufficient justification to accept the use of asymmetric signatures in immutable file buckets, and being able to generate caps without performing FEC computations is a very nice bonus. Comments? Shawn. From nejucomo at gmail.com Fri Oct 2 14:51:29 2009 From: nejucomo at gmail.com (Nathan) Date: Fri, 2 Oct 2009 14:51:29 -0700 Subject: [tahoe-dev] Android Tahoe-LAFS client In-Reply-To: <4AC4FEF0.9050206@ctrlaltdel.ch> References: <4AC4FEF0.9050206@ctrlaltdel.ch> Message-ID: <13f991410910021451t6bfe64e0haae295013d00400a@mail.gmail.com> Nice! I had one of those "wow, I live in the future" moments when I was browsing the public testgrid directory within a minute of reading your email! I haven't had luck uploading yet. Sounds like it's time to start learning how to do runtime android debugging. Nathan On Thu, Oct 1, 2009 at 12:11 PM, Francois Deppierraz wrote: > Hi folks, > > An experimental version of a Tahoe-LAFS client running on the Android > platform is now available for installation directly from the market > (lookup "tahoe"). File browsing, download and upload is currently supported. > > It's pretty young right now so you have to expect bugs here and there. 
> > A running and accessible tahoe node is required because this application > connects to the WAPI, it's not a Tahoe node. The TestGrid is configured > by default until you modify the settings. > > The source code is available on github: > > http://github.com/ctrlaltdel/TahoeLAFS-android > > I'd be happy to receive feedback from interested users, bug reports, > ideas, flames. > > Fran?ois > _______________________________________________ > tahoe-dev mailing list > tahoe-dev at allmydata.org > http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev > From warner at lothar.com Sat Oct 3 00:26:16 2009 From: warner at lothar.com (Brian Warner) Date: Sat, 03 Oct 2009 00:26:16 -0700 Subject: [tahoe-dev] Removing the dependency of immutable read caps on UEB computation In-Reply-To: <200910021138.22641.shawn@willden.org> References: <200910021138.22641.shawn@willden.org> Message-ID: <4AC6FC98.1080809@lothar.com> Shawn Willden wrote: > Specifically, it contains: > > 1. The root of a Merkle tree on the file plaintext > 2. A flat hash of the file plaintext > 3. The root of a Merkle tree on the file ciphertext > 4. A flat hash of the file ciphertext > 5. Roots of Merkle trees on each share of the FEC-encoded ciphertext Incidentally, we removed 1 and 2 forever ago, to squash the partial-information-guessing-attack. We'd like to bring them back, safely encrypted with the readcap, to detect integrity problems relating to having the wrong key or having a buggy AES implementation. > To address these issues, I propose splitting the UEB into two parts Interesting. As you point out, I'm not sure I like the introduction of an extra layer of caps (and an asymmetric key) into the immutable file scheme. It raises the question: who should hold onto these caps? Where should they put them? I suppose the original uploader of the file is the special party who then has the ability to re-encode it, but they'll have to store it somewhere, and it feels wasteful to put an extra layer of caps in the dirnodes (along with the writecap, readcap, and traversalcap) just to track an object that so few people will actually be able to use. Adding an asymmetric key might also introduce some new attack vectors. If I give you a readcap and claim that it points to a certain contract, and you sign that readcap to sign the contract, can I pull any tricks by also holding on to this newly-introduced signing key? I guess if the readcap covers UEB1, then I can't forge a document or cause you to sign something else, but I can produce shares that will look completely valid during fetch and decode but then fail the ciphertext check. That means I can make it awfully hard to actually download the document (since without an effective share hash, you can't know which were the bad shares, so you can try using other ones). (the structure for this would probably put H(UEB1|VerifyKey) in the readcap, and then store a signed UEB2 in each share). I guess we should figure out the use case here. Re-encoding the file is something that you'd want to do when the grid has changed in size, such that it is now appropriate to use different parameters than before, right? And if you're changing 'k', then you'll certainly need to replace all the existing shares. So the goal appears to be to do all the work of uploading a new copy of the file, but allow the old caps to start referencing the new version. Deriving the filecap without performing FEC doesn't feel like a huge win to me.. it's just a performance difference in testing for convergence, right? 
And if you (or someone you trust) uploaded the file originally, you (or they) could just retain a table mapping file hash to readcap (like tahoe's backupdb), letting you do this file-to-filecap computation even faster. I certainly see more value in being able to change the encoding parameters after the fact. But I'm kinda hopeful that there might be a way to allow re-encoding without such a big change (perhaps by allocating more space in the share-hash-tree, to allow same-k-bigger-N changes). I *am* intrigued by the idea of immutable files being just locked-down variants of mutable files. A mutable-file readcap plus a hash of the expected contents (i.e. H(UEB1)) would achieve this pretty well.. might not be too much longer than our current immutable readcaps, and we could keep the encoding-parameter-sensitive parts (UEB2) in the signed (and therefore mutable) portion, so they could be changed later. cheers, -Brian From francois at ctrlaltdel.ch Sat Oct 3 07:39:16 2009 From: francois at ctrlaltdel.ch (Francois Deppierraz) Date: Sat, 03 Oct 2009 16:39:16 +0200 Subject: [tahoe-dev] Android Tahoe-LAFS client In-Reply-To: References: <4AC4FEF0.9050206@ctrlaltdel.ch> Message-ID: <4AC76214.5030203@ctrlaltdel.ch> Zooko O'Whielacronx wrote: > Sweet! Way to go! Will you please send this announcement to > tahoe-announce at allmydata.org as well? I'd like to wait a little while before announcing it to a wider audience. Fran?ois From francois at ctrlaltdel.ch Sat Oct 3 07:55:21 2009 From: francois at ctrlaltdel.ch (Francois Deppierraz) Date: Sat, 03 Oct 2009 16:55:21 +0200 Subject: [tahoe-dev] Android Tahoe-LAFS client In-Reply-To: <13f991410910021451t6bfe64e0haae295013d00400a@mail.gmail.com> References: <4AC4FEF0.9050206@ctrlaltdel.ch> <13f991410910021451t6bfe64e0haae295013d00400a@mail.gmail.com> Message-ID: <4AC765D9.8030502@ctrlaltdel.ch> Hi, Nathan wrote: > I haven't had luck uploading yet. Sounds like it's time to start > learning how to do runtime android debugging. Uploading is currently done in the UI thread which is not recommended. This means that the whole application stall during a file upload, I plan to fix this one as soon as possible. In the meantime, uploading small files should probably work. If it crashed on your phone, I'm interested in the backtrace. When your phone is connected to your computer via a UBS cable, you should be able to get all the necessary logs with the 'adb' command available from the Android SDK. adb logcat debug You also can retrieve this log by running the 'SendLog' application directly on your phone. This application also support sending the log file by email. Thanks for giving it a try, Cheers, Fran?ois From zookog at gmail.com Sun Oct 4 07:52:38 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Sun, 4 Oct 2009 08:52:38 -0600 Subject: [tahoe-dev] Removing the dependency of immutable read caps on UEB computation In-Reply-To: <200910021138.22641.shawn@willden.org> References: <200910021138.22641.shawn@willden.org> Message-ID: Shawn: Interesting. The goal is to be able to upload a file with a different K encoding parameter (num shares require to reconstruct) but the same plaintext and the same (?) encryption key and have it match the same immutable file cap? And, as another goal, to make it faster to compute an immutable file cap from a plaintext and key? (I.e., you can compute an immutable file cap without producing all the erasure coding shares?) I'm not sure that I understood the goals. 
Hm, it seems like some solutions to that first goal would open the possibility of a "shapeshifting immutable file attack" -- the attack where the original creator of an immutable file makes it so that some downloaders see a different file than others see (like Christan Grothoff's Hall of Fame entry [1]). In order to be secure against a shapeshifting immutable file, the read-cap itself needs to contain something derived from the ciphertext (or even better, the plaintext). Also, would storage servers keep different "encoding parameter variants" of the same share? So one storage server might have, under storage index X, a share from the 3-of-10 encoding of that file as well as a share of the 6-of-10 encoding of the same file? And the downloader would specify in its request which encoding of the file it is looking for? I guess these desiderata (use same immutable file cap for different K, and compute immutable file cap efficiently) should be ticketed and/or added to http://allmydata.org/trac/tahoe/wiki/NewCapDesign . I don't think that we're going to fit all of the desired features into the NewCap format (especially because "simplicity of format" is one of the desired features!), but I would like to have thorough documentation of what things are desired and what the trade-offs are. Regards, Zooko [1] http://hacktahoe.org From shawn at willden.org Sun Oct 4 08:45:28 2009 From: shawn at willden.org (Shawn Willden) Date: Sun, 4 Oct 2009 09:45:28 -0600 Subject: [tahoe-dev] =?iso-8859-1?q?Removing_the_dependency_of_immutable_r?= =?iso-8859-1?q?ead_caps_on_UEB=09computation?= In-Reply-To: <4AC6FC98.1080809@lothar.com> References: <200910021138.22641.shawn@willden.org> <4AC6FC98.1080809@lothar.com> Message-ID: <200910040945.28444.shawn@willden.org> On Saturday 03 October 2009 01:26:16 am Brian Warner wrote: > Incidentally, we removed 1 and 2 forever ago, to squash the > partial-information-guessing-attack. Makes sense. The diagrams in the docs should be updated. > > To address these issues, I propose splitting the UEB into two parts > > Interesting. As you point out, I'm not sure I like the introduction of > an extra layer of caps (and an asymmetric key) into the immutable file > scheme. It raises the question: who should hold onto these caps? Where > should they put them? I suppose the original uploader of the file is the > special party who then has the ability to re-encode it, but they'll have > to store it somewhere, and it feels wasteful to put an extra layer of > caps in the dirnodes (along with the writecap, readcap, and > traversalcap) just to track an object that so few people will actually > be able to use. Since this is for immutable files, there is currently no writecap or traversalcap, just a readcap and perhaps a verifycap. This scheme would require either adding either a share-update cap or providing a master cap (from which share-update and read caps could be computed). > Adding an asymmetric key might also introduce some new attack vectors. > If I give you a readcap and claim that it points to a certain contract, > and you sign that readcap to sign the contract, can I pull any tricks by > also holding on to this newly-introduced signing key? I guess if the > readcap covers UEB1, then I can't forge a document or cause you to sign > something else, but I can produce shares that will look completely valid > during fetch and decode but then fail the ciphertext check. 
That means I > can make it awfully hard to actually download the document (since > without an effective share hash, you can't know which were the bad > shares, so you can try using other ones). This would allow the original uploader to do a sort of DoS attack on the file, but not to modify the contents. If the original shares (the ones I used when I decided to sign the cap) are still in the grid, I could still retrieve the original version, but it might be more difficult. If the original shares had expired and been removed from the storage servers, the original uploader could ensure that all extant shares are garbage. > And if you're changing 'k', then you'll certainly need to replace > all the existing shares. So the goal appears to be to do all the work of > uploading a new copy of the file, but allow the old caps to start > referencing the new version. Yes. Obviously, you could also change 'N' without changing 'k' -- something that might be possible with a sort of extended share hash tree anyway, but is not currently possible. And even with an extended share hash tree, you couldn't extend 'N' beyond whatever extra shares were computed during initial upload. Breaking the link between encoding choices and cap would allow arbitrary re-encoding -- and perhaps even completely different representations. > Deriving the filecap without performing FEC doesn't feel like a huge win > to me.. it's just a performance difference in testing for convergence, > right? No, it's more than that. It allows you to produce and store caps for files that haven't been uploaded to the grid yet. You can make a "this is where the file will be if it ever gets added" cap. Also, it would be possible to do it without the actual file contents, just the right hashes, which can make a huge performance difference in testing for convergence if the actual file doesn't have to be delivered to the Tahoe node doing the testing. > I *am* intrigued by the idea of immutable files being just locked-down > variants of mutable files. I think there's a lot of value in that. Shawn. From shawn at willden.org Sun Oct 4 08:53:38 2009 From: shawn at willden.org (Shawn Willden) Date: Sun, 4 Oct 2009 09:53:38 -0600 Subject: [tahoe-dev] Removing the dependency of immutable read caps on UEB computation In-Reply-To: References: <200910021138.22641.shawn@willden.org> Message-ID: <200910040953.38572.shawn@willden.org> On Sunday 04 October 2009 08:52:38 am Zooko O'Whielacronx wrote: > Interesting. The goal is to be able to upload a file with a different > K encoding parameter (num shares require to reconstruct) but the same > plaintext and the same (?) encryption key and have it match the same > immutable file cap? Different N or K. Or perhaps a different structure entirely (not sure why we'd ever want to go there). > And, as another goal, to make it faster to compute an immutable file > cap from a plaintext and key? (I.e., you can compute an immutable > file cap without producing all the erasure coding shares?) You could compute an immutable file cap without even having the plaintext, just hashes of plaintext and ciphertext. > In order to be secure against a > shapeshifting immutable file, the read-cap itself needs to contain > something derived from the ciphertext (or even better, the plaintext). Yes, I suggested that the plaintext and ciphertext hashes be part of the cap derivation. Allowing re-encoding just requires excluding the share hashes. 
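[Editor's illustration] A minimal sketch of that derivation, with hypothetical names and plain SHA-256 standing in for whatever the Elk Point design actually specifies: the cap commits only to the plaintext/ciphertext hashes (UEB1), while the share roots and encoding parameters (UEB2) are stored alongside the shares, so the same cap stays valid if the file is later re-encoded.

    import hashlib

    def sha256(data):
        return hashlib.sha256(data).digest()

    def make_ueb1(plaintext_root, ciphertext_root):
        # encoding-independent integrity data: this is all the cap commits to
        return b"UEB1|" + plaintext_root + b"|" + ciphertext_root

    def make_ueb2(share_roots, k, n):
        # share Merkle roots plus encoding parameters; stored (and, in the
        # proposal, signed) alongside the shares, but not hashed into the cap
        return b"UEB2|%d|%d|" % (k, n) + b"|".join(share_roots)

    def make_readcap(encryption_key, ueb1):
        # independent of k, n, and the share set, so re-encoding does not
        # invalidate caps that are already in circulation
        return "URI:CHK2:%s:%s" % (encryption_key.hex(), sha256(ueb1).hex())
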
> So one storage server might have, under > storage index X, a share from the 3-of-10 encoding of that file as > well as a share of the 6-of-10 encoding of the same file? And the > downloader would specify in its request which encoding of the file it > is looking for? I imagine the downloader would first ask what shares the storage server has, then indicate which ones it wants. > I don't > think that we're going to fit all of the desired features into the > NewCap format (especially because "simplicity of format" is one of the > desired features!) I think the "immutable files are just locked-down mutable files" approach helps with the simplicity of format goal. Shawn. From zooko at zooko.com Sun Oct 4 12:43:28 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Sun, 4 Oct 2009 13:43:28 -0600 Subject: [tahoe-dev] warning: don't rely on the Test Grid Message-ID: <3472B9FE-D8BB-40F2-9E7F-A99C18C222D5@zooko.com> Folks: allmydata.com is probably going to redirect the Test Grid servers that it runs to serve its paying customers on its Prod Grid instead. Download everything you value from Test Grid! Unless someone tells me otherwise, I'm assuming that nobody is storing anything precious on Test Grid and I can turn off those servers at any time. If you have something there that you can't quickly download, let me know and I'll leave those servers up until you're done. Regards, Zooko From warner at lothar.com Sun Oct 4 13:25:53 2009 From: warner at lothar.com (Brian Warner) Date: Sun, 04 Oct 2009 13:25:53 -0700 Subject: [tahoe-dev] Removing the dependency of immutable read caps on UEB computation In-Reply-To: <200910040945.28444.shawn@willden.org> References: <200910021138.22641.shawn@willden.org> <4AC6FC98.1080809@lothar.com> <200910040945.28444.shawn@willden.org> Message-ID: <4AC904D1.4040705@lothar.com> Shawn Willden wrote: > On Saturday 03 October 2009 01:26:16 am Brian Warner wrote: >> Incidentally, we removed 1 and 2 forever ago, to squash the >> partial-information-guessing-attack. > > Makes sense. The diagrams in the docs should be updated. Yeah, I'll see if I can get to that today. > Since this is for immutable files, there is currently no writecap or > traversalcap, just a readcap and perhaps a verifycap. This scheme > would require either adding either a share-update cap or providing a > master cap (from which share-update and read caps could be computed). So, one suggestion that follows would be to store the immutable "share-update" cap in the same dirnode column that contains writecaps for mutable files. Hm. Part of me says ok, part of me says that's bad parallelism. Why should a mutable-directory writecap-holder then get access to the re-encoding caps of the enclosed immutable files? Again, it gets back to the policy decision that distinguishes re-encoding-cap holders from read-cap holders: who would you give one-but-not-the-other to, and why? When would you be willing to be vulnerable to [whatever it is that a re-encoding cap allows] in exchange for allowing someone else to help you with [whatever it is that a re-encoding cap allows]? That sort of thing. (incidentally, I'm not fond of the term "master cap", because it doesn't actually convey what authorities the cap provides.. it just says that it provides more authority than any other cap. "re-encoding cap" feels more meaningful to me. 
I suppose it's possible to have a re-encoding cap which doesn't also provide the ability to read the file, in which case the master cap that lives above both re-encoding- and read- caps could be called the read-and-re-encode-cap, or something). >> Deriving the filecap without performing FEC doesn't feel like a huge >> win to me.. it's just a performance difference in testing for >> convergence, right? > > No, it's more than that. It allows you to produce and store caps for > files that haven't been uploaded to the grid yet. You can make a "this > is where the file will be if it ever gets added" cap. I still don't follow. You could hash+encrypt+FEC, produce shares, hash the shares, produce the normal CHK readcap, and then throw away the shares (without ever touching the network): this gives you caps for files that haven't been uploaded to the grid yet. Removing the share hashes just reduces the amount of work you have to do to get the readcap (no FEC). > Also, it would be possible to do it without the actual file contents, > just the right hashes, which can make a huge performance difference in > testing for convergence if the actual file doesn't have to be > delivered to the Tahoe node doing the testing. Hm, we're assuming a model in which the full file is available to some process A, and that there is a Tahoe webapi-serving node running in process B, and that A and B communicate, right? So part of the goal is to reduce the amount of data that goes between A and B? Or to make it possible for A to do more stuff without needing to send a lot of data to node B? In that case, I'm not sure I see as much of an improvement as you do. A has to provide B with a significant amount of uncommon data about the file to compute the FEC-less readcap: A must encrypt the file with the right key, segment it correctly (and the segment size must be a multiple of 'k'), build the merkle tree, and then deliver both the flat hashes and the whole merkle tree. This makes it sounds like there's a considerable amount of Tahoe-derived code running locally on A (so it can produce this information in the exact same way that B will eventually do so). In fact it starts to sound more and more like a Helper-ish relationship: some Tahoe code on A, some other Tahoe code over on B. If you've got help from your local filesystem to compute and store those uncommon hashes, then this might help. Or if you've got some other system on that side (like, say, tahoe's backupdb) to remember things for you, then it might work. But if you have those, why not just store the whole filecap there? (hey, wouldn't it be cool if local filesystems would let you store a bit of metadata about the file which would be automatically deleted if the file's contents were changed?) Hm, it sounds like some of the use case might be addressed by making it easier to run additional code in the tahoe node (i.e. a tahoe plugin), which might then let you move "B" over to where "A" is, and then generally tell the tahoe node to upload/examine files directly from disk instead of over an HTTP control+data channel. still intrigued, -Brian From zooko at zooko.com Sun Oct 4 15:23:25 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Sun, 4 Oct 2009 16:23:25 -0600 Subject: [tahoe-dev] Tahoe-LAFS was presented at HadoopWorld last Friday Message-ID: <40908B4E-F8E8-465A-9793-355B373AEB41@zooko.com> Aaron Cordova showed off Hadoop-LAFS at HadoopWorld last week. 
Here is someone's blog about HadoopWorld: http://four-bits.com/ You can find Hadoop-LAFS on our Related Projects page: http://allmydata.org/trac/tahoe/wiki/RelatedProjects Regards, Zooko From shawn at willden.org Sun Oct 4 17:49:23 2009 From: shawn at willden.org (Shawn Willden) Date: Sun, 4 Oct 2009 18:49:23 -0600 Subject: [tahoe-dev] Removing the dependency of immutable read caps on UEB computation In-Reply-To: <4AC904D1.4040705@lothar.com> References: <200910021138.22641.shawn@willden.org> <200910040945.28444.shawn@willden.org> <4AC904D1.4040705@lothar.com> Message-ID: <200910041849.24113.shawn@willden.org> On Sunday 04 October 2009 02:25:53 pm Brian Warner wrote: > So, one suggestion that follows would be to store the immutable > "share-update" cap in the same dirnode column that contains writecaps > for mutable files. Perhaps. I think re-encoding caps would have more a more specialized purpose. They would be needed by a repair system. > I suppose it's possible to have a re-encoding cap > which doesn't also provide the ability to read the file The re-encoding caps I described would not provide the ability to decrypt the file, only to re-encode the ciphertext. > in which case > the master cap that lives above both re-encoding- and read- caps could > be called the read-and-re-encode-cap, or something). read-and-re-encode-and-verify. 'master' is much shorter :) > I still don't follow. You could hash+encrypt+FEC, produce shares, hash > the shares, produce the normal CHK readcap, and then throw away the > shares (without ever touching the network): this gives you caps for > files that haven't been uploaded to the grid yet. But you also have to decide what encoding parameters to use. I want to separate that decision, because I want to allow encoding decisions to be made based on reliability requirements, performance issues, grid size and perhaps even server reliability estimates. Many of those factors are only known at the point of upload. > Hm, we're assuming a model in which the full file is available to some > process A, and that there is a Tahoe webapi-serving node running in > process B, and that A and B communicate, right? So part of the goal is > to reduce the amount of data that goes between A and B? Or to make it > possible for A to do more stuff without needing to send a lot of data to > node B? > In that case, I'm not sure I see as much of an improvement as you do. A > has to provide B with a significant amount of uncommon data about the > file to compute the FEC-less readcap: A must encrypt the file with the > right key, segment it correctly (and the segment size must be a multiple > of 'k'), build the merkle tree, and then deliver both the flat hashes > and the whole merkle tree. This makes it sounds like there's a > considerable amount of Tahoe-derived code running locally on A (so it > can produce this information in the exact same way that B will > eventually do so). In fact it starts to sound more and more like a > Helper-ish relationship: some Tahoe code on A, some other Tahoe code > over on B. Hmm. I didn't realize that segment size was dependent on 'k'. I thought segments were fixed at 128 KiB? Or is that buckets? Or blocks? I'm still quite hazy on the precise meaning of bucket and block. This is a very good point, though. I wouldn't want 'A' to have to understand Tahoe's segmentation decisions. I'm not sure why it feels acceptable to have it know Tahoe's encryption and hash tree generation in detail, but not segmentation. 
Maybe because segment sizes have changed in the past and it seems reasonable that they might change again in the future -- perhaps even get chosen dynamically at some point? It's probably better to assume that all of this knowledge is only in Tahoe and the client has to provide the plaintext in order to get a cap. > (hey, wouldn't it be cool if local filesystems > would let you store a bit of metadata about the file which would be > automatically deleted if the file's contents were changed?) That *would* be cool. > Hm, it sounds like some of the use case might be addressed by making it > easier to run additional code in the tahoe node (i.e. a tahoe plugin), > which might then let you move "B" over to where "A" is, and then > generally tell the tahoe node to upload/examine files directly from disk > instead of over an HTTP control+data channel. That would be very useful. I have to make copies of files before uploading them anyway, so that they don't change while uploading (because I map the file content hash to a read cap, so I need to make absolutely sure that the file uploaded is the same one I hashed), and then Tahoe has to make another copy before it can encode, so being able to tell Tahoe where to grab it from the file system would reduce the number of copies by one. On the "plugin" point, I'm thinking that I want to implement my backup server as a Tahoe plugin. I'm not sure it makes sense to implement it as a part of Tahoe, because Tahoe is a more general-purpose system. From a practical perspective, though, my backup server is (or will be) a Twisted application, it should live right next to a Tahoe node, and it should start up whenever the Tahoe node starts and stop whenever the Tahoe node stops. Seems like a good case for a plugin. Shawn. From robk at allmydata.com Sun Oct 4 17:56:55 2009 From: robk at allmydata.com (Rob Kinninmont) Date: Mon, 5 Oct 2009 01:56:55 +0100 Subject: [tahoe-dev] N.B. Garbage Collection; TestGrid and ProdGrid changes Message-ID: IMPORTANT: please read the first few paragraphs of this message, to ensure the integrity of your data. Thank you. I will this week be commencing a Garbage Collection pass over the allmydata.com production grid. Additionally we will be repurposing storage from the test grid into the production grid. If your files are all stored in directories rooted at an allmydata.com account's "root dir", and you have no important data stored in the test grid, you do not need to take any action. If you are one of the small number of allmydata.com users who are storing files in directories whose root caps you have kept private, then you will need to act promptly to add leases to those files. More details on that below. If you have data stored in the test grid which you are not comfortable losing, then you should act promptly to download and backup that data via some other means. The test grid is considered 'scratch' space for development and testing, and should not be relied upon for storage of important data, but I'm providing this 'heads up' in case anyone is relying upon that storage. If you have any questions or concerns, please don't hesitate to raise them. Thanks for your attention. cheers, rob More detailed information: -------------------------- Test Grid: We're going to be reducing the capacity of the test grid in a few days, and this will result in a loss of some of the data contained therein. 
The test grid should remain functional for ongoing testing and development throughout, but you should not rely on any data currently within the test grid remaining accessible. There's also a chance that it might case false negatives in tests which are running at the time that the capacity is reduced. If you have any data not otherwise backed up, you should download it as soon as possible. If you have any questions or concerns about this, please let me know. Production Grid Garbage Collection: Storage on the grid is technically _leased_ from the storage servers, and barring failures, the storage servers agree to keep data available until the expiration of any leases upon it. After that point the data is a candidate for storage reclamation, and in principle might be deleted at any time. The garbage collection process is a standard 'mark and sweep'. All data which should be retained is 'marked' by adding a fresh lease to it. Once the mark phase is complete, then the 'sweep' phase reclaims space by deleting any data which has no unexpired leases upon it. All data stored through the allmydata.com web service and backup client are stored within a directory structure the root of which is recorded in the user's allmdata.com account. I will be updating leases on all user data so reachable for current paid-up accounts before commencing the sweep. Hence if you only use the allmydata.com service via the website and backup client you do not need to act further. A small number of you, as tahoe developers, will have perhaps used other features of tahoe to store data which is not reachable from such an account; e.g. you have created a private directory, for example accessed through the alias mechanism of the 'tahoe' command, and that directory is not reachable from an account's root directory. If that applies to you, you will need to add leases to that data yourself before the sweep process runs. It will probably take a week or two to 'mark' all the allmydata.com accounts, but you should act as soon as possible to ensure your leases are updated. For each private rootcap you have, you will need to run tahoe deep-check --add-lease $ROOTCAP This might take a while to run, dependant on how many files and directories are involved. The deep-check runs at about 1-2 checks per second (dependant on round-trip latency to the production grid storage servers). You can obtain summary statistics by adding --raw, i.e. 'tahoe deep-check --raw --add-lease $ROOTCAP' which will print out statistics in JSON format. If you have any questions, concerns, or encounter problems with the process of adding leases to your data, please let me know. It would be helpful and instructive for us to get an idea of how many people are using private rootcaps to maintain data on the production storage grid. While it's not necessary (unlike the add-lease which _is_ necessary) it would be appreciated if you could drop me a line if this applies to you, perhaps including an estimate of how much data is involved. Thanks for reading. cheers, rob From warner at lothar.com Sun Oct 4 22:06:57 2009 From: warner at lothar.com (Brian Warner) Date: Sun, 04 Oct 2009 22:06:57 -0700 Subject: [tahoe-dev] Removing the dependency of immutable read caps on UEB computation In-Reply-To: <200910041849.24113.shawn@willden.org> References: <200910021138.22641.shawn@willden.org> <200910040945.28444.shawn@willden.org> <4AC904D1.4040705@lothar.com> <200910041849.24113.shawn@willden.org> Message-ID: <4AC97EF1.2080802@lothar.com> Shawn Willden wrote: > Hmm. 
I didn't realize that segment size was dependent on 'k'. I > thought segments were fixed at 128 KiB? Or is that buckets? Or blocks? > I'm still quite hazy on the precise meaning of bucket and block. 128KiB is the *maximum* segment size. The actual size is (I think): round_up_to_multiple_of(k, min(filesize, 128KiB)) The deal is that each segment will be effectively split into 'k' pieces (plus N-k redundant blocks), so the segment needs to be a multiple of 'k' in size. The alacrity is directly related to the segment size, so we put an upper bound on it. And efficiency goes down as the number of segments rises, so we use just one segment if we can. When we talk about segments, we're talking about segments of plaintext and/or ciphertext. The segment is what goes into FEC. The output of FEC is called a block, so there's NUMSEGS blocks in each share (and N shares per file). Storage servers keep track of buckets, each one labeled with a storage index. Each bucket has one or more shares (all for the same file). The client asks a storage server for access to a bucket by naming a storage-index, and the server responds with a list of all the shareids that it has for that SI (i.e. all the shares in that bucket). At least, that's the way we've been using the terms in Tahoe (not always consistently, I'm afraid). >> Hm, it sounds like some of the use case might be addressed by making >> it easier to run additional code in the tahoe node (i.e. a tahoe >> plugin), > On the "plugin" point, I'm thinking that I want to implement my backup > server as a Tahoe plugin. I'm not sure it makes sense to implement it > as a part of Tahoe, because Tahoe is a more general-purpose system. > From a practical perspective, though, my backup server is (or will be) > a Twisted application, it should live right next to a Tahoe node, and > it should start up whenever the Tahoe node starts and stop whenever > the Tahoe node stops. Seems like a good case for a plugin. So, the plugin idea I had was to have tahoe.cfg name the plugins that you want to load (as well as any plugin-specific configuration to use), then import the code with one of the various plugin frameworks that we've got floating around (twisted.python.plugin, setuptools+entrypoints, whatever mercurial uses, whatever trac uses). I designed the tahoe node as a hierarchy of twisted.application.service instances in anticipation of this.. the 'Client' service is the one that gives you an API to upload/download files. The plugin would then be a Service instance that gets attached as a service-child of that Client. This would basically give it start+stop hooks, and then it could e.g. upload files with: d = self.parent.upload(UPLOADABLE) We'd need to think of some convenient ways to let plugins do more than that.. probably add some sort of hooks into the webapi dispatcher, so the plugin could have a status page and some CLI controls. 
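(To make the plugin idea concrete, here is a rough sketch of what such a plugin could look like, assuming the usual twisted.application.service machinery; the BackupPlugin name and the way it would be wired up from tahoe.cfg are hypothetical, not an existing API:)

  from twisted.application import service

  class BackupPlugin(service.Service):
      # attached as a service-child of the tahoe 'Client' service with
      # plugin.setServiceParent(client), so self.parent is that Client
      name = "backup-plugin"

      def startService(self):
          service.Service.startService(self)
          # kick off whatever work the plugin does when the node starts

      def stopService(self):
          # tear down cleanly when the node stops
          return service.Service.stopService(self)

      def backup_one(self, uploadable):
          # hand an uploadable to the node, as in the upload() call above;
          # the callback receives the upload results (e.g. the new filecap)
          d = self.parent.upload(uploadable)
          return d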
cheers, -Brian From warner at lothar.com Sun Oct 4 22:47:56 2009 From: warner at lothar.com (Brian Warner) Date: Sun, 04 Oct 2009 22:47:56 -0700 Subject: [tahoe-dev] Bringing Tahoe ideas to HTTP In-Reply-To: <4AB1F1EF.5010909@lothar.com> References: <4A970144.8020709@lothar.com> <27968C75-D2C5-43A8-B191-4AB1AFEBBF2C@solarsail.hcs.harvard.edu> <4AB01F65.9010601@echeque.com> <74F062C2-FDC6-4077-BA47-310299DA6344@zooko.com> <4AB1F1EF.5010909@lothar.com> Message-ID: <4AC9888C.8080901@lothar.com> (If you missed the start of this thread, look here: http://allmydata.org/pipermail/tahoe-dev/2009-August/002724.html and http://allmydata.org/pipermail/tahoe-dev/2009-September/002804.html) After a chat with Tyler Close, a few more ideas came to mind. He pointed out that a stubborn problem with web applications these days is that the HTTP browser caches are not doing as much good as developers expect. Despite frequently-used things like "jquery.js" being cacheable, a substantial portion of clients are showing up to the server empty-handed, and must re-download the library on pretty much every visit. One theory is that the browser caches are too small, and pretty much everything is marked as being cacheable, so it's just simple cache-thrashing. There's no way for the server to say "I'll be telling you to load jquery.js a lot, so prioritize it above everything else, and keep it in cache as long as you can". And, despite hundreds of sites all using the exact same copy of jquery.js, there's no way to share those cached copies, increasing the cache pressure even more. Google is encouraging all web developers to pull jquery.js from a google.com server, to reduce the pressure, but of course that puts you at their mercy from a security point of view: they (plus everyone else that can meddle with your bits on the wire) can inject arbitrary code into that copy of jquery.js, and compromise millions of secure pages. So the first idea is that the earlier "#hash=XYZ" URL annotation could be considered as a performance-improving feature. Basically the browser's cache would have an additional index using the "XYZ" secure hash (and *not* the hostname or full URL) as the key. Any fetch that occurs with this same XYZ annotation could be served from the local cache, without touching the network. As long as the previously-described rules were followed (downloads of a #hash=XYZ URL are validated against the hash, and rejected on mismatch), then the cache could only be populated with validated files, and this would be a safe way to share common files between sites. The second idea involves some of the capability-security work, specifically Mark Miller's "Caja" group which has developed a secure subset of JavaScript. Part of the capability world's efforts are to talk about different properties that a given object has, as determined by mechanical auditing of the code that implements that object. One of these properties is called "DeepFrozen", which basically means that the object has no mutable state and has no access to mutable state. If Alice and Bob (who are both bits of code, in this example) share access to a DeepFrozen object, there's no way for one of them to affect the other through that object: they might as well have two identical independent copies of the same object. The common "memoization" technique depends upon the function being optimized to be DeepFrozen, to make sure that it will always produce the same output for any given input. 
(note that this doesn't mean that the object can't create, say, a mutable array and manipulate it while the function runs.. it just means that it can't retain that array from one invocation to the next) So the second idea is that, if your #hash=XYZ-validated jquery.js library can be proven to be DeepFrozen (say, by passing it through the Caja verifier with a flag that says "only accept DeepFrozen classes" or something), then not only can you cache the javascript source code, but you can also cache the parse tree, saving you the time and memory needed to re-parse and evaluate the same source code on every single page load. (incidentally, it is quite likely that jquery.js would pass a DeepFrozen auditor, or could be made to do so fairly easily: anything that's written in a functional style will avoid using much mutable state) This requires both the DeepFrozen property (which makes it safe to share the parsed data structure) and the #hash=XYZ validation (which insures that the data structure was really generated from the right JS source code). I know that one of the Caja goals is to eventually get the verifier functionality into the browser, since that's where it can do the most good. If that happened, then the performance improvements to be had by writing verifiable code and using strong URL references could be used as a carrot to draw developers into using these new techniques. thoughts? cheers, -Brian From david-sarah at jacaranda.org Mon Oct 5 21:20:07 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Tue, 06 Oct 2009 05:20:07 +0100 Subject: [tahoe-dev] Deletion in the Elk Point cap protocol In-Reply-To: <4AAC7A34.7070905@jacaranda.org> References: <200907162209.04414.shawn-tahoe@willden.org> <4AA4BAE5.1030705@lothar.com> <4AA62064.60904@jacaranda.org> <4AA75E06.8010301@lothar.com> <4AAA6A77.4080001@jacaranda.org> <4AABF714.6080900@lothar.com> <4AAC7A34.7070905@jacaranda.org> Message-ID: <4ACAC577.7020506@jacaranda.org> David-Sarah Hopwood wrote: > Brian Warner wrote: >> David-Sarah Hopwood wrote: >>> I've designed a new version of the cap protocol I posted a few days >>> ago, that addresses Brian's comments about roadblock attacks. (Current version at .) [...] >> * The destroy-cap is a neat bit of feature, but I'd want to think about >> it further before actually building it in. It makes a lot more sense >> in the context of add-only files.. if you don't have those, then >> there's not a lot of difference between filling the mutable slot with >> an empty file and completely destroying the slot. > > It makes sense for both add-only files and immutable files. Note that in > the case of immutable files it doesn't compromise collision resistance: > after a file has been destroyed it is possible for it to be recreated at > the same storage index (and made accessible by the same read cap), but in > that case it must have the same contents. Note that there's a minor error in the diagram concerning the destroy cap: it shows the destroy cap as KD, but in fact it has to include S and T as well as KD. This is so that the holder of a destroy cap can determine the storage index (and therefore which servers hold shares), and so that the servers are able to index shares only by their storage index. 
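(Spelled out as a toy structure, just to illustrate the correction -- the field names and encoding here are mine, not the actual Elk Point wire format:)

  def make_destroy_cap(S, T, KD):
      # S and T together determine the storage index, so a destroy-cap
      # holder can work out which servers to contact; KD is the secret
      # that authorizes destruction.
      return {"S": S, "T": T, "KD": KD}

  def storage_index(cap):
      # servers index shares by S || T alone; they never need to see KD
      # until the cap is actually exercised
      return cap["S"] + cap["T"]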
> Of course there are limitations to how thoroughly a file can be deleted, In support of how difficult it is to delete data by deleting shares in a DHT-based storage system, see the paper at: The attack described there is not directly relevant to Tahoe unless it were used in a particular unusual way; however, the paper shows that it is easy to overestimate the cost to an attacker of collecting a large proportion of the shares stored in a DHT. Of course in Tahoe, recording the ciphertext of shares will only help attackers who have the corresponding read caps. On the other hand, they could record the ciphertext and only gain access to the read cap later. > given that it can *necessarily* be recreated even by anyone who knows just > the "green" information, i.e. they don't need to have had the read cap. Actually this may not be true, if the servers retain a small amount of information about each share that has been deleted; see below. > I agree that we need to think about this feature further before deciding > whether to add it. As I see it, there are several classes of reasons why > you might want to delete a file: > > a) you don't care about it any more and want the space to be reclaimable. This reason is not affected by the fact that an attacker can retain copies of shares, since the attacker will pay the cost of storing their copies. However, that assumes that upload messages cannot be directly replayed by an attacker in such a way that the storage cost will be charged to the quota of the original uploader. (So, if a signature on an upload message is used to authenticate that it should be charged to a given quota, the message that is signed should include a fresh challenge from the server.) > b) its contents are obsolete and you no longer want people to rely on > them, even though it isn't important whether they know the contents. > c) you have a legal and/or moral obligation to no longer store the > contents or otherwise contribute to providing access to them. Define an "undeletion attack" to be an attack that reinstates a deleted share in such a way that it can still be read by an existing read cap. Provided undeletion attacks are not possible, the fact that an attacker can retain copies of shares does not prevent deletions for reason b) and/or c) from being effective: in b), the fact that an attacker holds copies does not prevent deletion of the original shares (if they cannot be undeleted) from acting as a signal that the content is obsolete. in c), servers that were asked to delete their shares of a file are no longer contributing to providing access to the content. If an attacker knew the plaintext then it can re-upload it, but only at a new storage index, so that existing read caps are invalidated. Attackers who do not know the plaintext or read cap cannot re-upload it. This adequately discharges the server operator's obligations. To prevent undeletion attacks, it should be possible to tell a server to remember that the share at a given storage index has been deleted. The cost of storing this information (which is much smaller than the original file) can be accounted for by the usual quota mechanism. Note that some care is required to simultaneously prevent both undeletion attacks and roadblock attacks, because deletion could potentially be used to create a roadblock. 
For example, in the Elk Point 3 design, if the server *only* remembers storage indices that have been deleted, then that would enable a roadblock attack with cost 2^t, because the attacker would only need to find an (EncK1, Dhash, V) triple that is a T-bit preimage for the T value of the target share, and use it to delete the share before it is uploaded. (It is desirable to allowing a share to be "deleted" even if it does not yet exist on a given server, to ensure that it cannot be uploaded to that server in future.) To prevent such attacks, what the server should remember for each deleted share, is the verify cap (S, T, U). That can be implemented by mapping each storage index S || T to a set of shares, and a set of deletion records each consisting of a U value. When a share is uploaded, the server should refuse to store it if the corresponding storage index has any deleted U value that matches that share. Adding a share or deletion record at a given storage index, would not displace any existing shares or deletion records at that index for which the U values do not match. For this scheme to be secure against undeletion attacks, it must not be possible to find a share related to a given share for a file F, that will fail to verify, but nevertheless be readable by a read cap for F. The Elk Point designs do appear to have that property: changing any of EncK1, Dhash, or V in a share will cause it to be unreadable as well as failing to verify. It is also possible to change Sig_KR, but that does not matter because uploaded shares are only checked against deleted U values; Sig_KR is checked separately. > d) you screwed up and made information available that should have been > confidential. We wouldn't claim to address this case. -- David-Sarah Hopwood ? http://davidsarah.livejournal.com From david-sarah at jacaranda.org Mon Oct 5 21:43:13 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Tue, 06 Oct 2009 05:43:13 +0100 Subject: [tahoe-dev] Deletion in the Elk Point cap protocol In-Reply-To: <4ACAC577.7020506@jacaranda.org> References: <200907162209.04414.shawn-tahoe@willden.org> <4AA4BAE5.1030705@lothar.com> <4AA62064.60904@jacaranda.org> <4AA75E06.8010301@lothar.com> <4AAA6A77.4080001@jacaranda.org> <4AABF714.6080900@lothar.com> <4AAC7A34.7070905@jacaranda.org> <4ACAC577.7020506@jacaranda.org> Message-ID: <4ACACAE1.7020004@jacaranda.org> David-Sarah Hopwood wrote: > David-Sarah Hopwood wrote: >> b) its contents are obsolete and you no longer want people to rely on >> them, even though it isn't important whether they know the contents. >> c) you have a legal and/or moral obligation to no longer store the >> contents or otherwise contribute to providing access to them. > > Define an "undeletion attack" to be an attack that reinstates a deleted > share in such a way that it can still be read by an existing read cap. > > Provided undeletion attacks are not possible, the fact that an attacker > can retain copies of shares does not prevent deletions for reason b) > and/or c) from being effective: > > in b), the fact that an attacker holds copies does not prevent deletion > of the original shares (if they cannot be undeleted) from acting > as a signal that the content is obsolete. > > in c), servers that were asked to delete their shares of a file are no > longer contributing to providing access to the content. If an > attacker knew the plaintext then it can re-upload it, but only at > a new storage index, so that existing read caps are invalidated. 
> Attackers who do not know the plaintext or read cap cannot > re-upload it. This adequately discharges the server operator's > obligations. I meant also to clarify that supporting destroy caps does not prevent a server operator from deleting any share for any reason. For instance, if a server operator were given a legitimate reason to delete a share by some authority such as the police, they would presumably do so without needing to see the destroy cap. The problem that destroy caps solve is that without them, the creator of an immutable or add-only share cannot prove to a server that they are entitled to delete the share just by virtue of being its creator (or authorized to do so by the creator). Also, the process of deleting a file using its destroy cap would be automatic, whereas deleting a share from a particular server without the destroy cap would require manual intervention by the server operator. -- David-Sarah Hopwood http://davidsarah.livejournal.com From zookog at gmail.com Tue Oct 6 11:10:02 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Tue, 6 Oct 2009 12:10:02 -0600 Subject: [tahoe-dev] converting roadblocks into speedbumps -- Re: "Elk Point" design for mutable, add-only, and immutable files Message-ID: Folks: I'm finally getting caught up on the Elk Point design. (David-Sarah: where were you when you first thought of it? Because the tradition is to name a design after the geographic location where it was conceived.) I have several things to contribute to the discussion. This note is about "how to convert roadblocks into speedbumps". This is a possible feature that could be added to the NewCap design to reduce the number of "roadblock prevention bits" in caps. This feature may not turn out to be needed or to be worth its cost in complexity and efficiency, but I want to document it as a tool in our design toolbox. As you know, a "roadblock attack" is a kind of denial-of-service where the attacker knows or guesses the storage index that you are going to use to upload your file, and it persuades a storage server to put something else into that storage index before you upload your share. If the part of the storage index which the storage server can verify (labelled 't' in the Elk Point diagrams [1]) is short enough, then an attacker can use brute force to generate a share which matches t, and the server won't be able to tell that it isn't a legitimate share for that entire storage index (t, S). One can imagine dealing with a roadblock attack by choosing a different key and re-uploading (or, if there is a destroy cap 'D', choosing a different 'D' so that you don't have to re-encrypt with a different key), but even if the uploader could win that race against the roadblocker, a repairer would not have that option, since the repairer has to match the already-extant immutable file read cap. Here is an extension to the upload/download protocol that converts roadblocks into speedbumps: the storage server holds all shares that have been uploaded under a given storage index (and share num), and it stores them in the order that they arrived. Downloaders can retrieve and check the K1 and V values of each of the candidates in the order that they were uploaded, and they can tell which one matches their readcap. Therefore, if an attacker goes to the work of setting up a roadblock, it only inconveniences the downloader (and the storage server) a tiny amount -- it is only a shallow speedbump. Here are the details.
The protocol to upload shares is that the client invokes the server's "allocate_buckets()" method [2], passing a storage index and a set of sharenums. To download shares, the client invokes the server's "get_buckets()" method [3], passing a storage index. The storage server returns a dict mapping from sharenum to BucketReader objects. The extension to the protocol is that the storage server returns instead a dict mapping from sharenum to a list of BucketReader objects, which are sorted in the order that they were uploaded. This doesn't sound like too much added complication, but perhaps that's just because the Foolscap library is so Pythonic and expressive. If you think about how the storage server lays out its share data on disk and how many network round-trips and bytes are required for downloading, the cost of this improvement looks a bit worse: The storage server in Tahoe-LAFS v1.5 stores shares in its local filesystem in the following layout: storage/shares/$START/$STORAGEINDEX/$SHARENUM Where "$START" denotes the first 10 bits worth of $STORAGEINDEX (that's 2 base-32 chars). You can see this in storage/server.py [4]. The easiest way to extend this to handle multiple share candidates would be something like storage/shares/$START/$STORAGEINDEX/$SHARENUM_$CANDIDATENUM Then the server could see all the candidates (just by inspecting the $STORAGEINDEX directory -- not having to read the files themselves) and return an ordered list of remote references to BucketReaders. Now if the server's reply to get_buckets() contained only the share num and the remote references to the BucketReaders, then the downloader would have to wait for at least one more network round trip to fetch the K1 and V values for each BucketReader before it could decide which candidate share matched its read-cap. On the other hand, if the server's initial reply contained the K1 and V values for each candidate, then the server would have to do at least one extra disk seek per candidate to read those values out of each share. We know from experience that a disk seek in the storage server is typically one of the slowest operations in the whole grid. So we could add a file to the layout which contains just the K1 and V values, in order, for all of the candidates, thus making it possible for the storage server to return all the K1 and V values without doing a lot of seeks: storage/shares/$START/$STORAGEINDEX/$SHARENUM_K1s_and_Vs storage/shares/$START/$STORAGEINDEX/$SHARENUM_$CANDIDATENUM Okay, that's it. Would this be a win? Well, I think that depends on the exact sizes of the caps. I'm working on a table cataloguing Everything That Can Go Wrong, including brute force attacks on various elements of the crypto structure and failure of the various crypto primitives that we use. The Everything That Can Go Wrong table includes the numbers for how much a brute force attack would cost, which will let me understand exactly what size of crypto values I would be comfortable with. Once I have that table, then I'll reconsider whether in Elk Point town we should build a Roadblock-To-Speedbump Converter, or if we should just set such a high tax on roadblocks that nobody can afford to build one (by including enough t bits).
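(A tiny sketch of the downloader side of this extension -- not real Tahoe code; matches_readcap stands in for the K1/V check against the read cap:)

  def choose_candidates(buckets, matches_readcap):
      # buckets: dict mapping sharenum -> list of candidate share readers,
      # in the order the candidates were uploaded
      # matches_readcap: callable(candidate) -> bool, checks the candidate's
      # K1 and V values against our read cap
      chosen = {}
      for sharenum, candidates in buckets.items():
          for candidate in candidates:
              if matches_readcap(candidate):
                  chosen[sharenum] = candidate
                  break   # roadblock shares ahead of this one cost only one check each
      return chosen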
Regards, Zooko [1] http://jacaranda.org/tahoe/mutable-addonly-elkpoint-2.svg [2] http://allmydata.org/trac/tahoe/browser/src/allmydata/interfaces.py?rev=4045#L94 [3] http://allmydata.org/trac/tahoe/browser/src/allmydata/interfaces.py?rev=4045#L160 [4] http://allmydata.org/trac/tahoe/browser/src/allmydata/storage/server.py?rev=3871 From zookog at gmail.com Tue Oct 6 13:07:04 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Tue, 6 Oct 2009 14:07:04 -0600 Subject: [tahoe-dev] Deletion in the Elk Point cap protocol In-Reply-To: <4ACACAE1.7020004@jacaranda.org> References: <4AA4BAE5.1030705@lothar.com> <4AA62064.60904@jacaranda.org> <4AA75E06.8010301@lothar.com> <4AAA6A77.4080001@jacaranda.org> <4AABF714.6080900@lothar.com> <4AAC7A34.7070905@jacaranda.org> <4ACAC577.7020506@jacaranda.org> <4ACACAE1.7020004@jacaranda.org> Message-ID: I think there are too many possible features, with associated costs and interactions, for most of us (with the possible exception of Brian) to keep track of. Let's update http://allmydata.org/trac/tahoe/wiki/NewCapDesign to list all features that we would want if we could afford them, and link to discussions such as this one about how it could be implemented and what it would cost in complexity. By the way, something that I want which is closely related to this "Deletion" feature but which is *not* this Deletion feature is Revocation of Further Writes: http://allmydata.org/pipermail/tahoe-dev/2009-June/001995.html Revocation of Further Writes is something that I strongly want, from my own personal experience using Tahoe-LAFS (recounted in that mailing list post) and because I think that it could be valuable to a lot of people. Revocation of Further Writes is very simple from the perspective of the cap crypto structure because it can be implemented without any changes to the cap crypto structure! It can be implemented as a new layer that sits above mutable caps and below directories. However, Revocation of Further Writes has a potential problem with rollback attacks, like the Deletion feature does (and like mutable caps do in general). Also I happen to know that the Cassandra distributed database has a similar problem even though they don't try to account for the malicious case: they want to make sure that servers gossiping with one another don't accidentally resurrect a file that has been deleted, so they have a "tombstone" protocol similar to the protocol that David-Sarah suggested to "remember that the share at a given storage index has been deleted". David-Sarah: will you please add Deletion to http://allmydata.org/trac/tahoe/wiki/NewCapDesign ? Regards, Zooko From zooko at zooko.com Tue Oct 6 15:56:26 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Tue, 6 Oct 2009 16:56:26 -0600 Subject: [tahoe-dev] Android Tahoe-LAFS client In-Reply-To: References: <4AC4FEF0.9050206@ctrlaltdel.ch> Message-ID: <5AD8274B-30DC-457E-96E9-D87A2EE65592@zooko.com> So just to be extra double clear about this, it is *possible* to run a Tahoe-LAFS gateway on an embedded ARM system -- François has contributed a buildbot that we use to test that -- but what the Android client runs is *not* a Tahoe-LAFS gateway, but instead just an HTTP client. Right, François? So neither storage (which is provided by storage servers) nor security (which is provided by Tahoe-LAFS gateways) is provided by the Android device.
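(To illustrate what "just an HTTP client" means here, a rough sketch of such a thin client, assuming a gateway webapi on the default port 3456 and the standard unlinked-upload (PUT /uri) and download (GET /uri/$CAP) operations; the helper names are mine:)

  import httplib, urllib

  def put_file(gateway_host, body):
      conn = httplib.HTTPConnection(gateway_host, 3456)
      conn.request("PUT", "/uri", body)          # upload through the gateway
      return conn.getresponse().read().strip()   # gateway returns the new cap

  def get_file(gateway_host, cap):
      conn = httplib.HTTPConnection(gateway_host, 3456)
      conn.request("GET", "/uri/" + urllib.quote(cap))
      return conn.getresponse().read()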
If you are confused about the difference between a Tahoe-LAFS gateway and a Tahoe-LAFS client, please see this diagram: http://tahoebs1.allmydata.com:8123/file/URI%3ACHK% 3Aw5ggttx55tj6e6ah6zvysxmqum% 3Aarthiemgeotsk2wmlxsyigsudvuglevbj4uprzywmcr5ooay72va%3A3%3A10% 3A164776/@@named=/network-and-reliance-topology.svg Regards, Zooko From david-sarah at jacaranda.org Tue Oct 6 17:49:08 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Wed, 07 Oct 2009 01:49:08 +0100 Subject: [tahoe-dev] Removing the dependency of immutable read caps on UEB computation Message-ID: <4ACBE584.80507@jacaranda.org> [I'm having trouble with incoming email at the moment, so I'm having to read this thread from the list archives. That means my responses may be a bit delayed and won't be properly threaded.] Brian Warner wrote: > I *am* intrigued by the idea of immutable files being just locked-down > variants of mutable files. A mutable-file readcap plus a hash of the > expected contents (i.e. H(UEB1)) would achieve this pretty well.. might > not be too much longer than our current immutable readcaps, and we could > keep the encoding-parameter-sensitive parts (UEB2) in the signed (and > therefore mutable) portion, so they could be changed later. We can do better than that. Notice that the mutable and immutable Elk Point protocols are (deliberately) already very similar. In particular, the mutable protocol obtains the same (n+t)/2 bits of collision-resistance as the immutable protocol does, for the values that are hashed by hash_m to obtain T || U. (This is from the point of view of a read cap holder. Assumptions: m >= n+t, K1 is at least n bits, and all cryptographic primitives have the strength expected from their security parameters.) When it is used for mutable files, this collision-resistance for EncK1, Dhash and V doesn't really buy you anything because even if those values are fixed, the file contents can still vary. However, if a hash of the plaintext (also of length m bits, say) is optionally included in the input to hash_m, the same protocol can be used for immutable files, and still obtains (n+t)/2 bits of collision resistance for the plaintext, from the point of view of a read cap holder. Although the protocol for mutable and immutable files would be the same in this approach, the encoding of a read-cap should include a bit saying whether it is immutable, and the client should require and check the plaintext hash if this bit is set. This enables a read cap holder to tell off-line whether they are getting immutability and collision- resistance for that cap. As Shawn points out, this approach allows an immutable file creator to deny further access to the file. However, if the destroy cap functionality is supported, then the file creator is intentionally authorized to do that by using the destroy cap, so it's not an attack if they can also do it in some other way. (I'm distinguishing between the creator and uploader, since they may not be the same. The creator of a file is the party who originally holds all of the secret keys for that file.) -- David-Sarah Hopwood ? 
http://davidsarah.livejournal.com From david-sarah at jacaranda.org Tue Oct 6 18:20:12 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Wed, 07 Oct 2009 02:20:12 +0100 Subject: [tahoe-dev] Removing the dependency of immutable read caps on UEB computation In-Reply-To: <4ACBE584.80507@jacaranda.org> References: <4ACBE584.80507@jacaranda.org> Message-ID: <4ACBECCC.2000100@jacaranda.org> David-Sarah Hopwood wrote: > Brian Warner wrote: >> I *am* intrigued by the idea of immutable files being just locked-down >> variants of mutable files. A mutable-file readcap plus a hash of the >> expected contents (i.e. H(UEB1)) would achieve this pretty well.. might >> not be too much longer than our current immutable readcaps, and we could >> keep the encoding-parameter-sensitive parts (UEB2) in the signed (and >> therefore mutable) portion, so they could be changed later. > > We can do better than that. Notice that the mutable and immutable > Elk Point protocols are (deliberately) already very similar. > In particular, the mutable protocol obtains the same (n+t)/2 bits of > collision-resistance as the immutable protocol does, for the values > that are hashed by hash_m to obtain T || U. (This is from the point > of view of a read cap holder. Assumptions: m >= n+t, K1 is at least > n bits, and all cryptographic primitives have the strength expected > from their security parameters.) > > When it is used for mutable files, this collision-resistance for EncK1, > Dhash and V doesn't really buy you anything because even if those values > are fixed, the file contents can still vary. However, if a hash of the > plaintext (also of length m bits, say) is optionally included in the input > to hash_m, the same protocol can be used for immutable files, and still > obtains (n+t)/2 bits of collision resistance for the plaintext, from > the point of view of a read cap holder. Incidentally, I previously said "I think it's desirable to continue to avoid relying on public key cryptography in the immutable file protocol." However, using the mutable file protocol in the way described above, does not rely on public key cryptography for integrity of the plaintext as read by a read-cap holder. That is still only dependent on hashes, and on the symmetric cipher used to encrypt K1. The public key crypto is relied on just to allow checking validity of a share without the read cap. -- David-Sarah Hopwood ? http://davidsarah.livejournal.com From david-sarah at jacaranda.org Tue Oct 6 18:47:50 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Wed, 07 Oct 2009 02:47:50 +0100 Subject: [tahoe-dev] Bringing Tahoe ideas to HTTP Message-ID: <4ACBF346.2000009@jacaranda.org> Brian Warner wrote: > So the second idea is that, if your #hash=XYZ-validated jquery.js > library can be proven to be DeepFrozen (say, by passing it through the > Caja verifier with a flag that says "only accept DeepFrozen classes" or > something), then not only can you cache the javascript source code, but > you can also cache the parse tree, saving you the time and memory needed > to re-parse and evaluate the same source code on every single page load. Actually you can cache the *parse tree* for a hash-validated library even without proving that the library is DeepFrozen. What you can't cache, just using hash validation, are any objects resulting from evaluating the library code. If you prove that the evaluation of such an object has no external side-effects and that it must be DeepFrozen, then you can cache that object as well. 
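(For reference, the hash-validated cache rule itself is simple to sketch; hashlib is real, everything else here is illustrative, with the network fetch left abstract:)

  import hashlib

  cache = {}   # keyed by the expected hash, *not* by hostname or URL

  def fetch_validated(url, expected_hex, download):
      # download: callable(url) -> bytes, standing in for the network fetch
      if expected_hex in cache:
          return cache[expected_hex]     # safely shared across sites
      body = download(url)
      if hashlib.sha256(body).hexdigest() != expected_hex:
          raise ValueError("hash mismatch for %s: refusing to use or cache" % url)
      cache[expected_hex] = body         # only validated bytes enter the cache
      return body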
Your first idea to use hash validation to help with file caching seems more immediately practical. Incidentally, I obtained the XML source for the "Link Fingerprints" Internet Draft from Gervase Markham. I think we should reintroduce that draft -- perhaps with some minor technical changes such as a more compact encoding for the hash, rather than hexadecimal, and with some way to support both a fragment ID and a hash in the same URL. We could add the caching idea to the draft as additional rationale. -- David-Sarah Hopwood http://davidsarah.livejournal.com From david-sarah at jacaranda.org Wed Oct 7 00:50:01 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Wed, 07 Oct 2009 08:50:01 +0100 Subject: [tahoe-dev] Removing the dependency of immutable read caps on UEB computation Message-ID: <4ACC4829.4010909@jacaranda.org> Some clarifications: David-Sarah Hopwood wrote: >> When [Elk Point] is used for mutable files, this collision-resistance for >> EncK1, Dhash and V doesn't really buy you anything because even if those >> values are fixed, the file contents can still vary. However, if a hash of the >> plaintext (also of length m bits, say) is optionally included in the input >> to hash_m, the same protocol can be used for immutable files, and still >> obtains (n+t)/2 bits of collision resistance for the plaintext, from >> the point of view of a read cap holder. This plaintext hash should be either salted, or encrypted with the read key, so that it does not allow guessing attacks. > Incidentally, I previously said > "I think it's desirable to continue to avoid relying on public key > cryptography in the immutable file protocol." > > However, using the mutable file protocol in the way described above, > does not rely on public key cryptography for integrity of the plaintext > as read by a read-cap holder. That is still only dependent on hashes, > and on the symmetric cipher used to encrypt K1. Note that integrity is not dependent on the security of the symmetric cipher; only that it is implemented correctly, and is a deterministic permutation for a given key. > The public key crypto is > relied on just to allow checking validity of a share without the read cap. -- David-Sarah Hopwood http://davidsarah.livejournal.com From francois at ctrlaltdel.ch Wed Oct 7 05:09:26 2009 From: francois at ctrlaltdel.ch (Francois Deppierraz) Date: Wed, 07 Oct 2009 14:09:26 +0200 Subject: [tahoe-dev] Android Tahoe-LAFS client In-Reply-To: <5AD8274B-30DC-457E-96E9-D87A2EE65592@zooko.com> References: <4AC4FEF0.9050206@ctrlaltdel.ch> <5AD8274B-30DC-457E-96E9-D87A2EE65592@zooko.com> Message-ID: <4ACC84F6.1020606@ctrlaltdel.ch> Hi Zooko, Zooko Wilcox-O'Hearn wrote: > So just to be extra double clear about this, it is *possible* to run > a Tahoe-LAFS gateway on an embedded ARM system -- François has > contributed a buildbot that we use to test that -- but what the > Android client runs is *not* a Tahoe-LAFS gateway, but instead just > an HTTP client. Right, François? Yes, that's correct, this Android app is only a special-purpose HTTP client. On the security side, this means that you have to trust the Tahoe gateway you're connecting to, because it'll be able to see the plaintext of your files. And you'd better access it through HTTPS as well. > So neither storage (which is provided by storage servers) nor > security (which is provided by Tahoe-LAFS gateways) is provided by > the Android device.
A Python interpreter already runs on Android [1] but I currently have no idea how to install the Tahoe dependencies (zfec and pycryptopp) requiring native code. François [1] http://code.google.com/p/android-scripting/ From trac at allmydata.org Thu Oct 8 09:53:13 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 08 Oct 2009 16:53:13 -0000 Subject: [tahoe-dev] [tahoe-lafs] #812: server-side crawlers: tolerate corrupted shares, verify shares (was: handle corrupted lease files) In-Reply-To: <037.63bf4e6035795ad71b6485224d4ecd8e@allmydata.org> References: <037.63bf4e6035795ad71b6485224d4ecd8e@allmydata.org> Message-ID: <046.f529811fda1a7eb009a4ef5ec0b6c0b1@allmydata.org> #812: server-side crawlers: tolerate corrupted shares, verify shares --------------------------+------------------------------------------------- Reporter: zooko | Owner: warner Type: defect | Status: new Priority: major | Milestone: undecided Component: code-storage | Version: 1.4.1 Keywords: | Launchpad_bug: --------------------------+------------------------------------------------- Comment(by warner): Sounds good. To be specific, this is unrelated to leases, it's just that the lease-expiring crawler is what first noticed the corruption. So this ticket is about: * verify that the crawlers keep crawling after an exception in their per-share handler functions (I believe that it already does this, but we should verify it) * implement a share-verifying crawler (the server-side verifier), and have it quarantine any corrupted share in some offline junkpile And yeah, just rm the file, it's useless to anyone. The next time that directory is modified, a new copy of that share will be created. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From zooko at zooko.com Thu Oct 8 20:56:45 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Thu, 8 Oct 2009 21:56:45 -0600 Subject: [tahoe-dev] interesting security+distributed network research: Adeona, Vanish Message-ID: <782AD73A-DE15-44D6-9EED-D1DB1D575425@zooko.com> Dear tahoe-dev folks: Check out these research papers: Roxana Geambasu, Jarret Falkner, Paul Gardner, Tadayoshi Kohno, Arvind Krishnamurthy, and Henry M. Levy: "Experiences Building Security Applications on DHTs" Scott Wolchok, Owen S. Hofmann, Nadia Heninger, Edward W. Felten, J. Alex Halderman, Christopher J. Rossbach, Brent Waters, and Emmett Witchel: "Defeating Vanish with Low-Cost Sybil Attacks Against Large DHTs" It is interesting to see how these security researchers are trying to adapt distributed networks such as Vuze (Azureus) and OpenDHT to provide novel security properties, and it is very interesting to see how the schemes fail. I wonder if the kind of security properties that they want could be layered on top of Tahoe-LAFS or provided as a built-in feature of a future version of Tahoe-LAFS. As far as I can see right now, the answer is "not quite".
Regards, Zooko From trac at allmydata.org Fri Oct 9 09:58:28 2009 From: trac at allmydata.org (tahoe-lafs) Date: Fri, 09 Oct 2009 16:58:28 -0000 Subject: [tahoe-dev] [tahoe-lafs] #49: UPnP In-Reply-To: <037.990325008bd9b3d720883cafbf92d5f8@allmydata.org> References: <037.990325008bd9b3d720883cafbf92d5f8@allmydata.org> Message-ID: <046.6b00d674e7cbc6ffd091b2699e9ef4b2@allmydata.org> #49: UPnP ---------------------------+------------------------------------------------ Reporter: zooko | Type: enhancement Status: new | Priority: minor Milestone: undecided | Component: code-network Version: | Keywords: Launchpad_bug: | ---------------------------+------------------------------------------------ Comment(by jrydberg): A simple UPNP library, that also aims at supporting NAT-PMP: http://github.com/jrydberg/natmap -- Ticket URL: tahoe-lafs secure decentralized file storage grid From dave at boostpro.com Fri Oct 9 11:15:47 2009 From: dave at boostpro.com (David Abrahams) Date: Fri, 09 Oct 2009 14:15:47 -0400 Subject: [tahoe-dev] No progress? Message-ID: I started a very large "tahoe cp -r" to the allmydata.com production grid a few hours ago. It's not giving me any feedback, so I keep checking it with "tahoe stats," but those numbers aren't changing: Counts and Total Sizes: count-immutable-files: 6406 count-mutable-files: 0 count-literal-files: 331 count-files: 6737 count-directories: 877 size-immutable-files: 112597462 (112.60 MB, 107.38 MiB) size-literal-files: 5733 (5.73 kB, 5.60 kiB) size-directories: 4866248 (4.87 MB, 4.64 MiB) largest-directory: 123115 (123.11 kB, 120.23 kiB) largest-immutable-file: 4141504 (4.14 MB, 3.95 MiB) Size Histogram: 0-0 : 53 (0 B, 0 B) 1-3 : 69 (3 B, 3 B) 4-10 : 22 (10 B, 10 B) 11-31 : 112 (31 B, 31 B) 32-100 : 323 (100 B, 100 B) 101-316 : 537 (316 B, 316 B) 317-1000 : 904 (1000 B, 1000 B) 1001-3162 : 1222 (3.16 kB, 3.09 kiB) 3163-10000 : 1568 (10.00 kB, 9.77 kiB) 10001-31622 : 1323 (31.62 kB, 30.88 kiB) 31623-100000 : 497 (100.00 kB, 97.66 kiB) 100001-316227 : 75 (316.23 kB, 308.82 kiB) 316228-1000000 : 20 (1.00 MB, 976.56 kiB) 1000001-3162277 : 8 (3.16 MB, 3.02 MiB) 3162278-10000000 : 4 (10.00 MB, 9.54 MiB) Any explanation for this? Thanks! -- Dave Abrahams Meet me at BoostCon: http://www.boostcon.com BoostPro Computing http://www.boostpro.com From zookog at gmail.com Fri Oct 9 19:26:45 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Fri, 9 Oct 2009 20:26:45 -0600 Subject: [tahoe-dev] No progress? In-Reply-To: References: Message-ID: Dave: As far as I know the new servers that Peter twittered about [1] aren't actually live yet. We plan to make our munin graphs of servers and capacity publicly visible once we've added this new capacity added so that in the future you'll be able to directly check the state of the servers. Regards, Zooko [1] http://twitter.com/Allmydata/status/4685700086 On Fri, Oct 9, 2009 at 12:15 PM, David Abrahams wrote: > > I started a very large "tahoe cp -r" to the allmydata.com production > grid a few hours ago. ?It's not giving me any feedback, so I keep > checking it with "tahoe stats," but those numbers aren't changing: > > Counts and Total Sizes: > ?count-immutable-files: 6406 > ? count-mutable-files: 0 > ? count-literal-files: 331 > ? ? ? ? ? count-files: 6737 > ? ? count-directories: 877 > ?size-immutable-files: 112597462 ? ?(112.60 MB, 107.38 MiB) > ? ?size-literal-files: 5733 ? ?(5.73 kB, 5.60 kiB) > ? ? ?size-directories: 4866248 ? ?(4.87 MB, 4.64 MiB) > ? ? largest-directory: 123115 ? 
?(123.11 kB, 120.23 kiB) > largest-immutable-file: 4141504 ? ?(4.14 MB, 3.95 MiB) > Size Histogram: > ? ? ? 0-0 ? ? ? ?: 53 ? ? ?(0 B, 0 B) > ? ? ? 1-3 ? ? ? ?: 69 ? ? ?(3 B, 3 B) > ? ? ? 4-10 ? ? ? : 22 ? ? ?(10 B, 10 B) > ? ? ?11-31 ? ? ? : 112 ? ? (31 B, 31 B) > ? ? ?32-100 ? ? ?: 323 ? ? (100 B, 100 B) > ? ? 101-316 ? ? ?: 537 ? ? (316 B, 316 B) > ? ? 317-1000 ? ? : 904 ? ? (1000 B, 1000 B) > ? ?1001-3162 ? ? : 1222 ? ?(3.16 kB, 3.09 kiB) > ? ?3163-10000 ? ?: 1568 ? ?(10.00 kB, 9.77 kiB) > ? 10001-31622 ? ?: 1323 ? ?(31.62 kB, 30.88 kiB) > ? 31623-100000 ? : 497 ? ? (100.00 kB, 97.66 kiB) > ?100001-316227 ? : 75 ? ? ?(316.23 kB, 308.82 kiB) > ?316228-1000000 ?: 20 ? ? ?(1.00 MB, 976.56 kiB) > ?1000001-3162277 ?: 8 ? ? ? (3.16 MB, 3.02 MiB) > ?3162278-10000000 : 4 ? ? ? (10.00 MB, 9.54 MiB) > > Any explanation for this? > > Thanks! > > -- > Dave Abrahams ? ? ? ? ? Meet me at BoostCon: http://www.boostcon.com > BoostPro Computing > http://www.boostpro.com > > > _______________________________________________ > tahoe-dev mailing list > tahoe-dev at allmydata.org > http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev > From shawn at willden.org Sat Oct 10 05:15:07 2009 From: shawn at willden.org (Shawn Willden) Date: Sat, 10 Oct 2009 06:15:07 -0600 Subject: [tahoe-dev] Share rebalancing Message-ID: <200910100615.07599.shawn@willden.org> I noticed this morning that for the last couple of days my "backups" have been going nowhere. I have a small friendnet, and for some reason all of the other nodes went down at roughly the same time. My backups continued chugging away, but the only storage node available was my local node -- so all shares went there. Not very useful. I'm going to set up some monitoring stuff to notify me and automatically shut down the backups whenever grid health drops below an acceptable level, but my first concern is to figure out how to fix the useless backups, to distribute the shares properly. I'm looking for ideas about how I can do this. The best plan I've come up with so far is this: 1. Modify GridVerify (my backup checker, which walks the backup logs and checks the health of each file using the check function of Tahoe's web API) to evaluate the share allocation of each file and log the readcaps and storage IDs of those that aren't spread across enough servers. 2. Write a script to retrieve each of those readcaps. 3. Write a script to delete the local shares by deleting the storage ID directories. 4. Re-upload the downloaded files. Any other ideas? Shawn. From zooko at zooko.com Sat Oct 10 10:24:18 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Sat, 10 Oct 2009 11:24:18 -0600 Subject: [tahoe-dev] Share rebalancing In-Reply-To: <200910100615.07599.shawn@willden.org> References: <200910100615.07599.shawn@willden.org> Message-ID: On Saturday,2009-10-10, at 6:15 , Shawn Willden wrote: > I have a small friendnet, and for some reason all of the other > nodes went down at roughly the same time. My backups continued > chugging away, but the only storage node available was my local > node -- so all shares went there. Not very useful. > > I'm going to set up some monitoring stuff to notify me and > automatically shut down the backups whenever grid health drops > below an acceptable level, Also Kevan Carstensen's implementation of #778 would make it so that those uploads are reported as failing, which I think would solve the same problem that your proposed "monitoring stuff" would. 
Regards, Zooko http://allmydata.org/trac/tahoe/ticket/778 # "shares of happiness" is the wrong measure; "servers of happiness" is better From trac at allmydata.org Sat Oct 10 08:49:31 2009 From: trac at allmydata.org (tahoe-lafs) Date: Sat, 10 Oct 2009 15:49:31 -0000 Subject: [tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better In-Reply-To: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> References: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> Message-ID: <046.b93a1d9d8813dd1369db5458b09f9072@allmydata.org> #778: "shares of happiness" is the wrong measure; "servers of happiness" is better --------------------------------+------------------------------------------- Reporter: zooko | Owner: kevan Type: defect | Status: new Priority: critical | Milestone: 1.5.1 Component: code-peerselection | Version: 1.4.1 Keywords: reliability | Launchpad_bug: --------------------------------+------------------------------------------- Comment(by zooko): Kevan: It's great to see that you study the code so carefully. This gives me a nice warm fuzzy feeling that more eyeballs have looked at it. Anyway, I'm very sorry it has been two weeks since it was my turn to reply on this ticket and I haven't done so. I've been really busy. I guess the key fact that you've shown that I didn't appreciate is that the variable {{{_servers_with_shares}}} holds only servers that have a ''new'' share that isn't already held by one of the servers in that set. Perhaps it should be renamed to something like {{{_servers_with_unique_shares}}}. (If you do that, please use the {{{darcs replace}}} command to rename it.) Now I can think of one more issue. You've pretty much convinced me that this way of counting {{{_servers_with_shares}}} can't overcount unique shares which are available on separate servers, but it could undercount. For example, suppose {{{s_1: f_1}}}, {{{s_2: f_2}}}, {{{s_3: f_3}}}, {{{s_4: f_1, f_2, f_3}}}. Then if {{{s_4}}} is counted first it will prevent {{{s_1}}}, {{{s_2}}}, and {{{s_3}}} from being counted because they don't have any new shares, so the final value of {{{_servers_with_shares}}} will be 1. On the other hand if {{{s_1}}}, then {{{s_2}}}, then {{{s_3}}} are counted first the final value will be 3. If this is right, then it means that sometimes an upload could be reported as failing (because the uploader happened to talk to {{{s_4}}} first) when it should have been reported as succeeding. What do you think? It might be worth committing your patch as is, meaning that trunk would then potentially suffer from uploads spuriously failing when they shouldn't have (but they never suffer from uploads spuriously succeeding when they shouldn't have), and then starting on a separate patch to avoid that problem. Or, perhaps we should keep this patch out of trunk even longer and think about that issue. 
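(The order dependence is easy to see with a toy version of that counting rule:)

  def count_servers_with_new_shares(servers_in_contact_order):
      # counts a server only if it contributes a share not already counted
      seen = set()
      count = 0
      for server, shares in servers_in_contact_order:
          new = shares - seen
          if new:
              count += 1
              seen |= new
      return count

  layout = {"s1": set(["f1"]), "s2": set(["f2"]),
            "s3": set(["f3"]), "s4": set(["f1", "f2", "f3"])}
  print count_servers_with_new_shares([(s, layout[s]) for s in ["s4", "s1", "s2", "s3"]])  # 1
  print count_servers_with_new_shares([(s, layout[s]) for s in ["s1", "s2", "s3", "s4"]])  # 3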
Regards, Zooko -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Sat Oct 10 09:01:31 2009 From: trac at allmydata.org (pycryptopp) Date: Sat, 10 Oct 2009 16:01:31 -0000 Subject: [tahoe-dev] [pycryptopp] #19: Segmentation fault in HashMultipleBlocks In-Reply-To: <056.fb0765ee719a28eafc9d9a9f772bbc7b@allmydata.org> References: <056.fb0765ee719a28eafc9d9a9f772bbc7b@allmydata.org> Message-ID: <065.ce7af8c03ea21531621d5b3589307d90@allmydata.org> #19: Segmentation fault in HashMultipleBlocks ---------------------+------------------------------------------------------ Reporter: francois | Owner: nejucomo Type: defect | Status: new Priority: major | Version: 0.5.1 Keywords: | Launchpad_bug: ---------------------+------------------------------------------------------ Changes (by zooko): * owner: francois => nejucomo Comment: Nathan: please sign off on this by marking it as "resolved". -- Ticket URL: pycryptopp Python bindings for the Crypto++ library From shawn at willden.org Sat Oct 10 15:01:09 2009 From: shawn at willden.org (Shawn Willden) Date: Sat, 10 Oct 2009 16:01:09 -0600 Subject: [tahoe-dev] Share rebalancing In-Reply-To: References: <200910100615.07599.shawn@willden.org> Message-ID: <200910101601.09627.shawn@willden.org> On Saturday 10 October 2009 11:24:18 am Zooko Wilcox-O'Hearn wrote: > On Saturday,2009-10-10, at 6:15 , Shawn Willden wrote: > > I have a small friendnet, and for some reason all of the other > > nodes went down at roughly the same time. My backups continued > > chugging away, but the only storage node available was my local > > node -- so all shares went there. Not very useful. > > > > I'm going to set up some monitoring stuff to notify me and > > automatically shut down the backups whenever grid health drops > > below an acceptable level, > > Also Kevan Carstensen's implementation of #778 would make it so that > those uploads are reported as failing, which I think would solve the > same problem that your proposed "monitoring stuff" would. It would cause the uploads to fail, which is good, but it would also lose my careful selection of FEC parameters. I've chosen values that achieve the level of reliability I need, while maximizing download performance. With Kevan's changes, I'd be able to maintain reliability, but I'd lose d/l performance, since it would be constrained to k * (slowest_rate). In any case, the more important part of my question was how to fix the present problem, rather than how to avoid it in the future :-) Shawn. From trac at allmydata.org Sat Oct 10 15:23:54 2009 From: trac at allmydata.org (tahoe-lafs) Date: Sat, 10 Oct 2009 22:23:54 -0000 Subject: [tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better In-Reply-To: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> References: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> Message-ID: <046.2ccbfaa5e09c96ce065b4dd74d90f8c8@allmydata.org> #778: "shares of happiness" is the wrong measure; "servers of happiness" is better --------------------------------+------------------------------------------- Reporter: zooko | Owner: kevan Type: defect | Status: new Priority: critical | Milestone: 1.5.1 Component: code-peerselection | Version: 1.4.1 Keywords: reliability | Launchpad_bug: --------------------------------+------------------------------------------- Comment(by kevan): Hm. That scenario would be a problem, and I don't really see an obvious solution to it. 
We could alter the logic at [source:src/allmydata/immutable/upload.py at 4045#L225] to not just give up after determining that there are no homeless shares, but that there aren't enough distinct servers with shares to consider the upload a success. We could, for example, figure out how many more servers need to have shares on them for the upload to work ( {{{n = servers_of_happiness - servers_with_shares}}}). We could then unallocate {{{n}}} shares from servers that have more than one share allocated, stick them back in {{{self.homeless_shares}}}, and then let the selection process continue as normal. We'd need a way to prevent it from looping, though -- maybe it should only do this if there are uncontacted peers. Would we want to remove shares from servers that happen to already have them if we're not counting them in the upload? If so, is there a way to do that? Does that idea make sense? Regarding holding up this patch versus committing now and making it a separate issue: * We'd probably want to write tests for this behavior. Do the test tools in Tahoe include a way to configure a grid so that it looks like the one in your example (I spent a while looking for such tools last weekend when I was trying to implement a test for your first example, but couldn't find them)? If not, we'd probably need to write them. * We'd probably want to make a better-defined algorithm for what I said in the paragraph up there (assuming that it is agreeable to everyone). I have school and work to keep me busy, so I'd be able to dedicate maybe an afternoon or two a week to keep working on this issue. I'm happy to do that -- I'd like to finish it -- but it would probably be a little while before we ended up committing a fix if we waited for that to be done (if someone with more time on their hands wanted to take over, that issue would be solved, I guess). So I guess that's one argument for making it a separate issue. On the other hand, it'd be nice to eliminate edge cases before committing. So there's that. I'm not sure which way I lean. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From zooko at zooko.com Sat Oct 10 15:42:08 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Sat, 10 Oct 2009 16:42:08 -0600 Subject: [tahoe-dev] "Elk Point" design for mutable, add-only, and immutable files In-Reply-To: <4AAF0725.2030800@jacaranda.org> References: <200907162209.04414.shawn-tahoe@willden.org> <4AA4BAE5.1030705@lothar.com> <4AA62064.60904@jacaranda.org> <4AA75E06.8010301@lothar.com> <4AAA6A77.4080001@jacaranda.org> <4AABF714.6080900@lothar.com> <4AAC7A34.7070905@jacaranda.org> <4AADA29F.8000009@lothar.com> <4AAF0725.2030800@jacaranda.org> Message-ID: <7125187E-4E35-4FF2-84F7-0542D9F46D2C@zooko.com> I've started a matrix of ways that an immutable file cap format could break: http://allmydata.org/trac/tahoe/wiki/NewCaps/WhatCouldGoWrong Unfortunately I can't conveniently replicate the data into an email message (except by sending HTML-formatted email, which I assume most of you would hate and which I don't even know how to do). So go read this page! http://allmydata.org/trac/tahoe/wiki/NewCaps/ WhatCouldGoWrong It includes how expensive it is to brute-force each part, which show us how big the crypto values R and T need to be. Also pay attention to the "what crypto property do we rely on" column. I wouldn't be surprised if SHA-256's collision-resistance is increasingly called into question in future years. 
(On the other hand I would be rather shocked if SHA-256's second-pre-image resistance were called into question in the forseeable future.) Regards, Zooko From zooko at zooko.com Sat Oct 10 16:36:20 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Sat, 10 Oct 2009 17:36:20 -0600 Subject: [tahoe-dev] detecting weak uploads and share rebalancing Re: Share rebalancing In-Reply-To: <200910101601.09627.shawn@willden.org> References: <200910100615.07599.shawn@willden.org> <200910101601.09627.shawn@willden.org> Message-ID: <2D46A0E6-1CAF-4B34-9971-4F78AA90F7B2@zooko.com> On Saturday,2009-10-10, at 16:01 , Shawn Willden wrote: >> Also Kevan Carstensen's implementation of #778 would make it so >> that those uploads are reported as failing > > It would cause the uploads to fail, which is good, but it would > also lose my careful selection of FEC parameters. Hm, are you thinking of #791? It began life as #778, but calved off into its own ticket. The patch that Kevan has written for #778 would cause weak uploads to fail, as you desire, without changing your FEC parameters. However the current version of his patch has a problem -- in certain unusual edge conditions it could cause a strong upload to fail. > In any case, the more important part of my question was how to fix > the present problem, rather than how to avoid it in the future :-) Ah, yes. So the process you described sounds like it would work, to me, but for immutable files there is an easier way -- just copy the share files to other servers. :-) Regards, Zooko http://allmydata.org/trac/tahoe/ticket/778 # "shares of happiness" is the wrong measure; "servers of happiness" is better http://allmydata.org/trac/tahoe/ticket/791 # Optimize FEC parameters to increase download performance From trac at allmydata.org Sat Oct 10 17:11:41 2009 From: trac at allmydata.org (tahoe-lafs) Date: Sun, 11 Oct 2009 00:11:41 -0000 Subject: [tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better In-Reply-To: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> References: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> Message-ID: <046.0b455f4e7412024378841acf8229a2c3@allmydata.org> #778: "shares of happiness" is the wrong measure; "servers of happiness" is better --------------------------------+------------------------------------------- Reporter: zooko | Owner: kevan Type: defect | Status: new Priority: critical | Milestone: 1.5.1 Component: code-peerselection | Version: 1.4.1 Keywords: reliability | Launchpad_bug: --------------------------------+------------------------------------------- Comment(by zooko): Yes, we definitely want tests. Check out [source:src/allmydata/test/no_network.py at 20090815112846-66853-7015fcf1322720ece28def7b8f2e4955b4689862#L242 GridTestMixin] and the way that it is used by [source:src/allmydata/test/test_repairer.py?rev=20091005221849-66853-3d1e85b7a2af40ddd07f07676ffb8f6dcc57d983#L397 test_repairer.py]. I guess we could use that pattern to set up a test grid with the shares stored on servers in a specific pattern such as discussed in comment:52 and comment:53. 
-- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Sat Oct 10 17:13:47 2009 From: trac at allmydata.org (tahoe-lafs) Date: Sun, 11 Oct 2009 00:13:47 -0000 Subject: [tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better In-Reply-To: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> References: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> Message-ID: <046.f090ef4b636f6bb4c8bbae483a07c996@allmydata.org> #778: "shares of happiness" is the wrong measure; "servers of happiness" is better --------------------------------+------------------------------------------- Reporter: zooko | Owner: kevan Type: defect | Status: new Priority: critical | Milestone: 1.5.1 Component: code-peerselection | Version: 1.4.1 Keywords: reliability | Launchpad_bug: --------------------------------+------------------------------------------- Comment(by zooko): Kevan and I chatted on IRC and we both independently thought that his patch shouldn't go in when it is susceptible to that "spurious failure to upload" problem, because if that problem were to occur the user would have no good way to work-around it and would be stuck. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From shawn at willden.org Sat Oct 10 18:41:09 2009 From: shawn at willden.org (Shawn Willden) Date: Sat, 10 Oct 2009 19:41:09 -0600 Subject: [tahoe-dev] detecting weak uploads and share rebalancing Re: Share rebalancing In-Reply-To: <2D46A0E6-1CAF-4B34-9971-4F78AA90F7B2@zooko.com> References: <200910100615.07599.shawn@willden.org> <200910101601.09627.shawn@willden.org> <2D46A0E6-1CAF-4B34-9971-4F78AA90F7B2@zooko.com> Message-ID: <200910101941.09308.shawn@willden.org> On Saturday 10 October 2009 05:36:20 pm Zooko Wilcox-O'Hearn wrote: > The patch that Kevan has written for #778 would cause weak uploads to > fail, as you desire, without changing your FEC parameters. It wouldn't change my FEC parameters? I thought his patch would change the current parameters to mean servers instead of shares, so I would think that setting H and N larger than the number of servers in the grid would be a problem. I have K = number_of_servers. > However > the current version of his patch has a problem -- in certain unusual > edge conditions it could cause a strong upload to fail. I haven't followed the patch very closely. I guess I should go read about it. A lot of the terminology in the description assumes knowledge of the code, though, so my perception is that it would take significant effort to understand. > > In any case, the more important part of my question was how to fix > > the present problem, rather than how to avoid it in the future :-) > > Ah, yes. So the process you described sounds like it would work, to > me, but for immutable files there is an easier way -- just copy the > share files to other servers. :-) I could do that, but I'd really like to get the shares to the *right* servers, per the permuted list, to minimize the likelihood that the repairer ends up placing shares poorly if the file has to be repaired. Shawn. 
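Regarding "the *right* servers, per the permuted list": the sketch below is only meant to convey the shape of that computation -- for a given storage index, rank the known servers by a hash over (storage index, server id) and walk that ranking when placing or hunting for shares. The real Tahoe-LAFS code uses its own tagged hash functions and serverid encodings, so the ordering below will not match what an actual node computes.

from hashlib import sha256

def permuted_servers(storage_index, serverids):
    # hypothetical stand-in for the real permutation: sort server ids by a
    # hash that mixes in the storage index, so each file gets its own ordering
    return sorted(serverids,
                  key=lambda sid: sha256(storage_index + sid).digest())

servers = [b"serverA", b"serverB", b"serverC", b"serverD"]
print(permuted_servers(b"\x01" * 16, servers))  # preferred placement order for this file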
From david-sarah at jacaranda.org Sat Oct 10 20:25:33 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Sun, 11 Oct 2009 04:25:33 +0100 Subject: [tahoe-dev] "Elk Point" design for mutable, add-only, and immutable files In-Reply-To: <7125187E-4E35-4FF2-84F7-0542D9F46D2C@zooko.com> References: <200907162209.04414.shawn-tahoe@willden.org> <4AA4BAE5.1030705@lothar.com> <4AA62064.60904@jacaranda.org> <4AA75E06.8010301@lothar.com> <4AAA6A77.4080001@jacaranda.org> <4AABF714.6080900@lothar.com> <4AAC7A34.7070905@jacaranda.org> <4AADA29F.8000009@lothar.com> <4AAF0725.2030800@jacaranda.org> <7125187E-4E35-4FF2-84F7-0542D9F46D2C@zooko.com> Message-ID: <4AD1502D.8000202@jacaranda.org> Zooko Wilcox-O'Hearn wrote: > I've started a matrix of ways that an immutable file cap format could > break: http://allmydata.org/trac/tahoe/wiki/NewCaps/WhatCouldGoWrong > > Unfortunately I can't conveniently replicate the data into an email > message (except by sending HTML-formatted email, which I assume most > of you would hate and which I don't even know how to do). > > So go read this page! http://allmydata.org/trac/tahoe/wiki/NewCaps/ > WhatCouldGoWrong OK, I've added everything I can think of right now. Note the question in footnote 5: # 5. Brute force costs assume a single-target attack that is expected to # succeed with high probability. Costs will be lower for attacking # multiple targets or for a lower success probability. # (Should we give explicit formulae for this?) > Also pay attention to the "what crypto property do we rely on" > column. I wouldn't be surprised if SHA-256's collision-resistance is > increasingly called into question in future years. (On the other > hand I would be rather shocked if SHA-256's second-pre-image > resistance were called into question in the forseeable future.) I agree. Only attack #1 depends on collision resistance. -- David-Sarah Hopwood ? http://davidsarah.livejournal.com From zookog at gmail.com Sat Oct 10 20:38:16 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Sat, 10 Oct 2009 21:38:16 -0600 Subject: [tahoe-dev] detecting weak uploads and share rebalancing Re: Share rebalancing In-Reply-To: <200910101941.09308.shawn@willden.org> References: <200910100615.07599.shawn@willden.org> <200910101601.09627.shawn@willden.org> <2D46A0E6-1CAF-4B34-9971-4F78AA90F7B2@zooko.com> <200910101941.09308.shawn@willden.org> Message-ID: On Sat, Oct 10, 2009 at 7:41 PM, Shawn Willden wrote: > > It wouldn't change my FEC parameters? ?I thought his patch would change the > current parameters to mean servers instead of shares, so I would think that > setting H and N larger than the number of servers in the grid would be a > problem. ?I have K = number_of_servers. The idea has evolved quite a bit through the life of ticket #778 (and its offspring #791). The current idea uses the same K and N values to mean how many shares to generate, but reinterprets H to mean not "I'll be happy if at least H shares have been uploaded" but "I'll be happy if at least H servers are holding distinct shares". > I haven't followed the patch very closely. ?I guess I should go read about it. > A lot of the terminology in the description assumes knowledge of the code, > though, so my perception is that it would take significant effort to > understand. Well, maybe you could contribute by judging whether the docs make sense to you. 
We shouldn't commit the patch if understanding how to use it requires knowledge of the code (or at least, if it requires *more* knowledge of the code than the current behavior requires). http://allmydata.org/trac/tahoe/attachment/ticket/778/docs.txt > I could do that, but I'd really like to get the shares to the *right* servers, > per the permuted list, to minimize the likelihood that the repairer ends up > placing shares poorly if the file has to be repaired. Ah yes, this is an example of why I wish for #302. If Tahoe-LAFS just used the standard Chord ring that all modern distributed data systems use, then you could figure out which shares go to which servers by visually inspecting the storage indexes and the server ids. Regards, Zooko http://allmydata.org/trac/tahoe/ticket/302 # stop permuting peerlist, use SI as offset into ring instead? From zookog at gmail.com Sat Oct 10 20:53:19 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Sat, 10 Oct 2009 21:53:19 -0600 Subject: [tahoe-dev] "Elk Point" design for mutable, add-only, and immutable files In-Reply-To: <4AD1502D.8000202@jacaranda.org> References: <4AA62064.60904@jacaranda.org> <4AA75E06.8010301@lothar.com> <4AAA6A77.4080001@jacaranda.org> <4AABF714.6080900@lothar.com> <4AAC7A34.7070905@jacaranda.org> <4AADA29F.8000009@lothar.com> <4AAF0725.2030800@jacaranda.org> <7125187E-4E35-4FF2-84F7-0542D9F46D2C@zooko.com> <4AD1502D.8000202@jacaranda.org> Message-ID: On Sat, Oct 10, 2009 at 9:25 PM, David-Sarah Hopwood wrote: > > OK, I've added everything I can think of right now. :-) > Note the question in footnote 5: > > # 5. Brute force costs assume a single-target attack that is expected to > # ? ?succeed with high probability. Costs will be lower for attacking > # ? ?multiple targets or for a lower success probability. > # ? ?(Should we give explicit formulae for this?) We should think that issue through, along with the accompanying issue of "how low a chance of success is low enough". If there are 2^50 caps in use, and some technique can "attack all known caps at once", then do we need to increase the size of the caps (possibly by up to 50 bits) to make it so that the chance of success against *any* target is still negligible? Or is it just unreasonable to think that some adversary would spend massive amounts of computer power in order to forge some random cap out of a large set of caps? Anyway, writing down precisely what things are susceptible to that sort of "shotgun" attack seems necessary, which is why I added the "target" column. Regards, Zooko From david-sarah at jacaranda.org Sat Oct 10 21:03:03 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Sun, 11 Oct 2009 05:03:03 +0100 Subject: [tahoe-dev] detecting weak uploads and share rebalancing Re: Share rebalancing In-Reply-To: References: <200910100615.07599.shawn@willden.org> <200910101601.09627.shawn@willden.org> <2D46A0E6-1CAF-4B34-9971-4F78AA90F7B2@zooko.com> <200910101941.09308.shawn@willden.org> Message-ID: <4AD158F7.9070501@jacaranda.org> Zooko O'Whielacronx wrote: > Ah yes, this is an example of why I wish for #302. If Tahoe-LAFS just > used the standard Chord ring that all modern distributed data systems > use, There are fielded systems using both Chord and Kademlia DHT designs, no? Kademlia is significantly different. > then you could figure out which shares go to which servers by > visually inspecting the storage indexes and the server ids. Yes. -- David-Sarah Hopwood ? 
http://davidsarah.livejournal.com From zookog at gmail.com Sun Oct 11 14:03:47 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Sun, 11 Oct 2009 15:03:47 -0600 Subject: [tahoe-dev] "Elk Point" design for mutable, add-only, and immutable files In-Reply-To: References: <4AA75E06.8010301@lothar.com> <4AAA6A77.4080001@jacaranda.org> <4AABF714.6080900@lothar.com> <4AAC7A34.7070905@jacaranda.org> <4AADA29F.8000009@lothar.com> <4AAF0725.2030800@jacaranda.org> <7125187E-4E35-4FF2-84F7-0542D9F46D2C@zooko.com> <4AD1502D.8000202@jacaranda.org> Message-ID: Nikita Borisov posted on twitter: "Do you need VCs to be generatable from RCs offline? If not, make RC=H2(VK), lookup VK to generate VC=H1(VK)" I haven't thought through this suggestion yet, but I thought I would post it in case I don't get time to do so anytime soon. Regards, Zooko From jamesd at echeque.com Sun Oct 11 17:23:36 2009 From: jamesd at echeque.com (James A. Donald) Date: Mon, 12 Oct 2009 10:23:36 +1000 Subject: [tahoe-dev] "Elk Point" design for mutable, add-only, and immutable files In-Reply-To: References: <4AA62064.60904@jacaranda.org> <4AA75E06.8010301@lothar.com> <4AAA6A77.4080001@jacaranda.org> <4AABF714.6080900@lothar.com> <4AAC7A34.7070905@jacaranda.org> <4AADA29F.8000009@lothar.com> <4AAF0725.2030800@jacaranda.org> <7125187E-4E35-4FF2-84F7-0542D9F46D2C@zooko.com> <4AD1502D.8000202@jacaranda.org> Message-ID: <4AD27708.5000402@echeque.com> Zooko O'Whielacronx wrote: > We should think that issue through, along with the accompanying issue > of "how low a chance of success is low enough". If there are 2^50 > caps in use, and some technique can "attack all known caps at once", > then do we need to increase the size of the caps (possibly by up to 50 > bits) to make it so that the chance of success against *any* target is > still negligible? Or is it just unreasonable to think that some > adversary would spend massive amounts of computer power in order to > forge some random cap out of a large set of caps? Obviously this depends on what caps are being used for. For what caps are *now* being used for, no one would to forge some random cap out of a very large set of caps. If caps were used for the purpose that the shared secret of a credit card is used for, *then* people would be interested in forging some random cap - but that is a new kind of cap, which could be defined with a new number of bits. From david-sarah at jacaranda.org Sun Oct 11 20:33:27 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Mon, 12 Oct 2009 04:33:27 +0100 Subject: [tahoe-dev] "Elk Point" design for mutable, add-only, and immutable files In-Reply-To: References: <4AA75E06.8010301@lothar.com> <4AAA6A77.4080001@jacaranda.org> <4AABF714.6080900@lothar.com> <4AAC7A34.7070905@jacaranda.org> <4AADA29F.8000009@lothar.com> <4AAF0725.2030800@jacaranda.org> <7125187E-4E35-4FF2-84F7-0542D9F46D2C@zooko.com> <4AD1502D.8000202@jacaranda.org> Message-ID: <4AD2A387.5060304@jacaranda.org> Zooko O'Whielacronx wrote: > Nikita Borisov posted on twitter: > > "Do you need VCs [verify caps] to be generatable from RCs [read caps] offline? > If not, make RC=H2(VK), lookup VK to generate VC=H1(VK)" > > I haven't thought through this suggestion yet, but I thought I would > post it in case I don't get time to do so anytime soon. What was the motivation for this suggestion? 
I think Elk Point is already a refinement of this: the read and verify caps are both derived from hashes of (Dhash, V), but they share the T field, which increases the cost of roadblock attacks. And the key K1 is also input to the hash that generates R (based on your idea at ), which is what enables all bits of the read cap to act as integrity-checking bits. Also, Elk Point does allow a verify cap to be derived offline from a read cap. (If T is not sufficiently long then it must be a read cap that includes the U field, but that's unavoidable. Note that for immutable files, then to ensure collision resistance, T should be sufficiently long that a verify cap can be derived offline from any read cap.) -- David-Sarah Hopwood ? http://davidsarah.livejournal.com From david-sarah at jacaranda.org Sun Oct 11 21:05:19 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Mon, 12 Oct 2009 05:05:19 +0100 Subject: [tahoe-dev] "Elk Point" design for mutable, add-only, and immutable files In-Reply-To: <4AD27708.5000402@echeque.com> References: <4AA62064.60904@jacaranda.org> <4AA75E06.8010301@lothar.com> <4AAA6A77.4080001@jacaranda.org> <4AABF714.6080900@lothar.com> <4AAC7A34.7070905@jacaranda.org> <4AADA29F.8000009@lothar.com> <4AAF0725.2030800@jacaranda.org> <7125187E-4E35-4FF2-84F7-0542D9F46D2C@zooko.com> <4AD1502D.8000202@jacaranda.org> <4AD27708.5000402@echeque.com> Message-ID: <4AD2AAFF.9080600@jacaranda.org> James A. Donald wrote: > Zooko O'Whielacronx wrote: >> We should think that issue through, along with the accompanying issue >> of "how low a chance of success is low enough". If there are 2^50 >> caps in use, and some technique can "attack all known caps at once", >> then do we need to increase the size of the caps (possibly by up to 50 >> bits) to make it so that the chance of success against *any* target is >> still negligible? Or is it just unreasonable to think that some >> adversary would spend massive amounts of computer power in order to >> forge some random cap out of a large set of caps? > > Obviously this depends on what caps are being used for. For what caps > are *now* being used for, no one would to forge some random cap out of a > very large set of caps. > > If caps were used for the purpose that the shared secret of a credit > card is used for, *then* people would be interested in forging some > random cap - but that is a new kind of cap, which could be defined with > a new number of bits. I disagree. I'm in favour of choosing conservative parameters from the start, on the basis that (as has been proven time and time again) users are in the habit of employing protocols in unusual and surprising ways *without* consulting any cryptographers or implementors, or changing any parameters from the defaults. That is, you simply cannot assume that you know how a protocol will be used, or reused -- or that you know what attackers' motivations might be. In any case, Tahoe is supposed to be a fully general-purpose file storage system, so we have no idea what will be stored in it (or for how long), even if it is only used for storage as such. However, I think that conventional wisdom about reasonable key and hash sizes does already factor in low-success-probablity and multiple-target attacks to *some* extent, although perhaps not quite conservatively enough. 
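Roughly, and ignoring constant factors, the scalings described in the next paragraph can be written out as follows (a back-of-the-envelope rendering, not text from the wiki page; here $k$ is the symmetric key size in bits, $n$ the hash output size in bits, $p$ the required success probability, and $T$ the number of targets attacked at once):

$$ W_{\mathrm{encryption}} \approx p \cdot 2^{k}, \qquad W_{\mathrm{preimage}} \approx \frac{p \cdot 2^{n}}{T}, \qquad W_{\mathrm{collision}} \approx \sqrt{p} \cdot 2^{n/2} $$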
The page at now takes into account such attacks in the cost formulae: - the cost for attacks on encryption is proportional to the success probability - the cost for preimage attacks is proportional to the success probability divided by the number of targets - the cost for a collision attack is proportional to the square root of the success probability. So yes, adding 50 bits, or maybe 40 bits (to R, preferably) will effectively squash the multiple-target attacks, and low-success-probability attacks are probably adequately taken into account already by a 128-bit key. 50 bits is only about 8 or 9 characters in a base-62 URI; it's not so bad. -- David-Sarah Hopwood ? http://davidsarah.livejournal.com From david-sarah at jacaranda.org Sun Oct 11 21:23:13 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Mon, 12 Oct 2009 05:23:13 +0100 Subject: [tahoe-dev] "Elk Point" design for mutable, add-only, and immutable files In-Reply-To: <7125187E-4E35-4FF2-84F7-0542D9F46D2C@zooko.com> References: <200907162209.04414.shawn-tahoe@willden.org> <4AA4BAE5.1030705@lothar.com> <4AA62064.60904@jacaranda.org> <4AA75E06.8010301@lothar.com> <4AAA6A77.4080001@jacaranda.org> <4AABF714.6080900@lothar.com> <4AAC7A34.7070905@jacaranda.org> <4AADA29F.8000009@lothar.com> <4AAF0725.2030800@jacaranda.org> <7125187E-4E35-4FF2-84F7-0542D9F46D2C@zooko.com> Message-ID: <4AD2AF31.9010807@jacaranda.org> Zooko Wilcox-O'Hearn wrote: > I've started a matrix of ways that an immutable file cap format could > break: http://allmydata.org/trac/tahoe/wiki/NewCaps/WhatCouldGoWrong [...] > Also pay attention to the "what crypto property do we rely on" > column. I wouldn't be surprised if SHA-256's collision-resistance is > increasingly called into question in future years. I agree, but note that you can only create colliding files once you know what attack to use -- unlike preimage attacks where you can target files that were created years ago. (This is of course no excuse for doing nothing to update many protocols and implementations until ten or more years after cracks start to appear, as happened with MD5.) -- David-Sarah Hopwood ? http://davidsarah.livejournal.com From trac at allmydata.org Mon Oct 12 02:21:52 2009 From: trac at allmydata.org (tahoe-lafs) Date: Mon, 12 Oct 2009 09:21:52 -0000 Subject: [tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better In-Reply-To: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> References: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> Message-ID: <046.e445fce9858f15d5a28f057f571c617c@allmydata.org> #778: "shares of happiness" is the wrong measure; "servers of happiness" is better --------------------------------+------------------------------------------- Reporter: zooko | Owner: kevan Type: defect | Status: new Priority: critical | Milestone: 1.5.1 Component: code-peerselection | Version: 1.4.1 Keywords: reliability | Launchpad_bug: --------------------------------+------------------------------------------- Comment(by kevan): Okay, I'm updating two patches. I updated my tests patch to include a test for the scenario Zooko proposed in comment:53. It's not _quite_ ideal (I need to figure out a way to make the {{{Tahoe2PeerSelector}}} pop server 0 off the peers list first for it to be perfect), but it fails with the current code. 
I also noticed that my {{{_servers_with_unique_shares}}} method in [http://allmydata.org/trac/tahoe/browser/src/allmydata/immutable/upload.py?rev=4045 upload.py] was comparing peerids with things that weren't peerids, so I made a minor change to the behavior.txt patch to address that. My todo list is basically: * Add a test for the scenario I propose in comment:52 * Design + implement changes to the peer selection algorithm to address the scenario in comment:53. I welcome any comments. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From zookog at gmail.com Mon Oct 12 15:45:18 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Mon, 12 Oct 2009 16:45:18 -0600 Subject: [tahoe-dev] MapReduce over Tahoe-LAFS slides from HadoopWorld Message-ID: Wow! Check out these great slides from Aaron Cordova's talk about HadoopWorld! http://www.cloudera.com/sites/all/themes/cloudera/static/hw09/3%20%20-%202-30%20Aaron%20Cordova,%20BAH,%20HadoopWorldComplete.pdf They clearly explain why you might want to use Tahoe-LAFS instead of HDFS to store the data that you are computing over with MapReduce. They also include some performance comparisons between Tahoe-LAFS and HDFS under MapReduce. Aaron: if you're reading this, could you give us more information about how Tahoe-LAFS gets used by MapReduce? The write performance was terrible. Regards, Zooko From david-sarah at jacaranda.org Mon Oct 12 19:47:56 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Tue, 13 Oct 2009 03:47:56 +0100 Subject: [tahoe-dev] MapReduce over Tahoe-LAFS slides from HadoopWorld In-Reply-To: References: Message-ID: <4AD3EA5C.7040009@jacaranda.org> Zooko O'Whielacronx wrote: > Wow! Check out these great slides from Aaron Cordova's talk about HadoopWorld! > > http://www.cloudera.com/sites/all/themes/cloudera/static/hw09/3%20%20-%202-30%20Aaron%20Cordova,%20BAH,%20HadoopWorldComplete.pdf I notice that slide 10 ("Erasure Coding Overview"), when it says that "Up to n-k machines can fail, be compromised, or malicious without data loss", is assuming a property that would only actually be provided by the fix to . -- David-Sarah Hopwood ? http://davidsarah.livejournal.com From david-sarah at jacaranda.org Mon Oct 12 19:55:07 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Tue, 13 Oct 2009 03:55:07 +0100 Subject: [tahoe-dev] MapReduce over Tahoe-LAFS slides from HadoopWorld In-Reply-To: <4AD3EA5C.7040009@jacaranda.org> References: <4AD3EA5C.7040009@jacaranda.org> Message-ID: <4AD3EC0B.60401@jacaranda.org> David-Sarah Hopwood wrote: > Zooko O'Whielacronx wrote: >> Wow! Check out these great slides from Aaron Cordova's talk about HadoopWorld! >> >> http://www.cloudera.com/sites/all/themes/cloudera/static/hw09/3%20%20-%202-30%20Aaron%20Cordova,%20BAH,%20HadoopWorldComplete.pdf > > I notice that slide 10 ("Erasure Coding Overview"), when it says that > "Up to n-k machines can fail, be compromised, or malicious without > data loss", is assuming a property that would only actually be provided > by the fix to . ... actually, not even then. I think that servers_of_happiness needs to be equal to n for the statement on the slide to be correct -- is that right? -- David-Sarah Hopwood ? http://davidsarah.livejournal.com From zookog at gmail.com Mon Oct 12 20:36:36 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Mon, 12 Oct 2009 21:36:36 -0600 Subject: [tahoe-dev] Sunday October 18: Patch Reviewing Party! 
Message-ID: Folks: My brother Nathan and I are going to meet on irc.freenode.net, channel #tahoe this coming Sunday October 18 at about 15:00 UTC to review as many patches as we can. You're invited! You are probably qualified to review patches! (Basically, a working knowledge of Python is all that you need for most tickets.) If you want to help, go to http://allmydata.org and click on the button at the top labelled "View Tickets". That will take you to http://allmydata.org/trac/tahoe/wiki/ViewTickets . Click the hyperlink labelled "tickets for review". That will take you to http://allmydata.org/trac/tahoe/query?status=!closed&order=priority&keywords=~review . Now read some tickets! If you want to contribute a patch to Tahoe-LAFS, see if you can get your patch uploaded by Sunday so that we won't run out of tickets to review. :-) Regards, Zooko From kevan at isnotajoke.com Mon Oct 12 20:52:16 2009 From: kevan at isnotajoke.com (Kevan Carstensen) Date: Mon, 12 Oct 2009 20:52:16 -0700 Subject: [tahoe-dev] MapReduce over Tahoe-LAFS slides from HadoopWorld In-Reply-To: <4AD3EC0B.60401@jacaranda.org> References: <4AD3EA5C.7040009@jacaranda.org> <4AD3EC0B.60401@jacaranda.org> Message-ID: <4AD3F970.1060302@isnotajoke.com> On 10/12/09 7:55 PM David-Sarah Hopwood wrote: > ... actually, not even then. I think that servers_of_happiness needs to > be equal to n for the statement on the slide to be correct -- is that > right? I think so. servers_of_happiness = h guarantees that a successuful upload will place at least one distinct share on each of at least h distinct servers. This means that at least h - k (and possibly more [1]) servers can be lost without any data loss. If n = h, then n - k servers can be lost. Note that this relation only makes sense for h >= k, since h < k would yield negative servers. That case is probably best interpreted (along with h = k) as not assuring any integrity in the event of server failure. [1] The "and possibly more" comes from the rather loose coupling between the peer selection process and the servers_of_happiness behavior. If possible, tahoe-LAFS will attempt to place each of n shares on its own distinct server -- if this succeeds in the case where h is smaller than n, then it is possible to lose more than h servers without losing the file. -- Kevan Carstensen | From zooko at zooko.com Tue Oct 13 06:23:49 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Tue, 13 Oct 2009 07:23:49 -0600 Subject: [tahoe-dev] "servers of happiness" Re: MapReduce over Tahoe-LAFS slides from HadoopWorld In-Reply-To: <4AD3F970.1060302@isnotajoke.com> References: <4AD3EA5C.7040009@jacaranda.org> <4AD3EC0B.60401@jacaranda.org> <4AD3F970.1060302@isnotajoke.com> Message-ID: I asked my wife Amber for help formalizing my intuition about what sort of share placement makes me happy. We came up with this: First of all, let's call a set of servers "sufficient" if you can download the file from that set (i.e. if at least K distinct shares are hosted in that set of servers). Now consider the largest set of servers such that every K-sized subset of it is sufficient. Let's call the size of that largest set S. Now my intuition about "Happyness" is that I configure a Happyness number H, and if an upload results in an S >= H then I'm happy. I think this is also Robert Metcalf's intuition [1]. It may also be Shawn Willden's intuition [2], but on the other hand perhaps Shawn Willden's intuition is something more sophisticated. 
;-) A neat thing about this way of thinking is that the number S is the "health" or "robustness" of the file. An upload or a file-check operation could report S to the user. What do you think -- is this measure of "health" a good enough measure for the purposes of ticket #778? Regards, Zooko [1] http://allmydata.org/pipermail/tahoe-dev/2009-August/002494.html [2] http://allmydata.org/pipermail/tahoe-dev/2009-October/002972.html http://allmydata.org/trac/tahoe/ticket/778 # "shares of happiness" is the wrong measure; "servers of happiness" is better From zooko at zooko.com Tue Oct 13 08:23:11 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Tue, 13 Oct 2009 09:23:11 -0600 Subject: [tahoe-dev] patch review process Message-ID: Why review patches: We want more patches to be contributed to Tahoe-LAFS. Getting feedback on patches encourages contributors. Patches languishing in the "waiting to be reviewed" set discourages them. (By the way, something else that encourages them is users saying "Thank you.".) Who can review patches: Pretty much anyone reading this! Knowledge of Python is helpful, but some patches are so simple that reviewing them is a reasonable task for a beginner who is just learning Python. Some patches require more specialized knowledge to review, but most don't. How to review patches: 1. Go to http://allmydata.org . Click on "View Tickets", which will take you to http://allmydata.org/trac/tahoe/wiki/ViewTickets . Click on "tickets for review" which will take you to http://allmydata.org/ trac/tahoe/query?status=%21closed&order=priority&keywords=%7Ereview . 2. You can read everything without registering, but to add comments or change tickets you have to be logged in. Registering is quick and easy -- click the "Register" link at the top right of the page. 3. Read tickets until you find one that you can review. 4. (optional) Click "accept". This marks you as the person reviewing this patch. If you don't want to commit to this then you can skip this step. 5. Read the patch until you understand all of the docs, tests, code and comments in it. You can use the "Browse source" button at the top of the page to read the current versions of the files that the patch changes. 5.a. If you can't understand the patch after spending some time on it, then say so in a comment on the ticket! This might be taken as a reason to add documentation or comments or to refactor the code. On the other hand, it might just be that you don't have enough context to understand the code. That's okay too. 5.b. If you find errors or omissions in the docs, tests, code or comments then write that down in the ticket, remove the "review" keyword from the keywords, and assign the ticket to someone other than yourself. (That would be the original author of the patch, or someone who seems likely to fix the patch, or if you can't think of anyone better then assign it to me.) 5.c. If you understand the patch and find no errors or omissions then remove the keyword "review", add the keyword "reviewed" and assign it to me. I'll commit it to trunk. 6. Feel good about yourself. Thank you for helping with our little project attempting to improve the world! 
Regards, Zooko From kevan at isnotajoke.com Tue Oct 13 20:25:35 2009 From: kevan at isnotajoke.com (Kevan Carstensen) Date: Tue, 13 Oct 2009 20:25:35 -0700 Subject: [tahoe-dev] "servers of happiness" Re: MapReduce over Tahoe-LAFS slides from HadoopWorld In-Reply-To: References: <4AD3EA5C.7040009@jacaranda.org> <4AD3EC0B.60401@jacaranda.org> <4AD3F970.1060302@isnotajoke.com> Message-ID: <4AD544AF.3030707@isnotajoke.com> On 10/13/09 6:23 AM Zooko Wilcox-O'Hearn wrote: > What do you think -- is this measure of "health" a good enough > measure for the purposes of ticket #778? I think that your intuition of health is also covered by the current design of #778 -- that is, a mechanism that doesn't declare an upload successful unless 1. Shares from the file went to at least h peers, and 2. Any k of the h peers are sufficient to reconstruct the file. (I probably should have said this last night; apologies for my clumsy wording) My current patches check to make sure that shares were distributed to at least h servers before declaring the upload successful. To see that shares are not multiply distributed, and that they are not overcounted if they already exist, see comment 52 [1] on issue #778; this explains how the peer selection algorithm deals with these problems. If you're prepared to believe that this check is enough to check these criteria, you can skip the proof. I keep becoming unconvinced of this, and I wrote the patches, so I figured it was worthwhile to include a proof. ---- Proof ---- (There are, at present, undercounting issues, but I don't think those affect what I'm going to say) Then a successful upload with my current patches guarantees that there is a set of servers T with cardinality at least h such that: 1. Each server in T has received at least one share, and 2. There is an injective function between servers in T and shares of f. (I'm probably hideously overcomplicating this, but none of the other ways of saying that that I could come up with did the job. Also, I'm assuming the existence of the bijection from the discussion in #778 (which has at least convinced me of its existence). If I'm missing something, let me know) We can show that these guarantees are sufficient to satisfy the criteria of health defined in #778. Let k and n be encoding parameters. Let h be servers_of_happiness. Let X be the set of shares of f. Suppose that k >= h. Let m : T -> X be an injection, as described above. Suppose that a file f has successfully been uploaded to the grid using the implementation of #778. Then there is a set of servers T such that |T| >= h, as described above. Because T exists, we know that the first criterion -- that at least h servers have stored shares of the file -- is satisfied. Let U = {u_1, u_2, ..., u_k} be an arbitrary k-element subset of T. Since m is an injection, there is one Y = {m(u_1), m(u_2), ..., m(u_n)}, which is a k-element set of shares. Since the shares in X are distinct (for the purposes of file reconstruction), and the range of m is X, we know that the shares in Y are distinct. Then Y is a k-element set of distinct shares. Since k distinct shares are necessary to reconstruct f, Y is sufficient to reconstruct f, as required. Then an arbitrary k-element subset of T is sufficient to reconstruct f, as required by the second criterion. ---- End Proof ---- So I guess I've shown (or maybe not; maybe my proof has an error) that the patches in #778 do what they are supposed to. 
I think the mapping from what they're supposed to do to your definition of health is pretty clear: if I know that any k of a set of h servers in those that originally received the upload are sufficient to reconstruct the file, then I know that h is at least a lower bound on the true health value, as required by your criteria. Does this make sense? I like that your measurement of health gives an actual health value, and not just a lower bound. It'd be nice to be able to report something like that to the user. I wonder how we'd calculate it, though. Maybe the _servers_with_unique_shares number would correspond to it? (apologies for the long email -- I've been meaning to try to prove that for a while, and this seemed like a good excuse to try) [1] http://allmydata.org/trac/tahoe/ticket/778#comment:52 -- Kevan Carstensen | From zooko at zooko.com Tue Oct 13 20:34:18 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Tue, 13 Oct 2009 21:34:18 -0600 Subject: [tahoe-dev] on discovering that a hash function wasn't secure after all -- was: Re: "Elk Point" design for mutable, add-only, and immutable files In-Reply-To: <4AD2AF31.9010807@jacaranda.org> References: <200907162209.04414.shawn-tahoe@willden.org> <4AA4BAE5.1030705@lothar.com> <4AA62064.60904@jacaranda.org> <4AA75E06.8010301@lothar.com> <4AAA6A77.4080001@jacaranda.org> <4AABF714.6080900@lothar.com> <4AAC7A34.7070905@jacaranda.org> <4AADA29F.8000009@lothar.com> <4AAF0725.2030800@jacaranda.org> <7125187E-4E35-4FF2-84F7-0542D9F46D2C@zooko.com> <4AD2AF31.9010807@jacaranda.org> Message-ID: On Sunday,2009-10-11, at 22:23 , David-Sarah Hopwood wrote: >> Also pay attention to the "what crypto property do we rely on" >> column. I wouldn't be surprised if SHA-256's collision-resistance >> is increasingly called into question in future years. > > I agree, but note that you can only create colliding files once you > know what attack to use -- unlike preimage attacks where you can > target files that were created years ago. That's a good point, but we can't rely on that too much, because how do we know that the first person to discover collisions immediately published their results? Xiaoyun Wang announced how to find collisions in MD5 at the Crypto 2004 conference, but we don't know for sure that Wang was the first person to figure out how to do that. (As an aside, Wang was a Chinese national working at a Chinese university. Why didn't Chinese military/intelligence keep her discovery for themselves? My assumption is that they never noticed until too late. If they had monopolized that discovery and rediscovered Stevens et al. 2009 [1] then they could have had a root certificate to the Internet -- something that normally only the USA military/intelligence agencies are supposed to have.) So if someone gives you an immutable file cap built with SHA-256 in 2010, and then in 2020 a method is published for generating collisions in SHA-256, then if you want to be sure that the file is not a shape-shifter file you have to cast your mind back to 2010 and think to yourself "How sure am I that the generation of this cap wasn't performed by someone who knew this trick all along back in 2010?". :-) This is why I think it is useful to use precise terminology when talking about our evolving understanding of secure hash functions. It is tempting to speak loosely and say that MD5 was "secure" until 2004 and then it became "insecure", but that is making assumptions about who knew what in 2003. 
To be more precise, you have to say something like "In 2003 no way to generate collisions in MD5 was known to the public.". I know a cryptographer who claims to know an ex-KGB man who claims that he could generate preimages of MD5 in 1994. Sounds crazy right!? But I can't disprove it. And it sounds a lot less crazy now that Wang, Klima et al. have shown how to generate an MD5 collision in under a minute on a laptop. Regards, Zooko [1] http://www.win.tue.nl/hashclash/rogue-ca/ From kevan at isnotajoke.com Tue Oct 13 21:37:58 2009 From: kevan at isnotajoke.com (Kevan Carstensen) Date: Tue, 13 Oct 2009 21:37:58 -0700 Subject: [tahoe-dev] "servers of happiness" Re: MapReduce over Tahoe-LAFS slides from HadoopWorld In-Reply-To: <4AD544AF.3030707@isnotajoke.com> References: <4AD3EA5C.7040009@jacaranda.org> <4AD3EC0B.60401@jacaranda.org> <4AD3F970.1060302@isnotajoke.com> <4AD544AF.3030707@isnotajoke.com> Message-ID: <4AD555A6.7070100@isnotajoke.com> On 10/13/09 8:25 PM Kevan Carstensen wrote: > 2. There is an injective function between servers in T and shares of > f. This should probably be "We can construct an injection between servers in T and shares of f" -- otherwise, this statement might imply that exactly one share of each file is stored on a given server, which is not always the case. (the injection, if that is unclear, is between servers and shares stored on those servers) -- Kevan Carstensen | From david-sarah at jacaranda.org Tue Oct 13 21:58:16 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Wed, 14 Oct 2009 05:58:16 +0100 Subject: [tahoe-dev] "servers of happiness" In-Reply-To: References: <4AD3EA5C.7040009@jacaranda.org> <4AD3EC0B.60401@jacaranda.org> <4AD3F970.1060302@isnotajoke.com> Message-ID: <4AD55A68.60108@jacaranda.org> Zooko Wilcox-O'Hearn wrote: > I asked my wife Amber for help formalizing my intuition about what > sort of share placement makes me happy. We came up with this: > > First of all, let's call a set of servers "sufficient" if you can > download the file from that set (i.e. if at least K distinct shares > are hosted in that set of servers). Call it K-sufficient. > Now consider the largest set of servers such that every K-sized > subset of it is sufficient. > Let's call the size of that largest set S. This largest set isn't necessarily unique. I think you mean: A set U is K-happy iff every K-sized subset of U is K-sufficient. Let S be the largest integer such that there exists a K-happy set of size S. Then S is unique, even though there may be more than one K-happy set of that size. > Now my intuition about > "Happyness" is that I configure a Happyness number H, and if an > upload results in an S >= H then I'm happy. Note that this only makes sense for H >= K. If H < K then consider an arbitrary set U of size H: it is vacuously K-happy, because it has no K-sized subsets. Therefore S >= H, because there must exist K-happy sets at least as large as U. This is despite the fact there the file may not be downloadable from any set of servers. Anyway, your intuition, if it is correct, is equivalent to the simpler statement: I am happy iff there exists a K-happy set of size H. because if there exists a K-happy set of any size S >= H, then there exists a K-happy set of size H. (This follows from the fact that if U' is a subset of U, then the set of K-sized subsets of U' is a subset of the set of K-sized subsets of U. Therefore, if there is a K-happy set of size S then all subsets of it are K-happy, and therefore there are K-happy sets of sizes 0..S-1.) 
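A direct, brute-force rendering of those definitions, for small grids only -- it enumerates subsets, so it is exponential and purely illustrative, and the "placement" dict and all names are hypothetical:

from itertools import combinations

def k_sufficient(servers, placement, k):
    # a set of servers is K-sufficient if it collectively holds >= k distinct shares
    distinct = set()
    for s in servers:
        distinct |= placement.get(s, set())
    return len(distinct) >= k

def k_happy(servers, placement, k):
    # K-happy: every K-sized subset is K-sufficient (vacuously true when
    # |servers| < k, matching the caveat above about H < K)
    return all(k_sufficient(subset, placement, k)
               for subset in combinations(servers, k))

def happiness_S(placement, k):
    # S = size of the largest K-happy set of servers
    servers = list(placement)
    for size in range(len(servers), 0, -1):
        if any(k_happy(subset, placement, k)
               for subset in combinations(servers, size)):
            return size
    return 0

# example: k=3, four servers, share 3 stored twice; {A,B,C,D} is not 3-happy
# because the subset {B,C,D} holds only two distinct shares, but {A,B,C} is.
placement = {"A": set([0, 1]), "B": set([2]), "C": set([3]), "D": set([3])}
print(happiness_S(placement, 3))   # 3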
-- David-Sarah Hopwood ? http://davidsarah.livejournal.com From david-sarah at jacaranda.org Tue Oct 13 22:29:42 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Wed, 14 Oct 2009 06:29:42 +0100 Subject: [tahoe-dev] on discovering that a hash function wasn't secure after all -- was: Re: "Elk Point" design for mutable, add-only, and immutable files In-Reply-To: References: <200907162209.04414.shawn-tahoe@willden.org> <4AA4BAE5.1030705@lothar.com> <4AA62064.60904@jacaranda.org> <4AA75E06.8010301@lothar.com> <4AAA6A77.4080001@jacaranda.org> <4AABF714.6080900@lothar.com> <4AAC7A34.7070905@jacaranda.org> <4AADA29F.8000009@lothar.com> <4AAF0725.2030800@jacaranda.org> <7125187E-4E35-4FF2-84F7-0542D9F46D2C@zooko.com> <4AD2AF31.9010807@jacaranda.org> Message-ID: <4AD561C6.7000000@jacaranda.org> Zooko Wilcox-O'Hearn wrote: [...] > This is why I think it is useful to use precise terminology when > talking about our evolving understanding of secure hash functions. > It is tempting to speak loosely and say that MD5 was "secure" until > 2004 and then it became "insecure", but that is making assumptions > about who knew what in 2003. To be more precise, you have to say > something like "In 2003 no way to generate collisions in MD5 was > known to the public.". > > I know a cryptographer who claims to know an ex-KGB man who claims > that he could generate preimages of MD5 in 1994. Sounds crazy > right!? *Preimages*? That does sound crazy. I don't put much weight on conspiracy theories about how intelligence agencies are supposedly way ahead of the public state of the art. OTOH, MD5 should be considered to have been broken for collision resistance in 1993, when Den Boer and Bosselaers found pseudo-collisions in the compression function. I don't understand why so many people dismiss "theoretical" attacks such as pseudo-collisions as unimportant, when they clearly show that the design goals have not been met. At the very latest, it was broken in 1996, when actual collisions in the compression function were found. Since the Merkle-Damg?rd construction's proof of security depends on the compression function being collision-resistant, from that point on there was no reason to trust the collision resistance of MD5 as a whole. -- David-Sarah Hopwood ? http://davidsarah.livejournal.com From zooko at zooko.com Wed Oct 14 06:01:02 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Wed, 14 Oct 2009 07:01:02 -0600 Subject: [tahoe-dev] on discovering that a hash function wasn't secure after all -- was: Re: "Elk Point" design for mutable, add-only, and immutable files In-Reply-To: <4AD561C6.7000000@jacaranda.org> References: <200907162209.04414.shawn-tahoe@willden.org> <4AA4BAE5.1030705@lothar.com> <4AA62064.60904@jacaranda.org> <4AA75E06.8010301@lothar.com> <4AAA6A77.4080001@jacaranda.org> <4AABF714.6080900@lothar.com> <4AAC7A34.7070905@jacaranda.org> <4AADA29F.8000009@lothar.com> <4AAF0725.2030800@jacaranda.org> <7125187E-4E35-4FF2-84F7-0542D9F46D2C@zooko.com> <4AD2AF31.9010807@jacaranda.org> <4AD561C6.7000000@jacaranda.org> Message-ID: <38320336-6C65-488F-8602-47FBB021C468@zooko.com> I don't mean to say that national intelligence agencies are way ahead of the public state of the art in cryptography. Whether they are or not is an assumption we have to make. The point is to make that assumption explicit and to remember that it could turn out to be wrong. 
Regards, Zooko From david-sarah at jacaranda.org Wed Oct 14 22:02:32 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Thu, 15 Oct 2009 06:02:32 +0100 Subject: [tahoe-dev] Avoiding multicollision attacks against Elk Point Message-ID: <4AD6ACE8.3060604@jacaranda.org> A "multicollision attack" is an attack that can find many collisions for a hash function, in only logarithmically greater time than a single collision. The following paper: Antoine Joux, "Multicollisions in Iterated Hash Functions. Application to Cascaded Constructions" describes how to do this for iterated hashes, including Merkle-Damg?rd hashes such as SHA-* (the attack is quite simple and easy to understand). [Some improvements are described in , although the newest attack described in that paper isn't applicable.] The Elk Point protocol uses essentially the following construction: R = hash_r(V, K1) T = hash_t(encrypt[R](K1), V) which is intended to be as secure (for collision, preimage, and second-preimage resistance) as a single hash on V and K1 with output size r+t bits. Here's the problem: suppose that hash_r is an iterated hash with an r-bit chaining value. Then Joux's paper shows how to perform a multicollision attack on hash_r, finding 2^(t/2) collisions, with only approximately (2^(r/2)).t/2 work. Then it is likely that two of those 2^(t/2) collisions will also collide in T (note that this doesn't depend at all on how T is computed, just that it is some t-bit function of R, K1 and V). So the overall cost of a collision attack on the combined hash is only (2^(r/2)).t/2, not 2^((r+t)/2). For example, if r = 128 and t = 128, the cost of a collision attack is only 2^64 * 64 = 2^70, rather than the expected 2^128. However, note that this attack depends completely on the fact that hash_r uses an r-bit chaining value. If hash_r is actually a truncation of a hash with a z-bit chaining value, then the attack requires 2^(z/2) work. More precisely, it requires whatever work is needed for a collision attack on the untruncated hash, provided that the attack works with sufficient probability for an arbitrary chaining value. Therefore, to avoid the attack we only need to ensure that z >= r+t (preferably with some margin, in case there is a better attack on the untruncated hash than brute force). If hash_r is SHA-256 or SHA-512 truncated to r bits, for example, then this multicollision attack is not a weak point, as long as untruncated SHA-256 or SHA-512 has no collision attack easier than 2^((r+t))/2). The preimage attack of section 4.2 of Joux's paper also applies. Again, it is foiled completely by the larger chaining value when using a truncated hash. (Note that hash_t was already defined as a truncated hash.) As a belt-and-suspenders defense, it may be a good idea to ensure that the input to the hash fits in a single block. This would completely eliminate the possibility of attacks that rely on the Merkle-Damg?rd structure. The maximum whole number of bytes that will fit in one SHA-512 block (taking into account padding) is 111 bytes. If KC_verify is an ECDSA public key then the input will fit, but non-elliptic-curve keys would not. I will include an intermediate hash in the next version of the protocol, so that there is no limitation on the public key size. Incidentally, this actually makes me a little more confident in the security of the protocol. 
Being able to construct a longer hash from a short one this easily, seemed a bit too much like getting something for nothing (if it were so easy to construct long hashes from short ones, why are no existing hashes built that way?) When using truncated hashes for hash_r and hash_t, OTOH, we are not attempting to get any greater collision or preimage resistance than the hash we started with. I have added a note about this attack to . -- David-Sarah Hopwood ? http://davidsarah.livejournal.com From david-sarah at jacaranda.org Wed Oct 14 23:55:59 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Thu, 15 Oct 2009 07:55:59 +0100 Subject: [tahoe-dev] Avoiding multicollision attacks against Elk Point [minor correction] In-Reply-To: <4AD6ACE8.3060604@jacaranda.org> References: <4AD6ACE8.3060604@jacaranda.org> Message-ID: <4AD6C77F.5040300@jacaranda.org> David-Sarah Hopwood wrote: [...] > However, note that this attack depends completely on the fact that hash_r > uses an r-bit chaining value. If hash_r is actually a truncation of a hash > with a z-bit chaining value, then the attack requires 2^(z/2) work. > More precisely, it requires ... at least ... > whatever work is needed for a collision > attack on the untruncated hash, provided that the attack works with > sufficient probability for an arbitrary chaining value. -- David-Sarah Hopwood ? http://davidsarah.livejournal.com From david-sarah at jacaranda.org Thu Oct 15 00:02:49 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Thu, 15 Oct 2009 08:02:49 +0100 Subject: [tahoe-dev] "servers of happiness" Re: MapReduce over Tahoe-LAFS slides from HadoopWorld In-Reply-To: <4AD544AF.3030707@isnotajoke.com> References: <4AD3EA5C.7040009@jacaranda.org> <4AD3EC0B.60401@jacaranda.org> <4AD3F970.1060302@isnotajoke.com> <4AD544AF.3030707@isnotajoke.com> Message-ID: <4AD6C919.8070104@jacaranda.org> Kevan Carstensen wrote: > On 10/13/09 6:23 AM Zooko Wilcox-O'Hearn wrote: >> What do you think -- is this measure of "health" a good enough >> measure for the purposes of ticket #778? > > I think that your intuition of health is also covered by the current > design of #778 -- that is, a mechanism that doesn't declare an upload > successful unless > > 1. Shares from the file went to at least h peers, and > 2. Any k of the h peers are sufficient to reconstruct the file. Sorry to nitpick, but there's an ambiguity here: any k of *which* h peers? Or do you mean: 2. Any k of the peers to which shares went are sufficient to reconstruct the file. ? -- David-Sarah Hopwood ? http://davidsarah.livejournal.com From trac at allmydata.org Thu Oct 15 09:51:52 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 15 Oct 2009 16:51:52 -0000 Subject: [tahoe-dev] [tahoe-lafs] #814: v1.4.1 storage servers sending a negative number for maximum-immutable-share-size? Message-ID: <037.0b7d4a8d57241b62d073033e468734e4@allmydata.org> #814: v1.4.1 storage servers sending a negative number for maximum-immutable- share-size? --------------------------+------------------------------------------------- Reporter: zooko | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-storage | Version: 1.4.1 Keywords: | Launchpad_bug: --------------------------+------------------------------------------------- I'm looking at the logs of allmydata.com user gar5 and I see this. 
I believe the storage servers in question are tahoe-lafs v1.4.1 storage servers operated by allmydata.com: {{{ 05:16:27.571 [3121]: connected to pfavfmv3, version {'http://allmydata.org/tahoe/protocols/storage/v1': {'maximum-immutable- share-size': -5412811776L, 'tolerates-immutable-read-overrun': True, 'delete-mutable-shares-with-zero-length-writev': True}, 'application- version': 'tahoe-server/1.4.1'} 05:16:27.572 [3122]: connected to 5ws3m43h, version {'http://allmydata.org/tahoe/protocols/storage/v1': {'maximum-immutable- share-size': -5369373696L, 'tolerates-immutable-read-overrun': True, 'delete-mutable-shares-with-zero-length-writev': True}, 'application- version': 'tahoe-server/1.4.1'} 05:16:27.573 [3123]: connected to hndn3zdv, version {'http://allmydata.org/tahoe/protocols/storage/v1': {'maximum-immutable- share-size': -5419058176L, 'tolerates-immutable-read-overrun': True, 'delete-mutable-shares-with-zero-length-writev': True}, 'application- version': 'tahoe-server/1.4.1'} 05:16:27.573 [3124]: connected to gsulyyiv, version {'http://allmydata.org/tahoe/protocols/storage/v1': {'maximum-immutable- share-size': -5401027584L, 'tolerates-immutable-read-overrun': True, 'delete-mutable-shares-with-zero-length-writev': True}, 'application- version': 'tahoe-server/1.4.1'} 05:16:27.574 [3125]: connected to 7yun2nc3, version {'http://allmydata.org/tahoe/protocols/storage/v1': {'maximum-immutable- share-size': -5405422592L, 'tolerates-immutable-read-overrun': True, 'delete-mutable-shares-with-zero-length-writev': True}, 'application- version': 'tahoe-server/1.4.1'} }}} -- Ticket URL: tahoe-lafs secure decentralized file storage grid From kevan at isnotajoke.com Thu Oct 15 11:09:36 2009 From: kevan at isnotajoke.com (Kevan Carstensen) Date: Thu, 15 Oct 2009 11:09:36 -0700 Subject: [tahoe-dev] "servers of happiness" Re: MapReduce over Tahoe-LAFS slides from HadoopWorld In-Reply-To: <4AD6C919.8070104@jacaranda.org> References: <4AD3EA5C.7040009@jacaranda.org> <4AD3EC0B.60401@jacaranda.org> <4AD3F970.1060302@isnotajoke.com> <4AD544AF.3030707@isnotajoke.com> <4AD6C919.8070104@jacaranda.org> Message-ID: <4AD76560.70003@isnotajoke.com> On 10/15/09 12:02 AM David-Sarah Hopwood wrote: > Or do you mean: > 2. Any k of the peers to which shares went are sufficient to > reconstruct the file. In most cases, this is correct. It breaks down if there are two peers in the grid that store the same share, though -- in this case, we cannot guarantee that formulation of 2. More formally: 1. There exists a set of peers with at least size h such that an injection can be created between peers in that set and shares 2. Any k of the peers in that set is sufficient to reconstruct the file. Though, worded like that, the second condition is redundant -- if my proof isn't wrong, I've shown that the first condition implies the second. So maybe it is just 1. There exists a set of peers with cardinality at least h such that an injection can be crafted between peers in the set and shares. If it is pared down to that, though, I think I prefer Zooko's way of thinking about health -- it goes much farther in actually explaining what a health value of h actually means. Thanks for the question -- it is useful to clarify these things, especially after a lengthy discussion such as in #778. 
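One way to compute the largest injection of the kind described above is to treat servers and shares as the two sides of a bipartite graph (a server is joined to each share it holds) and find a maximum matching; its size is the largest h for which criterion 1 can hold. A small sketch, with a hypothetical "placement" dict and a hand-rolled augmenting-path matching rather than anything from the Tahoe-LAFS code:

def largest_injection(placement):
    # placement: dict mapping server -> set of share numbers it holds
    share_owner = {}   # share number -> server currently matched to it

    def try_assign(server, seen):
        # try to give this server a share of its own, re-homing a previous
        # owner along an augmenting path if necessary
        for share in placement[server]:
            if share in seen:
                continue
            seen.add(share)
            owner = share_owner.get(share)
            if owner is None or try_assign(owner, seen):
                share_owner[share] = server
                return True
        return False

    matched = 0
    for server in placement:
        if try_assign(server, set()):
            matched += 1
    return matched

# four servers but only three distinct shares, so at most three servers can
# each be credited with a share of their own
placement = {"A": set([0, 1]), "B": set([0]), "C": set([1]), "D": set([2])}
print(largest_injection(placement))   # 3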
-- Kevan Carstensen | From trac at allmydata.org Thu Oct 15 15:24:30 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 15 Oct 2009 22:24:30 -0000 Subject: [tahoe-dev] [tahoe-lafs] #814: v1.4.1 storage servers sending a negative number for maximum-immutable-share-size? In-Reply-To: <037.0b7d4a8d57241b62d073033e468734e4@allmydata.org> References: <037.0b7d4a8d57241b62d073033e468734e4@allmydata.org> Message-ID: <046.f0db50be21eda3f131ea376af03470d9@allmydata.org> #814: v1.4.1 storage servers sending a negative number for maximum-immutable- share-size? --------------------------+------------------------------------------------- Reporter: zooko | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-storage | Version: 1.4.1 Keywords: | Launchpad_bug: --------------------------+------------------------------------------------- Comment(by warner): I'd suspect a problem with the how-much-space-is-left calculation, since I think the maximum-immutable-share-size value that each server reports is actually min(format_support, unreserved_space_left). If the server is overfull and there's not a clamp at zero, you could get this sort of thing. I also vaguely remember fixing this problem at some point, maybe after 1.4.1? -- Ticket URL: tahoe-lafs secure decentralized file storage grid From kpreid at mac.com Sat Oct 17 04:36:59 2009 From: kpreid at mac.com (Kevin Reid) Date: Sat, 17 Oct 2009 07:36:59 -0400 Subject: [tahoe-dev] Automatic Tahoe file repair Message-ID: <81E8CBFC-D63C-49EB-A57F-A5667175036B@mac.com> My local Tahoe node is configured for the volunteer grid. This is a script I wrote I call "tahoe-repair-all": ------------------------------------------------------------------------ #!/bin/sh for item in `tahoe list-aliases | cut -f 1 -d :`; do echo '*** '"$item" tahoe deep-check --repair --add-lease $item: | perl -pe 's/^/\t/' echo done ------------------------------------------------------------------------ This is my crontab entry for it: ------------------------------------------------------------------------ 30 6 * * * . ~/.bashrc; tahoe-repair-all ------------------------------------------------------------------------ This is the email it sent me this morning: ------------------------------------------------------------------------ From: kpreid at 216-171-189-244.northland.net Subject: Cron . 
~/.bashrc; tahoe-repair-all Date: October 17, 2009 6:30:06 EDT To: kpreid at 216-171-189-244.northland.net *** family done: 1 objects checked pre-repair: 1 healthy, 0 unhealthy 0 repairs attempted, 0 successful, 0 failed post-repair: 1 healthy, 0 unhealthy *** publish done: 5 objects checked pre-repair: 5 healthy, 0 unhealthy 0 repairs attempted, 0 successful, 0 failed post-repair: 5 healthy, 0 unhealthy *** tahoe-illustration done: 5 objects checked pre-repair: 5 healthy, 0 unhealthy 0 repairs attempted, 0 successful, 0 failed post-repair: 5 healthy, 0 unhealthy ------------------------------------------------------------------------ -- Kevin Reid From warner at lothar.com Mon Oct 19 12:34:03 2009 From: warner at lothar.com (Brian Warner) Date: Mon, 19 Oct 2009 12:34:03 -0700 Subject: [tahoe-dev] detecting weak uploads and share rebalancing Re: Share rebalancing In-Reply-To: References: <200910100615.07599.shawn@willden.org> <200910101601.09627.shawn@willden.org> <2D46A0E6-1CAF-4B34-9971-4F78AA90F7B2@zooko.com> <200910101941.09308.shawn@willden.org> Message-ID: <4ADCBF2B.8030101@lothar.com> Zooko O'Whielacronx wrote: > Ah yes, this is an example of why I wish for #302. If Tahoe-LAFS just > used the standard Chord ring that all modern distributed data systems > use, then you could figure out which shares go to which servers by > visually inspecting the storage indexes and the server ids. .. and we'd suffer from the "lumpy distribution" problems that are discussed in ticket #302, where servers get unequal load depending upon where they sit in the ring, and where servers who become full can dump inordinate amounts of traffic on the poor node just clockwise from them. And, an attacker who took out several neighboring-in-id-space servers would kill or seriously damage several files (if they took out N-k+1 consecutive servers in a non-trivially utilized grid, they'd be guaranteed to kill some files). In the permuted-ring design, they could only easily target one file at a time, and taking out N-k+1 consecutive servers would have no higher chance of completely killing a file than taking out N-k+1 randomly-selected servers. There'd be no correlation between share placement of independent files. (This property is closely related to the non-lumpy-distribution issue). I still believe that permuted-ring gives us better overall behavior. I'm willing to be proved wrong, though :). cheers, -Brian From Chris.Vanderlinden at ICTGROUP.COM Mon Oct 19 13:09:50 2009 From: Chris.Vanderlinden at ICTGROUP.COM (Vanderlinden, Chris) Date: Mon, 19 Oct 2009 16:09:50 -0400 Subject: [tahoe-dev] Python beginner trying to build for development testing Message-ID: <0E9188CF98399D4BA8F27FCE13364F0703762CE3@lngprdexch2.ictgroup.com> Hey everyone. Love the application and the fact that it's based on python as well (have been working with django a bit to get my python hands wet again). Anyway, let me describe my environment and goals: As a call center admin, I have quite a large amount of training and production PCs that tahoe could be installed on for file repositories, backups, etc. The "problem" is that they are all windows boxes, and I am somewhat new to the whole setup of tahoe in python. I have a few questions that I couldn't find an answer to reading the website documents: (windows XP platform) 1) Application is developed in Python, so assuming I can build it on a development box (tahoe source + all needed dependencies for python) it should build fine and run fine? 
2) assuming the first assumption above is correct, that would mean it would run on any XP box that had python + dependencies installed? 3) py2exe possible for this once I can get it to build on a windows box? 4) Maybe this should have been the first question, but what is the difference between the windows client and building the client from source on a windows box? Anyone out there have a complete list of all the dependencies needed in python to get this to compile? I gave it a go last week in a VM, but it kept tossing errors for me. (or any tips on solving these dependency of a dependency errors?) Thanks a lot, Chris From secorp at allmydata.com Mon Oct 19 13:19:02 2009 From: secorp at allmydata.com (Peter Secor) Date: Mon, 19 Oct 2009 13:19:02 -0700 Subject: [tahoe-dev] Python beginner trying to build for development testing In-Reply-To: <0E9188CF98399D4BA8F27FCE13364F0703762CE3@lngprdexch2.ictgroup.com> References: <0E9188CF98399D4BA8F27FCE13364F0703762CE3@lngprdexch2.ictgroup.com> Message-ID: <4ADCC9B6.1090206@allmydata.com> Hi Chris, The project now has source code available for a full Windows build, but the documentation is not really complete. http://allmydata.org/trac/tahoe-w32-client/wiki Feel free to browse and see how we built an XP/Vista installer and also used py2exe. Peter Vanderlinden, Chris wrote: > Hey everyone. > > Love the application and the fact that it's based on python as well > (have been working with django a bit to get my python hands wet again). > > Anyway, let me describe my environment and goals: > > As a call center admin, I have quite a large amount of training and > production PCs that tahoe could be installed on for file repositories, > backups, etc. > > The "problem" is that they are all windows boxes, and I am somewhat new > to the whole setup of tahoe in python. > > > I have a few questions that I couldn't find an answer to reading the > website documents: > (windows XP platform) > > 1) Application is developed in Python, so assuming I can build it on a > development box (tahoe source + all needed dependencies for python) it > should build fine and run fine? > > 2) assuming the first assumption above is correct, that would mean it > would run on any XP box that had python + dependencies installed? > > 3) py2exe possible for this once I can get it to build on a windows box? > > 4) Maybe this should have been the first question, but what is the > difference between the windows client and building the client from > source on a windows box? > > > Anyone out there have a complete list of all the dependencies needed in > python to get this to compile? I gave it a go last week in a VM, but it > kept tossing errors for me. > (or any tips on solving these dependency of a dependency errors?) 
> > > Thanks a lot, > > > Chris > > > > _______________________________________________ > tahoe-dev mailing list > tahoe-dev at allmydata.org > http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev From Chris.Vanderlinden at ICTGROUP.COM Mon Oct 19 13:56:12 2009 From: Chris.Vanderlinden at ICTGROUP.COM (Vanderlinden, Chris) Date: Mon, 19 Oct 2009 16:56:12 -0400 Subject: [tahoe-dev] Python beginner trying to build for developmenttesting In-Reply-To: <4ADCC9B6.1090206@allmydata.com> References: <0E9188CF98399D4BA8F27FCE13364F0703762CE3@lngprdexch2.ictgroup.com> <4ADCC9B6.1090206@allmydata.com> Message-ID: <0E9188CF98399D4BA8F27FCE13364F0703762D01@lngprdexch2.ictgroup.com> Thanks Peter, Now correct me if I am wrong, this would mean that the source at: http://allmydata.org/source/tahoe/releases/allmydata-tahoe-1.5.0.zip is really not geared for a full windows build? Also, this windows build is pretty much identical in usage as per the documentation for the above linked source? (in regards to setting up nodes, configuring their type, creating a introducer node, etc) Thanks, Chris -----Original Message----- From: tahoe-dev-bounces at allmydata.org [mailto:tahoe-dev-bounces at allmydata.org] On Behalf Of Peter Secor Sent: Monday, October 19, 2009 4:19 PM To: tahoe-dev at allmydata.org Subject: Re: [tahoe-dev] Python beginner trying to build for developmenttesting Hi Chris, The project now has source code available for a full Windows build, but the documentation is not really complete. http://allmydata.org/trac/tahoe-w32-client/wiki Feel free to browse and see how we built an XP/Vista installer and also used py2exe. Peter Vanderlinden, Chris wrote: > Hey everyone. > > Love the application and the fact that it's based on python as well > (have been working with django a bit to get my python hands wet again). > > Anyway, let me describe my environment and goals: > > As a call center admin, I have quite a large amount of training and > production PCs that tahoe could be installed on for file repositories, > backups, etc. > > The "problem" is that they are all windows boxes, and I am somewhat new > to the whole setup of tahoe in python. > > > I have a few questions that I couldn't find an answer to reading the > website documents: > (windows XP platform) > > 1) Application is developed in Python, so assuming I can build it on a > development box (tahoe source + all needed dependencies for python) it > should build fine and run fine? > > 2) assuming the first assumption above is correct, that would mean it > would run on any XP box that had python + dependencies installed? > > 3) py2exe possible for this once I can get it to build on a windows box? > > 4) Maybe this should have been the first question, but what is the > difference between the windows client and building the client from > source on a windows box? > > > Anyone out there have a complete list of all the dependencies needed in > python to get this to compile? I gave it a go last week in a VM, but it > kept tossing errors for me. > (or any tips on solving these dependency of a dependency errors?) 
> > > Thanks a lot, > > > Chris > > > > _______________________________________________ > tahoe-dev mailing list > tahoe-dev at allmydata.org > http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev _______________________________________________ tahoe-dev mailing list tahoe-dev at allmydata.org http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev From zooko at zooko.com Mon Oct 19 14:36:44 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Mon, 19 Oct 2009 15:36:44 -0600 Subject: [tahoe-dev] detecting weak uploads and share rebalancing Re: Share rebalancing In-Reply-To: <4ADCBF2B.8030101@lothar.com> References: <200910100615.07599.shawn@willden.org> <200910101601.09627.shawn@willden.org> <2D46A0E6-1CAF-4B34-9971-4F78AA90F7B2@zooko.com> <200910101941.09308.shawn@willden.org> <4ADCBF2B.8030101@lothar.com> Message-ID: <4F2D1EB6-DFEF-49DA-96DD-117CA7732777@zooko.com> N.B. there are four different arguments about issue #302 in this letter; don't mix them up. On Monday,2009-10-19, at 13:34 , Brian Warner wrote: > .. and we'd suffer from the "lumpy distribution" problems that are > discussed in ticket #302, where servers get unequal load depending > upon where they sit in the ring, and where servers who become full > can dump inordinate amounts of traffic on the poor node just > clockwise from them. ... [reordering quotes from your message] > I still believe that permuted-ring gives us better overall > behavior. I'm willing to be proved wrong, though :). 1. I wrote a simulation which convinced me that this is wrong -- that both share placement algorithms have an indistinguishable (and highly varying) pattern of servers filling up. However, the results that I posted to tahoe-dev were confusing and hard to follow, and you seem to have ignored them. I see that I didn't link them in from #302 either. I should go find that letter in tahoe-dev archives and link to it from #302. Here it is: http://allmydata.org/pipermail/ tahoe-dev/2008-July/000676.html . > And, an attacker who took out several neighboring-in-id-space > servers would kill or seriously damage several files (if they took > out N-k+1 consecutive servers in a non-trivially utilized grid, > they'd be guaranteed to kill some files). [snip more interesting consequences of placement strategy on an attacker who attacks many servers] 2. Neat! I hadn't thought of this malicious case before. Perhaps you could add a link from #302 to your letter about the malicious case. 3. We were already familiar with designs like Chord (and Chord File System) and Kademlia when you (Brian) came up with the "permute per fileid" trick. It should be seen as an (arguable) improvement on those designs. However, Chord and Kademlia have been deployed with success, sometimes on a massive scale -- e.g. Cassandra DB [1] and Vuze [2] -- where load-balancing is also an issue. This suggests that either this phenomenon isn't a problem in many situations in practice (which would be consistent with my simulation -- argument 1) or that the designers of Cassandra DB and Vuze ought to think about adopting the permute-per-fileid trick (or both). In fact, Cassandra's unique appeal among "post-relational" (a.k.a. "nosql") databases is that it supports range queries, and the way it does so relies upon the "natural" chord ordering. If you're familiar with Chord, you can think of Cassandra as being a lot like Chord plus the added feature of *not* running the keys through a secure hash to load-balance them. 
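To make the contrast concrete, here is a toy sketch of the two orderings -- this is not the real Tahoe-LAFS peer-selection code (the actual permutation uses different hash details), it just shows the shape of the idea:

    from hashlib import sha1

    def chord_order(server_ids, storage_index):
        # Chord-style ring: every file sees the servers in the same (hashed)
        # order, so load and damage correlate with position in the ring.
        return sorted(server_ids, key=lambda peerid: sha1(peerid).digest())

    def permuted_order(server_ids, storage_index):
        # Tahoe-LAFS-style per-file permutation: the ordering depends on the
        # storage index, so independent files see independent orderings.
        return sorted(server_ids,
                      key=lambda peerid: sha1(storage_index + peerid).digest())

    servers = ["server%d" % i for i in range(5)]
    print permuted_order(servers, "file-one")
    print permuted_order(servers, "file-two")   # almost certainly a different order

In this picture, Cassandra's choice amounts to deleting the sha1() on the keys altogether.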
This is an "improvement" on Chord in the opposite direction of our improvement! :-) It makes the load-balancing properties much *worse* than the standard Chord load-balancing. Assuming anyone actually uses Cassandra in this mode, then this demonstrates that the sort of "balancing tools" discussed in e.g. [3] are usable to some people. 4. This thread started because Shawn Willden needed to do some mucking about with his shares, and the permute-per-fileid feature makes it harder for him to muck his shares. This is a real live example of my argument (in e.g. http://allmydata.org/pipermail/tahoe- dev/2008-July/000672.html ) that the simpler placement strategy can help people administer their Tahoe-LAFS deployments. I need to link to this thread from #302 and claim that this is an example. Of all these four arguments, I think argument 4 is the most important. I think the next steps here are to document the arguments better on ticket #302 and also to create a new separate ticket which is all about making per-file-permutation optional (leaving #302 as the repository of arguments about which is better in what situations, whether the default should be changed, etc.). Regards, Zooko tickets mentioned in this email: http://allmydata.org/trac/tahoe/ticket/302 # stop permuting peerlist, use SI as offset into ring instead? [1] http://wiki.apache.org/cassandra/PoweredBy # says Cassandra is used for inbox search at Facebook, which is up to 40 TB of data across 120 machines in two separate data centers [2] http://vanish.cs.washington.edu/pubs/usenixsec09-geambasu.pdf # says that Vuze has a million nodes at a time, spanning the globe [3] http://allmydata.org/trac/tahoe/ticket/302#comment:7 From zooko at zooko.com Mon Oct 19 14:45:55 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Mon, 19 Oct 2009 15:45:55 -0600 Subject: [tahoe-dev] Python beginner trying to build for developmenttesting In-Reply-To: <0E9188CF98399D4BA8F27FCE13364F0703762D01@lngprdexch2.ictgroup.com> References: <0E9188CF98399D4BA8F27FCE13364F0703762CE3@lngprdexch2.ictgroup.com> <4ADCC9B6.1090206@allmydata.com> <0E9188CF98399D4BA8F27FCE13364F0703762D01@lngprdexch2.ictgroup.com> Message-ID: <0F63F17A-DBCF-48FE-8FA6-7579B744C076@zooko.com> On Monday,2009-10-19, at 14:56 , Vanderlinden, Chris wrote: > Now correct me if I am wrong, this would mean that the source at: > http://allmydata.org/source/tahoe/releases/allmydata- > tahoe-1.5.0.zip is really not geared for a full windows build? That software is Tahoe-LAFS, v1.5.0. It works fine on Windows (if you can get it built). It behaves the same way as it does on all other platforms: you can create a gateway node using the "tahoe create-client" command, use the command-line tools such as "tahoe cp" and "tahoe backup", use the WUI, etc. To do so, start here: http:// allmydata.org/source/tahoe/trunk/docs/install.html (and mentally prepare yourself to help us out with tickets like #756 and #781). The other thing that Peter mentioned is "Tahoe-W32-Client": http:// allmydata.org/trac/tahoe-w32-client/wiki Tahoe-W32-Client is an application that requires Tahoe-LAFS, and adds to it a Virtual Drive which is integrated into the Windows filesystem, so that you can drag and drop files into and out of Tahoe- LAFS using your Windows file explorer or your cmd.exe commands such as "cp" and "move". Tahoe-W32-Client also comes with a GUI installer so that you can have the pleasure of looking at pretty colored windows and buttons while installing. These facts ought to be wiki-ed up... 
Maybe here: http:// allmydata.org/trac/tahoe-w32-client/wiki and http://allmydata.org/ trac/tahoe/wiki/RelatedProjects Regards, Zooko http://allmydata.org/trac/tahoe/ticket/756 # if pywin32 has been manually installed, setuptools still doesn't detect it http://allmydata.org/trac/tahoe/ticket/781 # wanted: Windows Packaging From Chris.Vanderlinden at ICTGROUP.COM Mon Oct 19 14:56:34 2009 From: Chris.Vanderlinden at ICTGROUP.COM (Vanderlinden, Chris) Date: Mon, 19 Oct 2009 17:56:34 -0400 Subject: [tahoe-dev] Python beginner trying to build fordevelopmenttesting In-Reply-To: <0F63F17A-DBCF-48FE-8FA6-7579B744C076@zooko.com> References: <0E9188CF98399D4BA8F27FCE13364F0703762CE3@lngprdexch2.ictgroup.com><4ADCC9B6.1090206@allmydata.com><0E9188CF98399D4BA8F27FCE13364F0703762D01@lngprdexch2.ictgroup.com> <0F63F17A-DBCF-48FE-8FA6-7579B744C076@zooko.com> Message-ID: <0E9188CF98399D4BA8F27FCE13364F0703762D16@lngprdexch2.ictgroup.com> Thank you for clearing that up, makes a lot more sense now. So the goal on my end seems to be as such: Install Python Install all dependencies for tahoe Install / build tahoe-lafs Deploy tahoe-lafs introducer + server nodes Build tahoe-w32-client (if so desired) Test! (documenting along the way of course) Thanks again, gives me some things to do during any down time tomorrow. Chris -----Original Message----- From: tahoe-dev-bounces at allmydata.org [mailto:tahoe-dev-bounces at allmydata.org] On Behalf Of Zooko Wilcox-O'Hearn Sent: Monday, October 19, 2009 5:46 PM To: tahoe-dev at allmydata.org Subject: Re: [tahoe-dev] Python beginner trying to build fordevelopmenttesting On Monday,2009-10-19, at 14:56 , Vanderlinden, Chris wrote: > Now correct me if I am wrong, this would mean that the source at: > http://allmydata.org/source/tahoe/releases/allmydata- > tahoe-1.5.0.zip is really not geared for a full windows build? That software is Tahoe-LAFS, v1.5.0. It works fine on Windows (if you can get it built). It behaves the same way as it does on all other platforms: you can create a gateway node using the "tahoe create-client" command, use the command-line tools such as "tahoe cp" and "tahoe backup", use the WUI, etc. To do so, start here: http:// allmydata.org/source/tahoe/trunk/docs/install.html (and mentally prepare yourself to help us out with tickets like #756 and #781). The other thing that Peter mentioned is "Tahoe-W32-Client": http:// allmydata.org/trac/tahoe-w32-client/wiki Tahoe-W32-Client is an application that requires Tahoe-LAFS, and adds to it a Virtual Drive which is integrated into the Windows filesystem, so that you can drag and drop files into and out of Tahoe- LAFS using your Windows file explorer or your cmd.exe commands such as "cp" and "move". Tahoe-W32-Client also comes with a GUI installer so that you can have the pleasure of looking at pretty colored windows and buttons while installing. These facts ought to be wiki-ed up... 
Maybe here: http:// allmydata.org/trac/tahoe-w32-client/wiki and http://allmydata.org/ trac/tahoe/wiki/RelatedProjects Regards, Zooko http://allmydata.org/trac/tahoe/ticket/756 # if pywin32 has been manually installed, setuptools still doesn't detect it http://allmydata.org/trac/tahoe/ticket/781 # wanted: Windows Packaging _______________________________________________ tahoe-dev mailing list tahoe-dev at allmydata.org http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev From zooko at zooko.com Mon Oct 19 16:08:39 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Mon, 19 Oct 2009 17:08:39 -0600 Subject: [tahoe-dev] the first three FAQs Message-ID: I just created the FAQ page on the wiki, which up until now was an empty page that said "Click here to create this page". If you can think of any other Frequently Asked Questions, please add them to this page: http://allmydata.org/trac/tahoe/wiki/FAQ Q: What is special about Tahoe-LAFS? Why should anyone care about it instead of other distributed storage systems? A: Tahoe-LAFS is the first Free Software/Open Source storage technology which offers provider-independent security. Provider- independent security means that the integrity and confidentiality of your files is guaranteed by mathematics computed on the client side, and is independent of the servers, which may be owned and operated by someone else. To learn more, read our one-page explanation. Q: Oh, so I should be interested in Tahoe-LAFS only if I'm working on some sort of high-security project? A: No, no! Unlike most systems Tahoe-LAFS doesn't require you to manage an added layer of hassle in order to gain security. Instead the security properties are baked into the system in such a way that you usually don't even notice that they are there. Even if you don't care about protecting your files from someone spying on them or corrupting them, you might still like to use Tahoe-LAFS because it is an extremely robust and efficient "cloud storage" system in which your files are erasure-coded and distributed across separate servers. Q: "Erasure-coding"? What's that? A: You know how with RAID-5 you can lose any one drive and still recover? And there is also something called RAID-6 where you can lose any two drives and still recover. Erasure coding is the generalization of this pattern: you get to configure it for how many drives you could lose and still recover. Tahoe-LAFS is typically configured to upload each file to 10 different drives, where you can lose any 7 of them and still recover the entire file. This gives radically better reliability than typical RAID setups, at a cost of only 3.3 times the storage space that a single copy takes. From zooko at zooko.com Mon Oct 19 16:10:12 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Mon, 19 Oct 2009 17:10:12 -0600 Subject: [tahoe-dev] Python beginner trying to build fordevelopmenttesting In-Reply-To: <0E9188CF98399D4BA8F27FCE13364F0703762D16@lngprdexch2.ictgroup.com> References: <0E9188CF98399D4BA8F27FCE13364F0703762CE3@lngprdexch2.ictgroup.com><4ADCC9B6.1090206@allmydata.com><0E9188CF98399D4BA8F27FCE13364F0703762D01@lngprdexch2.ictgroup.com> <0F63F17A-DBCF-48FE-8FA6-7579B744C076@zooko.com> <0E9188CF98399D4BA8F27FCE13364F0703762D16@lngprdexch2.ictgroup.com> Message-ID: <50440B7F-CE8C-4654-80F0-E2B309282DD2@zooko.com> On Monday,2009-10-19, at 15:56 , Vanderlinden, Chris wrote: > Thank you for clearing that up, makes a lot more sense now. 
> > So the goal on my end seems to be as such: > > Install Python > Install all dependencies for tahoe > Install / build tahoe-lafs > > Deploy tahoe-lafs introducer + server nodes > > Build tahoe-w32-client (if so desired) > Test! > > (documenting along the way of course) Awesome! I really appreciate your offer to post documentation about your progress. Updates and edits to these docs would be good: http://allmydata.org/trac/tahoe/wiki/InstallDetails http://allmydata.org/trac/tahoe/wiki/InstallOnWindows Building Tahoe-LAFS for Windows is definitely doable -- many people have done it, and we have a buildbot that runs the unit tests on Windows on every darcs commit (http://allmydata.org/buildbot/ ), but it isn't automatic right now, mostly because its dependencies pycryptopp and zfec seem not to be found by the build tool even though they are available from here: http://allmydata.org/source/tahoe/deps/tahoe-dep-eggs/ Regards, Zooko From shawn at willden.org Mon Oct 19 22:48:15 2009 From: shawn at willden.org (Shawn Willden) Date: Mon, 19 Oct 2009 23:48:15 -0600 Subject: [tahoe-dev] Python beginner trying to build for development testing In-Reply-To: <4ADCC9B6.1090206@allmydata.com> References: <0E9188CF98399D4BA8F27FCE13364F0703762CE3@lngprdexch2.ictgroup.com> <4ADCC9B6.1090206@allmydata.com> Message-ID: <200910192348.15313.shawn@willden.org> On Monday 19 October 2009 02:19:02 pm Peter Secor wrote: > Hi Chris, > > The project now has source code available for a full Windows build, > but the documentation is not really complete. FYI, I fiddled with this a while ago, and could not get it to build because of dependencies on MSVC++ components that are not included in Microsoft's free version. Shawn. From shawn at willden.org Mon Oct 19 22:49:52 2009 From: shawn at willden.org (Shawn Willden) Date: Mon, 19 Oct 2009 23:49:52 -0600 Subject: [tahoe-dev] Python beginner trying to build fordevelopmenttesting In-Reply-To: <0E9188CF98399D4BA8F27FCE13364F0703762D16@lngprdexch2.ictgroup.com> References: <0E9188CF98399D4BA8F27FCE13364F0703762CE3@lngprdexch2.ictgroup.com> <0F63F17A-DBCF-48FE-8FA6-7579B744C076@zooko.com> <0E9188CF98399D4BA8F27FCE13364F0703762D16@lngprdexch2.ictgroup.com> Message-ID: <200910192349.52255.shawn@willden.org> On Monday 19 October 2009 03:56:34 pm Vanderlinden, Chris wrote: > Thank you for clearing that up, makes a lot more sense now. > > So the goal on my end seems to be as such: > > Install Python > Install all dependencies for tahoe > Install / build tahoe-lafs One other thing you'll need to do to make it really usable on Windows is to figure out how to get it to run as a Windows service. That's been on my to-do list for a while, but it keeps getting pushed down. It would be great if you figured it out, then I wouldn't have to :-) Shawn. From shawn at willden.org Mon Oct 19 22:52:38 2009 From: shawn at willden.org (Shawn Willden) Date: Mon, 19 Oct 2009 23:52:38 -0600 Subject: [tahoe-dev] detecting weak uploads and share rebalancing Re: Share rebalancing In-Reply-To: <4F2D1EB6-DFEF-49DA-96DD-117CA7732777@zooko.com> References: <200910100615.07599.shawn@willden.org> <4ADCBF2B.8030101@lothar.com> <4F2D1EB6-DFEF-49DA-96DD-117CA7732777@zooko.com> Message-ID: <200910192352.39099.shawn@willden.org> On Monday 19 October 2009 03:36:44 pm Zooko Wilcox-O'Hearn wrote: > 4. This thread started because Shawn Willden needed to do some > mucking about with his shares, and the permute-per-fileid feature > makes it harder for him to muck his shares. 
This is a real live > example of my argument (in e.g. http://allmydata.org/pipermail/tahoe- > dev/2008-July/000672.html ) that the simpler placement strategy can > help people administer their Tahoe-LAFS deployments. I need to link > to this thread from #302 and claim that this is an example. I think a tool for easily discovering the permuted list for a given file and the current grid would solve my issue. Shawn. From jamesd at echeque.com Tue Oct 20 03:44:55 2009 From: jamesd at echeque.com (James A. Donald) Date: Tue, 20 Oct 2009 20:44:55 +1000 Subject: [tahoe-dev] Python beginner trying to build for development testing In-Reply-To: <0E9188CF98399D4BA8F27FCE13364F0703762CE3@lngprdexch2.ictgroup.com> References: <0E9188CF98399D4BA8F27FCE13364F0703762CE3@lngprdexch2.ictgroup.com> Message-ID: <4ADD94A7.4060000@echeque.com> Vanderlinden, Chris wrote: > Hey everyone. > > Love the application and the fact that it's based on python as well > (have been working with django a bit to get my python hands wet again). > > Anyway, let me describe my environment and goals: > > As a call center admin, I have quite a large amount of training and > production PCs that tahoe could be installed on for file repositories, > backups, etc. > > The "problem" is that they are all windows boxes, and I am somewhat new > to the whole setup of tahoe in python. Python is a wonderful language, but python based products tend to have horrid windows installs, varying from painfully bad to basically disfunctional. In principle it does not need to be that way, but in practice is usually is. From zooko at zooko.com Tue Oct 20 13:12:35 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Tue, 20 Oct 2009 14:12:35 -0600 Subject: [tahoe-dev] (H)KDF questions Message-ID: <7D6BC1C3-F137-483F-B28A-CE995955504F@zooko.com> I have a few questions about HKDF and how it could be used for our purposes. Fortunately the authors of it are trying to get it standardized and they asked for feedback on a mailing list that I'm subscribed to, so I took the opportunity to ask them: http://www.ietf.org/mail-archive/web/cfrg/current/msg02651.html http://www.ietf.org/mail-archive/web/cfrg/current/msg02652.html Many of these questions would apply to any other KDF that we would use as well. Regards, Zooko From zooko at zooko.com Tue Oct 20 22:27:21 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Tue, 20 Oct 2009 23:27:21 -0600 Subject: [tahoe-dev] notes about DIR2:CHK deep immutability (maybe Tahoe-LAFS v1.6) Message-ID: <5F432396-E1A0-4488-84A9-D272243EA1DD@zooko.com> Folks: Brian has been working on #607. Very cool! #607 plus #778 plus other smaller tickets would justify putting out a new release and calling it "Tahoe-LAFS v1.6". When you create a DIR2:IMM, giving it a set of (childname, childcap) tuples, it should raise an exception if any childcap is not immutable. The immutable childcaps are "CHK" (perhaps renamed to "IMM"), LIT, and DIR2:CHK (or "DIR2:IMM"). When you unpack a DIR2:IMM, if you find any non-immutable children in there (i.e. because someone else's Tahoe-LAFS gateway is altered or buggy so that it did not raise the exception described above), then you treat that child as non-existent and log a warning. There could optionally be a command to deep-walk a directory graph and produce an immutable snapshot of everything. This could be an expensive operation depending on how deep the graph is, but large files are typically already immutable, so snapshotting them is free. 
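The recursion itself is straightforward. Here is a toy sketch with an in-memory stand-in for dirnodes, just to show the shape of it -- the real command would of course go through the client node and webapi layer, and none of these names are real Tahoe-LAFS APIs:

    # toy model: a node is ("file", contents, mutable) or ("dir", {name: node}, mutable)

    def snapshot(node):
        kind, payload, mutable = node
        if not mutable:
            return node                      # CHK/LIT/DIR2:CHK analogue: reuse as-is, free
        if kind == "dir":
            frozen = dict((name, snapshot(child))
                          for name, child in payload.items())
            return ("dir", frozen, False)    # the DIR2:IMM analogue
        return ("file", payload, False)      # re-upload mutable contents as an immutable file

    tree = ("dir", {"notes.txt": ("file", "hello", True),
                    "pics": ("dir", {"a.jpg": ("file", "JFIF...", False)}, True)}, True)
    print snapshot(tree)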
Anyway, if you want to put something into an immutable directory and you get rejected because the thing isn't immutable, then this command would be useful. Regards, Zooko http://allmydata.org/trac/tahoe/ticket/607 # DIR2:IMM http://allmydata.org/trac/tahoe/ticket/778 # "shares of happiness" is the wrong measure; "servers of happiness" is better From shawn at willden.org Wed Oct 21 05:28:05 2009 From: shawn at willden.org (Shawn Willden) Date: Wed, 21 Oct 2009 06:28:05 -0600 Subject: [tahoe-dev] notes about DIR2:CHK deep immutability (maybe Tahoe-LAFS v1.6) In-Reply-To: <5F432396-E1A0-4488-84A9-D272243EA1DD@zooko.com> References: <5F432396-E1A0-4488-84A9-D272243EA1DD@zooko.com> Message-ID: <200910210628.06241.shawn@willden.org> On Tuesday 20 October 2009 11:27:21 pm Zooko Wilcox-O'Hearn wrote: > When you create a DIR2:IMM, giving it a set of (childname, childcap) > tuples, it should raise an exception if any childcap is not > immutable. Cool! One of the things I've been wishing for is a way to create a dirnode already populated with a full set of children, rather than having to add them one at a time. In my case, I'd want the directory to be immutable, so this is perfect. Shawn. From kpreid at mac.com Wed Oct 21 07:17:45 2009 From: kpreid at mac.com (Kevin Reid) Date: Wed, 21 Oct 2009 10:17:45 -0400 Subject: [tahoe-dev] notes about DIR2:CHK deep immutability (maybe Tahoe-LAFS v1.6) In-Reply-To: <5F432396-E1A0-4488-84A9-D272243EA1DD@zooko.com> References: <5F432396-E1A0-4488-84A9-D272243EA1DD@zooko.com> Message-ID: On Oct 21, 2009, at 1:27, Zooko Wilcox-O'Hearn wrote: > When you create a DIR2:IMM, giving it a set of (childname, childcap) > tuples, it should raise an exception if any childcap is not > immutable. The immutable childcaps are "CHK" (perhaps renamed to > "IMM"), LIT, and DIR2:CHK (or "DIR2:IMM"). This seems overly restrictive. For example, I can't create a dircap which demonstrably permanently refers to a given file, and also to a mutable directory. I don't have any use-cases offhand, though. Even if that isn't relevant, IMO there should be at least a utility which is "give me either a deep-immutable directory if possible, or a mutable directory to which the write key has been discarded" so that one can bundle a given set of caps without worrying about what their types are. -- Kevin Reid From Chris.Vanderlinden at ICTGROUP.COM Wed Oct 21 10:30:39 2009 From: Chris.Vanderlinden at ICTGROUP.COM (Vanderlinden, Chris) Date: Wed, 21 Oct 2009 13:30:39 -0400 Subject: [tahoe-dev] Must be overlooking something: unable to find vcvarsall.bat Message-ID: <0E9188CF98399D4BA8F27FCE13364F07037E123F@lngprdexch2.ictgroup.com> Steps taken: Steps so far I took were the following: Installed Python 2.6.3 Installed Pywin32-214 Installed MinGW5.1.4(with g++) Grabbed the Tahoe-1.5.0.tar.gz file Grabbed the tahoe.deps.tar.gz file Extracted both to c:\allmydata-tahoe-1.5.0 Ran: python setup.py build_tahoe (path has both C:\MinGW\bin and c:\Python26 setup) Any ideas on what I must have overlooked? Code: Microsoft Windows XP [Version 5.1.2600] (C) Copyright 1985-2001 Microsoft Corp. 
C:\allmydata-tahoe-1.5.0>python setup.py build_tahoe Not found: ../tahoe-deps Installed c:\allmydata-tahoe-1.5.0\setuptools_darcs-1.2.8-py2.6.egg Searching for setuptools-trial>=0.5 Best match: setuptools-trial 0.5.3 Processing setuptools_trial-0.5.3.tar.gz Running setuptools_trial-0.5.3\setup.py -q bdist_egg --dist-dir c:\docume~1\admi ni~1\locals~1\temp\easy_install-zv2vwe\setuptools_trial-0.5.3\egg-dist-t mp-ig3oj k Installed c:\allmydata-tahoe-1.5.0\setuptools_trial-0.5.3-py2.6.egg Searching for darcsver>=1.2.0 Best match: darcsver 1.3.1 Processing darcsver-1.3.1.tar.gz Running darcsver-1.3.1\setup.py -q bdist_egg --dist-dir c:\docume~1\admini~1\loc als~1\temp\easy_install-ep1oza\darcsver-1.3.1\egg-dist-tmp-rvu7js Installed c:\allmydata-tahoe-1.5.0\darcsver-1.3.1-py2.6.egg Searching for Twisted>=2.4.0 Best match: Twisted 8.2.0 Processing twisted-8.2.0.tar.bz2 Running Twisted-8.2.0\setup.py -q bdist_egg --dist-dir c:\docume~1\admini~1\loca ls~1\temp\easy_install-efv4jf\Twisted-8.2.0\egg-dist-tmp-ywqqyp Traceback (most recent call last): File "setup.py", line 363, in zip_safe=False, # We prefer unzipped for easier access. File "C:\Python26\lib\distutils\core.py", line 113, in setup _setup_distribution = dist = klass(attrs) File "C:\allmydata-tahoe-1.5.0\setuptools-0.6c12dev.egg\setuptools\dist.py", l ine 219, in __init__ File "C:\allmydata-tahoe-1.5.0\setuptools-0.6c12dev.egg\setuptools\dist.py", l ine 243, in fetch_build_eggs File "C:\allmydata-tahoe-1.5.0\setuptools-0.6c12dev.egg\pkg_resources.py", lin e 522, in resolve File "C:\allmydata-tahoe-1.5.0\setuptools-0.6c12dev.egg\pkg_resources.py", lin e 758, in best_match File "C:\allmydata-tahoe-1.5.0\setuptools-0.6c12dev.egg\pkg_resources.py", lin e 770, in obtain File "C:\allmydata-tahoe-1.5.0\setuptools-0.6c12dev.egg\setuptools\dist.py", l ine 286, in fetch_build_egg File "C:\allmydata-tahoe-1.5.0\setuptools-0.6c12dev.egg\setuptools\command\ea s y_install.py", line 452, in easy_install File "C:\allmydata-tahoe-1.5.0\setuptools-0.6c12dev.egg\setuptools\command\ea s y_install.py", line 482, in install_item File "C:\allmydata-tahoe-1.5.0\setuptools-0.6c12dev.egg\setuptools\command\ea s y_install.py", line 661, in install_eggs File "C:\allmydata-tahoe-1.5.0\setuptools-0.6c12dev.egg\setuptools\command\ea s y_install.py", line 936, in build_and_install File "C:\allmydata-tahoe-1.5.0\setuptools-0.6c12dev.egg\setuptools\command\ea s y_install.py", line 927, in run_setup distutils.errors.DistutilsError: Setup script exited with error: Unable to find vcvarsall.bat I didn't bother copying the part where I misspelled compiler in the distutils config file :) From zooko at zooko.com Wed Oct 21 10:56:27 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Wed, 21 Oct 2009 11:56:27 -0600 Subject: [tahoe-dev] Must be overlooking something: unable to find vcvarsall.bat In-Reply-To: <0E9188CF98399D4BA8F27FCE13364F07037E123F@lngprdexch2.ictgroup.com> References: <0E9188CF98399D4BA8F27FCE13364F07037E123F@lngprdexch2.ictgroup.com> Message-ID: vcvarsall.bat is a file that comes with a Microsoft compiler. Since you're using mingw you don't have it. The fact that it is looking for it suggests that Python thinks it is supposed to use the Microsoft compiler. Did you see the part of http://allmydata.org/ trac/tahoe/wiki/InstallDetails which describes how to create a distutils config file? Oh, also "python setup.py build_tahoe" is an obsolete command name. The docs on InstallDetails should be updated to call that command "python setup.py build". 
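For reference, the distutils config file I mean is tiny -- from memory it is just something like the following, placed either in C:\Python26\Lib\distutils\distutils.cfg or in a pydistutils.cfg in your home directory:

    [build]
    compiler = mingw32

If that key is misspelled, distutils silently ignores it and falls back to looking for the Microsoft compiler.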
Also they should explain that "can't find vcvarsall.bat" means that you need a compiler. Also they should be improved in lots of other ways that you probably already have in mind. :-) Regards, Zooko From warner at lothar.com Wed Oct 21 11:12:34 2009 From: warner at lothar.com (Brian Warner) Date: Wed, 21 Oct 2009 11:12:34 -0700 Subject: [tahoe-dev] notes about DIR2:CHK deep immutability (maybe Tahoe-LAFS v1.6) In-Reply-To: <200910210628.06241.shawn@willden.org> References: <5F432396-E1A0-4488-84A9-D272243EA1DD@zooko.com> <200910210628.06241.shawn@willden.org> Message-ID: <4ADF4F12.2010009@lothar.com> Shawn Willden wrote: > Cool! One of the things I've been wishing for is a way to create a dirnode > already populated with a full set of children, rather than having to add them > one at a time. I just checked that in a few days ago. Check out webapi.txt .. it's most easily expressed as a request body to PUT /uri?t=mkdir . And a quick test suggests it's 30% faster than mkdir+set_children (it goes from 3 roundtrips to 1). The DIR2:CHK: creation function will be PUT /uri?t=mkdir&immutable=true , I think. cheers, -Brian From Chris.Vanderlinden at ICTGROUP.COM Wed Oct 21 11:16:41 2009 From: Chris.Vanderlinden at ICTGROUP.COM (Vanderlinden, Chris) Date: Wed, 21 Oct 2009 14:16:41 -0400 Subject: [tahoe-dev] Must be overlooking something: unable to findvcvarsall.bat In-Reply-To: References: <0E9188CF98399D4BA8F27FCE13364F07037E123F@lngprdexch2.ictgroup.com> Message-ID: <0E9188CF98399D4BA8F27FCE13364F07037E128F@lngprdexch2.ictgroup.com> I have the MinGW compiler installed (had a previous error where I had the compiler=mingw key in the cfg file misspelled. I will give the newer command a go and see where I can get with it. Chris -----Original Message----- From: tahoe-dev-bounces at allmydata.org [mailto:tahoe-dev-bounces at allmydata.org] On Behalf Of Zooko Wilcox-O'Hearn Sent: Wednesday, October 21, 2009 1:56 PM To: tahoe-dev at allmydata.org Subject: Re: [tahoe-dev] Must be overlooking something: unable to findvcvarsall.bat vcvarsall.bat is a file that comes with a Microsoft compiler. Since you're using mingw you don't have it. The fact that it is looking for it suggests that Python thinks it is supposed to use the Microsoft compiler. Did you see the part of http://allmydata.org/ trac/tahoe/wiki/InstallDetails which describes how to create a distutils config file? Oh, also "python setup.py build_tahoe" is an obsolete command name. The docs on InstallDetails should be updated to call that command "python setup.py build". Also they should explain that "can't find vcvarsall.bat" means that you need a compiler. Also they should be improved in lots of other ways that you probably already have in mind. :-) Regards, Zooko _______________________________________________ tahoe-dev mailing list tahoe-dev at allmydata.org http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev From warner at lothar.com Wed Oct 21 14:12:58 2009 From: warner at lothar.com (Brian Warner) Date: Wed, 21 Oct 2009 14:12:58 -0700 Subject: [tahoe-dev] notes about DIR2:CHK deep immutability (maybe Tahoe-LAFS v1.6) In-Reply-To: <5F432396-E1A0-4488-84A9-D272243EA1DD@zooko.com> References: <5F432396-E1A0-4488-84A9-D272243EA1DD@zooko.com> Message-ID: <4ADF795A.8000501@lothar.com> Zooko Wilcox-O'Hearn wrote: > There could optionally be a command to deep-walk a directory graph > and produce an immutable snapshot of everything. 
Kevin Reid wrote: > Even if that isn't relevant, IMO there should be at least a utility > which is "give me either a deep-immutable directory if possible, or a > mutable directory to which the write key has been discarded" so that > one can bundle a given set of caps without worrying about what their > types are. I'm thinking of a variant of "cp -r" for both of these.. maybe "cp -r --immutable", which will re-use any immutable objects that it finds (both files and immutable directories). Maybe --readonly would mean the discarded-writecap form. "cp -r --immutable" is thus the "virtual CD" creation command. I like the idea of DeepImmutable directories, specifically so that a tool like this can safely know whether it can re-use the object or not. If deep-immutable directories are common, then this tool could save a lot of time by not descending into any ones that it encounters. My hunch is that there are more use cases for deep-immutable than shallow-immutable, so I'd prefer it to be the default. cheers, -Brian From zooko at zooko.com Wed Oct 21 14:32:52 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Wed, 21 Oct 2009 15:32:52 -0600 Subject: [tahoe-dev] notes about DIR2:CHK deep immutability (maybe Tahoe-LAFS v1.6) In-Reply-To: <4ADF795A.8000501@lothar.com> References: <5F432396-E1A0-4488-84A9-D272243EA1DD@zooko.com> <4ADF795A.8000501@lothar.com> Message-ID: <2F967DCF-EE00-44B1-8CB2-19F556CC41B8@zooko.com> I imagine that deep-immutability could be useful to folks. For example, I could get a read-cap to a deep-immutable dir from someone, inspect it (manually or with a script) to make sure that all the files in it have some property, and then pass that cap on to someone else and know that whenever they use the cap, any files that they get will have that property. Note that this assumes that either the Tahoe-LAFS gateway used by the recipient to whom I give the cap has the "ignore any mutable children" feature, or when I inspect the directory structure I have to make sure there aren't any mutable children in there. It does not assume that the person who gave me the cap in the first place has a Tahoe-LAFS gateway that has the "refuse to insert mutable children" feature. Regards, Zooko From trac at allmydata.org Wed Oct 21 15:19:57 2009 From: trac at allmydata.org (tahoe-lafs) Date: Wed, 21 Oct 2009 22:19:57 -0000 Subject: [tahoe-dev] [tahoe-lafs] #816: don't rely on notifyOnDisconnect() Message-ID: <037.3d3c0b0220a282490c3281c932dbeb98@allmydata.org> #816: don't rely on notifyOnDisconnect() --------------------------+------------------------------------------------- Reporter: zooko | Owner: Type: enhancement | Status: new Priority: minor | Milestone: undecided Component: code-network | Version: 1.5.0 Keywords: | Launchpad_bug: --------------------------+------------------------------------------------- #653 was a long drawn out investigation that concluded that there is probably (but not certainly) a bug in foolscap in which {{{notifyOnDisconnect()}}} doesn't get triggered sometimes when it is supposed to. Fixing (and writing automated tests for) {{{notifyOnDisconnect()}}} is quite tricky. Also, it can never be 100% correct because of the problems of the inherent unreliability of communications and the limitations of the speed of light and so on. 
My personal prejudice as someone who has long studied secure and fault- tolerant networked applications is that you should really avoid relying on such a service -- a service that attempts to tell you when a remote object has switched from "likely to respond in a timely way to your next request" to "unlikely to respond in a timely way to your next request", and instead design your system so that it works correctly and as efficiently as it can regardless of the pattern of connections-and-disconnections of the underlying comms subsystems. (Hm, I guess this is an instance of the general idiom of "Don't check if it is likely to work and then try and then handle failure, instead just try and then handle failure.") Now, Tahoe-LAFS already does it this way! For the most part. There are a few places where we invoke {{{notifyOnDisconnect()}}}, but removing most of them would not diminish the functionality of Tahoe-LAFS. One thing that ''would'' diminish its functionality is as Brian wrote on #653: ?"" * the welcome-page status display would be unable to show "Connected / Not Connected" status for each known server. Instead, it could say "Last Connection Established At / Not Connected". Basically we'd know when the connection was established, and (with extra code) we could know when we last successfully used the connection. And when we tried to use the connection and found it down, we could mark the connection as down until we'd restablished it. But we wouldn't notice the actual event of connection loss (or the resulting period of not-being-connected) until we actually tried to use it. So we couldn't claim to be "connected", we could merely claim that we *had* connected at some point, and that we haven't noticed becoming disconnected yet (but aren't trying very hard to notice). * the share-allocation algorithm wouldn't learn about disconnected servers until it tried to send a message to them (this would fail quickly, but still not synchronously), but allocates share numbers ahead of time for each batch of requests. This could wind up with shares placed 0,1,3,4,2 instead of 0,1,2,3,4 The first problem would be annoying, so I think we're going to leave tahoe alone for now. I'll add a note to the foolscap docs to warn users about the notifyOnDisconnect bug, and encourage people to not rely upon it in replacement-connection -likely environments. """ Since he wrote that, I realized that it would be cool if the welcome-page had a "ping all servers" button which then changed their statuses to indicate whether they responded to the ping or not (and how long it took). This would, in my opinion, be more reliable and more informative than the current "connected/not-connected" welcome-page. To close this ticket, make sure you have Brian's approval first, then add a "ping all servers" feature to the welcome page, then remove all uses of {{{notifyOnDisconnect()}}} from Tahoe-LAFS. 
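To spell the idiom out (a trivial illustration, nothing Tahoe-specific, and the names here are made up):

{{{
# "check, then try": racy -- the answer can change between the check and the use
if server.is_connected():
    server.do_something()

# "just try, then handle failure": cannot be fooled by stale connection state
try:
    server.do_something()
except SomeConnectionLostError:
    handle_failure()
}}}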
-- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Wed Oct 21 15:36:32 2009 From: trac at allmydata.org (tahoe-lafs) Date: Wed, 21 Oct 2009 22:36:32 -0000 Subject: [tahoe-dev] [tahoe-lafs] #816: don't rely on notifyOnDisconnect() In-Reply-To: <037.3d3c0b0220a282490c3281c932dbeb98@allmydata.org> References: <037.3d3c0b0220a282490c3281c932dbeb98@allmydata.org> Message-ID: <046.78943d9c34dadf3a36f015c56350a8d1@allmydata.org> #816: don't rely on notifyOnDisconnect() --------------------------+------------------------------------------------- Reporter: zooko | Owner: Type: enhancement | Status: new Priority: minor | Milestone: undecided Component: code-network | Version: 1.5.0 Keywords: | Launchpad_bug: --------------------------+------------------------------------------------- Comment(by warner): I like the ping-all-servers button on the welcome page. I'm happy with not using {{{notifyOnDisconnect()}}} to update the welcome-page information in a timely manner (if people want timely information, they can push the button and reload). It might be nice for the welcome page to show "waiting for ping response.." while a ping is in flight, but on the other hand that might also be uglier and unnecessarily complicated. I haven't yet decided about removing the {{{notifyOnDisconnect()}}} calls which provide share-allocation with more-timely information about how to allocate share numbers. I think I want peer-selection to have reasonably timely information about which servers are likely to be available and which ones are not, to improve the chances that the shares get allocated in order. (in addition to providing useful forensic data later, it also marginally improves performance of download, because the downloader is more likely to get "primary" shares sooner). -- Ticket URL: tahoe-lafs secure decentralized file storage grid From warner at lothar.com Thu Oct 22 08:42:01 2009 From: warner at lothar.com (Brian Warner) Date: Thu, 22 Oct 2009 08:42:01 -0700 Subject: [tahoe-dev] mkdir-with-children API change In-Reply-To: <4ADF4F12.2010009@lothar.com> References: <5F432396-E1A0-4488-84A9-D272243EA1DD@zooko.com> <200910210628.06241.shawn@willden.org> <4ADF4F12.2010009@lothar.com> Message-ID: <4AE07D49.8020501@lothar.com> Oh, before anyone starts to use that t=mkdir plus children= API I added a few days ago.. be aware that I'm going to change the name in the next few days. While discussing it with Zandr last night, we realized that it'd be safer to give it a new name, rather than to add a new argument to the old name. The problem is that an old client node will ignore the children= field, and the webapi user of that node won't have an easy way to find out whether this node is the type that recognizes the new field or not (short of trying it and then listing the newly-created directory to see whether it's empty or if it has the requested children). So instead, I think this will be done with POST /uri?t=mkdir-with-children , an entirely separate verb. t=mkdir will continue to ignore the request body and create empty directories. And when DIR2:CHK is ready, the verb will be t=mkdir-immutable-with-children or so (instead of t=mkdir&immutable=true, which would suffer from the same problem). 
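In both cases the request body stays the same: a JSON-encoded dictionary of children. Roughly -- I'm writing this from memory, webapi.txt is the authoritative reference for the exact field names -- it looks like:

    import simplejson

    children = {
        "file1.txt": ["filenode", {"ro_uri": "URI:CHK:...", "metadata": {}}],
        "subdir":    ["dirnode",  {"rw_uri": "URI:DIR2:...",
                                   "ro_uri": "URI:DIR2-RO:..."}],
    }
    body = simplejson.dumps(children)
    # then POST that body to your gateway's webapi, e.g.
    # http://127.0.0.1:3456/uri?t=mkdir-with-children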
I think we have one other place where this was an issue, when you upload a new file and can add mutable=true to get a mutable file instead of an immutable one, but I think that this functionality has been around long enough that there's not a serious concern about whether the node will pay attention to the argument or not, plus it's trivial to check the response to see that it's SSK: instead of CHK: or LIT: . So I don't think I'll be changing that one, even if the result is a slightly non-uniform API. cheers, -Brian From trac at allmydata.org Thu Oct 22 09:29:04 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 22 Oct 2009 16:29:04 -0000 Subject: [tahoe-dev] [tahoe-lafs] #817: inconsistent Recent Operations Status - Done/Finished Message-ID: <039.3ea53e47c83e52a6df5f90a1ac1a44aa@allmydata.org> #817: inconsistent Recent Operations Status - Done/Finished -------------------------------+-------------------------------------------- Reporter: terrell | Owner: Type: defect | Status: new Priority: minor | Milestone: undecided Component: code-frontend-web | Version: 1.5.0 Keywords: | Launchpad_bug: -------------------------------+-------------------------------------------- The grid status page shows inconsistent results for the Status column under Recent Operations. See attached window capture of http://testgrid.allmydata.org:3567/status/ below. Most completed operations are marked 'Done'. Some are marked 'Finished'. I would guess that to close this ticket, the status words should be consistent across different actions to remove ambiguity and any suggested semantic difference in status. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From david at thecoffeycan.com Thu Oct 22 13:51:52 2009 From: david at thecoffeycan.com (David Coffey) Date: Thu, 22 Oct 2009 16:51:52 -0400 Subject: [tahoe-dev] Must be overlooking something: unable to find vcvarsall.bat In-Reply-To: <0E9188CF98399D4BA8F27FCE13364F07037E123F@lngprdexch2.ictgroup.com> References: <0E9188CF98399D4BA8F27FCE13364F07037E123F@lngprdexch2.ictgroup.com> Message-ID: <4AE0C5E8.10303@thecoffeycan.com> Vanderlinden, Chris wrote: > Steps taken: > > Steps so far I took were the following: > > Installed Python 2.6.3 > Installed Pywin32-214 > Installed MinGW5.1.4(with g++) > Grabbed the Tahoe-1.5.0.tar.gz file > Grabbed the tahoe.deps.tar.gz file > > Extracted both to c:\allmydata-tahoe-1.5.0 > > Ran: python setup.py build_tahoe > (path has both C:\MinGW\bin and c:\Python26 setup) > > Any ideas on what I must have overlooked? > Try using Python version 2.6.2 I had the same problem, and going to 2.6.2 resolved it. I was also receiving KeyError: '_zope_interface_coptimizations' when I did not use the tahoe.deps.tar.gz file It looks like it has to do with a change in distutils in 2.6.3 that affected setuptools This python bug report has more information http://bugs.python.org/issue7064 I also found some more information in this mailing list thread. http://mail.python.org/pipermail/distutils-sig/2009-October/013534.html David From jamesd at echeque.com Thu Oct 22 21:58:09 2009 From: jamesd at echeque.com (James A. 
Donald) Date: Fri, 23 Oct 2009 14:58:09 +1000 Subject: [tahoe-dev] notes about DIR2:CHK deep immutability (maybe Tahoe-LAFS v1.6) In-Reply-To: <2F967DCF-EE00-44B1-8CB2-19F556CC41B8@zooko.com> References: <5F432396-E1A0-4488-84A9-D272243EA1DD@zooko.com> <4ADF795A.8000501@lothar.com> <2F967DCF-EE00-44B1-8CB2-19F556CC41B8@zooko.com> Message-ID: <4AE137E1.7080100@echeque.com> Zooko Wilcox-O'Hearn wrote: > I imagine that deep-immutability could be useful to > folks. Deep immutability is not a natural cryptographic property. I don't think we should synthesize unnatural properties unless there is a clear use case, *and* there has been consideration of this use case to ask whether it arises from going about things in the wrong way to start with. If one sets to synthesizing nice, but unnatural properties, one is apt wind up with the Xanadu disease. From david-sarah at jacaranda.org Thu Oct 22 22:24:54 2009 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Fri, 23 Oct 2009 06:24:54 +0100 Subject: [tahoe-dev] notes about DIR2:CHK deep immutability (maybe Tahoe-LAFS v1.6) In-Reply-To: <4AE137E1.7080100@echeque.com> References: <5F432396-E1A0-4488-84A9-D272243EA1DD@zooko.com> <4ADF795A.8000501@lothar.com> <2F967DCF-EE00-44B1-8CB2-19F556CC41B8@zooko.com> <4AE137E1.7080100@echeque.com> Message-ID: <4AE13E26.50209@jacaranda.org> James A. Donald wrote: > Zooko Wilcox-O'Hearn wrote: > > I imagine that deep-immutability could be useful to > > folks. > > Deep immutability is not a natural cryptographic property. Arguably it is. Hash trees are "naturally" deeply immutable. In any case, deep immutability is very much a natural, and more importantly useful, property for capability systems. I don't see why it would be any less so for cryptographic capability systems like Tahoe. -- David-Sarah Hopwood ? http://davidsarah.livejournal.com From jamesd at echeque.com Thu Oct 22 23:36:27 2009 From: jamesd at echeque.com (James A. Donald) Date: Fri, 23 Oct 2009 16:36:27 +1000 Subject: [tahoe-dev] notes about DIR2:CHK deep immutability (maybe Tahoe-LAFS v1.6) In-Reply-To: <4AE137E1.7080100@echeque.com> References: <5F432396-E1A0-4488-84A9-D272243EA1DD@zooko.com> <4ADF795A.8000501@lothar.com> <2F967DCF-EE00-44B1-8CB2-19F556CC41B8@zooko.com> <4AE137E1.7080100@echeque.com> Message-ID: <4AE14EEB.5080903@echeque.com> James A. Donald wrote: > If one sets to synthesizing nice, but unnatural > properties, one is apt wind to up with the Xanadu > disease. Not everyone may be familiar with the Xanadu disease, so I will clarify: Traditionally projects approach ninety percent complete, and then remain ninety percent complete forever. Project Xandadu, however, became less and less complete with the passage of time, rapidly and asymptotically approaching 0% complete. As the final envisaged design became more and more elegant, not only did the code become less and less complete, but also the design became less and less complete - less UI was prototyped, or even imagined, and less of the implementation for that UI was envisaged in sufficient detail to even start implementing it. Thus first the implementation approached zero, then design of the implementation approached zero, then design of the ui to be implemented approached zero. 
From Chris.Vanderlinden at ICTGROUP.COM Fri Oct 23 07:44:32 2009 From: Chris.Vanderlinden at ICTGROUP.COM (Vanderlinden, Chris) Date: Fri, 23 Oct 2009 10:44:32 -0400 Subject: [tahoe-dev] Must be overlooking something: unable to findvcvarsall.bat In-Reply-To: <4AE0C5E8.10303@thecoffeycan.com> References: <0E9188CF98399D4BA8F27FCE13364F07037E123F@lngprdexch2.ictgroup.com> <4AE0C5E8.10303@thecoffeycan.com> Message-ID: <0E9188CF98399D4BA8F27FCE13364F07037E165C@lngprdexch2.ictgroup.com> Thanks David, I am definitely going to give 2.6.2 a try. Chris -----Original Message----- From: tahoe-dev-bounces at allmydata.org [mailto:tahoe-dev-bounces at allmydata.org] On Behalf Of David Coffey Sent: Thursday, October 22, 2009 4:52 PM To: tahoe-dev at allmydata.org Subject: Re: [tahoe-dev] Must be overlooking something: unable to findvcvarsall.bat Vanderlinden, Chris wrote: > Steps taken: > > Steps so far I took were the following: > > Installed Python 2.6.3 > Installed Pywin32-214 > Installed MinGW5.1.4(with g++) > Grabbed the Tahoe-1.5.0.tar.gz file > Grabbed the tahoe.deps.tar.gz file > > Extracted both to c:\allmydata-tahoe-1.5.0 > > Ran: python setup.py build_tahoe > (path has both C:\MinGW\bin and c:\Python26 setup) > > Any ideas on what I must have overlooked? > Try using Python version 2.6.2 I had the same problem, and going to 2.6.2 resolved it. I was also receiving KeyError: '_zope_interface_coptimizations' when I did not use the tahoe.deps.tar.gz file It looks like it has to do with a change in distutils in 2.6.3 that affected setuptools This python bug report has more information http://bugs.python.org/issue7064 I also found some more information in this mailing list thread. http://mail.python.org/pipermail/distutils-sig/2009-October/013534.html David _______________________________________________ tahoe-dev mailing list tahoe-dev at allmydata.org http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev From zookog at gmail.com Fri Oct 23 12:04:14 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Fri, 23 Oct 2009 13:04:14 -0600 Subject: [tahoe-dev] notes about DIR2:CHK deep immutability (maybe Tahoe-LAFS v1.6) In-Reply-To: <4AE14EEB.5080903@echeque.com> References: <5F432396-E1A0-4488-84A9-D272243EA1DD@zooko.com> <4ADF795A.8000501@lothar.com> <2F967DCF-EE00-44B1-8CB2-19F556CC41B8@zooko.com> <4AE137E1.7080100@echeque.com> <4AE14EEB.5080903@echeque.com> Message-ID: Hi James: I'm mindful of avoiding Project Xanadu's fate -- prolonged ambitious design, but failure to actually implement and deploy to the benefit of real users. Also, I've contributed to a couple of ambitious failures myself: Mojo Nation and Mnet. Tahoe-LAFS already has a solid anchor to save it from this fate: it already has a substantial number of users who complain when it doesn't do what they expected and who request new features. They request short-term, not-too-ambitious features like faster upload and better charset encoding support. Also, we're committed to full backwards-compatibility in important dimensions (data, API, network protocol, cap). This doesn't mean we're not ambitious. Just wait til you hear about my ideas for a global decentralized currency! :-) But for now, we appear to be making slow but steady progress on adding small features like #778 and #607 while maintaining excellent reliability and code quality. It remains to be seen whether our progress will be fast enough, or whether something else will arise to satisfy people who could have used Tahoe-LAFS. 
If that happens, then I hope Tahoe-LAFS at least leaves a legacy of impressing into people's minds that provider-independent security is possible. Project Xanadu was a failure at giving people tools they could use, but it was a success at spreading valuable ideas that later took root in projects like Tahoe-LAFS. I hope we can do at least as good. By the way, we need help. Kevan Carstensen's contributions over the summer were a real boon, and we need more! If you can read code, please review patches [1]. If you can write code, please look at "tickets tagged as 'easy'" [2]. In any case, please use Tahoe-LAFS and complain about it! :-) Regards, Zooko tickets mentioned in this email: http://allmydata.org/trac/tahoe/ticket/778 # "shares of happiness" is the wrong measure; "servers of happiness" is better http://allmydata.org/trac/tahoe/ticket/607 # DIR2:IMM [1] http://allmydata.org/trac/tahoe/wiki/PatchReviewProcess [2] http://allmydata.org/trac/tahoe/query?status=!closed&order=priority&keywords=~easy From trac at allmydata.org Fri Oct 23 22:18:48 2009 From: trac at allmydata.org (tahoe-lafs) Date: Sat, 24 Oct 2009 05:18:48 -0000 Subject: [tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better In-Reply-To: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> References: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> Message-ID: <046.c19c05c7018f83f5952db4102306d357@allmydata.org> #778: "shares of happiness" is the wrong measure; "servers of happiness" is better --------------------------------+------------------------------------------- Reporter: zooko | Owner: kevan Type: defect | Status: new Priority: critical | Milestone: 1.5.1 Component: code-peerselection | Version: 1.4.1 Keywords: reliability | Launchpad_bug: --------------------------------+------------------------------------------- Comment(by kevan): I'm updating tests.txt to fix some bugs where I was mixing callbacks with synchronous code where I shouldn't have been. I also tweaked the share distribution for the comment:53 test so that the Tahoe2PeerSelector sees the servers in the right order. The simplest fix I can think of for the comment:53 issue changes the way that the [source:src/allmydata/immutable/upload.py at 4045#L313 _got_response] method in a Tahoe2PeerSelector instance handles existing shares in a response. Right now, if a peer tells Tahoe2PeerSelector that it has a share that it wasn't asked to allocate (in the {{{alreadygot}}} part of the response), then the logic in _got_response will alter the entry for that share in {{{preexisting_shares}}} to point to the peer, regardless of whether or not that entry was already pointing to something else. It's kind of rough, but this check fixes the issue for me (in that it makes the test pass). Thoughts? 
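(For concreteness, the shape of the check is roughly the following -- a simplified sketch, not the actual patch:)

{{{
# in Tahoe2PeerSelector._got_response, when looking at the 'alreadygot' shares:
for shnum in alreadygot:
    if shnum not in self.preexisting_shares:
        # only record this server if no other server has already been noted as
        # holding this share; don't clobber an existing entry
        self.preexisting_shares[shnum] = peerid
}}}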
-- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Sun Oct 25 10:39:08 2009 From: trac at allmydata.org (tahoe-lafs) Date: Sun, 25 Oct 2009 17:39:08 -0000 Subject: [tahoe-dev] [tahoe-lafs] #637: support "keep this much disk space free" on Windows as well as other platforms In-Reply-To: <037.087782cc5da1cf9aad2a79cd931ea2b7@allmydata.org> References: <037.087782cc5da1cf9aad2a79cd931ea2b7@allmydata.org> Message-ID: <046.cc5c9f889ae2eed9511debd6da27a2a4@allmydata.org> #637: support "keep this much disk space free" on Windows as well as other platforms --------------------------+------------------------------------------------- Reporter: zooko | Owner: zooko Type: defect | Status: assigned Priority: major | Milestone: eventually Component: code-storage | Version: 1.3.0 Keywords: win32 easy | Launchpad_bug: --------------------------+------------------------------------------------- Changes (by zooko): * keywords: win32 => win32 easy * milestone: 1.6.0 => eventually Old description: > This patch [20090220220353-4233b- > 24ec3c21004366dbb38ac28ec1fccce44f693b7b] makes it so that if we don't > know how to find out the free disk space on a platform, we fall back > gracefully by displaying something like "I dunno" instead of the free disk > space. We also already have code (before that patch) which falls back > gracefully in the sense of emitting a warning at startup if the > {{{reserved_space}}} feature has been configured (see > [src:doc/configuration.txt]) and the current platform can't figure out > how much free disk space it has. > > However, rather than having such fallbacks, I would really rather just > ensure that Tahoe figures out how much free disk space it has on all > supported platforms. For Windows (the only platform which currently does not > work using {{{statvfs}}}), the way to do that is to require > {{{pywin32}}}, {{{import win32api}}}, and call > {{{win32api.GetDiskFreeSpaceEx()}}} > > http://docs.activestate.com/activepython/2.5/pywin32/win32api__GetDiskFreeSpaceEx_meth.html > http://msdn.microsoft.com/en-us/library/aa364937(VS.85).aspx > > Careful, don't call {{{win32api.GetDiskFreeSpace()}}} -- that's an old > function that predates disks with more than 2^32 bytes: > http://msdn.microsoft.com/en-us/library/aa364935.aspx New description: This patch [20090220220353-4233b-24ec3c21004366dbb38ac28ec1fccce44f693b7b] makes it so that if we don't know how to find out the free disk space on a platform, we fall back gracefully by displaying something like "I dunno" instead of the free disk space. We also already have code (before that patch) which falls back gracefully in the sense of emitting a warning at startup if the {{{reserved_space}}} feature has been configured (see [src:doc/configuration.txt]) and the current platform can't figure out how much free disk space it has.
For Windows (the only platform which currently does not work using {{{statvfs}}}), the way to do that is to require {{{pywin32}}}, {{{import win32api}}}, and call {{{win32api.GetDiskFreeSpaceEx()}}} http://docs.activestate.com/activepython/2.5/pywin32/win32api__GetDiskFreeSpaceEx_meth.html http://msdn.microsoft.com/en-us/library/aa364937(VS.85).aspx Careful, don't call {{{win32api.GetDiskFreeSpace()}}} -- that's an old function that predates disks with more than 2^32^ bytes: http://msdn.microsoft.com/en-us/library/aa364935.aspx -- Comment: Any Windows hackers want to fix this? It really bothers me to see Windows getting second-class-citizen treatment, but I don't have convenient access to a Windows box to work on right now, and I'm prioritizing testing, packaging, and release-management for the imminent Tahoe-LAFS v1.6 release. It really should be pretty easy. In fact, I'm marking it with the "easy" tag. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From zooko at zooko.com Sun Oct 25 10:42:36 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Sun, 25 Oct 2009 11:42:36 -0600 Subject: [tahoe-dev] Windows hacker wanted -- Re: [tahoe-lafs] #637: support "keep this much disk space free" on Windows as well as other platforms In-Reply-To: <046.cc5c9f889ae2eed9511debd6da27a2a4@allmydata.org> References: <037.087782cc5da1cf9aad2a79cd931ea2b7@allmydata.org> <046.cc5c9f889ae2eed9511debd6da27a2a4@allmydata.org> Message-ID: <8CA19ED4-8404-41EA-8314-9E40B2ECD3F6@zooko.com> Any Windows hackers want to fix #637? It really bothers me to see Windows getting second-class-citizen treatment, but I don't have convenient access to a Windows box to work on right now, and I'm prioritizing testing, packaging, and release-management for the Tahoe-LAFS v1.6 release. It really should be pretty easy. In fact, I'm marking it with the "easy" tag. Regards, Zooko http://allmydata.org/trac/tahoe/ticket/637 # support "keep this much disk space free" on Windows as well as other platforms From trac at allmydata.org Sun Oct 25 14:29:09 2009 From: trac at allmydata.org (tahoe-lafs) Date: Sun, 25 Oct 2009 21:29:09 -0000 Subject: [tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better In-Reply-To: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> References: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> Message-ID: <046.8e8b44eec487c26ec672f95ab8dfd6ed@allmydata.org> #778: "shares of happiness" is the wrong measure; "servers of happiness" is better --------------------------------+------------------------------------------- Reporter: zooko | Owner: kevan Type: defect | Status: new Priority: critical | Milestone: 1.5.1 Component: code-peerselection | Version: 1.4.1 Keywords: reliability | Launchpad_bug: --------------------------------+------------------------------------------- Comment(by kevan): I'm updating behavior.txt to have the rough fix that I mention in comment:63, and tests.txt to add a test for the logic that calculates servers_of_happiness in Tahoe2PeerSelector. I think this fixes comment:53. Thoughts/feedback?
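As a rough illustration of the metric in question (not kevan's actual logic, which also has to cope with the same share appearing on several servers and with newly placed versus preexisting shares), a simplified servers-of-happiness count over a map from server ids to the sets of share numbers they hold might look like:

{{{
def count_servers_holding_shares(sharemap):
    # sharemap: dict mapping server id -> set of share numbers on that server
    return len([server for (server, shares) in sharemap.items() if shares])

# e.g. two servers holding shares and one empty server:
#   count_servers_holding_shares({"a": set([0, 1]), "b": set([2]), "c": set()}) == 2
}}}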
-- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Sun Oct 25 15:28:23 2009 From: trac at allmydata.org (tahoe-lafs) Date: Sun, 25 Oct 2009 22:28:23 -0000 Subject: [tahoe-dev] [tahoe-lafs] #637: support "keep this much disk space free" on Windows as well as other platforms In-Reply-To: <037.087782cc5da1cf9aad2a79cd931ea2b7@allmydata.org> References: <037.087782cc5da1cf9aad2a79cd931ea2b7@allmydata.org> Message-ID: <046.cb48d2bd6a52282206fcd8c0978f7937@allmydata.org> #637: support "keep this much disk space free" on Windows as well as other platforms --------------------------+------------------------------------------------- Reporter: zooko | Owner: davidsarah Type: defect | Status: new Priority: major | Milestone: 1.6.0 Component: code-storage | Version: 1.3.0 Keywords: win32 easy | Launchpad_bug: --------------------------+------------------------------------------------- Changes (by davidsarah): * owner: zooko => davidsarah * status: assigned => new * milestone: eventually => 1.6.0 Comment: I'll do this. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Sun Oct 25 15:37:36 2009 From: trac at allmydata.org (tahoe-lafs) Date: Sun, 25 Oct 2009 22:37:36 -0000 Subject: [tahoe-dev] [tahoe-lafs] #637: support "keep this much disk space free" on Windows as well as other platforms In-Reply-To: <037.087782cc5da1cf9aad2a79cd931ea2b7@allmydata.org> References: <037.087782cc5da1cf9aad2a79cd931ea2b7@allmydata.org> Message-ID: <046.f2041414b7d361379a496012b93fbbc2@allmydata.org> #637: support "keep this much disk space free" on Windows as well as other platforms --------------------------+------------------------------------------------- Reporter: zooko | Owner: davidsarah Type: defect | Status: assigned Priority: major | Milestone: 1.6.0 Component: code-storage | Version: 1.3.0 Keywords: win32 easy | Launchpad_bug: --------------------------+------------------------------------------------- Changes (by davidsarah): * status: new => assigned Comment: Hmm. It is possible that the call to get disk stats might fail, for some other reason than the API not being available. What should happen in that case? Is it OK to delete redundant or now-misnamed helper methods like do_statvfs and stat_disk in storage/server.py? (I'll keep get_available_space.) -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Sun Oct 25 16:09:54 2009 From: trac at allmydata.org (tahoe-lafs) Date: Sun, 25 Oct 2009 23:09:54 -0000 Subject: [tahoe-dev] [tahoe-lafs] #637: support "keep this much disk space free" on Windows as well as other platforms In-Reply-To: <037.087782cc5da1cf9aad2a79cd931ea2b7@allmydata.org> References: <037.087782cc5da1cf9aad2a79cd931ea2b7@allmydata.org> Message-ID: <046.4a5d7de073ead1fe206ffe5b420e68a0@allmydata.org> #637: support "keep this much disk space free" on Windows as well as other platforms --------------------------+------------------------------------------------- Reporter: zooko | Owner: davidsarah Type: defect | Status: assigned Priority: major | Milestone: 1.6.0 Component: code-storage | Version: 1.3.0 Keywords: win32 easy | Launchpad_bug: --------------------------+------------------------------------------------- Comment(by zooko): Hm, if the call to get disk stats were to fail, what should we do? I think I would suggest logging an error message and returning that there is zero disk space free. 
It is okay (indeed encouraged) to refactor code to be cleaner whenever you make a change. (Part of the reason that this is okay is that each change is expected to have full test coverage.) -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Sun Oct 25 20:01:56 2009 From: trac at allmydata.org (tahoe-lafs) Date: Mon, 26 Oct 2009 03:01:56 -0000 Subject: [tahoe-dev] [tahoe-lafs] #818: Output of tahoe deep-check --repair is hard to skim Message-ID: <038.1fe7bcb2160f31b066f1c42ef51be4fe@allmydata.org> #818: Output of tahoe deep-check --repair is hard to skim -------------------------------+-------------------------------------------- Reporter: kpreid | Owner: Type: enhancement | Status: new Priority: minor | Milestone: undecided Component: code-frontend-cli | Version: 1.4.1 Keywords: deep-check | Launchpad_bug: -------------------------------+-------------------------------------------- The summary output of tahoe deep-check looks exactly the same, except for the numbers, no matter whether everything is fine or files needed to be repaired. For easier scanning, especially in a log of multiple deep-check operations at different times or of different files, it should have fewer statements saying that there were 0 of something; for example, in the simple case of all files being healthy, a statement like "5 files checked and found healthy." would be good, as opposed to the current: {{{ done: 5 objects checked pre-repair: 5 healthy, 0 unhealthy 0 repairs attempted, 0 successful, 0 failed post-repair: 5 healthy, 0 unhealthy }}} Also, the output does not include any feedback that --add-lease did its thing, but that is perhaps a separate issue. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Fri Oct 16 12:57:33 2009 From: trac at allmydata.org (pycryptopp) Date: Fri, 16 Oct 2009 19:57:33 -0000 Subject: [tahoe-dev] [pycryptopp] #31: segfault in rsa.so Message-ID: <053.0d04bc3814a21d876290fec11be8b289@allmydata.org> #31: segfault in rsa.so ---------------------+------------------------------------------------------ Reporter: zooko | Owner: Type: defect | Status: new Priority: critical | Version: 0.5.17 Keywords: security | Launchpad_bug: ---------------------+------------------------------------------------------ http://allmydata.org/buildbot-pycryptopp/builders/linux-amd64-ubuntu-jaunty-yukyuk/builds/35/steps/test/logs/stdio test_serialize_and_deserialize_signing_key_and_test (pycryptopp.test.test_rsa.SignAndVerify) ... process killed by signal 11 program finished with exit code -1 That was with the current trunk -- [20090916031341-92b7f-0c97a42c6acaff34bcec8eb1216c14c8cdece8c3] -- Ticket URL: pycryptopp Python bindings for the Crypto++ library From trac at allmydata.org Sun Oct 25 12:18:10 2009 From: trac at allmydata.org (pycryptopp) Date: Sun, 25 Oct 2009 19:18:10 -0000 Subject: [tahoe-dev] [pycryptopp] #31: segfault in rsa.so In-Reply-To: <053.0d04bc3814a21d876290fec11be8b289@allmydata.org> References: <053.0d04bc3814a21d876290fec11be8b289@allmydata.org> Message-ID: <062.995e64dbd3724787d38c51e4d898a35e@allmydata.org> #31: segfault in rsa.so ---------------------+------------------------------------------------------ Reporter: zooko | Owner: Type: defect | Status: new Priority: critical | Version: 0.5.17 Keywords: security | Launchpad_bug: ---------------------+------------------------------------------------------ Comment(by zooko): Hm.
I set up a builder which uses the system-supplied libcrypto++ (on Karmic) and it passes: http://allmydata.org/buildbot-pycryptopp/builders/linux-amd64-ubuntu- karmic-yukyuk-syslib Although the builder on the same machine that is building the embedded copy of libcrypto++ segfaults: http://allmydata.org/buildbot-pycryptopp/builders/linux-amd64-ubuntu- karmic-yukyuk This suggests that there is a bug in the embedded copy, right? But it doesn't appear on any other builders. So possibly there is a bug interaction between the embedded copy of libcrypto++ and the version of g++ that is in Karmic Koala? I just added to the show-tool-versions step to emit what version of g++ is present on each platform. -- Ticket URL: pycryptopp Python bindings for the Crypto++ library From trac at allmydata.org Mon Oct 26 14:52:22 2009 From: trac at allmydata.org (tahoe-lafs) Date: Mon, 26 Oct 2009 21:52:22 -0000 Subject: [tahoe-dev] [tahoe-lafs] #819: allmydata.test.test_repairer.Verifier.test_corrupt_crypttext_hashtree failed Message-ID: <037.bfd1c554c9b0fbd0a0a20eaee319d391@allmydata.org> #819: allmydata.test.test_repairer.Verifier.test_corrupt_crypttext_hashtree failed -----------------------+---------------------------------------------------- Reporter: zooko | Owner: zooko Type: defect | Status: new Priority: major | Milestone: undecided Component: code | Version: 1.5.0 Keywords: integrity | Launchpad_bug: -----------------------+---------------------------------------------------- http://allmydata.org/buildbot/builders/hardy- amd64/builds/295/steps/test/logs/stdio The unit test corrupted a share and the verifier failed to notice that the share was corrupted. That test corrupts the share in a random way each time -- it invokes [source:src/allmydata/test/common.py?rev=20091013052154-66853-783ac1c5b3a31f082184e0ad2060ecb4a65e2d92#L1347 _corrupt_crypttext_hash_tree()] which calls [source:src/allmydata/test/common.py?rev=20091013052154-66853-783ac1c5b3a31f082184e0ad2060ecb4a65e2d92#L1156 corrupt_field()] which corrupts the ciphertext hash tree in a randomly chosen way. Unfortunately it doesn't log the way that it chose to corrupt it or the random seed that it used so to reproduce this we'll probably have to run with the trial {{{--until-failure}}} option or something. Hm, I see that there is a {{{debug}}} flag that you can pass to {{{corrupt_field()}}} to get it to log what it is doing. I'll try that. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Mon Oct 26 20:09:07 2009 From: trac at allmydata.org (tahoe-lafs) Date: Tue, 27 Oct 2009 03:09:07 -0000 Subject: [tahoe-dev] [tahoe-lafs] #704: utf-8 decoding fails when certain pyOpenSSL library is used In-Reply-To: <037.73b480634ed55a35f361912b75b61eda@allmydata.org> References: <037.73b480634ed55a35f361912b75b61eda@allmydata.org> Message-ID: <046.7d7b647cf41c535d5911e1d861bb4260@allmydata.org> #704: utf-8 decoding fails when certain pyOpenSSL library is used ---------------------------+------------------------------------------------ Reporter: bewst | Owner: bewst Type: defect | Status: closed Priority: major | Milestone: undecided Component: packaging | Version: 1.4.1 Resolution: wontfix | Keywords: Launchpad_bug: 434411 | ---------------------------+------------------------------------------------ Changes (by zooko): * status: new => closed * resolution: => wontfix Comment: Okay, we can't reproduce this issue so I'm going to close this ticket as "wontfix". 
-- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Mon Oct 26 21:05:37 2009 From: trac at allmydata.org (tahoe-lafs) Date: Tue, 27 Oct 2009 04:05:37 -0000 Subject: [tahoe-dev] [tahoe-lafs] #807: exceptions.OverflowError: join() result is too long for a Python string In-Reply-To: <037.5cbac2eaa31d7486b8ddac98d017998f@allmydata.org> References: <037.5cbac2eaa31d7486b8ddac98d017998f@allmydata.org> Message-ID: <046.e9164ed02720d12e694fb7593d41bdb0@allmydata.org> #807: exceptions.OverflowError: join() result is too long for a Python string --------------------+------------------------------------------------------- Reporter: zooko | Owner: warner Type: defect | Status: new Priority: major | Milestone: undecided Component: code | Version: 1.3.0 Keywords: | Launchpad_bug: --------------------+------------------------------------------------------- Changes (by zooko): * owner: somebody => warner Comment: http://svn.python.org/view/python/tags/r252/Objects/stringobject.c?revision=60915&view=markup Search in text for "join() result is too long for a Python string". It is guarded by {{{if (sz < old_sz || sz > PY_SSIZE_T_MAX)}}}. {{{PY_SSIZE_T_MAX}}} is defined in http://svn.python.org/view/python/tags/r252/Include/pyport.h?revision=60915&view=markup to be {{{((size_t)-1)>>1}}} which would be {{{2,147,483,647}}} on that platform. So, I guess we were trying to {{{.join()}}} together a string that would have been more than 2.1 billion bytes. I don't see how to investigate this, reproduce it, or determine if it has been fixed in newer versions of Tahoe-LAFS. One of the reasons why not is that the exception raised by the selectreactor's {{{_doReadOrWrite()}}} apparently didn't get propagated to foolscap, because no accompanying incident report file was generated. Brian: am I interpreting that correctly? Is there a way to make sure that all unhandled exceptions get registered with the foolscap logging system so that they can be reported as incidents? Do you have any other ideas how to learn more about this issue, or should we just close it as "wontfix"? -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Mon Oct 26 21:34:17 2009 From: trac at allmydata.org (tahoe-lafs) Date: Tue, 27 Oct 2009 04:34:17 -0000 Subject: [tahoe-dev] [tahoe-lafs] #800: improve alacrity by downloading only the part of the Merkle Tree that you need In-Reply-To: <037.d588cbb4a410f00f64e7d23d36fad03a@allmydata.org> References: <037.d588cbb4a410f00f64e7d23d36fad03a@allmydata.org> Message-ID: <046.472b0aee636a7efb980b33ecf37dc2a9@allmydata.org> #800: improve alacrity by downloading only the part of the Merkle Tree that you need -------------------------+-------------------------------------------------- Reporter: zooko | Owner: somebody Type: enhancement | Status: new Priority: major | Milestone: undecided Component: code | Version: 1.5.0 Keywords: easy | Launchpad_bug: -------------------------+-------------------------------------------------- Comment(by zooko): I think it is okay for it to use more reads, so the test should be loosened to allow it to pass even if it does. The existence of that test of the number of reads does serve to remind me, however, that multiple small reads of the hash tree *would* actually be a performance loss for small files. We should do some more measurements of performance. Perhaps it would be a win to heuristically over-read by fetching a few more than the required number of hashes.
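To make the trade-off concrete, here is a generic sketch of "fetch only the hashes you need": for a complete binary Merkle tree laid out in an array (root at index 0, children of node i at 2i+1 and 2i+2), verifying one leaf needs only one sibling hash per level. This is an illustration of the idea, not Tahoe's actual hash-tree code, and it assumes the number of leaves is a power of two:

{{{
def needed_sibling_indices(leaf_index, num_leaves):
    # Return the array indices of the sibling ("uncle") hashes needed to
    # check one leaf against an already-known root hash.
    node = (num_leaves - 1) + leaf_index   # array index of the leaf
    needed = []
    while node > 0:
        sibling = node + 1 if node % 2 == 1 else node - 1
        needed.append(sibling)
        node = (node - 1) // 2             # step up to the parent
    return needed

# For 8 leaves, verifying leaf 5 needs just 3 hashes, one per level:
#   needed_sibling_indices(5, 8) == [11, 6, 1]
}}}

So for a small file with only a handful of segments, fetching those few extra hashes up front (over-reading) may well cost less than issuing several extra round trips.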
-- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Tue Oct 27 09:43:11 2009 From: trac at allmydata.org (tahoe-lafs) Date: Tue, 27 Oct 2009 16:43:11 -0000 Subject: [tahoe-dev] [tahoe-lafs] #820: it took four hours for this query to complete Message-ID: <037.9b29135863a6009b2183e26983856001@allmydata.org> #820: it took four hours for this query to complete --------------------------+------------------------------------------------- Reporter: zooko | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-network | Version: 1.5.0 Keywords: | Launchpad_bug: --------------------------+------------------------------------------------- The attached operation report shows that a volunteer grid server in Greece took four hours to reply to my question. Really? That seems unusual. Hm, it also shows that the other volunteer grid servers took around 15 to 30 seconds each. That's weird, too. I'll look for incident files to attach as well. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From zookog at gmail.com Tue Oct 27 15:54:07 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Tue, 27 Oct 2009 16:54:07 -0600 Subject: [tahoe-dev] plan for Tahoe-LAFS v1.6 Message-ID: Folks: It has been almost 3 months since Tahoe-LAFS v1.5 came out. (See the Parade of Release Notes [1].) It is time to start making a new Tahoe-LAFS release! And we need your help. Here's the plan: * what's new in this release Mainly #607 (immutable directories), #778 (measure file health in terms of servers not in terms of shares). Also #637, #761, #771, #814. There are some cool hacks that aren't primarily in Tahoe-LAFS itself, but still worth mentioning, such as #663 (ability to host bzr repos on the Tahoe-LAFS grid). In addition there are numerous small issues that could be fixed without destabilizing the 1.6 release. (If I've forgotten anything please let me know.) If you want to contribute any patches to Tahoe-LAFS v1.6, please do so as soon as possible. There are plenty of open tickets to give you ideas -- start with "Easy Tickets" [2] or the 1.6.0 Milestone [3]. * when can we release it Let's try to finish the major features #607 and #778, and then spend a couple of weeks on quality control. Maybe we can release 1.6 by the end of November. * quality We need to take steps to ensure that v1.6 has the same high level of quality that the previous releases did, or even better if possible. This is very important. For a lot of people, the main appeal of Tahoe-LAFS is the extreme reliability that you get from erasure-coding your data across independent servers. While the Tahoe-LAFS architecture is perfect for this, it would be all too easy for a bug or a UI issue to negate the architectural advantage. The parts that seem most vulnerable to me so far are the issues that would never affect a Tahoe-LAFS developer but might bite an unwary user, such as UI and server-selection. That's why I think #778 is important. To ensure another high-quality release, we need: 1. To maintain thorough test coverage of all code touched by all patches committed since v1.5. 2. To automatically run all the tests on all the Supported Platforms after each commit. We have that, and I've been working with buildslave maintainers to fix any unstable buildslaves and to deploy new buildslaves. If you would like to contribute a buildslave, especially one that is not already represented on the Supported Platforms list [4], please volunteer. 3. 
To review new patches that have been contributed [5]. This is the easiest way to contribute the most to Tahoe-LAFS right now -- spend 30 minutes or an hour reading a patch and it will help a lot. 4. To review patches that Brian has already committed to trunk since v1.5. Brian has the special privilege of being able to commit patches to trunk without submitting them for review. He is an excellent coder who consistently produces high quality work, but for added safety, another pair of eyes should read the patches that he has committed and look for mistakes. Reading Brian's recent work won't be as easy as reading the patches that are awaiting review, because Brian's patches include large refactorings of the code. It is, however, an opportunity to learn software engineering from a master. 5. To have a bunch of people manually test out the new version after all unstable new features have been committed and before v1.6 is released. Everyone who cares can install it and make sure that it works at least as well as v1.5 did for all their needs. This is the reason that I would like there to be a "quiet period" of a couple of weeks after the unstable patches have been committed and before we release v1.6. 6. To investigate all reasonable bug reports from the field. I've been working on this for the last few days and updating the relevant tickets. There are quite a few bug reports for which we do not fully understand what happened or how to reproduce it. None of them appear to threaten data integrity or confidentiality, but I would feel better if we had a more thorough understanding of the details of Tahoe-LAFS operations in the field. Thanks for your help! This is going to be a really good new Tahoe-LAFS release. :-) Regards, Zooko [1] http://allmydata.org/trac/tahoe/wiki/Doc [2] http://allmydata.org/trac/tahoe/query?status=%21closed&order=priority&keywords=%7Eeasy [3] http://allmydata.org/trac/tahoe/roadmap [4] http://allmydata.org/buildbot [5] http://allmydata.org/trac/tahoe/wiki/PatchReviewProcess tickets mentioned in this message: http://allmydata.org/trac/tahoe/ticket/607 # DIR2:IMM http://allmydata.org/trac/tahoe/ticket/637 # support "keep this much disk space free" on Windows as well as other platforms http://allmydata.org/trac/tahoe/ticket/663 # integrate a distributed revision control tool with Tahoe http://allmydata.org/trac/tahoe/ticket/761 # "tahoe cp $DIRCAP/$PATH $LOCAL" raises AttributeError http://allmydata.org/trac/tahoe/ticket/771 # tahoe ls doesn't work on files http://allmydata.org/trac/tahoe/ticket/778 # "shares of happiness" is the wrong measure; "servers of happiness" is better http://allmydata.org/trac/tahoe/ticket/814 # v1.4.1 storage servers sending a negative number for maximum-immutable-share-size?
From trac at allmydata.org Tue Oct 27 18:54:12 2009 From: trac at allmydata.org (tahoe-lafs) Date: Wed, 28 Oct 2009 01:54:12 -0000 Subject: [tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better In-Reply-To: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> References: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> Message-ID: <046.fe6b63ec4f86b890f88ebbd3223d64f3@allmydata.org> #778: "shares of happiness" is the wrong measure; "servers of happiness" is better --------------------------------+------------------------------------------- Reporter: zooko | Owner: kevan Type: defect | Status: new Priority: critical | Milestone: 1.6.0 Component: code-peerselection | Version: 1.4.1 Keywords: reliability | Launchpad_bug: --------------------------------+------------------------------------------- Comment(by kevan): I was thinking about this the other day, and got to wondering about how the Encoder handles preexisting shares in the event of some servers failing during an upload. (note that the following example is in terms of the existing {{{shares_of_happiness}}} behavior -- it is easier to link to that code than to my patches) As an example, we first look at [source:src/allmydata/immutable/upload.py at 4045#L711 start_encrypted] in CHKUploader. This method creates and runs a Tahoe2PeerSelector to distribute shares of an IEncryptedUploadable across the grid. The results of this are handled in [source:src/allmydata/immutable/upload.py at 4045#L753 set_shareholders]. Note that the PeerTracker instances in {{{use_peers}}} are sent to the Encoder instance, while the peerids in {{{already_peers}}} are only used in the upload results. In any case, after invoking {{{set_shareholders}}} on the Encoder, the CHKUploader starts the upload. The part of the Encoding process that concerns me is [source:src/allmydata/immutable/encode.py at 4045#L489 _remove_shareholder]. This method is called when there is an error sending data to one of the shareholders. If a shareholder is lost, the Encoder will check to make sure that {{{shares_of_happiness}}} is still met even with the lost server -- if not, it will abort the upload. The problem with this check is that the Encoder, from what I can tell, has no way of knowing about the shares that already exist on the grid, and thus can't take them into account when making this check. So, if I have (say) 8 shares for my storage index already on the grid, {{{shares_of_happiness = 7}}}, only two things for the Encoder to actually send, and one (or both) of those transfers fail, my upload will fail when it shouldn't. Does it seem like I'm off-base there? If not, then it certainly seems like my implementation of {{{servers_of_happiness}}} would fall victim to pretty much the same issue. Is there an obvious way to fix that?
(this comment should have a unit test or two written for it, so that what I'm saying is demonstrated/more easily understood, but I need to leave for class now) -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Tue Oct 27 19:43:10 2009 From: trac at allmydata.org (tahoe-lafs) Date: Wed, 28 Oct 2009 02:43:10 -0000 Subject: [tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better In-Reply-To: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> References: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> Message-ID: <046.0d62766265101030d984f6179e60c303@allmydata.org> #778: "shares of happiness" is the wrong measure; "servers of happiness" is better --------------------------------+------------------------------------------- Reporter: zooko | Owner: kevan Type: defect | Status: new Priority: critical | Milestone: 1.6.0 Component: code-peerselection | Version: 1.4.1 Keywords: reliability | Launchpad_bug: --------------------------------+------------------------------------------- Comment(by zooko): Okay, I read your explanation and followed along using the links that you embedded and I think you are right. I hope you write that unit test or two when you get back from class! To fix this in terms of shares-of-happiness, I guess [source:src/allmydata/immutable/upload.py?rev=20090815112846-66853-7015fcf1322720ece28def7b8f2e4955b4689862#L753 CHKUploader.set_shareholders()] could also give an integer "already uploaded shares" to the encoder and the encoder could add that integer to the {{{len(self.landlords)}}} that it uses to decide if there is still a chance of success. To fix it in terms of servers-of-happiness, I suspect that {{{CHKUploader.set_shareholders()}}} would need to pass more information to the encoder, perhaps telling the encoder everything it knows about which servers already have which shares. I'm not entirely sure how your current patch works, but I'll write more about that in a separate comment. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Tue Oct 27 20:51:46 2009 From: trac at allmydata.org (tahoe-lafs) Date: Wed, 28 Oct 2009 03:51:46 -0000 Subject: [tahoe-dev] [tahoe-lafs] #98: Web API is vulnerable to XSRF attacks. In-Reply-To: <040.49f1e12118e000db02329d43a04def61@allmydata.org> References: <040.49f1e12118e000db02329d43a04def61@allmydata.org> Message-ID: <049.c03a12f665d6374eb9f43f0b8a87d566@allmydata.org> #98: Web API is vulnerable to XSRF attacks. -----------------------------------+---------------------------------------- Reporter: nejucomo | Owner: zooko Type: defect | Status: closed Priority: major | Milestone: 0.5.1 Component: code-frontend-web | Version: 0.4.0 Resolution: fixed | Keywords: security Launchpad_bug: | -----------------------------------+---------------------------------------- Comment(by davidsarah): Note that JavaScript in a given file can still obtain the read URI for that file. In the case of a mutable file, this is more than least authority because it allows reading future versions. I will open a new bug about that.
-- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Tue Oct 27 21:03:18 2009 From: trac at allmydata.org (tahoe-lafs) Date: Wed, 28 Oct 2009 04:03:18 -0000 Subject: [tahoe-dev] [tahoe-lafs] #821: A script in a file viewed through the WUI can obtain the file's read cap Message-ID: <042.6fe7727b4233229f2c8619c775a7cffb@allmydata.org> #821: A script in a file viewed through the WUI can obtain the file's read cap -------------------------------+-------------------------------------------- Reporter: davidsarah | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend-web | Version: 1.5.0 Keywords: newcaps security | Launchpad_bug: -------------------------------+-------------------------------------------- http://allmydata.org/trac/tahoe/ticket/98#comment:22 A script (such as JavaScript) in an [X]HTML file viewed through the WUI can obtain the read cap for that file. For an immutable file, this is not much of a problem because the script can read the contents of the file anyway. However, for a mutable file, it can also read any future version, which is a violation of the Principle of Least Authority. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Tue Oct 27 21:36:19 2009 From: trac at allmydata.org (tahoe-lafs) Date: Wed, 28 Oct 2009 04:36:19 -0000 Subject: [tahoe-dev] [tahoe-lafs] #821: A script in a file viewed through the WUI can obtain the file's read cap In-Reply-To: <042.6fe7727b4233229f2c8619c775a7cffb@allmydata.org> References: <042.6fe7727b4233229f2c8619c775a7cffb@allmydata.org> Message-ID: <051.2e08c353ee0ae77ee3870dc002f5827b@allmydata.org> #821: A script in a file viewed through the WUI can obtain the file's read cap -------------------------------+-------------------------------------------- Reporter: davidsarah | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend-web | Version: 1.5.0 Keywords: newcaps security | Launchpad_bug: -------------------------------+-------------------------------------------- Comment(by davidsarah): I believe this issue also applies to other scriptable file formats such as PDF and Flash. Possible solution: If the NewCapDesign implements versioned read caps (i.e. read caps that only give access to a specific version of a mutable file), then that would allow versioned read URLs to be used by default by the WUI. That would also have the side effect that cutting-and-pasting an URL from the address bar would only give access to a single file version by default (and the versioned URLs could also provide collision resistance). I'm not sure whether that is what users would expect, but it is a safer default. I think this would have to work by having the gateway perform an HTTP redirect from the unversioned read URL to the versioned one (probably conditional on a parameter in the URL). The parent directory listing cannot directly link to the versioned URLs because that would require reading every file in the listing, which would be too inefficient. 
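To make the redirect idea a little more concrete, here is a very rough sketch in the style of a Twisted web resource. {{{lookup_versioned_readcap}}} is a hypothetical helper standing in for however the gateway would resolve the mutable file's current version; none of this is existing Tahoe-LAFS code:

{{{
from twisted.web import resource, util

class UnversionedReadURL(resource.Resource):
    def __init__(self, filenode):
        resource.Resource.__init__(self)
        self.filenode = filenode

    def render_GET(self, request):
        # Hypothetical: ask the gateway for a read cap pinned to the file's
        # current version, then redirect the client to it.
        versioned_cap = lookup_versioned_readcap(self.filenode)
        return util.redirectTo("/uri/" + versioned_cap, request)
}}}

Whether the redirect should be unconditional or driven by a query parameter, as suggested above, is part of what would need to be decided.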
-- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Tue Oct 27 23:08:49 2009 From: trac at allmydata.org (tahoe-lafs) Date: Wed, 28 Oct 2009 06:08:49 -0000 Subject: [tahoe-dev] [tahoe-lafs] #821: A script in a file viewed through the WUI can obtain the file's read cap In-Reply-To: <042.6fe7727b4233229f2c8619c775a7cffb@allmydata.org> References: <042.6fe7727b4233229f2c8619c775a7cffb@allmydata.org> Message-ID: <051.6298c7622a39f9cb03a9359c6067c83b@allmydata.org> #821: A script in a file viewed through the WUI can obtain the file's read cap -----------------------------------+---------------------------------------- Reporter: davidsarah | Owner: Type: defect | Status: reopened Priority: major | Milestone: undecided Component: code-frontend-web | Version: 1.5.0 Resolution: | Keywords: newcaps security Launchpad_bug: | -----------------------------------+---------------------------------------- Changes (by davidsarah): * status: closed => reopened * resolution: duplicate => Comment: It's not really a duplicate, because #615 is about scripts from one page having access to other pages. If #615 were fixed, this issue would remain, since it isn't dependent on the same-origin policy. That is, even if we were to put every page in a different origin, a script would still be able to access its ''own'' URL -- and therefore future versions of its file if this bug is not fixed. In a sense, #615 masks this bug, because it allows an attack that is a superset of this one. So I think we should leave this ticket open and reference it from #615. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Tue Oct 27 23:25:20 2009 From: trac at allmydata.org (tahoe-lafs) Date: Wed, 28 Oct 2009 06:25:20 -0000 Subject: [tahoe-dev] [tahoe-lafs] #127: Cap URLs leaked via HTTP Referer header (was: smaller CSRF attack still possible) In-Reply-To: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> References: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> Message-ID: <047.282237973edcc27828580a298fc70bfe@allmydata.org> #127: Cap URLs leaked via HTTP Referer header -------------------------------+-------------------------------------------- Reporter: warner | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend-web | Version: 0.7.0 Keywords: security | Launchpad_bug: -------------------------------+-------------------------------------------- Changes (by davidsarah): * keywords: => security * priority: minor => major Comment: This attack isn't CSRF; changing the summary accordingly. If you like this bug, you might also like #615 and #821 :-) (#821 is about leaking the URL to scripts in the file itself, #615 is about leaking it to other pages.) -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Tue Oct 27 23:32:32 2009 From: trac at allmydata.org (tahoe-lafs) Date: Wed, 28 Oct 2009 06:32:32 -0000 Subject: [tahoe-dev] [tahoe-lafs] #615: Can JavaScript loaded from Tahoe access all your content which is loaded from Tahoe? In-Reply-To: <037.419ca30d794f832cc443eabb14db75c3@allmydata.org> References: <037.419ca30d794f832cc443eabb14db75c3@allmydata.org> Message-ID: <046.9aff4fad3c7f732b634f445ab1e2acd0@allmydata.org> #615: Can JavaScript loaded from Tahoe access all your content which is loaded from Tahoe? 
---------------------------+------------------------------------------------ Reporter: zooko | Type: defect Status: new | Priority: critical Milestone: undecided | Component: code-frontend-web Version: 1.3.0 | Keywords: newcaps security Launchpad_bug: | ---------------------------+------------------------------------------------ Changes (by davidsarah): * keywords: => newcaps security * priority: major => critical Comment: #821 (now reopened) describes a less serious security problem that would still be present even if every page had a distinct origin. Note that the fix suggested for that bug will only work if this one is also fixed, i.e. #821 is dependent on this bug. #127 seems to be almost exclusively about Referer header cap leakage, and I've changed its summary to reflect that. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Tue Oct 27 23:49:32 2009 From: trac at allmydata.org (tahoe-lafs) Date: Wed, 28 Oct 2009 06:49:32 -0000 Subject: [tahoe-dev] [tahoe-lafs] #821: A script in a file viewed through the WUI can obtain the file's read cap In-Reply-To: <042.6fe7727b4233229f2c8619c775a7cffb@allmydata.org> References: <042.6fe7727b4233229f2c8619c775a7cffb@allmydata.org> Message-ID: <051.1b1eff9c8282864a548e420f4d13262e@allmydata.org> #821: A script in a file viewed through the WUI can obtain the file's read cap -----------------------------------+---------------------------------------- Reporter: davidsarah | Owner: Type: defect | Status: reopened Priority: major | Milestone: undecided Component: code-frontend-web | Version: 1.5.0 Resolution: | Keywords: newcaps security Launchpad_bug: | -----------------------------------+---------------------------------------- Comment(by davidsarah): http://allmydata.org/pipermail/tahoe-dev/2007-September/000134.html > After the cap-talk meeting, Brian and I agreed -- I thought -- not to > bother making the URL field read-only, and instead to document the > fact that sharing a URL will (by default) share write access to your > directory as well as read access.. Apparently Brian remains > interested in a !JavaScript hack to read-only-ify URLs after loading > them. When using the WUI, is it only for directories that the URL will represent a write cap? (Directory listings do not contain untrusted scripts, so this bug shouldn't be a problem in the directory case.) -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Wed Oct 28 08:11:10 2009 From: trac at allmydata.org (tahoe-lafs) Date: Wed, 28 Oct 2009 15:11:10 -0000 Subject: [tahoe-dev] [tahoe-lafs] #200: writing of shares is fragile and/or there is no graceful shutdown In-Reply-To: <037.dd0442b313e5c64f2d29eea01e3724f6@allmydata.org> References: <037.dd0442b313e5c64f2d29eea01e3724f6@allmydata.org> Message-ID: <046.aebaca901b29b7abaad0634a2ea90173@allmydata.org> #200: writing of shares is fragile and/or there is no graceful shutdown -----------------------------------+---------------------------------------- Reporter: zooko | Owner: warner Type: enhancement | Status: new Priority: major | Milestone: eventually Component: code-storage | Version: 0.6.1 Keywords: integrity reliability | Launchpad_bug: -----------------------------------+---------------------------------------- Comment(by zooko): This isn't an integrity issue because even if a share is corrupted due to this issue that doesn't threaten the integrity of the file. 
Note that there are in general two possible ways to reduce the problem of shares being corrupted during a shutdown or crash. One is to make the writing of shares be more robust, for example by writing out a complete new copy of the share to a new temporary location and then renaming it into place. This is the option that increases I/O costs as discussed in the initial comment. Another is to add a "graceful shutdown" option where the storage server gets a chance to finish (or abort) updating a share before its process is killed. I'm currently opposed to the latter and would be happier with the current fragile update than with the latter. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Wed Oct 28 14:22:11 2009 From: trac at allmydata.org (tahoe-lafs) Date: Wed, 28 Oct 2009 21:22:11 -0000 Subject: [tahoe-dev] [tahoe-lafs] #200: writing of shares is fragile (was: writing of shares is fragile and/or there is no graceful shutdown) In-Reply-To: <037.dd0442b313e5c64f2d29eea01e3724f6@allmydata.org> References: <037.dd0442b313e5c64f2d29eea01e3724f6@allmydata.org> Message-ID: <046.5e2c063dfb58c5f4b41d86638b921377@allmydata.org> #200: writing of shares is fragile --------------------------+------------------------------------------------- Reporter: zooko | Owner: warner Type: enhancement | Status: new Priority: major | Milestone: eventually Component: code-storage | Version: 0.6.1 Keywords: reliability | Launchpad_bug: --------------------------+------------------------------------------------- Changes (by davidsarah): * keywords: integrity reliability => reliability Comment: I agree that "graceful shutdown" is not the right solution. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Wed Oct 28 21:20:35 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 29 Oct 2009 04:20:35 -0000 Subject: [tahoe-dev] [tahoe-lafs] #802: MacOS X Test Failures In-Reply-To: <037.13238bcd49a64ca7e79325d679c119f4@allmydata.org> References: <037.13238bcd49a64ca7e79325d679c119f4@allmydata.org> Message-ID: <046.15789eb9d4c61a4b22541dd0f4462bd6@allmydata.org> #802: MacOS X Test Failures ---------------------+------------------------------------------------------ Reporter: bewst | Owner: bewst Type: defect | Status: assigned Priority: major | Milestone: 0.3.0 Component: unknown | Version: 1.5.0 Keywords: | Launchpad_bug: ---------------------+------------------------------------------------------ Changes (by zooko): * milestone: undecided => 0.3.0 Comment: Hm. I hate to impose that constraint on platforms where they don't need the newer pycryptopp. pycryptopp-0.5.17 is necessary for Mac OS 10.6 and for Fedora amd64, and it may or may not be needed for other Linux amd64. pycryptopp-0.5.15 is necessary for AthlonXP and old-style Celeron. Also note that the latest pycryptopp-0.5.17 won't build properly with the brand new Ubuntu Karmic: https://bugs.launchpad.net/ubuntu/+bug/461303 . Anyway, maybe we should increase that requirement, or do so on the known- to-be-affected platforms. I'd like to put it off and think about it for a bit first, though. For one thing, we have a recurring problem with people being unable to install Tahoe-LAFS because they can't install pycryptopp. One of the main ways that I intend to address that problem is for buildslaves to build binary libraries ("eggs") of pycryptopp and upload them to here: http://allmydata.org/source/tahoe/deps/tahoe-dep-eggs . 
The idea is that the Tahoe-LAFS build system (using {{{setuptools}}}) will automatically download the appropriate binary egg from that list for the platform on which it is building. That mechanism basically works, but we haven't set up buildslaves for all the right platforms and all of the dependencies and tested it out. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Wed Oct 28 22:37:36 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 29 Oct 2009 05:37:36 -0000 Subject: [tahoe-dev] [tahoe-lafs] #822: WUI should use a more reliable, out-of-band means of reporting errors when a server connection is lost during a download Message-ID: <042.0f5713561f88d9fe554c5ed642ce752c@allmydata.org> #822: WUI should use a more reliable, out-of-band means of reporting errors when a server connection is lost during a download -------------------------------+-------------------------------------------- Reporter: davidsarah | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend-web | Version: 1.5.0 Keywords: integrity | Launchpad_bug: -------------------------------+-------------------------------------------- The discussion of bug #698 (which turned out to be a Firefox bug) turned up a potential integrity problem that can occur if a server connection is lost in the middle of a download via the WUI: http://allmydata.org/trac/tahoe/ticket/698#comment:1 > The first thing that comes to mind is that a server connection could have been lost in the middle of the download (in this case, after we've retrieved the UEB and some of the hashes, but before we've retrieved the first data block). The web server has to commit to success (200) or failure (404 or 500 or something) before it starts sending any of the plaintext, but it doesn't want to store the entire file either. So it bases the HTTP response code upon the initial availability of k servers, and hopes they'll stick around for the whole download. > When we get a "late failure" (i.e. one of the servers disconnects in the middle), the webapi doesn't have a lot of choices. At the moment, it emits a brief error message (attached to whatever partial content has already been written out), then drops the HTTP connection, and hopes that the client is observant enough to notice that the number of received bytes does not match the previously-sent Content-Length header, and then announce an error on the client side. > If the application doing the fetch (perhaps the browser, perhaps tiddywiki itself?) doesn't strictly check the Content-Length header, then it could get partial content without an error message. > There are two directions to fix this: > * change the webapi to use "Chunked Encoding", basically delivering data one segment at a time, possibly giving the server a chance to emit an error header in between segments: this would let us respond better to these errors > * fix the other download-should-be-better tickets (#193, #287) to tolerate lost servers better, which might reduce the rate at which these errors occur As pointed out in http://allmydata.org/pipermail/tahoe- dev/2009-May/001724.html , it is possible that the length so far plus the length of the error message, coincidentally equals the expected file length. So even for a web client that diligently checks the Content- Length, there might not be enough information to detect an error. 
An attacker might try to force this situation (I don't know what their chance of success would be, but probably much higher than trying to attack the crypto). In any case, the WUI is currently using in-band error reporting, which is problematic because the error message will be treated as data of whatever format the client thinks the content has. This is an integrity issue because the download from the gateway to the client has no cryptographic integrity checking. To close this bug, find and implement some way to make typical web clients reliably report an error when a download fails part-way through. Alternatively, prove that it isn't possible, and document this as an inherent limitation of the WUI. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Wed Oct 28 22:58:59 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 29 Oct 2009 05:58:59 -0000 Subject: [tahoe-dev] [tahoe-lafs] #822: WUI should use a more reliable, out-of-band means of reporting errors when a server connection is lost during a download In-Reply-To: <042.0f5713561f88d9fe554c5ed642ce752c@allmydata.org> References: <042.0f5713561f88d9fe554c5ed642ce752c@allmydata.org> Message-ID: <051.d971c5c5ea40d90b0157537c0058e9a7@allmydata.org> #822: WUI should use a more reliable, out-of-band means of reporting errors when a server connection is lost during a download -------------------------------+-------------------------------------------- Reporter: davidsarah | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend-web | Version: 1.5.0 Keywords: integrity | Launchpad_bug: -------------------------------+-------------------------------------------- Comment(by davidsarah): An alternative to Chunked Encoding that is worth considering is to use HTTP-over-TLS for the WUI (since TLS does have the ability to report errors out-of-band that clients "MUST" pay attention to, although I don't know how these are displayed to the user). Note that the TLS handshake would only occur once per client of a given gateway, so the main performance impact would only be the session encryption/MAC. OTOH, we'd probably have to use self-signed certificates, which throw up very ugly warnings in recent browsers. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Wed Oct 28 23:11:29 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 29 Oct 2009 06:11:29 -0000 Subject: [tahoe-dev] [tahoe-lafs] #127: Cap URLs leaked via HTTP Referer header In-Reply-To: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> References: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> Message-ID: <047.ee4a9d5d5c8d6d1d6e4b1fdfd50fc1e7@allmydata.org> #127: Cap URLs leaked via HTTP Referer header -------------------------------+-------------------------------------------- Reporter: warner | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend-web | Version: 0.7.0 Keywords: security | Launchpad_bug: -------------------------------+-------------------------------------------- Comment(by davidsarah): Replying to [comment:9 zooko]: > http://www.w3.org/Protocols/rfc2616/rfc2616-sec15.html#sec15.1.3 > * "Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol." 
I have heard someone, I think Tyler Close, say that clients interpret this in a stupidly literal way: they do include the Referer header in an HTTP- over-SSL/TLS request -- because that is not a "non-secure" request -- when the referring page was also transferred over HTTP-over-SSL/TLS, even if the keys or domains are different. Also, http://community.livejournal.com/lj_dev/707379.html seems to suggest that non-Mozilla browsers do not follow the above restriction on sending Referer at all -- although that was in 2006. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Wed Oct 28 23:14:05 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 29 Oct 2009 06:14:05 -0000 Subject: [tahoe-dev] [tahoe-lafs] #822: WUI should use a more reliable, out-of-band means of reporting errors when a server connection is lost during a download In-Reply-To: <042.0f5713561f88d9fe554c5ed642ce752c@allmydata.org> References: <042.0f5713561f88d9fe554c5ed642ce752c@allmydata.org> Message-ID: <051.bc18a3e938f4a9495af99950bd495c87@allmydata.org> #822: WUI should use a more reliable, out-of-band means of reporting errors when a server connection is lost during a download -------------------------------+-------------------------------------------- Reporter: davidsarah | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend-web | Version: 1.5.0 Keywords: integrity | Launchpad_bug: -------------------------------+-------------------------------------------- Comment(by davidsarah): Using SSL/TLS might in principle also have helped with #127 (cap leakage via Referer headers), but see http://allmydata.org/trac/tahoe/ticket/127#comment:13 -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Wed Oct 28 23:34:50 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 29 Oct 2009 06:34:50 -0000 Subject: [tahoe-dev] [tahoe-lafs] #127: Cap URLs leaked via HTTP Referer header In-Reply-To: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> References: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> Message-ID: <047.03f565cacf2b6dadd6862a2d497ece24@allmydata.org> #127: Cap URLs leaked via HTTP Referer header -------------------------------+-------------------------------------------- Reporter: warner | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend-web | Version: 0.7.0 Keywords: security | Launchpad_bug: -------------------------------+-------------------------------------------- Comment(by davidsarah): The behaviour of Mozilla browsers for the secure -> secure case is controlled by this preference [note "rr" spelling]: http://kb.mozillazine.org/Network.http.sendSecureXSiteReferrer Summary: it does the wrong thing by default :-( (This preference controls when to send Referer in other cases: http://kb.mozillazine.org/Network.http.sendRefererHeader I just changed my Firefox config to '''never''' send it, i.e. {{{network.http.sendRefererHeader = 0}}} and {{{network.http.sendSecureXSiteReferrer = false}}}. I doubt anything will break.) 
-- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Wed Oct 28 23:44:55 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 29 Oct 2009 06:44:55 -0000 Subject: [tahoe-dev] [tahoe-lafs] #127: Cap URLs leaked via HTTP Referer header In-Reply-To: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> References: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> Message-ID: <047.71febe469b10c5f7edd50b1fdafa25bd@allmydata.org> #127: Cap URLs leaked via HTTP Referer header -------------------------------+-------------------------------------------- Reporter: warner | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend-web | Version: 0.7.0 Keywords: security | Launchpad_bug: -------------------------------+-------------------------------------------- Comment(by davidsarah): Microsoft cannot be trusted to document IE's behaviour; their knowledge base article at http://support.microsoft.com/kb/178066 is contradicted by http://marc.info/?l=bugtraq&m=107282279713152&w=2 -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Thu Oct 29 09:11:25 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 29 Oct 2009 16:11:25 -0000 Subject: [tahoe-dev] [tahoe-lafs] #127: Cap URLs leaked via HTTP Referer header In-Reply-To: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> References: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> Message-ID: <047.582b65d26b1f595c92a5550689295fda@allmydata.org> #127: Cap URLs leaked via HTTP Referer header -------------------------------+-------------------------------------------- Reporter: warner | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend-web | Version: 0.7.0 Keywords: security | Launchpad_bug: -------------------------------+-------------------------------------------- Comment(by zooko): Last year I asked Collin Jackson (who knows a good deal about web security) how to automatically prevent Referer Headers from being sent. He repied: Most of the techniques involve making the request come from a non-HTTP scheme. The browser usually won't bother to send a Referer in this case. Option A: ftp scheme {{{ ftp://site.com/source.html }}} Option B: about:blank scheme {{{ w = window.open(""); w.document.write("
"); w.document.forms[0].submit(); }}} Option C: javascript: scheme {{{ window.location="javascript:''" }}} -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Thu Oct 29 10:18:11 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 29 Oct 2009 17:18:11 -0000 Subject: [tahoe-dev] [tahoe-lafs] #698: corrupted file displayed to user after failure to download followed by retry In-Reply-To: <037.d4b9836fe725d104b3c238def5c2f190@allmydata.org> References: <037.d4b9836fe725d104b3c238def5c2f190@allmydata.org> Message-ID: <046.3415e68b7b517ca85d1963680ca2153a@allmydata.org> #698: corrupted file displayed to user after failure to download followed by retry ------------------------------+--------------------------------------------- Reporter: zooko | Owner: Type: defect | Status: closed Priority: critical | Milestone: 1.5.0 Component: code-network | Version: 1.4.1 Resolution: invalid | Keywords: integrity Launchpad_bug: | ------------------------------+--------------------------------------------- Comment(by zooko): I believe Brian downgraded that event to no longer be considered "weird" so it would stop triggering incident reports. {{{WEIRD Tub.connectorFinished: WEIRD, is not in [....]}}} This unexplained behavior may be related to #653, as described in ticket:653#comment:20. So, I guess it isn't clear to me whether we should ticket that issue or instead engage in some strategy which makes our use of foolscap immune to such issues. I suppose if we were going to do ''that'' then we should ticket ''that''. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Thu Oct 29 12:04:43 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 29 Oct 2009 19:04:43 -0000 Subject: [tahoe-dev] [tahoe-lafs] #127: Cap URLs leaked via HTTP Referer header In-Reply-To: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> References: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> Message-ID: <047.b5bbf9692b1dc56f180cb5ebaeb4238c@allmydata.org> #127: Cap URLs leaked via HTTP Referer header -------------------------------+-------------------------------------------- Reporter: warner | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend-web | Version: 0.7.0 Keywords: security | Launchpad_bug: -------------------------------+-------------------------------------------- Comment(by davidsarah): If all of these work, option C seems to be the simplest. Option A requires an ftp server, which seems like an unwarranted excursion if we can possibly avoid it. Option B depends on more of the DOM and HTML, hence greater exposure to browser idiosyncrasies, than option C does. 
(The location URL in option C needs to be properly escaped for an URL-in-JSStringLiteral-in-HTML-in-JSStringLiteral-in-JSStringLiteral-in-HTML, but that's straightforward :-) -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Thu Oct 29 12:33:18 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 29 Oct 2009 19:33:18 -0000 Subject: [tahoe-dev] [tahoe-lafs] #127: v (was: Cap URLs leaked via HTTP Referer header) In-Reply-To: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> References: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> Message-ID: <047.0f5397fe5e9ab3e92a8fc2b92f725270@allmydata.org> #127: v -------------------------------+-------------------------------------------- Reporter: warner | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend-web | Version: 0.7.0 Keywords: security | Launchpad_bug: -------------------------------+-------------------------------------------- Comment(by zooko): I don't really understand those options very well or how they would be implemented in Tahoe-LAFS. I should mention another option: moving the cap from the URL itself into the URL fragment, as Tyler Close's web-key does: http://waterken.sourceforge.net/web-key . This would certainly prevent caps from leaking into the Referer header, although they might still leak due to tools like "Yahoo Toolbar" and the like. (Tools which send all the URLs that you view to some remote server for you.) Also, as Brian wrote in comment:8, it isn't clear how Tahoe-LAFS could use caps-in-fragments for purposes other than displaying the result in a web page. Perhaps there could be a two-layer design where the WAPI has caps in URLs (which is consistent with the REST paradigm), but a new WUI (which would be written in JavaScript, Cajita or Jacaranda) would somehow translate between caps-in-fragments and caps-in-URLs so that the URL that actually appeared in the URL widget would always be caps-in-fragment. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From zookog at gmail.com Thu Oct 29 13:13:55 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Thu, 29 Oct 2009 14:13:55 -0600 Subject: [tahoe-dev] buildbot report Message-ID: Folks: Welcome into the category of Supported Platforms: * NetBSD4 i386 (hosted by MidnightMagic) * Debian Lenny arm * Ubuntu Karmic (hosted by me) * Lenny-amd64 hosted by Soultcer (in addition to the current one by Eugen) Farewell (at least for now) to platforms being removed from Supported: * Debian etch (slave off-line) * cygwin (slave off-line) * Ubuntu Hardy i686 (slave off-line); note we still have Hardy amd64 thanks to Zandr * Jaunty (I upgraded to Karmic) * ArchLinux (slave died) * Mac OS 10.5 (slave off-line) Other changes to the buildbot config: * Make it possible to build a branch by specifying the branch name on the "Force Build" form. (This is going to be useful!) * Upgrade to current version of buildbot. * None of our buildslaves require darcs-version-1-format repositories anymore * Fix the pycryptopp buildslave so that it doesn't upload pycryptopp packages if the tests failed. * Run valgrind on pycryptopp on Linux. * Display g++ and as versions on pycryptopp builds. From zookog at gmail.com Thu Oct 29 13:27:23 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Thu, 29 Oct 2009 14:27:23 -0600 Subject: [tahoe-dev] buildbot report In-Reply-To: References: Message-ID: I accidentally sent this before it was complete. Here is the complete version.
Welcome into the category of Supported Platforms: * Debian Lenny armv5tel (François) * NetBSD4 i386 (MidnightMagic) * Ubuntu Karmic (me) * Lenny-amd64 by Soultcer (in addition to the current one by Eugen) Farewell (at least for now) to platforms being removed from Supported: * Debian etch (slave off-line) * cygwin (slave off-line) * Ubuntu Hardy i686 (slave off-line); note we still have Hardy amd64 by Zandr * Mac OS 10.5 (slave off-line) * Jaunty (I upgraded to Karmic) * ArchLinux (slave died) If someone wants to volunteer a Mac buildslave that would be appreciated! Also any platform which you love and which is not currently in the Supported Platforms list. Other changes to the buildbot config: * Make it possible to build a branch by specifying the branch name on the "Force Build" form. (This is going to be useful!) * Upgrade to current version of buildbot. * None of our buildslaves require darcs-version-1-format repositories anymore * Fix the pycryptopp buildslave so that it doesn't upload pycryptopp packages if the tests failed. * Run valgrind on pycryptopp on Linux. * Display g++ and as versions on pycryptopp builds. Outstanding problems: The code coverage, memory usage, and speed test builders are red. The code coverage breakage is due to issue #810. The memory usage and speed test breakages are, I think, due to those tests running against a grid which is full or which has become unavailable to them or something. There are three buildslaves which are connected but languish in the Unsupported category. BlackDew's debian-unstable-i386 lives up to the name "unstable" by occasionally getting signal 11. Likewise with Shawn's jaunty-amd64. I assume this is a problem on those machines and not a bug in Tahoe-LAFS. The third Unsupported buildslave is Ruben's Fedora, which appears to be stable but which has a problem with setuptools and/or pyutil. Hopefully this problem will be fixed soon and Ruben's Fedora buildslave can Level Up. :-) See the Buildbot Page for more information: http://allmydata.org/buildbot Thanks to everyone for their contributions! Regards, Zooko http://allmydata.org/trac/tahoe/ticket/810#where did figleaf_htmlizer.py come from? From trac at allmydata.org Thu Oct 29 13:54:33 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 29 Oct 2009 20:54:33 -0000 Subject: [tahoe-dev] [tahoe-lafs] #127: Cap URLs leaked via HTTP Referer header In-Reply-To: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> References: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> Message-ID: <047.cabd789da28eae0c6b46b65cd4fcb13c@allmydata.org> #127: Cap URLs leaked via HTTP Referer header -------------------------------+-------------------------------------------- Reporter: warner | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend-web | Version: 0.7.0 Keywords: security | Launchpad_bug: -------------------------------+-------------------------------------------- Comment(by davidsarah): For anyone trying to test option C, the syntax above was wrong; it should be {{{ }}} However, I'm not sure that options B or C work for what we are trying to do. The problem we're trying to solve is that following a link from the contents of a Tahoe file may reveal the file's URL ('capURL'). Options B and C prevent the page at 'capURL' from seeing the referring URL (of the page containing the JavaScript), but they don't prevent leakage of 'capURL' to a site that the page at 'capURL' links to.
Only option A allows you to prevent sending a Referer header when following a link from a page with arbitrary contents (by serving that page via FTP). -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Thu Oct 29 14:40:45 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 29 Oct 2009 21:40:45 -0000 Subject: [tahoe-dev] [tahoe-lafs] #127: Cap URLs leaked via HTTP Referer header In-Reply-To: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> References: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> Message-ID: <047.e2e3fd5ee4bb08152335140d22277475@allmydata.org> #127: Cap URLs leaked via HTTP Referer header -------------------------------+-------------------------------------------- Reporter: warner | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend-web | Version: 0.7.0 Keywords: security | Launchpad_bug: -------------------------------+-------------------------------------------- Comment(by davidsarah): Replying to [comment:20 davidsarah]: > However, I'm not sure that options B or C work for what we are trying to do. Actually there's a variant of B that will work: send the read cap in the form data. You would make an initial request to the gateway for a given read cap encoded in the URL, and would get back a stub page containing a form filled in with that read cap. If that form is POSTed to the gateway, it would respond with the real file. When POST is used, the URL of the latter would just be the URL of the form, which is not sensitive, so it doesn't matter whether it is leaked via Referer. This approach needn't depend on !JavaScript, but if you don't have !JavaScript the user would have to click a button to submit the form. (Is there a way to do that automatically on page load even if scripting is disabled?) Alternatively the server could set a cookie and have that cookie echoed back to it with an HTTP-Refresh, but that potentially introduces other cookie-related weaknesses and complications. In the case where the referring page is generated by the gateway (for example a directory listing), then that page can directly include a form for each file link, so there is no extra request or button click even when scripting is disabled. If you can depend on !JavaScript, you can combine this with Tyler's approach and put the sensitive part of the URL in the fragment, then have a script copy it to the form data. The difference is that because form submission is used instead of XMLHttpRequest, you can download arbitrary files rather than just displaying them. A disadvantage of using POST is that if the user goes back to an URL via the history, they will get a spurious prompt asking whether they want to resubmit form data. (Using GET won't work, because then the read cap would still be in the URL.) I think this is acceptable, though. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Thu Oct 29 16:09:51 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 29 Oct 2009 23:09:51 -0000 Subject: [tahoe-dev] [tahoe-lafs] #615: Can JavaScript loaded from Tahoe access all your content which is loaded from Tahoe? In-Reply-To: <037.419ca30d794f832cc443eabb14db75c3@allmydata.org> References: <037.419ca30d794f832cc443eabb14db75c3@allmydata.org> Message-ID: <046.c1963f7c97a9d19cd982e14de4ba7d4f@allmydata.org> #615: Can JavaScript loaded from Tahoe access all your content which is loaded from Tahoe?
---------------------------+------------------------------------------------ Reporter: zooko | Type: defect Status: new | Priority: critical Milestone: undecided | Component: code-frontend-web Version: 1.3.0 | Keywords: newcaps security Launchpad_bug: | ---------------------------+------------------------------------------------ Comment(by davidsarah): Replying to [comment:1 swillden]: > Another option is to use cookies. A cookie can also be made specific to a host/domain but also to a path. As I understand it (haven't tested), Javascript loaded from path A should not have access to cookies set specific to path B. If Tahoe were to set per-path cookies on first access to a path, then refuse later requests that don't include the right cookie, then Javascript from path B would not be able to successfully load URLs on path A, because it wouldn't have the cookie. > There are numerous downsides to the cookie approach ... Yes. The following paper (which is essential reading for this ticket) explains why this can't work from a security point of view: * Beware of Finer-Grained Origins * Collin Jackson and Adam Barth * In Web 2.0 Security and Privacy. (W2SP 2008) * http://crypto.stanford.edu/websec/origins/fgo.pdf * "Cookie Paths. One classic example of a sub-origin privilege is the ability to read cookies with "path" attributes. In order to read such a cookie, the path of the document's URL must extend the path of the cookie. However, the ability to read these cookies leaks to all documents in the origin because a same-origin document can inject script into a document with the appropriate path (even a 404 "not found" document) and read the cookies. This "vulnerability" has been known for a number of years ... This vulnerability was "fixed" by declaring the path attribute to be a convenience feature rather than a security feature." -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Thu Oct 29 16:46:47 2009 From: trac at allmydata.org (tahoe-lafs) Date: Thu, 29 Oct 2009 23:46:47 -0000 Subject: [tahoe-dev] [tahoe-lafs] #127: Cap URLs leaked via HTTP Referer header, and to phishing filters (was: Cap URLs leaked via HTTP Referer header) In-Reply-To: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> References: <038.4be7d9a2c1f609ac6979edd7f7ad78e6@allmydata.org> Message-ID: <047.bfac030a5664bae1ccdb044e6acb440d@allmydata.org> #127: Cap URLs leaked via HTTP Referer header, and to phishing filters -------------------------------+-------------------------------------------- Reporter: warner | Owner: Type: defect | Status: new Priority: major | Milestone: undecided Component: code-frontend-web | Version: 0.7.0 Keywords: security | Launchpad_bug: -------------------------------+-------------------------------------------- Comment(by davidsarah): We don't appear to have a separate ticket about cap URLs leaking to phishing filters. Let's consider this ticket to cover both issues, since it is possible that the same solution could work. If we depend on JavaScript, I think that the fragment + form technique in comment:21 solves both. 
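To make the fragment + form idea concrete, here is a rough sketch of the stub page a gateway could serve: a script copies the read cap out of the URL fragment into a hidden form field and POSTs it, so the cap never appears in a URL that a Referer header or a phishing filter could see. This is illustrative only -- the /fetch endpoint and the "cap" field name are hypothetical, not part of the existing webapi.
{{{
# Illustrative sketch, not WUI code: the gateway serves this stub page
# for a URL whose fragment carries the read cap
# (e.g. http://gateway/stub#URI:CHK:...).
STUB_PAGE = """<html>
<body onload="document.forms[0].cap.value = location.hash.substring(1);
              document.forms[0].submit();">
  <form method="POST" action="/fetch">
    <input type="hidden" name="cap" value="">
    <!-- Without JavaScript the fragment cannot be read, so the gateway
         would have to pre-fill value= itself (as described in comment:21)
         and the user would click the button: -->
    <noscript><input type="submit" value="Download"></noscript>
  </form>
</body>
</html>"""

def render_stub(request):
    # The real file is returned only in response to the POST to /fetch,
    # whose URL is not sensitive, so leaking it via Referer is harmless.
    return STUB_PAGE
}}}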
-- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Thu Oct 29 21:00:37 2009 From: trac at allmydata.org (tahoe-lafs) Date: Fri, 30 Oct 2009 04:00:37 -0000 Subject: [tahoe-dev] [tahoe-lafs] #456: it would be nice if the dependency on OpenSSL could be automatically resolved In-Reply-To: <038.5c54833922825e44383408f7f09f9a49@allmydata.org> References: <038.5c54833922825e44383408f7f09f9a49@allmydata.org> Message-ID: <047.255ddf76820d3574596599bf6cc2903a@allmydata.org> #456: it would be nice if the dependency on OpenSSL could be automatically resolved -----------------------------+---------------------------------------------- Reporter: warner | Owner: cgalvan Type: enhancement | Status: closed Priority: major | Milestone: eventually Component: packaging | Version: 1.0.0 Resolution: fixed | Keywords: test easy Launchpad_bug: 238658 | -----------------------------+---------------------------------------------- Changes (by cgalvan): * status: reopened => closed * resolution: => fixed Comment: Sorry for the delay on this one, I tested both eggs out today and I was able to run some basic scripts that use pyOpenSSL so I think it is safe to close this ticket. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Fri Oct 30 02:46:32 2009 From: trac at allmydata.org (tahoe-lafs) Date: Fri, 30 Oct 2009 09:46:32 -0000 Subject: [tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better In-Reply-To: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> References: <037.6c53986cb63378fa501159cafeed32e6@allmydata.org> Message-ID: <046.cc36675289793dddf42ce840e287afd7@allmydata.org> #778: "shares of happiness" is the wrong measure; "servers of happiness" is better --------------------------------+------------------------------------------- Reporter: zooko | Owner: kevan Type: defect | Status: new Priority: critical | Milestone: 1.6.0 Component: code-peerselection | Version: 1.4.1 Keywords: reliability | Launchpad_bug: --------------------------------+------------------------------------------- Comment(by kevan): I'm uploading the new tests. I'll need to modify them again when we define a way to give the Encoder more information about which servers have which shares. It occurred to me when writing that test that my patch to Encoder doesn't do what I thought it did when I wrote it. Specifically, since self.landlords in an Encoder is simply a list of IStorageBucketWriters (I'm not sure what I thought they were, but it wasn't that), doing set(self.landlords) doesn't really do anything (let alone give a good value for the number of servers that are left). So I need to think of a better way of doing that, too. 
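The measure being aimed for could be sketched roughly as follows. This is illustrative only, not a patch: it assumes a share-to-server mapping which, as noted above, the Encoder does not currently have.
{{{
# Sketch: share_placements is assumed to map each share number to the
# identifier of the server it was placed on.
def servers_of_happiness(share_placements):
    """Count distinct servers holding at least one share."""
    return len(set(share_placements.values()))

# Ten shares all placed on one server count as 1 "server of happiness",
# not 10 "shares of happiness":
assert servers_of_happiness(dict((n, "serverA") for n in range(10))) == 1
}}}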
-- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Fri Oct 30 16:58:58 2009 From: trac at allmydata.org (tahoe-lafs) Date: Fri, 30 Oct 2009 23:58:58 -0000 Subject: [tahoe-dev] [tahoe-lafs] #607: DIR2:IMM In-Reply-To: <037.0e9192ebb26773cd298be6a310ff8885@allmydata.org> References: <037.0e9192ebb26773cd298be6a310ff8885@allmydata.org> Message-ID: <046.064b6737c3dffac042ac830466899497@allmydata.org> #607: DIR2:IMM ---------------------------+------------------------------------------------ Reporter: zooko | Owner: warner Type: defect | Status: new Priority: major | Milestone: undecided Component: code-dirnodes | Version: 1.2.0 Keywords: newcaps | Launchpad_bug: ---------------------------+------------------------------------------------ Comment(by warner): I'm about 80% done with immutable directories. The current work is to add {{{URI:DIR2-CHK:}}} and {{{URI:DIR2-LIT:}}} to the set recognized by {{{uri.py}}}. (I'm planning to use CHK because the rest of the arguments are exactly the same as {{{URI:CHK:/URI:LIT:}}}). An ideal cap format would make the wrapping more explicit, like {{{tahoe://grid-4/dir/imm/READCAP}}} and {{{tahoe://grid-4/imm/READCAP}}}. The next few steps are: * modify nodemaker.py to recognize the new caps and create immutable Filenodes for them and then wrap them in Directorynodes (this handles the read side) * add {{{nodemaker.create_immutable_directory(children)}}} to pack the children, perform an immutable upload, then transform the filecap into a dircap. (this handles the write side) * tests for those * new webapi (probably {{{POST /uri?t=mkdir-immutable}}}) that takes a JSON dict in the children= form portion: docs, tests, then implementation * done! Along the way, I plan to change "tahoe backup" to use t=mkdir-with-children (which will speed things up a lot, but still create readcaps-to-mutable-directories). Then, once this ticket is closed, I'll change it again to use t=mkdir-immutable. Incidentally, yeah, I think that a form of "cp -r" that creates an immutable deep copy of some dirnode would be a great idea. Maybe "cp -r --immutable" ? Likewise, it might be useful to have "cp -r --mutable", which explicitly creates mutable copies of everything being copied (at least of the dirnodes). The default behavior of "cp -r" should be to re-use immutable objects. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Sat Oct 31 23:08:46 2009 From: trac at allmydata.org (tahoe-lafs) Date: Sun, 01 Nov 2009 06:08:46 -0000 Subject: [tahoe-dev] [tahoe-lafs] #680: Fix for mutable files with FTP In-Reply-To: <042.38317dcabcaa61691a9b128104891afd@allmydata.org> References: <042.38317dcabcaa61691a9b128104891afd@allmydata.org> Message-ID: <051.5bde2c21a32f539daf073a564139bb07@allmydata.org> #680: Fix for mutable files with FTP ---------------------------+------------------------------------------------ Reporter: frozenfire | Owner: warner Type: defect | Status: assigned Priority: major | Milestone: 1.6.0 Component: code-frontend | Version: 1.3.0 Keywords: | Launchpad_bug: ---------------------------+------------------------------------------------ Comment(by davidsarah): FTP is a horrible protocol, and grokking its specification (RFC 959) is not helped by the fact that it has never been streamlined and updated for a world where bytes are always 8 bits and files are flat.
In any case, the relevant part of RFC 959 seems to be section 3.4, which says > The following transmission modes are defined in FTP: > 3.4.1. STREAM MODE > If the structure is a file structure, the EOF is indicated by the sending host closing the data connection and all bytes are data bytes. It is very unlikely that the FTP server library is using anything other than STREAM transmission mode with a file structure ( http://cr.yp.to/ftp/type.html says the other options are all obsolete). If so, then the tested clients are nonconformant in treating anything other than a close of the data connection as end-of-file. http://cr.yp.to/ftp/browsing.html seems to suggest that browser ftp clients do actually look for connection close to signal EOF -- so I'm stumped. -- Ticket URL: tahoe-lafs secure decentralized file storage grid From trac at allmydata.org Sat Oct 31 23:34:45 2009 From: trac at allmydata.org (tahoe-lafs) Date: Sun, 01 Nov 2009 06:34:45 -0000 Subject: [tahoe-dev] [tahoe-lafs] #827: Support forcing download using "Content-Disposition: attachment" in WUI Message-ID: <042.06727b77b1f62d7018d16586c7f48d19@allmydata.org> #827: Support forcing download using "Content-Disposition: attachment" in WUI --------------------------------+------------------------------------------- Reporter: davidsarah | Owner: nobody Type: defect | Status: new Priority: major | Milestone: undecided Component: unknown | Version: 1.5.0 Keywords: security usability | Launchpad_bug: --------------------------------+------------------------------------------- Typical behaviour of web browsers for a file retrieved over HTTP[S] is to decide based on its inferred MIME type whether to display it, pass it to some plugin/helper app, or treat it as a download (either pass it to a download manager or prompt the user to save it locally). The inferred MIME type is arrived at by rules that '''no-one''' understands; the HTTP spec says that the HTTP Content-Type takes precedence, but browsers deliberately violate that in some cases, and the result can even vary nondeterministically depending on network delays. Sometimes it's useful to override this behaviour and force the browser to treat the file as a download, regardless of its inferred MIME type. This can be done using the Content-Disposition HTTP header, e.g. "Content-Disposition: attachment; filename=suggested_name". (Well, not actually force, but strongly hint. I think all the common browsers now implement this; IE has had some bugs http://support.microsoft.com/kb/279667 but I think they are all pre-IE6.) Content-Disposition is specified in: * http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.5.1 * http://tools.ietf.org/html/draft-reschke-rfc2183-in-http-00 (The "very serious security considerations" mentioned in http://www.w3.org/Protocols/rfc2616/rfc2616-sec15.html#sec15.5 are scaremongering. A server can't do anything with {{{Content-Disposition: attachment}}} that can't also be done via the MIME type.) To close this ticket, * add an optional parameter to the webapi that will force a download. * modify Tahoe directory listings to include 'download' and 'view' links. Note that if the most prominent link for a non-directory file was the download link, that would allow the WUI to be used from a browser (with some care on the part of users) without necessarily being vulnerable to the same-origin attacks described in #615.
Users would only be vulnerable to these attacks if they click the view link, or if they view the downloaded local files in a browser that treats all file URLs as same-origin (or has some other related bug such as http://www.mozilla.org/security/announce/2009/mfsa2009-30.html ). -- Ticket URL: tahoe-lafs secure decentralized file storage grid
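On the gateway side, forcing the download proposed in #827 amounts to adding one header to the response. The sketch below is only an illustration: the "save=true" parameter name and the WSGI-style framing are assumptions, not the actual Tahoe webapi.
{{{
# Illustrative WSGI-style handler, not Tahoe code.
from urllib.parse import parse_qs

def serve_file(environ, start_response, data, suggested_name):
    qs = parse_qs(environ.get("QUERY_STRING", ""))
    headers = [("Content-Type", "application/octet-stream")]
    if qs.get("save") == ["true"]:
        # Hints the browser to save rather than display, regardless of
        # the MIME type it would otherwise infer.
        headers.append(("Content-Disposition",
                        'attachment; filename="%s"' % suggested_name))
    start_response("200 OK", headers)
    return [data]
}}}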