==== Brian says: ====

Having a function or class to control server-selection is a great idea. The
current code already separates out responsibility for server-selection into a
distinct class, at least for immutable files
(source:src/allmydata/immutable/upload.py#L131 {{{Tahoe2PeerSelector}}}). It
would be pretty easy to make the uploader use different classes according to
a {{{tahoe.cfg}}} option.

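As a rough sketch of what such a hook might look like (the "peer-selector"
option name, the registry, and the {{{get_config}}} lookup are assumptions
for illustration; only {{{Tahoe2PeerSelector}}} exists today):

{{{
#!python
# Hypothetical wiring, not current Tahoe code: let a tahoe.cfg option pick
# the peer-selection class used by the immutable uploader.
from allmydata.immutable.upload import Tahoe2PeerSelector

PEER_SELECTORS = {
    "tahoe2": Tahoe2PeerSelector,
    # alternative selector classes would be registered here
}

def choose_peer_selector(client):
    # assumes a section/option/default lookup against tahoe.cfg
    name = client.get_config("client", "peer-selector", "tahoe2")
    return PEER_SELECTORS[name]
}}}
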
However, there are some additional properties that need to be satisfied by
the server-selection algorithm for it to work at all. The basic Tahoe model
is that the filecap is both necessary and sufficient (given some sort of grid
membership) to recover the file. This means that the eventual
'''downloader''' needs to be able to find the same servers, or at least have
a sufficiently-high probability of finding "enough" servers within a
reasonable amount of time, using only information which is found in the
filecap.

If the downloader is allowed to ask every server in the grid for shares, then
anything will work. If you want to keep the download setup time low, and/or
if you expect to have more than a few dozen servers, then the algorithm needs
to be able to do something better. Note that this is even more of an issue
for mutable shares, where it is important that publish-new-version is able to
track down and update all of the old shares: the chance of accidental
rollback increases when it cannot reliably/cheaply find them all.

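To make the rollback hazard concrete, here is a deliberately simplified
illustration (not the real mutable-publish code): the publisher derives its
new sequence number from whichever shares it managed to locate, so any
shares it never finds cannot influence the version it writes.

{{{
#!python
# Simplified illustration of the rollback hazard, not actual mutable-file
# code.
def next_seqnum(found_seqnums):
    # If the newest shares live on servers we never queried, this value is
    # computed from stale shares, and the version we publish is based on
    # older contents -- the accidental-rollback hazard described above.
    return max(found_seqnums, default=0) + 1
}}}
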
Another potential goal is for the download process to be tolerant of new
servers, removed servers, and shares which have been moved (possibly as the
result of repair or "rebalancing"). Some use cases will care about this,
while others may never change the set of active servers and won't care.

It's worth pointing out the properties we were trying to get when we came up
with the current "tahoe2" algorithm:

 * for mostly static grids, download uses minimal do-you-have-share queries
 * adding one server should only increase download search time by
   1/numservers
 * repair/rebalancing/migration may move shares to new places, including
   servers which weren't present at upload time, and download should be able
   to find and use these shares, even though the filecap doesn't change
 * traffic load-balancing: all non-full servers get new shares at the same
   bytes-per-second, even if serverids are not uniformly distributed

We picked the pseudo-random permuted serverlist to get these properties. I'd
love to be able to get stronger diversity among hosts, racks, or data
centers, but I don't yet know how to get that '''and''' get the properties
listed above, while keeping the filecaps small.
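
For reference, here is a minimal sketch of the permuted-serverlist idea
(using SHA-256 purely for illustration; it is not necessarily the exact hash
or code path Tahoe uses): every file's storage index deterministically
re-orders the known serverids, so the uploader and the eventual downloader
derive the same order from nothing more than the filecap plus grid
membership.

{{{
#!python
# Minimal sketch of a permuted serverlist, for illustration only.
import hashlib

def permuted_servers(serverids, storage_index):
    # storage_index and each serverid are bytes. Each (storage_index,
    # serverid) pair hashes to a position on a ring; sorting by that hash
    # yields a per-file pseudo-random server order that anyone holding the
    # filecap can rebuild.
    return sorted(serverids,
                  key=lambda sid: hashlib.sha256(storage_index + sid).digest())
}}}

Adding one server drops it into a random-looking position in each file's
permutation, so it only perturbs roughly 1/numservers of any given search
order, and because the permutation differs per file, new shares land evenly
across the non-full servers even when the serverids themselves are not
uniformly distributed.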