Version 1 (modified by zooko, at 2009-04-21T19:43:53Z)

ServerSelection!

Different users of Tahoe want different answers to the question "Which servers should I upload which shares to?"

  • allmydata.com wants to upload to a random selection, evenly distributed among servers which are not full. This is, unsurprisingly, what Tahoe v1.4 currently does.
  • Brian has mentioned that an allmydata.com-style deployment might prefer that the servers with more remaining capacity receive more shares, thus "filling up faster" than the servers with less remaining capacity.
  • Kevin Reid wants, at least for one of his use cases, to specify several servers each of which is guaranteed to get at least K shares of each file, in addition to potentially other servers also getting shares.
  • Shawn Willden wants, likewise, to specify a server (e.g. his mom's PC) which is guaranteed to get at least K shares of certain files (the family pictures and movies files).
  • Some people -- I'm sorry I forget who -- have said they want to upload at least K shares to the K fastest servers.
  • Jake Appelbaum has said that he wants to specify a set of servers which collectively are guaranteed to have at least K shares -- he intends to use this to specify the ones that are running as Tor hidden services and thus are extra attack-resistant but also extra slow-and-expensive to reach.
  • Several people -- again I'm sorry I've forgotten specific attribution -- want to identify which servers live in which cluster or co-lo or geographical area, and then to distribute shares evenly across clusters/colos/geographical-areas instead of evenly across servers.
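
To make the difference between the first two bullets concrete, here is a minimal sketch of both as plain Python functions. It is a simplification for illustration, not Tahoe's actual upload code: the `Server` record, its `free_bytes` field, and both function names are hypothetical, and `share_size` is assumed to be greater than zero.

```python
import random
from collections import namedtuple

# Hypothetical server record; Tahoe's real server objects look different.
Server = namedtuple("Server", ["name", "free_bytes"])


def pick_uniform(servers, num_shares, share_size, rng=random):
    """First bullet: choose uniformly at random among servers that are not full."""
    candidates = [s for s in servers if s.free_bytes >= share_size]
    # At most one share per server in this sketch.
    return rng.sample(candidates, min(num_shares, len(candidates)))


def pick_weighted_by_free_space(servers, num_shares, share_size, rng=random):
    """Second bullet: servers with more remaining capacity are picked more often."""
    candidates = [s for s in servers if s.free_bytes >= share_size]
    chosen = []
    while candidates and len(chosen) < num_shares:
        # Weight each remaining candidate by its free space (share_size > 0,
        # so every weight is positive).
        weights = [s.free_bytes for s in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)  # without replacement: one share per server
    return chosen
```

Both functions take the same arguments and return a list of servers, which hints that the policies could be interchangeable pieces rather than branches of one large algorithm.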

As I, Zooko, have emphasized a few times, we really should not try to write a super-clever algorithm into Tahoe which satisfies all of these people, plus all the other crazy people who will be using Tahoe for crazy things in the future. Instead, we need some sort of configuration language or plugin system so that each crazy person can customize their own crazy server selection policy. I don't know the best way to implement this yet. A domain-specific language? Implementing the seven policies listed above in Tahoe, with an option to choose which of the seven you want for each upload? My current favorite approach is: you give me a Python function. When the time comes to upload a file, I'll call that function and then use whichever servers it says to use.
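
One way the "give me a Python function" approach might look is sketched below. Everything here is an assumption made for illustration: the page does not specify a signature, so the `policy(servers, num_shares, k)` call, the dict it returns (share number to server), and the names `upload_with_policy` and `pinned_server_policy` are all hypothetical. Servers are represented as plain strings.

```python
def upload_with_policy(policy, servers, num_shares, k):
    """Hypothetical uploader hook: ask the user's policy where each share goes."""
    placement = policy(servers, num_shares, k)
    # Sanity-check the answer before trusting it.
    if set(placement) != set(range(num_shares)):
        raise ValueError("policy must place every share exactly once")
    for sharenum, server in placement.items():
        if server not in servers:
            raise ValueError("policy chose unknown server %r" % (server,))
    return placement  # a real uploader would now send the shares


def pinned_server_policy(pinned):
    """Build a policy that guarantees `pinned` gets at least k shares."""
    def policy(servers, num_shares, k):
        if pinned not in servers:
            raise ValueError("pinned server is not available")
        others = [s for s in servers if s != pinned]
        placement = {}
        for sharenum in range(num_shares):
            if sharenum < k or not others:
                # The first k shares (or all of them, if there is nobody
                # else) go to the pinned server.
                placement[sharenum] = pinned
            else:
                # Spread the remaining shares round-robin over the rest.
                placement[sharenum] = others[(sharenum - k) % len(others)]
        return placement
    return policy


# Usage: Shawn's "mom's PC" case, with K = 3 of N = 10 shares.
placement = upload_with_policy(
    pinned_server_policy("moms-pc"), ["s1", "moms-pc", "s2"], 10, 3)
```

The point of the sketch is that the uploader only needs to validate and act on the answer. Each of the policies in the list above could then be a separate small function supplied by whoever wants it.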