#662 new enhancement

add an option for "tahoe manifest" to not skip duplicates, or a --recursive option to "tahoe ls"

Reported by: warner
Owned by:
Priority: major
Milestone: undecided
Component: code-dirnodes
Version: 1.3.0
Keywords: tahoe-manifest cycle
Cc: kyle@…
Launchpad Bug:

Description (last modified by daira)

My current job involves tools which modify a directory tree [...], and I'd like to use "tahoe manifest" to compare the before- and after- trees to make sure they're the same. Unfortunately, "tahoe manifest"'s cycle-avoidance code (which simply ignores files or directories that it's seen before) is causing me trouble: an object that's referenced from multiple places in the tree will appear in the manifest output at only one of them, and that location will depend upon the traversal order. (I just pushed a patch to make deep_traverse at least sort the child names before walking them, so it should now be consistent.)

I'm thinking that it might be nice to have a flag to "tahoe manifest" that tells it to not suppress duplicates like this. The cycle-avoidance code would need to change: instead of keeping a set of nodes that have already been visited, it should just keep a list of the ancestors of the current node. A cycle should be declared if the child node we're considering entering appears on its own ancestor list.
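A minimal sketch of the ancestor-list idea, not the actual deep_traverse code. It assumes a hypothetical in-memory `graph` dict that maps a directory id to its `{child_name: child_id}` entries; ids that are not keys are treated as files. The names `walk`, `visit`, and `graph` are made up for illustration.

```python
def walk(graph, root):
    """Yield (path, node) for every reference reachable from root.

    Duplicates are kept: a node reachable by two paths is yielded twice.
    Only a child that is already one of its own ancestors is skipped.
    """
    def visit(node, path, ancestors):
        yield path, node
        children = graph.get(node)
        if children is None:
            return  # not a directory, nothing to descend into
        ancestors.append(node)
        for name in sorted(children):  # sorted for a stable traversal order
            child = children[name]
            if child in ancestors:
                continue  # cycle: the child appears on its own ancestor list
            yield from visit(child, path + (name,), ancestors)
        ancestors.pop()

    yield from visit(root, (), [])


# Hypothetical tree: "dirA" is linked twice, and two links point back at the root.
graph = {
    "root": {"a": "dirA", "b": "dirA", "loop": "root"},
    "dirA": {"f": "file1", "up": "root"},
}
entries = list(walk(graph, "root"))
for path, node in entries:
    print("/".join(path) or ".", node)
```

With a visited-set, "dirA" and "file1" would each show up once, under whichever name was walked first. Here they show up under both "a" and "b", while the links back to the root are skipped.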

It might also be useful to have two sets of stats: one that includes shared objects, and one that does not.
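As a rough illustration of the two sets of stats, the sketch below assumes the traversal produces hypothetical `(path, cap, size)` tuples with duplicates included. The function name `summarize` and the stat keys are invented for this example and are not the existing deep-stats field names.

```python
def summarize(entries):
    """Return counts and sizes both with and without shared objects.

    entries: iterable of (path, cap, size) tuples, duplicates included.
    """
    entries = list(entries)
    unique = {}
    for _path, cap, size in entries:
        unique.setdefault(cap, size)  # count each distinct cap once
    return {
        "with-shared": {
            "count": len(entries),
            "size": sum(size for _path, _cap, size in entries),
        },
        "without-shared": {
            "count": len(unique),
            "size": sum(unique.values()),
        },
    }


# Hypothetical manifest: cap1 is referenced from two paths.
stats = summarize([
    ("a/f", "cap1", 10),
    ("b/f", "cap1", 10),
    ("c", "cap2", 5),
])
print(stats)
```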

Change History (5)

comment:1 Changed at 2009-03-13T18:35:38Z by warner

Oh, I should mention that partly this is the result of changing goals/definitions of "tahoe manifest". Originally, it was intended purely as a set of verifycaps: the idea being that you'd compute your manifest and then hand it to a separate Verifier service, which would take responsibility for checking up on all of them. It was also the intention that a verifycap be usable as a repaircap, so the Verifier service could be a Verifier/Repairer service. In these cases, we don't care about duplicates: we just want the minimum-size set of verifycaps, and it doesn't matter what path or paths were used to store each one.

Later, "tahoe manifest" acquired path information, because that made it easier to backtrack and find a parent directory for any object which was later found to have problems. About this same time, the definition of "manifest" started changing, and now we sort of think about it as a list of (path,cap) tuples.

So maybe we need to be more clear about our definitions, and perhaps create a separate API for each one.

Incidentally, the cycle handling code on the "list of (path,cap) tuples" API could respond to cycles by emitting a special marker: (type="cycle", cap), and maybe include otherpath= too. The program which is receiving the manifest could conceivably use this information to stitch together the cycle somehow.
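One way the cycle marker could look, as a sketch only. It assumes the same kind of hypothetical `graph` dict (directory id to `{child_name: child_id}`) and emits plain dict records; the record layout and the names `manifest` and `visit` are assumptions, not an existing Tahoe API.

```python
def manifest(graph, root):
    """Yield one record per reference; cycles become type="cycle" records."""
    def visit(node, path, ancestors):
        # ancestors holds (node, path) pairs from the root down to the parent
        children = graph.get(node)
        kind = "file" if children is None else "dir"
        yield {"type": kind, "path": path, "cap": node}
        if children is None:
            return
        ancestors.append((node, path))
        for name in sorted(children):
            child = children[name]
            child_path = path + (name,)
            # path at which this child was already entered, if it is an ancestor
            earlier = next((p for n, p in ancestors if n == child), None)
            if earlier is not None:
                yield {"type": "cycle", "path": child_path,
                       "cap": child, "otherpath": earlier}
            else:
                yield from visit(child, child_path, ancestors)
        ancestors.pop()

    yield from visit(root, (), [])


# Hypothetical tree in which "dirA" links back to the root.
graph = {
    "root": {"d": "dirA"},
    "dirA": {"f": "file1", "up": "root"},
}
records = list(manifest(graph, "root"))
for record in records:
    print(record)
```

A consumer could join each cycle record's `path` to its `otherpath` to rebuild the loop, since `otherpath` names the place where the same cap was already entered.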

comment:2 Changed at 2010-03-25T04:53:01Z by davidsarah

  • Keywords tahoe-manifest cycle added

comment:3 Changed at 2013-09-02T17:33:32Z by kmarkley86

  • Cc kyle@… added
  • Description modified (diff)

I tried using manifest as a sort of recursive ls, and immediately ran into this issue that it wasn't showing duplicates. Unless there's recursive ls behavior available somewhere else, it would be great to fix this.

comment:4 Changed at 2013-09-04T20:23:39Z by daira

  • Description modified (diff)

comment:5 Changed at 2013-09-04T20:35:36Z by daira

  • Summary changed from 'change "tahoe manifest" to not skip duplicates' to 'add an option for "tahoe manifest" to not skip duplicates, or a --recursive option to "tahoe ls"'