FWIW, this has gotten a certain amount of attention on HN over the years since i...

l- · on Jan 11, 2020

Rust or any procedural language seems not to be the most apt lang to do a first implementation / rewrite of a patch theory. Patches themselves have already been generalized to paths using HOTT ref: https://www.cs.cmu.edu/~rwh/papers/htpt/paper.pdf. The takeaway from https://arxiv.org/abs/1311.3903 is the use of commutation to allow for push out while the HOT ref discusses pseudocommutation regarding merges. See the footnote about commutation being inaccurate term regarding darts model. With the addition of ojo "ghost" deletes and pseudo edges, a formal implementation should provide a non-exponential model/algorithm amenable for optimization.

dan-robertson · on Jan 12, 2020

I tried to read that HOTT paper. I should preface this by saying that my homotopy type theory knowledge is basically 0 and my topology/homotopy/algebra knowledge is limited and rusty. Here are some notes I took as I read it:

1. I guess I’m starting off not understanding the relevance of this paper to distributed version control. If you have a type R for states of your repository and patches are then paths of type a =_R b, with certain properties of patches being certain path homotopies, surely many functions giving you a path are somewhat boring (because a path should exist between any two repositories). But maybe the point will be proofs about the equality and construction of those paths. Or maybe it is to get functoriality for free.

2. In this formalisation patches are paths so they have to form a groupoid but if you want a distributed version control system, I think you don’t want inverses of patches. You don’t want p . p^-1 = 1 because you want properties like repo state = (in some sense) the composition of all patches in a set, and if patches form a groupoid then the sets {p, p^-1} and {p, p^-1, p} are equal (ie there isn’t a way to distinguish “apply p” from “unrevert p” and you need some way to distinguish them if you want to be distributed. But maybe there is some way to work around this.

3. How to think about apd? If I think of B like a fibration of A then I guess it makes sense but it seems weaker than that. I don’t really understand why PathOver should depend on p (and not f too)

4. Perhaps the paper aims to construct a patch theory where any two patches are equal (which I guess means homotopic) under their laws. My intuition of a patch would be that the answer to this is yes but I’m not confident. Maybe that isn’t what the paper is about.

5. Well I’ve heard a bunch about the univalence axiom so I’m glad to finally get a definition of it.

6. In 2.4.1 the definition of reversing seems wrong to me. It seems like it only gets the right types because loop has the same type as refl. It feels like it’s actually splitting up a path into its individual loops, reversing them, then putting them together in the same order. On the other hand I think I candefine a type like a : I ; b : I ; i : a = b, and then write rev a = b; rev b = a; ap rev i = ! i. And I suppose one could define a circle type with two points and two paths and it would look ok. So maybe the loop example doesn’t matter because it’s all equivalent no matter how it feels.

7. Describing patch equivalence as equivalent affect on the repo isn’t clear to me: is the repository just the (visible) contents of the files or does it also include hidden state related to deletions/merges/applied patches. I’m guessing this paper needs it to be the former.

8. It would have been nice if the paper stuck to one composition order but whatever

9. My concerns about reverting are slightly allayed by the paragraph about being careful with contexts but only slightly because of the following paragraph about coincidence. I guess invert ability is fine so long as no one cannot actually commit a patch and its inverse

10. Perhaps HTT gives a better framework for defining and proving things about pseudocommutation. Maybe that is the point of this paper. I would guess (without having read the paper) that that isn’t particularly well formalised in category theory (but formalising a merge as a push out does seem foot to me).

11. Ah so inverses + pseudocommutation laws gives you merges. That seems pretty nice. Does this let one define pseudocommutation in the categorical approach? I guess if you have inverses and want the pseudocommutation of A -f-> B -g-> C, you compute the push out of the span A <-!f- B -g-> C as A -h-> D <-k- C and then say the pseudocommutation is (h,!k). But my (vague) understanding of that theory is that it wouldn’t have nontrivial inverses

12. At the end of sec 3, I feel like I’m left wanting pseudocommutation better defined. They say how their definition falls short but they don’t say they will improve it. Currently it feels like they could go on to prove a bunch of nice properties about version control systems which are allowed to resolve merges by reverting the conflicting packages. Fingers crossed everything will make sense soon.

13. The “topological meaning” paragraph seems weirdly written. If the reader knows what a universal cover is then it should say “this is the universal cover of the circle” and then provide some actual topological colour to the description. And if the reader doesn’t know what a universal cover is then this information won’t add anything. Might as well have said “this is called the fubar of the circle” and had the same effect.

14. I would have rather had a paragraph about universality than a weird one-liner about fundamental groups

15. In sec 4.2 their patch types are all the same. Are we skipping some difficulties because of this?

16. In sec 5, why do the patches not first require that the document contains the strings they care about? I guess that isn’t necessary.

17. Why no homotopy that ! s<>t@i = s<>t@i or that <> is symmetric? Maybe it’s not used. But it feels like the authors are admitting a more complex universal cover than they intend from the semantics they give by doing that.

18. I’m left somewhat wondering what the point is. I would say that the motto of the category theory paper which I haven’t read but which pijul is somewhat related to would be “patches form a category and merges are push outs. You can use category theory to extend the category of patches into one where all push outs exist.” For this paper I guess it’s something like “homotopy type theory gives you some tools for working with groupoids. Here it is applied to patches.” But otherwise I’m not really sure what the relevance is to the implementation or correctness of a dvcs is. They don’t really provide a tool to show that your pseudocommutation function does what it’s supposed to do and so you don’t know that your merges work either. I guess if you could prove that your merges are push outs, does that imply the pseudocommutation is correct? With the category theory paper it at least gives you “correct” merges and a good idea for how to think about patches.

feanaro · on Jan 12, 2020

Regarding 2, why do you need the ability to distinguish these cases? After all, those two repositories are in the same state, which is that p is applied. The only reason you'd want to distinguish is if you want to preserve history. What about the distributed scenario makes it special?

dan-robertson · on Jan 12, 2020

Because a DVCS has to be distributed. If Alice writes patch p, Bob reverts it, and Charlie unreverts it, you might (without contexts) have three people with different states of the repo:

  Alice: {p}
  Bob: {p, !p} (or do you write this as {}?)
  Charlie: {p, !p, p} = {p, !p}? So I guess this has to be {p}

Suppose these people are pushing and merge if in patches to the master. Then it has no way to distinguish these scenarios:

  A:
  1. Alice pushes; master applies p
  2. Bob pulls, reverts p, pushes. Master reverts p
  3. Charlie pulls, reapplies, pushes. Master does ??

  B
  1. Alice pushes; master applies p
  2. Charlie pulls
  3. Bob pulls and reverts; master reverts p
  4. Charlie pushes

In both cases, the state of Charlie’s repo is the same: {p}, but in case A, p should be applied and in case B, it should not. Therefore there needs to be some additional context (or no applying a patch and it’s inverse).

In other words, this problem is basically having a distributed set with addition, deletion, and readdition of elements, which you can’t have in a nice way. A distributed set that you can only add to, on the other hand, is a lot more easy.

macintux · on Jan 12, 2020

CRDTs have to tackle this sort of problem as well.

dan-robertson · on Jan 12, 2020

Right. And a CRDT for a set you can add and remove from is either hard or weird for this reason.

Also I think you don’t actually want your repo to be a CRDT because a CRDT resolves all conflicts and that would mean merge conflicts get resolved in an arbitrary way (leading to unexpected results and bad code). Maybe you could say that merge conflicts don’t count as conflicts in the CRDT sense of the word but that just feels like you’re abusing the notion of what a CRDT is.

afiori · on Jan 19, 2020

I don't know how good of an idea CRDT would be here, but from the pijul perspective the CRDT layer should not possibly create a conflict, in the same way a `git fetch` should never cause a conflict (assuming well behaved hashes).

The CRDT layer would produce a set of patches in an unmergeable state and then pijul would have to fix it; just like git.

l- · on Jan 13, 2020

1. "not understanding the relevance of this paper to distributed version control"; 18. "what the point is.": On page 11: Our contribution [..] is to present patch theory in a categorical setting that is also a programming formalism, so it directly leads to an implementation.

2.: Patches are (homotopically) equivalent to paths and form <bold>inductive</bold> groupoids. (pg 11) "In homotopy type theory the path space of every type is symmetric, and to fit patch theories into this symmetric setting, we either considered a language where all patches were naturally total bijections on any repository (Section 4 and 5), or used types to restrict patches to repositories where they are bijections" Inverses of patches arise due to these considerations. The use of the term distributed seems to be used in the sense of dependence that arises with pseudocommutation (pcom).

3. It might be inferred that apd = Action on Paths for Dependent function (f) <-> Action on inductive groupoids. (https://ncatlab.org/nlab/show/fiber+sequence#ActionGroupoid) So, a fiber sequence? Not sure where it is stated that PathOver does not depend on f (along with p and B)?

7. section 6 uses histories that are a quotient higher inductive type to equate sequences of patches, which result in the same changes to a file.

11. "Pseudocommutation gives us a merge operation that is well-defined, symmetric [..], and reunites the two branches of a span, but this is not enough to guarantee that we get the merges that we might expect" ..

12. "the correctness of pseudocommutation follows from the induction principles for paths that are proved for each type from the basic induction principles for the higher inductive types—roughly analogously to how, for the natural numbers, course-of-values induction is derived from mathematical induction. Moreover, proving these induction principles is sometimes a significant mathematical theorem. In homotopy theory, it is called calculating the homotopy groups of a space, and even for spaces as simple as the spheres some homotopy groups are unknown."

15.; 16. see section 6 for a patch language with add and remove, not just swap. Unfortunately, the authors were unable to provide a proof of the correctness of the proposed pcom (p,q) definition for it. If it is correct, it would be a more general patch language than the merges as pushout with ojo pseudo-edge proposal along with its assumptions / restrictions on deletions that computes patch application by having [multiple] Tarjan (2-SAT) run(s). Even so, not sure how efficiently the "length-2 suffix" pcom would work out. Reminder that dependency resolution is NP complete.

17. "In homotopy type theory the path space of every type is symmetric"

18. If merges could be assumed to be push outs or proved to be that would remove the ambiguity arising from pcom -> imply its correctness. pcom is defined for each patch language. Section 6.4 gives an outline on how to use patch laws to prove the correctness of the assumed pcom definition. The references also provide further details [18], [19], etc. The "category" paper ref treats merge as a pushout not as pcom based. pcom may be seen as a generalization of merges as pushouts. For another perspective on what is the same mathematical structure regarding Darcs see the Jacobson [15] reference that interprets patches as inverse semigroups, which are essentially partial bijections. Additionally, the Ehresmann-Schein-Nambooripad theorem states that inverse semigroups are equivalent to inductive groupoids.

NB I am not an author of any of the papers.

dan-robertson · on Jan 12, 2020

What does HOT stand for? It’s surely not higher order types but I can’t think of anything beginning with homotopy.

l- · on Jan 12, 2020

Abbreviation for Homotopy Type: https://homotopytypetheory.org/book/

eikenberry · on Jan 11, 2020

> risks of major rewrites

The risks are mostly in relation to your existing users. As this was more of a prototype, with no real user base, in my mind it fits more with the "throw the first away" line of thinking.

rstarast · on Jan 12, 2020

The combination of

> (2) thinking about the future and sustainability of the project, including its funding.

and the rewrite apparently not happening in the open casts some doubt as to whether the project will remain fully open source.

pmeunier · on Jan 12, 2020

> the rewrite apparently not happening in the open

I'm the author of the rewrite. It will almost certainly be open source when it's done, but it is not open right now, because:

- the license of the current version has not been properly observed by other projects in the past.

- this project keeps getting bashed for "corrupting repositories" and "changing formats constantly", even though it is announced as experimental everywhere (also, some corruption claims have been wrong in the past, and we have released converters every time the formats have changed).

Therefore, allowing people to test the new version as it is being debugged would probably not improve this situation.

rstarast · on Jan 12, 2020

Thank you, that's good to hear! Your arguments are valid.

Freak_NL · on Jan 12, 2020

> […] the rewrite apparently not happening in the open […]

That's a major red flag for a FOSS DVCS that aims to exist as a valid alternative for existing tools like git. Fine for a hobby project, portfolio project, or experiment of course.

otabdeveloper4 · on Jan 12, 2020

"v2 major rewrite, to make the code clean and stuff" is where dumb projects go to die.

(There's even a scientific name for this phenomenon.)