eno writer

005 - document versioning / join semilattices

Document Versioning

"Contract v5 final final 2.docx". If you spend a lot of time working with documents, you have probably witnessed these filenames in the wild. You have also probably witnessed the ensuing confusion over which version is the actual final version and, in the worst case, had a crisis when the wrong version of a document was submitted externally at the wrong time.

Programmers spend all day working with documents. Of course their documents are filled with code instead of prose, but the similarities with business document workflows are surprising. Programmers work together on teams. They review and edit each other's work. On larger teams, only certain programmers are allowed to accept changes into the primary version of the documents. Often many documents are so interrelated that an edit in one requires edits in others. Tiny edits to a document can have massive consequences. The list goes on. Hopefully you can see some parallels with your own document workflows.

Unlike the business world, programmers have been building sophisticated version control systems for over 50 years. While the business world is still saving "Contract v5 final final 2.docx", versioning of code documents is essentially a solved problem.

The solution that most programmers have converged on is a version control system called Git. To use Git, you start by turning a folder of documents into a Git Repository. Once you do this, Git can track every change to every document in that folder and its subfolders. Now you can freely edit your files. When you next look at Git, it will tell you exactly which files you have changed and how. When you are ready, you can bundle your changes into a Git Commit along with a message describing the purpose of your changes. If you are working on a team, you can push your commits to a remote copy of the repository to share them. When you check that remote copy, Git tells you whether your collaborators have added new commits so that you can pull them down. If your changes conflict with someone else's, Git walks you through the conflicts and asks how you want to resolve them.

So why can't business people just use Git? There are a few technical challenges. First, Git cannot effectively show changes between Word documents - its comparisons only work well with plain text documents (think of a .txt document you would make in Notepad). Second, Git is a fairly complicated system to learn. It has a lot of concepts that many programmers take years to master. The learning curve would likely be impossibly steep for a lot of people who don't take naturally to technical concepts.

My big question is - if all of these technical problems were solved with the wave of a magic wand, would Git actually be the right solution for the business world's document woes? Why or why not?

Join Semilattice

One day, when Christopher Robin and Winnie-the-Pooh and Piglet were all talking together, Christopher Robin finished the mouthful he was eating and said carelessly: "I saw a Heffalump to-day, Piglet."

"What was it doing?" asked Piglet.

"Just lumping along," said Christopher Robin. "I don't think it saw me."

"I saw one once," said Piglet. "At least, I think I did," he said. "Only perhaps it wasn't."

"So did I," said Pooh, wondering what a Heffalump was like.

"You don't often see them," said Christopher Robin carelessly.

"Not now," said Piglet.

"Not at this time of year," said Pooh.

This week I had coffee with a friend who has done a tremendous amount of work on text editing data structures and protocols (check out Automerge). We were talking about a data structure for storing text, and he explained that it worked like a Git repository's commit history, then paused and said "it's a join semilattice." I felt a bit like Winnie-the-Pooh as I nodded my head while wondering "what's a join semilattice?" My friend intuited my confusion and offered "it's just a math thing that means Git repository". Crisis averted - just another one of the countless math things I missed while I was doing an undergrad in English.
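
For anyone else nodding along like Pooh, here is a rough sketch of the idea as I understand it, written in Python. A join semilattice is a collection of things plus a "join" operation that combines any two of them, where the order and grouping of the joins never matter and joining something with itself changes nothing. In this toy, a version of a document is just the set of changes it contains, and joining two versions is set union. The names (join, laws_hold, alice, bob, carol) and the change ids are made up for illustration, and this is not how Git or Automerge actually stores anything.

```python
from itertools import product


def join(a, b):
    """Merge two versions: the smallest version that contains both."""
    return a | b


def laws_hold(versions):
    """Check the three join laws on every combination of the given versions."""
    for a, b, c in product(versions, repeat=3):
        if join(a, a) != a:                              # idempotent
            return False
        if join(a, b) != join(b, a):                     # commutative
            return False
        if join(join(a, b), c) != join(a, join(b, c)):   # associative
            return False
    return True


# Hypothetical versions: each one is the set of change ids it has seen.
alice = frozenset({"c1", "c2"})
bob = frozenset({"c1", "c3"})
carol = frozenset({"c1", "c4"})

merged = join(join(alice, bob), carol)
print(sorted(merged))                    # ['c1', 'c2', 'c3', 'c4']
print(laws_hold([alice, bob, carol]))    # True
```

Whatever order the three of them merge in, everyone ends up holding the same four changes. Real merges also have to deal with conflicting content, which this sketch ignores entirely.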

In any event, I went home and watched a video on join semilattices. I kind of get them, and they do kind of help me better understand what a Git commit history actually is. I didn't really want to write about join semilattices, though. I wanted to write about this because I fairly often encounter a technical term that turns out to precisely describe something I have been doing unwittingly. This can be quite rewarding: when I type my newly discovered term into Google, I discover all sorts of smart people writing papers and blog posts on the same problems I am trying to solve.

On the other hand, I worry that new terminology sometimes slows my thinking down. In a world where I know nothing, I look at a technical problem from first principles and just do whatever I need to do to solve it. When I approach a problem with a library of terminology for various solutions, I sometimes switch into pattern-matching mode. "What I need here is a join semilattice!" This can shortcut some problem solving when I'm correct, but sometimes it can draw me into a spiral of complexity as I try to glue together concepts that seem correct but don't exactly suit my problem.


If you liked this post, please consider sharing it with a friend.
