25 November, 2008

Think About It!

Disclaimer: My apologies to the fine people who, unlike me, actually write source control systems.

If I asked you to write a source control system, you might say: "Pff. That's easy, dude! Just store a copy of the document every time you make a change. When you need a particular revision, you just access that copy of the document." And although your system would work, I'd say: Think about it!

Storing a full copy of the document for every change is wasteful; revision changes are small: a line of text here and a line of text there. Why would you store a full copy of the document just for a small change?

At this point you may be tempted to say: "Well, then just store the original document, and store the changes as small delta files. We can then apply the deltas to get you any revision you want." This, of course, is a much better solution as far as storage is concerned; however, I'd still say: Think about it!
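To make the delta idea concrete, here is a minimal sketch in Python. It assumes a document is just a list of lines and leans on the standard library's `difflib` to find the changed ranges. The names `make_delta`, `apply_delta`, and `ForwardDeltaStore` are made up for illustration; no real source control system is claimed to store its data this way.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Record only the ranges of `old` that must change to produce `new`."""
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    # Each entry says: replace old[i1:i2] with these lines.
    # "equal" ranges are skipped; they are copied from `old` when applying.
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(old, delta):
    """Rebuild the newer document from `old` plus a delta from make_delta."""
    out, pos = [], 0
    for i1, i2, lines in delta:
        out.extend(old[pos:i1])  # unchanged lines before this change
        out.extend(lines)        # the replacement lines (may be empty)
        pos = i2                 # skip the lines that were replaced
    out.extend(old[pos:])        # unchanged tail
    return out


class ForwardDeltaStore:
    """Hypothetical store: the original document plus one delta per change."""

    def __init__(self, original):
        self.original = list(original)
        self.deltas = []  # deltas[k] turns revision k into revision k + 1

    def commit(self, new_lines):
        latest = self.get(len(self.deltas))
        self.deltas.append(make_delta(latest, list(new_lines)))

    def get(self, revision):
        # Start at the original and replay deltas up to the requested revision.
        doc = self.original
        for delta in self.deltas[:revision]:
            doc = apply_delta(doc, delta)
        return list(doc)
```

Notice that `get` for the newest revision has to replay every delta ever committed, so the cost of reading the latest version grows with the length of the history.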

Under normal circumstances, users access the most recent revision of a document far more often than any other revision. Furthermore, starting from the original document and applying every delta to rebuild the latest revision is computationally expensive, and it gets more expensive with every change. It would then seem that having to apply all these deltas for our most common operation is a bad idea. Seeing this, how can we further improve our system?

Well, how about we store the most recent version of the document instead of the original? This would mean we would have to store deltas to take us all the way back to the original version, but that's OK - it's not much different than what we were thinking about doing before. However, with this change, we can now perform our most common operation (return the current version of a document) in constant time with respect to the number of revisions, because no deltas need to be applied. Also, the expensive operation (returning an older version of the document) now occurs on rare occasions. Better, huh?
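Here is a rough sketch of that reverse-delta layout in Python, again assuming a document is a list of lines and using the standard library's `difflib` to compute changes. `make_delta`, `apply_delta`, and `ReverseDeltaStore` are hypothetical names chosen for this example, not the design of any particular source control system.

```python
from difflib import SequenceMatcher


def make_delta(src, dst):
    """Record only the ranges of `src` that must change to produce `dst`."""
    ops = SequenceMatcher(a=src, b=dst, autojunk=False).get_opcodes()
    return [(i1, i2, dst[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(src, delta):
    """Rebuild `dst` from `src` plus a delta produced by make_delta."""
    out, pos = [], 0
    for i1, i2, lines in delta:
        out.extend(src[pos:i1])  # unchanged lines before this change
        out.extend(lines)        # the replacement lines (may be empty)
        pos = i2                 # skip the lines that were replaced
    out.extend(src[pos:])        # unchanged tail
    return out


class ReverseDeltaStore:
    """Hypothetical store: the latest document plus deltas pointing backwards."""

    def __init__(self, original):
        self.latest = list(original)
        self.back_deltas = []  # back_deltas[k] turns revision k + 1 into revision k

    def commit(self, new_lines):
        new_lines = list(new_lines)
        # Remember how to get from the new version back to the previous one.
        self.back_deltas.append(make_delta(new_lines, self.latest))
        self.latest = new_lines

    def head(self):
        # The common case: hand back the stored copy, no deltas applied.
        return list(self.latest)

    def get(self, revision):
        # The rare case: walk backwards from the latest revision.
        doc = self.latest
        for delta in reversed(self.back_deltas[revision:]):
            doc = apply_delta(doc, delta)
        return list(doc)
```

In this sketch `head()` never touches a delta, while `get(0)` applies all of them, so the expensive path is now the one users rarely take.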

And now that I'm out of ideas for our source control system, I'm going to go back and "Think about it!" a little bit more, because I'm sure there's still lots of room for improvement.


Mike Murray said...

So here's a question to spark further conversation:

What about branches in the source files? Does your persistence model still stand, or does it need slight alteration?

Esteban Araya said...

@Mike Murray:

I really don't know. I think the problem of applying fixes to both branch and trunk code is really hard.
