The Lost Art of Documentation
Published on Apr 1, 2024
Prelude
You are working on a software project. You want this project to stand the test of time.
What does that mean?
There are many definitions and resources out there about what makes a codebase “good”. Several references are listed at the end of the article. There are, however, things surrounding the codebase that also influence the success of a project, such as documentation.
The topic is subjective, with each source garnering both devoted followers and haters. As The Engineer, I would generalize by saying that code should be reliable and maintainable, and not just by yourself.
Without going into too many details:
Reliable
The code does what you expect it to do. It is resilient to erroneous input and unexpected errors. It is well tested.
Maintainable
Code is self-documenting and follows the best practices of the chosen programming language. It is readable and just makes sense.
…not just yourself
There are as few “wtf” moments as possible.
Sorry, what?
Everyone reading this must, at some point, have gone through code and exclaimed “wtf!”. Sometimes it was even their own old code! There is even a tongue-in-cheek metric for it: “wtf”s per minute.
If it can happen with your own code, imagine someone else reading it for the first time. The goal here is to keep these “wtf” moments below a chosen boundary.
Good vs Bad Documentation
Here we focus on the last two points, namely code that is maintainable by not just yourself. The aim is to handle scenarios such as:
- you come back to a project months later to make a change; for some, even a few days might be enough to forget the context of that code
- you have a new teammate on the project; do you really want to spend countless hours on “Intro to Project X” calls and constantly context-switch to answer questions on Slack?
- you write new code that needs to pass peer review; both small and big changes need to be understandable
- you need to hand over this project to another team; maybe it transitions to the “business as usual” phase
The most natural approach is to have some level of documentation in the project.
Context and Intent
Context is important. One needs a certain level of understanding of the domain in which the code is written, be it in a tech-first company, such as Google, or in a “non-tech” industry, such as pharma or banking. With that understanding, one may already associate certain parts of the codebase with certain business use cases.
Behind all lines of code there is an intent. It may originate as a business requirement, or just be a solution to a technical issue. Code is written, after all, in order to do something.
Within a given context, the programmer has an intent to do that something, for which they write code. It is exactly when the intent is not clear that the “wtf” moments arise. However, they may also arise when the context is not understood. Some simple questions for the latter would be: “why are we doing this?” or “how does this help the users?”.
Bad Documentation
So let’s go through some examples. It is important to note that they are subjective; this is simply The Engineer’s view on the topic.
Missing
This is fairly obvious. Even on very small projects, there needs to be a way to understand the context of the project. One may ask a colleague, but what if there is nobody around? Maybe the author does not even work there anymore! One should not need to go through presentation recordings to piece together these details.
When reading the implementation, the intent should be clear. When something is done in an unexpected way, the “wtf” moment should immediately be quenched by a comment explaining the “why?”.
When these are missing, one is simply stuck looking for support.
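As a minimal sketch of what such a “why” comment could look like, consider the Python snippet below. The function name, the legacy billing system, and the padding rule are all hypothetical and only serve the illustration; the point is that each comment explains the reason behind a line that would otherwise look odd.

```python
def normalize_customer_id(raw_id: str) -> str:
    """Return the canonical form of a customer ID (hypothetical example)."""
    # Why: in this assumed scenario, IDs exported from a legacy billing
    # system arrive padded with leading zeros, while the new system stores
    # them without padding. Stripping the zeros keeps lookups consistent.
    stripped = raw_id.strip().lstrip("0")
    # Why: an ID made only of zeros would otherwise collapse to an empty
    # string, which callers would read as "missing". Keep a single "0".
    return stripped or "0"
```

Without the two comments, the `or "0"` at the end is a likely “wtf” moment; with them, the reader knows the reasoning without having to ask anyone.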
No overall introduction
Context should be obtained via an introduction to the project codebase. While this may come verbally from a colleague or from some recorded walkthrough, such sources are often incomplete and out of date.
The reliance on someone else being present is not desirable. Readers here might recall a former “indispensable” colleague who knew everything, and how it felt once they were gone.
Multiple locations
At this point, there is documentation, so it might seem like a step forward. However, your teammate just shared a link to what is seemingly the same thing, and you notice differences.
“Oh, this is the newer version!”
he says, leaving you scratching your head over why the code does not seem to match the documentation.
“Oh, wait, this is actually the plan for the next version. Here is the right link!”
You request access, only to find much later that the code actually has more details.
I am sure you have been through such scenarios. As soon as things are documented in multiple places*, there is a risk they will fall out of sync and go out of date.
*It is important to note here that there are moments when there is no alternative to having multiple versions of documentation, such as a version for the developers followed by a non-technical version for users.
No architecture overview
Most projects are not simple. They come with complex and always-changing business requirements, along with technical decisions which made perfect sense at that point in time. They contain many components, written by several teams, and built with specific technologies deemed best for those tasks. People don’t always have the same technical backgrounds, and we all know how fast today’s technical landscape changes.
Going through code is a bottom-up approach which misses the overview and can easily lead one on a long path to finding the answer, when perhaps the answer is quite simple once the overall structure is understood.
Incredibly long and verbose
Were you ever happy to find that there is documentation in a project, only to notice a split second later that it is hundreds of pages long?
But hey, at least you can search
Fair, except when dealing with generic terms valid in multiple contexts. But I digress.
Let’s even assume it is well structured and searchable; you are still stuck with hundreds of pages. You likely don’t care about the whole reasoning behind the project, all changes brought by each new version, or the overall architecture and design docs, if you’re just looking for something short and practical: “why is this here?”.
You also don’t need long and formal language that reminds you of application terms and conditions, nor the sales pitch of why the project is great. Instead, just something concise to answer the question.
Critical details only in the comments
Recall the search scenario above. Many times there are hundreds of results returned, with only one or two really relevant. This may not only be a tool problem, with many wikis having poor search capabilities, but also something present in the codebase.
How many times have you tried to use an API, only to get a generic 4xx response that launches you into a deep dive into the internals of the API, and then found, hours later, a short comment detailing an extra restriction on the API arguments? And this is the good scenario, which assumes there is a comment! You might instead be faced with just hundreds of lines of code.
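One way to avoid this, sketched below in Python, is to state the restriction where the caller will see it: in the docstring and in the error message itself. Everything here is an assumption made for the illustration, including the `create_orders` function, the `InvalidRequestError` class, and the limit of 100 items; it is not taken from any real API.

```python
MAX_BATCH_SIZE = 100  # hypothetical limit, chosen only for this example


class InvalidRequestError(ValueError):
    """Raised when the caller's arguments break a documented restriction."""


def create_orders(items: list[str]) -> int:
    """Create one order per item and return how many were created.

    Restrictions (visible to the caller, not hidden in the internals):
        - items must not be empty
        - items may contain at most MAX_BATCH_SIZE entries
    """
    if not items:
        raise InvalidRequestError("items must not be empty")
    if len(items) > MAX_BATCH_SIZE:
        # The message names the actual limit, so the caller does not
        # need to read the implementation to learn about it.
        raise InvalidRequestError(
            f"too many items: got {len(items)}, maximum is {MAX_BATCH_SIZE}"
        )
    # The real work is omitted; this sketch only counts the items.
    return len(items)
```

A caller who sends 101 items gets told about the limit directly, instead of a generic failure followed by hours of digging.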
Out of date
This one applies to all the examples above, though it deserves its own category. No matter how good, detailed, and well structured documentation is, as soon as it is out of date with the actual codebase, it drastically loses its purpose. People don’t trust it anymore, so they end up skipping it altogether.
This is exactly how you end up with emails and pings on Teams, because it turns out that it is faster to just ask someone and obtain the answer to your question. However, all this time is wasted. You context-switch, lose focus, and are no longer as efficient in your work. It is a slippery slope into putting in extra hours just to recover that lost time.
The Engineer’s Tips
First, it is important to mention that no matter how good the documentation is, it does not replace a good codebase. Treating your own project as a black box is not a good situation to be in, since sooner or later you’ll need to make a change, and the whole understanding of the project comes crumbling down.
One might argue that good documentation is simply the opposite of all the bad examples above; while fairly accurate, there are of course pitfalls. After all, it is a matter of costs: not only literal $$$ for the business, but also time itself. Answering these questions should be the first step in clarifying your documentation needs:
- is the project stable enough that the documentation will stay valid for a longer period of time?
- will writing it be a worthwhile investment as opposed to developing a new feature?
- will it be used by others, not just your team?
- will updating it be part of the developer flow?
Summary
So how can one create this mysterious good documentation then?
- keep it concise and to the point
- avoid elaborate phrasing
- comment the “why”, as the code should show the “how”
- keep it up-to-date
- have a single source of truth
- use a tool to derive the presentation for different audiences
- present the architecture top-down
- version not only the code but also the documentation, if relevant
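Some of these points can be sketched in a few lines of Python. The example below is only one possible approach, under the assumption that the docstring is the single source of truth: a usage example lives inside the docstring, the standard `doctest` module checks that it still matches the code, and the standard `inspect` module derives a plain-text reference entry from the same docstring. The function `apply_discount` and the helpers `render_reference` and `docs_match_code` are hypothetical names made up for this illustration.

```python
import doctest
import inspect


def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by percent.

    >>> apply_discount(200.0, 25)
    150.0
    """
    return price * (1 - percent / 100)


def render_reference(func) -> str:
    """Derive a plain-text reference entry from the function itself."""
    signature = inspect.signature(func)
    return f"{func.__name__}{signature}\n{inspect.getdoc(func)}"


def docs_match_code(func) -> bool:
    """Run the examples embedded in the docstring; True if all of them pass."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Make the function available by name to the docstring examples.
    for test in finder.find(func, func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.failures == 0
```

Running `docs_match_code(apply_discount)` as part of the usual test suite is one way to make updating the documentation part of the developer flow: if the code changes and the example in the docstring does not, the check fails.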
Addendum
Why do you say it’s a lost art?
Perhaps it is just a personal rant; however, most projects I have been involved in were lacking in this department. There was either too little documentation, or far too much of it and unstructured, which led to a lot of “wtf” moments, delays, and wasted time that kept engineers away from actual project work. It seems to be part of a larger trend of focusing purely on constantly delivering new features (in both big and small companies!) at the expense of headaches for those who need to keep those same systems running.
Hey, I don’t know how to apply those tips. Could you elaborate?
Each bullet point could turn into an entire blog post :) I may delve into these topics further in the future; however, the references below do serve as a good starting point. A current example of good documentation might be the Spark Python Documentation. Do also check the references for other great takes on this topic!
What a load of rubbish. Maintaining documentation takes so much time away from actual work!
Yes, it does take time. It may not always make sense to invest in it, for example in startups or proof of concept projects. Here, however, we’re talking about the projects that last for years and are meant to still function after many iterations and reorgs.
Spending time on documentation very early on would indeed not make sense, though as soon as there are users (even if just other teams!), the earlier the better, and it will pay dividends in the future. Arguing this to the business, however, is a whole other topic.
Doesn’t ChatGPT solve all of this?
It might :) There certainly are attempts to make use of it for generating documentation. Searching is in principle also possible. However, it is not yet a plug-and-play solution that will solve everything presented here.