Strategies to Quickly Explore a New Codebase

Leonhard

October 22, 2024

As an agency, we regularly get the opportunity to dive into existing codebases. Sometimes we join a project team to help push a release over the finish line; other times we are doing a project audit. These moments can be a little intimidating, though. How does this app work? What are the important files? What’s the code quality like? Where do I start?

In this post, we highlight four possible strategies and tools for exploring existing codebases easily. Usually, this takes less than 15 minutes. We hope these strategies will help you, too.

🚨 Before we dive in: Always approach a codebase with empathy. It’s easy to start judging the style or quality of the code. Keep in mind that you may not be aware of the full context of when and how the project was created. Also, use the opportunity to explore the codebase together with a coworker. Two pairs of eyes spot more things. Talk aloud and you’ll see how quickly you can learn from each other.

For the purpose of this post, we are looking at the fantastic app cal.com, “Scheduling infrastructure for everyone” as they say. Not only is it a great product but also an established open source project using some of our favourite packages -- perfect! Let’s dive in.

Getting started

First, we need the source code. Let’s clone the repository using the excellent GitHub CLI by running gh repo clone calcom/cal.com.

We quickly scan the README file.

You can immediately see a well-structured document which helps you get the information you need. Are you interested in development or deployment? There are sections for that. Especially the details to get the project up and running on your local machine are often overlooked. The document also contains helpful pointers to further information, like the contribution guide or the roadmap. Off to a great start!

💡 Things to look out for:

  • Did someone take time to list the steps to get the app up and running?
  • Does it state the purpose of the application? Do you understand it?
  • Does it point towards established conventions? These could be code or communication conventions.
  • Does it mention a design system like Storybook or similar? What does it look like?

Strategy 1: What programming languages is the project built with?

As we all know, lines of code aren’t a good measure of anything. However, they can still tell us something about the structure of the project. This becomes especially useful when the overview can be grouped by file type or programming language. We like to use tokei for that. It’s a CLI tool written in Rust, and it is insanely fast.

From the tokei output, we can immediately tell this is a non-trivial application with a mix of languages and roughly 500k lines of code. It seems to use SQL (so we assume migration files and relevant tooling), can be deployed using Docker, and is otherwise primarily written in TypeScript. At this stage we are not thinking about quality, as none of these statistics tell us anything about the contents of the files. We are merely trying to get a bird's eye view.
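
To make the idea of grouping line counts by file type concrete, here is a minimal TypeScript sketch. It is not how tokei works internally (tokei detects languages and separates code, comments and blanks); it simply sums lines per file extension. The SourceFile shape and the function names are our own assumptions for illustration.

```typescript
// Hypothetical input shape: a file path plus the file's text content.
interface SourceFile {
  path: string;
  content: string;
}

// Returns the lowercased extension, or "(none)" for files without one
// (for example "Dockerfile" or dotfiles such as ".gitignore").
function extensionOf(path: string): string {
  const name = path.split("/").pop() ?? path;
  const dot = name.lastIndexOf(".");
  return dot > 0 ? name.slice(dot + 1).toLowerCase() : "(none)";
}

// Counts lines, ignoring the empty piece after a trailing newline.
function countLines(content: string): number {
  if (content === "") return 0;
  const pieces = content.split("\n").length;
  return content.charAt(content.length - 1) === "\n" ? pieces - 1 : pieces;
}

// Sums line counts per extension and sorts the result, largest first.
function linesByExtension(files: SourceFile[]): Array<[string, number]> {
  const totals: Record<string, number> = {};
  for (const file of files) {
    const ext = extensionOf(file.path);
    totals[ext] = (totals[ext] ?? 0) + countLines(file.content);
  }
  return Object.keys(totals)
    .map((ext): [string, number] => [ext, totals[ext]])
    .sort((a, b) => b[1] - a[1]);
}
```

Even a rough grouping like this is enough to answer the first question: which languages dominate, and which ones only show up in a handful of files.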

💡 Things to look out for:

  • What languages are present?
  • Is there any language I’m not familiar with? Can someone from the team potentially help me if I have questions?
  • Are there any code comments?
  • Is there additional documentation besides the README file? Take note of Markdown and HTML files.

Strategy 2: What’s in the package.json?

Let’s open the package.json file. We can easily tell this project is a Yarn monorepo with multiple apps and packages. There is an extensive list of scripts which help manage everything. Apparently, there is an “App Store” and an “Embed” feature, both of which sound interesting. We can see some nice tools for development like Prisma, Swagger and Storybook. For tests, it uses the holy duo of Vitest and Playwright. Developer experience bliss. ✨

As with any modern web project, we should be concerned about the dependency tree. Are there severe security issues in the production dependencies? How up to date are the versions?

Let’s take a look. For security issues, npm audit reports known vulnerabilities in the dependency tree. For versions, npx taze gives us an overview of outdated packages.

You can also run npx taze -r to recursively check all package.json files -- very helpful in a monorepo scenario like this one. See the taze documentation for more details, including how to easily update your project dependencies.

Again, this output needs interpretation. But it can give us an idea about the health of the project.
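
One way to interpret a list of outdated packages is to separate major, minor and patch gaps, since a major gap usually means more upgrade work. The following is a minimal sketch of that classification, assuming plain x.y.z versions. It is our own illustration, not taze's implementation, and it ignores prerelease tags and complex ranges.

```typescript
type UpdateLevel = "major" | "minor" | "patch" | "none";

// Extracts the first x.y.z triple, so prefixes like "^" or "~" are tolerated.
function parseVersion(version: string): [number, number, number] {
  const match = /(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) {
    throw new Error(`Not a plain x.y.z version: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Reports the most significant part in which `latest` is ahead of `current`.
function updateLevel(current: string, latest: string): UpdateLevel {
  const [cMajor, cMinor, cPatch] = parseVersion(current);
  const [lMajor, lMinor, lPatch] = parseVersion(latest);
  if (lMajor > cMajor) return "major";
  if (lMajor === cMajor && lMinor > cMinor) return "minor";
  if (lMajor === cMajor && lMinor === cMinor && lPatch > cPatch) return "patch";
  return "none";
}
```

A project with many "major" gaps in its production dependencies deserves a closer look than one that is only a few patch releases behind.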

💡 Things to look out for:

  • Are there scripts for linting and testing? You can use scriptlint. It’s a tool to lint the scripts section of your package.json, made by our own Moritz Jacobs!
  • Are there severe security issues in the dependencies? You can use npm audit.
  • How up to date are the dependencies? You can use taze.
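
The first check in the list above can be sketched in a few lines of TypeScript. This is a simplified illustration, not what scriptlint does: it only looks for expected script names (or namespaced variants such as test:e2e) in a parsed package.json. The PackageJsonLike type and the script names in the usage example are hypothetical.

```typescript
// Only the part of package.json this sketch cares about.
interface PackageJsonLike {
  scripts?: Record<string, string>;
}

// Returns the expected script names that have no match in package.json.
// A name counts as present if it exists exactly ("test") or as a
// namespaced variant ("test:e2e", "test:unit").
function missingScripts(pkg: PackageJsonLike, expected: string[]): string[] {
  const names = Object.keys(pkg.scripts ?? {});
  return expected.filter(
    (wanted) =>
      !names.some((name) => name === wanted || name.indexOf(wanted + ":") === 0)
  );
}

// Usage with made-up scripts:
const examplePackage: PackageJsonLike = {
  scripts: { lint: "eslint .", "test:e2e": "playwright test", dev: "next dev" },
};
const missing = missingScripts(examplePackage, ["lint", "test", "type-check"]);
// `missing` now lists the expected scripts that were not found.
```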

Strategy 3: What’s the velocity of the project?

Let’s look at the Insights tab on GitHub, more specifically the contributors overview: https://github.com/calcom/cal.com/graphs/contributors

We can tell the project regularly receives over 50 commits on the main branch per week. Several different contributors are working on the code, always a good sign. Apparently development started on 7th March 2021. We can also see the GitHub Apps Crowdin and Kodiak are being used to help maintain the project.

Take a minute to click around the Insights tab. The Pulse section is also interesting to get an idea of the recent activity. The recent issues and pull requests can be important, too. But diving into this takes more time. Let’s move on.

💡 Things to look out for:

  • What’s the commit frequency?
  • How many contributors are there (in recent times)?
  • Which apps and bots are helping to maintain the project?
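
If you prefer numbers over charts, the first two questions can be answered from a list of commits. Here is a minimal sketch, assuming you already have commits as author and ISO date pairs (for example exported from git log). The Commit shape and the function name are our own assumptions.

```typescript
// Hypothetical commit record: author name plus an ISO 8601 timestamp.
interface Commit {
  author: string;
  date: string;
}

interface Activity {
  commits: number;
  contributors: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Counts commits and distinct authors within the last `days` days before `now`.
function recentActivity(commits: Commit[], now: Date, days: number): Activity {
  const end = now.getTime();
  const start = end - days * DAY_MS;
  const recent = commits.filter((commit) => {
    const time = new Date(commit.date).getTime();
    return time >= start && time <= end;
  });
  const authors: Record<string, true> = {};
  for (const commit of recent) {
    authors[commit.author] = true;
  }
  return { commits: recent.length, contributors: Object.keys(authors).length };
}
```

Comparing a 7-day window with a 90-day window gives a quick feeling for whether activity is steady or comes in bursts.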

Strategy 4: What’s the code complexity like?

There are many metrics related to code complexity, like cyclomatic complexity or more elaborate ones like COCOMO. One interesting approach we’ve come across is implemented in the package code-complexity. It combines two factors which help reveal the chunky bits of the application. First, it measures the complexity of the JavaScript code, using one of several strategies: number of lines, cyclomatic complexity or Halstead complexity. Second, it measures the churn rate, that is, how many times a file has been changed in the past according to the git history. Multiplied together, these two factors become a score.

Looking at the output reveals the core pieces of the project, like handleNewBooking.ts or the EventManager. We can also see that zod is being used for validation -- something we encourage, too.

At this point, it should be obvious, but we’ll say it one more time: the output needs interpretation. High complexity or high churn isn’t necessarily bad. Scheduling is an inherently difficult problem, hence it’ll require complex code. But for us to assess the structure and inner workings of the project, these metrics can be very helpful.
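
The scoring idea itself (complexity multiplied by churn) fits in a few lines. The sketch below is our own illustration of that formula, not the code-complexity package's source. The FileStats shape is an assumption, and you would feed it with numbers you measured elsewhere.

```typescript
// Per-file measurements; how they are obtained is outside this sketch.
interface FileStats {
  path: string;
  complexity: number; // e.g. lines of code or cyclomatic complexity
  churn: number; // number of commits that touched the file
}

interface Hotspot extends FileStats {
  score: number;
}

// Scores each file as complexity * churn and returns the top `limit` files.
function rankHotspots(files: FileStats[], limit: number): Hotspot[] {
  return files
    .map((file) => ({ ...file, score: file.complexity * file.churn }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Because the factors are multiplied, a file only ranks high when it is both complex and frequently changed. A large but stable file, or a small file that changes all the time, ends up further down the list.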

💡 Things to look out for:

  • Which areas of the project can be found in the code-complexity report?
  • Are these areas covered by tests?
  • What do these files look like?

Conclusion

As with any complex topic, we barely scratched the surface. We haven’t covered:

  • Looking into tests: How easily can I run the tests? Are there any?
  • Looking into bug tickets: Do new features regularly lead to new bugs? How are they addressed?
  • Looking into releases: How is the project deployed? Is there a Continuous Integration (CI) or Continuous Deployment (CD) setup?
  • Looking into communication: Building software is about communication. How does the team communicate? Is there a decision log? How do Pull Requests work?
  • Static analysis tooling: Products like SonarQube can help guide you and assess the quality of a project.
  • AI prompting: You can ask Copilot to summarise what a file does and suggest improvements. See their docs for examples.

Software remains a challenging yet exciting business. It’s fast-paced, meaning we often need to switch between contexts. We hope you’ve learned a new strategy to help you dive into your next codebase with a little more confidence.

Oh, by the way, we are also using cal.com! If what we offer sounds interesting to you, feel free to schedule a meeting with us: https://cal.com/team/peerigon/hello 👋

🤖 Statement about usage of AI in this article: This article was written by humans (thanks for the feedback, Irena and Julia!), including the title, concepts and code samples. However, we used AI to enhance the style of writing.

Photo by Linh Ha on Unsplash
