Dynamic CircleCI config based on code changes for a monorepo

Oct 30, 2021

I recently finished the book Software Engineering at Google, which describes the single monorepo that hosts most of Google's source code. This made me keen to start experimenting with a Python monorepo at my company.

Of course, my current company is no Google (far from it: we have fewer than 50 employees), so I needed a solution that requires minimal effort to set up and maintain. We were already using CircleCI, but mainly with a polyrepo approach.

Each Python project had its own git repository and CI pipeline. This is fine if the projects are unrelated. But when those projects want to share Python packages, it can become quite a headache. Every package version needed to be published to a package registry. And checking that a change in a shared library doesn’t break any projects that depend on it wasn’t easy.

Google extensively applies cross-project testing to ensure the correctness of changes in their monorepo. After a change, all the tests of projects that depend on that change need to keep passing for the change to be accepted.

My goal was to replicate this on a smaller scale. By hosting all the interdependent projects in the same monorepo, we could automate cross-project linting and testing. In addition, a monorepo would make it much easier to maintain a consistent code style, release and versioning strategy, and testing standards.

Requirements

Our new monorepo was to be used for running image processing algorithms. These sometimes required system-level dependencies or ML models to run, so plain Python packages were not going to be sufficient. Docker images, possibly containing system dependencies or assets other than source code, needed to be supported as well. We called these Docker images tasks, because their purpose was to perform straightforward input-process-output operations.

To achieve mostly automated Continuous Integration and Continuous Deployment (CI/CD), we required the following:

The CI steps are dynamically chosen based on the changes. Only the code that changed, or whose dependencies changed, needs to be tested and potentially redeployed.
Whenever a new package or task is added to the monorepo, CircleCI automatically detects it and determines the right steps. No manual configuration should be necessary.
Releasing Python packages and Docker images is fully automated.
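Automatic detection can be as simple as scanning the repository for project directories. The sketch below assumes a hypothetical layout in which each package lives under packages/<name>/ and each task under tasks/<name>/, both marked by a pyproject.toml. The article does not describe the real layout, so the folder names and the discover_projects helper are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Project:
    name: str
    path: Path      # relative to the repository root
    is_task: bool   # True for Docker-image tasks, False for plain packages


def discover_projects(repo_root: Path) -> list[Project]:
    """Find every project under the (hypothetical) packages/ and tasks/ folders."""
    projects = []
    for folder in ("packages", "tasks"):
        base = repo_root / folder
        if not base.is_dir():
            continue
        # A directory counts as a project if it has its own pyproject.toml,
        # so adding a new project needs no CI configuration changes.
        for pyproject in sorted(base.glob("*/pyproject.toml")):
            project_dir = pyproject.parent
            projects.append(
                Project(
                    name=project_dir.name,
                    path=project_dir.relative_to(repo_root),
                    is_task=(folder == "tasks"),
                )
            )
    return projects
```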

How it works

After some reading, pondering and tinkering, I came to the solution that I’m about to share with you.

It uses CircleCI's dynamic configuration feature, which splits the CI pipeline into two phases: a setup phase and a continuation phase.

In the setup phase, the default .circleci/config.yml configuration file is run. In my implementation, it first installs some requirements and then executes .circleci/setup.py.

All the magic happens in this setup.py script. It looks at the git diff between the current branch and the main/master branch (also called “trunk” in trunk-based development).
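A minimal sketch of this diff step, assuming the trunk is available as origin/main in the CI checkout and that projects are identified by their directory paths (both assumptions, not details from the article). The helper names changed_files and changed_projects are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import PurePosixPath


def changed_files(base: str = "origin/main") -> list[str]:
    """List files changed on the current branch since it diverged from base."""
    # The three-dot form diffs against the merge base, so commits that landed
    # on main after this branch was created are not counted as changes.
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def changed_projects(files: list[str], project_dirs: list[str]) -> set[str]:
    """Map changed file paths to the project directories that contain them."""
    touched = set()
    for file in files:
        parents = PurePosixPath(file).parents
        for project_dir in project_dirs:
            # Comparing whole path components avoids treating
            # packages/img as a prefix of packages/imgutils.
            if PurePosixPath(project_dir) in parents:
                touched.add(project_dir)
    return touched
```

Files outside any project, such as a top-level README, map to no project at all.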

Based on these changes and a number of rules, a set of parameters is compiled for the next phase.

The rules are as follows:

On the main branch, all the packages and tasks need to be linted and tested after every git push.
On the other branches, a package or task only needs to be linted and tested if its source code changed or any of its (indirect) internal dependencies changed. The dependencies can be found by analyzing the poetry.lock file, as it also contains transitive dependencies. This clever idea came from here.

Side note: Poetry is the perfect package manager for a monorepo because it has built-in fast dependency resolution, locking, support for private package registries, virtualenv management, and building and publishing to package registries, not to mention a super smooth user experience and great documentation.
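The poetry.lock rule above can be sketched as follows. Because the lock file pins the whole resolved dependency tree, the internal packages that appear in a project's lock file are exactly its direct and indirect internal dependencies, so no graph traversal is needed. This assumes project names match the package names in the lock file; the helpers load_lock, internal_dependencies and needs_testing are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def load_lock(path: Path) -> dict:
    """Parse a poetry.lock file (TOML)."""
    import tomllib  # standard library since Python 3.11

    with path.open("rb") as f:
        return tomllib.load(f)


def internal_dependencies(lock_data: dict, internal_names: set[str]) -> set[str]:
    """Internal packages appearing anywhere in a project's poetry.lock.

    The lock file lists every package in the resolved tree under [[package]],
    so transitive internal dependencies are included automatically.
    """
    locked = {pkg["name"] for pkg in lock_data.get("package", [])}
    return locked & internal_names


def needs_testing(project: str, lock_data: dict,
                  internal_names: set[str], changed: set[str]) -> bool:
    """Lint and test a project if it, or any internal dependency, changed."""
    if project in changed:
        return True
    return bool(internal_dependencies(lock_data, internal_names) & changed)
```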

Everything can be released only when all the test and linting steps have finished successfully. This helps prevent partial deployments.
On the main branch, a package may only be published to our internal Python package registry if its version changed (according to git diff). Packages cannot be published from other branches.
On the main branch, Docker images are published to our internal image registry with the prod tag. On all the other branches, the tag is stg. This allows testing images on our staging environment before releasing them to production.
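The release rules above boil down to a few small predicates. This sketch assumes the trunk is named main; how old_version is obtained (for instance by reading pyproject.toml at the previous commit) is left open, and the registry host in image_reference is a placeholder. All helper names are hypothetical.

```python
from __future__ import annotations

MAIN_BRANCH = "main"  # assumption: the trunk branch is called "main"


def docker_tag(branch: str) -> str:
    """Images built on main are tagged prod; every other branch gets stg."""
    return "prod" if branch == MAIN_BRANCH else "stg"


def image_reference(registry: str, task: str, branch: str) -> str:
    """Full image name for a task in a (hypothetical) internal registry."""
    return f"{registry}/{task}:{docker_tag(branch)}"


def should_publish_package(branch: str, old_version: str | None,
                           new_version: str) -> bool:
    """Publish only from main, and only when the package version changed.

    old_version is None for a package that did not exist before the change.
    """
    return branch == MAIN_BRANCH and old_version != new_version
```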

There are also some other checks to make sure that there are no circular dependencies and that tasks don't depend on other tasks.
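Both checks can lean on the standard library. The sketch below uses graphlib.TopologicalSorter (Python 3.9+), which raises CycleError when the graph contains a cycle. It assumes the direct internal dependencies of each project have already been collected into a dict; validate_graph is a hypothetical helper.

```python
from __future__ import annotations

from graphlib import CycleError, TopologicalSorter


def validate_graph(deps: dict[str, set[str]], tasks: set[str]) -> list[str]:
    """Return the problems found in the internal dependency graph.

    deps maps each project to the internal projects it directly depends on.
    """
    problems = []
    # Tasks are Docker images, so a task must not depend on another task.
    for project, requirements in sorted(deps.items()):
        if project in tasks:
            for dep in sorted(requirements & tasks):
                problems.append(f"{project} depends on task {dep}")
    try:
        # static_order() raises CycleError if the graph is not acyclic.
        list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # err.args[1] lists the nodes of one detected cycle.
        problems.append("circular dependency: " + " -> ".join(err.args[1]))
    return problems
```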

Next, the parameters are used to render a CircleCI config template called config_template.py. This template contains Jinja2 placeholders, so the rendered configuration depends on the previously determined parameters. The result is the .circleci/config_continuation.yml file, which contains the static CircleCI configuration for the next phase.
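The rendering step might look roughly like this, using the third-party Jinja2 package (pip install Jinja2). The parameter names, such as packages, and the helper functions are hypothetical; the real template and parameters are not shown in the article.

```python
from __future__ import annotations

from pathlib import Path

from jinja2 import Environment, StrictUndefined


def render_continuation_config(template_text: str, params: dict) -> str:
    """Render the CircleCI config template with the computed parameters."""
    # StrictUndefined makes a misspelled placeholder fail loudly instead of
    # silently rendering an empty string into the generated YAML.
    env = Environment(undefined=StrictUndefined, keep_trailing_newline=True)
    return env.from_string(template_text).render(**params)


def write_continuation_config(template_path: Path, output_path: Path,
                              params: dict) -> None:
    """Write the rendered config, e.g. to .circleci/config_continuation.yml."""
    rendered = render_continuation_config(template_path.read_text(), params)
    output_path.write_text(rendered)
```

A template could then, for example, loop over the projects to test with {% for pkg in packages %} and emit one job per project.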

In the continuation phase, the continuation orb is used to execute the newly generated config file. Then all the steps for Python packages and Docker images are run in parallel, where possible.
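To show how parallelism and the "release only after everything passed" rule can coexist in one workflow, here is a sketch that builds the workflow section as a Python dict. It mirrors CircleCI's workflow syntax (a jobs list whose entries may carry requires), but the workflow name monorepo and the job names lint-and-test-<name> and release-<name> are hypothetical, and those jobs would still need to be defined elsewhere in the generated config.

```python
from __future__ import annotations


def build_workflow(to_test: list[str], to_release: list[str]) -> dict:
    """Build the value of a hypothetical 'monorepo' workflow entry.

    Lint/test jobs have no 'requires', so CircleCI can run them in parallel.
    Every release job requires all lint/test jobs, so nothing is released
    unless the whole set passed, which avoids partial deployments.
    """
    test_jobs = [f"lint-and-test-{name}" for name in to_test]
    jobs: list[str | dict] = list(test_jobs)
    for name in to_release:
        jobs.append({f"release-{name}": {"requires": list(test_jobs)}})
    return {"monorepo": {"jobs": jobs}}
```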

Future ideas

Using Python instead of bash for the setup phase unlocks a lot of possibilities. Not only is Python much more readable and extensible (in my opinion at least), but it also allows easy addition of more checks or tools.

For example, we could add a check to make sure that the version and changelog of a package were updated when the source code changed. Or we could add a filter that distinguishes between the different types of changes. A change in test code, for example, does not require the rebuilding and retesting of other packages.
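Both ideas fit naturally into setup.py. This sketch assumes test code lives in a tests/ directory and that each project keeps a pyproject.toml and CHANGELOG.md at its root; a changed pyproject.toml is only a rough proxy for a version bump. All helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath

RELEASE_FILES = ("pyproject.toml", "CHANGELOG.md")  # assumed per-project files


def is_test_file(path: str) -> bool:
    """Treat anything inside a tests/ directory as test code."""
    return "tests" in PurePosixPath(path).parts


def affects_dependents(changed: list[str]) -> bool:
    """Only non-test changes require dependents to be rebuilt and retested."""
    return any(not is_test_file(path) for path in changed)


def missing_release_files(project_dir: str, changed: list[str]) -> list[str]:
    """If a project's source changed, its release files should change too."""
    base = PurePosixPath(project_dir)
    in_project = [p for p in map(PurePosixPath, changed) if base in p.parents]
    root_files = {p.name for p in in_project if p.parent == base}
    source_changed = any(
        not is_test_file(str(p))
        and not (p.parent == base and p.name in RELEASE_FILES)
        for p in in_project
    )
    if not source_changed:
        return []
    return [name for name in RELEASE_FILES if name not in root_files]
```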

Credits

Many of the ideas came from https://medium.com/opendoor-labs/our-python-monorepo-d34028f2b6fa. I highly recommend reading it! It goes into more detail than I do here.


Written by Martin ter Haak
