~miguelbernadi

My project maintenance helpers

As a perk of the job, I tend to find myself maintaining old code bases with many outdated dependencies. That is quite common, but it turns out to be especially painful when you need to update a core dependency and everything else starts breaking due to API changes. This has happened to me at two workplaces: first in Ruby on Rails, and most recently in infrastructure scripts written in Python 2.

In my former $dayjob we worked with Ruby on Rails and had an application much bigger than the team could support well. This bred a certain aversion to upgrading dependencies unless they were required for current operation, which in turn would eventually pile up blockers in front of any update of the Ruby runtime or the Rails framework (to name two big sources of pain in our projects).

In practice, a minor version upgrade of Rails or Ruby could take from a couple of weeks to a couple of months of untangling the dependency tree behind the breakages in our “free” time (we had 4 hours a week to devote to technical improvement, plus the extra time you carve out when something really affronts you). At one point, I found a blocking chain of 8 big and complex dependencies that had to be updated in precisely that order and required several big refactors to accommodate them all. That one thing took two full weeks to solve, and by then it had become a blocker.

In my current $dayjob, on the other hand, I inherited a bunch of Python scripts that set up the infrastructure of our application; they had been written in 2016 and never updated to newer Python versions.

In both cases, upgrading dependencies is both a necessity (otherwise you have to keep an EOL programming language around, with its vulnerabilities and lack of community support) and an extremely problematic, long and painful process. This is obviously compounded by the time it takes to verify that every single version update you have made did not break functionality, especially if your test suites take a very long time to run, or are lacking.

This eventually led me to write some scripts to automate the upgrade attempts, at least so I could get far enough to reach a breaking change that could be researched. They proved quite useful, so I kept iterating on them and making them a bit more generic.

I tend to drop these in the `contrib/` directory of offending projects. Sometimes I commit them, but other times I just keep them locally to help me in my work. Ideally, once the project is up-to-date, the usual maintenance dance should keep things going forward without the scripts.

Upgrade dependencies iteratively #

The script below starts by opening the dependency versioning file for your project and extracting the names of the dependencies. Rubyists are used to the Gemfile and Gemfile.lock files, the former listing the main dependencies and the latter all packages and versions. Similarly, Pythonists use requirements.txt. A detail I find interesting is using the shuf command to randomly reorder the dependencies on every execution. The order of the upgrades can be critical, especially when you are many versions behind, and shuffling ensures the order is different with every run. That way you can run the script several times in succession to get as many “painless” upgrades out of the way at once, and keep the complicated ones for later.
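To make that concrete, the extraction-plus-shuffle step boils down to something like this (the package names and versions are made up; in the real script the exact command lives in a configuration variable):

# Given a requirements.txt along the lines of:
#   requests==2.18.4
#   boto3==1.4.7
#   flask==0.12.2
# keep only the package names and emit them in a random order
awk -F= '{print $1}' requirements.txt | shuf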

If the dependency manager finds an update for a dependency, it usually modifies the dependency tracking file or the version lock file. If neither of those files has been changed by the process, the dependency can be ignored and we move on to the next candidate. I recommend using the version pinning features of these managers to limit what can be upgraded and by how much, so you can focus on the valuable dependencies (the pinned ones get skipped that way).
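For example, requirements.txt accepts version specifiers that bound how far pip may move a dependency (the names and bounds below are invented for illustration):

Django==1.11.29     # hard pin: stays exactly at this version
celery>=4.0,<5.0    # bounded: anything in the 4.x series is acceptable
requests            # unbounded: any release is acceptable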

If the tracking files changed, we have to verify the project still works. We do that by running the test suite. As I usually use Docker to limit the number of toolchains installed on my system and to ease sharing local development setups with colleagues, I also have to rebuild the image from a clean slate. If that succeeds, we can delude ourselves into thinking the upgrade went fine and commit the changes before trying the next candidate.

If any of these steps fails, we stop the process immediately and are left with a broken project. We can muck around with it and try to figure out what's going on, or drop the current changes and retry with a different ordering of the dependency list, as shown below. Eventually only the broken updates will remain, and it will be clear which dependencies require research and effort.
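Recovering from a broken attempt is cheap: the successful upgrades are already committed, so you only need to throw away the uncommitted changes before rerunning. Something like this, assuming the script is saved as contrib/upgrade_dependencies.sh (a name I made up; pick whatever fits your project):

# Discard the broken, uncommitted upgrade attempt and rerun
git checkout -- requirements.txt
./contrib/upgrade_dependencies.sh

With that out of the way, here is the full upgrade script: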

#!/bin/bash

set -eu

##### Configuration variables #####
# ROOT must be defined first because the commands below reference it
ROOT=$( cd "$( dirname "$0" )" && cd .. && pwd -P )

DEPENDENCY_VERSION_TRACKER="requirements.txt"
DOCKER_IMAGE_NAME="my-project"
# The escaped \$1 is expanded later (by awk and by the container shell),
# not while these variables are being defined
PACKAGE_DEPENDENCY_LIST_COMMAND="awk -F= '{print \$1}' ${ROOT}/${DEPENDENCY_VERSION_TRACKER}"
PACKAGE_UPGRADE_COMMAND="pip install --upgrade \"\$1\" && pip freeze > /run/${DEPENDENCY_VERSION_TRACKER}"
PROJECT_UNIT_TEST_COMMAND='python -m unittest discover -v'
##### END of Configuration variables #####

function source_repository_credentials {
    source ${ROOT}/../../artifactory-access.sh
}

function list_dependencies {
    # eval so the quoted awk program survives; shuffle the result so every
    # run tries the dependencies in a different order
    eval "${PACKAGE_DEPENDENCY_LIST_COMMAND}" | shuf
}

function update_dependency {
    # The dependency name is passed into the container and becomes $1
    # inside PACKAGE_UPGRADE_COMMAND ("upgrade" only fills $0)
    docker run \
	 --rm \
	 --volume ${ROOT}/${DEPENDENCY_VERSION_TRACKER}:/run/${DEPENDENCY_VERSION_TRACKER} \
	 ${DOCKER_IMAGE_NAME} \
	 /bin/bash -c "${PACKAGE_UPGRADE_COMMAND}" upgrade "$1"
}

function build_project {
    (
      cd ${ROOT}

      source_repository_credentials
      docker build -t ${DOCKER_IMAGE_NAME} \
	     --build-arg ARTIFACTORY_USER=$ARTIFACTORY_USER \
	     --build-arg ARTIFACTORY_PWD=$ARTIFACTORY_PWD \
	     .
    )
}

function run_unit_tests {
    docker run \
	 --rm \
	 --volume ${ROOT}:/run \
	 ${DOCKER_IMAGE_NAME} \
	 /bin/bash -c "${PROJECT_UNIT_TEST_COMMAND}"
}

# Open subshell so we don't affect user's CWD
(
    cd $ROOT

    # Get all dependencies in random order
    for dependency in $(list_dependencies); do
      echo "Upgrading $dependency"

      # Update one dependency
      update_dependency $dependency

      # If the dependency version tracker did not change there was
      # no update available and we test the next dependency
      [[ ! $(git -C ${ROOT} status --porcelain | grep ${DEPENDENCY_VERSION_TRACKER}) ]] && continue

      # Build new container with updated dependencies
      build_project

      # Run tests to make sure it still works
      run_unit_tests

      # Do a commit for the upgraded dependency
      # we can later bisect any breakage the tests didn't detect
      git add ${DEPENDENCY_VERSION_TRACKER} && git commit -m "Upgrade $dependency"
    done
)

In the end, we have a git branch containing a series of commits, each upgrading a single dependency, which we can upstream to our project.

git bisect runner #

Running these updates over time, I have found that sometimes the project ends up broken no matter what you tried. It may be flakiness in your tests, or bugs in the build process that do not report the failure properly (or bugs in my scripts, the horror), and you end up with a bunch of commits and a broken project. This also happens outside of upgrades, when integrating big features or doing refactors, unless you are very disciplined about running the tests (which you are not if your test suite takes 30 minutes to run).

In those cases, git bisect is your friend, and I found myself there often enough that I wrote a helper script to run the bisect automatically. That way I could leave it running during my lunch break or meetings, figuring out what had broken the project.

The way git bisect works is by taking a “good” and a “bad” commit and searching the intermediate history for the first commit that was “bad”. It's up to you to define what “good” and “bad” mean.

You can have a human decide at each step, but you can also use a script to detect the errors if you are lucky enough that the issue can be detected programmatically (and building the script takes less time and effort than doing it by hand a bunch of times). Such a script must exit with 0 if it's a “good” commit and non-zero if it's not.
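As an aside, git bisect run understands a couple of other exit codes as well: 125 means “this commit cannot be tested, skip it”, and anything above 127 aborts the bisection. A minimal predicate using them could look like this (the make targets are an assumption, not part of my scripts):

#!/bin/bash
# Hypothetical bisect predicate
make build || exit 125   # cannot even build: ask bisect to skip this commit
make test  || exit 1     # builds but tests fail: mark the commit "bad"
exit 0                   # everything passed: mark the commit "good"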

This script attempts to run the tests (and in my case, with Docker, the build as well, just in case there was some deeper change) and returns non-zero if any of those operations fails.

For example, we may bisect a currently failing branch with:

git bisect start
git bisect bad HEAD
git bisect good master
git bisect run ./contrib/bisect_runner.sh

And the contrib/bisect_runner.sh script itself:

#!/bin/bash

set -eu

DOCKER_IMAGE_NAME="my-project"
PROJECT_UNIT_TEST_COMMAND='python -m unittest discover -v'

ROOT=$( cd "$( dirname "$0" )" && cd .. && pwd -P )

function source_repository_credentials {
    source ${ROOT}/../../artifactory-access.sh
}

function build_project {
    (
      cd ${ROOT}

      source_repository_credentials
      docker build -t ${DOCKER_IMAGE_NAME} \
	     --build-arg ARTIFACTORY_USER=$ARTIFACTORY_USER \
	     --build-arg ARTIFACTORY_PWD=$ARTIFACTORY_PWD \
	     .
    )
}

function run_unit_tests {
    docker run \
	 --rm \
	 --volume ${ROOT}:/run \
	 ${DOCKER_IMAGE_NAME} \
	 /bin/bash -c "${PROJECT_UNIT_TEST_COMMAND}"
}

# The exit status of this line is the script's exit status,
# which is what git bisect run uses to classify the commit
build_project && run_unit_tests

As an Emacs user, I can even run it directly from the Magit buffer. Thanks to the script, starting the whole process is a single operation in my setup.

Outro #

You may notice several of the steps in bisect_runner.sh are identical to those in the dependency upgrade script, and that's because one could actually reuse the other. I did not do so here to keep each script easier to follow on its own.
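If you do want to deduplicate them, the shared pieces could live in a small file that both scripts source. A minimal sketch, with an invented `contrib/common.sh`:

# contrib/common.sh -- shared helpers sourced by both scripts
ROOT=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && cd .. && pwd -P )
DOCKER_IMAGE_NAME="my-project"

function build_project {
    (
      cd "${ROOT}"
      source "${ROOT}/../../artifactory-access.sh"
      docker build -t "${DOCKER_IMAGE_NAME}" \
             --build-arg ARTIFACTORY_USER=$ARTIFACTORY_USER \
             --build-arg ARTIFACTORY_PWD=$ARTIFACTORY_PWD \
             .
    )
}

Each script would then start with `source "$(dirname "$0")/common.sh"` and keep only its own specific steps.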

If you have suggestions for improvement, open a comment thread and we can discuss them. I do not have these in a shareable repository, but we can cook one up if people find it useful.

Comments #

Have a comment on this article? Start a discussion in my public inbox by sending an email to ~miguelbernadi/public-inbox@lists.sr.ht [mailing list etiquette], or see existing discussions.