
How I improved our CI build time from 24mins to 8mins and reduced costs by 50%


If you like fast CI builds, hate having to wait for an eternity for CI to complete a build after you open a PR, and love reducing infra costs, read on.

This post will outline various things I did to bring the time it took for a build to complete down from ~24 minutes to ~8 minutes whilst at the same time halving cost.

At my work, we have a big Rails app with lots of tests. Our CI runs on CircleCI.

But the things described here should mostly apply to other CI platforms and frameworks as well.

I’ve pasted our CircleCI config at the bottom if you need it for reference.

Table of Contents

  1. Split tests and run them in parallel
  2. Cache intermediate build artifacts
  3. Crafting good cache keys
  4. Rearchitect your CI workflow into multiple jobs
  5. Do a shallow Git clone
  6. Disable unnecessary logging
  7. Run system tests in headless mode
  8. Do less unnecessary work
  9. Improve your factories or just use fixtures
  10. Split up long test files
  11. Make your test environment closer to production and faster in CI
  12. CircleCI config for reference:


1. Split tests and run them in parallel

We were already doing this before I started on my improvements, but I’m including this here because it’s the most important thing you can do to speed up your tests.

If you’re not splitting and running tests in parallel, you absolutely should. If you’re on a CI platform that doesn’t support parallelism, you should migrate to another one that does.

Tests with and without parallelization

Most CI platforms already have guides on running tests in parallel, so you might want to refer to those:

  1. TravisCI
  2. GitHub Actions - (I couldn’t find an official doc - but this action looks promising)
  3. CircleCI

Side note: I remember us running tests on Jenkins years ago where we did parallelism a bit differently using the parallel_tests gem.
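
To make timing-based splitting a bit more concrete, here is a minimal sketch of one way to do it: sort test files by their last known duration and keep handing the next file to the least loaded instance. This is not how CircleCI or any other platform implements it, and the file names and timings are made up for illustration.

```ruby
# Greedy "longest first" split: a rough stand-in for timing-based test splitting.
# `timings` maps a test file to its last known duration in seconds (made-up numbers).
def split_by_timings(timings, instances)
  buckets = Array.new(instances) { { files: [], total: 0.0 } }

  timings.sort_by { |_file, seconds| -seconds }.each do |file, seconds|
    # Always give the next slowest file to the instance with the least work so far.
    lightest = buckets.min_by { |bucket| bucket[:total] }
    lightest[:files] << file
    lightest[:total] += seconds
  end

  buckets
end

timings = {
  "test/system/checkout_test.rb" => 90.0,
  "test/models/order_test.rb"    => 40.0,
  "test/models/user_test.rb"     => 30.0,
  "test/controllers/api_test.rb" => 20.0
}

buckets = split_by_timings(timings, 2)
buckets.each_with_index do |bucket, index|
  puts "instance #{index}: #{bucket[:total]}s -> #{bucket[:files].join(', ')}"
end
```

With these numbers, one instance gets the 90 second file and the other gets the remaining three, so both finish in roughly the same time.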

2. Cache intermediate build artifacts

This one is quite well known and also something that we were doing already. But I still chose to include this here in case you aren’t doing this already because it’s really important.

Most CI platforms will again have guides on how to cache things:

  1. TravisCI
  2. GitHub Actions
  3. CircleCI

Here are the kinds of things we cache:

  1. Dependencies (i.e. things you install with bundler/npm/yarn/etc.)
  2. Pre-compilation output (i.e. files generated after you run webpack/vite or rails assets:precompile)

It’s important to cache things that are independent under different cache keys.

Instead of caching everything under one cache, create a separate cache for gems, a separate one for npm/yarn, and a separate one for pre-compilation output.

For gems, we make bundler install gems to ./vendor/bundle and cache that directory with a checksum of Gemfile.lock as the cache key.

For packages installed through yarn, we simply cache the node_modules folder with a checksum of yarn.lock as the cache key.

Finally, for caching precompilation results, we cache several folders:

paths:
  - tmp/cache/assets
  - tmp/cache/vite
  - tmp/cache/webpacker
  - public/assets
  - public/packs
  - public/packs-test
  - public/vite-test

All these paths are cached with the name of the branch as well as the revision ID of the commit as the cache key.

3. Crafting good cache keys

On CircleCI, you can save a bunch of files and directories to a cache specified by a cache key.

For example, say you are caching your installation of yarn dependencies using the checksum of yarn.lock, your save_cache step probably looks something like this:

- save_cache:
    key: ignis-v4-yarn-{{ checksum "yarn.lock" }}
    paths:
      - node_modules

What this means is that your cache will have the same cache key as long as the contents of yarn.lock don’t change.
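
As a rough illustration of why a checksum makes a good cache key, here is a small sketch that derives a key from the contents of a lockfile. The hashing scheme (hex SHA-256) is my assumption and will not produce the same string as CircleCI’s checksum template; the lockfile contents are made up.

```ruby
require "digest"
require "tmpdir"

# Builds a cache key from a prefix and the SHA-256 of a lockfile.
# Assumption: hex SHA-256; CircleCI's {{ checksum }} output looks different.
def cache_key(prefix, lockfile_path)
  "#{prefix}-#{Digest::SHA256.file(lockfile_path).hexdigest}"
end

first_key, same_key, changed_key = Dir.mktmpdir do |dir|
  lockfile = File.join(dir, "yarn.lock")

  File.write(lockfile, "left-pad@1.3.0\n")
  first = cache_key("ignis-v4-yarn", lockfile)
  again = cache_key("ignis-v4-yarn", lockfile)

  # Adding a dependency changes the file, and therefore the key.
  File.write(lockfile, "left-pad@1.3.0\nlodash@4.17.21\n")
  changed = cache_key("ignis-v4-yarn", lockfile)

  [first, again, changed]
end

puts first_key == same_key     # true: unchanged lockfile, same key
puts first_key == changed_key  # false: new dependency, new key
```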

If you want to restore a cache, you just specify the cache key. And you can specify multiple keys!

This is what the restore_cache step might look like:

- restore_cache:
    keys:
      - ignis-v4-yarn-{{ checksum "yarn.lock" }}
      - ignis-v4-yarn

The benefit of specifying multiple keys like this with the first key being more specific, followed by a less specific one is that you increase your cache hit rate.

It will first try to find a cache that begins with or exactly matches the key ignis-v4-yarn-{{ checksum "yarn.lock" }}.

If it doesn’t find a cache with that key, it will try to find a cache that begins with or exactly matches the key ignis-v4-yarn.

For example, if you have a cache saved from an earlier build and then add a new package, the checksum of yarn.lock changes, which produces a new cache key.

The build can still restore the old cache through the less specific key, so it only needs to install the new package instead of installing everything from scratch.


Consider the following case:

  1. You do a CI build wrapping up your work on an amazing new feature

    • It computes the checksum of yarn.lock as tyiuew67 based on its contents.
    • It tries to restore things from a cache but finds nothing (let’s assume this was the first build).
    • So it installs all dependencies from scratch.
    • Then, it saves installed dependencies to cache using ignis-v4-yarn-tyiuew67 as the cache key.
  2. Later, you add a shiny little JS library using yarn while working on something else.

    • This single addition would alter the contents of yarn.lock leading to a different checksum.
  3. You then do a new CI build.

    • There is a cache saved previously with the key ignis-v4-yarn-tyiuew67.
    • It computes the new checksum of yarn.lock as ajkhwui38.
  4. Now if you just specified a single cache key during the restore step, it will try finding a cache whose key exactly matches or begins with ignis-v4-yarn-ajkhwui38.

  5. But no cache that matches that key criteria exists, so it will restore nothing and you’ll need to re-install all your dependencies again, just because of a single change.

  6. However, if you were restoring from multiple cache keys as I showed above, it’d first check whether a cache whose key exactly matches or begins with ignis-v4-yarn-ajkhwui38 is present.

  7. If it is, it will use that cache. Otherwise, it will move on to the next key which is ignis-v4-yarn.

  8. Now, there is a cache whose key ignis-v4-yarn-tyiuew67 begins with ignis-v4-yarn. So it will use that.

  9. The cache won’t fully satisfy the new dependencies because it’s old and you’ve added a new package.

  10. But it’s still better than installing everything from scratch.

  11. You are going to be running yarn install as the next step anyway, which would install the missing package.
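
The walkthrough above can be simulated in a few lines. This is only a sketch of the prefix-matching idea, using an in-memory list instead of a real cache store. The rule that the most recently saved cache wins when several keys share a prefix is how I understand CircleCI’s behaviour, so treat it as an assumption.

```ruby
# Simulates restore_cache with prefix matching over an in-memory list of saved caches.
# Each saved cache is { key:, saved_at: }. Keys are tried in order; for each key,
# the most recently saved cache whose key starts with it is returned.
def restore_cache(saved_caches, keys)
  keys.each do |key|
    match = saved_caches
            .select { |cache| cache[:key].start_with?(key) }
            .max_by { |cache| cache[:saved_at] }
    return match[:key] if match
  end
  nil
end

# The cache saved by the first build in the example above.
saved = [{ key: "ignis-v4-yarn-tyiuew67", saved_at: 1 }]

single   = restore_cache(saved, ["ignis-v4-yarn-ajkhwui38"])
fallback = restore_cache(saved, ["ignis-v4-yarn-ajkhwui38", "ignis-v4-yarn"])

p single    # nil: nothing matches the new checksum
p fallback  # "ignis-v4-yarn-tyiuew67": found through the less specific key
```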

4. Rearchitect your CI workflow into multiple jobs

While we did have parallelism and test splitting configured correctly, there was one (very major) problem with the way we were doing parallelism.

We had a single job to execute all kinds of tests, and that job would run across 14 instances in parallel.

Each instance of the job would install dependencies, run asset precompilation, install Google Chrome, do some quick lint checks, and then run the tests assigned to that instance.

But doing things this way has some major disadvantages. Each instance of the job would install dependencies needed for all kinds of tests.

For example, you only need Google Chrome for system tests.

Similarly, it makes no sense to precompile assets or do quick lint checks over and over again on each parallel run of the job. It’s just a waste of time, resources, and money.

After some reading, I realized there’s a much better way to architect our CI workflow.

And it indeed turned out to be the most significant source of test speedup + cost reduction.

Here’s what I did on a high level: Split the single job we had into 3 separate jobs

Before - A single job running across 14 instances would do everything from asset precompilation to running all kinds of tests.

After - 3 separate jobs that are designed to reduce duplicate work, reduce cost, and increase parallelism.

Job 1 - assets_and_checks:

  1. This job would be responsible for asset precompilation and any other global lint checks (like Rubocop, ESLint, etc.)
  2. It would save the result of the asset precompilation to the workspace which the 2 jobs below would use.
  3. Workspaces are just another type of cache on CircleCI that’s scoped to the current run. It’s a way to share data between different jobs of the same run/workflow. Maybe a normal cache would do the trick here too if your CI provider doesn’t have a similar concept.
  4. This job will be the first to run, blocking the other 2 jobs below.
  5. And because we only need to do this once, we don’t need to parallelize this job at all.
  6. We can also bump up the resource class (machine size) for this job so it completes faster.

Job 2 - non_system_tests:

  1. Non-system tests are generally much faster than system tests.
  2. This job would be responsible for running all non-system tests.
  3. Splitting things based on system vs non-system tests allows us to not install things only needed for system tests (for example, Google Chrome).
  4. It will also pull in the asset precompilation output generated by the assets_and_checks job.
  5. I parallelized this job across 10 instances and gave it a small resource class because unit tests don’t need a lot of resources.

Job 3 - system_tests:

  1. System tests are generally the slowest tests in a test suite.
  2. This job would be responsible for running all system tests.
  3. It will also need to install various dependencies that we don’t require in other jobs, such as Google Chrome.
  4. It will also pull in the asset precompilation output generated by the assets_and_checks job.
  5. I parallelized this job across 27 instances and gave it the medium resource class.

The assets_and_checks job would block the other 2 jobs, but as soon as it is completed, both non_system_tests and system_tests jobs would start up.

non_system_tests would run tests in parallel with 10 instances while system_tests would use 27 instances.

Based on the kinds of tests your test suite is made up of, you might want to tweak these values. You should try to make both jobs take roughly the same amount of time.

Before, we had a single job that would run in parallel on 14 instances with a medium+ resource class.

Now we have 3 jobs: job 1 on a large resource class (only 1 instance), job 2 on a small resource class (parallelized across 10 instances), and job 3 on a medium resource class (parallelized across 27 instances).

You might expect things to be costlier, but because we now avoid a lot of duplicate work, things end up taking less time + splitting things like this allows us to more efficiently allocate resource classes based on what a job is doing.

And it all turns out to be 3x faster and 50% cheaper!
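
To see why more instances can still cost less, here is a back-of-the-envelope sketch. Both the credits-per-minute rates and the per-instance minutes are assumptions I picked for illustration (check your CI provider’s current pricing), so the output will not match the exact figures in this post. It only shows the shape of the calculation.

```ruby
# Credits per minute for each resource class. These rates are an assumption;
# check your CI provider's current pricing before relying on them.
CREDITS_PER_MINUTE = { small: 5, medium: 10, medium_plus: 15, large: 20 }.freeze

# Cost of one job: every parallel instance is billed for the minutes it runs.
def job_credits(resource_class:, instances:, minutes:)
  CREDITS_PER_MINUTE.fetch(resource_class) * instances * minutes
end

# Per-instance minutes below are invented for illustration, not measured.
before = job_credits(resource_class: :medium_plus, instances: 14, minutes: 24)

after = [
  job_credits(resource_class: :large,  instances: 1,  minutes: 2), # assets_and_checks
  job_credits(resource_class: :small,  instances: 10, minutes: 6), # non_system_tests
  job_credits(resource_class: :medium, instances: 27, minutes: 6)  # system_tests
].sum

puts "before: #{before} credits, after: #{after} credits"
```

The point is that cost scales with instances × minutes × rate, so cutting duplicate work per instance and using smaller machines where possible can outweigh the extra instances.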

I’ve pasted our CircleCI config at the bottom if you need it for reference.

5. Do a shallow Git clone

If you have a huge Git repository with tens of thousands of commits and are noticing that checking out your GitHub repository takes > 20 seconds, you should try doing a shallow checkout.

A shallow Git checkout will only pull commits up to a certain depth, instead of pulling all commits. This small improvement can reduce the time it takes to pull your changes as well as bandwidth costs.

This is done by specifying the depth argument when cloning: git clone --depth 5. Here’s a good article if you want to learn more.

Your CI platform might have documentation/plugins on how to do a shallow git clone so look that up.

For CircleCI specifically, I just went with the datacamp/shallow-checkout orb.

6. Disable unnecessary logging

We were logging stuff in CI which was useless 99.99% of the time. It’s another simple change you can do to save some time.

In our case, we chose to disable logging behind an environment variable called DISABLE_TEST_LOGGING and we set this to 1 on CI. So we could easily enable it later to debug something if needed.

For a Rails app, this is what that looked like:

# In config/environments/test.rb

Rails.application.configure do
  ...
  ...
  config.log_level = :fatal if ENV['DISABLE_TEST_LOGGING'] == '1'
  ActiveRecord.verbose_query_logs = ENV['DISABLE_TEST_LOGGING'] != '1'
end

7. Run system tests in headless mode

Chrome supports running in headless mode where it doesn’t render any visible UI - which is what we want in a CI environment anyway. Screenshots and everything else should continue to work as normal.

You can also put this behavior behind an environment variable and only run headless in CI.

Here’s what our configuration looked like (we use Capybara, Selenium, Chrome):

# In test/application_system_test_case.rb

class ApplicationSystemTestCase < ActionDispatch::SystemTestCase
  ...
  driven_by :chrome
  ...
end

def chrome(app)
  options = Selenium::WebDriver::Options.chrome(
    args: [
      'disable-gpu',
      'no-sandbox',
      'window-size=1400,1400',
      ENV['HEADLESS'] == '1' ? 'headless=new' : nil
    ].compact
  )

  client = Selenium::WebDriver::Remote::Http::Default.new

  Capybara::Selenium::Driver.new(app,
                                 browser:     :chrome,
                                 options:,
                                 http_client: client)
end

Capybara.register_driver(:chrome) { |app| chrome(app) }

8. Do less unnecessary work

In our app, we allow users to sign up/sign in like most other apps. We store hashes of passwords and use bcrypt for hashing (through bcrypt-ruby).

bcrypt allows us to change the cost factor associated with hashing. Higher cost = more work required = more security.

Also, as you increase the value of cost, the amount of work required to hash increases exponentially.

Now we want this cost value to be reasonably high in production (12-14 from what I can see seems to be the norm), but for tests, we don’t care.

Tests usually do a lot of user creation (e.g. through factories in model tests or system tests).

So if we just lower the value for cost in the test environment, we will make our tests a bit faster with no downsides.
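
To put a rough number on that, bcrypt performs 2^cost rounds of its key setup, so each increment of the cost doubles the work. A quick sketch of the arithmetic (it does not call bcrypt itself, it only compares round counts):

```ruby
# bcrypt runs 2**cost rounds of its key setup, so each +1 in cost doubles the work.
def bcrypt_rounds(cost)
  2**cost
end

production_cost = 12 # within the 12-14 range mentioned above
test_cost       = 4  # the lowest cost bcrypt accepts

ratio = bcrypt_rounds(production_cost) / bcrypt_rounds(test_cost)
puts "cost #{production_cost} does #{ratio}x the work of cost #{test_cost}"
```

So every password hashed in a test at cost 12 does a few hundred times more work than it would at the minimum cost.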

Here’s what I did for our case:

# In config/environments/test.rb

Rails.application.configure do
  ...
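  # 4 is the lowest cost bcrypt allows; bcrypt-ruby raises lower values to it.
  BCrypt::Engine.cost = BCrypt::Engine::MIN_COST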
end

If you’re using a third-party authentication solution (e.g. through a gem like devise), you might want to check if it allows configuring something like this.

For devise specifically, it does use bcrypt, and it allows you to change the cost like so:

# In config/initializers/devise.rb

Devise.setup do |config|
  ...
  config.stretches = Rails.env.test? ? 1 : 12
end

This was just one such example.

There might be other things you can do that are specific to your application. Some things that come to my mind are not loading:

Some of these will be fairly specific to your application. We also don’t want to make our test environment completely different from production by disabling everything.

It’s up to you to decide what things might be worth it to disable/exclude.

9. Improve your factories or just use fixtures

This might be a bit controversial but hear me out.

While doing all these improvements, one thing that stuck out to me was how slow some of our unit tests were when they had no reason to be.

No external API calls or any other obvious things that might explain the slowness. What one would expect to complete in milliseconds took seconds.

Now I’ve always heard that factories can get reallyyyyy slow if done poorly. After some profiling, I found out how slow reallyyyyy slow is 🤯.

This is a really good article about profiling factories created using factory_bot.

As the article describes, we were suffering from a lot of factory cascades.

It’s where you create an instance of an object using a factory but because of how interconnected things can be you end up creating 10s or 100s of associated objects.

Additionally, in our case, there were a couple of god models that were pretty slow to create.

I fixed what I could by making things re-use objects created already by other associations.
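
Here is a minimal plain-Ruby sketch of what a cascade looks like and why re-using objects helps. It does not use factory_bot, and the Account/User/Post/Comment models are hypothetical. Each "factory" just counts how many records it would create.

```ruby
# Plain-Ruby stand-in for factories, counting how many records get built.
# Every association defaults to building a brand new record, like a naive factory.
class FakeFactory
  attr_reader :created

  def initialize
    @created = Hash.new(0)
  end

  def account
    @created[:account] += 1
    { type: :account }
  end

  def user(account: self.account)
    @created[:user] += 1
    { type: :user, account: account }
  end

  def post(author: user)
    @created[:post] += 1
    { type: :post, author: author }
  end

  def comment(post: self.post, author: user)
    @created[:comment] += 1
    { type: :comment, post: post, author: author }
  end
end

# Cascade: each comment builds its own post, two users, and two accounts.
cascading = FakeFactory.new
3.times { cascading.comment }
puts cascading.created.values.sum # 18

# Re-use: build the author and post once and pass them in.
reusing = FakeFactory.new
author  = reusing.user
post    = reusing.post(author: author)
3.times { reusing.comment(post: post, author: author) }
puts reusing.created.values.sum # 6
```

Three comments cost 18 records in the first case and 6 in the second, and real cascades with deeper association graphs grow much faster than this.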

I also eliminated some usage of slow factories and replaced them with fixtures. It improved some of our unit tests significantly.

One of the most egregious examples was a model test that took 150s, which after this change took 40s. Still super slow for a model test, but much better than what it was before.

I avoid factories whenever possible now for new tests.

In my experience, it’s very easy to introduce factory cascades. Even if you believe you can write great factories (which I certainly don’t BTW), that might not be true for other members of your team.

10. Split up long test files

Based on how you split tests, you might not need this.

IIRC the test splitting method we use would split tests by the amount of time it takes but it would only split by file. That means a single test file with a lot of tests which takes 5 minutes would always run on a single instance.

In some cases, this might lead to outliers where your whole build is waiting on a single instance that is executing a test file that has a lot of tests.

Just splitting tests into multiple files won’t automatically speed up your tests, but it might allow the test-splitting logic to more efficiently split tests and reduce outliers like that.
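
A small sketch of that effect, assuming a simple greedy split by file (not necessarily what your CI platform does) and made-up durations in seconds:

```ruby
# Returns the duration of the busiest instance after a greedy, file-level split:
# files are taken slowest first and each goes to the least loaded instance.
def slowest_instance(file_durations, instances)
  loads = Array.new(instances, 0)
  file_durations.sort.reverse_each do |seconds|
    index = loads.index(loads.min)
    loads[index] += seconds
  end
  loads.max
end

one_big_file = [300] + [60] * 6     # a 5-minute file plus six 1-minute files
split_files  = [100] * 3 + [60] * 6 # same tests, with the big file split into three

puts slowest_instance(one_big_file, 4) # 300
puts slowest_instance(split_files, 4)  # 180
```

The total amount of test time is identical in both cases. The build just stops waiting on the one instance stuck with the 5-minute file.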

11. Make your test environment closer to production and faster in CI

This is a bit Rails-specific. If you’re using something else, there might be similar things you can do.

Rails comes with a bunch of configuration options that allow you to configure how things like assets are served or whether caching is enabled. In CI, we want to optimize those options for speed. Similar to what you might do in production.

Your setup might be different, and some of these might be irrelevant to you. But here are the relevant bits from what I did:

# In config/environments/test.rb

Rails.application.configure do
  ...
  if ENV['CI']
    config.assets.compile = false
    config.assets.digest  = true
    config.action_controller.perform_caching = true
    config.action_mailer.perform_caching = true
  else
    config.assets.digest  = false
  end
end

We chose to do this only on CI builds (as evidenced by the use of the environment variable CI) instead of the test environment generally because we also run tests locally while in development where these changes weren’t desired.

12. CircleCI config for reference:

version: 2.1

#=========================================
orbs:
  # Replace x.y.z with the orb versions you want to pin.
  ruby: circleci/ruby@x.y.z
  node: circleci/node@x.y.z
  browser-tools: circleci/browser-tools@x.y.z
  shallow-checkout: datacamp/shallow-checkout@x.y.z
#=========================================

#=========================================
executors:
  ignis:
    docker:
      - image: cimg/ruby:3.3.0-browsers
        environment:
          RAILS_ENV: test
          DB_HOST: 127.0.0.1
          MYSQL_HOST: 127.0.0.1
          HEADLESS: 1
          DISABLE_TEST_LOGGING: 1
      - image: cimg/mariadb:10.6.8
#=========================================

#=========================================
precompile_cache_paths: &precompile_cache_paths
  paths:
    - tmp/cache/assets
    - tmp/cache/vite
    - public/assets
    - public/packs
    - public/packs-test
    - public/vite-test
#=========================================

#=========================================
commands:
  apt_update_and_install:
    description: "Runs apt update & install with some base dependencies + allows a parameter to specify extra deps"
    parameters:
      extra_deps:
        type: string
        default: ""
    steps:
      - run:
          command: sudo apt-get update
      - run:
          command: sudo apt install -y poppler-utils libpoppler-glib-dev << parameters.extra_deps >>

  install_gems:
    steps:
      - restore_cache:
          keys:
            - ignis-v3-ruby-{{ checksum "./Gemfile.lock"  }}
            - ignis-v3-ruby
      - run:
          name: Bundle Install
          working_directory: .
          command: |
            bundle config set path ./vendor/bundle
            (bundle check || bundle install) && bundle clean --force
      - save_cache:
          key: ignis-v3-ruby-{{ checksum "./Gemfile.lock"  }}
          paths:
            - ./vendor/bundle

  copy_db_config_and_wait_for_db:
    steps:
      - run: mv config/database.yml.circleci config/database.yml
      - run:
          name: Wait for DB
          command: dockerize -wait tcp://localhost:3306 -timeout 1m

  run_tests:
    parameters:
      test_files:
        type: string
    steps:
      - run:
          name: Load schema
          command: bin/rails db:schema:load
      - run:
          name: Run tests
          command: |
            TESTFILES=$(<< parameters.test_files >> | circleci tests split --split-by=timings)
            bin/rails test --verbose -- ${TESTFILES}
      - store_test_results:
          path: test/reports
#=========================================

#=========================================
workflows:
  workflow:
    jobs:
      - assets_and_checks:
          filters:
            branches:
              ignore:
                - master
      - non_system_tests:
          requires:
            - assets_and_checks
      - system_tests:
          requires:
            - assets_and_checks
#=========================================

#=========================================
jobs:
  assets_and_checks:
    resource_class: large
    executor: ignis
    parallelism: 1
    steps:
      - shallow-checkout/checkout
      - run:
          name: Uninstall built-in nodejs
          command: sudo rm `which node`
      - node/install:
          install-yarn: true
          node-version: "20"
      - apt_update_and_install:
          extra_deps: "python3"
      - install_gems
      - restore_cache:
          keys:
            - ignis-v7-yarn-cache-{{ checksum "yarn.lock" }}
            - ignis-v7-yarn-cache
      - restore_cache:
          keys:
            - &precompile_cache_keys ignis-v7-precompile-cache-{{ .Branch }}-{{ .Revision }}
            - ignis-v7-precompile-cache-{{ .Branch }}
            - ignis-v7-precompile-cache
      - copy_db_config_and_wait_for_db
      - run:
          name: Precompile assets
          command: bin/rails assets:precompile --trace
      - save_cache:
          key: ignis-v7-yarn-cache-{{ checksum "yarn.lock" }}
          paths:
            - node_modules
      - run:
          name: Check for typescript errors
          command: yarn tsc --noEmit
      - run:
          name: Load schema
          command: bin/rails db:schema:load
      - run:
          name: Check for schema changes
          command: bin/rails graphql:schema:check
      - run:
          name: Run zeitwerk:check
          command: bin/rails zeitwerk:check
      - save_cache:
          key: *precompile_cache_keys
          <<: *precompile_cache_paths
      - persist_to_workspace:
          root: .
          <<: *precompile_cache_paths

  non_system_tests:
    resource_class: small
    executor: ignis
    parallelism: 10
    steps:
      - shallow-checkout/checkout
      - apt_update_and_install
      - install_gems
      - copy_db_config_and_wait_for_db
      - attach_workspace:
          at: .
      - run_tests:
          test_files: circleci tests glob "test/**/*_test.rb" | sed '/test\/system/d'

  system_tests:
    resource_class: medium
    executor: ignis
    parallelism: 27
    steps:
      - shallow-checkout/checkout
      - apt_update_and_install:
          extra_deps: "libvips libnss3-tools"
      - browser-tools/install-browser-tools:
          install-firefox: false
          install-geckodriver: false
          chrome-version: 116.0.5845.96
          replace-existing-chrome: true
      - install_gems
      - copy_db_config_and_wait_for_db
      - attach_workspace:
          at: .
      - run_tests:
          test_files: circleci tests glob "test/system/**/*_test.rb"
      - store_artifacts:
          path: tmp/screenshots
#=========================================
