Caching with Gradle in GitLab with AWS Autoscale

So you have a Continuous Integration server that runs some build tasks. For example, you’re running ./gradlew buildTaskOne. Now, if you run ./gradlew buildTaskOne again and nothing has changed, you don’t want it to build again; you just want it to say “up-to-date” and move on to the next task. On your local machine this is relatively straightforward to achieve. For Java builds it works out of the box; for other builds a small setup like the following is required:

task installNodeModules(type: Exec) {
    commandLine 'npm', 'i'
    inputs.files 'package.json' // declare inputs so Gradle can do up-to-date checks
    outputs.dir 'node_modules'  // declare the output directory as well
}

The first time this task is run it is executed; the second time it is skipped.
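With a plain console, the output looks roughly like this (the exact wording varies between Gradle versions):

$ ./gradlew installNodeModules
> Task :installNodeModules

$ ./gradlew installNodeModules
> Task :installNodeModules UP-TO-DATE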

That was easy! So how can you achieve the same thing when using GitLab runners in combination with AWS Autoscale? A few more steps are required…

Set Up Distributed S3 Caches

AWS Autoscale uses distinct EC2 instances for each build stage that are dynamically spun up and torn down. This means that for sharing caches we need a central file store, and what could be better suited for this task than S3? GitLab also integrates really well with it. Simply put the following in the config.toml of your AWS Autoscale manager (bastion server):

concurrent = 10
check_interval = 0
log_level = "debug"

[[runners]]
  name = "gitlab-aws-autoscaler"
  url = "<your url>"
  token = "${RUNNER_TOKEN}"
  executor = "docker+machine"
  limit = 6
  [runners.docker]
    image = "taskbase/gitlabrunner:2.4"
    privileged = true
    disable_cache = true
    volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache", "/builds:/builds"]
  [runners.cache]
    Type = "s3"
    ServerAddress = "s3.amazonaws.com"
    AccessKey = "${AWS_ACCESS_KEY_ID}"
    SecretKey = "${AWS_SECRET_ACCESS_KEY}"
    BucketName = "your-bucket-name"
    BucketLocation = "eu-central-1"
    Shared = true

The [runners.cache] section is the part that matters for caching; I’ve included the rest to give you a bit more context. The credentials, bucket name, and bucket location need to be adjusted to your setup, and you’ll need to create the S3 bucket first for this to work.
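If you don’t have a bucket yet, one way to create it is with the AWS CLI; the bucket name and region below are just the placeholders from the config above:

aws s3 mb s3://your-bucket-name --region eu-central-1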

Set Up Caches in .gitlab-ci.yml

You’ll also need to specify what you want to cache in .gitlab-ci.yml. This is a minimal setup for the node_modules example:

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
  - .gradle
  - ./node_modules # important: don't forget to cache the output!

Those directories will get uploaded to S3. Now I’m not entirely sure about this, but I think the directories have to be within your repository. For example, I tried to specify the absolute path /opt/gradle/caches, but it didn’t get uploaded! To make Gradle store its caches in the .gradle folder inside the repository, run the following line in a before_script section of your GitLab CI configuration:

before_script:
- export GRADLE_USER_HOME=`pwd`/.gradle # there is no $USER_HOME, and /opt/gradle/caches can't be uploaded, so local gradle home is chosen
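Put together, a minimal .gitlab-ci.yml could look roughly like this (the job name and the Gradle task are placeholders taken from the example above; adjust them to your pipeline):

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
  - .gradle
  - ./node_modules

before_script:
- export GRADLE_USER_HOME=`pwd`/.gradle

build:
  stage: build
  script:
  - ./gradlew buildTaskOne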

Explanation for the above setup (optional reading)

On AWS Autoscale runners you don’t have a $USER_HOME variable, and Gradle seems to store its caches by default in /opt/gradle/caches. You can verify this with the following Gradle task:

task showMeCache {
    doLast {
        // print where each dependency on the compile classpath is resolved from
        configurations.compile.each { println it }
    }
}

But when trying to store the cache at /opt/gradle/caches nothing seems to happen, so a Gradle home inside the repository is chosen instead. This is achieved by setting the GRADLE_USER_HOME variable to a location within your repo, which is what the command in the before_script is for.

It’s also important to cache the entire .gradle folder and not just .gradle/caches/, since to check whether something is up-to-date Gradle consults the file hashes in .gradle/4.6/fileHashes, at least in Gradle version 4.6. I learned this the hard way by running a git diff on the .gradle folder before and after making a change to the file declared in inputs.files.

Setting Up the build.gradle File

Another important concept when setting up the Gradle caching mechanism is “path sensitivity”. By default, the path sensitivity is ABSOLUTE and not RELATIVE, meaning that when you have a cached build at /my/path/bla and the next build runs at /your/path/bla, the cache will NOT take effect. From what I’ve seen, builds on AWS Autoscale for the same branch run at the same absolute path, so it’s not a problem, but to be sure you could write something like the following:

@PathSensitive(PathSensitivity.RELATIVE)
@InputFiles
FileTree packageJSON() {
    return fileTree('./package.json')
}
task npmInstall (type: Exec) {
    inputs.files packageJSON()
    outputs.dir 'node_modules'
    commandLine 'npm', 'i'
}

or

task npmInstall (type: Exec) {
    inputs.files('package.json').withPathSensitivity(PathSensitivity.RELATIVE)
    outputs.dir 'node_modules'
    commandLine 'npm', 'i'
}

Of course, only apply this if your input files really are path independent. You can also leave it out and introduce it once it becomes necessary.

Checking if Everything is Working

For some reason, the cache doesn’t seem to kick in the first time it should. When we push a change, the caches somehow don’t work yet on the second run, even though they should, but on the third run they do.
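If you want to understand why a task is being re-executed instead of reported as up-to-date, it can help to run the build with info logging, which makes Gradle print its up-to-date reasoning, for example:

./gradlew installNodeModules --info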

Summary

The following steps are minimally required to successfully employ input/output caches:

  • S3 distributed cache setup correctly
  • Gradle Home within repository, tried and tested with .gradle
  • Upload gradle home folder to S3
  • Upload the outputs of the build task to S3 as well. The build task needs a declared output in order to be cacheable.
  • Try at least two times to check whether caching works; the first time the caches should already kick in, they don’t work for some reason…
