How to avoid reinstalling dependencies for each job in Gitlab CI

I prefer use cache because removes files when pipeline finished.

Example

image: node

stages:
 - install
 - test
 - compile

cache:
 key: modules
 paths:
  - node_modules/

install:modules:
 stage: install
 cache:
  key: modules
  paths:
    - node_modules/
  after_script:
   - node -v && npm -v
  script:
  - npm i

test:
 stage: test
 cache:
   key: modules
   paths:
     - node_modules/
   policy: pull
 before_script:
  - node -v && npm -v
 script:
- npm run test

compile:
 stage: compile
 cache:
 key: modules
 paths:
   - node_modules/
 policy: pull
 script:
  - npm run build

EDIT: This solution was recommended in 2016. In 2021, you might consider the caching docs instead.

A better approach these days is to make use of artifacts.

In the following example, the node_modules/ directory is immediately available to the lint job once the build stage has completed successfully.

build:
  stage: build
  script:
    - npm install -q
    - npm run build
  artifacts:
    paths:
      - node_modules/
  expire_in: 1 week

lint:
  stage: test
  script:
    - npm run lint

Update: I now recommend using artifacts with a short expire_in. This is superior to cache because it only has to write the artifact once per pipeline whereas the cache is updated after every job. Also the cache is per runner so if you run your jobs in parallel on multiple runners it's not guaranteed to be populated, unlike artifacts which are stored centrally.


Gitlab CI 8.2 adds runner caching which lets you reuse files between builds. However I've found this to be very slow.

Instead I've implemented my own caching system using a bit of shell scripting:

before_script:
  # unique hash of required dependencies
  - PACKAGE_HASH=($(md5sum package.json))
  # path to cache file
  - DEPS_CACHE=/tmp/dependencies_${PACKAGE_HASH}.tar.gz
  # Check if cache file exists and if not, create it
  - if [ -f $DEPS_CACHE ];
    then
      tar zxf $DEPS_CACHE;
    else
      npm install --quiet;
      tar zcf - ./node_modules > $DEPS_CACHE;
    fi

This will run before every job in your .gitlab-ci.yml and only install your dependencies if package.json has changed or the cache file is missing (e.g. first run, or file was manually deleted). Note that if you have several runners on different servers, they will each have their own cache file.

You may want to clear out the cache file on a regular basis in order to get the latest dependencies. We do this with the following cron entry:

@daily               find /tmp/dependencies_* -mtime +1 -type f -delete

From docs:

  • cache: Use for temporary storage for project dependencies. Not useful for keeping intermediate build results, like jar or apk files. Cache was designed to be used to speed up invocations of subsequent runs of a given job, by keeping things like dependencies (e.g., npm packages, Go vendor packages, etc.) so they don’t have to be re-fetched from the public internet. While the cache can be abused to pass intermediate build results between stages, there may be cases where artifacts are a better fit.

  • artifacts: Use for stage results that will be passed between stages. Artifacts were designed to upload some compiled/generated bits of the build, and they can be fetched by any number of concurrent Runners. They are guaranteed to be available and are there to pass data between jobs. They are also exposed to be downloaded from the UI. Artifacts can only exist in directories relative to the build directory and specifying paths which don’t comply to this rule trigger an unintuitive and illogical error message (an enhancement is discussed at https://gitlab.com/gitlab-org/gitlab-ce/issues/15530 ). Artifacts need to be uploaded to the GitLab instance (not only the GitLab runner) before the next stage job(s) can start, so you need to evaluate carefully whether your bandwidth allows you to profit from parallelization with stages and shared artifacts before investing time in changes to the setup.

So, I use cache. When don't need to update de cache (eg. build folder in a test job), I use policy: pull (see here).