Init script for Cassandra with docker-compose

I solved this problem by patching cassandra's docker-entrypoint.sh so it will execute sh and cql files located in /docker-entrypoint-initdb.d on startup. This is similar to how MySQL docker containers work.

Basically, I add a small script at the end of the docker-entrypoint.sh (right before the last line, exec "$@"), that will run the cql scripts once cassandra is up. A simplified version is:

INIT_DIR=docker-entrypoint-initdb.d
# this whole block will execute in the background
(
    cd $INIT_DIR
    # wait for cassandra to be ready
    while ! cqlsh -e 'describe cluster' > /dev/null 2>&1; do sleep 6; done
    echo "$0: Cassandra cluster ready: executing cql scripts found in $INIT_DIR"
    # find and execute cql scripts, in name order
    for f in $(find . -type f -name "*.cql" -print | sort); do
        echo "$0: running $f"
        cqlsh -f "$f"
        echo "$0: $f executed"
    done
) &

This solution works for all cassandra versions (at least until 3.11, as the time of writing).

Hence, you only have to build and use this cassandra image version, and then add proper initializations scripts to the container using docker-compose volumes.

A complete gist with a more robust entrypoint patch (and example) is available here.


We recently tried to solve a similar problem in KillrVideo, a reference application for Cassandra. We are using Docker Compose to spin up the environment needed by the application which includes a DataStax Enterprise (i.e. Cassandra) node. We wanted that node to do some bootstrapping the first time it was started to install the CQL schema (using cqlsh to run the statements in a .cql file just like you're trying to do). Basically the approach we took was to write a shell script for our Docker entrypoint that:

  1. Starts the node normally but in the background.
  2. Waits until port 9042 is available (this is where clients connect to run CQL statements).
  3. Uses cqlsh -f to run some CQL statements and init the schema.
  4. Stops the node that's running in the background.
  5. Continues on to the usual entrypoint for our Docker image that starts up the node normally (in the foreground like Docker expects).

We just use the existence of a file to indicate whether the node has already been bootstrapped and check that on startup to determine whether we need to do that logic above or can just start it normally. You can see the results in the killrvideo-dse-docker repository on GitHub.

There is one caveat to this approach. This worked great for us because in our reference application, we're only spinning up a single node (i.e. we aren't creating a cluster with more than one node). If you're running multiple nodes, you'll probably want to make sure that only one of the nodes does the bootstrapping to create the schema because multiple clients modifying the schema simultaneously can cause some issues with your cluster. (This is a known issue and will hopefully be fixed at some point.)


I was also searching for the solution to this question, and here is the way how I accomplished it.
Here the second instance of Cassandra has a volume with the schema.cql and runs CQLSH command

My Version with healthcheck so we can get rid of sleep command

version: '2.2'

services:
  cassandra:
      image: cassandra:3.11.2
      container_name: cassandra
      ports:
        - "9042:9042"
      environment:
        - "MAX_HEAP_SIZE=256M"
        - "HEAP_NEWSIZE=128M"
      restart: always
      volumes:
        - ./out/cassandra_data:/var/lib/cassandra
      healthcheck:
        test: ["CMD", "cqlsh", "-u cassandra", "-p cassandra" ,"-e describe keyspaces"]
        interval: 15s
        timeout: 10s
        retries: 10

  cassandra-load-keyspace:
      container_name: cassandra-load-keyspace
      image: cassandra:3.11.2
      depends_on:
        cassandra:
          condition: service_healthy
      volumes:
        - ./src/main/resources/cassandra_schema.cql:/schema.cql
      command: /bin/bash -c "echo loading cassandra keyspace && cqlsh cassandra -f /schema.cql"

NetFlix Version using sleep

version: '3.5'

services:
  cassandra:
      image: cassandra:latest
      container_name: cassandra
      ports:
        - "9042:9042"
      environment:
        - "MAX_HEAP_SIZE=256M"
        - "HEAP_NEWSIZE=128M"
      restart: always
      volumes:
        - ./out/cassandra_data:/var/lib/cassandra

  cassandra-load-keyspace:
      container_name: cassandra-load-keyspace
      image: cassandra:latest
      depends_on:
        - cassandra
      volumes:
        - ./src/main/resources/cassandra_schema.cql:/schema.cql 
      command: /bin/bash -c "sleep 60 && echo loading cassandra keyspace && cqlsh cassandra -f /schema.cql"

P.S I found this way at one of the Netflix Repos