AWS Elastic Beanstalk, running a cronjob

I spoke to an AWS support agent, and this is how we got it working for me (2015 solution):

Create a file named your_file_name.config in your .ebextensions directory. In the config file, put:

files:
  "/etc/cron.d/cron_example":
    mode: "000644"
    owner: root
    group: root
    content: |
      * * * * * root /usr/local/bin/cron_example.sh

  "/usr/local/bin/cron_example.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/bash

      /usr/local/bin/test_cron.sh || exit
      echo "Cron running at " `date` >> /tmp/cron_example.log
      # Now do tasks that should only run on 1 instance ...

  "/usr/local/bin/test_cron.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/bash

      METADATA=/opt/aws/bin/ec2-metadata
      INSTANCE_ID=`$METADATA -i | awk '{print $2}'`
      # Region = Availability Zone without its trailing letter (awk's substr() counts from 1).
      REGION=`$METADATA -z | awk '{print substr($2, 1, length($2)-1)}'`

      # Find our Auto Scaling Group name.
      ASG=`aws ec2 describe-tags --filters "Name=resource-id,Values=$INSTANCE_ID" \
        --region $REGION --output text | awk '/aws:autoscaling:groupName/ {print $5}'`

      # Find the first instance in the Group
      FIRST=`aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names $ASG \
        --region $REGION --output text | awk '/InService$/ {print $4}' | sort | head -1`

      # Test if they're the same.
      [ "$FIRST" = "$INSTANCE_ID" ]

commands:
  rm_old_cron:
    command: "rm *.bak"
    cwd: "/etc/cron.d"
    ignoreErrors: true
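The REGION line in test_cron.sh turns the Availability Zone reported by ec2-metadata -z (for example us-east-1a) into a region name by dropping the trailing zone letter. Here is a minimal bash sketch of just that step. It takes the zone as an argument instead of reading instance metadata, and the function name az_to_region is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: derive the region from an Availability Zone name
# by removing its last character (the zone letter).
az_to_region() {
  local az="$1"
  # ${az%?} strips exactly one trailing character.
  echo "${az%?}"
}

# Usage: ec2-metadata -z prints a line such as "placement: us-east-1a".
line="placement: us-east-1a"
az_to_region "$(echo "$line" | awk '{print $2}')"   # prints us-east-1
```

Shell parameter expansion avoids awk entirely, so there is no need to worry about substr() positions for this step.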

This solution has 2 drawbacks:

  1. On subsequent deployments, Beanstalk renames the existing cron file with a .bak extension, but cron will still run it. Your cron job now executes twice on the same machine.
  2. If your environment scales up, you get several instances, all running your cron script. This means your mail shots are sent repeatedly, or your database archives are duplicated.
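Drawback 1 is what the rm_old_cron command in the config above guards against: it deletes the leftover .bak copies from /etc/cron.d. The sketch below is a bash version of the same cleanup. It takes the directory as a parameter so you can try it on a scratch directory instead of the real /etc/cron.d. The function name remove_bak_crons is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: delete leftover *.bak cron files that remain after
# a redeploy, so cron does not run the old copy as well as the new one.
remove_bak_crons() {
  local dir="${1:-/etc/cron.d}"
  # -f: no error when nothing matches (similar to ignoreErrors: true).
  rm -f -- "$dir"/*.bak
}

# Usage against a scratch directory:
tmp=$(mktemp -d)
touch "$tmp/cron_example" "$tmp/cron_example.bak"
remove_bak_crons "$tmp"
ls "$tmp"   # prints cron_example
rm -rf "$tmp"
```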

Workaround:

  1. Ensure any .ebextensions script which creates a cron also removes the .bak files on subsequent deployments.
  2. Have a helper script that does the following:
       1. Gets the current instance ID from the instance metadata.
       2. Gets the current Auto Scaling group name from the EC2 tags.
       3. Gets the list of EC2 instances in that group, sorted alphabetically.
       4. Takes the first instance from that list.
       5. Compares the instance ID from step 1 with the first instance ID from step 4.
     Your cron scripts can then use this helper script to decide whether they should execute.
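Once the IDs have been fetched, the helper's decision comes down to one comparison. Below is a minimal bash sketch of steps 3 to 5. It assumes the instance IDs are passed in as arguments (the real test_cron.sh fetches them with ec2-metadata and the AWS CLI). The function name and the sample IDs are hypothetical:

```shell
#!/bin/bash
# Hypothetical offline version of the decision made by test_cron.sh.
# Arguments: this instance's ID, then the IDs of the group's InService
# instances. In the real script these come from ec2-metadata and the AWS CLI.
is_first_instance() {
  local self="$1" first
  shift
  # Steps 3 and 4: sort the group's instance IDs and take the first one.
  first=$(printf '%s\n' "$@" | sort | head -n 1)
  # Step 5: succeed (exit status 0) only if this instance is that first one.
  [ -n "$self" ] && [ "$first" = "$self" ]
}

# Usage: a cron script bails out unless it runs on the chosen instance.
if is_first_instance i-0b2 i-0c3 i-0a1 i-0b2; then
  echo "run tasks"
else
  echo "skip"   # printed here, because i-0a1 sorts first
fi
```

Because every instance sorts the same list the same way, all of them agree on which single instance runs the job, as long as they all see the same InService list.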

Caveat:

  • The IAM role used for the Beanstalk instances needs the ec2:DescribeTags and autoscaling:DescribeAutoScalingGroups permissions.
  • The candidate instances are the ones Auto Scaling shows as InService. This does not necessarily mean they are fully booted and ready to run your cron job.

You would not have to set up these IAM role permissions if you are using the default Beanstalk role.


Regarding jamieb's response, and as alrdinleal mentions, you can use the 'leader_only' property to ensure that only one EC2 instance runs the cron job.

Quote taken from http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html:

you can use leader_only. One instance is chosen to be the leader in an Auto Scaling group. If the leader_only value is set to true, the command runs only on the instance that is marked as the leader.

I'm trying to achieve a similar thing on my EB environment, so I will update my post if I solve it.

UPDATE:

OK, I now have working cron jobs using the following EB config:

files:
  "/tmp/cronjob" :
    # 000644 is enough: the file only needs to be read by crontab.
    mode: "000644"
    owner: ec2-user
    group: ec2-user
    content: |
      # clear expired baskets
      */10 * * * * /usr/bin/wget -o /dev/null http://blah.elasticbeanstalk.com/basket/purge > $HOME/basket_purge.log 2>&1
      # clean up files created by above cronjob
      30 23 * * * rm $HOME/purge*
    encoding: plain
container_commands:
  # container_commands run after the "commands" section and in alphabetical
  # order by name, so the temp file is deleted only after crontab has read it.
  01_purge_basket:
    command: crontab /tmp/cronjob
    leader_only: true
  02_delete_cronjob_file:
    command: rm /tmp/cronjob

Essentially, I create a temp file with the cron jobs, load it with crontab on the leader instance only, and then delete the temp file. Hope this helps.
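The same temp-file-then-load idea can also be written as a small standalone bash function. This is a sketch and not the author's config. The function name is hypothetical, the entries are read from standard input, and the loader is a parameter so you can do a dry run with cat instead of touching a real crontab:

```shell
#!/bin/bash
# Hypothetical sketch of the temp-file pattern: copy cron entries from stdin
# into a temp file, pass that file to a loader (crontab by default), then
# delete the temp file whether or not loading succeeded.
load_cron_entries() {
  local loader="${1:-crontab}"
  local tmpfile status
  tmpfile=$(mktemp) || return 1
  cat > "$tmpfile"
  "$loader" "$tmpfile"
  status=$?
  rm -f "$tmpfile"
  return "$status"
}

# Dry run: use cat as the loader to print what crontab would receive.
printf '%s\n' '30 23 * * * rm $HOME/purge*' | load_cron_entries cat
```

Keep in mind that crontab FILE replaces the invoking user's entire crontab. It does not append to it.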


This is how I added a cron job to Elastic Beanstalk:

Create a folder called .ebextensions at the root of your application if it doesn't exist already. Then create a config file inside the .ebextensions folder; I'll use example.config for illustration purposes. Then add this to example.config:

container_commands:
  01_some_cron_job:
    command: "cat .ebextensions/some_cron_job.txt > /etc/cron.d/some_cron_job && chmod 644 /etc/cron.d/some_cron_job"
    leader_only: true

This is a YAML configuration file for Elastic Beanstalk. When you copy this into your text editor, make sure the editor uses spaces instead of tabs. Otherwise you'll get a YAML error when you push this to EB.

So what this does is create a container command called 01_some_cron_job. Container commands are run in alphabetical order of their names, so the 01 prefix makes sure it runs first.

The command then takes the contents of a file called some_cron_job.txt and writes them to a file called some_cron_job in /etc/cron.d, overwriting any previous copy.

The command then sets the permissions on /etc/cron.d/some_cron_job to 644 (readable by everyone, writable only by the owner).

The leader_only key ensures the command runs only on the EC2 instance that is considered the leader, rather than on every EC2 instance you may have running.
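What 01_some_cron_job does on the leader can be reproduced as a plain bash function. The sketch below takes the source file and the target directory as parameters so you can try it outside /etc/cron.d. The function name is hypothetical, and the defaults mirror the paths used in the config:

```shell
#!/bin/bash
# Hypothetical sketch of the 01_some_cron_job container command, with the
# source file and target directory as parameters.
install_cron_file() {
  local src="${1:-.ebextensions/some_cron_job.txt}"
  local dest_dir="${2:-/etc/cron.d}"
  local dest="$dest_dir/some_cron_job"
  # '>' overwrites any existing copy; 644 = owner read/write, others read.
  cat "$src" > "$dest" && chmod 644 "$dest"
}

# Usage against a scratch directory:
work=$(mktemp -d)
printf '%s\n' '* * * * * root /usr/bin/php some-php-script-here > /dev/null' > "$work/job.txt"
install_cron_file "$work/job.txt" "$work"
ls -l "$work/some_cron_job"
```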

Then create a file called some_cron_job.txt inside the .ebextensions folder. You will place your cron jobs in this file.

So for example:

# The newline at the end of this file is extremely important.  Cron won't run without it.
* * * * * root /usr/bin/php some-php-script-here > /dev/null

So this cron job will run every minute of every hour of every day as the root user, sending its standard output to /dev/null (error output is not redirected). /usr/bin/php is the path to PHP; replace some-php-script-here with the path to your PHP file. This obviously assumes your cron job needs to run a PHP file.
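In an /etc/cron.d file, each line has five schedule fields (minute, hour, day of month, month, day of week), followed by the user and the command. The small bash sketch below splits the example line that way. The function name is hypothetical, and it only illustrates the layout; it is not a real cron parser:

```shell
#!/bin/bash
# Hypothetical helper: split an /etc/cron.d line into its schedule fields,
# the user, and the command.
describe_cron_line() {
  # read does not glob-expand '*', and the last variable receives the rest
  # of the line, so the command keeps its spaces and redirection.
  printf '%s\n' "$1" | {
    read -r min hour dom mon dow user cmd
    echo "minute=$min hour=$hour day-of-month=$dom month=$mon day-of-week=$dow"
    echo "user=$user"
    echo "command=$cmd"
  }
}

describe_cron_line '* * * * * root /usr/bin/php some-php-script-here > /dev/null'
# minute=* hour=* day-of-month=* month=* day-of-week=*
# user=root
# command=/usr/bin/php some-php-script-here > /dev/null
```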

Also, make sure the some_cron_job.txt file has a newline at the end of the file just like the comment says. Otherwise cron won't run.
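If you want a guard against forgetting that final newline, you could run a check before deploying. The sketch below is a hypothetical bash helper, not part of Elastic Beanstalk. It appends a newline only when the file's last byte isn't already one:

```shell
#!/bin/bash
# Hypothetical helper: append a newline if the file doesn't end with one,
# since (as noted above) cron won't pick up the entry without it.
ensure_trailing_newline() {
  local f="$1"
  # tail -c 1 prints the last byte; command substitution strips a trailing
  # newline, so a non-empty result means the file does not end with '\n'.
  if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
    printf '\n' >> "$f"
  fi
}

# Usage, from the application root before deploying:
ensure_trailing_newline .ebextensions/some_cron_job.txt
```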

Update: There is an issue with this solution when Elastic Beanstalk scales your instances. For example, let's say you have one instance with the cron job running. Traffic increases, so Elastic Beanstalk scales you up to two instances, and leader_only ensures the cron job still runs on only one of them. Then traffic decreases and Elastic Beanstalk scales you back down to one instance. But instead of terminating the second instance, Elastic Beanstalk terminates the first one, which was the leader. You now have no cron jobs running at all, since they only ran on the instance that was terminated. See the comments below.

Update 2: To make this clear from the comments below: AWS now has protection against automatic instance termination. Just enable it on your leader instance and you're good to go. – Nicolás Arévalo Oct 28 '16 at 9:23
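Assuming the feature meant here is Auto Scaling's instance scale-in protection, you can turn it on with the AWS CLI command aws autoscaling set-instance-protection. The caller needs the autoscaling:SetInstanceProtection permission. Below is a hedged bash sketch. The function name, the DRY_RUN switch, and the sample instance ID and group name are hypothetical placeholders:

```shell
#!/bin/bash
# Hypothetical sketch: protect one instance from scale-in so Auto Scaling
# won't choose it for termination. Set DRY_RUN=1 to print the command only.
protect_instance() {
  # Arguments: instance ID, Auto Scaling group name.
  set -- aws autoscaling set-instance-protection \
    --instance-ids "$1" --auto-scaling-group-name "$2" --protected-from-scale-in
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Usage (dry run with placeholder values):
DRY_RUN=1 protect_instance i-0123456789abcdef0 my-asg-name
```

Protection only stops scale-in from choosing that instance. If the instance is replaced for another reason, you will need to protect the new leader again.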


This is the official way to do it now (2015+). Please try this first; it's by far the easiest method currently available, and the most reliable as well.

According to the current docs, you can run periodic tasks on the so-called worker tier.

Citing the documentation:

AWS Elastic Beanstalk supports periodic tasks for worker environment tiers in environments running a predefined configuration with a solution stack that contains "v1.2.0" in the container name. You must create a new environment.

Also interesting is the part about cron.yaml:

To invoke periodic tasks, your application source bundle must include a cron.yaml file at the root level. The file must contain information about the periodic tasks you want to schedule. Specify this information using standard crontab syntax.
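For reference, a minimal cron.yaml has a version key and a list of jobs, each with a name, a url path, and a schedule in crontab syntax. As we understand the worker-tier docs, the daemon sends a POST request to each url on its schedule. The bash sketch below writes such a file at the bundle root. The job name, URL path, and schedule are hypothetical placeholders, and so is the function name:

```shell
#!/bin/bash
# Hypothetical sketch: write a minimal cron.yaml at the root of the source
# bundle. Job name, URL path and schedule are placeholders.
write_cron_yaml() {
  local root="${1:-.}"
  cat > "$root/cron.yaml" <<'EOF'
version: 1
cron:
 - name: "purge-baskets"
   url: "/basket/purge"
   schedule: "*/10 * * * *"
EOF
}

# Usage, from the application root before deploying:
#   write_cron_yaml .
```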

Update: We were able to get this to work. Here are some important gotchas from our experience (Node.js platform):

  • When using a cron.yaml file, make sure you have the latest awsebcli, because older versions will not work properly.
  • It is also vital to create a new environment (at least it was in our case), not just clone an old one.
  • If you want to make sure cron is supported on your EC2 worker tier instance, SSH into it (eb ssh) and run cat /var/log/aws-sqsd/default.log. It should report aws-sqsd 2.0 (2015-02-18). If you don't have version 2.0, something went wrong when creating your environment, and you need to create a new one as stated above.
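Instead of reading the log by eye, you could script the last check. The sketch below is hypothetical. It assumes the log contains a line with text like "aws-sqsd 2.0 (2015-02-18)", as described above; the exact log format is an assumption. It treats a major version of 2 or higher as "periodic tasks supported":

```shell
#!/bin/bash
# Hypothetical helpers: extract the aws-sqsd version from the daemon log and
# check that it is at least 2.x. Assumes a line like "aws-sqsd 2.0 (...)".
sqsd_version() {
  local log="${1:-/var/log/aws-sqsd/default.log}"
  grep -o 'aws-sqsd [0-9][0-9.]*' "$log" 2>/dev/null | tail -n 1 | awk '{print $2}'
}

sqsd_supports_cron() {
  local v
  v=$(sqsd_version "$1")
  # Require a major version of 2 or higher.
  [ -n "$v" ] && [ "${v%%.*}" -ge 2 ]
}

# Usage on the worker instance (after eb ssh):
if sqsd_supports_cron; then
  echo "aws-sqsd $(sqsd_version) supports cron.yaml"
else
  echo "aws-sqsd 2.0 not found; recreate the environment"
fi
```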