Node.js: What's a good way to automatically restart a node server that's not responding?

I figured out how to solve this using native Node functionality. Migg's answer was good and led me in the right direction, but it doesn't show how to automatically restart when the event loop is completely blocked.

The trick is to use the fork method of Node's native child_process module to start the server from another Node instance, and have that instance ping the server for responses and restart it when it's stuck. This is similar to how Forever and PM2 work. It's hard to believe there's no simple way to implement this with either of those tools, but this is how you can do it natively.

I have commented this code heavily to point out what everything is doing. Also note that I am using ES2015 arrow functions. Go read about them if you're not familiar.

var fork = require('child_process').fork;
var server, heartbeat, timer;

function startServer () {
  console.log('Starting server');
  server = fork('server');

  //forget the heartbeat and any pending check left over from the previous instance,
  //otherwise a crash-triggered restart would leave two checks running in parallel
  heartbeat = null;
  clearTimeout(timer);

  //when the server goes down restart it
  server.on('close', (code) => {
    startServer();
  });

  //when server sends a heartbeat message save it
  server.on('message', (message) => {
    heartbeat = message ? message.heartbeat : null;
  });

  //ask the server for a heartbeat
  server.send({request: 'heartbeat'});

  //wait 5 seconds and check if the server responded
  timer = setTimeout(checkHeartbeat, 5000);
}

function checkHeartbeat() {
  if(heartbeat) {
    console.log('Server is alive');

    //clear the heartbeat and send request for a new one
    heartbeat = null;
    server.send({request: 'heartbeat'});

    //set another heartbeat check
    timer = setTimeout(checkHeartbeat, 5000);

  } else {
    console.log('Server looks stuck...killing');
    //killing the server fires 'close' above, which starts a fresh instance
    server.kill();
  }
}

startServer();

Be sure to replace 'server' in the fork call with the path to whatever Node app you want to run.

Now, in your server, add the following to respond to the heartbeat request:

//listen and respond to heartbeat request from parent
process.on('message', (message) => {
  if(message && message.request === 'heartbeat') {
    process.send({heartbeat: 'thump'});
  }
});
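One detail the snippet above glosses over: process.send only exists when the file was started through fork, so a server that is sometimes run directly with node would throw on it. A minimal sketch of a guarded variant follows; installHeartbeatResponder is a made-up helper name, not part of Node.

```javascript
// Hypothetical helper: wires up the heartbeat reply only when an IPC channel exists.
// `proc` is normally the global `process`; it is a parameter so the logic can be tried with a stub.
function installHeartbeatResponder(proc) {
  // send is only defined on a process that was started via child_process.fork
  if (typeof proc.send !== 'function') {
    return false; // running standalone, nothing to answer
  }

  // listen and respond to heartbeat requests from the parent
  proc.on('message', (message) => {
    if (message && message.request === 'heartbeat') {
      proc.send({ heartbeat: 'thump' });
    }
  });
  return true;
}
```

Calling installHeartbeatResponder(process) near the top of the server file should then be safe in both cases: under the watchdog it answers heartbeats, and when run standalone it does nothing.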

Finally, add a timeout to test that it works (not for production!):

//block the event loop after 30 seconds
setTimeout(() => {
  for(;;){}
}, 30000);

First of all you should try to find the problems in the code by reviewing it.

Memory Leaks

To keep the app running, you should use pm2. It has a setting that restarts the app when it consumes too much memory. Directly from the docs:

pm2 start big-array.js --max-memory-restart 20M

Or using an ecosystem.json:

{
    "max_memory_restart" : "20M"
}

There are also several great articles online about debugging memory leaks in Node.js. There is even a module that reports leaks, which we used in the early days. The subject is too big to cover here.

Blocking Event Loop / Infinite Loops

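You can instrument your app to report the responsiveness of the event loop, so that if some code blocks the loop for too long you can programmatically terminate the process. process.nextTick alone is not enough here, because its callbacks run before the event loop moves on; a timer is the usual tool.

For example, schedule a timer every X seconds and measure how late it fires. If the delay exceeds some defined threshold, call process.exit(1) to terminate the process and let pm2 restart it. Note that this only catches blocks that eventually end: during a true infinite loop the timer callback never runs, so that case needs an external watchdog.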

The upside of this would be that your app runs most of the time. The downside would be that all users with open connections would get no answer when process.exit is called.
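The measurement described above can be sketched as follows. This version uses setInterval rather than process.nextTick, since a timer's lateness reflects how long the loop was busy. The helper names (lagMs, watchEventLoop) and the 1 second / 5 second values are made up for illustration. It only reports a stall after the blocking code has finished, so it cannot rescue a loop that never ends.

```javascript
// Hypothetical helper: how many milliseconds later than scheduled a timer fired.
function lagMs(expectedAt, firedAt) {
  return Math.max(0, firedAt - expectedAt);
}

// Hypothetical helper: checks every intervalMs and calls onStall(lag)
// whenever the check ran more than maxLagMs late.
function watchEventLoop(intervalMs, maxLagMs, onStall) {
  let expectedAt = Date.now() + intervalMs;
  const timer = setInterval(() => {
    const now = Date.now();
    const lag = lagMs(expectedAt, now);
    expectedAt = now + intervalMs; // schedule the next expectation from now
    if (lag > maxLagMs) {
      onStall(lag);
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for the watchdog
  return timer;
}

// Assumed tuning values: check every second, give up after a 5 second stall.
watchEventLoop(1000, 5000, (lag) => {
  console.error('Event loop was blocked for about ' + lag + ' ms, exiting');
  process.exit(1); // pm2 (or another supervisor) is expected to restart the app
});
```

Exiting with a non-zero code leaves the restart to the supervisor, which keeps this check small. The trade-off about dropped connections mentioned above still applies.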

Debugging

To find memory leaks and other problems in the running code, you should dive into https://www.joyent.com/developers/node/debug. It has a whole section about MDB, which will help you find the problems, though it takes some time to get used to. There is too much material to reproduce here, hence the link.

Best of luck with your app!