How to remove a dead node from a Cassandra cluster?

I had the same problem and I resolved it with removenode, which does not require you to find and change the node token.

First, get the node UUID:

nodetool status

DN  192.168.56.201  ?          256     13.1%  4fa4d101-d8d2-4de6-9ad7-a487e165c4ac  r1
DN  192.168.56.202  ?          256     12.6%  e11d219a-0b65-461e-babc-6485343568f8  r1
UN  192.168.2.91    156.04 KB  256     12.4%  e1a33ed4-d613-47a6-8b3b-325650a2bbd4  RAC1
UN  192.168.2.92    156.22 KB  256     13.6%  3a4a086c-36a6-4d69-8b61-864ff37d03c9  RAC1
UN  192.168.2.93    149.6 KB   256     11.3%  20decc72-8d0a-4c3b-8804-cc8bc98fa9e8  RAC1

As you can see, .201 and .202 are dead and on a different network. Their addresses were changed to .91 and .92 without proper decommissioning and recommissioning. I was setting up the network and made a few mistakes...
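
If several nodes are down, copying UUIDs by hand gets error-prone. The following is a minimal sketch that filters the status output for the Host IDs of nodes in state DN. It assumes the column layout shown above, where the Host ID is the second-to-last column (the Load column may be one or two fields wide, so counting from the left is unreliable). The function name down_host_ids is made up for this example.

```shell
# Print the Host ID of every node reported as DN (down, normal).
# Reads "nodetool status" output on stdin.
# Assumption: the Host ID is the second-to-last column, the rack is the last.
down_host_ids() {
    awk '$1 == "DN" { print $(NF-1) }'
}

# Usage against a live cluster:
#   nodetool status | down_host_ids
```

With the output above, this would print the UUIDs of .201 and .202, one per line.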

Second, remove .201 with the following command:

nodetool removenode 4fa4d101-d8d2-4de6-9ad7-a487e165c4ac

(in older versions it was nodetool remove ...)

But just like nodetool removetoken ..., it blocks (see the comment by samarth on psanford's answer). However, it has a side effect: it puts that UUID in a list of nodes to be removed. So next we can force the removal with:

nodetool removenode force

(in older versions it was nodetool remove ...)

Now the node accepts the command and tells me that it is removing the invalid entry:

RemovalStatus: Removing token (-9136982325337481102). Waiting for replication confirmation from [/192.168.2.91,/192.168.2.92].

We can also see that it communicates with the two other nodes that are up, so it takes a little time, but it is still quite fast.

Next, nodetool status no longer shows the .201 node. I repeated the process for .202, and now the status is clean.
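
To confirm a removal without reading the whole table, the status output can be searched for the UUID that was removed. This is only a sketch; the helper name host_id_gone is hypothetical, and it simply does a fixed-string search of whatever is piped in.

```shell
# Succeeds (exit status 0) when the given Host ID does NOT appear in the
# "nodetool status" output supplied on stdin.
host_id_gone() {
    ! grep -qF -- "$1"
}

# Usage against a live cluster:
#   nodetool status | host_id_gone 4fa4d101-d8d2-4de6-9ad7-a487e165c4ac && echo "removed"
```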

After that you may also want to run a cleanup, as mentioned in psanford's answer:

nodetool cleanup

The cleanup should be run on all nodes, one by one, to make sure the change is fully taken into account.
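
Running the cleanup one node at a time can be scripted. The sketch below assumes that nodetool can reach each node remotely with the -h option (that is, JMX accepts remote connections in your setup); if it cannot, run nodetool cleanup locally on each machine instead. The function name cleanup_all and the NODETOOL override variable are made up for this example.

```shell
# Run "nodetool cleanup" on each given host in turn, stopping at the first failure.
# NODETOOL may be set to another command to substitute for nodetool (for testing).
cleanup_all() {
    for host in "$@"; do
        echo "cleanup on $host"
        "${NODETOOL:-nodetool}" -h "$host" cleanup || return 1
    done
}

# Usage (addresses taken from the status output above):
#   cleanup_all 192.168.2.91 192.168.2.92 192.168.2.93
```

The loop is sequential on purpose: each cleanup finishes before the next one starts, which matches the "one by one" advice.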

Normally, when replacing a node, you want to set the new node's token to (failed node's token) - 1 and let it bootstrap. As of 1.0 there is a flag you can specify on startup to replace a dead node: "cassandra.replace_token=".

Since you have already added the new node with the same token, there are extra steps:

1. Move the new node's token to (failed node's token) - 1 using nodetool move
2. Run nodetool removetoken <failed node's token> from one of the up nodes
3. Run nodetool cleanup on each node

These are basically the pre-1.0 instructions for replacing a dead node, with the additional token move.
