How to remove dead node out of the Cassandra cluster?
I had the same problem and I resolved it with removenode
, which does not require you to find and change the node token.
First, get the node UUID:
nodetool status
DN 192.168.56.201 ? 256 13.1% 4fa4d101-d8d2-4de6-9ad7-a487e165c4ac r1
DN 192.168.56.202 ? 256 12.6% e11d219a-0b65-461e-babc-6485343568f8 r1
UN 192.168.2.91 156.04 KB 256 12.4% e1a33ed4-d613-47a6-8b3b-325650a2bbd4 RAC1
UN 192.168.2.92 156.22 KB 256 13.6% 3a4a086c-36a6-4d69-8b61-864ff37d03c9 RAC1
UN 192.168.2.93 149.6 KB 256 11.3% 20decc72-8d0a-4c3b-8804-cc8bc98fa9e8 RAC1
As you can see the .201 and .202 are dead and on a different network. These have been changed to .91 and .92 without proper decommissioning and recommissioning. I was working on installing the network and made a few mistakes...
Second, remove the .201 with the following command:
nodetool removenode 4fa4d101-d8d2-4de6-9ad7-a487e165c4ac
(in older versions it was nodetool remove ...
)
But just like for the nodetool removetoken ...
, it blocks... (see comment by samarth in psandord answer) However, it has a side effect, it puts that UUID in a list of nodes to be removed. So next we can force the removal with:
nodetool removenode force
(in older versions it was nodetool remove ...
)
Now the node accepts the command it tells me that it is removing the invalid entry:
RemovalStatus: Removing token (-9136982325337481102). Waiting for replication confirmation from [/192.168.2.91,/192.168.2.92].
We also see that it communicates with the two other nodes that are up and thus it takes a little time, but it is still quite fast.
Next a nodetool status
does not show the .201 node. I repeat with .202 and now the status is clean.
After that you may also want to run a cleanup as mentioned in psanford answer:
nodetool cleanup
The cleanup should be run on all nodes, one by one, to make sure the change is fully taken in account.
Normally when replacing a node you want to set the new node's token to (failure node's token) - 1
and let it bootstrap. As of 1.0 there is now a flag you can specify on startup to replace a dead node: "cassandra.replace_token=".
Since you have already added the new node with the same token there's an extra step:
- Move the new node's token to
(failure node's token) - 1
usingnodetool move
- Run
nodetool removetoken <failed node's token>
from one of the up nodes - Run
nodetool cleanup
on each node
These are basically the pre 1.0 instructions for replacing a dead node with the additional token move.