Background of the incident

A MongoDB replica set in a production environment is deployed in the customer’s local server room. The customer accidentally deleted the VMs hosting two of the replica set’s nodes, and those VMs cannot be recovered. Fortunately one node is still alive; after logging in to it we found that it is a SECONDARY, so some data may have been lost, and the replica set can no longer serve reads and writes for the application. At this point the only option was to stop the service for maintenance and restore the cluster.

Based on the above problem, the following is a record of the replica set recovery steps.

 

Recovery approach

  1. Back up the MongoDB data, to guard against further loss if something goes wrong while restoring the cluster (a backup sketch follows this list).
  2. Promote the only surviving SECONDARY node to PRIMARY: remove the two dead nodes from the cluster configuration and reconfigure the MongoDB replica set.
  3. Deploy 2 new MongoDB nodes and add them to the cluster.
  4. Wait for the PRIMARY node's data to be synchronized to the 2 new nodes, then validate the data and end the production environment maintenance.
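
For step 1, a minimal backup sketch with mongodump, assuming the surviving node is 192.168.30.207:27017 and that /data/backup is a local path with enough free space (both are assumptions; add -u / -p / --authenticationDatabase if authentication is enabled):

# Dump all databases from the surviving node into a gzip-compressed backup directory
# (the target path /data/backup is an assumed example)
mongodump --host 192.168.30.207 --port 27017 --gzip --out /data/backup/mongodb-$(date +%F)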

Note:

Since only the SECONDARY node survives from the original cluster and the PRIMARY node has been lost, some writes may never have been replicated to this SECONDARY. Because the PRIMARY node's VM has been deleted, that unreplicated data cannot be recovered from MongoDB itself; it can only be reconstructed according to your business and application code logic.

 

Cluster recovery

1. Remove the dead members from the replica set configuration on the SECONDARY node

1.1 View the current replica set configuration

rs1:SECONDARY> rs.conf()

The output reads:

rs1:SECONDARY> use admin
switched to db admin
rs1:SECONDARY> rs_conf = rs.config()
{
"_id" : "rs1",
"version" : 7,
"protocolVersion" : NumberLong(1),
"members" : [
{
"_id" : 0,
"host" : "192.168.30.207:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 1,
"host" : "192.168.30.213:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 2,
"host" : "192.168.30.214:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"catchUpTimeoutMillis" : -1,
"catchUpTakeoverDelayMillis" : 30000,
"getLastErrorModes" : {

},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
},
"replicaSetId" : ObjectId("5f5094994a4d5004eae73e2f")
}
}

1.2 Deleting cluster members

  • For example, to delete the member with host 192.168.30.213:27017 from members, first find that member's entry and its _id via rs.conf():
{
"_id" : 1,
"host" : "192.168.30.213:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
  • Delete the member whose _id is 1

The first argument of splice is the subscript of the array element to delete; here the member with _id 1 happens to sit at subscript 1 of the members array. The second argument is the number of elements to delete.

rs1:SECONDARY> rs_conf = rs.conf()
rs1:SECONDARY> rs_conf.members.splice(1,1)

Output:

rs1:SECONDARY> rs_conf.members.splice(1,1)
[
{
"_id" : 1,
"host" : "192.168.30.213:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
]

Remove the other dead node from the replica set configuration in the same way.

Note:

One thing to note: splice only removes the element from the JavaScript array, so the array subscripts of the members after the deleted one shift down by one, while their "_id" values in the configuration stay unchanged. A member's subscript may therefore no longer match its _id, so check the current contents of rs_conf.members before the next deletion.
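
Continuing the example above as a sketch: after the first splice, the surviving member 192.168.30.207:27017 (_id 0) sits at subscript 0 and the dead member 192.168.30.214:27017 (_id 2) has shifted to subscript 1, so the second dead member is removed with:

rs1:SECONDARY> rs_conf.members               // confirm the remaining members and their subscripts
rs1:SECONDARY> rs_conf.members.splice(1,1)   // removes 192.168.30.214:27017, now at subscript 1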

2. Reconfigure the MongoDB replica set

2.1 Reset cluster configuration

rs_conf is the configuration modified above. The force option is needed because rs.reconfig() is normally only allowed on the PRIMARY, and this node is still a SECONDARY.

rs1:SECONDARY> rs.reconfig(rs_conf, {"force":true})

Return content:

rs1:SECONDARY> rs.reconfig(rs_conf, {"force":true})
{
"ok" : 1,
"operationTime" : Timestamp(1619586716, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1619588924, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
rs1:PRIMARY>

2.2 Viewing cluster status

rs1:PRIMARY> rs.status()

Return content:

{
"set" : "rs1",
"date" : ISODate("2021-04-28T05:51:03.672Z"),
"myState" : 1,
"term" : NumberLong(17),
"syncingTo" : "",
"syncSourceHost" : "",
"syncSourceId" : -1,
"heartbeatIntervalMillis" : NumberLong(2000),
"optimes" : {
"lastCommittedOpTime" : {
"ts" : Timestamp(1619589055, 1),
"t" : NumberLong(17)
},
"readConcernMajorityOpTime" : {
"ts" : Timestamp(1619589055, 1),
"t" : NumberLong(17)
},
"appliedOpTime" : {
"ts" : Timestamp(1619589055, 1),
"t" : NumberLong(17)
},
"durableOpTime" : {
"ts" : Timestamp(1619589055, 1),
"t" : NumberLong(17)
}
},
"members" : [
{
"_id" : 0,
"name" : "192.168.30.207:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 7482,
"optime" : {
"ts" : Timestamp(1619589055, 1),
"t" : NumberLong(17)
},
"optimeDate" : ISODate("2021-04-28T05:50:55Z"),
"syncingTo" : "",
"syncSourceHost" : "",
"syncSourceId" : -1,
"infoMessage" : "",
"electionTime" : Timestamp(1619588924, 1),
"electionDate" : ISODate("2021-04-28T05:48:44Z"),
"configVersion" : 124340,
"self" : true,
"lastHeartbeatMessage" : ""
}
],
"ok" : 1,
"operationTime" : Timestamp(1619589055, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1619589055, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}

At this point the SECONDARY node has been promoted to PRIMARY, and the cluster status lists only the current node as a member.

The next step is to add a new MongoDB node to the replica set.

3. Add new MongoDB nodes

The deployment of the new nodes is omitted here; refer to the step-by-step node deployment in the article “MongoDB Single Node Upgrade to Replica Set High Availability Cluster”.

Note:

After instances are added to the MongoDB replica set, the PRIMARY node's data is synchronized to the newly added SECONDARY nodes automatically, with no manual intervention required.

3.1 Adding Instances

Log in to the PRIMARY node and add a MongoDB instance.

The priority of a newly added instance defaults to 1; if you want to adjust it, it is recommended to do so after data synchronization has finished.

rs1:PRIMARY> use admin
rs1:PRIMARY> rs.add('192.168.30.213:27017')
rs1:PRIMARY> rs.add('192.168.30.214:27017')

The return result of adding the node is as follows.

rs1:PRIMARY> rs.add('192.168.30.213:27017')
{
"ok" : 1,
"operationTime" : Timestamp(1619581966, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1619581966, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
rs1:PRIMARY> rs.add('192.168.30.214:27017')
{
"ok" : 1,
"operationTime" : Timestamp(1619581975, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1619581975, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
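
Before ending maintenance, you can check from the PRIMARY whether the initial sync of the new nodes has finished. A minimal sketch (which helper exists depends on the shell version):

rs1:PRIMARY> rs.printSlaveReplicationInfo()        // older shells: how far each SECONDARY lags behind the PRIMARY
rs1:PRIMARY> rs.printSecondaryReplicationInfo()    // 4.4+ shells: the same report under the new name
rs1:PRIMARY> rs.status().members.forEach(function(m) { print(m.name + "  " + m.stateStr) })   // new nodes go STARTUP2 -> SECONDARY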

3.2 Removing instances

If you add a wrong node, you can remove it with rs.remove() (since the current instance is already PRIMARY, there is no need to use the splice method from section 1.2).

rs.remove() is run on the PRIMARY to remove other members; it does not remove the PRIMARY itself.

rs1:PRIMARY> use admin
rs1:PRIMARY> rs.remove('192.168.30.214:27017')

Return content.

rs1:PRIMARY> rs.remove('192.168.30.213:27017')
{
"ok" : 1,
"operationTime" : Timestamp(1619581713, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1619581713, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
rs1:PRIMARY> rs.remove('192.168.30.214:27017')
{
"ok" : 1,
"operationTime" : Timestamp(1619581777, 2),
"$clusterTime" : {
"clusterTime" : Timestamp(1619581777, 2),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}

Note:

After these additions and removals, the member order of the replica set is scrambled and which node gets elected PRIMARY is not guaranteed; you can set member priorities to adjust this as needed.

4. Adjusting node weights

If you want a particular node to always become PRIMARY after the cluster goes down and recovers, set that node's priority higher than all the others.

4.1 Setting weights

Find the member _id of the corresponding node in the replica set and set the weight.

Here, member 0, whose host is 192.168.30.207:27017, is used as the example.

rs1:PRIMARY> rs_conf = rs.config()
rs1:PRIMARY> rs_conf.members[0].priority=10

4.2 Applying the configuration

rs1:PRIMARY> rs.reconfig(rs_conf)

Return results.

rs1:PRIMARY> rs.reconfig(rs_conf)
{
"ok" : 1,
"operationTime" : Timestamp(1619591404, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1619591404, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}

4.3 Validating the weight configuration

  • Query the weight of member 0
rs1:PRIMARY> rs.config()

Return content.

{
"_id" : 0,
"host" : "192.168.30.207:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 10,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
  • Simulate the cluster state after recovery from downtime

Shut down the mongod service on all three nodes, then bring them back up in a random order and connect to 192.168.30.207:27017: member 0 is still PRIMARY. (To rule out coincidence, repeat the test several times; a sketch of one way to run the test follows.)
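
A sketch of one way to run this test, assuming mongod runs on each node as a systemd service named mongod (the service name is an assumption):

# On each of the three nodes, stop the service, then start the nodes again in a random order
systemctl stop mongod
systemctl start mongod

# After all three members are back, ask node 192.168.30.207 who the PRIMARY is;
# the "primary" field should again be 192.168.30.207:27017
mongo --host 192.168.30.207 --port 27017 --eval 'printjson(rs.isMaster())'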