
Redis cluster details

2022-07-20 21:58:00 【bseayin】

Let's start with a picture to get a rough feel for Redis Cluster.

Redis Cluster needs at least 3 master nodes to form a cluster, and each master should have at least one slave node. Nodes communicate with each other over TCP. When a master crashes, Redis Cluster promotes its slave to master so that service can resume.

Redis Cluster features: load balancing, failover, and master-slave replication.

Load balancing

Let's start with slots. Each Redis instance in the cluster takes charge of a portion of the slots. The total number of slots is 16384 (2^14). With 3 masters, each one is responsible for roughly 5461 slots (16384/3).

Redis node	Responsible slots
node 1	0-5460
node 2	5461-10922
node 3	10923-16383
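
The even split can be sketched in a few lines of Python. This is only one possible way to cut 16384 slots into contiguous, non-overlapping ranges; the function name split_slots and the rounding rule are assumptions made for illustration, not something Redis prescribes.

```python
import math

TOTAL_SLOTS = 16384  # 2**14


def split_slots(master_count):
    """Return one inclusive (first, last) slot range per master."""
    per_node = TOTAL_SLOTS / master_count
    ranges = []
    for i in range(master_count):
        # Round the fractional boundary to the nearest whole slot.
        first = math.floor(i * per_node + 0.5)
        if i == master_count - 1:
            last = TOTAL_SLOTS - 1  # the last master takes whatever remains
        else:
            last = math.floor((i + 1) * per_node - 1 + 0.5)
        ranges.append((first, last))
    return ranges


for n, (first, last) in enumerate(split_slots(3), start=1):
    print(f"node {n}: {first}-{last} ({last - first + 1} slots)")
```

Every slot belongs to exactly one range, so neighbouring masters never share a boundary slot.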

When a Redis client sets a value, it runs the CRC16 algorithm on the key and takes the result modulo 16384. The result is the slot the key falls in, and the table above tells you which node owns that slot. Because 16384 is a power of two, the modulo can be written as a bitwise AND. The slot formula is:

slot = CRC16(key) & 16383
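
As a minimal sketch, the formula can be reproduced in Python. Redis Cluster uses the XMODEM variant of CRC16 (polynomial 0x1021, initial value 0). The function names crc16_xmodem and key_slot are chosen here for illustration, and the sketch ignores hash tags ({...}), which the article does not cover.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16, polynomial 0x1021, initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """slot = CRC16(key) & 16383, the same as CRC16(key) % 16384."""
    return crc16_xmodem(key.encode("utf-8")) & 16383


print(key_slot("foo"))
```

The key foo lands in slot 12182, the same slot number used in the JedisCluster example further down.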

In a Redis cluster, every node holds information about the other nodes, such as their IPs and the slots they are responsible for.

How does JedisCluster address the cluster?

The JedisCluster configuration only needs the IP and port of one node in the cluster. On initialization, JedisCluster connects to the configured node and fetches the information for the whole cluster (the cluster nodes command).

It parses the cluster information to get every master in the cluster, then iterates over the masters, builds a Jedis instance from each master's IP and port, and puts it into a global nodes variable (a Map). The key is ip:port and the value is the Jedis instance. nodes looks like this:

nodes={172.19.93.120:<port>=<Jedis instance>,.....}

While iterating over the masters, it does one more thing: it walks through the slot indexes each master is responsible for and puts them into a global map called slots. The key is the slot index and the value is the Jedis instance above. slots looks like this:

slots={<slot>=<Jedis instance>,
<slot>=<Jedis instance>,
<slot>=<Jedis instance>,
....
5461=<Jedis instance>,    #### a different master from this slot on
....
<slot>=<Jedis instance>}

With the slots variable above, when a value is set, the slot is computed first: slot = getCRC16(key)&(16384-1). Say the result is 12182. JedisCluster then calls slots.get(12182) to get the Jedis instance and uses it to operate on Redis.

If a JedisMovedDataException is thrown, the slot-to-node mapping built at initialization is out of date (a node was added or went down), and slots is rebuilt.
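
The lookup-and-refresh idea can be sketched without a real cluster. Everything below is a hypothetical in-memory stand-in written for illustration (FakeNode, MovedError, SlotCacheClient and the node addresses are not Jedis classes or real hosts). It only shows the shape of the logic: look up the slot, and if the node rejects the key, rebuild the maps and retry once.

```python
import binascii

SLOTS = 16384


def key_slot(key):
    # CRC16 (XMODEM) of the key, masked down to 0..16383.
    return binascii.crc_hqx(key.encode("utf-8"), 0) & (SLOTS - 1)


class MovedError(Exception):
    """Hypothetical stand-in for a MOVED redirection."""


class FakeNode:
    """Hypothetical in-memory node; not a real Redis connection."""

    def __init__(self, address, owned_slots):
        self.address = address
        self.owned = set(owned_slots)
        self.data = {}

    def set(self, key, value):
        if key_slot(key) not in self.owned:
            raise MovedError(key)
        self.data[key] = value


class SlotCacheClient:
    def __init__(self, cluster):
        self.cluster = cluster  # list of FakeNode, stands in for the cluster
        self.nodes = {}         # "ip:port" -> node
        self.slots = {}         # slot index -> node
        self.renew()

    def renew(self):
        """Rebuild both maps from the current cluster layout."""
        self.nodes = {n.address: n for n in self.cluster}
        self.slots = {s: n for n in self.cluster for s in n.owned}

    def set(self, key, value):
        slot = key_slot(key)
        try:
            self.slots[slot].set(key, value)
        except MovedError:
            self.renew()  # the cached mapping was stale: refresh, retry once
            self.slots[slot].set(key, value)


node_a = FakeNode("node-a:6379", range(0, 8192))
node_b = FakeNode("node-b:6379", range(8192, 16384))
client = SlotCacheClient([node_a, node_b])
client.set("foo", "bar")  # slot 12182 is owned by node_b at this point
```

If slot 12182 is later moved to node_a, the next client.set("foo", ...) hits the stale entry, triggers renew(), and succeeds on the retry.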

Communication between cluster nodes

There are usually two ways to maintain cluster metadata such as node information. One is centralized: Spring Cloud, for example, keeps service cluster information in a configuration center. The other is the approach Redis takes: gossip.

Centralized: the advantage is that metadata updates and reads are very timely. Once the metadata changes, it is written to the centralized store immediately, and other nodes see the change the next time they read. The drawback is that all the update pressure is concentrated in one place, which may put the metadata store under load.

gossip: the advantage is that metadata updates are spread out rather than concentrated in one place. Update requests keep flowing to all nodes, with some delay, which reduces the pressure. The drawback is that metadata updates are delayed, so some cluster operations may lag.

The communication port is the Redis listening port + 10000. For example, if the listening port is 6379, the communication port is 16379.

Gossip's main job is exchanging information. The carriers of that exchange are the Gossip messages that nodes send to each other. The commonly used Gossip messages are the ping, pong, meet, and fail messages.

meet message: used to announce that a new node is joining. The sender tells the receiver to join the current cluster. Once the meet exchange completes normally, the receiving node joins the cluster and takes part in the periodic ping and pong exchange.
ping message: the most frequently exchanged message in the cluster. Every node sends ping messages to several other nodes each second to check whether they are online and to exchange status information. A ping message carries the status data of the sending node and of some other nodes.
pong message: sent back to the sender in response to a ping or meet message to confirm that communication works. A pong message carries the sender's own status data. A node can also broadcast its pong message to the cluster to tell every node to update its status.
fail message: when a node decides that another node in the cluster is offline, it broadcasts a fail message. Nodes that receive the fail message mark the corresponding node as offline.

For example, when a new node is added, the meet message flow is as follows:

Node A creates a clusterNode structure for node B and adds it to its own clusterState.nodes dictionary.
Node A sends a MEET message to node B, using the IP address and port number given in the CLUSTER MEET command.
Node B receives the MEET message from node A, creates a clusterNode structure for node A, and adds it to its own clusterState.nodes dictionary.
Node B returns a PONG message to node A.
Node A receives the PONG message from node B. From this PONG, node A knows that node B has successfully received its MEET message.
Node A then returns a PING message to node B.
Node B receives the PING message from node A. From this PING, node B knows that node A has successfully received its PONG message, and the handshake is complete.
Afterwards, node A propagates node B's information to the other nodes in the cluster through the Gossip protocol so that they also shake hands with node B. After a while, node B is known to every node in the cluster.
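
The three-way handshake above can be traced with a small simulation. This is a sketch under simplifying assumptions: messages are direct method calls rather than packets on the cluster bus, the Node class and its method names are invented for illustration, and the known dictionary merely plays the role of clusterState.nodes.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.known = {}  # plays the role of clusterState.nodes
        self.log = []    # what this node did, in order

    def cluster_meet(self, other):
        # Steps 1-2: remember the target, then send it a MEET.
        self.known[other.name] = other
        self.log.append("send MEET")
        other.on_meet(self)

    def on_meet(self, sender):
        # Steps 3-4: remember the sender, then answer with a PONG.
        self.known[sender.name] = sender
        self.log.append("recv MEET, send PONG")
        sender.on_pong(self)

    def on_pong(self, sender):
        # Steps 5-6: the MEET arrived, so confirm with a PING.
        self.log.append("recv PONG, send PING")
        sender.on_ping(self)

    def on_ping(self, sender):
        # Step 7: the PONG arrived, so the handshake is complete.
        self.log.append("recv PING, handshake complete")


a, b = Node("A"), Node("B")
a.cluster_meet(b)
print(a.log)
print(b.log)
```

The last step of the list, spreading node B to the rest of the cluster via gossip, is left out of the sketch.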

For example, when a node fails, how is it judged to be offline?

Every node in the cluster periodically sends ping messages to other nodes. If the node that receives the ping does not reply with a pong within the specified time, the sender marks it as subjectively offline.

If more than half of the master nodes in the cluster mark master node A as subjectively offline, node A is marked as objectively offline (and this is broadcast to the other nodes), that is, truly offline.
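
The two-stage decision can be written as two small predicates. This is a sketch only: the function names and the 15-second timeout are assumptions chosen for illustration, not values taken from the article.

```python
NODE_TIMEOUT = 15.0  # seconds; hypothetical value for this sketch


def is_subjectively_offline(last_pong_at, now, timeout=NODE_TIMEOUT):
    """One node's private opinion: no pong arrived within the timeout."""
    return now - last_pong_at > timeout


def is_objectively_offline(reporting_masters, master_count):
    """Cluster-wide verdict: more than half of the masters agree."""
    return 2 * len(reporting_masters) > master_count


# With 3 masters, one report is not enough but two are.
print(is_objectively_offline({"m1"}, 3))
print(is_objectively_offline({"m1", "m2"}, 3))
```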

Failover

When a slave node finds that the master it is replicating has gone offline, the slave starts a failover for the offline master. The steps are:

The slave node executes the SLAVEOF no one command and becomes the new master.
The new master revokes all slot assignments of the offline master and assigns those slots to itself.
The new master broadcasts a PONG message to the cluster. This PONG lets the other nodes know immediately that the node has changed from slave to master and that it has taken over the slots previously handled by the offline node.
The new master starts accepting command requests for the slots it is now responsible for, and the failover is complete.
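
The slot takeover in these steps can be sketched over plain dictionaries. The names fail_over, roles and slots, as well as the shape of the returned announcement, are hypothetical; a real node does this inside its own cluster state and sends an actual PONG on the cluster bus.

```python
def fail_over(roles, slots, offline_master, slave):
    """Promote `slave` and move every slot of `offline_master` to it.

    roles: dict node name -> "master" or "slave"
    slots: dict slot index -> owning node name
    Returns a dict standing in for the broadcast PONG.
    """
    roles[slave] = "master"  # step 1: stop replicating, become a master
    taken = [s for s, owner in slots.items() if owner == offline_master]
    for s in taken:
        slots[s] = slave     # step 2: take over the orphaned slots
    # step 3: announce the new role and slot ownership to the cluster
    return {"type": "PONG", "sender": slave, "role": "master", "slots": taken}


roles = {"m1": "master", "m2": "master", "s1": "slave"}
slots = {0: "m1", 1: "m1", 2: "m2"}
announcement = fail_over(roles, slots, offline_master="m1", slave="s1")
print(announcement)
```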

Master-slave replication

Basic steps of master-slave replication

The slave node server maintains two fields, masterhost and masterport, which store the master node's IP and port.
The slave runs a scheduled task that checks every 1s whether there is a new master to connect to and replicate from. If there is, it opens a socket connection to that master.
Password authentication: if the master has requirepass set, the slave must send the masterauth password to authenticate.
The first time, the master performs a full copy and sends all of its data to the slave. (A full copy is made when the run ID differs.)
After that, the master keeps replicating write commands to the slave asynchronously.

The full replication process

After the master node receives the full replication command, it runs bgsave to generate an RDB file in the background and uses a buffer (called the replication buffer) to record every write command executed from that point on.
When bgsave finishes, the master sends the RDB file to the slave. The slave first clears its old data, then loads the received RDB file, bringing its database to the state the master was in when it ran bgsave.
The master sends all the write commands in the replication buffer to the slave. The slave executes them, bringing its database up to the master's latest state.
If the slave has AOF enabled, it triggers bgrewriteaof so that the AOF file reflects the master's latest state.
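
Why the replication buffer is needed becomes clear in a toy model. In this sketch a dict copy stands in for the RDB file and a list of (key, value) pairs stands in for the buffered write commands. The Master and Slave classes and their method names are invented for illustration, and AOF is left out.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.replication_buffer = None  # only active during a full sync

    def write(self, key, value):
        self.data[key] = value
        if self.replication_buffer is not None:
            self.replication_buffer.append((key, value))

    def start_full_sync(self):
        """Start buffering writes and return a snapshot (the 'RDB file')."""
        self.replication_buffer = []
        return dict(self.data)

    def finish_full_sync(self):
        """Hand over the writes that arrived after the snapshot."""
        buffered, self.replication_buffer = self.replication_buffer, None
        return buffered


class Slave:
    def __init__(self):
        self.data = {"stale": "old value"}

    def load_snapshot(self, snapshot):
        self.data.clear()  # drop old data first
        self.data.update(snapshot)

    def apply(self, commands):
        for key, value in commands:
            self.data[key] = value


master, slave = Master(), Slave()
master.write("a", 1)
snapshot = master.start_full_sync()
master.write("b", 2)              # arrives while the snapshot is in flight
slave.load_snapshot(snapshot)     # slave now matches the snapshot only
slave.apply(master.finish_full_sync())
print(slave.data == master.data)
```

Without the buffer, the write to b would be lost on the slave, because it happened after the snapshot was taken.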

Server run ID (runid)

Every Redis node (master or slave) automatically generates a random ID at startup (different on every start), made up of 40 random hexadecimal characters. The runid uniquely identifies a Redis node. You can view it with the info server command.

When master and slave replicate for the first time, the master sends its runid to the slave, and the slave saves it. When the slave reconnects after a disconnection, it sends this saved runid to the master, and the master uses it to decide whether a full copy is required:

If the runid saved by the slave differs from the master's current runid, the Redis node the slave was synchronizing with before the disconnection is not the current master, so a full copy is made.
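
A rough sketch of the runid check follows. The helper names new_runid and needs_full_copy are hypothetical, and the random generator is merely a convenient way to produce 40 hexadecimal characters, not necessarily how Redis does it.

```python
import secrets


def new_runid():
    """20 random bytes rendered as 40 hexadecimal characters."""
    return secrets.token_hex(20)


def needs_full_copy(saved_runid, master_runid):
    """A slave that last synced with a different run must copy everything."""
    return saved_runid != master_runid


master_runid = new_runid()
saved_by_slave = master_runid            # stored during the first sync
restarted_master_runid = new_runid()     # a restart yields a fresh runid

print(needs_full_copy(saved_by_slave, master_runid))
print(needs_full_copy(saved_by_slave, restarted_master_runid))
```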

An interview question from Tencent

Can you explain how Redis cluster mode works? In cluster mode, how is a key addressed? What addressing algorithms are there? Do you understand consistent hashing?

What addressing algorithms are there?

hash algorithm

Take the hash of the key modulo the number of nodes: hash(key) % number of nodes.

Drawback: when a node goes down or a new one is added, the number of nodes changes, and the placement of all data has to be recalculated.
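
How disruptive a change in node count is can be measured directly. The sketch below uses MD5 purely as a convenient, stable hash (an assumption; any uniform hash would do) and counts how many keys change nodes when a cluster grows from 3 to 4 nodes. The function names and key names are made up for illustration.

```python
import hashlib


def node_for(key, node_count):
    """hash(key) % number of nodes, with MD5 as a stable hash function."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def moved_fraction(keys, before, after):
    """Share of keys whose node changes when the node count changes."""
    moved = sum(1 for k in keys if node_for(k, before) != node_for(k, after))
    return moved / len(keys)


keys = [f"key:{i}" for i in range(10000)]
print(moved_fraction(keys, 3, 4))
```

A key stays put only when h % 3 equals h % 4, which holds for a quarter of all hash values, so roughly three quarters of the keys move.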

Redis Cluster's hash slot algorithm

Covered above.

Consistent hashing algorithm

Consistent hashing is implemented with a data structure called a consistent hash ring. The integers on the ring range over (0, 1, 2, 3 … 2^32-1).

Suppose we have 4 objects, o1, o2, o3, and o4. Use the hash function to compute their hash values (in the range 0 ~ 2^32-1):

hash(o1) = m1
hash(o2) = m2
hash(o3) = m3
hash(o4) = m4

Place m1, m2, m3, and m4 on the hash ring.

Suppose we have three machines, c1, c2, and c3. Hash each one's IP address:

hash(c1's IP) = t1
hash(c2's IP) = t2
hash(c3's IP) = t3

Place t1, t2, and t3 on the hash ring.

Moving clockwise around the hash ring from an object's hash, the first machine encountered is the machine the object belongs to. In this example:

Object o1[m1] falls on machine t3[c3]
Object o2[m2] falls on machine t1[c1]
Object o3[m3] falls on machine t2[c2]
Object o4[m4] falls on machine t2[c2]

Adding a machine

Suppose we add machine c4, whose hash lands at position t4 on the ring. Only o4 needs to be reassigned, to c4. The other objects stay on their original machines.

Machine failure

Suppose c1 goes down. Then o2 needs to be reassigned to c3. The other objects stay on their original machines.
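
A minimal hash ring makes both properties easy to check. This is a sketch under stated assumptions: MD5 truncated to 32 bits stands in for the hash function, the HashRing class and its method names are invented for illustration, and hash collisions between machines are ignored.

```python
import bisect
import hashlib


def ring_hash(value):
    """Map a string to an integer position in 0 .. 2**32 - 1."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, machines=()):
        self._points = []  # sorted positions of the machines on the ring
        self._owner = {}   # position -> machine name
        for machine in machines:
            self.add(machine)

    def add(self, machine):
        point = ring_hash(machine)
        bisect.insort(self._points, point)
        self._owner[point] = machine

    def remove(self, machine):
        point = ring_hash(machine)
        self._points.remove(point)
        del self._owner[point]

    def locate(self, key):
        """Walk clockwise from the key's hash to the first machine."""
        i = bisect.bisect_left(self._points, ring_hash(key))
        return self._owner[self._points[i % len(self._points)]]


keys = [f"object:{i}" for i in range(1000)]
ring = HashRing(["c1", "c2", "c3"])
before = {k: ring.locate(k) for k in keys}

ring.add("c4")
after_add = {k: ring.locate(k) for k in keys}
moved = [k for k in keys if before[k] != after_add[k]]
print(len(moved), "of", len(keys), "objects moved after adding c4")
```

Every object that moves after the addition moves to c4, and removing a machine only relocates the objects that were stored on it. The i % len(...) wraps around past the largest position back to the start of the ring.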

Data skew on the hash ring

With too few service nodes, consistent hashing easily runs into data skew (most cached objects end up on one server) because the nodes divide the ring unevenly. For example, suppose the system has only two servers, Node A and Node B, and they divide the ring unevenly.

In that case a large amount of data lands on Node A and only very little on Node B. To solve this data skew problem, consistent hashing introduces the virtual node mechanism.

Virtual nodes

Multiple virtual nodes are mapped to each real machine, so that the hash ring appears to hold many machine nodes. In practice this can be done by appending a number to the server's IP or host name.

In the case above, for example, three virtual nodes can be computed for each server by hashing “Node A#1”, “Node A#2”, “Node A#3”, “Node B#1”, “Node B#2”, and “Node B#3”, which yields six virtual nodes.

The data location algorithm stays the same, with one extra step that maps a virtual node to its real node. For example, data located on the virtual nodes “Node A#1”, “Node A#2”, and “Node A#3” all ends up on Node A. This solves the data skew problem when there are few service nodes. In practice the number of virtual nodes is usually set to 32 or more, so even a handful of service nodes can achieve a fairly even data distribution.
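
The effect of virtual nodes can be observed by counting where keys land. The sketch below follows the naming scheme from the text (server name, #, number). The hash function (MD5 truncated to 32 bits), the helper names, and the replica count of 160 are assumptions chosen for illustration.

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(value):
    """Map a string to an integer position in 0 .. 2**32 - 1."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(servers, replicas):
    """Place `replicas` virtual nodes per server, e.g. 'Node A#1'."""
    pairs = sorted(
        (ring_hash(f"{server}#{i}"), server)
        for server in servers
        for i in range(1, replicas + 1)
    )
    points = [point for point, _ in pairs]
    owners = [server for _, server in pairs]  # virtual node -> real server
    return points, owners


def locate(ring, key):
    points, owners = ring
    i = bisect.bisect_left(points, ring_hash(key)) % len(points)
    return owners[i]


def distribution(ring, keys):
    return Counter(locate(ring, key) for key in keys)


keys = [f"object:{i}" for i in range(10000)]
servers = ["Node A", "Node B"]
print(distribution(build_ring(servers, 1), keys))
print(distribution(build_ring(servers, 160), keys))
```

With one point per server the split depends entirely on where the two hashes happen to fall. With many virtual nodes per server the shares tend toward an even split.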
