Under what circumstances is it necessary for enterprises to introduce distributed databases?

2022-07-21 00:54:00 【Software testing network】

I. Distributed databases: "scattered in form, unified in spirit"

Many people habitually check Baidu Encyclopedia before starting any research, and we will do the same: "A distributed database system usually uses smaller computer systems. Each computer can be located in a separate place, and each may hold a complete or partial copy of the DBMS along with its own local database. Many computers in different locations are connected to one another over a network to form a complete, large database that is logically centralized and physically distributed."

Next, let's look at how the industry understands the term. The report "Research on the Development Path of Distributed Databases", prepared under the leadership of the China Software Evaluation Center, describes it this way: "Based on the current state of distributed database technology in China, we believe that a distributed database is a logically unified database, distributed across a computer network, that supports distributed transaction processing and smooth scaling. Its characteristics include distributed transaction processing, smooth scaling, physical distribution, and logical unity."

In short, we think the phrase "scattered in form, unified in spirit" best describes the characteristics of a distributed database. "Scattered in form" refers to how it presents itself in terms of computing resources, spatial distribution, and interconnection topology. "Unified in spirit" refers to the data processing capability it ultimately delivers at the functional level.

II. The development history of distributed databases

We won't go back too far; let's start with relational databases. In the 1970s, IBM researcher E. F. Codd first proposed the relational model, ushering in the era of relational databases. In the 1980s, the first commercial relational databases appeared, such as Oracle, DB2, and SQL Server. In the 1990s, the Finnish engineer Michael Widenius launched MySQL, and PostgreSQL was born around the same time. After 2000, as data volumes grew, single-machine databases became a bottleneck and could no longer meet the demand, and various schemes for splitting databases and tables (sharding) began to emerge. By 2006, Google had published three papers, known as the big data "troika": GFS (2003), MapReduce (2004), and Bigtable (2006). The ideas in these papers gave rise to the Hadoop ecosystem and paved the way for distributed databases. In 2012, Google published two more papers, on Spanner and F1, which provided a theoretical basis for handling global transactions and data partitioning in distributed databases. Many Chinese internet companies then began their own distributed experiments and development: Alibaba, Tencent, Baidu, ByteDance, Meituan, Didi, Kuaishou, Zhihu, 58.com, and others started using distributed databases and turned the results of their use and R&D into products for the market. Today, the technology has entered a stage of follow-up and broad adoption across all industries, led by the financial sector.

III. What problems can distributed databases solve?

3.1 What difficulties do centralized relational databases encounter?

(1) Capacity to process large data volumes

As the development history shows, it was big data that gave rise to distributed databases and drove their development. The most fundamental problem is that growth in data volume poses a great challenge to traditional relational databases. They can still cope with data at the GB or TB scale, but once processing reaches the PB scale and beyond, the processing capacity of a single node cannot deliver the efficiency the business requires, even as single-machine hardware advances by leaps and bounds.

(2) High business concurrency

With the development of the internet, from early e-commerce to today's many internet models (the industrial internet, online finance, social networking, and so on), the databases supporting these businesses must handle highly concurrent, dispersed requests while still guaranteeing the basic safety properties of the data. This is something that classic relational databases, which insist on the C and A of the CAP theorem, are not able to offer.

(3) Scalability of data and architecture

Growing data volumes and highly concurrent access inevitably make data expand faster than ever, and in ways that cannot be predicted accurately. Data processing capacity has to grow to match. Database products built on a traditional centralized architecture, which rely on vertical scaling of a single point, struggle to meet these needs. The database must therefore be able to scale out horizontally, from the architecture down to the data storage, and do so safely, simply, and quickly.

(4) Adaptability of data processing to sudden events

Today almost every industry has gone through an internet-driven upgrade. More and more businesses depend on the internet, which has also spawned many new industries and forms of economic activity. Unpredictable events occur online every day, and because the internet spreads information so quickly and so widely, they can hit the related businesses of some industries, for example when news about a celebrity breaks. The carrying capacity of the information system and its data processing capacity are then tested in an instant, so the database too must be highly adaptable.

(5) Matching the data model to access patterns

In the age of centralized relational databases, data needs in every industry were mostly met by structured two-dimensional tables, supplemented by a small amount of unstructured or semi-structured data. In today's era of rapid change, however, data has changed greatly in its form, its access characteristics, and its required access efficiency. In form, data has expanded from the two-dimensional table model to document, key-value, column-oriented, graph, and other types. In access characteristics, there are special workloads that only read or only write. In access efficiency, there are massive retrieval services that need memory-level speed. All of this requires matching the right type of database to the data model and access characteristics; a one-size-fits-all mindset no longer works.

3.2 Why can distributed database technology break out of the dilemma ？

Before analyzing why distributed database technology can solve problems that traditional relational databases cannot, we need to make one thing clear: the distributed database we are talking about is not a single database or a single class of database. It is the collection of all databases that share the "scattered in form, unified in spirit" characteristic.

First, database products with the "scattered in form, unified in spirit" characteristic can aggregate dispersed computing resources over the network into a logical whole with independent data storage and processing capabilities, which gives them the capacity to process massive amounts of data.

Second, database products with this characteristic pursue the A and P of the CAP theorem and lower their expectations for C. They give up strong consistency and settle for weak consistency as the next best thing. This lets them move data processing from physical centralization to logical centralization, which in turn gives them the capacity to handle high concurrency.
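To make the weak-consistency trade-off concrete, here is a minimal sketch. It assumes asynchronous primary-to-replica replication as the mechanism, which is only one common way of relaxing C; the article does not name a specific one. The `Primary` and `Replica` classes are hypothetical and exist only for illustration.

```python
from collections import deque


class Replica:
    """Holds a copy of the data and serves reads (hypothetical toy class)."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Primary(Replica):
    """Accepts writes immediately and replicates them asynchronously."""

    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas
        self.pending = deque()  # replication log not yet shipped to replicas

    def write(self, key, value):
        self.data[key] = value             # acknowledged right away
        self.pending.append((key, value))  # shipped to replicas later

    def replicate(self):
        # Drain the log; a real system would do this in the background.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


replica = Replica()
primary = Primary([replica])
primary.write("balance", 100)
stale = replica.read("balance")   # replica has not caught up yet
primary.replicate()
fresh = replica.read("balance")   # replica converges after replication
```

The write returns without waiting for the replica, so a read served by the replica in the meantime sees old data. That window is the price paid for keeping writes fast and available.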

Third, database products with this characteristic are naturally scalable and adaptable. Their physical nodes are dispersed and are combined into an organic whole by the database's own software mechanisms. Adding or removing nodes or capacity is therefore a routine operation; the only things to consider are the volume of data that must be migrated during the change and the performance of that migration.
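The amount of data that has to move when a node is added depends on how keys are placed. As an illustration, the sketch below uses consistent hashing with virtual nodes, one widely used placement technique. This is an assumption, since the article does not say which scheme any particular product uses, and `HashRing`, the node names, and the key names are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash from MD5; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # First virtual node at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect_left(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]


def moved_fraction(keys, before: HashRing, after: HashRing) -> float:
    """Share of keys whose owning node differs between two ring layouts."""
    moved = sum(1 for k in keys if before.node_for(k) != after.node_for(k))
    return moved / len(keys)


keys = [f"order-{i}" for i in range(10_000)]
ring3 = HashRing(["node-a", "node-b", "node-c"])
ring4 = HashRing(["node-a", "node-b", "node-c", "node-d"])
fraction = moved_fraction(keys, ring3, ring4)
```

With this placement, only keys that land on the new node move; everything else stays where it was. The migrated share should come out near one quarter when going from three nodes to four, which is the kind of estimate to make before an expansion.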

Finally, a distributed database is not a single product or a single kind of product. Among these products there are many specialized databases that differ in data model and data access characteristics: MongoDB for documents, Redis for in-memory access, and HBase for big data processing, for example. Compared with traditional relational databases, these distributed databases focus on data processing for particular data models or particular access scenarios. In that sense, distributed databases are better suited to data processing in special scenarios and fit those scenarios more closely.

3.3 What problems can distributed database technology not solve?

Since distributed databases have so many advantages, are they omnipotent?

First, by its very concept a distributed database does not target general-purpose scenarios; it targets particular data access scenarios. Applied to a general-purpose scenario, or to any scenario that does not match its attributes, it is bound to show many defects, such as the complexity and soundness of its data migration algorithms, a mismatched data model, or weaknesses in data persistence. By analyzing the technical characteristics of a given special scenario, however, one can usually find a distributed database product that suits it better. Looking only at the common "scattered in form, unified in spirit" characteristic, then, is there any scenario for which no suitable distributed database can be found?

What makes it succeed is also what makes it fail: its strength lies in being "scattered in form, unified in spirit", and so does its fatal flaw. This characteristic inevitably means it cannot do the job in business scenarios with strict transactional requirements. People keep working around this with clever solutions such as two-phase transaction processing schemes, but these can only be adopted, reluctantly, in business scenarios that have some tolerance on transactions. For business scenarios with zero tolerance on transactions, we have to go back to the traditional centralized relational database.
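The two-phase scheme mentioned above can be sketched in a few lines. This is a simplified, in-memory model of the classic two-phase commit protocol, not the implementation of any particular database; `Participant` and `two_phase_commit` are hypothetical names, and the sketch deliberately leaves out timeouts, durable logging, and coordinator failure.

```python
class Participant:
    """One resource manager taking part in a distributed transaction (toy model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (completion): commit only if every vote was yes, else roll back all.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


ok = two_phase_commit([Participant("orders"), Participant("inventory")])
failed_group = [Participant("orders"), Participant("payments", can_commit=False)]
failed = two_phase_commit(failed_group)
```

A single "no" vote aborts the transaction everywhere. The parts omitted here, especially what happens when the coordinator or a participant fails between the two phases, are where the extra latency and blocking come from in practice, which is why the scheme fits only scenarios with some tolerance.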

IV. How should enterprises approach the road to distributed databases?

In summary, I think enterprises should follow these principles when choosing a distributed database:

1. Choose the technical route based on the data and business scenario.

No technical route can claim to represent the future absolutely; every technical route serves the needs of a business scenario. When choosing one, analyze the requirements of the business scenario along three dimensions: the characteristics of the data model, the characteristics of data access, and the required access efficiency. Then use the results of that analysis to match an appropriate database technology route.
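As a rough illustration of this matching step, the sketch below encodes the three dimensions as a small record and maps them to the product categories the article itself names (MongoDB, Redis, HBase, and the centralized relational database for strictly transactional work). The `Workload` fields, the category strings, and the ordering of the checks are assumptions made for illustration; a real selection would weigh far more factors.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    data_model: str                   # e.g. "relational", "document", "wide-column"
    needs_strict_transactions: bool   # zero tolerance on transactions
    needs_memory_speed: bool          # memory-level access efficiency


def suggest_route(w: Workload) -> str:
    # Strict transactional requirements are checked first, following section 3.3.
    if w.needs_strict_transactions:
        return "centralized relational database"
    if w.needs_memory_speed:
        return "in-memory store (for example Redis)"
    by_model = {
        "document": "document database (for example MongoDB)",
        "wide-column": "big data store (for example HBase)",
    }
    # Anything else needs the team's own analysis and testing.
    return by_model.get(w.data_model, "run your own analysis and tests")


ledger = suggest_route(Workload("relational", True, False))
catalog = suggest_route(Workload("document", False, False))
```

The value of writing the rules down is less the function itself than the fact that it forces the three dimensions to be stated explicitly before any product is discussed.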

2. Don't blindly trust marketing; trust your own technical analysis and testing.

Choosing a technical route is a serious matter. Third-party evaluations and vendor marketing claims can serve only as references. Leaving aside the influence of advertising interests, another company's choice may not suit your own enterprise: even within similar industries, data volumes and traffic levels differ. So, alongside extensive reference material, you still need to do your own analysis and testing.

3. Don't choose the most advanced; choose the most appropriate.

Business scenarios differ in what they require of a database and in how they prioritize those requirements. It is hard to find one general-purpose product that covers every scenario, so choices must be targeted to the actual situation. The database product that suits your own scenario is the best product. Don't assume that a technical feature represents the future just because it is advanced.

Copyright notice
This article was created by [Software testing network]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/202/202207200052420202.html