Under what circumstances is it necessary for enterprises to introduce distributed databases?
2022-07-21 00:54:00 【Software testing network】
1. Distributed databases: "scattered in form, unified in spirit"
Many people habitually check Baidu Encyclopedia before doing research, and we will do the same: "Distributed database systems usually use smaller computer systems. Each computer can be placed in a separate location and may hold a complete or partial copy of the DBMS along with its own local database. Many computers in different locations are connected to each other over a network to form a complete large database that is logically centralized and physically distributed."
Next, let's look at how the industry understands it. "Research on the Development Path of Distributed Databases", prepared under the leadership of the China Software Evaluation Center, describes it as follows: "Given the current state of distributed database technology in China, we believe that a distributed database is a logically unified database distributed across a computer network, with distributed transaction processing capability and smooth scalability. Its characteristics include distributed transaction processing, smooth expansion, physical distribution, and logical unity."
In short, we think "scattered in form, unified in spirit" is the most apt description of a distributed database. "Scattered in form" refers to its visible forms, such as computing resources, spatial distribution, and interconnection topology. "Unified in spirit" refers to the data processing capability it ultimately delivers at the functional level.
2. The development history of distributed databases
We won't go back too far; let's start with relational databases. In the 1970s, IBM researcher E. F. Codd first proposed the relational model, ushering in the era of relational databases. In the 1980s, the first commercial relational databases appeared, such as Oracle, DB2, and SQL Server. In the 1990s, Michael Widenius, an engineer from Finland, launched MySQL, and PostgreSQL was born around the same time. After 2000, as data volumes grew, single-machine databases hit bottlenecks and could no longer meet the demand for large amounts of data, and various database-sharding and table-sharding schemes began to emerge. Between 2003 and 2006, Google published three papers, known as the "troika" of big data: GFS, MapReduce, and Bigtable. The ideas in these papers gave rise to the Hadoop ecosystem and paved the way for distributed databases. In 2012, Google published two more papers, on Spanner and F1, which provided a theoretical basis for solving global transactions and data splitting in distributed databases. Then came the distributed experiments and development at many Chinese Internet companies: Alibaba, Tencent, Baidu, ByteDance, Meituan, Didi, Kuaishou, Zhihu, 58.com, and others began to use distributed databases and to turn their own experience and R&D results into products for the market. Today we are in a stage of follow-up and broad adoption across all industries, led by the financial sector.
3. What problems can distributed databases solve?
3.1 What difficulties do centralized relational databases face?
(1) Data volume processing capacity
As the history above shows, it was big data that drove the birth and development of distributed databases. The most fundamental problem is that growth in data volume has posed great challenges to traditional relational databases. They can still handle data at the GB and TB scale, but once data reaches the PB scale and beyond, the processing capacity of a single node cannot deliver the efficiency the business requires, no matter how fast single-machine hardware advances.
(2) High business concurrency
With the development of the Internet, from early e-commerce to today's many Internet models (industrial Internet, Internet finance, social networking, and so on), the databases supporting these businesses must handle highly concurrent, dispersed business requests while still guaranteeing the basic security properties of data. This is something that relational databases, which hold firmly to the C and A of the CAP theorem, do not offer.
(3) Scalability of data and architecture
As data volume and concurrent business access grow, one inevitable result is that data expands faster than ever and cannot be predicted accurately. Another is that data processing capability must improve to match. Database products based on a traditional centralized architecture rely on scaling up the vertical resources of a single point, which makes it hard for them to keep pace with actual needs. The database therefore needs to be able to scale out horizontally, from the architecture down to the data storage, and to do so safely, simply, and quickly.
(4) Adaptability of data processing to sudden events
The Internet has developed to the point where almost every industry has gone through industrial upgrading. More and more businesses rely on the Internet, and the Internet has spawned many new industries and forms of economy. Countless unpredictable events occur on the Internet every day, and given how fast and how widely they spread, they are likely to affect related businesses in some industries; a breaking celebrity news story is one example. The carrying capacity of the information system and its data processing capacity are then tested in an instant, which requires the database to be highly adaptable as well.
(5) Matching the data model and access patterns
In the era of centralized relational databases, the data needs of most industries were basically met by structured two-dimensional tables, supplemented by a small amount of unstructured or semi-structured data. But in this era of rapid change, data has changed greatly in representation, access characteristics, and access efficiency. In representation, the two-dimensional table model has expanded to document, key-value, column-oriented, graph, and other types. In access characteristics, there are special workloads that are read-only or write-only. In access efficiency, various massive retrieval services need memory-level speed. This requires matching the right database type to the data model and access characteristics; a one-size-fits-all approach no longer works.
3.2 Why can distributed database technology break out of this dilemma?
Before analyzing why distributed database technology can solve problems that traditional relational databases cannot, we need to be clear that the distributed database discussed here is not a single database or a single class of database. It is the collection of all databases that share the "scattered in form, unified in spirit" characteristic.
First, database products with this characteristic can aggregate dispersed computing resources over the network into a logical whole with independent data storage and processing capabilities, which gives them the ability to process massive amounts of data.
Second, database products with this characteristic pursue the A and P of the CAP theorem and lower their expectations for C. By giving up strong consistency and settling for weak consistency, they can turn data processing from physical centralization into logical centralization, and thereby gain the ability to handle high concurrency.
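The trade-off between availability and consistency can be illustrated with a toy model. The following is only a minimal sketch under stated assumptions: `Primary`, `Replica`, and `replicate` are hypothetical names invented for illustration, everything runs in one process, and no real database's replication protocol is implied. A write is acknowledged immediately, a replica may briefly return stale data, and the replica converges once the deferred replication runs.

```python
from collections import deque


class Replica:
    """Hypothetical read replica holding a copy of the data."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Primary(Replica):
    """Accepts writes immediately and ships them to replicas later."""

    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas
        self._log = deque()  # writes not yet applied on the replicas

    def write(self, key, value):
        self.data[key] = value          # acknowledged right away (availability)
        self._log.append((key, value))  # replication is deferred

    def replicate(self):
        # Drain the log in order so every replica converges to the same state.
        while self._log:
            key, value = self._log.popleft()
            for replica in self.replicas:
                replica.data[key] = value


replica = Replica()
primary = Primary([replica])
primary.write("stock:sku-1", 9)
stale = replica.read("stock:sku-1")      # replica has not seen the write yet
primary.replicate()
converged = replica.read("stock:sku-1")  # replica now matches the primary
```

Between the write and the call to `replicate`, a reader on the replica sees the old state. That window is the weak consistency the text describes, accepted in exchange for not blocking the write on every node.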
Third, database products with this characteristic naturally have good scalability and adaptability. Because the physical nodes are dispersed, the database relies on its own software mechanisms to combine them into an organic whole. Adding or removing nodes or capacity is therefore a routine operation; one only needs to consider the volume of data migration and its performance impact during the change.
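To make the point about migration volume concrete, here is a minimal sketch of consistent hashing, one common technique for limiting how much data moves when a node is added. The article does not say which placement scheme any particular product uses, so this is an assumption for illustration only; `HashRing`, `moved_fraction`, the node names, and the key names are all hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (a stable 64-bit integer)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring; each node owns several virtual points."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []   # sorted hash points on the ring
        self._owner = {}    # hash point -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        # First point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def moved_fraction(keys, before, after):
    """Share of keys whose owning node differs between two placements."""
    moved = sum(1 for k in keys if before[k] != after[k])
    return moved / len(keys)


keys = [f"order-{i}" for i in range(10_000)]

# Consistent hashing: grow the cluster from three nodes to four.
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
ring_moved = moved_fraction(keys, before, after)

# Naive placement for comparison: hash(key) mod node_count.
naive_before = {k: _hash(k) % 3 for k in keys}
naive_after = {k: _hash(k) % 4 for k in keys}
naive_moved = moved_fraction(keys, naive_before, naive_after)
```

Comparing `ring_moved` with `naive_moved` shows why the placement scheme matters: with the ring, only keys that land on the new node migrate, while modulo placement reshuffles most keys. The exact fractions depend on the hash function and the number of virtual points.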
Finally, a distributed database is not one product or one kind of product. Among these products there are many specialized databases that differ in data model and data access characteristics, such as MongoDB for documents, Redis for in-memory access, and HBase for big data processing. Compared with traditional relational databases, these distributed databases focus on data processing for particular data models or data access scenarios. In that sense, distributed databases are better suited to, and a better fit for, data processing in specific scenarios.
3.3 What problems can distributed database technology not solve?
Since distributed databases have so many advantages, are they omnipotent?
First, by definition a distributed database does not focus on general-purpose scenarios but on specific data access scenarios. Applied to general scenarios, or to scenarios that do not match its properties, it will inevitably show many defects, such as complex or ill-suited data migration algorithms, data model mismatch, and weaknesses in data persistence. Analyzing the technical characteristics of a particular scenario will usually turn up a better-suited distributed database product. But considering the shared "scattered in form, unified in spirit" characteristic of distributed databases, is there any scenario for which no suitable distributed database can be found?
Its greatest strength is also its greatest weakness: the advantage lies in being "scattered in form, unified in spirit", and so does the fatal flaw. This characteristic inevitably means it cannot do the job in business scenarios with strict transactional requirements. People keep trying to compensate with clever solutions such as two-phase transaction processing (two-phase commit), but these can only be adopted, reluctantly, in business scenarios that have some tolerance on transactions. For transactional scenarios with zero tolerance, we have to return to traditional centralized relational databases.
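For readers unfamiliar with the two-phase approach mentioned above, the following is a minimal in-process sketch of two-phase commit. It is an illustration under stated assumptions, not any product's implementation: `Participant`, `two_phase_commit`, and the account names are hypothetical, and there is no network, logging, timeout, or crash recovery.

```python
class Participant:
    """Hypothetical resource manager holding one shard of the data."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self._pending = None  # change staged during the prepare phase

    def prepare(self, delta):
        # Phase 1: vote yes only if the change can be applied later.
        if self.balance + delta < 0:
            return False
        self._pending = delta
        return True

    def commit(self):
        # Phase 2 (success): apply the staged change.
        if self._pending is not None:
            self.balance += self._pending
            self._pending = None

    def rollback(self):
        # Phase 2 (failure): discard the staged change.
        self._pending = None


def two_phase_commit(changes):
    """changes: list of (participant, delta). Returns True if committed."""
    votes = [p.prepare(delta) for p, delta in changes]
    if all(votes):
        for p, _ in changes:
            p.commit()
        return True
    for p, _ in changes:
        p.rollback()
    return False


alice = Participant("shard-1/alice", 100)
bob = Participant("shard-2/bob", 20)

ok = two_phase_commit([(alice, -70), (bob, +70)])      # both vote yes
failed = two_phase_commit([(alice, -70), (bob, +70)])  # alice votes no
```

The sketch leaves out exactly the hard parts: a real coordinator can crash between the two phases, and participants must hold their staged changes until they hear the outcome. That waiting and the extra round trips are part of why the approach fits only scenarios with some tolerance on transactions.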
4. How should enterprises approach the road to distributed databases?
In summary, when enterprises make technical choices about distributed databases, I think they should follow these principles:
4.1 Choose the technical route based on the data and business scenario.
No technical route can be said with certainty to represent the future; every technical route exists to serve the needs of business scenarios. So when choosing a technical route, analyze the requirements of the business scenario along three dimensions: data model characteristics, data access characteristics, and data access efficiency. Then use the results of that analysis to match the appropriate database technology route.
4.2 Don't trust marketing blindly; rely on your own technical analysis and testing.
Deciding on a technical route is a very serious matter. Third-party evaluations and vendor marketing offer conclusions, but these can only serve as references. Leaving aside the influence of advertising interests, when it comes to product selection, someone else's choice may not suit your own enterprise; even within the same industry, data volumes and access volumes differ. So on top of broad reference, you still need to do your own analysis and testing.
4.3 Don't choose the most advanced; choose the most appropriate.
Different business scenarios have different requirements and priorities for database capabilities. It is hard for one general-purpose product to cover every scenario, so we need to make targeted choices based on the actual situation. The database product that fits your own scenario is the best product. Don't pick a technology merely because one of its features seems advanced or is said to represent the future trend.
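The first principle, matching the technical route to the scenario's data model and access characteristics, can be written down as a simple checklist in code. This is only an illustrative sketch: the `Workload` fields, the `ROUTES` table, and `suggest_route` are hypothetical, the example products are the ones named earlier in the article, and a real selection would still need the analysis and testing the second principle calls for.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical summary of one business scenario's requirements."""
    data_model: str  # e.g. "relational", "document", "key-value", "column-oriented", "graph"
    needs_strict_transactions: bool = False
    needs_memory_speed: bool = False


# Example products are those named in the article; the mapping is illustrative.
ROUTES = {
    "document": "document database (e.g. MongoDB)",
    "key-value": "key-value database",
    "column-oriented": "column-oriented database (e.g. HBase)",
    "graph": "graph database",
}


def suggest_route(w: Workload) -> str:
    # Zero tolerance on transactions points back to a centralized RDBMS.
    if w.needs_strict_transactions:
        return "centralized relational database"
    if w.data_model == "key-value" and w.needs_memory_speed:
        return "in-memory key-value database (e.g. Redis)"
    return ROUTES.get(w.data_model, "needs further analysis and testing")


print(suggest_route(Workload("document")))
print(suggest_route(Workload("relational", needs_strict_transactions=True)))
```

The value of writing the rules down is less the function itself than forcing the three dimensions (data model, access characteristics, access efficiency) to be stated explicitly before any product is shortlisted.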