Realization of data warehouse technology

2022-07-21 03:27:00 【Aisxi】

1 Traditional data warehouse:

Evolved from the stand-alone database: a set of relational databases forms an MPP (massively parallel processing) cluster.

A large table is split across the nodes for storage, using database and table sharding (by hash).

Performance is excellent when the data volume is small; once the data volume reaches a certain level, the limitations show.

Disadvantages: 1) limited scalability; 2) hotspot issues (data skew), which can be mitigated by salting the data.
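To make hash sharding and salting concrete, here is a minimal sketch. The function names, the use of md5 as the hash, and the choice to derive the salt from a row index are all assumptions made for illustration; a real MPP database uses its own distribution function.

```python
import hashlib
from collections import Counter


def node_for(key: str, num_nodes: int) -> int:
    """Map a key to a node with a stable hash (md5 is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def salted_key(key: str, row_id: int, num_salts: int) -> str:
    """Append a salt suffix so rows sharing one hot key land in several hash buckets."""
    return f"{key}#{row_id % num_salts}"


def rows_per_node(keys, num_nodes):
    """Count how many rows each node would store."""
    counts = Counter(node_for(k, num_nodes) for k in keys)
    return [counts.get(n, 0) for n in range(num_nodes)]


# 1000 rows, 900 of which share the hot key "user_0" (made-up data)
keys = ["user_0"] * 900 + [f"user_{i}" for i in range(1, 101)]
plain = rows_per_node(keys, 4)
salted = rows_per_node([salted_key(k, i, 16) for i, k in enumerate(keys)], 4)

print("plain :", plain)   # one node holds at least 900 rows
print("salted:", salted)  # the hot key is spread over several nodes
```

The trade-off is on the read side: after salting, a lookup for one logical key has to query every salted variant and merge the results.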

2 Big data warehouse

Relies on big data technology and takes advantage of its natural scalability to store massive amounts of data.

SQL is converted into tasks for a big data computing engine, which carry out the data analysis.

Computation runs concurrently.
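As a rough illustration of what "converting SQL into an engine task" means, the sketch below hand-translates the query `SELECT dept, SUM(amount) FROM sales GROUP BY dept` into map, shuffle, and reduce steps. The table, column names, and data are hypothetical, and real engines generate far more elaborate plans; this only shows the shape of the computation.

```python
from collections import defaultdict


def map_phase(partition):
    """Emit (group key, value) pairs for the rows of one partition."""
    return [(row["dept"], row["amount"]) for row in partition]


def shuffle(mapped_partitions):
    """Bring all values with the same key together, across partitions."""
    grouped = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Apply the aggregate (SUM) to each group."""
    return {key: sum(values) for key, values in grouped.items()}


# Two partitions of a hypothetical "sales" table, as if stored on two nodes
partitions = [
    [{"dept": "a", "amount": 10}, {"dept": "b", "amount": 5}],
    [{"dept": "a", "amount": 7}, {"dept": "c", "amount": 1}],
]

result = reduce_phase(shuffle(map_phase(p) for p in partitions))
print(result)  # {'a': 17, 'b': 5, 'c': 1}
```

Each call to `map_phase` touches only one partition, which is why the map step can run on many nodes at the same time.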

Advantages:

Scalable: the distributed file system splits files for storage, and the files are restored to the original table structure at computation time.

Security: more secure.

Disadvantages:

1) Incomplete SQL support

2) Limited transaction support

3) Low computational efficiency when the data volume has not reached a certain level.

Differences between the MPP architecture and the distributed architecture

MPP:

The common technical architecture of traditional data warehouses: stand-alone database nodes form a cluster to improve overall processing performance.
The nodes share nothing (shared-nothing): each node has its own independent disk storage and memory.
The data nodes are connected to one another through a private network or a general commercial network, cooperate in computation, and provide services as a whole.
The design prioritizes C (consistency), then A (availability), and does its best on P (partition tolerance).

Advantages:

Fine-grained processing, low latency, low throughput

Suitable for processing medium-sized structured data

Disadvantages:

The storage location is not transparent: the physical node holding the data is determined by hashing, so a query task is executed on all nodes (a consequence of the architecture).

In parallel computing, a bottleneck on a single node becomes the weakest link of the whole system, so fault tolerance is low.

Supporting distributed transactions reduces scalability.
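The "weakest link" effect can be seen in a small cost model. The sketch below is an assumption-laden toy: the per-row costs are invented numbers, and `scatter_gather` is a hypothetical helper, not part of any MPP product. It runs a SUM on every shard and models wall time as the time of the slowest node, since the coordinator has to wait for all partial results.

```python
def scatter_gather(shards, cost_per_row):
    """Run SUM on every shard; return the merged result and the modelled wall time."""
    partials = [sum(rows) for rows in shards]
    # Each node works in parallel, so elapsed time is set by the slowest node.
    node_times = [len(rows) * cost for rows, cost in zip(shards, cost_per_row)]
    return sum(partials), max(node_times)


shards = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

total, healthy = scatter_gather(shards, [1.0, 1.0, 1.0])
_, degraded = scatter_gather(shards, [1.0, 1.0, 5.0])  # third node is 5x slower

print(total)     # 78
print(healthy)   # 4.0
print(degraded)  # 20.0
```

One slow node out of three makes the whole query five times slower in this model, even though two thirds of the work finished on time.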

Distributed architecture (Hadoop architecture / batch processing architecture)

Each node is autonomous (it can run local applications independently), and data is shared globally and transparently within the cluster (the biggest difference from MPP).
The nodes are connected through a WAN or LAN; communication between nodes is expensive, so computation tries to minimize data movement (move the computation, not the data).
Priority goes to P (partition tolerance), then A (availability), and finally C (consistency).

A piece of data is split into multiple shards, and each shard is stored as multiple replicas, which solves the single point of failure problem.
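The split-and-replicate idea can be sketched as follows. The round-robin placement is an assumption chosen for simplicity (real distributed file systems use their own placement policies), and the node names and block size are made up.

```python
def split_blocks(data: bytes, block_size: int):
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication):
    """Round-robin placement: each block goes to `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def readable_after_failure(placement, failed):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node not in failed for node in holders)
        for holders in placement.values()
    )


data = b"abcdefghij"
blocks = split_blocks(data, 4)  # [b"abcd", b"efgh", b"ij"]
placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], 3)

print(placement[0])                                         # ['n1', 'n2', 'n3']
print(readable_after_failure(placement, {"n1", "n2"}))        # True
print(readable_after_failure(placement, {"n1", "n2", "n3"}))  # False
```

With three replicas per block, any two node failures leave the data readable; losing all three holders of a block does not.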

The distributed approach processes data coarsely (data is stored as files), whereas MPP is fine-grained.

Advantages:

Highly scalable; handles structured and semi-structured data; suitable for storing and processing massive data.

MPP + distributed architecture

Storage layer: distributed storage, with fine-grained MPP processing in the layer above it to reduce latency.

Copyright notice
This article was written by [Aisxi]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/202/202207200529328403.html
