Implementation of data warehouse technology
2022-07-21 03:27:00 【Aisxi】
1 Traditional data warehouse:
Evolved from the stand-alone database: relational databases are combined into an MPP (massively parallel processing) cluster.
A large table is split across nodes for storage, i.e. sharded by database and table (typically by hash).
Performance is excellent when the data volume is small, but limitations appear once the data volume reaches a certain level.
Disadvantages: 1) limited scalability; 2) hotspots (data skew), which can be mitigated by salting the keys.
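The hash sharding and key salting mentioned above can be sketched as follows. This is a minimal illustration, not any particular database's implementation: the function names, the `#` separator, the node count and the sample rows are all assumptions made for the example, and `zlib.crc32` stands in for whatever hash function a real system uses.

```python
import zlib
from collections import Counter

NUM_NODES = 4    # assumed cluster size
NUM_SALTS = 16   # assumed number of salt values

def node_for(key: str, num_nodes: int) -> int:
    """Pick the node that stores a key by hashing it (crc32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def salted_key(key: str, row_id: int, num_salts: int) -> str:
    """Append a salt suffix so rows that share one hot key map to several nodes."""
    return f"{key}#{row_id % num_salts}"

# Hypothetical skewed table: 1000 rows for one hot key, 10 rows for each of 5 others.
rows = [("hot_user", i) for i in range(1000)]
rows += [(f"user_{u}", 1000 + u * 10 + j) for u in range(5) for j in range(10)]

# Rows per node when sharding on the raw key: the hot key lands on one node.
plain = Counter(node_for(key, NUM_NODES) for key, _ in rows)

# Rows per node when sharding on the salted key: the hot key is spread out.
salted = Counter(
    node_for(salted_key(key, row_id, NUM_SALTS), NUM_NODES) for key, row_id in rows
)

print("rows per node, plain :", dict(plain))
print("rows per node, salted:", dict(salted))
```

The trade-off is that a query grouping by the original key now has to aggregate in two stages: first per salted key, then again after stripping the salt.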
2 Big data warehouse
Built on big data technology, it uses the natural scalability of big data systems to store massive amounts of data.
SQL is converted into tasks for a big data compute engine, which carry out the analysis.
Computation runs in parallel.
Advantages:
Scalable: a distributed file system splits the data for storage, and the files are mapped back to the original table structure at computation time;
Security: more secure
Disadvantages:
1 Incomplete SQL support
2 Limited transaction support
3 Low computational efficiency when the data volume is below a certain level.
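To make "SQL is converted into compute engine tasks" concrete, here is a rough sketch of how a query such as `SELECT region, SUM(amount) FROM orders GROUP BY region` could be expressed as map, shuffle and reduce steps. The `orders` rows, the function names and the two-reducer setup are hypothetical; real engines generate far more elaborate plans.

```python
from collections import defaultdict

# Hypothetical rows of an "orders" table: (region, amount).
orders = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]

def map_phase(split):
    """Emit one (group key, value) pair per input row of a split."""
    return [(region, amount) for region, amount in split]

def shuffle(pairs, num_reducers):
    """Route every pair to a reducer chosen by the key's hash, grouped by key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[hash(key) % num_reducers][key].append(value)
    return buckets

def reduce_phase(bucket):
    """Apply the aggregate (SUM) to each group that reached this reducer."""
    return {key: sum(values) for key, values in bucket.items()}

# The table is stored as two splits; each split is mapped independently.
splits = [orders[:3], orders[3:]]
mapped = [pair for split in splits for pair in map_phase(split)]

result = {}
for bucket in shuffle(mapped, num_reducers=2):
    result.update(reduce_phase(bucket))

print(result)
```

The fixed cost of planning and scheduling these stages is one reason such systems are inefficient on small data volumes.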
Differences between the MPP architecture and the distributed architecture
MPP:
- The common technical architecture of the traditional data warehouse: stand-alone database nodes form a cluster to improve overall processing performance
- Nodes are shared-nothing: each node has its own independent disk storage and memory
- The data nodes are connected to each other through a dedicated network or a general commercial network, compute cooperatively, and provide service as a whole
- Design prioritizes C (consistency), then A (availability), and handles P (partition tolerance) as well as it can
Advantages:
Fine-grained processing with low latency (at the cost of lower throughput)
Suitable for processing medium-sized structured data
Disadvantages:
Storage location is not transparent: the physical node of each row is determined by hash, so a query task has to run on all nodes (a consequence of the architecture)
In parallel computation, a single slow node becomes the bottleneck of the whole system (the shortest plank), and fault tolerance is low
Distributed transactions limit scalability
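The "shortest plank" effect can be illustrated with a small simulation, under loudly stated assumptions: rows are placed by a stand-in hash (`zlib.crc32`), every node computes a partial sum over its own partition, and a node's running time is modeled simply as rows divided by a made-up `rows_per_second` speed. None of these names or numbers come from a real MPP product.

```python
import zlib

def owner(key: str, num_nodes: int) -> int:
    """Node that stores a key, chosen by hash (crc32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def run_query(rows, rows_per_second):
    """Scatter a SUM to every node, gather partial sums, return (total, seconds)."""
    num_nodes = len(rows_per_second)
    partitions = [[] for _ in range(num_nodes)]
    for key, value in rows:
        partitions[owner(key, num_nodes)].append(value)
    partial_sums = [sum(p) for p in partitions]
    # Modeled time per node; the query ends only when the slowest node ends.
    node_seconds = [len(p) / speed for p, speed in zip(partitions, rows_per_second)]
    return sum(partial_sums), max(node_seconds)

# Hypothetical table with 1000 rows and a 4-node cluster.
rows = [(f"k{i}", i) for i in range(1000)]

total_fast, fast_seconds = run_query(rows, rows_per_second=[100, 100, 100, 100])
# Same cluster, but the last node is assumed to run ten times slower.
total_slow, slow_seconds = run_query(rows, rows_per_second=[100, 100, 100, 10])

print(total_fast, fast_seconds)
print(total_slow, slow_seconds)
```

Both runs return the same total, but the second is only as fast as its slowest node, because the query cannot skip the partition that node owns.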
Distributed architecture (Hadoop architecture / batch architecture)
- Each node is autonomous (it can run local applications on its own); data is shared globally and transparently across the cluster (the biggest difference from MPP)
- Nodes are connected over a WAN or LAN, so inter-node communication is expensive; computation tries to minimize data movement (move the computation, not the data)
- Prioritizes P (partition tolerance), then A (availability), and finally C (consistency).
A piece of data is split into multiple shards, and each shard is stored as multiple replicas, which removes the single point of failure.
Distributed processing is coarse-grained (data is stored as files), whereas MPP is fine-grained.
Advantages:
Highly scalable; handles structured and semi-structured data; suitable for storing and processing massive data.
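The splitting and replication described above can be sketched like this. The block size, node names, replication factor and the round-robin placement rule are assumptions for illustration only; real distributed file systems use more sophisticated, topology-aware placement.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes, replication: int):
    """Assumed round-robin rule: block b goes to `replication` distinct nodes."""
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

def readable_after_failure(placement, failed_nodes) -> bool:
    """A file stays readable if every block keeps at least one live replica."""
    return all(
        any(node not in failed_nodes for node in replicas)
        for replicas in placement.values()
    )

# Hypothetical 1000-byte file, 256-byte blocks, 4 nodes, 3 replicas per block.
data = b"x" * 1000
blocks = split_into_blocks(data, block_size=256)
nodes = ["n1", "n2", "n3", "n4"]
placement = place_replicas(len(blocks), nodes, replication=3)

print(placement)
print(readable_after_failure(placement, {"n1"}))
print(readable_after_failure(placement, {"n1", "n2", "n3"}))
```

With three replicas per block, losing any one or two nodes leaves every block readable; only when all three holders of some block fail does the file become unavailable.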
MPP + distributed architecture
Storage layer: distributed storage; the upper layer uses MPP for fine-grained processing to reduce latency.