Spark FAQs
2022-07-22 20:50:00 【roo_ one】
In what ways is an RDD's elasticity (resilience) reflected?
Reference answer 1:
An RDD's resilience is reflected in computation: when data is lost or a failure occurs at some stage of a Spark job, the lost partitions can be rebuilt through the RDD's lineage (dependency chain).
1. Memory elasticity: automatic switching between memory and disk storage
2. Fault-tolerance elasticity: lost data can be recovered automatically
3. Computation elasticity: a retry mechanism for failed computations
4. Partitioning elasticity: data can be repartitioned (regrouped) as needed (see the sketch after this list)
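To make aspects 1 and 4 above concrete, here is a minimal Scala sketch; the local master, the input path "input.txt", and the partition counts are placeholder values chosen for illustration, not part of the original answer. MEMORY_AND_DISK persistence lets partitions spill to disk when memory is tight, and repartition/coalesce change the number of partitions on demand.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object ElasticityDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-elasticity").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // "input.txt" is a placeholder input path.
    val lines = sc.textFile("input.txt")

    // Memory elasticity: MEMORY_AND_DISK spills partitions that do not fit in
    // memory to disk instead of failing or dropping them.
    val words = lines.flatMap(_.split("\\s+")).persist(StorageLevel.MEMORY_AND_DISK)

    // Partitioning elasticity: repartition() reshuffles data into a new number of
    // partitions; coalesce() shrinks the partition count without a full shuffle.
    val wide   = words.repartition(8)
    val narrow = wide.coalesce(2)

    println(narrow.count())
    spark.stop()
  }
}
```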
Reference answer 2:
1. Automatic switching between memory and disk storage
2. Efficient fault tolerance based on lineage
3. A failed task is retried a certain number of times
4. A failed stage is automatically retried a certain number of times, and only the failed partitions are recomputed
5. checkpoint (each RDD operation produces a new RDD; when the chain grows long and recomputation becomes expensive, the data can simply be written to reliable storage on disk) and persist (reuse data held in memory or on disk); a sketch contrasting the two follows this list
6. Data-scheduling elasticity: DAG and task scheduling are decoupled from resource management
7. Highly elastic data partitioning (repartition)
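As a rough illustration of point 5, the following Scala sketch contrasts persist/cache, which keeps data around for reuse while retaining the lineage, with checkpoint, which writes the data to reliable storage and truncates the lineage. The checkpoint directory and the data are placeholder values; in production the checkpoint directory is usually on HDFS.

```scala
import org.apache.spark.sql.SparkSession

object CheckpointVsPersist {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("checkpoint-demo").master("local[*]").getOrCreate()
    val sc = spark.sparkContext
    // Placeholder checkpoint directory; typically an HDFS path in a real cluster.
    sc.setCheckpointDir("/tmp/spark-checkpoint")

    val base = sc.parallelize(1 to 1000000)
    val long = base.map(_ * 2).filter(_ % 3 == 0).map(_ + 1)

    // cache()/persist(): keeps the data in memory (or on disk) for reuse, but the
    // lineage is retained, so a lost partition is recomputed from its parents.
    long.cache()

    // checkpoint(): writes the data to reliable storage and truncates the lineage,
    // so recovery reads the checkpointed files instead of replaying a long chain.
    // Caching first avoids recomputing the RDD when the checkpoint is materialized.
    long.checkpoint()
    long.count()  // an action triggers both caching and checkpointing

    println(long.isCheckpointed)
    spark.stop()
  }
}
```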
Reference: In what ways is an RDD's elasticity reflected
The Spark master/driver keeps the transformations that were applied to an RDD. As a result, if part of an RDD is lost (for example, because a slave node goes down), it can be quickly and easily recomputed on the surviving hosts in the cluster. This, too, is part of an RDD's resilience.
An RDD carries dependency information, so its computation can be traced back along the lineage. The transformations build up a DAG, and this DAG is split into stages; there are dependencies between the stages, and later stages are built on top of earlier ones. If data produced earlier is lost, the recorded dependencies make it possible to recompute it from upstream. Each operator produces a new RDD. In Spark, the DAG is simply the graph of transformation relationships between RDDs; these relationships become dependencies, which are then divided into different stages, thereby describing the order in which tasks are executed.
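A small sketch of how lineage shows up in practice (the data and partition counts are arbitrary illustration values): each transformation returns a new RDD that remembers its parents, and toDebugString prints that dependency chain, with the shuffle introduced by reduceByKey marking a stage boundary.

```scala
import org.apache.spark.sql.SparkSession

object LineageDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("lineage-demo").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Each transformation produces a new RDD that records its parent(s).
    val nums    = sc.parallelize(1 to 100, 4)
    val doubled = nums.map(_ * 2)             // narrow dependency
    val pairs   = doubled.map(n => (n % 10, n))
    val sums    = pairs.reduceByKey(_ + _)    // wide dependency: shuffle, new stage

    // toDebugString prints the lineage; the DAG scheduler splits the job into
    // stages at the shuffle introduced by reduceByKey.
    println(sums.toDebugString)

    sums.collect()
    spark.stop()
  }
}
```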
The most complete Spark interview questions ever
Reference: The most complete Spark interview questions ever
20 common Spark interview questions (most answers included)
Spark frequently asked questions, in Q&A format
Spark job execution process