Creating a Spark SQL Dataset
2022-07-22 19:54:00 【Erha of Xiaowu family】
The Spark SQL framework
- Spark SQL is one of the core components of Spark (April 2014, Spark 1.0)
- It can directly access existing Hive data
- It provides JDBC/ODBC interfaces so that third-party tools can use Spark for data processing
- It provides a higher-level interface for processing data conveniently
- It supports multiple ways of working: SQL and API programming
- It supports multiple external data sources: Parquet, JSON, RDBMS, etc.
- The Catalyst optimizer is at the heart of Spark SQL
Creating a Dataset
When creating a Dataset in the Spark shell, import the following packages:
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import spark.implicits._
import org.apache.spark.sql.types._
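The imports above rely on the `spark` session that the Spark shell creates automatically. Outside the shell, the session has to be built first. The following is a minimal sketch of a standalone program; the object name `DatasetDemo`, the application name, and the `local[*]` master are illustrative assumptions, not taken from the original post.

```scala
import org.apache.spark.sql.SparkSession

object DatasetDemo {
  def main(args: Array[String]): Unit = {
    // App name and master are hypothetical values chosen for a local test run
    val spark = SparkSession.builder()
      .appName("DatasetDemo")
      .master("local[*]")
      .getOrCreate()

    // The implicits belong to the session instance, so this import
    // must come after `spark` has been defined
    import spark.implicits._

    val ds = spark.createDataset(1 to 5)
    ds.show()

    spark.stop()
  }
}
```

In the Spark shell none of this setup is needed, because `spark` and `sc` are already defined.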
Creation method 1:
val dt=spark.createDataset(1 to 5)
dt.show
Creation method 2:
spark.createDataset(List(("a",1),("b",2),("c",3))).show
By default the columns of a tuple-based Dataset are named _1, _2, and so on; the column names can be customized.
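The post does not show the code for customizing the column names. One common way to do it, offered here as an assumption rather than as the author's method, is `toDF` with explicit names, or `withColumnRenamed` for a single column. The names `name` and `id` below are hypothetical.

```scala
// Assumes a spark-shell session, where `spark` is already defined
import spark.implicits._

val pairs = spark.createDataset(List(("a", 1), ("b", 2), ("c", 3)))
pairs.printSchema()   // columns are named _1 and _2 by default

// toDF returns a DataFrame (Dataset[Row]) with the given column names;
// "name" and "id" are hypothetical names chosen for illustration
val named = pairs.toDF("name", "id")
named.show()

// Renaming a single column leaves the others untouched
val renamed = pairs.withColumnRenamed("_1", "name")
renamed.show()
```

Note that both calls return an untyped DataFrame, so the tuple type `(String, Int)` is no longer tracked at compile time.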
Creation method 3:
A Dataset can also be created from an RDD.
spark.createDataset(sc.parallelize(List(("a",1,2),("b",2,3),("c",3,4)))).show
Conclusion:
The argument to createDataset() can be a Seq, an Array, or an RDD.
The three snippets above produce, respectively:
Dataset[Int], Dataset[(String,Int)], and Dataset[(String,Int,Int)]. Dataset = RDD + Schema, so a Dataset shares most of the common functions of an RDD, such as map and filter.
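To illustrate the point that a Dataset supports RDD-style functions, here is a small sketch that applies `filter` and `map` to Datasets like the ones created above. The variable names are hypothetical, and the snippet assumes a spark-shell session.

```scala
// Assumes a spark-shell session, where `spark` is already defined
import spark.implicits._

val nums = spark.createDataset(1 to 5)                  // Dataset[Int]

// Keep the even numbers, then double them
val evensDoubled = nums.filter(n => n % 2 == 0).map(n => n * 2)
evensDoubled.show()

// A Dataset built from an RDD of tuples behaves the same way
val rdd = spark.sparkContext.parallelize(List(("a", 1, 2), ("b", 2, 3), ("c", 3, 4)))
val triples = spark.createDataset(rdd)                  // Dataset[(String, Int, Int)]
val sums = triples.map { case (key, x, y) => (key, x + y) }
sums.show()

// The underlying RDD is still reachable when needed
val backToRdd = sums.rdd
```

The difference from a plain RDD is that each result carries a schema, which is what lets `show` print named columns.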
Creating a Dataset with a case class
case class Point(name:String,age:Int,score:Int)
val stu=Seq(Point("zs",18,90),Point("ls",19,85))
// convert to Dataset
val stuInfo=stu.toDS
stuInfo.show
A Dataset created this way takes its column names from the fields of the case class.
Besides using the case class at creation time, we can also apply it afterwards through map to name the columns.
val stu2=spark.createDataset(List(("sam",15,79),("john",17,80)))
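stu2 has type Dataset[(String, Int, Int)], with the default column names _1, _2, and _3.
Use the map function to convert each tuple into a Point: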
val stu3=stu2.map(x=>Point(x._1,x._2,x._3))
stu3.show
Use select to view specific columns:
stu3.select("name","score").show
Querying with SQL statements
Besides the methods above, we can also query the data using SQL syntax:
A Spark Dataset or DataFrame can be registered as a temporary view, which makes later queries such as select and join convenient. The older registerTempTable method has been deprecated since Spark 2.0; createOrReplaceTempView is its replacement.
stu3.createOrReplaceTempView("info")
spark.sql("select * from info").show
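Once the view is registered, any SQL query can run against it. The sketch below filters by score and shows what appears to be the equivalent query through the Dataset API. It rebuilds `stu3` from the sample rows used earlier so that it stands alone; the threshold of 80 and the variable names are illustrative assumptions.

```scala
// Assumes a spark-shell session, where `spark` is already defined
import spark.implicits._

case class Point(name: String, age: Int, score: Int)

val stu3 = Seq(Point("sam", 15, 79), Point("john", 17, 80)).toDS()

// Register the Dataset under the view name used in the post
stu3.createOrReplaceTempView("info")

// SQL version: names and scores of students scoring at least 80
val highScores = spark.sql("SELECT name, score FROM info WHERE score >= 80")
highScores.show()

// The same query expressed through the API
val sameViaApi = stu3.filter($"score" >= 80).select("name", "score")
sameViaApi.show()
```

A temporary view lives only as long as the session that created it, so it needs to be registered again after a restart.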
Note:
In Scala, adding the case keyword before the class keyword makes the class a case class. A case class differs from an ordinary class in the following ways:
- Objects can be created directly, without new
- Serializable is implemented by default
- toString(), equals(), and hashCode() are overridden automatically
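The three differences listed above can be checked with plain Scala, no Spark required. In this sketch `PlainPoint` is a hypothetical ordinary class added only for comparison.

```scala
case class Point(name: String, age: Int, score: Int)

// An ordinary class with the same fields; hypothetical, for comparison only
class PlainPoint(val name: String, val age: Int, val score: Int)

// No `new` needed: the compiler generates an apply method on the companion
val a = Point("zs", 18, 90)
val b = Point("zs", 18, 90)

println(a)                                    // prints Point(zs,18,90)
println(a == b)                               // prints true: equality compares fields
println(a.hashCode == b.hashCode)             // prints true
println(a.isInstanceOf[java.io.Serializable]) // prints true

// An ordinary class needs `new` and keeps reference equality
val p = new PlainPoint("zs", 18, 90)
val q = new PlainPoint("zs", 18, 90)
println(p == q)                               // prints false
```

Field-based equality and built-in serialization are part of why case classes are a natural fit for Dataset rows.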