Time series database
2022-07-22 15:25:00 【Nice night】
What is a time series database
Time Series Database (TSDB)
Time series data is a series of data generated over time. Simply put, it is data that carries a timestamp.
Other databases can also handle time series data to some extent when the data volume is small, but a TSDB handles the ingestion, compression, and aggregation of data over time more efficiently. Take a connected-vehicle scenario as an example: with 20,000 vehicles, 60 metrics per vehicle, and one sample per second, 20000 * 60 = 1200000 metric values are reported every second, that is, 1.2 million values per second. If each value takes 16 bytes (assuming only an 8-byte timestamp and an 8-byte floating-point number), about 64 GiB (roughly 69 GB) of data is produced every hour. In practice each value is accompanied by additional data such as tags, so the actual storage required is larger.
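The arithmetic above can be checked with a short script. This is only a sketch of the estimate in the text; the constant names are illustrative, and the 16-byte value size is the article's simplifying assumption (timestamp plus float, no tags).

```python
# Back-of-the-envelope ingest volume for the connected-vehicle example.
VEHICLES = 20_000
METRICS_PER_VEHICLE = 60
BYTES_PER_VALUE = 8 + 8  # assumed: 8-byte timestamp + 8-byte float, no tags

# One sample per metric per second.
values_per_second = VEHICLES * METRICS_PER_VEHICLE
bytes_per_hour = values_per_second * BYTES_PER_VALUE * 3600

print(values_per_second)                 # 1200000
print(bytes_per_hour / 10**9)            # 69.12 (decimal GB)
print(round(bytes_per_hour / 2**30, 1))  # 64.4 (binary GiB)
```

The "64G" figure corresponds to binary gigabytes; in decimal units the same volume is about 69 GB.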
Time series database concepts
A time series database is a database for time series data, so its concepts are closely tied to that data. Here are the basic ones.
* Metric: similar to a table in a relational database, a Metric represents a set of time series data of the same kind. For example, a table for air quality sensors stores the monitoring data of all sensors.
* Tag: a Tag describes a characteristic of the data source and usually does not change over time. For a sensor device, the Tags might include its DeviceId and the Region where it is located. The database automatically indexes Tags and supports multidimensional queries by Tag. A Tag consists of a Tag Key and a Tag Value, both of String type.
* Timestamp: the Timestamp is the point in time at which the data was generated. It can be specified on write or generated automatically by the system.
* Field: a Field describes a measured quantity of the data source and usually changes over time. For a sensor device, the Fields might include temperature and humidity.
* Data Point: a single measurement value (Field Value) generated by the data source at a given time is called a data point. Database read and write throughput is counted in data points.
* Time Series: one measured quantity of a data source, changing over time, forms a time series. The combination Metric + Tags + Field identifies a time series. Computations on time series data, such as downsampling, aggregation (sum, count, max, min, etc.), and interpolation, are performed per time series.
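The concepts above can be sketched as a small data structure. This is a minimal illustration, not the schema of any particular database: the class name `DataPoint`, the helpers `make_point` and `series_key`, the key format, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DataPoint:
    metric: str                        # e.g. the air quality sensor "table"
    tags: Tuple[Tuple[str, str], ...]  # sorted (Tag Key, Tag Value) pairs, both strings
    field: str                         # the measured quantity, e.g. temperature
    timestamp: int                     # seconds since the Unix epoch
    value: float                       # the Field Value


def make_point(metric: str, tags: Dict[str, str], field: str,
               timestamp: int, value: float) -> DataPoint:
    # Sort tags so that tag order on write does not affect series identity.
    return DataPoint(metric, tuple(sorted(tags.items())), field, timestamp, value)


def series_key(point: DataPoint) -> str:
    # Metric + Tags + Field identifies one time series (hypothetical key format).
    tag_part = ",".join(f"{k}={v}" for k, v in point.tags)
    return f"{point.metric},{tag_part}#{point.field}"


p1 = make_point("air_quality", {"DeviceId": "d-001", "Region": "east"},
                "temperature", 1658474700, 26.5)
p2 = make_point("air_quality", {"Region": "east", "DeviceId": "d-001"},
                "temperature", 1658474701, 26.6)
p3 = make_point("air_quality", {"DeviceId": "d-001", "Region": "east"},
                "humidity", 1658474700, 0.61)

print(series_key(p1))                    # air_quality,DeviceId=d-001,Region=east#temperature
print(series_key(p1) == series_key(p2))  # True: same series, different timestamps
print(series_key(p1) == series_key(p3))  # False: a different Field is a different series
```

Two points with the same Metric, Tags, and Field belong to the same time series regardless of their timestamps; changing the Field (or any Tag value) yields a different series.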
Application scenarios of time series databases
Time series databases are widely used in the Internet of Things and in Internet application performance monitoring (APM). Here are some typical scenarios; the list is not exhaustive:
* Public safety: Internet access records, call records, individual tracking, interval screening;
* Power industry: centralized monitoring of smart meters, power grids, and power generation equipment;
* Internet: server and application monitoring, user access logs, ad click logs;
* Internet of Things: connected devices such as elevators, boilers, machinery, and water meters;
* Transportation: real-time road conditions, intersection traffic monitoring, checkpoint data;
* Finance: transaction records, access records, ATM and POS machine monitoring;
Besides the current air conditioner project, the upcoming elevator project will probably use a time series database as well.
Characteristics of time series databases
Immutability, uniqueness, and time ordering
Time series data is a series of data indexed by time. Connecting the data points along the time axis gives a line. Looking back, it supports multidimensional reports that reveal trends, regularities, and anomalies; looking forward, it supports big data analysis and machine learning for prediction and early warning.
A time series database can be regarded as a database that stores time series data and supports basic functions such as fast writes, persistence, and multidimensional aggregate queries over that data.
Characteristics of data writing
- Steady, continuous, highly concurrent, high-throughput writes: the write load for time series data is relatively stable. This differs from application data, whose volume is usually proportional to application traffic, and that traffic typically has peaks and troughs. Time series data is usually generated at a fixed frequency and is not affected by other factors, so the rate of data generation is relatively steady.
- Write-heavy, read-light: 95%-99% of operations on time series data are writes, a typical write-heavy, read-light pattern. This follows from the nature of the data. With monitoring data, for example, you may have many monitored items but actually read few of them, usually only a few key indicators or data for specific scenarios.
- Recent data written in real time, no updates: time series data is written in real time, and each write carries the most recently generated data. This follows from how the data is produced: it advances with time, and newly generated data is written as it appears. Writes do not update existing data; along the time dimension every write is new data and old data is not modified, although manual corrections are not ruled out.
Characteristics of data storage
- Large data volume: take monitoring data as an example. If data is collected at a 1 s interval, one monitored item produces 86400 data points per day; with 10000 monitored items, that is 864000000 data points per day. In Internet of Things scenarios the number is even larger. The total data size is at the TB or even PB level.
- Hot and cold data: time series data has a very typical hot/cold pattern. The older the data, the less likely it is to be queried and analyzed.
- Time-limited value: time series data has a retention period. Data older than the retention period can be considered expired and can be reclaimed. One reason is that the older the data, the lower its value; the other is to save storage cost, since low-value data can be cleaned up.
- Multi-precision storage: to balance storage cost against query efficiency, time series data is often queried at multiple precisions, which in turn requires storing it at multiple precisions.
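The retention and multi-precision points above can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not how any specific TSDB implements rollups or expiry: the function names `downsample_mean` and `drop_expired`, the 60-second bucket, and the synthetic input are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (Unix timestamp in seconds, value)


def downsample_mean(points: List[Point], bucket_seconds: int) -> List[Point]:
    """Roll raw points up into fixed-width buckets, keeping the mean per bucket."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - ts % bucket_seconds  # align to the bucket boundary
        buckets[bucket_start].append(value)
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]


def drop_expired(points: List[Point], now: int, retention_seconds: int) -> List[Point]:
    """Keep only points that are still inside the retention period."""
    cutoff = now - retention_seconds
    return [(ts, v) for ts, v in points if ts >= cutoff]


# Three minutes of synthetic 1-second samples.
raw = [(t, float(t % 60)) for t in range(0, 180)]

per_minute = downsample_mean(raw, bucket_seconds=60)
recent = drop_expired(raw, now=180, retention_seconds=60)

print(per_minute)   # [(0, 29.5), (60, 29.5), (120, 29.5)]
print(len(recent))  # 60
```

A common arrangement is to keep the raw 1-second points only for a short retention period and keep the coarser rollup for much longer, which serves both the cost and the query-efficiency concerns.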
Data model
Time series data can be divided into two parts:
- Series: an identifier (the dimensions), mainly used for searching and filtering
- Data points: an array of timestamps and values, which can be laid out in two ways
- Row storage: one array containing multiple points, such as [{t: 2017-09-03-21:24:44, v: 0.1002}, {t: 2017-09-03-21:24:45, v: 0.1012}]
- Column storage: two arrays, one holding timestamps and one holding values, such as [2017-09-03-21:24:44, 2017-09-03-21:24:45], [0.1002, 0.1012]
In general, column storage gives better compression ratios and query performance.
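The two layouts can be compared with a small sketch. It converts the row-style array into two columns and then delta-encodes the timestamp column, which is one common reason columnar layouts compress well: regularly sampled timestamps turn into a run of identical small integers. The third row, the function names, and the choice of delta encoding are illustrative assumptions, not a description of any particular database.

```python
from datetime import datetime

FMT = "%Y-%m-%d-%H:%M:%S"  # timestamp format used in the example above

# Row storage: one array of points (the third point is added for illustration).
rows = [
    {"t": "2017-09-03-21:24:44", "v": 0.1002},
    {"t": "2017-09-03-21:24:45", "v": 0.1012},
    {"t": "2017-09-03-21:24:46", "v": 0.1012},
]


def to_columns(rows):
    """Column storage: one array of timestamps and one array of values."""
    timestamps = [datetime.strptime(r["t"], FMT) for r in rows]
    values = [r["v"] for r in rows]
    return timestamps, values


def delta_encode(timestamps):
    """Store the first timestamp plus the gaps (in seconds) between neighbours."""
    if not timestamps:
        return None, []
    deltas = [int((b - a).total_seconds())
              for a, b in zip(timestamps, timestamps[1:])]
    return timestamps[0], deltas


timestamps, values = to_columns(rows)
first, deltas = delta_encode(timestamps)

print(values)  # [0.1002, 0.1012, 0.1012]
print(deltas)  # [1, 1]
```

With the values of one kind stored next to each other, a query that only needs the value column can also skip the timestamps of other fields entirely, which helps query performance.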
Comparison and selection
You can choose a suitable store according to the following requirements:
- Small and refined, high performance, modest data volume (billion level): InfluxDB
- Simple, not much data (tens of millions), joint queries needed, existing relational database background: TimescaleDB
- Large data volume, existing big data infrastructure, distributed cluster required: OpenTSDB, KairosDB
- Distributed cluster required, real-time OLAP analysis, plentiful resources: Druid
- Extreme performance, large gap between hot and cold data: Beringei
- Search workloads as well, distributed aggregate computation: Elasticsearch
- If you need both retrieval and time series capabilities, Druid and Elasticsearch are the best choices. Their performance is good, they cover both retrieval and time series needs, and both have highly available, fault-tolerant architectures.
TDengine and InfluxDB comparison test
TDengine and InfluxDB comparison test - TDengine | TAOS Data (taosdata.com)