Time series database

2022-07-22 15:25:00 【Nice night】

What is a time series database

Time Series Database (TSDB)

Time series data is a series of data points generated over time. Put simply, it is data that carries a timestamp.

Other databases can handle time series data to some extent when the data volume is small, but a TSDB handles the ingestion, compression, and aggregation of data over time more efficiently. Take a connected-vehicle (Internet of Vehicles) scenario as an example: with 20,000 vehicles, 60 metrics per vehicle, and one sample per second, the system receives 20,000 * 60 = 1,200,000 metric values every second, i.e. 1.2 million values per second. If each value takes 16 bytes (assuming only an 8-byte timestamp and an 8-byte floating-point number), that is 19.2 MB per second, or roughly 64 GiB (about 69 GB) per hour. In practice each value also carries extra data such as tags, so the storage actually required is larger.
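
The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch; the constant names are made up for illustration, and tags, indexes, and compression are ignored.

```python
# Back-of-the-envelope ingest estimate for the connected-vehicle example.
VEHICLES = 20_000
METRICS_PER_VEHICLE = 60
BYTES_PER_VALUE = 8 + 8  # 8-byte timestamp + 8-byte float, no tags

values_per_second = VEHICLES * METRICS_PER_VEHICLE
bytes_per_hour = values_per_second * BYTES_PER_VALUE * 3600

print(values_per_second)                  # 1200000
print(bytes_per_hour / 1e9)               # 69.12 (GB, decimal)
print(round(bytes_per_hour / 2**30, 1))   # 64.4 (GiB, binary)
```

The "64G" figure corresponds to binary gigabytes (GiB); in decimal units the same volume is about 69 GB.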

Time series database related concepts

A time series database is a database for handling time series data, so its concepts are closely tied to that data. The basic concepts are listed below.

* Metric: A metric is similar to a table (Table) in a relational database and represents a set of time series data of the same kind. For example, an air quality sensor table stores the monitoring data of all sensors.

* Tag: A tag describes a characteristic of the data source and usually does not change over time. For a sensor device, for example, the device's DeviceId and the Region where it is located are tag information. The database automatically indexes tags internally, which supports multidimensional queries by tag. A tag consists of a Tag Key and a Tag Value, both of type String.

* Timestamp: The timestamp is the point in time at which the data was generated. It can be specified at write time or generated automatically by the system.

* Field: A field describes a measured quantity of the data source and usually changes over time. For example, a sensor device has fields such as temperature and humidity.

* Data point: A single measured value (Field Value) produced by a data source at a given time is called a data point. Database query and write volumes are counted in data points.

* Time series (timeline): The change of one measured quantity of a data source over time forms a time series. The combination Metric + Tags + Field identifies one time series. Computations on time series data, including downsampling, aggregation (sum, count, max, min, etc.), and interpolation, are all performed per time series.
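
The concepts above can be sketched in Python. This is a minimal in-memory illustration, not the storage format of any particular database; `SeriesKey`, `make_key`, `downsample_avg`, and the sample values are hypothetical names and data chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesKey:
    """Identifies one time series: metric + tags + field."""
    metric: str
    tags: tuple  # sorted (tag_key, tag_value) pairs, both strings
    field: str


def make_key(metric, tags, field):
    # Sorting makes the key independent of the order the tags were given in.
    return SeriesKey(metric, tuple(sorted(tags.items())), field)


def downsample_avg(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]


store = defaultdict(list)  # SeriesKey -> list of (timestamp, value) data points
key = make_key("air_quality", {"DeviceId": "d-1", "Region": "north"}, "temperature")
for ts, value in [(0, 20.0), (30, 22.0), (60, 24.0), (90, 26.0)]:
    store[key].append((ts, value))

print(downsample_avg(store[key], 60))  # [(0, 21.0), (60, 25.0)]
```

Each distinct combination of metric, tags, and field gets its own list of data points, which is why downsampling and aggregation operate on one series at a time.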

Application scenario of time series database

Time series databases are widely used in the Internet of Things and in Internet application performance monitoring (APM). Some typical scenarios are listed below; the list is not exhaustive:

* Public safety: Internet access records, call records, individual tracking, interval filtering;

* Power industry: smart meters, power grids, centralized monitoring of power generation equipment;

* Internet: server and application monitoring, user access logs, ad click logs;

* Internet of Things: elevators, boilers, machinery, water meters, and other connected devices;

* Transportation: real-time traffic conditions, intersection flow monitoring, checkpoint data;

* Finance: transaction records, access records, ATM and POS terminal monitoring.

Besides this air conditioner project, the upcoming elevator project will probably also use a time series database.

Characteristics of time series database

Immutability, uniqueness, and time ordering.

Time series data is a series of data indexed by time. Connecting these data points into a line along the time axis lets us look backward and build multidimensional reports that reveal trends, regularities, and anomalies; looking forward, it supports big data analysis and machine learning for prediction and early warning.
A time series database can therefore be thought of as a database that stores time series data and supports basic functions such as fast writes, persistence, and multidimensional aggregate queries.

Characteristics of data writing

Smooth, continuous writes with high concurrency and high throughput: Time series writes are relatively steady. This differs from application data, whose volume is usually proportional to application traffic, and that traffic typically has peaks and troughs. Time series data is usually generated at a fixed frequency and is not affected by other factors, so the rate of data generation is relatively stable.
Write-heavy, read-light: 95% to 99% of operations on time series data are writes, which makes it a typical write-heavy, read-light workload. This follows from the nature of the data. With monitoring data, for example, you may have many monitored items but actually read only a few of them, usually caring about a handful of key metrics or reading data only in specific situations.
Recent data written in real time, with no updates: Time series data is written in real time, and each write carries the most recently generated data. Because data generation advances with time, newly generated data is written as it arrives. Writes do not update existing data: along the time dimension every write is new data and old data is not modified, although manual correction of data is not ruled out.
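
One common way to exploit these write characteristics is to append incoming points to an in-memory buffer and flush them in batches. The sketch below only illustrates that idea; `AppendOnlyBuffer` is a hypothetical class, and a plain list stands in for durable storage.

```python
class AppendOnlyBuffer:
    """Collects recent points in memory and flushes them in batches."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []   # points not yet flushed
        self.flushed = []   # stands in for durable storage: a list of batches

    def write(self, ts, value):
        # Append only: existing points are never updated in place.
        self.pending.append((ts, value))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.flushed.append(self.pending)
            self.pending = []


buf = AppendOnlyBuffer(batch_size=3)
for ts in range(7):
    buf.write(ts, 0.5 * ts)

print(len(buf.flushed), len(buf.pending))  # 2 1
```

Because writes arrive at a steady rate and never modify old data, batching like this turns many small writes into a few larger sequential ones.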

Characteristics of data storage

Large data volume: Take monitoring data as an example. If samples are collected every 1 s, each monitored item produces 86,400 data points per day; with 10,000 monitored items, that is 864,000,000 data points per day. In Internet of Things scenarios the number is even larger. The total data size reaches the TB or even PB level.
Hot and cold data: Time series data has a very pronounced hot/cold split: the older the data, the less likely it is to be queried and analyzed.
Time-limited validity: Time series data usually has a retention period. Data older than this period can be considered expired and can be reclaimed. One reason is that the older the data, the lower its value; the other is that cleaning up low-value data saves storage cost.
Multi-precision storage: For reasons of storage cost and query efficiency, time series data often needs to be queried at multiple precisions, so it also needs to be stored at multiple precisions.
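
Retention and multi-precision storage can be sketched as two small functions: one drops points older than the retention period, the other keeps a coarser copy of the data with one value per time bucket. The function names, the choice of max as the rollup aggregate, and the sample data are assumptions made for illustration.

```python
def apply_retention(points, now, max_age_seconds):
    """Drop (timestamp, value) points older than the retention period."""
    cutoff = now - max_age_seconds
    return [(ts, v) for ts, v in points if ts >= cutoff]


def rollup_max(points, bucket_seconds):
    """Keep one max value per bucket: a coarser-precision copy of the data."""
    out = {}
    for ts, v in points:
        start = ts - ts % bucket_seconds
        out[start] = max(out.get(start, v), v)
    return sorted(out.items())


# Data volume from the text: one point per second for 10,000 monitored items.
points_per_day = 86_400 * 10_000
print(points_per_day)  # 864000000

raw = [(0, 1.0), (60, 3.0), (120, 2.0), (180, 5.0), (240, 4.0)]
print(rollup_max(raw, 120))                               # [(0, 3.0), (120, 5.0), (240, 4.0)]
print(apply_retention(raw, now=240, max_age_seconds=120)) # [(120, 2.0), (180, 5.0), (240, 4.0)]
```

A typical arrangement keeps raw data for a short period and the rolled-up copies for longer, so old data costs less to store and to query.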

Data model

Time series data can be divided into two parts:

Series: an identifier (the dimensions), used mainly for searching and filtering.
Data points: an array of timestamps and values.
- Row storage: one array containing multiple points, such as [{t: 2017-09-03-21:24:44, v: 0.1002}, {t: 2017-09-03-21:24:45, v: 0.1012}]
- Columnar storage: two arrays, one holding timestamps and one holding values, such as [2017-09-03-21:24:44, 2017-09-03-21:24:45], [0.1002, 0.1012]
  In general, columnar storage achieves a better compression ratio and better query performance.
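
The difference between the two layouts, and one reason the columnar layout compresses well, can be shown with a short sketch. Delta encoding is used here as a commonly used example technique, not as the format of any specific database; the epoch-second timestamps and the third data point are illustrative.

```python
from itertools import accumulate

# Row layout: one array of points (timestamps as illustrative epoch seconds).
rows = [
    {"t": 1504473884, "v": 0.1002},
    {"t": 1504473885, "v": 0.1012},
    {"t": 1504473886, "v": 0.1009},
]


def to_columns(rows):
    """Column layout: one array of timestamps, one array of values."""
    return [r["t"] for r in rows], [r["v"] for r in rows]


def delta_encode(timestamps):
    """Store the first timestamp, then the differences between neighbours."""
    if not timestamps:
        return []
    return [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]


def delta_decode(deltas):
    """Rebuild the original timestamps with a running sum."""
    return list(accumulate(deltas))


timestamps, values = to_columns(rows)
encoded = delta_encode(timestamps)
print(encoded)                              # [1504473884, 1, 1]
print(delta_decode(encoded) == timestamps)  # True
```

With regularly sampled data the deltas are small, repeated integers, which take far less space than full timestamps once all timestamps sit together in one column.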

Comparison and selection

You can choose a suitable store according to the following requirements:

Small and lean, high performance, modest data volume (billion scale): InfluxDB
Simple, moderate data volume (tens of millions), with join queries and an existing relational database background: TimescaleDB
Large data volume, an existing big data platform, distributed cluster requirements: OpenTSDB, KairosDB
Distributed cluster requirements, OLAP real-time online analysis, ample resources: Druid
Maximum performance, with a sharp difference between hot and cold data: Beringei
Retrieval and loading as well, distributed aggregate computing: Elasticsearch
If you have both search (indexing) and time series requirements, Druid and Elasticsearch are the best choices. Their performance is good, they cover both retrieval and time series needs, and both have highly available, fault-tolerant architectures.