Time-series Primer
Prologue
A time-series is a statistical concept used across science and engineering wherever measurements are taken over time. Its fundamentals can be understood intuitively through an analogy with a solar system: the definition sits at the centre, the properties are the planets orbiting it, and the applications are the satellites revolving around those planets. The rest of this primer assumes that you are familiar with basic statistical properties such as mean and variance.
Photo by Ross Sneddon on Unsplash
The definition
An ordered sequence of values of a variable at equally-spaced time intervals.
In this definition, there are three important words that explain almost all important properties and applications. Those words are:
- ordered
- variable
- equally-spaced
Properties
The properties of time-series fall into three groups, each explained by one of these words: auto-correlation follows from "ordered"; stationarity, trend, seasonality and cyclicity follow from "variable"; and periodicity follows from "equally-spaced".
Auto-correlation
In traditional statistical analysis such as regression, a variable contains a sequence of observations that are assumed to be independent of each other, so the order of values is irrelevant. In a time-series, however, the observations may not be independent, because dependence is established through the time dimension: the values observed at a time T can be related to those observed before and after T. This phenomenon is called auto-correlation, and it is a quantitative measure of the similarity between the time-series and a lagged version of itself over successive time intervals. The order of values in a time-series is therefore significant and must be preserved.
In the plot above, the crests and troughs occur at time periods that are multiples of 7. There is a high correlation between the values of the time-series X(t) and the lagged series X(t-7), obtained by taking the 7th preceding value for each value of X.
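The lag-7 relationship described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the `autocorrelation` helper and the synthetic `weekly` series are assumptions made for this example, not data from the plot.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Numerator: products of deviations for values that are `lag` steps apart.
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


# Hypothetical daily series with a weekly pattern (period 7), 10 weeks long.
weekly = [math.sin(2 * math.pi * t / 7) for t in range(70)]

r7 = autocorrelation(weekly, 7)  # values one full week apart
r3 = autocorrelation(weekly, 3)  # values roughly half a week apart
```

For this synthetic series, `r7` is strongly positive while `r3` is strongly negative, which mirrors the idea that values one full period apart resemble each other.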
A time-series is a random variable, often with meaningful summary statistics. The following characteristic properties are defined in terms of its mean and variance.
Stationarity
A time-series is said to be stationary if its mean and variance do not change with time. In the plot below, the time-series values vary within a bounded range over a fixed period of time. This can be checked quantitatively by computing the mean and variance over shorter time periods within the measurement period. Stationarity can also be tested statistically with the Dickey-Fuller test.
The following plot shows the hourly temperature measurements of a furnace in a controlled industrial setup. The time-series appears to be stationary.
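The window-based check mentioned above (mean and variance over shorter periods) might look like the following sketch. It is an informal check, not the Dickey-Fuller test, and both series are synthetic stand-ins invented for illustration rather than the furnace data.

```python
import random
import statistics


def window_stats(series, window):
    """Return (mean, variance) for each consecutive non-overlapping window."""
    stats = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        stats.append((statistics.mean(chunk), statistics.pvariance(chunk)))
    return stats


random.seed(0)  # fixed seed so the synthetic data is reproducible

# Hypothetical hourly data: noise around a constant level vs. a rising level.
stable = [500 + random.gauss(0, 5) for _ in range(240)]
drifting = [500 + 0.5 * t + random.gauss(0, 5) for t in range(240)]

stable_means = [m for m, _ in window_stats(stable, 24)]
drifting_means = [m for m, _ in window_stats(drifting, 24)]

# Spread of the daily means: small when the mean is constant, large when it drifts.
stable_spread = max(stable_means) - min(stable_means)
drifting_spread = max(drifting_means) - min(drifting_means)
```

If the window means (and variances) stay close to one another, the series is plausibly stationary; a formal test is still advisable before relying on that conclusion.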
Stationarity plays a prominent role in time-series analysis. Many variants of stationarity appear in literature and practice, and they are broadly classified into strong and weak stationarity. A time-series has strong (strict) stationarity when its entire joint distribution is unchanged by shifts in time. It has weak stationarity when only its mean and variance are constant over time and the covariance between two values depends only on the lag between them. When a time-series is not stationary, it is sometimes possible to transform or decompose it to obtain stationarity. If you wish to learn more, you are recommended to go through this detailed post on stationarity.
Trend
Time-series values might follow a curve with a non-zero slope over an extended period of time or over the entire period of measurement. In such cases, the mean changes with time and the time-series is said to have an increasing or decreasing trend.
For example, consider the monthly retail sales in the US over the past several years, which show an overall increasing trend. The two major dips coincide with the reduction in consumers' spending power during the economic recession in 2008 and the COVID-19 pandemic in 2020.
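One simple way to quantify a trend is to fit a straight line to the values and inspect its slope. The sketch below assumes equally-spaced observations indexed 0, 1, 2, and so on; the `linear_trend_slope` helper and both series are hypothetical and are not the retail sales data.

```python
def linear_trend_slope(values):
    """Least-squares slope of `values` against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    # Covariance between time index and value, divided by variance of the index.
    cov = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    var_t = sum((t - t_mean) ** 2 for t in range(n))
    return cov / var_t


# Hypothetical monthly series: one rises by about 2 units per month, one is flat.
rising = [100 + 2 * t + (5 if t % 2 else -5) for t in range(60)]
flat = [100 + (5 if t % 2 else -5) for t in range(60)]

rising_slope = linear_trend_slope(rising)
flat_slope = linear_trend_slope(flat)
```

A slope clearly different from zero suggests an increasing or decreasing trend; a slope near zero suggests none, at least not a linear one.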
Seasonality
If time-series values at certain timestamps differ significantly from the remaining values, and all such timestamps share a calendar attribute such as day of year, day of month, day of week or time of day, then the time-series is said to have seasonality. The variance of the time-series changes due to these fluctuations, and the difference between such consecutive timestamps is constant.
The quantity of ice-cream produced throughout the year is a vanilla example of seasonality. As the demand for ice-cream peaks during summer and dips during winter, we notice a corresponding pattern of increasing production until summer and decreasing production thereafter until winter. Every year the production peaks in June and dips in December. This is a time-series with yearly seasonality.
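A yearly seasonal pattern like this one can be surfaced by grouping values on a calendar attribute, here the month. The following is a minimal sketch; the production figures are synthetic and the `monthly_profile` helper is a name introduced for this example.

```python
import math
from collections import defaultdict
from datetime import date


def monthly_profile(observations):
    """Average value per calendar month from (date, value) pairs."""
    buckets = defaultdict(list)
    for day, value in observations:
        buckets[day.month].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


# Hypothetical monthly production over 5 years, highest in June, lowest in December.
observations = [
    (date(year, month, 1), 100 + 40 * math.cos(2 * math.pi * (month - 6) / 12))
    for year in range(2015, 2020)
    for month in range(1, 13)
]

profile = monthly_profile(observations)
peak_month = max(profile, key=profile.get)
trough_month = min(profile, key=profile.get)
```

The same grouping works for other calendar attributes, for example `day.weekday()` for weekly seasonality.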
Cyclicity
If time-series values fluctuate significantly at timestamps that are not equally spaced, the effect is called cyclicity. In this case, the difference between such consecutive timestamps is not constant. A classic example is a time-series with fluctuations during holidays such as Thanksgiving and Easter, which fall on different dates each year. The time gap between consecutive Thanksgiving days is not the same every year.
Notably, it is possible to convert some cyclic effects into seasonal effects by aggregating the time-series to a larger time interval. Thanksgiving always occurs in November, so it becomes a yearly seasonal effect in a monthly-aggregated time-series. However, this is not possible with Easter unless the aggregation is at least half-yearly. Such operations can be counterproductive, as aggregation to significantly larger time intervals would smooth out other cyclic and seasonal fluctuations. It is important to understand this subtle difference between seasonality and cyclicity. If you wish to learn more about dealing with cyclicity, you are recommended to go through this informative post on Seasonality vs Cyclicity.
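The Thanksgiving example can be checked directly. The sketch below assumes US Thanksgiving, defined as the fourth Thursday of November, and uses an arbitrary range of years; it shows that the gap in days varies while the month does not.

```python
from datetime import date, timedelta


def thanksgiving(year):
    """US Thanksgiving: the fourth Thursday of November."""
    first = date(year, 11, 1)
    # weekday(): Monday is 0, so Thursday is 3.
    days_to_first_thursday = (3 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_thursday + 21)


dates = [thanksgiving(year) for year in range(2015, 2024)]

# Daily view: the gap between consecutive occurrences is not constant.
gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]

# Monthly view: every occurrence lands in the same calendar month.
months = {d.month for d in dates}
```

At daily granularity the gaps take more than one value, which is the cyclic behaviour described above; after monthly aggregation the effect always falls in month 11 and behaves like yearly seasonality.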
Periodicity
Periodicity is the frequency at which the values of a time-series occur. It is generally yearly, quarterly, monthly, weekly, daily, hourly or sub-hourly, and it is computed by measuring the time difference between consecutive values of the time-series. Missing values can make the periodicity ambiguous. Either they have to be treated, or the series can be aggregated to larger time intervals to make the periodicity more deterministic, if such a transformation still suits the purpose.
Applications
These properties of time-series are termed characteristic because they give the variable a sense of identity. Several time-series variables can be clustered into groups based on their properties, for example variables with an increasing trend, variables with strong yearly seasonality, and variables that are intermittent. Statistical or machine learning models can be trained to learn these properties from historical observations in order to predict values that will be observed in the future. Once the actual values have been observed, they can be compared with the predictions to detect outliers. In standard nomenclature, the problems where time-series concepts can be applied are:
- Exploratory Research: in situations like customer/product segmentation, interpreting weather etc.
- Future Prediction: in planning situations like forecasting demand, capacity, weather and stock-price
- Anomaly Detection: in monitoring situations like preventive maintenance
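As a rough illustration of the anomaly detection idea, the sketch below compares actual values with predictions and flags residuals that sit far from the rest. The `flag_outliers` helper, the threshold of three standard deviations, and the data are all assumptions made for this example; real monitoring systems typically use more robust rules.

```python
import statistics


def flag_outliers(actual, predicted, threshold=3.0):
    """Indices whose residual lies more than `threshold` std devs from the mean residual."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    center = statistics.mean(residuals)
    spread = statistics.pstdev(residuals)
    if spread == 0:
        return []  # all residuals identical, nothing stands out
    return [i for i, r in enumerate(residuals) if abs(r - center) > threshold * spread]


# Hypothetical forecasts and observations; index 12 holds an injected spike.
predicted = [100.0] * 20
actual = [100.0 + (1 if i % 2 else -1) for i in range(20)]
actual[12] = 130.0

outliers = flag_outliers(actual, predicted)
```

Here `outliers` contains only the index of the injected spike, since every other residual stays within one unit of the prediction.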