3. Time Series - 3.1 What is a time series? - 《Trading with Machine Learning》

No one can predict the future. However, there’s a way to predict the future with greater accuracy in certain data sets: time series data.

In this lesson, we will learn what a time series is and the basic concepts of time series modeling. We will also learn some basic terminology and then use some time-tested (no pun intended) techniques to predict the future. We will first answer the question: what is a time series, and why is it important in finance? We will then discuss how to analyze a time series data set, which terms are used in time series analysis (including ARIMA), and how we can use them to make predictions. Our goals in this lesson are to understand what a time series is and which basic time series concepts we need to know.
Then we will learn how to analyze time series data and build a model that predicts a future value from past values. First, let’s understand what a time series is. A time series is a series of data points indexed in time order. Most commonly, a time series is a sequence of snapshots of a process taken at successive, equally spaced points in time; thus, it is a sequence of discrete-time data. Examples of time series are the heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average. Now, let’s look at the basic terminology we use when analyzing time series data.
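As a minimal sketch, assuming plain Python with no data library, a time series can be held as a list of (timestamp, value) pairs in time order. The helper name `is_equally_spaced` and the sample closing values are hypothetical, chosen only to illustrate the "equally spaced points in time" part of the definition.

```python
from datetime import date, timedelta

def is_equally_spaced(series):
    """Return True if timestamps strictly increase with one constant gap."""
    gaps = [later[0] - earlier[0] for earlier, later in zip(series, series[1:])]
    return all(g == gaps[0] for g in gaps) and all(g > timedelta(0) for g in gaps)

# Hypothetical closing values, one observation per calendar day.
start = date(2024, 1, 1)
closes = [101.2, 102.5, 101.9, 103.4, 104.0]
series = [(start + timedelta(days=i), value) for i, value in enumerate(closes)]

print(is_equally_spaced(series))
```

Real market data skips weekends and holidays, so for prices "equally spaced" usually means one observation per trading day rather than per calendar day.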
First, we need to understand a concept known as stationarity. Consider time series data such as the chart on the right, which shows US GDP over the last 200 years. We see that its summary statistics, such as the mean and variance, change over time. This is because US GDP has expanded over time, so the average over one 10-year period is not the same as the average over a 10-year period a century later. We call such data non-stationary.
So what is stationary data, then? A time series whose statistical structure does not depend on time is called stationary. In simple terms, its mean and standard deviation don’t change over time. How can we find out whether a time series is stationary?
One way is simply to look at the plot. As you can see in the chart here, the series is non-stationary: it has a definite trend.
Second, you can compute summary statistics, such as the mean and standard deviation, over different periods of the data and check for obvious or significant differences. Third, you can run statistical tests to check whether the assumptions of stationarity are met or have been violated.
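The second check can be sketched in plain Python by splitting a series into consecutive, non-overlapping windows and comparing each window’s mean and standard deviation. The function name `window_stats`, the number of windows, and both toy series are illustrative assumptions, not part of the lesson.

```python
import statistics

def window_stats(values, n_windows=4):
    """Split values into n_windows consecutive chunks; return (mean, stdev) per chunk.

    Any leftover points that do not fill a whole chunk are ignored.
    """
    size = len(values) // n_windows
    stats = []
    for k in range(n_windows):
        chunk = values[k * size:(k + 1) * size]
        stats.append((statistics.mean(chunk), statistics.stdev(chunk)))
    return stats

# Toy trending series: steady growth plus a small alternating wiggle.
trending = [100 + 2 * t + (1 if t % 2 else -1) for t in range(40)]
# Toy series that fluctuates around a fixed level.
level = [100 + (1 if t % 2 else -1) for t in range(40)]

for name, values in [("trending", trending), ("level", level)]:
    for mean, sd in window_stats(values):
        print(f"{name}: mean={mean:.1f}, stdev={sd:.2f}")
```

The window means of the trending series keep rising, which flags it as non-stationary, while the level series has the same mean in every window.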
Suppose your data is trending, like the US GDP chart on the screen. How do you make it stationary? Statistical time series methods, and even modern machine learning methods, benefit from the clearer signal we obtain when we stationarize a time series.
One way to make non-stationary time series data stationary is to identify and remove trends and seasonal effects. An easy way to do this is differencing: we take the difference between each data point and the one before it and plot the result, as we see on the screen to the right.
How does it look? It still seems to have some trend, with the average rising over time. Let’s difference it one more time. After differencing once more, the series appears to be stationary.
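A minimal sketch of first- and second-order differencing, assuming the data is a plain Python list; the helper `difference` and the quadratic toy series are hypothetical. A series that grows at an accelerating rate still trends after one difference, and a second difference removes the remaining trend, much like the GDP example.

```python
def difference(values, order=1):
    """Apply first differencing `order` times: y'[t] = y[t] - y[t-1].

    Each pass shortens the series by one element.
    """
    for _ in range(order):
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values

# Toy series with an accelerating trend: y[t] = 3 + 2t + 0.5t^2.
series = [3 + 2 * t + 0.5 * t ** 2 for t in range(10)]

first = difference(series)            # still rising: 2.5, 3.5, 4.5, ...
second = difference(series, order=2)  # constant: 1.0, 1.0, 1.0, ...
print(first)
print(second)
```

If you use pandas, `Series.diff()` computes the first difference (with a leading NaN), and applying it twice gives the second difference.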
If you want to confirm that the mean and variance of this series do not depend on time, you can run a statistical test known as the augmented Dickey-Fuller (ADF) test. Without going into the details of how the test works, here is a hint on how to read its output. The null hypothesis of the test is that the series has a unit root, that is, it is non-stationary. If the p-value of the test is below a chosen significance level, say 0.05 (equivalently, if the test statistic is more negative than the critical value at that level), we reject the null hypothesis and treat the series as stationary.
If you need more details, check out the two links on this slide.
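Here is a sketch of the basic (non-augmented) Dickey-Fuller statistic in plain Python. It regresses the first difference on the previous level plus a constant and returns the t-statistic of the slope. The null hypothesis is a unit root, so a statistic below the critical value (or a p-value below the significance level) is evidence of stationarity. The augmented version also adds lagged differences as regressors; in practice statsmodels’ `adfuller` function (in `statsmodels.tsa.stattools`) runs that test and returns the p-value as the second element of its result. The function name and test series below are illustrative assumptions, and -2.86 is the approximate large-sample 5% critical value for the constant-only case.

```python
import math
import random

def dickey_fuller_stat(y):
    """t-statistic of rho in the OLS regression dy[t] = alpha + rho * y[t-1] + e[t]."""
    x = y[:-1]                                         # lagged level y[t-1]
    d = [curr - prev for prev, curr in zip(y, y[1:])]  # first difference dy[t]
    n = len(d)
    x_mean, d_mean = sum(x) / n, sum(d) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    rho = sum((xi - x_mean) * (di - d_mean) for xi, di in zip(x, d)) / sxx
    alpha = d_mean - rho * x_mean
    residuals = [di - alpha - rho * xi for xi, di in zip(x, d)]
    sigma2 = sum(r ** 2 for r in residuals) / (n - 2)  # residual variance
    return rho / math.sqrt(sigma2 / sxx)

CRITICAL_5PCT = -2.86  # approx. asymptotic 5% critical value, constant, no trend

random.seed(42)
noise = [random.gauss(0, 1) for _ in range(500)]  # stationary by construction
accelerating = [t ** 2 for t in range(100)]       # strongly trending

for name, series in [("noise", noise), ("accelerating", accelerating)]:
    stat = dickey_fuller_stat(series)
    if stat < CRITICAL_5PCT:
        verdict = "reject unit root (treat as stationary)"
    else:
        verdict = "cannot reject unit root"
    print(f"{name}: DF stat = {stat:.2f} -> {verdict}")
```

The white-noise series lands far below the cut-off, while the accelerating series does not, so only the first is treated as stationary.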
Next, let’s look at why stationarity is important in a time series model. Suppose we want to build a model that uses averaging. Which mean and standard deviation of your data should it use? If your data is non-stationary, the mean at the beginning, in the middle, and at the end of the series are all different. Stationarity, by contrast, allows you to build a stable model whose parameters don’t change over time.
In the air passenger traffic chart on the right, we can see some interesting components of time series data. The first component is the trend: a long-run increase or decrease in a time series. You can see that the chart on screen has a slight upward trend.
Second, when data is affected by the time of the year, it is said to be seasonal. In this case, the chart peaks around the middle of almost every year and declines somewhat afterwards. Seasonality is especially pronounced in retail sales of items such as snow shovels or lawnmowers: snow shovels tend to sell well in fall and winter and then decline afterwards.
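One common way to remove a seasonal pattern like this is seasonal differencing: subtract the value from one season earlier (for example, 12 months back for monthly data). The sketch below uses a hypothetical helper `seasonal_difference` and a made-up quarterly series, and assumes the pattern repeats with a fixed, known period.

```python
def seasonal_difference(values, period):
    """Return y[t] - y[t - period] for every t that has a value one season back."""
    return [values[t] - values[t - period] for t in range(period, len(values))]

# Made-up quarterly data: trend of +1 per quarter plus a repeating pattern.
pattern = [5, 0, -2, -3]
quarterly = [50 + t + pattern[t % 4] for t in range(12)]

print(seasonal_difference(quarterly, period=4))  # the pattern cancels, leaving 4s
```

The repeating pattern cancels out, and what remains is the trend’s growth over one full season (4 quarters times 1 per quarter).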
Third is the cyclical component. A cyclical component is measured over a long time horizon, typically one year or longer, and unlike seasonality it does not repeat with a fixed period. For example, sales at fast food chains may rise during recessions, when consumers are more cost-conscious, and then fall during recoveries. This component is tied to the business cycle.
Finally, there is the irregular component. Irregular effects are the impacts of random events such as crashes, earthquakes, or sudden changes in the weather. By their very nature, these effects are completely unpredictable.
Putting it all together, we can see that a time series is an amalgam of all these components.
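Combining the components can be sketched as a classical additive decomposition, y[t] = trend[t] + seasonal[t] + remainder[t]. A centered moving average estimates the trend (classical decomposition treats trend and cycle together as one trend-cycle component), averaging the detrended values at each position in the season estimates the seasonal pattern, and what is left is the irregular remainder. This plain-Python sketch assumes a known seasonal period; the function name and toy data are hypothetical, and statsmodels offers a ready-made `seasonal_decompose` in `statsmodels.tsa.seasonal`.

```python
def additive_decompose(y, period):
    """Classical additive decomposition: y[t] = trend[t] + seasonal[t] + remainder[t].

    Trend and remainder are None at the ends, where the centered window does not fit.
    """
    n, half = len(y), period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        if period % 2:
            trend[t] = sum(window) / period  # plain centered moving average
        else:
            # 2 x period moving average: the two end points get half weight
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period

    # Average the detrended values at each position in the seasonal cycle.
    raw = []
    for k in range(period):
        vals = [y[t] - trend[t] for t in range(k, n, period) if trend[t] is not None]
        raw.append(sum(vals) / len(vals))
    offset = sum(raw) / period  # center the seasonal indices so they sum to zero
    seasonal = [raw[t % period] - offset for t in range(n)]

    remainder = [y[t] - trend[t] - seasonal[t] if trend[t] is not None else None
                 for t in range(n)]
    return trend, seasonal, remainder

# Toy quarterly series: linear trend plus a fixed pattern that sums to zero.
pattern = [2.0, -1.0, -3.0, 2.0]
y = [10 + 0.5 * t + pattern[t % 4] for t in range(24)]
trend, seasonal, remainder = additive_decompose(y, period=4)
print(trend[2], seasonal[:4], remainder[2])
```

On this noise-free toy series the method recovers the linear trend and the seasonal pattern, and the remainder is essentially zero.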
The definition of stationarity implies that the mean and variance of a process remain constant; that is, they should not change over time.
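In symbols, this (weak, or covariance) form of stationarity is usually written as follows, where the mean μ, the variance σ² and the autocovariance γ(h) at lag h do not depend on t. The third condition is part of the standard definition, although the lesson mentions only the mean and variance.

```latex
\mathbb{E}[y_t] = \mu, \qquad
\operatorname{Var}(y_t) = \sigma^2, \qquad
\operatorname{Cov}(y_t, y_{t+h}) = \gamma(h) \quad \text{for all } t \text{ and } h.
```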
However, looking at a stock chart, we can tell that it is not stationary. Stock prices typically trend up or down; in this case, the average price rises over time.
So how do we make it stationary? One way is to difference the stock prices, or, more commonly, compute their percentage changes, to get daily, monthly, or annual returns. That should make the series stationary, or at least much closer to it, as you can see on the bottom right.
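A minimal sketch of turning prices into returns in plain Python. Strictly, differencing prices gives price changes; simple returns divide each change by the previous price, and log returns are the first difference of the log price. The function names and prices are made up for illustration.

```python
import math

def simple_returns(prices):
    """Percentage change from each price to the next: p[t] / p[t-1] - 1."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]

def log_returns(prices):
    """First difference of log prices: ln(p[t]) - ln(p[t-1])."""
    return [math.log(curr) - math.log(prev) for prev, curr in zip(prices, prices[1:])]

prices = [100.0, 110.0, 99.0, 108.9]  # made-up daily closes
print(simple_returns(prices))  # approximately [0.1, -0.1, 0.1]
print(log_returns(prices))
```

Log returns add up over time: their sum equals the log of the last price divided by the first, which is one reason they are popular in time series work.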