Time Series Analysis for Business Forecasting


You may like using Statistics for Time Series, and Testing Correlation JavaScript.

A Summary of Forecasting Methods

Ideally, organizations that can afford to do so assign crucial forecasting responsibilities to the departments or individuals that are best qualified and have the necessary resources to make forecasts under complicated demand patterns. Clearly, a firm with a large ongoing operation and a technical staff of statisticians, management scientists, computer analysts, etc. is in a much better position to select and make proper use of sophisticated forecasting techniques than is a company with more limited resources. Notably, the bigger firm, through its larger resources, has a competitive edge over an unwary smaller firm and can be expected to be very diligent and detailed in estimating its forecasts (although, of the two, it is usually the smaller firm that can least afford miscalculations in new forecast levels).

A time series is a set of ordered observations on a quantitative characteristic of a phenomenon at equally spaced time points. One of the main goals of time series analysis is to forecast future values of the series.

A trend is a regular, slowly evolving change in the series level, that is, a change that can be modeled by low-order polynomials.

We examine three general classes of models that can be constructed for purposes of forecasting or policy analysis. Each involves a different degree of model complexity and presumes a different level of comprehension about the processes one is trying to model.

Many of us often either use or produce forecasts of one sort or another. Few of us recognize, however, that some kind of logical structure, or model, is implicit in every forecast.

In making a forecast, it is also important to provide a measure of how accurate one can expect the forecast to be. The use of intuitive methods usually precludes any quantitative measure of confidence in the resulting forecast. The statistical analysis of the individual relationships that make up a model, and of the model as a whole, makes it possible to attach a measure of confidence to the model’s forecasts.

Once a model has been constructed and fitted to data, a sensitivity analysis can be used to study many of its properties. In particular, the effects of small changes in individual variables in the model can be evaluated. For example, in the case of a model that describes and predicts interest rates, one could measure the effect on a particular interest rate of a change in the rate of inflation. This type of sensitivity study can be performed only if the model is an explicit one.

In Time-Series Models we presume to know nothing about the causality that affects the variable we are trying to forecast. Instead, we examine the past behavior of a time series in order to infer something about its future behavior. The method used to produce a forecast may involve the use of a simple deterministic model such as a linear extrapolation or the use of a complex stochastic model for adaptive forecasting.

One example of the use of time-series analysis would be the simple extrapolation of a past trend in predicting population growth. Another example would be the development of a complex linear stochastic model for passenger loads on an airline. Time-series models have been used to forecast the demand for airline capacity, seasonal telephone demand, the movement of short-term interest rates, and other economic variables. Time-series models are particularly useful when little is known about the underlying process one is trying to forecast. The limited structure in time-series models makes them reliable only in the short run, but they are nonetheless rather useful.

In the Single-Equation Regression Models the variable under study is explained by a single function (linear or nonlinear) of a number of explanatory variables. The equation will often be time-dependent (i.e., the time index will appear explicitly in the model), so that one can predict the response over time of the variable under study to changes in one or more of the explanatory variables. A principal purpose for constructing single-equation regression models is forecasting. A forecast is a quantitative estimate (or set of estimates) about the likelihood of future events, developed on the basis of past and current information. This information is embodied in the form of a model: a single-equation structural model, a multi-equation model, or a time-series model. By extrapolating our models beyond the period over which they were estimated, we can make forecasts about near-future events. This section shows how the single-equation regression model can be used as a forecasting tool.

The term forecasting is often thought to apply solely to problems in which we predict the future. We shall remain consistent with this notion by orienting our notation and discussion toward time-series forecasting. We stress, however, that most of the analysis applies equally well to cross-section models.

An example of a single-equation regression model would be an equation that relates a particular interest rate to a set of explanatory variables, such as the money supply, the rate of inflation, and the rate of change in the gross national product.

The choice of the type of model to develop involves trade-offs between time, energy, costs, and desired forecast precision. The construction of a multi-equation simulation model may require large expenditures of time and money. The gains from this effort may include a better understanding of the relationships and structure involved as well as the ability to make a better forecast. However, in some cases these gains may be small enough to be outweighed by the heavy costs involved. Because the multi-equation model necessitates a good deal of knowledge about the process being studied, the construction of such models may be extremely difficult.

The decision to build a time-series model usually occurs when little or nothing is known about the determinants of the variable being studied, when a large number of data points are available, and when the model is to be used largely for short-term forecasting. Given some information about the processes involved, however, it may be reasonable for a forecaster to construct both types of models and compare their relative performance.

Two types of forecasts can be useful. Point forecasts predict a single number in each forecast period, while interval forecasts indicate an interval in which we hope the realized value will lie. We begin by discussing point forecasts, after which we consider how confidence intervals (interval forecasts) can be used to provide a margin of error around point forecasts.

The information provided by the forecasting process can be used in many ways. An important concern in forecasting is the problem of evaluating the nature of the forecast error by using the appropriate statistical tests. We define the best forecast as the one that yields the forecast error with the minimum variance. In the single-equation regression model, ordinary least-squares estimation yields the best forecast, in the sense of minimum mean-square error, among all linear unbiased estimators.

The error associated with a forecasting procedure can come from a combination of four distinct sources. First, the random nature of the additive error process in a linear regression model guarantees that forecasts will deviate from true values even if the model is specified correctly and its parameter values are known. Second, the process of estimating the regression parameters introduces error because estimated parameter values are random variables that may deviate from the true parameter values. Third, in the case of a conditional forecast, errors are introduced when forecasts are made for the values of the explanatory variables for the period in which the forecast is made. Fourth, errors may be introduced because the model specification may not be an accurate representation of the “true” model.

Multi-predictor regression methods include logistic models for binary outcomes, the Cox model for right-censored survival times, repeated-measures models for longitudinal and hierarchical outcomes, and generalized linear models for counts and other outcomes. Below we outline some effective forecasting approaches, especially for short to intermediate term analysis and forecasting:

Modeling the Causal Time Series: With multiple regression, we can use more than one predictor. It is always best, however, to be parsimonious, that is, to use as few predictor variables as necessary to get a reasonably accurate forecast. Multiple regressions are best modeled with a commercial package such as SAS or SPSS. The forecast takes the form:

Y = b_0 + b_1 X_1 + b_2 X_2 + ... + b_n X_n,

where b_0 is the intercept and b_1, b_2, ..., b_n are coefficients representing the contributions of the independent variables X_1, X_2, ..., X_n.

Forecasting is a prediction of what will occur in the future, and it is an uncertain process. Because of this uncertainty, the accuracy of a forecast is as important as the outcome predicted from the independent variables. A forecast control must be used to determine whether the accuracy of the forecast is within acceptable limits. Two widely used methods of forecast control are the tracking signal and statistical control limits.

The tracking signal is computed by dividing the sum of the residuals by their mean absolute deviation (MAD). For approximately normal errors, one MAD is about 0.8 standard deviations, so a tracking signal within ±3.75 MAD corresponds to staying within about 3 standard deviations and is often considered good enough.

Statistical control limits are calculated in a manner similar to other quality control charts; however, the residual standard deviation is used.
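The two forecast controls can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function names and the sample numbers are hypothetical, and centering the control limits at zero with a multiplier of 3 is an assumption.

```python
from statistics import mean, stdev

def tracking_signal(actual, forecast):
    """Sum of the residuals divided by their mean absolute deviation (MAD)."""
    residuals = [a - f for a, f in zip(actual, forecast)]
    mad = mean(abs(r) for r in residuals)
    return sum(residuals) / mad

def control_limits(actual, forecast, k=3.0):
    """Limits at +/- k residual standard deviations (zero center is an assumption)."""
    residuals = [a - f for a, f in zip(actual, forecast)]
    s = stdev(residuals)
    return -k * s, k * s

# Hypothetical data, for illustration only.
actual = [102, 98, 105, 110, 97, 103]
forecast = [100, 100, 102, 106, 101, 100]

ts = tracking_signal(actual, forecast)      # residuals sum to 6, MAD is 3
low, high = control_limits(actual, forecast)
in_control = abs(ts) <= 3.75                # the 3.75 MAD rule of thumb
print(ts, in_control, (low, high))
```

A tracking signal that drifts steadily toward either limit suggests a biased forecast, even while it is still inside the band.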

Multiple regression is used when two or more independent factors are involved, and it is widely used for short- to intermediate-term forecasting. It is used to assess which factors to include and which to exclude, and it can be used to develop alternative models with different factors.

Trend Analysis: This uses linear and nonlinear regression with time as the explanatory variable. It is used where the pattern over time has a long-term trend. Unlike most time-series forecasting techniques, trend analysis does not require equally spaced observations.

Nonlinear regression does not assume a linear relationship between variables. It is frequently used when time is the independent variable.

You may like using Detective Testing for Trend JavaScript.

In the absence of any “visible” trend, you may like performing the Test for Randomness of Fluctuations, too.

Modeling Seasonality and Trend: Seasonality is a pattern that repeats every cycle. For example, an annual seasonal pattern has a cycle that is 12 periods long if the periods are months, or 4 periods long if the periods are quarters. We need an estimate of the seasonal index for each month, or for other periods such as quarter or week, depending on data availability.

1. Seasonal Index: Seasonal index represents the extent of seasonal influence for a particular segment of the year. The calculation involves a comparison of the expected values of that period to the grand mean.

A seasonal index is how much the average for that particular period tends to be above (or below) the grand average. Therefore, to get an accurate estimate for the seasonal index, we compute the average of the first period of the cycle, and the second period, etc, and divide each by the overall average. The formula for computing seasonal factors is:

S_i = D_i / D,

where:
S_i = the seasonal index for the ith period,
D_i = the average of the values for the ith period,
D = the grand average,
i = the ith seasonal period of the cycle.

A seasonal index of 1.00 for a particular month indicates that the expected value for that month equals the grand monthly average, that is, 1/12 of the expected annual total. A seasonal index of 1.25 indicates that the expected value for that month is 25% greater than 1/12 of the annual total. A seasonal index of 0.80 indicates that the expected value for that month is 20% less than 1/12 of the annual total.

2. Deseasonalizing Process: Deseasonalizing the data, also called seasonal adjustment, is the process of removing recurrent and periodic variations over a short time frame, e.g., weeks, quarters, or months. Seasonal variations are regularly repeating movements in series values that can be tied to recurring events. The deseasonalized data are obtained by simply dividing each time series observation by the corresponding seasonal index.

Almost all time series published by the US government are already deseasonalized using the seasonal index, in order to unmask the underlying trends in the data, which could otherwise be obscured by the seasonality factor.

3. Forecasting: Incorporating seasonality in a forecast is useful when the time series has both trend and seasonal components. The final step in the forecast is to use the seasonal index to adjust the trend projection. One simple way to forecast using a seasonal adjustment is to use a seasonal factor in combination with an appropriate underlying trend of total value of cycles.

4. A Numerical Application: The following table provides monthly sales ($1000) at a college bookstore. The sales show a seasonal pattern, with the greatest sales when the college is in session and a decrease during the summer months.

Year (T)   Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec     Total
1          196     188     192     164     140     120     112     140     160     168     192     200     1972
2          200     188     192     164     140     122     132     144     176     168     196     194     2016
3          196     212     202     180     150     140     156     144     164     186     200     230     2160
4          242     240     196     220     200     192     176     184     204     228     250     260     2592
Mean:      208.6   207.0   192.6   182.0   157.6   143.6   144.0   153.0   177.6   187.6   209.6   221.0   2185
Index:     1.14    1.14    1.06    1.00    0.87    0.79    0.79    0.84    0.97    1.03    1.15    1.22    12

Suppose we wish to calculate seasonal factors and a trend, then calculate the forecasted sales for July in year 5.

The first step in the seasonal forecast will be to compute monthly indices using the past four-year sales. For example, for January the index is:

S(Jan) = D(Jan)/D = 208.6/181.84 = 1.14,

where D(Jan) is the mean of all four January months, and D is the grand mean of all past four-year sales.

Similar calculations are made for all other months. Indices are summarized in the last row of the above table. Notice that the monthly indices add up to 12 (so their average is 1), which is the number of periods in a year for monthly data.
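The index calculation can be reproduced with a short Python sketch. It recomputes the monthly means and the grand mean directly from the four years of raw sales, so individual values may differ in the last digit from the printed Mean and Index rows; the function and variable names are illustrative only.

```python
# Monthly bookstore sales ($1000), one row per year, Jan..Dec.
sales = [
    [196, 188, 192, 164, 140, 120, 112, 140, 160, 168, 192, 200],  # year 1
    [200, 188, 192, 164, 140, 122, 132, 144, 176, 168, 196, 194],  # year 2
    [196, 212, 202, 180, 150, 140, 156, 144, 164, 186, 200, 230],  # year 3
    [242, 240, 196, 220, 200, 192, 176, 184, 204, 228, 250, 260],  # year 4
]

def seasonal_indices(table):
    """S_i = D_i / D: period average divided by the grand average."""
    n_years = len(table)
    n_periods = len(table[0])
    period_means = [sum(year[i] for year in table) / n_years
                    for i in range(n_periods)]
    grand_mean = sum(period_means) / n_periods
    return [m / grand_mean for m in period_means]

indices = seasonal_indices(sales)

# Deseasonalize year 4 by dividing each observation by its monthly index.
deseasonalized_y4 = [x / s for x, s in zip(sales[3], indices)]

print([round(s, 2) for s in indices])
print(round(sum(indices), 6))   # the indices add up to the number of periods
```

By construction the indices sum to the number of periods, which is a quick sanity check on any hand computation.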

Next, a linear trend often is calculated using the annual sales:

Y = 1684 + 200.4T,

The main question is whether this equation represents the trend.

Determination of the Annual Trend for the Numerical Example

Year No.   Actual Sales   Linear Regression   Quadratic Regression
1          1972           1884                1981
2          2016           2085                1988
3          2160           2285                2188
4          2592           2486                2583

Often fitting a straight line to the seasonal data is misleading. By constructing the scatter-diagram, we notice that a Parabola might be a better fit. Using the Polynomial Regression JavaScript, the estimated quadratic trend is:

Y = 2169 - 284.6 T + 97 T^2

Predicted values using both the linear and the quadratic trends are presented in the above table. Comparing the predicted values of the two models with the actual data indicates that the quadratic trend is a much better fit than the linear one, as is often the case.
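Both trend equations can be checked with an ordinary least-squares fit. The sketch below uses only the Python standard library and solves the normal equations directly; this is an illustrative choice that is adequate for a four-point example, not the method used by the Polynomial Regression JavaScript. The helper names are hypothetical.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    m = degree + 1
    # Augmented matrix [A^T A | A^T y] for the basis 1, x, x^2, ...
    aug = [[sum(x ** (i + j) for x in xs) for j in range(m)]
           + [sum(y * x ** i for x, y in zip(xs, ys))]
           for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]

def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))

def sse(coeffs, xs, ys):
    """Sum of squared errors of the fitted polynomial."""
    return sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))

years = [1, 2, 3, 4]
annual_sales = [1972, 2016, 2160, 2592]

linear = polyfit(years, annual_sales, 1)       # intercept, slope
quadratic = polyfit(years, annual_sales, 2)    # constant, T, T^2 coefficients
sse_linear = sse(linear, years, annual_sales)
sse_quadratic = sse(quadratic, years, annual_sales)

print([round(c, 1) for c in linear], round(sse_linear, 1))
print([round(c, 1) for c in quadratic], round(sse_quadratic, 1))
```

With only four observations a quadratic has a single residual degree of freedom, so the smaller error sum should be read with some caution.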

We can now forecast the next annual sales, which correspond to year 5, or T = 5 in the above quadratic equation:

Y = 2169 - 284.6(5) + 97(5)^2 = 3171

in sales for the following year. The average monthly sales during the next year are, therefore, 3171/12 = 264.25.

Finally, the forecast for the month of July is calculated by multiplying the average monthly sales forecast by the July seasonal index, which is 0.79; i.e., (264.25)(0.79) = 208.76, or about 209.
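The same three steps (annual trend, monthly average, seasonal adjustment) can be written as a tiny Python sketch using the quadratic coefficients and the July index quoted in the text; the variable names are illustrative.

```python
def quadratic_trend(t):
    """Annual sales trend fitted in the text: Y = 2169 - 284.6 T + 97 T^2."""
    return 2169 - 284.6 * t + 97 * t ** 2

annual_forecast = quadratic_trend(5)        # year 5 annual sales
monthly_average = annual_forecast / 12      # average month in year 5
july_index = 0.79                           # seasonal index for July from the table
july_forecast = monthly_average * july_index

print(round(annual_forecast), round(monthly_average, 2), round(july_forecast))
```

Swapping in another month's index gives the forecast for that month under the same trend.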

You might like to use the Seasonal Index JavaScript to check your hand computation. As always you must first use Plot of the Time Series as a tool for the initial characterization process.

For testing seasonality based on seasonal index, you may like to use the
Test for Seasonality JavaScript.

Trend Removal and Cyclical Analysis: The cycles can be easily studied if the trend itself is removed. This is done by expressing each actual value in the time series as a percentage of the calculated trend for the same date. The resulting time series has no trend, but oscillates around a central value of 100.
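As a minimal sketch of this percent-of-trend calculation, the snippet below reuses the annual sales and the linear trend Y = 1684 + 200.4T from the bookstore example; the function name is illustrative, and using the linear rather than the quadratic trend here is only an assumption made for the illustration.

```python
def percent_of_trend(actual, trend):
    """Express each actual value as a percentage of the fitted trend value."""
    return [100.0 * a / t for a, t in zip(actual, trend)]

actual = [1972, 2016, 2160, 2592]                       # annual sales, years 1..4
linear_trend = [1684 + 200.4 * t for t in (1, 2, 3, 4)]  # fitted trend values

detrended = percent_of_trend(actual, linear_trend)
print([round(p, 1) for p in detrended])   # values oscillate around 100
```

Values above 100 mark periods above the trend line, and values below 100 mark periods beneath it.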

Decomposition Analysis: It is the pattern generated by the time series, and not necessarily the individual data values, that offers insight to the manager who is an observer, a planner, or a controller of the system. Therefore, decomposition analysis is used to identify several patterns that appear simultaneously in a time series.

A variety of factors are likely influencing data. It is very important in the study that these different influences or components be separated or decomposed out of the ‘raw’ data levels. In general, there are four types of components in time series analysis: Seasonality, Trend, Cycling and Irregularity.

X_t = S_t · T_t · C_t · I_t

The first three components are deterministic which are called “Signals”, while the last component is a random variable, which is called “Noise”. To be able to make a proper forecast, we must know to what extent each component is present in the data. Hence, to understand and measure these components, the forecast procedure involves initially removing the component effects from the data (decomposition). After the effects are measured, making a forecast involves putting back the components on forecast estimates (recomposition). The time series decomposition process is depicted by the following flowchart:

[Flowchart: Time Series Decomposition Process]

Definitions of the major components in the above flowchart:

Seasonal variation: When a repetitive pattern is observed over some time horizon, the series is said to have seasonal behavior. Seasonal effects are usually associated with calendar or climatic changes. Seasonal variation is frequently tied to yearly cycles.

Trend: A time series may be stationary or exhibit trend over time. Long-term trend is typically modeled as a linear, quadratic or exponential function.

Cyclical variation: An upturn or downturn not tied to seasonal variation. Usually results from changes in economic conditions.

  1. Seasonalities are regular fluctuations that are repeated from year to year with about the same timing and level of intensity. The first step of a time series decomposition is to remove seasonal effects from the data. Without deseasonalizing the data, we may, for example, incorrectly infer that recent increases will continue indefinitely, i.e., that a growth trend is present, when actually the increase is ‘just because it is that time of the year’, i.e., due to regular seasonal peaks. To measure seasonal effects, we calculate a series of seasonal indexes. A practical and widely used method to compute these indexes is the ratio-to-moving-average approach. From such indexes, we may quantitatively measure how far above or below the expected or ‘business as usual’ level a given period stands (the expected level is represented by a seasonal index of 100%, or 1.0).
  2. Trend is growth or decay, that is, the tendency for the data to increase or decrease fairly steadily over time. Using the deseasonalized data, we now wish to consider the growth trend noted in our initial inspection of the time series. The trend component is measured by fitting a line or another function. This fitted function is calculated by the method of least squares and represents the overall trend of the data over time.
  3. Cyclic oscillations are general up-and-down changes in the data due to changes in, e.g., the overall economic environment (not caused by seasonal effects), such as recession and expansion. To measure how the general cycle affects data levels, we calculate a series of cyclic indexes. Theoretically, the deseasonalized data still contain trend, cyclic, and irregular components. Also, we believe the data levels predicted by the trend equation represent pure trend effects. Thus, it stands to reason that the ratio of these respective data values should provide an index that reflects cyclic and irregular components only. As the business cycle is usually longer than the seasonal cycle, it should be understood that cyclic analysis is not expected to be as accurate as seasonal analysis.

    Due to the tremendous complexity of the effects of general economic factors on long-term behavior, a general approximation of the cyclic factor is the more realistic aim. Thus, the specific sharp upturns and downturns are not so much the primary interest as the general tendency of the cyclic effect to move gradually in either direction. To study the general cyclic movement rather than precise cyclic changes (which may falsely suggest more accuracy than is present in this situation), we ‘smooth’ out the cyclic plot by replacing each index calculation, often with a centered 3-period moving average. The reader should note that as the number of periods in the moving average increases, the data become smoother or flatter. The choice of 3 periods, perhaps viewed as slightly subjective, may be justified as an attempt to smooth out the many minor up-and-down movements of the cycle index plot so that only the major changes remain.

  4. Irregularities (I) are any fluctuations not classified as one of the above. This component of the time series is unexplainable; therefore it is unpredictable. Estimation of I can be expected only when its variance is not too large. Otherwise, it is not possible to decompose the series. If the magnitude of variation is large, the projection of future values will be inaccurate. The best one can do is to give a probabilistic interval for the future value, given that the probability distribution of I is known.
  5. Making a Forecast: At this point of the analysis, after we have completed the study of the time series components, we now project the future values in making forecasts for the next few periods. The procedure is summarized below.
    • Step 1: Compute the future trend level using the trend equation.
    • Step 2: Multiply the trend level from Step 1 by the period seasonal index to include seasonal effects.
    • Step 3: Multiply the result of Step 2 by the projected cyclic index to include cyclic effects and get the final forecast result.
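The cyclic-index calculation and the three forecasting steps can be sketched as follows. All numbers are hypothetical and serve only to show the mechanics; using the last available smoothed cyclic index as the "projected" one is an assumption, not part of the procedure described above.

```python
def cyclic_indexes(deseasonalized, trend):
    """Ratio of deseasonalized data to fitted trend: cyclic and irregular effects."""
    return [d / t for d, t in zip(deseasonalized, trend)]

def centered_ma3(values):
    """Centered 3-period moving average; the two end points have no value."""
    middle = [sum(values[i - 1:i + 2]) / 3 for i in range(1, len(values) - 1)]
    return [None] + middle + [None]

def recompose(trend_level, seasonal_index, cyclic_index):
    """Steps 1-3: trend level times seasonal index times projected cyclic index."""
    return trend_level * seasonal_index * cyclic_index

# Hypothetical deseasonalized observations and fitted trend values.
deseasonalized = [100.0, 106.0, 108.0, 112.0, 110.0]
trend = [100.0, 104.0, 108.0, 112.0, 116.0]

raw = cyclic_indexes(deseasonalized, trend)
smoothed = centered_ma3(raw)

# Assumption: project the last available smoothed cyclic index forward.
forecast = recompose(trend_level=120.0, seasonal_index=0.9,
                     cyclic_index=smoothed[-2])
print([None if s is None else round(s, 3) for s in smoothed], round(forecast, 2))
```

A longer moving average would flatten the cyclic index further, at the cost of losing more end points.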

Exercise your knowledge of how to forecast by the decomposition method by using a sales time series available at

Therein you will find a detailed worked numerical example in the context of a sales time series that consists of all components, including a cycle.

Smoothing Techniques: A time series is a sequence of observations that are ordered in time. Inherent in the collection of data taken over time is some form of random variation. There exist methods for reducing or canceling the effect of random variation. A widely used technique is “smoothing”. This technique, when properly applied, reveals more clearly the underlying trend, seasonal, and cyclic components.

Smoothing techniques are used to reduce irregularities (random fluctuations) in time series data. They provide a clearer view of the true underlying behavior of the series. Moving averages rank among the most popular techniques for the preprocessing of time series. They are used to filter random “white noise” from the data, to make the time series smoother or even to emphasize certain informational components contained in the time series.

Exponential smoothing is a very popular scheme to produce a smoothed time series. Whereas in moving averages the past observations are weighted equally, exponential smoothing assigns exponentially decreasing weights as the observations get older. In other words, recent observations are given relatively more weight in forecasting than older observations. Double exponential smoothing is better at handling trends. Triple exponential smoothing is better at handling parabolic trends.

Exponential smoothing is a widely used method of forecasting based on the time series itself. Unlike regression models, exponential smoothing does not impose any deterministic model on the series other than what is inherent in the time series itself.

Simple Moving Averages:
The best-known forecasting method is the moving average: simply take a certain number of past periods, add their values together, and divide by the number of periods. The simple moving average (MA) is an effective and efficient approach provided the time series is stationary in both mean and variance. The following formula is used to find the moving average of order n, MA(n), for period t+1:

MA_{t+1} = [D_t + D_{t-1} + ... + D_{t-n+1}] / n

where n is the number of observations used in the calculation.

The forecast for time period t + 1 is the forecast for all future time periods. However, this forecast is revised only when new data becomes available.

You may like using Forecasting by Smoothing JavaScript, and then performing some numerical experimentation for a deeper understanding of these concepts.

Weighted Moving Average: Weighted moving averages are powerful and economical. They are widely used where repeated forecasts are required, with weighting methods such as sum-of-the-digits and trend adjustment. As an example, a weighted moving average of order 3 is:

Weighted MA(3) = w_1 D_t + w_2 D_{t-1} + w_3 D_{t-2}

where the weights are any positive numbers such that w_1 + w_2 + w_3 = 1. Typical (sum-of-the-digits) weights for this example are w_1 = 3/(1 + 2 + 3) = 3/6, w_2 = 2/6, and w_3 = 1/6.

You may like using Forecasting by Smoothing JavaScript, and then performing some numerical experimentation for a deeper understanding of the concepts.

An illustrative numerical example: The moving average and weighted moving average of order five are calculated in the following table.

Week   Sales ($1000)   MA(5)   WMA(5)
1      105             –       –
2      100             –       –
3      105             –       –
4      95              –       –
5      100             101     100
6      95              99      98
7      105             100     100
8      120             103     107
9      115             107     111
10     125             117     116
11     120             120     119
12     120             120     119
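The two columns can be recomputed with a short Python sketch. It assumes that the value reported for week t averages weeks t-4 through t, and that the weighted version uses sum-of-the-digits weights 5/15, 4/15, ..., 1/15 with the largest weight on the newest week; both assumptions reproduce the tabulated values for weeks 5 to 9, while recomputed values for the last weeks may not match the printed rows. The function names are illustrative.

```python
sales = [105, 100, 105, 95, 100, 95, 105, 120, 115, 125, 120, 120]  # weeks 1..12

def moving_average(data, n):
    """Simple MA(n): equal weights on the n most recent observations."""
    return [sum(data[t - n + 1:t + 1]) / n for t in range(n - 1, len(data))]

def weighted_moving_average(data, n):
    """Sum-of-the-digits WMA(n): weight n on the newest value down to 1 on the oldest."""
    total = n * (n + 1) // 2
    out = []
    for t in range(n - 1, len(data)):
        window = data[t - n + 1:t + 1]                 # oldest ... newest
        weights = range(1, n + 1)                      # 1 ... n
        out.append(sum(w * d for w, d in zip(weights, window)) / total)
    return out

ma5 = moving_average(sales, 5)             # first entry belongs to week 5
wma5 = weighted_moving_average(sales, 5)

for week, (m, w) in enumerate(zip(ma5, wma5), start=5):
    print(week, round(m), round(w))
```

Because the weighted version leans on the newest observations, it reacts faster to the jump in sales at week 8 than the simple average does.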

Moving Averages with Trends: Any method of time series analysis involves a different degree of model complexity and presumes a different level of comprehension about the underlying trend of the time series. In many business time series, the trend in the smoothed series using the usual moving average method indicates evolving changes in the series level to be highly nonlinear.

In order to capture the trend, we may use the Moving-Average with Trend (MAT) method. The MAT method uses an adaptive linearization of the trend by means of incorporating a combination of the local slopes of both the original and the smoothed time series.

The following formulas are used in the MAT method:

X(t): the actual (historical) data at time t.

M(t) = Σ X(i) / n,
i.e., the moving average M(t) of order n, where n is a positive odd integer ≥ 3 and the sum runs over i from t-n+1 to t.

F(t) = the smoothed series adjusted for any local trend:
F(t) = F(t-1) + a [(n-1)X(t) + (n+1)X(t-n) - 2nM(t-1)], where the constant coefficient a = 6/(n^3 - n),

with initial conditions F(t) = X(t) for all t ≤ n.

Finally, the h-step-ahead forecast f(t+h) is:
f(t+h) = M(t) + [h + (n-1)/2] F(t).

To get a notion of F(t), notice that the expression inside the brackets can be written as:

n[X(t) - M(t-1)] + n[X(t-n) - M(t-1)] + [X(t-n) - X(t)],

that is, a combination of three rise/fall terms.

In making a forecast, it is also important to provide a measure of how accurate one can expect the forecast to be. The statistical analysis of the error terms, known as the residual time series, provides a measuring tool and a decision process for model selection.
In applying the MAT method, sensitivity analysis is needed to determine the optimal value of the moving average parameter n, i.e., the optimal number of periods. The error time series allows us to study many of its statistical properties for goodness-of-fit decisions. Therefore it is important to evaluate the nature of the forecast error by using the appropriate statistical tests. The forecast error should be a random variable distributed normally with mean close to zero and a constant variance across time.

For computer implementation of the Moving Average with Trend (MAT) method one may use the forecasting (FC) module of WinQSB, which is commercial-grade stand-alone software. WinQSB’s approach is to first select the model and then enter the parameters and the data. With the Help features in WinQSB there is virtually no learning curve; one needs just a few minutes to master its useful features.

Exponential Smoothing Techniques: Exponential smoothing (ES) is one of the most successful forecasting methods. Moreover, it can be modified efficiently for use with time series that have seasonal patterns. It is also easy to adjust for past errors and easy to prepare follow-on forecasts, which makes it ideal for situations where many forecasts must be prepared. Several different forms are used, depending on the presence of trend or cyclical variations. In short, ES is an averaging technique that uses unequal weights: the weights applied to past observations decline in an exponential manner.

Single Exponential Smoothing: It calculates the smoothed series as a damping coefficient times the actual series plus 1 minus the damping coefficient times the lagged value of the smoothed series. The extrapolated smoothed series is a constant, equal to the last value of the smoothed series during the period when actual data on the underlying series are available. While the simple Moving Average method is a special case of the ES, the ES is more parsimonious in its data usage.

F_{t+1} = a D_t + (1 - a) F_t

where:

  • D_t is the actual value
  • F_t is the forecasted value
  • a is the weighting factor, which ranges from 0 to 1
  • t is the current time period.

Notice that the smoothed value becomes the forecast for period t + 1.

A small a provides a detectable and visible smoothing, while a large a provides a fast response to recent changes in the time series but a smaller amount of smoothing. Notice that exponential smoothing and simple moving average techniques will generate forecasts having the same average age of information if the moving average order n is the integer part of (2-a)/a.
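The single exponential smoothing recursion can be sketched in Python as follows. Initializing the first forecast with the first observation is an assumption (the text does not fix a starting value), and the function name and the sample series are hypothetical.

```python
def exponential_smoothing(data, a, initial=None):
    """Return forecasts F_1 .. F_{n+1} from F_{t+1} = a*D_t + (1 - a)*F_t.

    F_1 defaults to the first observation, which is an assumption.
    """
    forecasts = [data[0] if initial is None else initial]
    for d in data:
        forecasts.append(a * d + (1 - a) * forecasts[-1])
    return forecasts

# Hypothetical series with a jump, to compare a small and a large a.
series = [100, 100, 100, 130, 130, 130]
slow = exponential_smoothing(series, a=0.1)   # heavy smoothing, slow response
fast = exponential_smoothing(series, a=0.9)   # light smoothing, fast response

print([round(f, 1) for f in slow])
print([round(f, 1) for f in fast])
```

The last element of each list is the forecast for the next, not yet observed, period.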

An exponential smoothing over an already smoothed time series is called double-exponential smoothing. In some cases, it might be necessary to extend it even to a triple-exponential smoothing. While simple exponential smoothing requires stationary condition, the double-exponential smoothing can capture linear trends, and triple-exponential smoothing can handle almost all other business time series.

Double Exponential Smoothing: It applies the process described above twice to account for linear trend. The extrapolated series has a constant growth rate, equal to the growth of the smoothed series at the end of the data period.

Triple Exponential Smoothing: It applies the process described above three times to account for nonlinear trend.

Exponentially Weighted Moving Average: Suppose each day’s forecast value is based on the previous day’s value, so that the weight of each observation drops exponentially the further back in time (k periods) it is. The weight of any individual observation is

a(1 - a)^k,    where a is the smoothing constant.

An exponentially weighted moving average with a smoothing constant a corresponds roughly to a simple moving average of length n, where a and n are related by

a = 2/(n+1)    OR    n = (2 – a)/a.

Thus, for example, an exponentially weighted moving average with a smoothing constant equal to 0.1 would correspond roughly to a 19-day moving average. And a 40-day simple moving average would correspond roughly to an exponentially weighted moving average with a smoothing constant equal to 0.04878.
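The weights and the correspondence between a and n are easy to check numerically. The following Python sketch uses illustrative function names and simply evaluates the two formulas above.

```python
def ewma_weights(a, count):
    """Weights a(1 - a)^k for observations k = 0, 1, ..., count-1 periods back."""
    return [a * (1 - a) ** k for k in range(count)]

def equivalent_length(a):
    """Moving-average length n that roughly matches smoothing constant a."""
    return (2 - a) / a

def equivalent_constant(n):
    """Smoothing constant a that roughly matches a moving average of length n."""
    return 2 / (n + 1)

weights = ewma_weights(0.1, 500)
print(round(sum(weights), 6))                 # the weights sum to (almost) 1
print(round(equivalent_length(0.1), 6))       # about a 19-day moving average
print(round(equivalent_constant(40), 5))      # constant matching a 40-day average
```

The weights form a geometric series, so their sum over k periods is 1 - (1 - a)^k, which approaches 1 as more history is included.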

This approximation is helpful, however, it is harder to update, and may not correspond to an optimal forecast.

Smoothing techniques, such as the Moving Average, Weighted Moving Average, and Exponential Smoothing, are well suited for one-period-ahead forecasting as implemented in the following JavaScript: Forecasting by Smoothing.

Holt’s Linear Exponential Smoothing Technique: Suppose that the series { y_t } is non-seasonal but does display trend. Now we need to estimate both the current level and the current trend. Here we define the trend T_t at time t as the smoothed difference between the current and previous levels.

The updating equations express ideas similar to those for exponential smoothing. The equations are:

L_t = a y_t + (1 – a) F_t

for the level and

T_t = b ( L_t – L_{t-1} ) + (1 – b) T_{t-1}

for the trend, where F_t = L_{t-1} + T_{t-1} is the one-step-ahead forecast made at time t – 1. We have two smoothing parameters a and b; both must be positive and less than one. Then the forecast for k periods into the future is:

F_{n+k} = L_n + k T_n

Given that the level and trend remain unchanged, the initial (starting) values are

T_2 = y_2 – y_1,        L_2 = y_2,     and      F_3 = L_2 + T_2

An Application: A company’s credit outstanding has been increasing at a relatively constant rate over time:

Applying Holt’s technique with smoothing parameters a = 0.7 and b = 0.6, a graphical representation of the time series and its forecasts, together with a few-step-ahead forecasts, is depicted below:

Year-end Past Credit

Year    Credit (in millions)
1       133
2       155
3       165
4       171
5       194
6       231
7       274
8       312
9       313
10      333
11      343
11343

K-Period Ahead Forecast

K       Forecast (in millions)
1       359.7
2       372.6
3       385.4
4       398.3

Demonstration of the calculation procedure, with a = 0.7 and b = 0.6:

L_2 = y_2 = 155,
T_2 = y_2 – y_1 = 155 – 133 = 22,
F_3 = L_2 + T_2 = 177

L_3 = 0.7 y_3 + (1 – 0.7) F_3 = 0.7(165) + 0.3(177) = 168.6,
T_3 = 0.6 ( L_3 – L_2 ) + (1 – 0.6) T_2 = 0.6(13.6) + 0.4(22) = 16.96,
F_4 = L_3 + T_3 = 185.56
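Carrying the same recursion through all eleven years is tedious by hand, so a short Python sketch may help. It assumes the starting values given above (level equal to the second observation, trend equal to the first difference); the function name holt_forecast is illustrative only.

```python
def holt_forecast(y, a, b, horizon):
    """Holt's linear exponential smoothing.

    Starts from level = y_2 and trend = y_2 - y_1, as in the text,
    and returns the forecasts for 1, ..., horizon periods ahead.
    """
    level = y[1]
    trend = y[1] - y[0]
    for obs in y[2:]:
        one_step = level + trend                       # forecast for this period
        new_level = a * obs + (1 - a) * one_step       # level update
        trend = b * (new_level - level) + (1 - b) * trend  # trend update
        level = new_level
    return [level + k * trend for k in range(1, horizon + 1)]


credit = [133, 155, 165, 171, 194, 231, 274, 312, 313, 333, 343]
forecasts = holt_forecast(credit, a=0.7, b=0.6, horizon=4)
print([round(f, 1) for f in forecasts])  # [359.7, 372.6, 385.4, 398.3]
```

The printed values agree with the K-period-ahead forecasts tabulated for the credit series.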

The Holt-Winters’ Forecasting Technique: Now in addition to Holt parameters, suppose that the series exhibits multiplicative seasonality and let S_t be the multiplicative seasonal factor at time t. Suppose also that there are s periods in a year, so s = 4 for quarterly data and s = 12 for monthly data. S_{t-s} is the seasonal factor in the same period last year.

In some time series, seasonal variation is so strong that it obscures any trends or cycles, which are very important for the understanding of the process being observed. Winters’ smoothing method can remove seasonality and make long-term fluctuations in the series stand out more clearly. A simple way of detecting trend in seasonal data is to take averages over a certain period. If these averages change with time, we can say that there is evidence of a trend in the series.

The updating equations are:

L_t = a (L_{t-1} + T_{t-1}) + (1 – a) y_t / S_{t-s}

for the level,

T_t = b ( L_t – L_{t-1} ) + (1 – b) T_{t-1}

for the trend, and

S_t = g S_{t-s} + (1 – g) y_t / L_t

for the seasonal factor.

We now have three smoothing parameters a, b, and g; all must be positive and less than one. Note that in the level and seasonal equations, as written here, a and g weight the previous estimates rather than the new observation, which is the reverse of the role a plays in Holt’s level equation.

To obtain starting values, one may use the first few years of data. For example, for quarterly data, to estimate the level one may use a centered 4-point moving average:

L_10 = (y_8 + 2y_9 + 2y_10 + 2y_11 + y_12) / 8

as the level estimate in period 10. This removes the seasonal component from a series with 4 measurements over each year.

T_10 = L_10 – L_9

as the trend estimate for period 10.

S_7 = (y_7 / L_7 + y_3 / L_3 ) / 2

as the seasonal factor in period 7. Similarly,

S_8 = (y_8 / L_8 + y_4 / L_4 ) / 2,

S_9 = (y_9 / L_9 + y_5 / L_5 ) / 2,

S_10 = (y_10 / L_10 + y_6 / L_6 ) / 2

For monthly data, we correspondingly use a centered 12-point moving average:

L_30 = (y_24 + 2y_25 + 2y_26 + … + 2y_35 + y_36) / 24

as the level estimate in period 30.

T_30 = L_30 – L_29

as the trend estimate for period 30.

S_19 = (y_19 / L_19 + y_7 / L_7 ) / 2

as the estimate of the seasonal factor in period 19, and so on, up to 30:

S_30 = (y_30 / L_30 + y_18 / L_18 ) / 2

Then the forecast for k periods into the future is:

F_{n+k} = (L_n + k T_n ) S_{n+k-s},    for k = 1, 2, …, s
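The three updating equations and the forecast formula can be sketched in Python as follows. This is only an illustration under stated assumptions: the starting level, trend, and seasonal factors are passed in by the caller rather than estimated by the centered moving averages described above, the function name is hypothetical, and the data are an artificial noise-free series chosen so that the result is easy to check.

```python
def holt_winters_multiplicative(y, s, a, b, g, level, trend, seasonals, horizon):
    """Multiplicative Holt-Winters using the updating equations in the text.

    level and trend are the values just before y starts; seasonals holds
    the s seasonal factors for the year just before y starts.
    horizon should not exceed s.
    """
    seasonals = list(seasonals)            # seasonals[i]: latest factor for season i
    for t, obs in enumerate(y):
        i = t % s
        prev_level = level
        # In the text's convention, a and g weight the previous estimates.
        level = a * (prev_level + trend) + (1 - a) * obs / seasonals[i]
        trend = b * (level - prev_level) + (1 - b) * trend
        seasonals[i] = g * seasonals[i] + (1 - g) * obs / level
    n = len(y)
    return [(level + k * trend) * seasonals[(n + k - 1) % s]
            for k in range(1, horizon + 1)]


# Artificial quarterly data: level 100 growing by 2 per quarter, fixed seasonal factors.
true_seasonals = [0.8, 1.2, 1.1, 0.9]
quarterly = [(100 + 2 * (t + 1)) * true_seasonals[t % 4] for t in range(8)]

hw_forecasts = holt_winters_multiplicative(
    quarterly, s=4, a=0.3, b=0.2, g=0.4,
    level=100.0, trend=2.0, seasonals=true_seasonals, horizon=4)
print([round(f, 1) for f in hw_forecasts])  # [94.4, 144.0, 134.2, 111.6]
```

Because the artificial series has no noise and the starting values are exact, the forecasts simply continue the pattern; with real data the choice of a, b, and g matters.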

Forecasting by the Z-Chart

Another method of short-term forecasting is the use of a Z-Chart. The name Z-Chart arises from the fact that the pattern on such a graph forms a rough letter Z. For example, in a situation where the sales volume figures for one product or product group for the first nine months of a particular year are available, it is possible, using the Z-Chart, to predict the total sales for the year, i.e. to make a forecast for the next three months. It is assumed that basic trading conditions do not alter, or alter along an anticipated course, and that any underlying trends at present being experienced will continue. In addition to the monthly sales totals for the nine months of the current year, the monthly sales figures for the previous year are also required and are shown in the following table:

Month        2003 ($)    2004 ($)
January      940         520
February     580         380
March        690         480
April        680         490
May          710         370
June         660         390
July         630         350
August       470         440
September    480         360
October      590
November     450
December     430

Total Sales 2003: 7310

The monthly sales for the first nine months of a particular year together with the monthly sales for the previous year.

From the data in the above table, another table can be derived and is shown as follows:

The first column of the table below relates to actual sales; the second to the cumulative total, which is found by adding each month’s sales to the total of preceding sales. Thus, January 520 plus February 380 produces the February cumulative total of 900; the March cumulative total is found by adding the March sales of 480 to the previous cumulative total of 900 and is, therefore, 1,380.

The 12 months moving total is found by adding the sales in the current month to the total of the previous 12 months and then subtracting the sales of the corresponding month last year.

Month 2004    Actual Sales ($)    Cumulative Total ($)    12 Months Moving Total ($)
January       520                 520                     6890
February      380                 900                     6690
March         480                 1380                    6480
April         490                 1870                    6290
May           370                 2240                    5950
June          390                 2630                    5680
July          350                 2980                    5400
August        440                 3420                    5370
September     360                 3780                    5250

Showing processed monthly sales data, producing a cumulative total and a 12 months moving total.

For example, the 12 months moving total for 2003 is 7,310 (see the first table above). Add to this the January 2004 item, 520, which gives 7,830; then subtract the corresponding month last year, i.e. the January 2003 item of 940. The result is the January 2004 12 months moving total, 6,890.
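The two derived columns can be reproduced from the monthly figures with a few lines of Python. This is a minimal sketch using the sales data from the tables above; the variable names are illustrative.

```python
sales_2003 = [940, 580, 690, 680, 710, 660, 630, 470, 480, 590, 450, 430]
sales_2004 = [520, 380, 480, 490, 370, 390, 350, 440, 360]  # January to September

cumulative = []
moving_total = []
running = 0
window = sum(sales_2003)          # 12 months total at the end of 2003 (7310)
for this_year, last_year in zip(sales_2004, sales_2003):
    running += this_year                  # add this month's sales
    window += this_year - last_year       # add this month, drop the same month last year
    cumulative.append(running)
    moving_total.append(window)

print(cumulative)    # [520, 900, 1380, 1870, 2240, 2630, 2980, 3420, 3780]
print(moving_total)  # [6890, 6690, 6480, 6290, 5950, 5680, 5400, 5370, 5250]
```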

The 12 months moving total is a particularly useful device in forecasting because it includes all the seasonal fluctuations in the last 12 months period irrespective of the month from which it is calculated. The year could start in June and end the next May and still contain all the seasonal patterns.

The two groups of data shown in the above table, the cumulative totals and the 12 months moving totals, are then plotted (as lines B and A, respectively), and each is extended along a line that continues its present trend to the end of the year, where the two meet:

[Figure: Forecasting by the Z-Chart]

In the above figure, A and B represent the 12 months moving total and the cumulative data, respectively, while their projections into the future are shown by the dotted lines.

Notice that the 12 months accumulation of sales figures is bound to meet the 12 months moving total at the end of the year, as they represent different ways of obtaining the same total. In the above figure these lines meet at $4,800, indicating the total sales for the year and forming a simple and approximate method of short-term forecasting.

The approach illustrated by this monthly numerical example can, with care, be adapted to your own time series data with any equally spaced intervals.

As an alternative to the graphical method, one may fit a linear regression based on the data of lines A and/or B available from the above table, and then extrapolate to obtain a short-term forecast with a desired confidence level.

Concluding Remarks: A time series is a sequence of observations which are ordered in time. Inherent in the collection of data taken over time is some form of random variation. There exist methods for reducing or canceling the effect due to random variation. Widely used techniques are “smoothing” techniques. These techniques, when properly applied, reveal more clearly the underlying trends. In other words, smoothing techniques are used to reduce irregularities (random fluctuations) in time series data. They provide a clearer view of the true underlying behavior of the series.

Exponential smoothing has proven through the years to be very useful in many forecasting situations. Holt first suggested it for non-seasonal time series with or without trends. Winters generalized the method to include seasonality, hence the name: Holt-Winters Method. The Holt-Winters method has 3 updating equations, each with a constant that ranges from 0 to 1. The equations are intended to give more weight to recent observations and less weight to observations further in the past. This form of exponential smoothing can be used for less-than-annual periods (e.g., for monthly series). It uses smoothing parameters to estimate the level, trend, and seasonality. Moreover, there are two different procedures, depending on whether seasonality is modeled in an additive or multiplicative way. The version presented above is the multiplicative one; the additive version can be applied to an anti-logarithmic function of the data.

The single exponential smoothing emphasizes the short-range perspective; it sets the level to the last observation and is based on the condition that there is no trend. The linear regression, which fits a least squares line to the historical data (or transformed historical data), represents the long range, which is conditioned on the basic trend. Holt’s linear exponential smoothing captures information about recent trend. The parameters in Holt’s model are the levels-parameter, which should be decreased when the amount of data variation is large, and the trends-parameter, which should be increased if the recent trend direction is supported by some causal factors.

Since finding three optimal, or even near optimal, parameters for the updating equations is not an easy task, an alternative approach to the Holt-Winters method is to deseasonalize the data and then use exponential smoothing.

How to compare several smoothing methods: Although there are numerical indicators for assessing the accuracy of a forecasting technique, the most widely used approach is a visual comparison of several forecasts to assess their accuracy and choose among the various forecasting methods. In this approach, one must plot (using, e.g., Excel) on the same graph the original values of a time series variable and the predicted values from several different forecasting methods, thus facilitating a visual comparison.

You may like using Forecasting by Smoothing Techniques JavaScript.

Further Reading:

Yar, M and C. Chatfield (1990), Prediction intervals for the Holt-Winters forecasting procedure, International Journal of Forecasting 6, 127-137.

Filtering Techniques: Often one must filter an entire time series, e.g., a financial time series, with certain filter specifications to extract useful information by a transfer function expression. The aim of a filter function is to filter a time series in order to extract useful information hidden in the data, such as a cyclic component. The filter is a direct implementation of an input-output function.

Data filtering is widely used as an effective and efficient time series modeling tool by applying an appropriate transformation technique. Most time series analysis techniques involve some form of filtering out noise in order to make the pattern more salient.

Differencing: A special type of filtering, which is particularly useful for removing a trend, is simply to difference a given time series until it becomes stationary. This method is useful in Box-Jenkins modeling. For non-seasonal data, first-order differencing is usually sufficient to attain apparent stationarity, so that the new series is formed from the differences between successive values of the original series.
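As a small illustration of differencing, the Python sketch below removes a linear trend with one difference and a quadratic trend with two. The function name and the toy series are illustrative only.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: each pass maps x_t to x_t - x_{t-1}."""
    for _ in range(order):
        series = [curr - prev for prev, curr in zip(series, series[1:])]
    return series


linear = [3, 5, 7, 9, 11]        # linear trend
quadratic = [1, 4, 9, 16, 25]    # quadratic trend

print(difference(linear))               # [2, 2, 2, 2]
print(difference(quadratic, order=2))   # [2, 2, 2]
```

Each pass shortens the series by one observation, which is the price paid for removing the trend.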

Adaptive Filtering: Any smoothing technique, such as a moving average, that includes a method of learning from past errors can respond to changes in the relative importance of trend, seasonal, and random factors. In the adaptive exponential smoothing method, one may adjust a to allow for shifting patterns.

Hodrick-Prescott Filter:
The Hodrick-Prescott filter or H-P filter is an algorithm for choosing smoothed values for a time series. The H-P filter chooses smooth values {s_t} for the series {x_t} of T elements (t = 1 to T) that solve the following minimization problem:

min { (x_t – s_t)^2 … etc. }

The positive parameter λ is the penalty on variation, where variation is measured by the average squared second difference. A larger value of λ makes the resulting {s_t} series smoother, with less high-frequency noise. The commonly applied value of λ is 1600 for quarterly data.
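The minimization problem is abbreviated in the text. For orientation, the form in which the H-P objective is usually written is given below; it is supplied here as an assumption about what the abbreviated expression stands for, with the penalty parameter written as \lambda and the second sum running over the squared second differences of the smoothed series.

```latex
\min_{\{s_t\}} \; \sum_{t=1}^{T} (x_t - s_t)^2
  \;+\; \lambda \sum_{t=2}^{T-1} \bigl[ (s_{t+1} - s_t) - (s_t - s_{t-1}) \bigr]^2
```

The first sum rewards closeness to the data and the second penalizes curvature, so a very large penalty pushes the smoothed series toward a straight line.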

For the study of business cycles one uses not the smoothed series, but the jagged series of residuals from it. H-P filtered data shows less fluctuation than first-differenced data, since the H-P filter pays less attention to high frequency movements. H-P filtered data also shows more serial correlation than first-differenced data.

This is a smoothing mechanism used to obtain a long-term trend component in a time series. It is a way to decompose a given series into stationary and non-stationary components in such a way that the sum of squared deviations of the series from the non-stationary component is minimized, with a penalty on changes in the derivatives of the non-stationary component.

Kalman Filter: The Kalman filter is an algorithm for sequentially updating a linear projection for a dynamic system that is in state-space representation. Application of the Kalman filter transforms a system of the following two-equation kind into a more solvable form:

x_{t+1} = A x_t + C w_{t+1},    and    y_t = G x_t + v_t

in which A, C, and G are matrices known as functions of a parameter θ about which inference is desired, and where:

t is a whole number, usually indexing time;
x_t is a true state variable, hidden from the econometrician;
y_t is a measurement of x_t with scaling factor G and measurement errors v_t;
w_t are innovations to the hidden x_t process, with E(w_{t+1} w_{t+1}') = I by normalization (where ' means the transpose);
E(v_t v_t') = R, an unknown matrix, estimation of which is necessary but ancillary to the problem of interest, which is to get an estimate of θ.

The Kalman filter defines two matrices S_t and K_t such that the system described above can be transformed into the one below, in which estimation and inference about θ and R is more straightforward; e.g., by regression analysis:

z_{t+1} = A z_t + K a_t,    and    y_t = G z_t + a_t

where z_t is defined to be E_{t-1} x_t, a_t is defined to be y_t – E_{t-1} y_t, and K is defined to be the limit of K_t as t approaches infinity.

The definition of those two matrices S_t and K_t is itself most of the definition of the Kalman filter:

K_t = A S_t G' (G S_t G' + R)^{-1},    and    S_{t+1} = (A – K_t G) S_t (A – K_t G)' + C C' + K_t R K_t'

K_t is often called the Kalman gain.
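To see how the gain settles down to its limit K, the recursion can be iterated in the simplest scalar case, where every matrix is a single number. The Python sketch below is only an illustration: it reads the second recursion as an update that produces the next period's S from the current one (an assumption about the intended indexing), and the parameter values A = C = G = R = 1 are arbitrary.

```python
def steady_state_gain(A, C, G, R, S=1.0, iterations=200):
    """Iterate the scalar gain and S recursions and return the final (K, S)."""
    K = 0.0
    for _ in range(iterations):
        K = A * S * G / (G * S * G + R)                  # Kalman gain for this period
        S = (A - K * G) ** 2 * S + C * C + K * R * K     # S carried to the next period
    return K, S


K, S = steady_state_gain(A=1.0, C=1.0, G=1.0, R=1.0)
print(round(K, 4), round(S, 4))  # 0.618 1.618
```

For these particular values the fixed point satisfies S = 1 + S/(S + 1), so S converges to about 1.618 and the gain to about 0.618.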

Further Readings:

Hamilton J, Time Series Analysis, Princeton University Press, 1994.
Harvey A., Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press, 1991.

Cardamone E., From Kalman to Hodrick-Prescott Filter, 2006.
Mills T., The Econometric Modelling of Financial Time Series, Cambridge University Press, 1995.

Neural Network: For time series forecasting, the prediction model of order p has the general form:

D_t = f (D_{t-1}, D_{t-2}, …, D_{t-p}) + e_t

Neural network architectures can be trained to predict the future values of the dependent variables. What is required is the design of the network paradigm and its parameters. The multi-layer feed-forward neural network approach consists of an input layer, one or several hidden layers, and an output layer. Another approach is known as the partially recurrent neural network, which can learn sequences as time evolves and responds to the same input pattern differently at different times, depending on the previous input patterns as well. Neither of these approaches is superior to the other in all cases; however, an additional dampened feedback, which possesses the characteristics of a dynamic memory, will improve the performance of both approaches.
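Whatever network is chosen, a model of order p needs its training data arranged as pairs of p lagged values and the value that followed them. The Python sketch below shows one way to build those pairs; the function name and the toy series are illustrative, and no particular neural network library is assumed.

```python
def lagged_samples(series, p):
    """Build training pairs for a model of order p.

    Each input is the tuple of the p most recent values, newest first,
    and each target is the value that followed them.
    """
    inputs, targets = [], []
    for t in range(p, len(series)):
        inputs.append(tuple(series[t - k] for k in range(1, p + 1)))
        targets.append(series[t])
    return inputs, targets


inputs, targets = lagged_samples([10, 12, 15, 11, 14], p=2)
print(inputs)   # [(12, 10), (15, 12), (11, 15)]
print(targets)  # [15, 11, 14]
```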

Outlier Considerations: Outliers are a few observations that are not well fitted by the “best” available model. In practice, any observation with a standardized residual greater than 2.5 in absolute value is a candidate for being an outlier. In such a case, one must first investigate the source of the data. If there is no doubt about the accuracy or veracity of the observation, then it should be removed, and the model should be refitted.

Whenever data levels are thought to be too high or too low for “business as usual”, we call such points outliers. A mathematical reason to adjust for such occurrences is that the majority of forecast techniques are based on averaging. It is well known that arithmetic averages are very sensitive to outlier values; therefore, some alteration should be made in the data before continuing. One approach is to replace the outlier by the average of the two sales levels for the periods that come immediately before and after the period in question. This idea is useful if outliers occur in the middle or recent part of the data. However, if outliers appear in the oldest part of the data, we may follow a second alternative, which is to simply throw away the data up to and including the outlier.
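Both steps, flagging candidates by their standardized residuals and replacing an interior outlier by the average of its neighbors, are easy to sketch in Python. The function names and the sample numbers are illustrative, and the residuals are standardized here by their own sample mean and standard deviation, which is one simple choice among several.

```python
def flag_outliers(residuals, threshold=2.5):
    """Return indices whose standardized residual exceeds the threshold in absolute value."""
    n = len(residuals)
    mean = sum(residuals) / n
    sd = (sum((r - mean) ** 2 for r in residuals) / (n - 1)) ** 0.5
    return [i for i, r in enumerate(residuals) if abs((r - mean) / sd) > threshold]


def replace_with_neighbor_average(series, index):
    """Replace an interior observation by the average of the two adjacent ones."""
    if not 0 < index < len(series) - 1:
        raise ValueError("index must refer to an interior observation")
    cleaned = list(series)
    cleaned[index] = (series[index - 1] + series[index + 1]) / 2
    return cleaned


residuals = [0.0] * 19 + [10.0]
flagged = flag_outliers(residuals)
cleaned = replace_with_neighbor_average([100, 104, 300, 108, 110], 2)
print(flagged)   # [19]
print(cleaned)   # [100, 104, 106.0, 108, 110]
```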

In light of the relative complexity of some inclusive but sophisticated forecasting techniques, we recommend that management go through an evolutionary progression in adopting new forecast techniques. That is to say, a simple forecast method well understood is better implemented than one with all inclusive features but unclear in certain facets.

Modeling and Simulation: Dynamic modeling and simulation is the collective ability to understand the system and implications of its changes over time including forecasting. System Simulation is the mimicking of the operation of a real system, such as the day-to-day operation of a bank, or the value of a stock portfolio over a time period. By advancing the simulation run into the future, managers can quickly find out how the system might behave in the future, therefore making decisions as they deem appropriate.

In the field of simulation, the concept of “principle of computational equivalence” has beneficial implications for the decision-maker. Simulated experimentation accelerates and replaces effectively the “wait and see” anxieties in discovering new insight and explanations of future behavior of the real system.

Probabilistic Models: These use probabilistic techniques, such as Marketing Research Methods, to deal with uncertainty, and give a range of possible outcomes for each set of events. For example, one may wish to identify the prospective buyers of a new product within a community of size N. From a survey result, one may estimate the probability of selling, p, and then estimate the size of sales as Np with some confidence level.

An Application: Suppose we wish to forecast the sales of a new toothpaste in a community of 50,000 housewives. A free sample is given to 3,000 selected randomly, of whom 1,800 indicated that they would buy the product.

Using the binomial distribution with parameters (3000, 1800/3000), the standard error of the number of buyers in the sample is about 27, and the expected sales are 50000(1800/3000) = 30000. The 99.7% confidence interval lies within 3 standard errors, 3(27) = 81, scaled by the ratio of population to sample size, 50000/3000; i.e., within 1350 of the expected value. In other words, the range (28650, 31350) contains the expected sales with 99.7% confidence.
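The arithmetic of this example can be traced in a few lines of Python. The sketch follows the text in rounding the standard error to 27 before scaling it up to the whole community; the variable names are illustrative.

```python
import math

population = 50_000
sample_size = 3_000
buyers = 1_800

# Binomial standard error of the number of buyers in the sample: sqrt(n p (1 - p)).
std_error = math.sqrt(buyers * (sample_size - buyers) / sample_size)   # about 26.8
expected_sales = population * buyers / sample_size                     # 30000.0

# Three standard errors, rounded to 27 as in the text, scaled to the population.
margin = 3 * round(std_error) * population / sample_size               # 1350.0
interval = (expected_sales - margin, expected_sales + margin)

print(round(std_error, 1), expected_sales, margin)
print(interval)  # (28650.0, 31350.0)
```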

Event History Analysis: Sometimes data on the exact time of a particular event (or events) are available, for example on a group of patients. Examples of events could include asthma attacks, epilepsy attacks, myocardial infarctions, and hospital admissions. Often, occurrence (and non-occurrence) of an event is available on a regular basis, e.g., daily, and the data can then be thought of as having a repeated measurements structure. An objective may be to determine whether any concurrent events or measurements have influenced the occurrence of the event of interest. For example, daily pollen counts may influence the risk of asthma attacks; high blood pressure might precede a myocardial infarction. One may use PROC GENMOD available in SAS for the event history analysis.

Predicting Market Response: As applied researchers in business and economics, faced with the task of predicting market response, we seldom know the functional form of the response. Perhaps market response is a nonlinear monotonic, or even a non-monotonic, function of explanatory variables. Perhaps it is determined by interactions of explanatory variables. Interaction is logically independent of its components.

When we try to represent complex market relationships within the context of a linear model, using appropriate transformations of explanatory and response variables, we learn how hard the work of statistics can be. Finding reasonable models is a challenge, and justifying our choice of models to our peers can be even more of a challenge. Alternative specifications abound.

Modern regression methods, such as generalized additive models, multivariate adaptive regression splines, and regression trees, have one clear advantage: They can be used without specifying a functional form in advance. These data-adaptive, computer-intensive methods offer a more flexible approach to modeling than traditional statistical methods. How well do modern regression methods perform in predicting market response? Some perform quite well based on the results of simulation studies.

Delphi Analysis: Delphi Analysis is used in the decision making process, in particular in forecasting. Several “experts” sit together and try to compromise on something upon which they cannot agree.

System Dynamics Modeling: System dynamics (SD) is a tool for scenario analysis. Its main modeling tools are dynamic systems of differential equations and simulation. The SD approach to modeling is an important one for several reasons, not the least of which is that, e.g., econometrics is the established methodology of system dynamics. However, from a philosophy of social science perspective, SD is deductive and econometrics is inductive. SD is less tightly bound to actuarial data and thus is free to expand out and examine more complex, theoretically informed, and postulated relationships. Econometrics is more tightly bound to the data, and the models it explores, by comparison, are simpler. This is not to say that one is better than the other: properly understood and combined, they are complementary. Econometrics examines historical relationships through correlation and least squares regression models to compute the fit. In contrast, consider a simple growth scenario analysis: the initial growth portion of, say, a population is driven by the amount of food available, so there is a correlation between population level and food. However, the usual econometric techniques are limited in their scope. For example, a change in the direction of the growth curve of a population over time is hard for an econometric model to capture.

Further Readings:

Delbecq, A., Group Techniques for Program Planning, Scott Foresman, 1975.
Gardner H.S., Comparative Economic Systems, Thomson Publishing, 1997.
Hirsch M., S. Smale, and R. Devaney, Differential Equations, Dynamical Systems, and an Introduction to Chaos, Academic Press, 2004.
Lofdahl C., Environmental Impacts of Globalization and Trade: A Systems Study, MIT Press, 2002.

Combination of Forecasts: Combining forecasts merges several separate sets of forecasts to form a better composite forecast. The main question is “how to find the optimal combining weights?” The widely used approach is to change the weights from time to time for a better forecast rather than using a fixed set of weights on a regular basis or otherwise.

All forecasting models have either an implicit or explicit error structure, where error is defined as the difference between the model prediction and the “true” value. Additionally, many data snooping methodologies within the field of statistics need to be applied to data supplied to a forecasting model. Also, diagnostic checking, as defined within the field of statistics, is required for any model which uses data.

Using any method for forecasting, one must use a performance measure to assess the quality of the method. Mean Absolute Deviation (MAD) and Variance are the most useful measures. However, MAD does not lend itself to making further inferences, whereas the standard error does. For error analysis purposes, variance is preferred since variances of independent (uncorrelated) errors are additive, while MAD is not additive.
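Both measures are computed from the forecast errors, the differences between actual and forecast values. A minimal Python sketch follows; the function names and the small data set are illustrative, and the variance uses the usual n – 1 divisor.

```python
def mean_absolute_deviation(actual, forecast):
    """Average absolute forecast error."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sum(abs(e) for e in errors) / len(errors)


def error_variance(actual, forecast):
    """Sample variance of the forecast errors (divisor n - 1)."""
    errors = [a - f for a, f in zip(actual, forecast)]
    mean = sum(errors) / len(errors)
    return sum((e - mean) ** 2 for e in errors) / (len(errors) - 1)


actual = [10, 12, 14, 16]
forecast = [11, 11, 15, 15]
mad = mean_absolute_deviation(actual, forecast)
variance = error_variance(actual, forecast)
print(mad, round(variance, 4))  # 1.0 1.3333
```

The square root of the variance is the standard error mentioned above, which is what allows further inference such as confidence limits.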

Regression and Moving Average: When a time series is not a straight line, one may use the moving average (MA) and break up the time series into several intervals, each with a common straight line with positive trend, to achieve linearity for the whole time series. The process involves a transformation based on slope and then a moving average within that interval. For most business time series, one of the following transformations might be effective:

  • slope/MA,
  • log (slope),
  • log(slope/MA),
  • log(slope) – 2 log(MA).

Further Readings:

Armstrong J., (Ed.), Principles of Forecasting: A Handbook for Researchers and Practitioners, Kluwer Academic Publishers, 2001.
Arsham H., Seasonal and cyclic forecasting in small firm, American Journal of Small Business, 9, 46-57, 1985.
Brown H., and R. Prescott, Applied Mixed Models in Medicine, Wiley, 1999.
Cromwell J., W. Labys, and M. Terraza, Univariate Tests for Time Series Models, Sage Pub., 1994.
Ho S., M. Xie, and T. Goh, A comparative study of neural network and Box-Jenkins ARIMA modeling in time series prediction, Computers & Industrial Engineering, 42, 371-375, 2002.
Kaiser R., and A. Maravall, Measuring Business Cycles in Economic Time Series, Springer, 2001. Has a good coverage on Hodrick-Prescott Filter among other related topics.
Kedem B., K. Fokianos, Regression Models for Time Series Analysis, Wiley, 2002.
Kohzadi N., M. Boyd, B. Kermanshahi, and I. Kaastra , A comparison of artificial neural network and time series models for forecasting commodity prices, Neurocomputing, 10, 169-181, 1996.
Krishnamoorthy K., and B. Moore, Combining information for prediction in linear regression, Metrika, 56, 73-81, 2002.
Schittkowski K., Numerical Data Fitting in Dynamical Systems: A Practical Introduction with Applications and Software, Kluwer Academic Publishers, 2002. Gives an overview of numerical methods that are needed to compute parameters of a dynamical model by a least squares fit.

How to Do Forecasting by Regression Analysis

Introduction

Regression is the study of relationships among variables, a principal purpose of which is to predict, or estimate the value of one variable from known or assumed values of other variables related to it.

Variables of Interest: To make predictions or estimates, we must identify the effective predictors of the variable of interest: Which variables are important indicators and can be measured at the least cost? Which carry only a little information? Which are redundant?

Predicting the Future: Predicting a change over time or extrapolating from present conditions to future conditions is not the function of regression analysis. To make estimates of the future, use time series analysis.

Experiment: Begin with a hypothesis about how several variables might be related to another variable and the form of the relationship.

Simple Linear Regression: A regression using only one predictor is called a simple regression.

Multiple Regression: Where there are two or more predictors, multiple regression analysis is employed.

Data: Since it is usually unrealistic to obtain information on an entire population, a sample which is a subset of the population is usually selected. For example, a sample may be either randomly selected or a researcher may choose the x-values based on the capability of the equipment utilized in the experiment or the experiment design. Where the x-values are pre-selected, usually only limited inferences can be drawn depending upon the particular values chosen. When both x and y are randomly drawn, inferences can generally be drawn over the range of values in the sample.

Scatter Diagram: A graphical representation of the pairs of data called a scatter diagram can be drawn to gain an overall view of the problem. Is there an apparent relationship? Direct? Inverse? If the points lie within a band described by parallel lines, we can say there is a linear relationship between the pair of x and y values. If the rate of change is generally not constant, then the relationship is curvilinear.

The Model: If we have determined there is a linear relationship between x and y, we want a linear equation stating y as a function of x in the form y = a + bx + e, where a is the intercept, b is the slope, and e is the error term accounting for variables that affect y but are not included as predictors, and/or otherwise unpredictable and uncontrollable factors.

Least-Squares Method: To predict the mean y-value for a given x-value, we need a line which passes through the mean value of both x and y and which minimizes the sum of the squared vertical distances between each of the points and the predictive line. Such an approach should result in a line which we can call a “best fit” to the sample data. The least-squares method achieves this result by finding the values of a and b that minimize the average squared deviation between the sample y points and the estimated line. This reduces to the solution of simultaneous linear equations. Shortcut formulas have been developed as an alternative to solving the simultaneous equations directly.
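For a single predictor, the shortcut formulas give the slope as the ratio of the sum of cross-products to the sum of squares of x, both taken about the means, and the intercept from the requirement that the line pass through the point of means. The Python sketch below applies them to a small made-up data set; the function name and the numbers are illustrative.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y = a + b x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept: the line passes through (x_bar, y_bar)
    return a, b


x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = fit_line(x, y)
print(round(a, 2), round(b, 2))  # 1.09 1.97
```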

Solution Methods: Techniques of Matrix Algebra can be manually employed to solve simultaneous linear equations. When performing manual computations, this technique is especially useful when there are more than two equations and two unknowns.

Several well-known computer packages are widely available and can be utilized to relieve the user of the computational problem, all of which can be used to solve both linear and polynomial equations: the BMD packages (Biomedical Computer Programs) from UCLA; SPSS (Statistical Package for the Social Sciences) developed by the University of Chicago; and SAS (Statistical Analysis System). Another package that is also available is IMSL, the International Mathematical and Statistical Libraries, which contains a great variety of standard mathematical and statistical calculations. All of these software packages use matrix algebra to solve simultaneous equations.

Use and Interpretation of the Regression Equation: The equation developed can be used to predict an average value over the range of the sample data. The forecast is good for short to medium ranges.

Measuring Error in Estimation: The scatter or variability about the mean value can be measured by calculating the variance, the average squared deviation of the values around the mean. The standard error of estimate is derived from this value by taking the square root. This value is interpreted as the average amount that actual values differ from the estimated mean.

Confidence Interval: Interval estimates can be calculated to obtain a measure of the confidence we have in our estimates that a relationship exists. These calculations are made using t-distribution tables. From these calculations we can derive confidence bands, a pair of non-parallel lines narrowest at the mean values which express our confidence in varying degrees of the band of values surrounding the regression equation.

Assessment: How confident can we be that a relationship actually exists? The strength of that relationship can be assessed by statistical tests of the null hypothesis that no relationship exists, which are carried out using t-distribution and F-distribution tables, together with R-squared. These calculations give rise to the standard error of the regression coefficient, an estimate of the amount that the regression coefficient b will vary from sample to sample of the same size from the same population. An Analysis of Variance (ANOVA) table can be generated which summarizes the different components of variation.

When you want to compare models of different size (different numbers of independent variables and/or different sample sizes) you must use the Adjusted R-Squared, because the usual R-Squared tends to grow with the number of independent variables.

The Standard Error of Estimate, i.e., the square root of the error mean square, is a good indicator of the “quality” of a prediction model, since the Mean Error Sum of Squares (MESS) “adjusts” the error sum of squares for the number of predictors in the model as follows:

MESS = Error Sum of Squares/(N – Number of Linearly Independent Predictors)

If one keeps adding useless predictors to a model, the MESS will become less and less stable. R-squared is also influenced by the range of your dependent variable; so, if two models have the same residual mean square but one model has a much narrower range of values for the dependent variable, that model will have a lower R-squared. Nevertheless, both models will do equally well for prediction purposes.
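The effect of an extra, useless predictor can be sketched numerically. In the sketch below the function name fit_quality is hypothetical, the divisor n – p – 1 assumes that the intercept is counted among the estimated coefficients, and the second call uses a made-up error sum of squares (1.09) purely to illustrate a predictor that barely reduces the error. The first call borrows SSE = 1.1 and a total sum of squares of 54 from the taxicab example worked later in this section.

```python
def fit_quality(sse, sst, n, p):
    """Return (R-squared, adjusted R-squared, error mean square).

    sse: error sum of squares, sst: total sum of squares,
    n: sample size, p: number of predictors (intercept counted separately).
    """
    df_error = n - p - 1
    if df_error <= 0:
        raise ValueError("need more observations than estimated coefficients")
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / df_error
    mess = sse / df_error
    return r2, adj_r2, mess


# One predictor (values taken from the worked example in this section).
r2, adj_r2, mess = fit_quality(sse=1.1, sst=54.0, n=5, p=1)

# Hypothetical second predictor that lowers SSE only from 1.1 to 1.09.
r2_b, adj_r2_b, mess_b = fit_quality(sse=1.09, sst=54.0, n=5, p=2)

print(r2, adj_r2, mess)
print(r2_b, adj_r2_b, mess_b)
```

In this illustration the plain R-squared rises slightly with the extra predictor, while the adjusted R-squared falls and the error mean square grows, which is the reason for comparing models of different size on the adjusted figure.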

You may like using the Regression Analysis with Diagnostic Tools JavaScript to check your computations, and to perform some numerical experimentation for a deeper understanding of these concepts.

Predictions by Regression

Regression analysis has three goals: predicting, modeling, and characterization. What would be the logical order in which to tackle these three goals such that one task leads to and/or justifies the other tasks? Clearly, it depends on what the prime objective is. Sometimes you wish to model in order to get better prediction. Then the order is obvious. Sometimes, you just want to understand and explain what is going on. Then modeling is again the key, though out-of-sample predicting may be used to test any model. Often modeling and predicting proceed in an iterative way and there is no ‘logical order’ in the broadest sense. You may model to get predictions, which enable better control, but iteration is again likely to be present and there are sometimes special approaches to control problems.

The following presents the main steps in building and analyzing a regression model, in the context of an applied numerical example.

Formulas and Notations:

  • x̄ = Σx / n
    This is just the mean of the x values.
  • ȳ = Σy / n
    This is just the mean of the y values.
  • Sxx = SSxx = Σ(x(i) – x̄)² = Σx² – (Σx)² / n
  • Syy = SSyy = Σ(y(i) – ȳ)² = Σy² – (Σy)² / n
  • Sxy = SSxy = Σ(x(i) – x̄)(y(i) – ȳ) = Σ(x × y) – (Σx) × (Σy) / n
  • Slope, m = SSxy / SSxx
  • Intercept, b = ȳ – m × x̄
  • y-predicted = yhat(i) = m × x(i) + b
  • Residual(i) = Error(i) = y(i) – yhat(i)
  • SSE = SSres = SSerrors = Σ[y(i) – yhat(i)]²
  • Standard deviation of residuals = s = Sres = Serrors = [SSres / (n – 2)]^(1/2)
  • Standard error of the slope (m) = Sres / (SSxx)^(1/2)
  • Standard error of the intercept (b) = Sres × [(SSxx + n × x̄²) / (n × SSxx)]^(1/2)
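The formulas above translate almost line for line into code. The following is a minimal sketch, not a definitive implementation: the function name simple_regression, the dictionary keys, and the four data points are all made up for illustration.

```python
import math


def simple_regression(xs, ys):
    """Fit y = m*x + b by least squares using the sum-of-squares shortcuts."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the residual spread")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

    m = ss_xy / ss_xx                       # slope
    b = y_bar - m * x_bar                   # intercept
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    s_res = math.sqrt(sse / (n - 2))        # standard deviation of residuals
    se_m = s_res / math.sqrt(ss_xx)         # standard error of the slope
    se_b = s_res * math.sqrt((ss_xx + n * x_bar ** 2) / (n * ss_xx))
    return {"m": m, "b": b, "sse": sse, "s_res": s_res, "se_m": se_m, "se_b": se_b}


# Hypothetical data, chosen only so the arithmetic is easy to follow by hand.
fit = simple_regression([1, 2, 3, 4], [2, 4, 5, 8])
print(fit)
```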

An Application: A taxicab company manager believes that the monthly repair costs (Y) of cabs are related to the age (X) of the cabs. Five cabs are selected randomly and from their records we obtained the following data: (x, y) = {(2, 2), (3, 5), (4, 7), (5, 10), (6, 11)}. Based on our practical knowledge and the scatter diagram of the data, we hypothesize a linear relationship between the predictor X and the cost Y.

Now the question is how we can best (i.e., in the least squares sense) use the sample information to estimate the unknown slope (m) and the intercept (b). The first step in finding the least squares line is to construct a sum of squares table to find the sums of the x values (Σx), the y values (Σy), the squares of the x values (Σx²), the squares of the y values (Σy²), and the cross-products of the corresponding x and y values (Σxy), as shown in the following table:

 

          x      y     x²     xy     y²
          2      2      4      4      4
          3      5      9     15     25
          4      7     16     28     49
          5     10     25     50    100
          6     11     36     66    121
SUM      20     35     90    163    299

The second step is to substitute the values of Σx, Σy, Σx², Σxy, and Σy² into the following formulas:

SSxy = Σxy – (Σx)(Σy)/n = 163 – (20)(35)/5 = 163 – 140 = 23

SSxx = Σx² – (Σx)²/n = 90 – (20)²/5 = 90 – 80 = 10

SSyy = Σy² – (Σy)²/n = 299 – (35)²/5 = 299 – 245 = 54

Use the first two values to compute the estimated slope:

Slope = m = SSxy / SSxx = 23 / 10 = 2.3

To estimate the intercept of the least squares line, use the fact that the graph of the least squares line always passes through the point (x̄, ȳ); therefore,

The intercept = b = ȳ – m × x̄ = (Σy)/5 – (2.3)(Σx/5) = 35/5 – (2.3)(20/5) = –2.2

Therefore the least squares line is:

y-predicted = yhat = mx + b = -2.2 + 2.3x.

After estimating the slope and the intercept, the question is how we determine statistically whether the model is good enough, say for prediction. The standard error of the slope is:

Standard error of the slope (m) = Sm = Sres / (SSxx)^(1/2),

and its relative precision is measured by the statistic

tslope = m / Sm.

For our numerical example, with Sres = (0.36667)^(1/2) = 0.6055, it is:

tslope = 2.3 / [0.6055 / (10)^(1/2)] = 12.01

which is large enough, indicating that the fitted model is a “good” one.

You may ask in what sense the least squares line is the “best-fitting” straight line to the 5 data points. The least squares criterion chooses the line that minimizes the sum of squared vertical deviations, i.e., residual = error = y – yhat:

SSE = Σ(y – yhat)² = Σ(error)² = 1.1

The numerical value of SSE is obtained from the following computational table for our numerical example.

 

x             yhat = -2.2 + 2.3x   y            error        squared
(predictor)   (y-predicted)        (observed)   (y – yhat)   error
2             2.4                  2            -0.4         0.16
3             4.7                  5            0.3          0.09
4             7                    7            0            0
5             9.3                  10           0.7          0.49
6             11.6                 11           -0.6         0.36
                                                Sum = 0      Sum = 1.1

Alternatively, one may compute SSE by:

SSE = SSyy – m × SSxy = 54 – (2.3)(23) = 54 – 52.9 = 1.1,

which agrees with the value computed directly from the above table. The numerical value of SSE gives the estimate of the variance of the errors, s²:

s² = SSE / (n – 2) = 1.1 / (5 – 2) = 0.36667

The estimated error variance is a measure of the variability of the y values about the estimated line. Clearly, we could also compute the estimated standard deviation s of the residuals by taking the square root of the variance s².
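The residual table and both routes to SSE can be reproduced with a few lines of code. This is only a sketch of the arithmetic for the taxicab data; the variable names are illustrative and not part of the original text.

```python
import math

xs = [2, 3, 4, 5, 6]
ys = [2, 5, 7, 10, 11]
m, b = 2.3, -2.2                      # fitted slope and intercept from the text

# Rebuild the computational table: x, y-predicted, y observed, error, error^2.
rows = []
for x, y in zip(xs, ys):
    y_hat = b + m * x
    err = y - y_hat
    rows.append((x, y_hat, y, err, err * err))

sum_errors = sum(r[3] for r in rows)
sse_direct = sum(r[4] for r in rows)  # SSE from the table

# Shortcut route: SSE = SSyy - m * SSxy.
n = len(xs)
ss_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
sse_shortcut = ss_yy - m * ss_xy

s2 = sse_direct / (n - 2)             # estimated error variance
s = math.sqrt(s2)                     # standard deviation of residuals

for row in rows:
    print("%d  %5.1f  %3d  %5.1f  %5.2f" % row)
print(round(sse_direct, 4), round(sse_shortcut, 4), round(s2, 5), round(s, 4))
```

The two SSE values agree only up to floating-point rounding, so any comparison between them should use a small tolerance rather than exact equality.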

As the last step in the model building, the following Analysis of Variance (ANOVA) table is then constructed to assess the overall goodness-of-fit using the F-statistics:

Analysis of Variance Components

Source    DF    Sum of Squares    Mean Square    F Value    Prob > F
Model      1    52.90000          52.90000       144.273    0.0012
Error      3    SSE = 1.1         0.36667
Total      4    SSyy = 54

For practical purposes, the fit is considered acceptable if the F-statistic is more than five times the F-value from the F distribution tables at the back of your textbook. Note that the criterion that the F-statistic must be more than five times the F-value from the F distribution tables is independent of the sample size.

Notice also that there is a relationship between the two statistics that assess the quality of the fitted line, namely the t-statistic of the slope and the F-statistic in the ANOVA table. The relationship is:

(tslope)² = F

This relationship can be verified for our computational example: (12.01)² ≈ 144.2, in agreement with the F value in the table up to rounding.
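As a quick numerical check, the ANOVA entries and the identity between the squared t-statistic and F can be recomputed from the three sums of squares. The sketch below uses only values already derived for the taxicab example; the p-value column is not reproduced because it would require a distribution table or a statistics library.

```python
import math

n = 5
ss_xx, ss_xy, ss_yy = 10.0, 23.0, 54.0   # from the worked example

m = ss_xy / ss_xx                        # slope, 2.3
ss_model = m * ss_xy                     # sum of squares explained by the model
sse = ss_yy - ss_model                   # error sum of squares

df_model, df_error = 1, n - 2
ms_model = ss_model / df_model
ms_error = sse / df_error
f_value = ms_model / ms_error

s_res = math.sqrt(ms_error)              # standard deviation of residuals
t_slope = m / (s_res / math.sqrt(ss_xx))

print("Model", df_model, round(ss_model, 5), round(ms_model, 5), round(f_value, 3))
print("Error", df_error, round(sse, 5), round(ms_error, 5))
print("Total", df_model + df_error, ss_yy)
print("t_slope^2 =", round(t_slope ** 2, 3))
```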

Predictions by Regression: After we have statistically checked the goodness-of-fit of the model and the residual conditions are satisfied, we are ready to use the model for prediction with confidence. A confidence interval provides a useful way of assessing the quality of a prediction. In prediction by regression, often one or more of the following constructions are of interest:

  1. A confidence interval for a single future value of Y corresponding to a chosen value of X.
  2. A confidence interval for a single point on the line.
  3. A confidence region for the line as a whole.

Confidence Interval Estimate for a Future Value: A confidence interval of interest can be used to evaluate the accuracy of a single (future) value of Y corresponding to a chosen value of X (say, X0). This JavaScript provides a confidence interval for an estimated value of Y corresponding to X0 with a desired confidence level 1 – α.

Yp ± Se × t(n–2, α/2) × {1 + 1/n + (X0 – x̄)² / SSxx}^(1/2)

Confidence Interval Estimate for a Single Point on the Line: If a particular value of the predictor variable (say, X0) is of special importance, a confidence interval on the value of the criterion variable (i.e., the average Y at X0) corresponding to X0 may be of interest. This JavaScript provides a confidence interval on the estimated value of Y corresponding to X0 with a desired confidence level 1 – α.

Yp ± Se × t(n–2, α/2) × {1/n + (X0 – x̄)² / SSxx}^(1/2)

Here Yp is the predicted value at X0, Se is the standard deviation of the residuals, and SSxx = Σ(x – x̄)².

It is of interest to compare the above two kinds of confidence interval. The first kind is wider, reflecting the lower accuracy in estimating a single future value of Y rather than the mean value estimated by the second kind: the extra 1 inside the braces accounts for the variability of an individual observation about the line. The first kind of interval can also be used to identify possible outliers in the data.
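The difference between the two intervals can be made concrete with the taxicab data. In this sketch the interval for a single future value carries the extra "1 +" term under the square root, while the interval for the mean of Y at X0 does not. The function name intervals_at is hypothetical, and the critical value 3.182 is the tabulated two-sided 95% t value for 3 degrees of freedom, supplied by hand because the sketch avoids any statistics library.

```python
import math


def intervals_at(x0, xs, ys, t_crit):
    """Return (predicted y, half-width for mean of Y, half-width for a future Y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = ss_xy / ss_xx
    b = y_bar - m * x_bar
    sse = sum((y - (b + m * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))

    y_p = b + m * x0
    core = 1.0 / n + (x0 - x_bar) ** 2 / ss_xx
    half_mean = t_crit * s_e * math.sqrt(core)          # point on the line
    half_future = t_crit * s_e * math.sqrt(1.0 + core)  # single future value
    return y_p, half_mean, half_future


xs = [2, 3, 4, 5, 6]
ys = [2, 5, 7, 10, 11]
T_CRIT = 3.182   # assumed table value: t with 3 d.f., alpha/2 = 0.025

y_p, half_mean, half_future = intervals_at(4, xs, ys, T_CRIT)
print(y_p, half_mean, half_future)
```

Both half-widths grow as X0 moves away from the mean of the x values, so the intervals are narrowest at X0 = x̄.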

Confidence Region for the Regression Line as a Whole: When the entire line is of interest, a confidence region permits one to simultaneously make confidence statements about estimates of Y for a number of values of the predictor variable X. In order for the region to adequately cover the range of interest of the predictor variable X, the data size usually must be more than 10 pairs of observations.

Yp ± Se × {2 × F(2, n–2, α) × [1/n + (X0 – x̄)² / SSxx]}^(1/2)
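The half-width of the region at several values of X0 can be sketched as follows, reading the expression above as Se × {2 × F × [1/n + (X0 – x̄)²/SSxx]}^(1/2), where the second term in the brackets is divided by the sum of squares SSxx. The function name band_half_width is hypothetical, and the critical values (F = 9.55 for 2 and 3 degrees of freedom at α = 0.05, t = 3.182 for 3 degrees of freedom) are table values supplied by hand. The five-point taxicab data set is used only to illustrate the arithmetic, since it is smaller than the sample size recommended above.

```python
import math


def band_half_width(x0, n, x_bar, ss_xx, s_e, f_crit):
    """Half-width of the simultaneous confidence region for the line at x0."""
    return s_e * math.sqrt(2.0 * f_crit * (1.0 / n + (x0 - x_bar) ** 2 / ss_xx))


# Summary values from the worked taxicab example.
n, x_bar, ss_xx = 5, 4.0, 10.0
s_e = math.sqrt(1.1 / 3)
F_CRIT = 9.55    # assumed table value: F(2, 3) at alpha = 0.05
T_CRIT = 3.182   # assumed table value: t with 3 d.f., alpha/2 = 0.025

widths = {x0: band_half_width(x0, n, x_bar, ss_xx, s_e, F_CRIT)
          for x0 in (2, 3, 4, 5, 6)}
pointwise_at_mean = T_CRIT * s_e * math.sqrt(1.0 / n)

for x0, w in widths.items():
    print(x0, round(w, 3))
print("pointwise half-width at x-bar:", round(pointwise_at_mean, 3))
```

Because the region has to hold for every X0 at once, its half-width at any given X0 is larger than that of the single-point interval at the same X0.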

In all cases the JavaScript provides the results for the nominal (x) values. For other values of X one may use computational methods directly, a graphical method, or linear interpolation to obtain approximate results. These approximations are in the safe direction, i.e., they are slightly wider than the exact values.

Planning, Development, and Maintenance of a Linear Model