Moving Average Model with n Delays

April 22, 2025

Problem Description

We consider a system with a single input and a single output. The output of the system depends on $n$ past inputs, received at different times $1, 2, \dots, N+1$. We express this as:

$$y_k = h_1 x_{k-n+1} + h_2 x_{k-n+2} + \dots + h_{n-1} x_{k-1} + h_n x_k,$$

where $h = \{ h_1, h_2, \dots, h_n \}$, for all $k = n-1, n, \dots, N$. We are given $N+1$ inputs $x_0, x_1, \dots, x_N$ and corresponding outputs $y_0, y_1, \dots, y_N$. However, not all outputs can be represented by the above formula. We refer to the outputs that can be represented as valid outputs. A valid output requires exactly $n$ consecutive past inputs, as specified by the formula.

The number of valid outputs is:

$$N - (n - 1) + 1 = N - n + 2.$$

Thus, we obtain a system of $N - n + 2$ equations. Assuming that $N - n + 2 \gg n$, we interpret this as an overdetermined system, meaning we have more equations than unknowns. Such a system can have either zero, one, or infinitely many solutions, with the first case being the most common.

Our objective is to minimize the sum of squared errors between predicted and actual outputs:

$$\min_h \| A h - y \|^2,$$

where the solution is given by the least-squares estimate:

$$h = A^{+} y.$$

The system can be expressed more concisely as:

$$A h = y,$$

where:

  • $A$ is the system matrix,
  • $h = [h_1, h_2, \dots, h_n]^T$ is the vector of unknowns,
  • $y$ is the right-hand side vector.

Since we will later work in the Julia programming language, we adopt 1-based indexing for the following representation:

$$A = \begin{bmatrix} x_1 & x_2 & \dots & x_n \\ x_2 & x_3 & \dots & x_{n+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N-n+2} & x_{N-n+3} & \dots & x_{N+1} \end{bmatrix}, \quad h = \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_n \end{bmatrix}, \quad y = \begin{bmatrix} y_n \\ y_{n+1} \\ \vdots \\ y_{N+1} \end{bmatrix}.$$

If $h$ is known, valid outputs can be predicted from the input sequence $x_0, x_1, \dots, x_m$ using the defining formula above. However, only the valid outputs (those with sufficient past data) can be predicted.
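For concreteness, the following minimal sketch (illustrative only; the names and values are not taken from the accompanying code) assembles the matrix $A$ from a 1-based input vector and recovers $h$ by least squares:

using LinearAlgebra

# Illustrative sketch: build A from a 1-based input vector x of length N + 1
# and estimate h by least squares. The values below are made up for the example.
N, n = 9, 3                     # x and y then have N + 1 = 10 entries
x = randn(N + 1)                # example inputs x_1, ..., x_{N+1}
h_true = [0.2, 0.3, 0.5]        # coefficients used to generate example outputs

# Row i of A holds the n consecutive inputs x_i, ..., x_{i+n-1}.
A = [x[i + j - 1] for i in 1:(N - n + 2), j in 1:n]
y_valid = A * h_true            # valid outputs y_n, ..., y_{N+1}

h = pinv(A) * y_valid           # least-squares estimate via the pseudoinverse
println(h)                      # ≈ h_true (up to floating-point error)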

Analysis of the Special Case: Constant Inputs and Outputs

Suppose that all inputs and outputs are constant, i.e., $x_i = y_i = 1$. We aim to find the solution $h$ for arbitrary $N$ and $n$. The system can be represented as:

$$A h = y,$$

where:

  • $A$ is an $(N - n + 2) \times n$ matrix of ones,
  • $h$ is an $n \times 1$ vector,
  • $y$ is an $(N - n + 2) \times 1$ vector of ones.

We observe that each row of $A$ imposes the same constraint $h_1 + h_2 + \dots + h_n = 1$, and recall that we seek the coefficients that minimize the sum of squared errors:

$$\min_h \| A h - y \|^2.$$

We know that the Moore-Penrose pseudoinverse minimizes this quantity (see proof in source, page 37).

Since every row of $A$ is identical, we can express it as the outer product of two vectors:

$$A = u v^T,$$

where:

  • $A \in \mathbb{R}^{(N - n + 2) \times n}$,
  • $u = \begin{bmatrix} 1 & 1 & \dots & 1 \end{bmatrix}^T \in \mathbb{R}^{N - n + 2}$,
  • $v = \begin{bmatrix} 1 & 1 & \dots & 1 \end{bmatrix}^T \in \mathbb{R}^{n}$.

Using Singular Value Decomposition (SVD), we express AA as:

$$A = U \Sigma V^T,$$

with the pseudoinverse given by:

$$A^{+} = V \Sigma^{+} U^T.$$

To find $A^T A \in \mathbb{R}^{n \times n}$, we compute:

$$A^T A = v u^T u v^T = \| u \|^2 v v^T = (N - n + 2)\, v v^T,$$

which is a positive multiple of the rank-one matrix $v v^T$, whose only nonzero eigenvalue is $\| v \|^2 = n$. Hence $A^T A$ has a single nonzero eigenvalue:

$$\lambda_1 = (N - n + 2) n.$$

Thus, the only nonzero singular value is:

$$\sigma_1 = \sqrt{(N - n + 2) n}.$$

Next, we use the eigenvalue equation $A^T A z = \lambda z$, where $z \in \mathbb{R}^n$ is a nonzero vector:

$$A^T A z = \| u \|^2 v v^T z.$$

Now let $z = v$, which we know is a nonzero vector of ones:

$$A^T A v = \| u \|^2 v v^T v = \| u \|^2 \| v \|^2 v = (N - n + 2) n \, v = \lambda_1 v.$$

Therefore, $v$ is an eigenvector of $A^T A$. To obtain the right singular vector, we normalize $v$:

$$v_1 = \frac{v}{\| v \|} = \frac{1}{\sqrt{n}} v.$$

Similarly, the left singular vector is obtained as:

$$u_1 = \frac{A v_1}{\sigma_1} = \frac{1}{\sqrt{n}} \cdot \frac{u v^T v}{\sqrt{(N - n + 2) n}} = \frac{\sqrt{n}}{\sqrt{(N - n + 2) n}}\, u.$$

Thus, the pseudoinverse is given by:

$$A^{+} = V \Sigma^{+} U^T = \frac{1}{\sqrt{n}} v \cdot \frac{1}{\sqrt{(N - n + 2) n}} \cdot \frac{\sqrt{n}}{\sqrt{(N - n + 2) n}} u^T = \frac{1}{(N - n + 2) n} A^T.$$

Finally, we can express $h$ as:

$$h = \frac{1}{(N - n + 2) n} A^T y = \frac{1}{(N - n + 2) n} \cdot (N - n + 2) \cdot v = \frac{1}{n} v.$$
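This closed form is easy to check numerically. The following small sketch (with illustrative values of $N$ and $n$, not taken from the accompanying code) compares the derived formulas against Julia's pinv:

using LinearAlgebra

# Numerical check of the constant-input special case for illustrative N and n.
N, n = 10, 4
A = ones(N - n + 2, n)           # (N - n + 2) × n matrix of ones
y = ones(N - n + 2)              # vector of ones

h = pinv(A) * y                  # least-squares solution via the pseudoinverse
println(h)                       # each entry ≈ 1/n = 0.25

# The derived pseudoinverse A⁺ = Aᵀ / ((N - n + 2) n) agrees with pinv(A).
println(maximum(abs.(pinv(A) - A' / ((N - n + 2) * n))))   # ≈ 0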

Test Case: Constant Inputs and Outputs

The code implementation for this test case is as follows:

function test_case_constant(N::Int, n::Int)
    println("Test case (constant case)")
    # Constant special case: all inputs and outputs equal to one
    # (requires the Test stdlib and movavg from code.jl to be loaded).
    u = ones(N - n + 2)
    v = ones(n)
    y = ones(N - n + 2)

    A = u * v'                    # (N - n + 2) × n matrix of ones

    h_derived = (1 / n) * v       # closed-form solution h = v / n

    h_movavg = movavg(u, y, n)    # least-squares solution from the implementation

    println("h_derived: $h_derived")
    println("h_movavg: $h_movavg")
    println("Difference: ", h_derived - h_movavg')
    println("Test result: ", isapprox(h_derived, h_movavg', atol = sqrt(eps())))

    @test isapprox(h_derived, h_movavg', atol = sqrt(eps()))
end

Results

Test case (constant)
h_derived: [0.5, 0.5]
h_movavg: [0.4999999999999997 0.4999999999999997]
Difference: [2.7755575615628914e-16, 2.7755575615628914e-16]
Test result: true

Interpretation of Results

The results show that the derived solution and the one computed using the moving average filter are extremely close, with differences on the order of $10^{-16}$, which is due to floating-point precision. Specifically:

$$h_{\text{derived}} = [0.5, 0.5], \quad h_{\text{movavg}} = [0.5, 0.5],$$

indicating that both methods yield nearly identical coefficients. The minimal difference can be attributed to the precision of the numerical computations. Therefore, we conclude that the solution distributes the outputs evenly across the coefficients, resembling a moving average filter.

This confirms that the approach works as expected for this special case, where the inputs and outputs are constant. The system's behavior leads to an even distribution of the output values across the coefficients, as expected from the problem's structure.

Source Code Description

In this section, we provide an overview of the source code implemented in the Julia file accompanying this document. We explain the logic behind the most important functions and their mathematical foundations.

The core functions, movavg(X, Y, n) and prediction(X, h), serve as the foundation for the entire implementation. Therefore, we first derive these two functions.

Moving Average Computation

The function movavg(X, Y, n) takes three parameters:

  • $X$ is the input data,
  • $Y$ is the corresponding output data,
  • $n$ is the number of past inputs each output depends on.

The key idea follows from the least squares solution using the Moore-Penrose pseudoinverse. The function ensures that the vectors $X$, $Y$, and $h$ are row vectors by using the transpose operator (') in Julia. We then construct the matrix $A \in \mathbb{R}^{(N - n + 2) \times n}$ as specified in Section 1 and extract only the valid outputs from $n$ to $N+1$. The coefficients $h$ are computed as:

$$h = A^{+} Y_{\text{valid}},$$

where $A^{+}$ denotes the Moore-Penrose pseudoinverse of $A$. The function returns $h$ as a row vector. In essence, movavg estimates the coefficients when given input data and corresponding outputs. We assume that $X$ and $Y$ have the same length, $N+1$, with $n$ being much smaller than $N - n + 2$ (the number of available equations).
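As a rough illustration, a minimal sketch of such a function might look as follows; the actual movavg in code.jl may differ in how it handles row vectors and edge cases:

using LinearAlgebra

# Minimal sketch of the coefficient estimation, assuming X and Y are vectors of
# equal length N + 1 (1-based). The real movavg in code.jl may differ in details.
function movavg_sketch(X::AbstractVector, Y::AbstractVector, n::Int)
    N = length(X) - 1
    # Row i of A holds the n consecutive inputs X[i], ..., X[i + n - 1].
    A = [X[i + j - 1] for i in 1:(N - n + 2), j in 1:n]
    Y_valid = Y[n:N+1]            # only the outputs y_n, ..., y_{N+1} are valid
    h = pinv(A) * Y_valid         # least-squares solution via the pseudoinverse
    return h'                     # returned as a row vector, as described above
end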

Prediction Function

The function prediction(X, h) follows a similar approach. However, instead of estimating coefficients, it assumes the coefficients are already known and stored in a row vector $h$. Given an input vector $X$, we construct the matrix $A$ in the same way as in movavg and compute the predicted outputs as:

$$y = A h.$$

The function returns the predicted values as a row vector.
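A corresponding sketch of the prediction step (again, only a hedged approximation of the function in code.jl) is:

# Minimal sketch of the prediction step: h holds the n coefficients (row or
# column vector); the actual prediction in code.jl may differ in details.
function prediction_sketch(X::AbstractVector, h)
    n = length(h)
    N = length(X) - 1
    # Same matrix construction as in the estimation step.
    A = [X[i + j - 1] for i in 1:(N - n + 2), j in 1:n]
    return (A * vec(h))'          # predicted valid outputs y_n, ..., y_{N+1} as a row
end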

Error Evaluation

To evaluate our predictions, we implement the evaluate_prediction(Y_pred, Y_test, n) function, which compares our predicted outputs against the actual outputs from a test set. We use the mean absolute error (MAE) as the metric for evaluation:

$$\text{MAE} = \frac{1}{N - n + 2} \sum_{i = n}^{N + 1} \left| Y_{\text{test}, i} - Y_{\text{pred}, i} \right|.$$

This function returns the computed MAE value.
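A sketch of this evaluation, under the assumption that Y_pred contains the $N - n + 2$ predicted valid outputs and Y_test the full test outputs of length $N + 1$, could be:

# Minimal sketch of the MAE computation; the indexing convention of the actual
# evaluate_prediction in code.jl may differ.
function evaluate_prediction_sketch(Y_pred, Y_test, n::Int)
    N = length(Y_test) - 1
    errors = abs.(vec(Y_test[n:N+1]) .- vec(Y_pred))   # |Y_test,i - Y_pred,i| for i = n, ..., N + 1
    return sum(errors) / (N - n + 2)                   # mean absolute error
end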

Interpretation of Results

In this section, we summarize our findings by analyzing the effect of the parameter $n$ on the prediction accuracy. The model was tested on the datasets io-train.txt (training set) and io-test.txt (test set) for various values of $n$. The primary metric used for evaluation is the Mean Absolute Error (MAE), which measures the average deviation between predicted and actual values.

Impact of $n$ on Prediction Accuracy

The choice of $n$ directly influences the performance of the model. Our observations indicate that:

  • For very small values of $n$ (e.g., $n = 1, 2, 3$), the MAE is quite high, suggesting poor predictive performance.
  • There is a noticeable drop in MAE at intermediate values of $n$, reaching a local minimum around $n \approx 10$.
  • As $n$ increases further, the MAE stabilizes and converges to approximately $4.2$.

To illustrate this behavior, we present two tables: the first focusing on small values of $n$, and the second capturing the overall trend up to $n = 100$.

MAE for Small Values of $n$

n      MAE
1      4.316
2      5.208
3      4.420
5      58.361
10     1.714

Long-Term Behavior of MAE

n      MAE
10     1.714
20     3.707
30     3.916
50     4.469
70     4.421
100    4.270

As observed in the second table, the MAE fluctuates initially but eventually stabilizes around $4.2$, indicating diminishing returns for large $n$. This suggests that increasing $n$ beyond a certain point does not significantly improve prediction accuracy and may introduce noise or overfitting.

These results confirm the observed trends and provide an understanding of the model's performance across different values of $n$. In conclusion, selecting an optimal $n$ is crucial; while very small values perform poorly, excessively large values offer diminishing returns and may lead to overfitting.
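For reproducibility, the experiment can be driven by a loop of roughly the following form. This is a hedged sketch: it assumes io-train.txt and io-test.txt each store inputs in the first column and outputs in the second, and that movavg, prediction, and evaluate_prediction from code.jl are already loaded; the actual file layout and function signatures may differ.

using DelimitedFiles

# Sketch of the MAE sweep over n (assumed two-column layout of the data files).
train = readdlm("io-train.txt")
test  = readdlm("io-test.txt")
X_train, Y_train = train[:, 1], train[:, 2]
X_test,  Y_test  = test[:, 1],  test[:, 2]

for n in (1, 2, 3, 5, 10, 20, 30, 50, 70, 100)
    h = movavg(X_train, Y_train, n)              # estimate coefficients on the training set
    Y_pred = prediction(X_test, h)               # predict valid outputs for the test inputs
    mae = evaluate_prediction(Y_pred, Y_test, n)
    println("n = $n, MAE = $mae")
end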


Datasets

  • io-train.txt
  • io-test.txt
  • code.jl