Moving Average Model with n Delays

April 22, 2025

Problem Description

We consider a system with a single input and a single output. The output of the system depends on $n$ past inputs, received at different times $1, 2, \dots, N+1$. We express this as:

$$y_k = h_1 x_{k-n+1} + h_2 x_{k-n+2} + \dots + h_{n-1} x_{k-1} + h_n x_k,$$

where $h = \{ h_1, h_2, \dots, h_n \}$, for all $k = n-1, n, \dots, N$. We are given $N+1$ inputs $x_0, x_1, \dots, x_N$ and corresponding outputs $y_0, y_1, \dots, y_N$. However, not all outputs can be represented by the above formula. We refer to the outputs that can be represented as valid outputs. A valid output requires exactly $n$ consecutive past inputs, as specified by the formula.

The number of valid outputs is:

$$N - (n - 1) + 1 = N - n + 2.$$

Thus, we obtain a system of $N - n + 2$ equations. Assuming that $N - n + 2 \gg n$, we interpret this as an overdetermined system, meaning we have more equations than unknowns. Such a system can have either zero, one, or infinitely many solutions, with the first case being the most common.

Our objective is to minimize the sum of squared errors between predicted and actual outputs:

$$\min_h \| A h - y \|^2,$$

where the solution is given by the least-squares estimate:

$$h = A^{+} y.$$

The system can be expressed more concisely as:

$$A h = y,$$

where:

  • $A$ is the system matrix,
  • $h = [h_1, h_2, \dots, h_n]^T$ is the vector of unknowns,
  • $y$ is the right-hand side vector.

Since we will later work in the Julia programming language, we adopt 1-based indexing for the following representation:

$$A = \begin{bmatrix} x_1 & x_2 & \dots & x_n \\ x_2 & x_3 & \dots & x_{n+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N-n+2} & x_{N-n+3} & \dots & x_{N+1} \end{bmatrix}, \quad h = \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_n \end{bmatrix}, \quad y = \begin{bmatrix} y_n \\ y_{n+1} \\ \vdots \\ y_{N+1} \end{bmatrix}.$$

If $h$ is known, valid outputs can be predicted from the input sequence $x_0, x_1, \dots, x_m$ using the defining formula above. However, only the valid outputs (those with sufficient past data) can be predicted.
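For concreteness, the following minimal sketch (illustrative only; the names and values are not taken from the accompanying code) assembles the matrix $A$ from a 1-based input vector and recovers $h$ by least squares:

using LinearAlgebra

# Illustrative sketch: build A from a 1-based input vector x of length N + 1
# and estimate h by least squares. The values below are made up for the example.
N, n = 9, 3                     # x and y then have N + 1 = 10 entries
x = randn(N + 1)                # example inputs x_1, ..., x_{N+1}
h_true = [0.2, 0.3, 0.5]        # coefficients used to generate example outputs

# Row i of A holds the n consecutive inputs x_i, ..., x_{i+n-1}.
A = [x[i + j - 1] for i in 1:(N - n + 2), j in 1:n]
y_valid = A * h_true            # valid outputs y_n, ..., y_{N+1}

h = pinv(A) * y_valid           # least-squares estimate via the pseudoinverse
println(h)                      # ≈ h_true (up to floating-point error)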

Analysis of the Special Case: Constant Inputs and Outputs

Suppose that all inputs and outputs are constant, i.e., $x_i = y_i = 1$. We aim to find the solution $h$ for arbitrary $N$ and $n$. The system can be represented as:

$$A h = y,$$

where:

  • $A$ is an $(N - n + 2) \times n$ matrix of ones,
  • $h$ is an $n \times 1$ vector,
  • $y$ is an $(N - n + 2) \times 1$ vector of ones.

We observe that each row of $A$ imposes the same constraint $h_1 + h_2 + \dots + h_n = 1$, and recall that we seek the coefficients that minimize the sum of squared errors:

$$\min_h \| A h - y \|^2.$$

We know that the Moore-Penrose pseudoinverse minimizes this quantity (see proof in source, page 37).

Since every row of $A$ is identical, we can express it as the outer product of two vectors:

$$A = u v^T,$$

where:

  • $A \in \mathbb{R}^{(N - n + 2) \times n}$,
  • $u = \begin{bmatrix} 1 & 1 & \dots & 1 \end{bmatrix}^T \in \mathbb{R}^{N - n + 2}$,
  • $v = \begin{bmatrix} 1 & 1 & \dots & 1 \end{bmatrix}^T \in \mathbb{R}^{n}$.

Using Singular Value Decomposition (SVD), we express AA as:

$$A = U \Sigma V^T,$$

with the pseudoinverse given by:

$$A^{+} = V \Sigma^{+} U^T.$$

To find $A^T A \in \mathbb{R}^{n \times n}$, we compute:

$$A^T A = v u^T u v^T = \| u \|^2 v v^T = (N - n + 2)\, v v^T,$$

which is a positive multiple of the rank-one matrix $v v^T$, whose only nonzero eigenvalue is $\| v \|^2 = n$. Hence $A^T A$ has a single nonzero eigenvalue:

$$\lambda_1 = (N - n + 2) n.$$

Thus, the only nonzero singular value is:

$$\sigma_1 = \sqrt{(N - n + 2) n}.$$

Next, we use the eigenvalue equation $A^T A z = \lambda z$, where $z \in \mathbb{R}^n$ is a nonzero vector:

$$A^T A z = \| u \|^2 v v^T z.$$

Now let $z = v$, which we know is a nonzero vector of ones:

$$A^T A v = \| u \|^2 v v^T v = \| u \|^2 \| v \|^2 v = (N - n + 2) n \, v = \lambda_1 v.$$

Therefore, $v$ is an eigenvector of $A^T A$. To obtain the right singular vector, we normalize $v$:

$$v_1 = \frac{v}{\| v \|} = \frac{1}{\sqrt{n}} v.$$

Similarly, the left singular vector is obtained as:

$$u_1 = \frac{A v_1}{\sigma_1} = \frac{1}{\sqrt{n}} \cdot \frac{u v^T v}{\sqrt{(N - n + 2) n}} = \frac{\sqrt{n}}{\sqrt{(N - n + 2) n}}\, u.$$

Thus, the pseudoinverse is given by:

$$A^{+} = V \Sigma^{+} U^T = \frac{1}{\sqrt{n}} v \cdot \frac{1}{\sqrt{(N - n + 2) n}} \cdot \frac{\sqrt{n}}{\sqrt{(N - n + 2) n}} u^T = \frac{1}{(N - n + 2) n} A^T.$$

Finally, we can express $h$ as:

$$h = \frac{1}{(N - n + 2) n} A^T y = \frac{1}{(N - n + 2) n} \cdot (N - n + 2) \cdot v = \frac{1}{n} v.$$
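This closed form is easy to check numerically. The following small sketch (with illustrative values of $N$ and $n$, not taken from the accompanying code) compares the derived formulas against Julia's pinv:

using LinearAlgebra

# Numerical check of the constant-input special case for illustrative N and n.
N, n = 10, 4
A = ones(N - n + 2, n)           # (N - n + 2) × n matrix of ones
y = ones(N - n + 2)              # vector of ones

h = pinv(A) * y                  # least-squares solution via the pseudoinverse
println(h)                       # each entry ≈ 1/n = 0.25

# The derived pseudoinverse A⁺ = Aᵀ / ((N - n + 2) n) agrees with pinv(A).
println(maximum(abs.(pinv(A) - A' / ((N - n + 2) * n))))   # ≈ 0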

Test Case: Constant Inputs and Outputs

The code implementation for this test case is as follows:

function test_case_constant(N::Int, n::Int)
    println("Test case (constant case)")
    # Constant special case: all inputs and outputs equal to one
    # (requires the Test stdlib and movavg from code.jl to be loaded).
    u = ones(N - n + 2)
    v = ones(n)
    y = ones(N - n + 2)

    A = u * v'                    # (N - n + 2) × n matrix of ones

    h_derived = (1 / n) * v       # closed-form solution h = v / n

    h_movavg = movavg(u, y, n)    # least-squares solution from the implementation

    println("h_derived: $h_derived")
    println("h_movavg: $h_movavg")
    println("Difference: ", h_derived - h_movavg')
    println("Test result: ", isapprox(h_derived, h_movavg', atol = sqrt(eps())))

    @test isapprox(h_derived, h_movavg', atol = sqrt(eps()))
end

Results

Test case (constant)
h_derived: [0.5, 0.5]
h_movavg: [0.4999999999999997 0.4999999999999997]
Difference: [2.7755575615628914e-16, 2.7755575615628914e-16]
Test result: true

Interpretation of Results

The results show that the derived solution and the one computed using the moving average filter are extremely close, with differences on the order of $10^{-16}$, which is due to floating-point precision. Specifically:

$$h_{\text{derived}} = [0.5, 0.5], \quad h_{\text{movavg}} = [0.5, 0.5],$$

indicating that both methods yield nearly identical coefficients. The minimal difference can be attributed to the precision of the numerical computations. Therefore, we conclude that the solution distributes the outputs evenly across the coefficients, resembling a moving average filter.

This confirms that the approach works as expected for this special case, where the inputs and outputs are constant. The system's behavior leads to an even distribution of the output values across the coefficients, as expected from the problem's structure.

Source Code Description

In this section, we provide an overview of the source code implemented in the Julia file accompanying this document. We explain the logic behind the most important functions and their mathematical foundations.

The core functions, movavg(X, Y, n) and prediction(X, h), serve as the foundation for the entire implementation. Therefore, we first derive these two functions.

Moving Average Computation

The function movavg(X, Y, n) takes three parameters:

  • $X$ is the input data,
  • $Y$ is the corresponding output data,
  • $n$ is the number of past inputs each output depends on.

The key idea follows from the least squares solution using the Moore-Penrose pseudoinverse. The function ensures that the vectors $X$, $Y$, and $h$ are row vectors by using the transpose operator (') in Julia. We then construct the matrix $A \in \mathbb{R}^{(N - n + 2) \times n}$ as specified in Section 1 and extract only the valid outputs from $n$ to $N+1$. The coefficients $h$ are computed as:

$$h = A^{+} Y_{\text{valid}},$$

where $A^{+}$ denotes the Moore-Penrose pseudoinverse of $A$. The function returns $h$ as a row vector. In essence, movavg estimates the coefficients when given input data and corresponding outputs. We assume that $X$ and $Y$ have the same length, $N+1$, with $n$ being much smaller than $N - n + 2$ (the number of available equations).
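As a rough illustration, a minimal sketch of such a function might look as follows; the actual movavg in code.jl may differ in how it handles row vectors and edge cases:

using LinearAlgebra

# Minimal sketch of the coefficient estimation, assuming X and Y are vectors of
# equal length N + 1 (1-based). The real movavg in code.jl may differ in details.
function movavg_sketch(X::AbstractVector, Y::AbstractVector, n::Int)
    N = length(X) - 1
    # Row i of A holds the n consecutive inputs X[i], ..., X[i + n - 1].
    A = [X[i + j - 1] for i in 1:(N - n + 2), j in 1:n]
    Y_valid = Y[n:N+1]            # only the outputs y_n, ..., y_{N+1} are valid
    h = pinv(A) * Y_valid         # least-squares solution via the pseudoinverse
    return h'                     # returned as a row vector, as described above
end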

Prediction Function

The function prediction(X, h) follows a similar approach. However, instead of estimating coefficients, it assumes the coefficients are already known and stored in a row vector $h$. Given an input vector $X$, we construct the matrix $A$ in the same way as in movavg and compute the predicted outputs as:

$$y = A h.$$

The function returns the predicted values as a row vector.
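A corresponding sketch of the prediction step (again, only a hedged approximation of the function in code.jl) is:

# Minimal sketch of the prediction step: h holds the n coefficients (row or
# column vector); the actual prediction in code.jl may differ in details.
function prediction_sketch(X::AbstractVector, h)
    n = length(h)
    N = length(X) - 1
    # Same matrix construction as in the estimation step.
    A = [X[i + j - 1] for i in 1:(N - n + 2), j in 1:n]
    return (A * vec(h))'          # predicted valid outputs y_n, ..., y_{N+1} as a row
end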

Error Evaluation

To evaluate our predictions, we implement the evaluate_prediction(Y_pred, Y_test, n) function, which compares our predicted outputs against the actual outputs from a test set. We use the mean absolute error (MAE) as the metric for evaluation:

$$\text{MAE} = \frac{1}{N - n + 2} \sum_{i = n}^{N + 1} \left| Y_{\text{test}, i} - Y_{\text{pred}, i} \right|.$$

This function returns the computed MAE value.
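A sketch of this evaluation, under the assumption that Y_pred contains the $N - n + 2$ predicted valid outputs and Y_test the full test outputs of length $N + 1$, could be:

# Minimal sketch of the MAE computation; the indexing convention of the actual
# evaluate_prediction in code.jl may differ.
function evaluate_prediction_sketch(Y_pred, Y_test, n::Int)
    N = length(Y_test) - 1
    errors = abs.(vec(Y_test[n:N+1]) .- vec(Y_pred))   # |Y_test,i - Y_pred,i| for i = n, ..., N + 1
    return sum(errors) / (N - n + 2)                   # mean absolute error
end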

Interpretation of Results

In this section, we summarize our findings by analyzing the effect of the parameter $n$ on the prediction accuracy. The model was tested on the datasets io-train.txt (training set) and io-test.txt (test set) for various values of $n$. The primary metric used for evaluation is the Mean Absolute Error (MAE), which measures the average deviation between predicted and actual values.

Impact of $n$ on Prediction Accuracy

The choice of $n$ directly influences the performance of the model. Our observations indicate that:

  • For very small values of $n$ (e.g., $n = 1, 2, 3$), the MAE is quite high, suggesting poor predictive performance.
  • There is a noticeable drop in MAE at intermediate values of $n$, reaching a local minimum around $n \approx 10$.
  • As $n$ increases further, the MAE stabilizes and converges to approximately $4.2$.

To illustrate this behavior, we present two tables: the first focusing on small values of $n$, and the second capturing the overall trend up to $n = 100$.

MAE for Small Values of $n$

n      MAE
1      4.316
2      5.208
3      4.420
5      58.361
10     1.714

Long-Term Behavior of MAE

n      MAE
10     1.714
20     3.707
30     3.916
50     4.469
70     4.421
100    4.270

As observed in the second table, the MAE fluctuates initially but eventually stabilizes around $4.2$, indicating diminishing returns for large $n$. This suggests that increasing $n$ beyond a certain point does not significantly improve prediction accuracy and may introduce noise or overfitting.

These results confirm the observed trends and provide an understanding of the model's performance across different values of $n$. In conclusion, selecting an optimal $n$ is crucial; while very small values perform poorly, excessively large values offer diminishing returns and may lead to overfitting.
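For reproducibility, the experiment can be driven by a loop of roughly the following form. This is a hedged sketch: it assumes io-train.txt and io-test.txt each store inputs in the first column and outputs in the second, and that movavg, prediction, and evaluate_prediction from code.jl are already loaded; the actual file layout and function signatures may differ.

using DelimitedFiles

# Sketch of the MAE sweep over n (assumed two-column layout of the data files).
train = readdlm("io-train.txt")
test  = readdlm("io-test.txt")
X_train, Y_train = train[:, 1], train[:, 2]
X_test,  Y_test  = test[:, 1],  test[:, 2]

for n in (1, 2, 3, 5, 10, 20, 30, 50, 70, 100)
    h = movavg(X_train, Y_train, n)              # estimate coefficients on the training set
    Y_pred = prediction(X_test, h)               # predict valid outputs for the test inputs
    mae = evaluate_prediction(Y_pred, Y_test, n)
    println("n = $n, MAE = $mae")
end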


Datasets

  • io-train.txt
  • io-test.txt
  • code.jl