Problem Description
We consider a system with a single input and a single output. The output of the system at time $k$ depends on the inputs received at times $k, k-1, \dots, k-n+1$. We express this as:

$$
y(k) = h_1\, x(k) + h_2\, x(k-1) + \dots + h_n\, x(k-n+1) = \sum_{i=1}^{n} h_i\, x(k-i+1), \tag{1}
$$

where $h_i \in \mathbb{R}$ for all $i \in \{1, \dots, n\}$. We are given inputs $x(0), x(1), \dots, x(N)$ and corresponding outputs $y(0), y(1), \dots, y(N)$. However, not all outputs can be represented by the above formula. We refer to the outputs that can be represented as valid outputs. A valid output $y(k)$ requires exactly $n$ consecutive past inputs $x(k), x(k-1), \dots, x(k-n+1)$, as specified by the formula, which is possible only for $k \ge n-1$.
The number of valid outputs is therefore

$$
m = N - n + 2.
$$
Thus, we obtain a system of $m$ equations in $n$ unknowns. Assuming that $m > n$, we interpret this as an overdetermined system, meaning we have more equations than unknowns. Such a system can have either zero, one, or infinitely many solutions, with the first case being the most common.
Our objective is to minimize the sum of squared errors between predicted and actual outputs:

$$
\min_{h_1, \dots, h_n} \; \sum_{k=n-1}^{N} \left( y(k) - \sum_{i=1}^{n} h_i\, x(k-i+1) \right)^{2},
$$

where the solution is given by the least-squares estimate

$$
h^{*} = A^{+} b,
$$

with $A^{+}$ the Moore-Penrose pseudoinverse of the system matrix $A$ and $b$ the vector of valid outputs, both defined below.
The system can be expressed more concisely as

$$
A h = b,
$$

where:
- $A \in \mathbb{R}^{m \times n}$ is the system matrix,
- $h \in \mathbb{R}^{n}$ is the vector of unknowns,
- $b \in \mathbb{R}^{m}$ is the right-hand side vector.
Since we will later work in the Julia programming language, we adopt 1-based indexing for the following representation, relabelling the data as $x_j = x(j-1)$ and $y_j = y(j-1)$ for $j = 1, \dots, N+1$:

$$
\begin{bmatrix}
x_n & x_{n-1} & \cdots & x_1 \\
x_{n+1} & x_n & \cdots & x_2 \\
\vdots & \vdots & \ddots & \vdots \\
x_{N+1} & x_N & \cdots & x_{N-n+2}
\end{bmatrix}
\begin{bmatrix}
h_1 \\ h_2 \\ \vdots \\ h_n
\end{bmatrix}
=
\begin{bmatrix}
y_n \\ y_{n+1} \\ \vdots \\ y_{N+1}
\end{bmatrix}.
$$
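For illustration (a small worked example added here, not taken from the original data), with $n = 2$ and five samples $x_1, \dots, x_5$, $y_1, \dots, y_5$, the system consists of four valid equations:

$$
\begin{bmatrix}
x_2 & x_1 \\
x_3 & x_2 \\
x_4 & x_3 \\
x_5 & x_4
\end{bmatrix}
\begin{bmatrix}
h_1 \\ h_2
\end{bmatrix}
=
\begin{bmatrix}
y_2 \\ y_3 \\ y_4 \\ y_5
\end{bmatrix}.
$$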
If $h$ is known, valid outputs can be predicted from the input sequence using the original formula (1). However, only the valid outputs (those with sufficient past data) can be predicted.
Analysis of the Special Case: Constant Inputs and Outputs
Suppose that all inputs and outputs are constant, i.e., $x(k) = y(k) = 1$ for all $k$. We aim to find the solution $h$ for arbitrary $N$ and $n$. The system can be represented as

$$
A h = b,
$$

where:
- $A$ is an $m \times n$ matrix of ones,
- $h$ is an $n \times 1$ vector of unknowns,
- $b$ is an $m \times 1$ vector of ones.
We observe that each row of $A$ imposes the same constraint, and recall that we seek coefficients $h$ that minimize the sum of squared errors

$$
\lVert A h - b \rVert_2^{2}.
$$

We know that the Moore-Penrose pseudoinverse solution $h^{*} = A^{+} b$ minimizes this quantity (see the proof in the cited source, page 37).
Since every row of $A$ is identical, we can express it as the outer product of two vectors:

$$
A = u v^{\top},
$$

where:
- $u = [1, 1, \dots, 1]^{\top} \in \mathbb{R}^{m}$,
- $v = [1, 1, \dots, 1]^{\top} \in \mathbb{R}^{n}$,
- $m = N - n + 2$ is the number of valid outputs.
Using the Singular Value Decomposition (SVD), we express $A$ as

$$
A = U \Sigma V^{\top},
$$

with the pseudoinverse given by

$$
A^{+} = V \Sigma^{+} U^{\top}.
$$
To find $\Sigma$, we compute

$$
A^{\top} A = (u v^{\top})^{\top}(u v^{\top}) = v\,(u^{\top} u)\,v^{\top} = m\, v v^{\top},
$$

which has a single nonzero eigenvalue

$$
\lambda = m n.
$$

Thus, the only nonzero singular value is

$$
\sigma = \sqrt{\lambda} = \sqrt{m n}.
$$
Next, we use the definition of the eigenvalue equation

$$
(A^{\top} A)\, w = \lambda\, w,
$$

where $w$ is any nonzero vector. Now let $w = v$, which we know is a nonzero vector of ones:

$$
(A^{\top} A)\, v = m\, v\,(v^{\top} v) = m n\, v.
$$

Therefore, $v$ is an eigenvector of $A^{\top} A$ with eigenvalue $m n$. To obtain the right singular vector, we normalize $v$:

$$
v_1 = \frac{v}{\lVert v \rVert} = \frac{v}{\sqrt{n}}.
$$

Similarly, the left singular vector is obtained as

$$
u_1 = \frac{A v_1}{\sigma} = \frac{u\,(v^{\top} v)}{\sqrt{n}\,\sqrt{m n}} = \frac{u}{\sqrt{m}} = \frac{u}{\lVert u \rVert}.
$$
Thus, the pseudoinverse is given by

$$
A^{+} = v_1\, \sigma^{-1}\, u_1^{\top} = \frac{v}{\sqrt{n}} \cdot \frac{1}{\sqrt{m n}} \cdot \frac{u^{\top}}{\sqrt{m}} = \frac{1}{m n}\, v u^{\top}.
$$

Finally, we can express $h$ as

$$
h = A^{+} b = \frac{1}{m n}\, v\, u^{\top} b = \frac{1}{m n}\, v \cdot m = \frac{1}{n}\, v,
$$

since $b$ is a vector of ones and therefore $u^{\top} b = m$. Every coefficient thus equals $h_i = 1/n$.
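The closed-form result can be checked numerically with a few lines of Julia (a standalone sketch with arbitrarily chosen sizes, independent of the accompanying file):

```julia
using LinearAlgebra

m, n = 9, 4                 # arbitrary sizes with m > n
u = ones(m)                 # all-ones left factor (one entry per valid output)
v = ones(n)                 # all-ones right factor (one entry per coefficient)
A = u * v'                  # rank-one all-ones system matrix
b = ones(m)                 # all-ones right-hand side

h_closed_form = v / n       # derived solution: every coefficient equals 1/n
h_pinv = pinv(A) * b        # minimum-norm least-squares solution

println(isapprox(h_closed_form, h_pinv, atol = sqrt(eps())))  # expected: true
```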
Test Case: Constant Inputs and Outputs
The code implementation for this test case is as follows:
```julia
using Test

# Test for the constant case: all inputs and outputs equal to one.
# Relies on `movavg` from the accompanying Julia file.
function test_case_constant(N::Int, n::Int)
    println("Test case (constant case)")
    u = ones(N - n + 2)            # constant input sequence
    v = ones(n)                    # vector of ones used in the derivation
    y = ones(N - n + 2)            # constant output sequence
    A = u * v'                     # rank-one system matrix (shown for reference)
    h_derived = (1 / n) * v        # closed-form solution h_i = 1/n
    h_movavg = movavg(u, y, n)     # least-squares estimate from the data
    println("h_derived: $h_derived")
    println("h_movavg: $h_movavg")
    println("Difference: ", h_derived - h_movavg')
    println("Test result: ", isapprox(h_derived, h_movavg', atol = sqrt(eps())))
    @test isapprox(h_derived, h_movavg', atol = sqrt(eps()))
end
```
Results
```
Test case (constant)
h_derived: [0.5, 0.5]
h_movavg: [0.4999999999999997 0.4999999999999997]
Difference: [2.7755575615628914e-16, 2.7755575615628914e-16]
Test result: true
```
Interpretation of Results
The results show that the derived solution and the one computed using the moving average filter are extremely close, with differences on the order of $10^{-16}$, which is due to floating-point precision. Specifically,

$$
h_{\text{derived}} - h_{\text{movavg}} \approx \begin{bmatrix} 2.78 \times 10^{-16} & 2.78 \times 10^{-16} \end{bmatrix}^{\top},
$$

indicating that both methods yield nearly identical coefficients. The minimal difference can be attributed to the precision of the numerical computations. Therefore, we conclude that the solution distributes the weight evenly across the $n$ coefficients, resembling a moving average filter.
This confirms that the approach works as expected for this special case, where the inputs and outputs are constant. The system's behavior leads to an even distribution of the output values across the coefficients, as expected from the problem's structure.
Source Code Description
In this section, we provide an overview of the source code implemented in the Julia file accompanying this document. We explain the logic behind the most important functions and their mathematical foundations.
The core functions, `movavg(X, Y, n)` and `prediction(X, h)`, serve as the foundation for the entire implementation. Therefore, we first derive these two functions.
Moving Average Computation
The function `movavg(X, Y, n)` takes three parameters:
- `X` – the input data,
- `Y` – the corresponding output data,
- `n` – the number of past inputs each output depends on.
The key idea follows from the least-squares solution using the Moore-Penrose pseudoinverse. The function ensures that the vectors `X`, `Y`, and `h` are row vectors by using the transpose operator (`'`) in Julia. We then construct the matrix $A$ as specified in Section 1 and extract only the valid outputs, i.e. the entries of `Y` from the $n$-th to the last. The coefficients are computed as

$$
h = A^{+} b,
$$

where $A^{+}$ denotes the Moore-Penrose pseudoinverse of $A$ and $b$ is the vector of valid outputs. The function returns $h$ as a row vector. In essence, `movavg` estimates the coefficients when given input data and corresponding outputs. We assume that `X` and `Y` have the same length, $N$, with $n$ being much smaller than $m$ (the number of available equations).
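Since the accompanying Julia file is not reproduced in this document, the following is only a minimal sketch of the logic just described; the function name, the plain-vector arguments, and the loop-based construction of $A$ are illustrative assumptions rather than the actual implementation:

```julia
using LinearAlgebra

# Sketch of the moving-average estimator described above.
# X, Y: input and output data of equal length; n: number of past inputs per output.
function movavg_sketch(X::AbstractVector, Y::AbstractVector, n::Int)
    N = length(X)
    m = N - n + 1                        # number of valid outputs
    A = zeros(m, n)
    for k in 1:m
        A[k, :] = X[(k + n - 1):-1:k]    # past inputs matched to output Y[k + n - 1]
    end
    b = Y[n:end]                         # keep only the valid outputs
    h = pinv(A) * b                      # least-squares coefficients
    return h'                            # return h as a row vector, as movavg does
end
```

For constant data, e.g. `movavg_sketch(ones(20), ones(20), 4)`, this returns a row vector whose entries are (up to floating-point error) equal to $0.25$, matching the derivation of the constant case above.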
Prediction Function
The function `prediction(X, h)` follows a similar approach. However, instead of estimating coefficients, it assumes the coefficients are already known and stored in a row vector `h`. Given an input vector `X`, we construct the matrix $A$ in the same way as in `movavg` and compute the predicted outputs as

$$
\hat{y} = A\, h.
$$

The function returns the predicted values as a row vector.
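Again as a hedged sketch (the actual `prediction` in the accompanying file may differ in details such as argument handling), the predicted valid outputs can be computed as follows, with $n$ taken from the length of the coefficient vector:

```julia
# Sketch of the prediction step: the coefficients h are assumed known, and the
# valid outputs are reconstructed as A*h, with A built exactly as in movavg.
function prediction_sketch(X::AbstractVector, h::AbstractVector)
    n = length(h)                        # number of coefficients
    N = length(X)
    m = N - n + 1                        # number of predictable (valid) outputs
    A = zeros(m, n)
    for k in 1:m
        A[k, :] = X[(k + n - 1):-1:k]    # past inputs for the k-th valid output
    end
    return (A * h)'                      # predicted valid outputs as a row vector
end
```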
Error Evaluation
To evaluate our predictions, we implement the `evaluate_prediction(Y_pred, Y_test, n)` function, which compares our predicted outputs against the actual outputs from a test set. We use the mean absolute error (MAE) as the metric for evaluation:

$$
\mathrm{MAE} = \frac{1}{m} \sum_{k=1}^{m} \lvert \hat{y}_k - y_k \rvert,
$$

where $m$ is the number of valid outputs being compared. This function returns the computed MAE value.
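A corresponding sketch of the metric (the alignment of `Y_pred` against the valid part of `Y_test` is an assumption based on the signature above, not a transcription of the actual code):

```julia
# Sketch of the MAE computation: compare the predicted valid outputs with the
# valid outputs of the test set (entries n through the end).
function evaluate_prediction_sketch(Y_pred, Y_test, n::Int)
    Y_valid = vec(Y_test)[n:end]                         # actual valid outputs
    return sum(abs.(vec(Y_pred) .- Y_valid)) / length(Y_valid)
end
```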
Interpretation of Results
In this section, we summarize our findings by analyzing the effect of the parameter $n$ on the prediction accuracy. The model was tested on the datasets `io-train.txt` (training set) and `io-test.txt` (test set) for various values of $n$. The primary metric used for evaluation is the mean absolute error (MAE), which measures the average deviation between predicted and actual values.
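The experiment can be reproduced with a loop of roughly the following shape (a sketch only: `X_train`, `Y_train`, `X_test`, and `Y_test` stand for the data already loaded from the two files, and `movavg`, `prediction`, and `evaluate_prediction` are the functions described in the previous section):

```julia
# Sweep over window lengths n and record the test-set MAE for each value.
results = Dict{Int, Float64}()
for n in (1, 2, 3, 5, 10, 20, 30, 50, 70, 100)
    h = movavg(X_train, Y_train, n)               # fit the coefficients on the training set
    Y_pred = prediction(X_test, h)                # predict the valid outputs of the test set
    results[n] = evaluate_prediction(Y_pred, Y_test, n)
    println("n = $n  MAE = $(results[n])")
end
```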
Impact of $n$ on Prediction Accuracy
The choice of $n$ directly influences the performance of the model. Our observations indicate that:
- For small values of $n$ ($n \le 5$ in the first table below), the MAE is quite high, suggesting poor predictive performance.
- There is a noticeable drop in MAE at intermediate values of $n$, reaching a local minimum around $n = 10$.
- As $n$ increases further, the MAE stabilizes and converges to a value between roughly $4.3$ and $4.5$.
To illustrate this behavior, we present two tables: the first focusing on small values of $n$, and the second capturing the overall trend up to $n = 100$.
MAE for Small Values of $n$
$n$ | MAE |
---|---|
1 | 4.316 |
2 | 5.208 |
3 | 4.420 |
5 | 58.361 |
10 | 1.714 |
Long-Term Behavior of MAE
$n$ | MAE |
---|---|
10 | 1.714 |
20 | 3.707 |
30 | 3.916 |
50 | 4.469 |
70 | 4.421 |
100 | 4.270 |
As observed in the second table, the MAE fluctuates initially but eventually stabilizes in the range of roughly $4.3$ to $4.5$, indicating diminishing returns for large $n$. This suggests that increasing $n$ beyond a certain point does not significantly improve prediction accuracy and may introduce noise or overfitting.
These results confirm the observed trends and give an overview of the model's performance across different values of $n$. In conclusion, selecting an optimal $n$ is crucial: while very small values perform poorly, excessively large values offer diminishing returns and may lead to overfitting.