17.2.1 Least-squares polynomial approximation
The fitpoly command replaces tabular data or a function by a polynomial.

Smoothing data with polynomial regression.
fitpoly can be used to find a polynomial that fits given tabular data.
- fitpoly takes one mandatory argument and two optional arguments:
  - data, a two-column matrix or a sequence of two lists,
    representing the x- and y-values of data points
    (x0,y0),(x1,y1),…,(xm,ym).
  - Optionally, x[=a..b] or a..b,
    which is a variable x and optionally its domain [a,b] or a segment [a,b]
    (by default, unset).
  - Optionally, a sequence of options, each of which may be one of:
    - limit=N, a nonnegative integer,
      which is the maximal degree of the resulting polynomial
      (by default, N=15).
    - degree=d, where d is a nonnegative integer,
      which is the desired degree of the resulting polynomial
      (by default, unset).
    - threshold=tol, a positive real number,
      used for finding an appropriate fitting degree, as described below
      (by default, threshold=0.01; ignored when d is set).
    - length=l, a positive integer,
      used for finding an appropriate fitting degree, as described below
      (by default, length=5; ignored when d is set).
- If d is set, then fitpoly returns the
polynomial of degree min{d,N} which best fits the data
in the least-squares sense.
- If d is not set,
  then a polynomial pn of modest degree n ≤ N with good error suppression
  (thus representing the trend of the data)
  is chosen using tol and l,
  such that raising the degree does not significantly improve the
  approximation. Precisely, n is the first nonnegative integer such that

      stddev[σn², σ(n+1)², …, σ(n+l−1)²] / mean[σ0², σ1², …, σ(n+l−1)²] ≤ tol,

  where σk² is the sum of squared residuals ∑i=0..m (yi − pk(xi))²
  and pk is the best-fitting polynomial of degree k. Ideally, n is the
  smallest degree for which σn² ≈ σ(n+1)² ≈ σ(n+2)² ≈ ⋯, meaning that
  higher-degree polynomials would overfit to the noise.
- Lowering the parameter length, or increasing
  threshold, makes the algorithm greedier,
  effectively lowering the degree of the output polynomial. Note that when
  degree is set, these two parameters are ignored.
- If no variable is specified, then fitpoly returns the list of
polynomial coefficients.
- If a segment [a,b] is given, then only the data points (xk,yk) for which
  a ≤ xk ≤ b are considered.
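For illustration, the degree-selection rule above can be sketched in Python with NumPy. This is an outside sketch of the stated criterion, not fitpoly's actual implementation; the helper name select_degree is ours.

```python
import numpy as np

def select_degree(x, y, tol=0.01, l=5, N=15):
    """Return the first degree n such that
    stddev(sigma2[n:n+l]) / mean(sigma2[:n+l]) <= tol,
    where sigma2[k] is the sum of squared residuals of the best
    least-squares polynomial fit of degree k.
    A sketch of the rule described above, not fitpoly's internals."""
    sigma2 = []
    for k in range(N + l):
        coeffs = np.polyfit(x, y, k)
        residuals = y - np.polyval(coeffs, x)
        sigma2.append(np.sum(residuals**2))
    sigma2 = np.array(sigma2)
    for n in range(N + 1):
        window = sigma2[n:n + l]
        if np.std(window) / np.mean(sigma2[:n + l]) <= tol:
            return n
    return N
```

On noiseless quadratic data, for example, this picks n = 2: the residual sums σ2², σ3², … are numerically zero, while σ0² and σ1² are not, so the relative spread first drops below tol at n = 2.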
Example
We find a polynomial which best approximates data contaminated by noise.
The data is obtained by evaluating the polynomial f(x) = x^4/100 − x^3/5 + 6x
at 100 points between x=1 and x=20.8. The noise is generated by a
normally distributed random variable.
f(x):=x^4/100-x^3/5+6x:;
X:=linspace(1.0,20.8):;
Y:=apply(f,X)+randvector(100,randvar(normal,mean=0,stddev=10)):;
p:=fitpoly(X,Y)

    [0.011665,−0.266955,0.844543,2.44967,3.94846]
We obtain the list of polynomial coefficients, starting with the leading coefficient.
Note that the polynomial of degree 4 is returned (it has five coefficients),
which is, in this case, obviously optimal for data smoothing.
To visualize the fit, enter:
scatterplot(X,Y); plot(r2e(p,x),x=0..20,color=blue+line_width_2)
When approximating only the data for 1 ≤ x ≤ 10, you obtain a polynomial of
degree 9. This is called overfitting: on this restriction the curvature of
the data is small, which makes the noise more prominent.
p1:=fitpoly(X,Y,x=1..10):; degree(p1)
To make the approximating polynomial less sensitive to noise,
you can increase the threshold value tol.
p2:=fitpoly(X,Y,x=1..10,threshold=0.05)

    −0.256298728063x^3+2.89166276754x^2−9.93723278591x+19.4201078899
(Alternatively, you could set the parameter length to a smaller value, e.g. 3.)
The command lines below show the curves p1 and p2 against data points
(the overfitted one is drawn in grey).
scatterplot(X,Y);
plot(p1,x=1..10,color=grey);
plot(p2,x=1..10,color=blue+line_width_2)
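The same smoothing can be reproduced outside Xcas. The following NumPy sketch is illustrative only: fitpoly chooses the degree automatically, while here degree 4 is fixed by hand, and the random seed is an arbitrary choice of ours.

```python
import numpy as np

# Sample f(x) = x^4/100 - x^3/5 + 6x at 100 points, add Gaussian
# noise with standard deviation 10, and fit a degree-4 polynomial
# by least squares.
rng = np.random.default_rng(seed=0)
X = np.linspace(1.0, 20.8, 100)
Y = X**4 / 100 - X**3 / 5 + 6 * X + rng.normal(0.0, 10.0, size=100)
p = np.polyfit(X, Y, 4)   # coefficients, leading coefficient first
print(p)                  # close to [0.01, -0.2, 0, 6, 0]
```

The recovered leading coefficients are close to those of f, just as in the fitpoly output above.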
Approximating functions by polynomials.
fitpoly can be used for approximating a continuous function
f:[a,b]→ℝ by a polynomial pn of a certain degree n which
minimizes the error in the sense of the L2 norm.
- fitpoly takes two mandatory arguments and one optional argument:
  - f, a univariate symbolic expression representing the function to be approximated.
  - An interval a..b, which is the domain [a,b] of f.
  - Optionally, opts, a sequence of options, each of which may be one of:
    - limit=N, as before.
    - degree=d, as before.
    - ε or threshold=ε,
      a positive real number such that the resulting polynomial pn satisfies

          ||f−pn||_2 = sqrt( ∫[a,b] (f(x)−pn(x))² dx ) ≤ ε    (1)

      (by default, ε=10^−8). This parameter is ignored when d is set.
- fitpoly(f,a..b ⟨,opts⟩)
  finds pn with the smallest n ≤ N such that (1) holds.
  If no such n exists, then pN is returned.
- If the parameter d is set, then the polynomial pd which minimizes
  ||f−pd||_2 among all polynomials of degree d is returned.
- Polynomial approximation is fast and robust, even for large values of
  d and N and for small ε.
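The minimization in (1) can also be illustrated outside Xcas. The following Python sketch approximates the continuous problem by dense discrete least squares (not fitpoly's actual algorithm; the helper names are ours) and fits a cubic to cos on [0, π/2]:

```python
import numpy as np

def l2_fit(f, a, b, degree, samples=2001):
    """Approximate the L2-best polynomial of the given degree on [a, b]
    by sampling densely and solving a discrete least-squares problem."""
    x = np.linspace(a, b, samples)
    return np.polyfit(x, f(x), degree)

def l2_error(f, coeffs, a, b, samples=200001):
    """Estimate ||f - p||_2 on [a, b] by a Riemann-sum quadrature."""
    x = np.linspace(a, b, samples)
    r = (f(x) - np.polyval(coeffs, x))**2
    return np.sqrt(r.mean() * (b - a))

p3 = l2_fit(np.cos, 0.0, np.pi / 2, 3)
print(l2_error(np.cos, p3, 0.0, np.pi / 2))
```

The estimated L2 error of this cubic stays below 10^−2, consistent with the first example below, where fitpoly with threshold 1e-2 also settles on a degree-3 polynomial.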
Examples
fitpoly(cos(x),0..pi/2,1e-2)

    −0.244320054366x^3−0.336364730985x^2+1.13004492821x−0.0107731059645
f:=exp(-7x)*5x:;
g:=fitpoly(f,0..1,degree=8)

    −21.717636069x^8+107.930784832x^7−232.831655404x^6+286.778708741x^5
    −222.236631985x^4+111.004732684x^3−33.8769709361x^2+4.95239715728x
    +0.000500886757642
The L2 error of the above approximation can be estimated as follows:
sqrt(romberg((f-g)^2,x=0..1))