Least Squares in Excel

The least squares method (LSM) is based on minimizing the sum of squared deviations.

It has many applications, since it allows an approximate representation of a given function by other, simpler ones. LSM can be extremely useful in processing observations, and it is actively used to estimate some quantities from the results of measurements of others that contain random errors. In this article, you will learn how to implement least squares calculations in Excel.

Statement of the problem on a specific example

Suppose there are two indicators X and Y. Moreover, Y depends on X. Since OLS is of interest to us from the point of view of regression analysis (in Excel, its methods are implemented using built-in functions), we should immediately proceed to consider a specific problem.

So, let X be the selling area of a grocery store, measured in square meters, and Y the annual turnover, defined in millions of rubles.

It is required to make a forecast of what turnover (Y) the store will have if it has one or another retail space. Obviously, the function Y = f (X) is increasing, since the hypermarket sells more goods than the stall.

A few words about the correctness of the initial data used for prediction

Let's say we have a table built with data for n stores.

According to mathematical statistics, the results will be more or less correct if data on at least 5-6 objects are examined. Also, "anomalous" results should not be used. In particular, a small elite boutique can have a turnover many times greater than the turnover of large outlets of the "mass market" class.

The essence of the method

The table data can be displayed on the Cartesian plane as points M_1(x_1, y_1), …, M_n(x_n, y_n). Now the solution of the problem reduces to the selection of an approximating function y = f(x) whose graph passes as close as possible to the points M_1, M_2, …, M_n.

Of course, you can use a high-degree polynomial, but this option is not only difficult to implement but simply incorrect, since it will not reflect the main trend that needs to be detected. The most reasonable solution is to search for the straight line y = ax + b that best approximates the experimental data, or more precisely, for its coefficients a and b.

Accuracy score

For any approximation, the assessment of its accuracy is of particular importance. Denote by e_i the difference (deviation) between the functional and experimental values at the point x_i, i.e. e_i = y_i − f(x_i).

Obviously, to assess the accuracy of the approximation you could use the sum of the deviations: when choosing a straight line for an approximate representation of the dependence of Y on X, preference would be given to the one with the smallest sum of the e_i over all points under consideration. However, not everything is so simple, since along with positive deviations there will almost certainly be negative ones.

You can solve the problem using the deviation modules or their squares. The latter method is the most widely used. It is used in many areas, including regression analysis (in Excel, its implementation is carried out using two built-in functions), and has long been proven to be effective.

Least squares method

In Excel, as you know, there is a built-in AutoSum function that allows you to calculate the sum of all values located in a selected range. Thus, nothing prevents us from calculating the value of the expression (e_1² + e_2² + e_3² + … + e_n²).

In mathematical notation, this looks like:

S = e_1² + e_2² + … + e_n² = Σ_{i=1..n} (y_i − f(x_i))².

Since the decision was initially made to approximate using a straight line, we have:

f(x) = a·x + b, so e_i = y_i − (a·x_i + b).

Thus, the task of finding the straight line that best describes the specific relationship between X and Y amounts to calculating the minimum of a function of two variables:

S(a, b) = Σ_{i=1..n} (y_i − a·x_i − b)² → min.

This requires equating to zero the partial derivatives with respect to the unknown parameters a and b and solving a simple system of two equations with two unknowns of the form:

∂S/∂a = 0,   ∂S/∂b = 0.

After simple transformations, including dividing by 2 and manipulating the sums, we get:

a·Σ x_i² + b·Σ x_i = Σ x_i·y_i,
a·Σ x_i + b·n = Σ y_i.

Solving it, for example, by Cramer's method, we obtain a stationary point with certain coefficients a* and b*. This is the minimum, i.e. to predict what turnover the store will have for a given area, the straight line y = a*·x + b* is suitable, which is a regression model for the example in question. Of course, it will not give an exact result, but it will help you get an idea of whether buying a store of a particular area on credit will pay off.
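To make this step concrete, here is a minimal Python sketch of the same calculation. The (area, turnover) numbers are made up for illustration, since the article's table is not reproduced here; the normal equations and Cramer's rule are exactly those written above.

```python
import numpy as np

# Hypothetical observations: selling area (m^2) and annual turnover (mln rubles)
x = np.array([20.0, 35.0, 50.0, 80.0, 120.0, 200.0])   # X, retail space
y = np.array([1.2, 1.8, 2.3, 3.5, 4.6, 7.1])            # Y, turnover

n = len(x)
Sx, Sxx = x.sum(), (x**2).sum()
Sy, Sxy = y.sum(), (x * y).sum()

# Normal equations:  a*Sxx + b*Sx = Sxy,  a*Sx + b*n = Sy
# Solved by Cramer's rule for the 2x2 system
det = Sxx * n - Sx * Sx
a_star = (Sxy * n - Sx * Sy) / det
b_star = (Sxx * Sy - Sx * Sxy) / det

print(f"a* = {a_star:.5f}, b* = {b_star:.5f}")
print(f"forecast turnover for 150 m^2: {a_star * 150 + b_star:.2f} mln rubles")
```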

How to implement the least squares method in Excel

Excel has a built-in function that calculates a least squares prediction. It has the following form: TREND(known Y values; known X values; new X values; const). Let's apply this formula to our table.

To do this, in the cell in which the result of the calculation using the least squares method in Excel should be displayed, enter the “=” sign and select the “TREND” function. In the window that opens, fill in the appropriate fields, highlighting:

  • the range of known values for Y (in this case, the data for turnover);
  • the range x_1, …, x_n, i.e. the sizes of the retail space;
  • the new values of x for which the size of the turnover needs to be found (for information about their location on the worksheet, see below).

In addition, the formula has a logical argument "Const". If you enter 0 (FALSE) in the field corresponding to it, the calculations are carried out assuming that b = 0; if you enter 1 (TRUE) or leave the field blank, b is calculated normally.

If you need to know the forecast for more than one x value, then after entering the formula you should not press "Enter", but instead type the combination "Ctrl" + "Shift" + "Enter" on the keyboard.
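For readers who want to check TREND against a scriptable tool, here is a small Python sketch that does the same thing for one factor. The numbers are invented for illustration, and numpy.polyfit plays the role of the least squares fit that TREND performs internally.

```python
import numpy as np

# Hypothetical known data: retail space (m^2) -> annual turnover (mln rubles)
known_x = np.array([25.0, 40.0, 60.0, 90.0, 130.0])
known_y = np.array([1.4, 2.0, 2.7, 3.9, 5.2])

# "New X values" for which a forecast is required (analog of TREND's third argument)
new_x = np.array([50.0, 75.0, 150.0])

# Least squares fit of the line y = a*x + b (what TREND does for a single factor)
a, b = np.polyfit(known_x, known_y, deg=1)

# Array of predicted turnovers, analog of entering TREND as an array formula
print(a * new_x + b)
```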

Some Features

Regression analysis can be accessible even to dummies. The Excel formula for predicting the value of an array of unknown variables - "TREND" - can be used even by those who have never heard of the least squares method. It is enough just to know some features of its work. In particular:

  • If you arrange the range of known values of the variable y in one row or column, then each row (column) with known values of x will be perceived by the program as a separate variable.
  • If the range with known x values is not specified in the TREND window, the function treats it as the array {1; 2; 3; …} with as many elements as the range with the given values of the variable y.
  • To output an array of "predicted" values, the TREND expression must be entered as an array formula.
  • If no new x values are specified, the TREND function considers them equal to the known ones.
  • The range containing the new x values must have the same number of rows or columns as the range with the known x values; in other words, it must be commensurate with the independent variables.
  • An array with known x values can contain several variables. However, if we are talking about only one, it is required that the ranges with the given values of x and y be commensurate. In the case of several variables, it is necessary that the range with the given y values fit in one column or one row.

FORECAST function

Prediction in Excel can also be performed with other functions. One of them is FORECAST. It is similar to TREND, i.e. it gives the result of a least squares calculation, but only for a single X for which the value of Y is unknown.

Now you know Excel formulas for dummies that allow you to predict the future value of an indicator according to a linear trend.

3. Approximation of functions using the least squares method

The least squares method is used when processing experimental results in order to approximate the experimental data with an analytical formula. The specific form of the formula is chosen, as a rule, from physical considerations. Such formulas can be linear (y = ax + b), quadratic (y = ax² + bx + c) and others.

The essence of the least squares method is as follows. Let the measurement results be presented in the table:

Table 4

x: x_1, x_2, …, x_n
y: y_1, y_2, …, y_n

The experimental dependence is approximated by a function of the form

y = f(x; a_0, a_1, …, a_m),        (3.1)

where f is a known function and a_0, a_1, …, a_m are unknown constant parameters whose values must be found. In the least squares method, the approximation of function (3.1) to the experimental dependence is considered best if the condition

Q = Σ_{i=1..n} [ f(x_i; a_0, a_1, …, a_m) − y_i ]² → min        (3.2)

is satisfied; that is, the sum of squared deviations of the desired analytical function from the experimental dependence must be minimal.

Note that the function Q is called the residual (discrepancy).

Since the residual satisfies Q ≥ 0, it has a minimum. A necessary condition for the minimum of a function of several variables is that all partial derivatives of this function with respect to the parameters equal zero. Thus, finding the best values of the parameters of the approximating function (3.1), that is, the values for which Q = Q(a_0, a_1, …, a_m) is minimal, reduces to solving the system of equations

∂Q/∂a_k = 0,   k = 0, 1, …, m.        (3.3)

The method of least squares can be given the following geometric interpretation: among an infinite family of lines of a given type, one line is found for which the sum of the squared differences in the ordinates of the experimental points and the corresponding ordinates of the points found by the equation of this line will be the smallest.

Finding the parameters of a linear function

Let the experimental data be represented by a linear function

y = a·x + b.

It is required to choose the values of a and b for which the function

Q(a, b) = Σ_{i=1..n} (a·x_i + b − y_i)²        (3.4)

will be minimal. The necessary conditions for the minimum of function (3.4) reduce to the system of equations

∂Q/∂a = 2·Σ_{i=1..n} (a·x_i + b − y_i)·x_i = 0,
∂Q/∂b = 2·Σ_{i=1..n} (a·x_i + b − y_i) = 0.

After transformations, we obtain a system of two linear equations with two unknowns:

a·Σ x_i² + b·Σ x_i = Σ x_i·y_i,
a·Σ x_i + b·n = Σ y_i,        (3.5)

solving which we find the desired values of the parameters a and b.
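As an illustration, here is a short Python sketch that assembles and solves system (3.5) directly; the data points are arbitrary sample values, not taken from the tables in this text.

```python
import numpy as np

def fit_linear(x, y):
    """Solve system (3.5) for the parameters of the linear model y = a*x + b."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    A = np.array([[np.sum(x**2), np.sum(x)],
                  [np.sum(x),    n        ]])
    rhs = np.array([np.sum(x * y), np.sum(y)])
    a, b = np.linalg.solve(A, rhs)
    return a, b

# Arbitrary sample data for illustration
a, b = fit_linear([0.5, 0.6, 0.7, 0.8, 0.9], [1.00, 1.21, 1.38, 1.62, 1.79])
print(a, b)
```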

Finding the parameters of a quadratic function

If the approximating function is a quadratic dependence

y = a·x² + b·x + c,

then its parameters a, b, c are found from the condition of the minimum of the function

Q(a, b, c) = Σ_{i=1..n} (a·x_i² + b·x_i + c − y_i)².        (3.6)

The conditions for the minimum of function (3.6) reduce to the system of equations

∂Q/∂a = 0,   ∂Q/∂b = 0,   ∂Q/∂c = 0.

After transformations, we obtain a system of three linear equations with three unknowns:

a·Σ x_i⁴ + b·Σ x_i³ + c·Σ x_i² = Σ x_i²·y_i,
a·Σ x_i³ + b·Σ x_i² + c·Σ x_i = Σ x_i·y_i,
a·Σ x_i² + b·Σ x_i + c·n = Σ y_i,        (3.7)

solving which we find the desired values of the parameters a, b and c.
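A similar Python sketch for system (3.7), again with made-up data points, shows how the three normal equations are assembled and solved.

```python
import numpy as np

def fit_quadratic(x, y):
    """Solve system (3.7) for the parameters of the model y = a*x**2 + b*x + c."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    A = np.array([[np.sum(x**4), np.sum(x**3), np.sum(x**2)],
                  [np.sum(x**3), np.sum(x**2), np.sum(x)   ],
                  [np.sum(x**2), np.sum(x),    n           ]])
    rhs = np.array([np.sum(x**2 * y), np.sum(x * y), np.sum(y)])
    a, b, c = np.linalg.solve(A, rhs)
    return a, b, c

# Arbitrary sample data for illustration
print(fit_quadratic([0.1, 0.2, 0.3, 0.4, 0.5], [1.05, 1.16, 1.26, 1.17, 1.07]))
```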

Example. Suppose the following table of values of x and y was obtained as a result of the experiment:

Table 5

y_i: 0.705   0.495   0.426   0.357   0.368   0.406   0.549   0.768

It is required to approximate the experimental data by linear and quadratic functions.

Solution. Finding the parameters of the approximating functions reduces to solving the systems of linear equations (3.5) and (3.7). To solve the problem, we use the spreadsheet processor Excel.

1. First we group Sheets 1 and 2. Enter the experimental values x_i and y_i into columns A and B, starting from the second row (in the first row we put the column headings). Then we calculate the sums for these columns and put them in the tenth row.

In columns C–G we place, respectively, the calculation and summation of x_i², x_i³, x_i⁴, x_i·y_i and x_i²·y_i.

2. Ungroup the sheets. Further calculations are carried out in a similar way for the linear dependence on Sheet 1 and for the quadratic dependence on Sheet 2.

3. Under the resulting table, we form a matrix of coefficients and a column vector of free terms. Let's solve the system of linear equations according to the following algorithm:

To calculate the inverse matrix and to multiply matrices, we use the Function Wizard and the MINVERSE and MMULT functions.

4. In the cell block H2:H9, based on the obtained coefficients, we calculate the values of the approximating polynomial y_i(calc); in the block I2:I9, the deviations Δy_i = y_i(exp) − y_i(calc); and in column J, the residual Q = Σ Δy_i².

The resulting tables and the graphs built using the Chart Wizard are shown in Figures 6, 7 and 8.


Fig. 6. Table for calculating the coefficients of a linear function approximating the experimental data.

Fig. 7. Table for calculating the coefficients of a quadratic function approximating the experimental data.

Fig. 8. Graphical representation of the results of approximating the experimental data with linear and quadratic functions.

Answer. The experimental data are approximated by the linear dependence y = 0.07881·x + 0.442262 with residual Q = 0.165167, and by the quadratic dependence y = 3.115476·x² − 5.2175·x + 2.529631 with residual Q = 0.002103.

Tasks. Approximate the functions given in the table below (one row per variant) by linear and quadratic functions.

Table 6

(Each row gives the values of y for the given variant № at the x values in the header row.)

№ \ x   0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8
0       3.030   3.142   3.358   3.463   3.772   3.251   3.170   3.665
1       3.314   3.278   3.262   3.292   3.332   3.397   3.487   3.563
2       1.045   1.162   1.264   1.172   1.070   0.898   0.656   0.344
3       6.715   6.735   6.750   6.741   6.645   6.639   6.647   6.612
4       2.325   2.515   2.638   2.700   2.696   2.626   2.491   2.291
5       1.752   1.762   1.777   1.797   1.821   1.850   1.884   1.944
6       1.924   1.710   1.525   1.370   1.264   1.190   1.148   1.127
7       1.025   1.144   1.336   1.419   1.479   1.530   1.568   1.248
8       5.785   5.685   5.605   5.545   5.505   5.480   5.495   5.510
9       4.052   4.092   4.152   4.234   4.338   4.468   4.599

The least squares method (LSM; English: Ordinary Least Squares, OLS) is a mathematical method used to solve various problems, based on minimizing the sum of squared deviations of certain functions from the desired variables. It can be used to "solve" overdetermined systems of equations (when the number of equations exceeds the number of unknowns), to find a solution in the case of ordinary (not overdetermined) nonlinear systems of equations, and to approximate the point values of some function. OLS is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data.


History

Until the beginning of the 19th century, scientists had no definite rules for solving a system of equations in which the number of unknowns is less than the number of equations; particular methods were used, depending on the type of the equations and on the ingenuity of the calculators, so different calculators, starting from the same observational data, came to different conclusions. Gauss (1795) is credited with the first application of the method, and Legendre (1805) independently discovered and published it under its modern name (French: méthode des moindres quarrés). Laplace connected the method with probability theory, and the American mathematician Adrain (1808) considered its probabilistic applications. The method was spread and improved by further research by Encke, Bessel, Hansen and others.

The essence of the method of least squares

Let x be a set of n unknown variables (parameters) and f_i(x), i = 1, …, m, with m > n, a set of functions of these variables. The problem is to choose values of x such that the values of these functions are as close as possible to certain values y_i. In essence, we are talking about the "solution" of the overdetermined system of equations f_i(x) = y_i, i = 1, …, m, in the indicated sense of maximum proximity of the left- and right-hand sides of the system. The essence of LSM is to choose as the "measure of proximity" the sum of the squared deviations of the left- and right-hand sides |f_i(x) − y_i|. Thus, the essence of LSM can be expressed as follows:

Σ_i e_i² = Σ_i (y_i − f_i(x))² → min over x.

If the system of equations has a solution, then the minimum of the sum of squares equals zero and exact solutions of the system can be found analytically or, for example, by various numerical optimization methods. If the system is overdetermined, that is, loosely speaking, the number of independent equations is greater than the number of unknown variables, then the system has no exact solution and the least squares method allows us to find some "optimal" vector x in the sense of maximum proximity of the vectors y and f(x), or maximum proximity of the deviation vector e to zero (proximity is understood in the sense of Euclidean distance).

Example - system of linear equations

In particular, the least squares method can be used to "solve" the system of linear equations

A·x = b,

where A is a rectangular matrix of size m × n with m > n (i.e. the number of rows of the matrix A is greater than the number of unknown variables).

Such a system of equations generally has no solution. Therefore, this system can be "solved" only in the sense of choosing a vector x that minimizes the "distance" between the vectors A·x and b. To do this, one can apply the criterion of minimizing the sum of squared differences of the left- and right-hand sides of the equations of the system, that is, (A·x − b)^T (A·x − b) → min over x. It is easy to show that the solution of this minimization problem leads to the solution of the following system of equations:

A^T A x = A^T b   ⇒   x = (A^T A)^{-1} A^T b.
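A minimal numerical check of this formula, with an invented overdetermined system, can be done in a few lines of Python; both the normal equations and the library least-squares routine give the same solution.

```python
import numpy as np

# Overdetermined system A x = b: four equations, two unknowns (illustrative numbers)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([6.0, 5.0, 7.0, 10.0])

# Normal equations: (A^T A) x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library least squares routine, same criterion ||Ax - b||^2 -> min
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)
print(x_lstsq)
```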

OLS in regression analysis (data approximation)

Let there be n values of some variable y (these may be the results of observations, experiments, etc.) and the corresponding values of variables x. The problem is to approximate the relationship between y and x by some function known up to some unknown parameters b, that is, in fact, to find the best values of the parameters b that bring the values f(x, b) as close as possible to the actual values y. In fact, this reduces to the case of "solving" an overdetermined system of equations with respect to b:

f(x_t, b) = y_t,   t = 1, …, n.

In regression analysis, and in particular in econometrics, probabilistic models of the relationship between variables are used.

y_t = f(x_t, b) + ε_t,

where ε_t are the so-called random errors of the model.

Accordingly, the deviations of the observed values y from the model values f(x, b) are already assumed in the model itself. The essence of LSM (ordinary, classical) is to find such parameters b for which the sum of squared deviations (errors; for regression models they are often called regression residuals) e_t is minimal:

b̂_OLS = argmin_b RSS(b),

where RSS (Residual Sum of Squares) is defined as:

RSS(b) = e^T e = Σ_{t=1..n} e_t² = Σ_{t=1..n} (y_t − f(x_t, b))².

In the general case, this problem can be solved by numerical optimization (minimization) methods. In this case, one speaks of nonlinear least squares (NLS or NLLS, Non-Linear Least Squares). In many cases an analytical solution can be obtained. To solve the minimization problem, it is necessary to find the stationary points of the function RSS(b) by differentiating it with respect to the unknown parameters b, equating the derivatives to zero and solving the resulting system of equations:

Σ_{t=1..n} (y_t − f(x_t, b)) · ∂f(x_t, b)/∂b = 0.

If the random errors of the model are normally distributed, have the same variance, and are uncorrelated with each other, the least squares parameter estimates coincide with the maximum likelihood (ML) estimates.

LSM in the case of linear regression

Let the regression dependence be linear:

y_t = Σ_{j=1..k} b_j x_{tj} + ε_t = x_t^T b + ε_t.

Let y be the column vector of observations of the explained variable, and X the (n × k) matrix of observations of the factors (the rows of the matrix are the vectors of factor values in a given observation, the columns are the vectors of values of a given factor in all observations). The matrix representation of the linear model has the form:

y = X·b + ε.

Then the vector of estimates of the explained variable and the vector of regression residuals will be equal to

ŷ = X·b,   e = y − ŷ = y − X·b,

accordingly, the sum of the squares of the regression residuals will be equal to

RSS = e^T e = (y − X·b)^T (y − X·b).

Differentiating this function with respect to the parameter vector b and equating the derivatives to zero, we obtain a system of equations (in matrix form):

(X^T X) b = X^T y.

In the deciphered matrix form, this system of equations looks like this:

$$
\begin{pmatrix}
\sum x_{t1}^2 & \sum x_{t1}x_{t2} & \sum x_{t1}x_{t3} & \ldots & \sum x_{t1}x_{tk}\\
\sum x_{t2}x_{t1} & \sum x_{t2}^2 & \sum x_{t2}x_{t3} & \ldots & \sum x_{t2}x_{tk}\\
\sum x_{t3}x_{t1} & \sum x_{t3}x_{t2} & \sum x_{t3}^2 & \ldots & \sum x_{t3}x_{tk}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\sum x_{tk}x_{t1} & \sum x_{tk}x_{t2} & \sum x_{tk}x_{t3} & \ldots & \sum x_{tk}^2
\end{pmatrix}
\begin{pmatrix} b_1\\ b_2\\ b_3\\ \vdots\\ b_k \end{pmatrix}
=
\begin{pmatrix} \sum x_{t1}y_t\\ \sum x_{t2}y_t\\ \sum x_{t3}y_t\\ \vdots\\ \sum x_{tk}y_t \end{pmatrix},
$$

where all sums are taken over all admissible values of t.

If a constant is included in the model (as usual), then x_{t1} = 1 for all t; therefore, the upper left element of the matrix of the system of equations is the number of observations n, the remaining elements of the first row and first column are simply the sums of the values of the variables, Σ x_{tj}, and the first element of the right-hand side of the system is Σ y_t.

The solution of this system of equations gives the general formula for the least squares estimates for the linear model:

b̂_OLS = (X^T X)^{-1} X^T y = ((1/n)·X^T X)^{-1} · (1/n)·X^T y = V_x^{-1} C_xy.

For analytical purposes, the last representation of this formula turns out to be useful (in the system of equations, when divided by n, arithmetic means appear instead of sums). If the data in the regression model are centered, then in this representation the first matrix has the meaning of the sample covariance matrix of the factors, and the second is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix has the meaning of the sample correlation matrix of the factors, and the second vector is the vector of sample correlations of the factors with the dependent variable.

An important property of OLS estimates for models with a constant is that the line of the constructed regression passes through the center of gravity of the sample data, that is, the following equality holds:

ȳ = b̂_1 + Σ_{j=2..k} b̂_j x̄_j.

In particular, in the extreme case when the only regressor is a constant, we find that the OLS estimate of the single parameter (the constant itself) is equal to the mean value of the variable being explained. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also a least squares estimate: it satisfies the criterion of the minimum sum of squared deviations from it.
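The general formula and the "center of gravity" property are easy to verify numerically; the sketch below uses synthetic data generated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a constant column plus two factors
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.normal(size=n)

# General OLS formula: b_hat = (X^T X)^{-1} X^T y
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(b_hat)

# For a model with a constant, the regression passes through the sample means:
# mean(y) == b_1 + b_2*mean(x_2) + b_3*mean(x_3)
print(np.isclose(y.mean(), X.mean(axis=0) @ b_hat))
```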

The simplest special cases

In the case of pairwise linear regression y_t = a + b·x_t + ε_t, when the linear dependence of one variable on another is estimated, the calculation formulas simplify (one can do without matrix algebra). The system of equations has the form:

$$\begin{pmatrix} 1 & \bar{x}\\ \bar{x} & \overline{x^2} \end{pmatrix}\begin{pmatrix} a\\ b \end{pmatrix}=\begin{pmatrix} \bar{y}\\ \overline{xy} \end{pmatrix}.$$

From here it is easy to find estimates for the coefficients:

$$\hat{b} = \frac{\operatorname{Cov}(x,y)}{\operatorname{Var}(x)} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}.$$

Despite the fact that, in general, models with a constant are preferable, in some cases it is known from theoretical considerations that the constant a should be equal to zero. For example, in physics the relationship between voltage and current has the form U = I·R; measuring voltage and current, it is necessary to estimate the resistance. In this case we are talking about the model y = b·x. In this case, instead of a system of equations, we have a single equation

(Σ x_t²)·b = Σ x_t y_t.

Therefore, the formula for estimating a single coefficient has the form

b̂ = (Σ_{t=1..n} x_t y_t) / (Σ_{t=1..n} x_t²), i.e. the mean of x·y divided by the mean of x².

The case of a polynomial model

If the data are fitted by a polynomial regression function of one variable, f(x) = b_0 + Σ_{i=1..k} b_i x^i, then, treating the powers x^i as separate factors for each i, one can estimate the parameters of the model based on the general formula for estimating the parameters of the linear model. To do this, it suffices to take into account that with such an interpretation x_{ti} x_{tj} = x_t^i x_t^j = x_t^{i+j} and x_{tj} y_t = x_t^j y_t. Therefore, the matrix equations in this case take the form:

$$
\begin{pmatrix}
n & \sum_n x_t & \ldots & \sum_n x_t^k\\
\sum_n x_t & \sum_n x_t^2 & \ldots & \sum_n x_t^{k+1}\\
\vdots & \vdots & \ddots & \vdots\\
\sum_n x_t^k & \sum_n x_t^{k+1} & \ldots & \sum_n x_t^{2k}
\end{pmatrix}
\begin{pmatrix} b_0\\ b_1\\ \vdots\\ b_k \end{pmatrix}
=
\begin{pmatrix} \sum_n y_t\\ \sum_n x_t y_t\\ \vdots\\ \sum_n x_t^k y_t \end{pmatrix}.
$$
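A sketch of this reduction in Python, treating the powers of x as separate factors; the data are arbitrary illustration values.

```python
import numpy as np

def fit_polynomial(x, y, k):
    """Fit f(x) = b0 + b1*x + ... + bk*x**k by OLS, treating the powers of x as factors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    X = np.vander(x, k + 1, increasing=True)   # columns: 1, x, x^2, ..., x^k
    return np.linalg.solve(X.T @ X, X.T @ y)   # solves (X^T X) b = X^T y

# Arbitrary illustration data
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.1, 1.8, 3.1, 5.2, 7.9])
print(fit_polynomial(x, y, k=2))
```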

Statistical Properties of OLS Estimates

First of all, we note that for linear models, the least squares estimates are linear estimates, as follows from the above formula. For the unbiasedness of the least squares estimates, it is necessary and sufficient to fulfill the most important condition of regression analysis: the mathematical expectation of a random error conditional on the factors must be equal to zero. This condition is satisfied, in particular, if

  1. the mathematical expectation of random errors is zero, and
  2. factors and random errors are independent random variables.

The second condition, the condition of exogeneity of the factors, is fundamental. If this property is not satisfied, then we can assume that almost any estimates will be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not allow obtaining qualitative estimates in this case). In the classical case, a stronger assumption is made about the determinism of the factors, in contrast to the random error, which automatically means that the exogeneity condition is satisfied. In the general case, for the consistency of the estimates it is sufficient that the exogeneity condition be satisfied together with the convergence of the matrix V_x to some nonsingular matrix as the sample size increases to infinity.

In order for, in addition to consistency and unbiasedness, the (ordinary) least squares estimates to be also effective (the best in the class of linear unbiased estimates), additional properties of a random error must be satisfied:

  • homoscedasticity: the variance of the random error is the same in all observations;
  • absence of autocorrelation: the random errors of different observations are uncorrelated.

These assumptions can be formulated as a single condition on the covariance matrix of the vector of random errors: V(ε) = σ²·I.

A linear model that satisfies these conditions is called classical. OLS estimates for classical linear regression are unbiased, consistent and the most efficient estimates in the class of all linear unbiased estimates (in the English literature the abbreviation BLUE, Best Linear Unbiased Estimator, is sometimes used; in the Russian-language literature the Gauss–Markov theorem is more often cited). As it is easy to show, the covariance matrix of the vector of coefficient estimates will be equal to:

V(b̂_OLS) = σ²·(X^T X)^{-1}.

Efficiency means that this covariance matrix is ​​"minimal" (any linear combination of coefficients, and in particular the coefficients themselves, have a minimum variance), that is, in the class of linear unbiased estimates, the OLS estimates are the best. The diagonal elements of this matrix - the variances of the estimates of the coefficients - are important parameters of the quality of the obtained estimates. However, it is not possible to calculate the covariance matrix because the random error variance is unknown. It can be proved that the unbiased and consistent (for the classical linear model) estimate of the variance of random errors is the value:

s² = RSS / (n − k).

Substituting this value into the formula for the covariance matrix, we obtain an estimate of the covariance matrix. The resulting estimates are also unbiased and consistent. It is also important that the estimate of the error variance (and hence the variances of the coefficients) and the estimates of the model parameters are independent random variables, which makes it possible to obtain test statistics for testing hypotheses about the model coefficients.
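As a small illustration of these formulas, the following Python sketch computes the OLS estimates, the unbiased error-variance estimate s² = RSS/(n − k) and the estimated covariance matrix of the coefficients, using synthetic data invented for the example.

```python
import numpy as np

def ols_with_covariance(X, y):
    """Return OLS estimates and the estimated covariance matrix s^2 * (X^T X)^{-1}."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b_hat = XtX_inv @ X.T @ y
    e = y - X @ b_hat                      # regression residuals
    s2 = (e @ e) / (n - k)                 # unbiased estimate of the error variance
    return b_hat, s2 * XtX_inv

# Synthetic data for illustration
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = X @ np.array([0.5, 1.5]) + rng.normal(scale=0.2, size=30)

b_hat, cov = ols_with_covariance(X, y)
print(b_hat)
print(np.sqrt(np.diag(cov)))               # standard errors of the coefficients
```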

It should be noted that if the classical assumptions are not met, the least squares parameter estimates are not the most efficient. The least squares method allows a broad generalization: instead of minimizing the sum of squared residuals, one can minimize some positive definite quadratic form of the residual vector, e^T W e, where W is some symmetric positive definite weight matrix. Ordinary least squares is a special case of this approach, when the weight matrix is proportional to the identity matrix. As is known, for symmetric matrices (or operators) there is a decomposition W = P^T P. Therefore, this functional can be represented as e^T P^T P e = (P e)^T (P e) = e*^T e*, that is, it can be represented as the sum of squares of certain transformed "residuals" e* = P e. Thus, we can distinguish a whole class of least squares methods: LS-methods (Least Squares).

It is proved (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix of random errors), the most efficient (in the class of linear unbiased estimates) are the estimates of so-called generalized least squares (GLS, Generalized Least Squares): the LS-method with a weight matrix equal to the inverse of the covariance matrix of the random errors, W = V_ε^{-1}.

It can be shown that the formula for the GLS-estimates of the parameters of the linear model has the form

b̂_GLS = (X^T V^{-1} X)^{-1} X^T V^{-1} y.

The covariance matrix of these estimates, respectively, will be equal to

V(b̂_GLS) = (X^T V^{-1} X)^{-1}.

In fact, the essence of GLS lies in a certain (linear) transformation P of the original data and the application of ordinary least squares to the transformed data. The purpose of this transformation is that for the transformed data the random errors already satisfy the classical assumptions.

Weighted least squares

In the case of a diagonal weight matrix (and hence of the covariance matrix of random errors), we have so-called weighted least squares (WLS, Weighted Least Squares). In this case, the weighted sum of squares of the residuals of the model is minimized, that is, each observation receives a "weight" inversely proportional to the variance of the random error in this observation: e^T W e = Σ_{t=1..n} e_t² / σ_t². In fact, the data are transformed by weighting the observations (dividing by an amount proportional to the assumed standard deviation of the random errors), and ordinary least squares is applied to the weighted data.
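A brief Python sketch of weighted least squares under these assumptions; the per-observation standard deviations and the data are invented for illustration.

```python
import numpy as np

def weighted_least_squares(X, y, sigma):
    """Divide each observation by its assumed error standard deviation,
    then apply ordinary least squares to the transformed data."""
    w = 1.0 / np.asarray(sigma, float)
    Xw = X * w[:, None]
    yw = y * w
    return np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)

# Illustrative data whose error variance grows with x
rng = np.random.default_rng(2)
x = np.linspace(1.0, 10.0, 40)
sigma = 0.1 * x                                  # assumed per-observation std. dev.
y = 2.0 + 0.7 * x + rng.normal(scale=sigma)
X = np.column_stack([np.ones_like(x), x])

print(weighted_least_squares(X, y, sigma))
```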

  • After alignment, we obtain a function of the following form: g(x) = ∛(x + 1) + 1.

    We can approximate these data with a linear relationship y = a·x + b by calculating the appropriate parameters. To do this, we will need to apply the so-called least squares method. We will also need to make a plot to check which line best fits the experimental data.

    What exactly is OLS (least squares method)

    The main thing we need to do is to find the coefficients of the linear dependence for which the value of the function of two variables F(a, b) = Σ_{i=1..n} (y_i − (a·x_i + b))² is the smallest. In other words, for certain values of a and b, the sum of the squared deviations of the presented data from the resulting straight line will have a minimum value. This is the meaning of the least squares method. All we have to do to solve the example is to find the extremum of this function of two variables.

    How to derive formulas for calculating coefficients

    In order to derive formulas for calculating the coefficients, it is necessary to compose and solve a system of equations with two variables. To do this, we calculate the partial derivatives of the expression F(a, b) = Σ_{i=1..n} (y_i − (a·x_i + b))² with respect to a and b and equate them to 0.

    ∂F(a, b)/∂a = 0, ∂F(a, b)/∂b = 0
    ⇔ −2·Σ_{i=1..n} (y_i − (a·x_i + b))·x_i = 0, −2·Σ_{i=1..n} (y_i − (a·x_i + b)) = 0
    ⇔ a·Σ_{i=1..n} x_i² + b·Σ_{i=1..n} x_i = Σ_{i=1..n} x_i·y_i, a·Σ_{i=1..n} x_i + n·b = Σ_{i=1..n} y_i

    To solve this system of equations, you can use any method, for example substitution or Cramer's method. As a result, we obtain formulas for calculating the coefficients by the least squares method:

    a = ( n·Σ_{i=1..n} x_i·y_i − Σ_{i=1..n} x_i · Σ_{i=1..n} y_i ) / ( n·Σ_{i=1..n} x_i² − (Σ_{i=1..n} x_i)² )
    b = ( Σ_{i=1..n} y_i − a·Σ_{i=1..n} x_i ) / n

    We have calculated the values of the variables for which the function
    F(a, b) = Σ_{i=1..n} (y_i − (a·x_i + b))² takes the minimum value. In the third section, we will prove why this is so.

    This is how the least squares method is applied in practice. The formula used to find the parameter a includes Σ_{i=1..n} x_i, Σ_{i=1..n} y_i, Σ_{i=1..n} x_i·y_i, Σ_{i=1..n} x_i² and the parameter
    n, which denotes the number of experimental data points. We advise you to calculate each sum separately. The value of the coefficient b is calculated immediately after a.

    Let's go back to the original example.

    Example 1

    Here we have n equal to five. To make it more convenient to calculate the required sums included in the coefficient formulas, we fill out the table.

    i        1      2      3      4      5      Σ
    x_i      0      1      2      4      5      12
    y_i      2.1    2.4    2.6    2.8    3      12.9
    x_i·y_i  0      2.4    5.2    11.2   15     33.8
    x_i²     0      1      4      16     25     46

    Solution

    The fourth row contains the products of the values from the second row and the third row for each i. The fifth row contains the squares of the values from the second row. The last column shows the sums of the values in each row.

    Let's use the least squares method to calculate the coefficients a and b we need. To do this, substitute the desired values ​​from the last column and calculate the sums:

    a = ( n·Σx_i·y_i − Σx_i·Σy_i ) / ( n·Σx_i² − (Σx_i)² ),  b = ( Σy_i − a·Σx_i ) / n
    ⇒ a = (5·33.8 − 12·12.9) / (5·46 − 12²) ≈ 0.165,  b = (12.9 − 0.165·12) / 5 ≈ 2.184

    We find that the desired approximating straight line is y = 0.165·x + 2.184. Now we need to determine which line approximates the data better: g(x) = ∛(x + 1) + 1 or y = 0.165·x + 2.184. Let's make an assessment using the least squares method.

    To calculate the error, we need to find the sums of squared deviations of the data from each line, σ_1 = Σ_{i=1..n} (y_i − (a·x_i + b))² and σ_2 = Σ_{i=1..n} (y_i − g(x_i))²; the smaller value corresponds to the more suitable line.

    σ_1 = Σ_{i=1..5} (y_i − (0.165·x_i + 2.184))² ≈ 0.019
    σ_2 = Σ_{i=1..5} (y_i − (∛(x_i + 1) + 1))² ≈ 0.096

    Answer: since σ_1 < σ_2, the straight line that best approximates the original data is
    y = 0.165·x + 2.184.

    The least squares method is clearly shown in the graphic illustration. The red line marks the curve g(x) = ∛(x + 1) + 1, the blue line marks the straight line y = 0.165·x + 2.184, and the raw data are marked with pink dots.
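    The arithmetic of this example is easy to verify with a short Python sketch; it uses the data from the table above and the cube-root form of g(x) assumed in this text, and reproduces a ≈ 0.165, b ≈ 2.184, σ_1 ≈ 0.019 and σ_2 ≈ 0.096.

```python
import numpy as np

# Data from the worked example
x = np.array([0.0, 1.0, 2.0, 4.0, 5.0])
y = np.array([2.1, 2.4, 2.6, 2.8, 3.0])
n = len(x)

# Coefficient formulas derived above
a = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
b = (np.sum(y) - a * np.sum(x)) / n
print(a, b)                                        # ~0.165 and ~2.184

# Sums of squared deviations for the two candidate curves
sigma1 = np.sum((y - (a * x + b))**2)              # fitted straight line
sigma2 = np.sum((y - ((x + 1)**(1/3) + 1))**2)     # g(x) = cbrt(x + 1) + 1
print(sigma1, sigma2)                              # ~0.019 and ~0.096
```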

    Let us explain why exactly approximations of this type are needed.

    They can be used in problems that require data smoothing, as well as in those where the data needs to be interpolated or extrapolated. For example, in the problem discussed above, one could find the value of the observed quantity y at x = 3 or at x = 6 . We have devoted a separate article to such examples.

    Proof of the LSM method

    For the function to take the minimum value at the calculated a and b, it is necessary that at this point the matrix of the quadratic form of the second-order differential of the function F(a, b) = Σ_{i=1..n} (y_i − (a·x_i + b))² be positive definite. Let's show how it should look.

    Example 2

    We have a second-order differential of the following form:

    d²F(a; b) = ∂²F(a; b)/∂a² · da² + 2·∂²F(a; b)/∂a∂b · da·db + ∂²F(a; b)/∂b² · db²

    Solution

    ∂²F(a; b)/∂a² = ∂(∂F(a; b)/∂a)/∂a = ∂( −2·Σ_{i=1..n} (y_i − (a·x_i + b))·x_i )/∂a = 2·Σ_{i=1..n} x_i²
    ∂²F(a; b)/∂a∂b = ∂(∂F(a; b)/∂a)/∂b = ∂( −2·Σ_{i=1..n} (y_i − (a·x_i + b))·x_i )/∂b = 2·Σ_{i=1..n} x_i
    ∂²F(a; b)/∂b² = ∂(∂F(a; b)/∂b)/∂b = ∂( −2·Σ_{i=1..n} (y_i − (a·x_i + b)) )/∂b = 2·Σ_{i=1..n} 1 = 2n

    In other words, it can be written as follows: d²F(a; b) = 2·Σ_{i=1..n} x_i² · da² + 2·( 2·Σ_{i=1..n} x_i )·da·db + 2n·db².

    We have obtained the matrix of the quadratic form:

    M = ( 2·Σ_{i=1..n} x_i²    2·Σ_{i=1..n} x_i )
        ( 2·Σ_{i=1..n} x_i     2n               )

    In this case, the values ​​of individual elements will not change depending on a and b . Is this matrix positive definite? To answer this question, let's check if its angular minors are positive.

    Calculate the first-order leading (angular) minor: 2·Σ_{i=1..n} x_i² > 0. Since the points x_i do not all coincide, the inequality is strict. We will keep this in mind in further calculations.

    We calculate the second-order angular minor:

    det(M) = 4·( n·Σ_{i=1..n} x_i² − ( Σ_{i=1..n} x_i )² )

    After that, we proceed to the proof of the inequality n·Σ_{i=1..n} x_i² − ( Σ_{i=1..n} x_i )² > 0 using mathematical induction.

    1. Let's check whether this inequality holds for the smallest admissible n. Take n = 2 and calculate:

    2·Σ_{i=1..2} x_i² − ( Σ_{i=1..2} x_i )² = 2·(x_1² + x_2²) − (x_1 + x_2)² = x_1² − 2·x_1·x_2 + x_2² = (x_1 − x_2)² > 0

    We have obtained a correct inequality (provided the values x_1 and x_2 do not coincide).

    2. Let's make the assumption that the inequality is true for n, i.e. n·Σ_{i=1..n} x_i² − ( Σ_{i=1..n} x_i )² > 0.

    3. Now let's prove its validity for n + 1, i.e. that (n + 1)·Σ_{i=1..n+1} x_i² − ( Σ_{i=1..n+1} x_i )² > 0 whenever n·Σ_{i=1..n} x_i² − ( Σ_{i=1..n} x_i )² > 0.

    We calculate:

    (n + 1)·Σ_{i=1..n+1} x_i² − ( Σ_{i=1..n+1} x_i )²
    = n·Σ_{i=1..n} x_i² − ( Σ_{i=1..n} x_i )² + [ (x_{n+1} − x_1)² + (x_{n+1} − x_2)² + … + (x_{n+1} − x_n)² ] > 0

    The expression in square brackets is a sum of squares and therefore non-negative, and the remaining part is greater than 0 by the assumption made in step 2. We have proved the inequality.

    Answer: the found a and b correspond to the smallest value of the function F(a, b) = Σ_{i=1..n} (y_i − (a·x_i + b))², which means that they are the desired parameters of the least squares method (LSM).



    Some special cases of application of LSM in practice

    Linear Approximation

    Consider the case when, as a result of studying the dependence of a certain scalar quantity y on a certain scalar quantity x (this can be, for example, the dependence of voltage on current strength: U = I·R, where R is a constant, the resistance of the conductor), these quantities were measured, as a result of which the values x_1, …, x_n and their corresponding values y_1, …, y_n were obtained. The measurement data are recorded in a table.

    Table. Measurement results (measurement No. 1–6 with the corresponding measured values x_i and y_i).

    The question is: what value of the coefficient b in the dependence y = b·x can be chosen so as to describe this dependence best? According to least squares, this value should be such that the sum of the squared deviations of the values y_i from the values b·x_i,

    Q(b) = Σ_{i=1..n} (y_i − b·x_i)²,

    is minimal.

    The sum of squared deviations has a single extremum, a minimum. Setting the derivative dQ/db to zero and transforming the resulting equation, we get:

    −2·Σ_{i=1..n} (y_i − b·x_i)·x_i = 0   ⇒   b = Σ_{i=1..n} x_i·y_i / Σ_{i=1..n} x_i².

    The last formula allows us to find the value of the coefficient b, which was required in the problem.
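    A minimal Python sketch of this estimate, with invented current and voltage measurements (the table values are not reproduced in this text):

```python
import numpy as np

# Hypothetical measurements: current x_i (A) and voltage y_i (V)
current = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
voltage = np.array([1.1, 2.0, 3.2, 4.1, 5.2, 6.1])

# For the model y = b*x the least squares estimate is b = sum(x*y) / sum(x^2)
b = np.sum(current * voltage) / np.sum(current**2)
print(f"estimated resistance R = {b:.3f} Ohm")
```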

