# METHOD OF LEAST SQUARES || CURVE-FITTING

*What is an empirical law?*

In various branches of Applied Mathematics, it is often required to express data obtained from observations in the form of a law connecting the two variables involved. Such a law, inferred by some scheme, is known as an *empirical law*.

**Example:** Suppose we need to obtain a law connecting the length and the temperature of a metal bar. The length of the bar is measured at various temperatures. Then, by one of several methods (curve fitting, scatter diagram, etc.), a law is obtained that represents the relationship between temperature and length for the observed values. This relationship can then be used to predict the length at an arbitrary temperature.

*What is curve fitting?*

To find a relationship between a set of paired observations (say) x and y, we plot their corresponding values on a graph, taking one of the variables along the x-axis and the other along the y-axis. We then look for the equation of the curve of ‘best fit’, which may be the most suitable for predicting unknown values. The process of finding such an equation of ‘best fit’ is known as **curve-fitting**.

*Principle of Least Squares.*

The Principle of Least Squares was suggested by the *French mathematician Adrien-Marie Legendre in 1805.* It states that ‘the curve of best fit is that for which the e’s (errors) are as small as possible, i.e., the sum of the squares of the errors is a minimum’.

The principle of least squares provides an elegant procedure for fitting a unique curve to given data.
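To make the principle concrete, here is a minimal Python sketch. The observations and the two candidate lines are made up purely for illustration (they are not taken from the text), and `sum_squared_errors` is a hypothetical helper name. It computes the sum of the squares of the errors for each candidate; by the principle, the candidate with the smaller sum is the better fit.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of squared errors E for the candidate line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical observations, made up for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 8.0]

# Two hypothetical candidate lines.
E_first = sum_squared_errors(xs, ys, a=1.0, b=2.0)   # y = 1 + 2x
E_second = sum_squared_errors(xs, ys, a=0.0, b=3.0)  # y = 3x

print("E for y = 1 + 2x:", E_first)
print("E for y = 3x:", E_second)
print("Preferred candidate:", "y = 1 + 2x" if E_first < E_second else "y = 3x")
```

Comparing candidates one by one like this does not find the best curve by itself; it only shows the quantity that the method of least squares minimizes.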

Let the curve

$$y = a + bx + cx^2 + \dots + kx^m \qquad (1)$$

be fitted to the set of data points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. We now have to determine the constants $a, b, c, \dots, k$ such that (1) represents the curve of best fit. Curve (1) contains $m + 1$ constants. In case $n = m + 1$, on substituting the values $(x_i, y_i)$ in (1), we get $n$ equations from which a unique set of constants can be found (provided the $x_i$ are distinct). But when $n > m + 1$, we obtain $n$ equations, which are more than the $m + 1$ constants, and hence in general cannot be solved exactly for these constants. So we try to determine the values of $a, b, c, \dots, k$ which satisfy all the equations as nearly as possible and thus may give the best fit. In such cases, we apply the principle of least squares.

At $x = x_i$, the observed (experimental) value of the ordinate is $y_i$, and the corresponding value on the fitting curve (1) is $a + bx_i + cx_i^2 + \dots + kx_i^m$ ($= \eta_i$, say), which is the expected (or calculated) value (see figure). The difference of the observed and the expected values, i.e., $y_i - \eta_i$ ($= e_i$), is called the error at $x = x_i$. Clearly some of the errors $e_1, e_2, \dots, e_n$ will be positive and others negative. Thus, to give all the errors the same sign (so that they cannot cancel in the sum), we square each of them and form their sum, i.e.

$$E = e_1^2 + e_2^2 + \dots + e_n^2.$$

So when $E$ is a minimum, the curve is the curve of ‘best fit’.

*Method of Least Squares.*

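Now let us see how to use the method of least squares. Suppose it is required to fit the curve

$$y = a + bx + cx^2$$

to a given set of observations $(x_1, y_1), (x_2, y_2), \dots, (x_5, y_5)$.

For any $x_i$, the observed value is $y_i$ and the expected value is $\eta_i = a + bx_i + cx_i^2$, so that the error is $e_i = y_i - \eta_i$. Therefore, the sum of the squares of these errors is

$$E = e_1^2 + e_2^2 + \dots + e_5^2 = [y_1 - (a + bx_1 + cx_1^2)]^2 + [y_2 - (a + bx_2 + cx_2^2)]^2 + \dots + [y_5 - (a + bx_5 + cx_5^2)]^2.$$

For $E$ to be a minimum, its partial derivatives with respect to $a$, $b$ and $c$ must vanish. Numbering these conditions (1) to (3) for this example, we have

$$\frac{\partial E}{\partial a} = 0 = -2 \sum_{i=1}^{5} \left[ y_i - (a + bx_i + cx_i^2) \right] \qquad (1)$$

$$\frac{\partial E}{\partial b} = 0 = -2 \sum_{i=1}^{5} x_i \left[ y_i - (a + bx_i + cx_i^2) \right] \qquad (2)$$

$$\frac{\partial E}{\partial c} = 0 = -2 \sum_{i=1}^{5} x_i^2 \left[ y_i - (a + bx_i + cx_i^2) \right] \qquad (3)$$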

Equation (1), the condition $\partial E / \partial a = 0$, simplifies to

$$y_1 + y_2 + \dots + y_5 = 5a + b(x_1 + x_2 + \dots + x_5) + c(x_1^2 + x_2^2 + \dots + x_5^2),$$

i.e.

$$\sum y_i = 5a + b \sum x_i + c \sum x_i^2 \qquad (4)$$

Similarly, (2) and (3), the conditions $\partial E / \partial b = 0$ and $\partial E / \partial c = 0$, become

$$\sum x_i y_i = a \sum x_i + b \sum x_i^2 + c \sum x_i^3 \qquad (5)$$

$$\sum x_i^2 y_i = a \sum x_i^2 + b \sum x_i^3 + c \sum x_i^4 \qquad (6)$$

The equations (4), (5) and (6) are known as **Normal equations** and can be solved as simultaneous equations in $a, b, c$. The values of these constants, when substituted in the curve $y = a + bx + cx^2$, give the desired curve of best fit.
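The procedure above can be sketched in plain Python. This is only an illustrative sketch under stated assumptions: the function names `fit_quadratic` and `solve_linear_system` are hypothetical, and the five observations are generated from the known quadratic y = 1 + 2x + 3x², so that the recovered constants can be checked against a known answer. The code forms the sums appearing in the normal equations (4), (5) and (6) and solves them as three simultaneous equations by Gauss-Jordan elimination.

```python
def solve_linear_system(aug):
    """Solve a square linear system given as an augmented matrix (Gauss-Jordan)."""
    size = len(aug)
    m = [[float(v) for v in row] for row in aug]
    for col in range(size):
        # Partial pivoting: move the largest remaining entry of this column to the diagonal.
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("the normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(size):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]


def fit_quadratic(xs, ys):
    """Fit y = a + b*x + c*x**2 by solving the three normal equations."""
    n = len(xs)
    # Sums that appear as coefficients in the normal equations.
    sx = sum(xs)
    sx2 = sum(x ** 2 for x in xs)
    sx3 = sum(x ** 3 for x in xs)
    sx4 = sum(x ** 4 for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x ** 2 * y for x, y in zip(xs, ys))
    # Augmented matrix of equations (4), (5), (6) in the unknowns a, b, c.
    rows = [
        [n, sx, sx2, sy],
        [sx, sx2, sx3, sxy],
        [sx2, sx3, sx4, sx2y],
    ]
    return solve_linear_system(rows)


# Hypothetical observations, generated from y = 1 + 2x + 3x^2 for checking.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 6.0, 17.0, 34.0, 57.0]

a, b, c = fit_quadratic(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}")
```

Because the sample points lie exactly on a quadratic, the fitted constants should agree with 1, 2 and 3 up to rounding. With real, noisy observations the same code returns the constants that minimize E rather than an exact match.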