Comparative analysis of econometric regression models. Specification of paired regression models


One of the basic requirements for constructing a good model is correct specification of the regression equation. Correct specification means that the equation on the whole correctly reflects the relationship between the variable under study and the explanatory factors included in the model. This is a necessary prerequisite for any further assessment of the quality of the regression model.

An incorrect choice of functional form or of the set of explanatory variables is called a specification error. The main types of specification error are:

  • 1. Discarding a significant variable. The essence of this error and its consequences are clearly illustrated by the following example. Let the theoretical model reflecting the economic dependence under consideration have the form

Y = β0 + β1X1 + β2X2 + ε. (9.28)

This model corresponds to the empirical regression equation

ŷ = b0 + b1x1 + b2x2.

For some reason (lack of information, superficial knowledge of the subject of research, etc.), the researcher believes that Y is really affected only by the variable X1, and limits himself to the model

Y = β0 + β1X1 + ε.

At the same time, he does not consider X2 as an explanatory variable, thereby making the error of discarding a significant variable.

Let the empirical regression equation corresponding to this truncated model have the form

ŷ* = b0* + b1*x1. (9.29)

The consequences of this error are quite serious. The least-squares (OLS) estimates obtained from equation (9.29) are biased (M[b0*] ≠ β0, M[b1*] ≠ β1) and inconsistent: the bias does not disappear even as the number of observations grows without bound. Consequently, any interval estimates and the results of testing the corresponding hypotheses will be unreliable.
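The bias from omitting a relevant variable can be checked numerically. Below is a minimal sketch (the coefficients, sample size and correlation level are illustrative, not taken from the text): the X1 coefficient in the short model absorbs part of the effect of the omitted, correlated X2 and is biased.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# True model: Y = 2 + 3*X1 + 4*X2 + eps, with X1 and X2 positively correlated.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # corr(X1, X2) > 0
y = 2 + 3 * x1 + 4 * x2 + rng.normal(size=n)

# Correct OLS fit with both regressors.
X_full = np.column_stack([np.ones(n), x1, x2])
b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]

# Misspecified fit that discards X2: the X1 coefficient absorbs
# beta2 * cov(X1, X2)/var(X1) and is biased upward (about 3 + 4*0.8 = 6.2).
X_short = np.column_stack([np.ones(n), x1])
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

print(b_full[1])   # close to the true value 3
print(b_short[1])  # biased: close to 6.2, not 3
```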

2. Adding an insignificant variable. Suppose the researcher instead includes an extra explanatory variable X2 that has no real effect on Y (model (9.30)). The consequences of this error are not as serious as in the previous case. The coefficient estimates found for model (9.30) remain, as a rule, unbiased (M[b0*] = β0, M[b1*] = β1) and consistent. However, their accuracy decreases: the standard errors increase and the estimates become inefficient, which affects their stability. This conclusion follows from comparing the variances of the estimates of the regression coefficients for the two equations:

D(b1) = σ² / Σ(x1i − x̄1)² for the correct model, and D(b1) = σ² / [Σ(x1i − x̄1)² (1 − r²x1x2)] when the insignificant X2 is added. Here r_x1x2 is the correlation coefficient between the explanatory variables X1 and X2.

Therefore the variance of the estimate in the extended model is never smaller than in the correct model, and the equality sign is possible only when r_x1x2 = 0, i.e. when the explanatory variables X1 and X2 are uncorrelated.

The increased variance of the estimates can lead to erroneous results when testing hypotheses about the values of the regression coefficients, and to wider interval estimates.
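The loss of efficiency can likewise be illustrated numerically. A hedged sketch (illustrative numbers, unit error variance assumed): the variance of the X1 coefficient grows by the factor 1/(1 − r²) when the irrelevant but correlated X2 is added.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# True model uses only X1; X2 is irrelevant but strongly correlated with X1.
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.2, size=n)
r = np.corrcoef(x1, x2)[0, 1]

# Variance of the X1 coefficient estimate (sigma^2 = 1 assumed) grows by
# 1/(1 - r^2) when the irrelevant X2 is added to the equation.
var_correct = 1.0 / np.sum((x1 - x1.mean()) ** 2)
var_extended = var_correct / (1 - r ** 2)

print(var_extended / var_correct)  # variance inflation factor 1/(1 - r^2) > 1
```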

3. Choosing the wrong functional form. The essence of this error is illustrated by the following example. Let the correct regression model be, say, linear: Y = β0 + β1X + ε.

Any other dependence between the same variables but with a different functional form distorts the true dependence; estimating such an equation means the wrong functional form has been chosen. The consequences of this mistake are very serious: typically it leads either to biased estimates or to a deterioration in the statistical properties of the coefficient estimates and of the other quality indicators of the equation. This stems primarily from violation of the Gauss-Markov conditions on the disturbances. The predictive quality of such a model is very low.

When regression equations are constructed, especially at the initial stages, specification errors occur quite often: because of superficial knowledge of the economic processes under study, because of an insufficiently developed theory, or because of errors in collecting and processing the statistical data used to build the empirical equation. It is important to be able to detect and correct these errors. The difficulty of detection depends on the type of error and on our knowledge of the object under study.

If the regression equation contains a single insignificant variable, this reveals itself in a low t-statistic; the variable is then excluded from consideration.

If the equation contains several statistically insignificant explanatory variables, another regression equation is constructed without them. The coefficients of determination of the original and the reduced equations are then compared using the F-statistic

F = ((R1² − R2²)/k) / ((1 − R1²)/(n − m − 1)),

where n is the number of observations;

m is the number of explanatory variables in the original equation;

k is the number of explanatory variables discarded from the original equation.

Possible reasoning and conclusions for this situation are given in section 6.7.2.
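The comparison of determination coefficients can be sketched as follows (the R² values, n, m and k are illustrative; the critical value is an approximate tabulated one):

```python
def group_f_statistic(r2_full, r2_reduced, n, m, k):
    """F-statistic for jointly dropping k of the m regressors of the
    original equation, based on the two coefficients of determination."""
    return ((r2_full - r2_reduced) / k) / ((1 - r2_full) / (n - m - 1))

# Illustrative numbers: dropping 2 of 5 regressors barely lowered R^2.
f_fact = group_f_statistic(0.85, 0.84, n=50, m=5, k=2)
F_CRIT = 3.21  # tabulated F(0.05; 2, 44), approximate
print(f_fact > F_CRIT)  # False: the dropped pair looks jointly insignificant
```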

However, these checks make sense only if the type (functional form) of the regression equation has been chosen correctly, which can be verified against theory. For example, when constructing a Phillips curve, the relationship between wage growth Y and unemployment X is known to be inverse. The following models are possible:

It should be noted that the choice of a model can by no means always be made unambiguously; the chosen model must subsequently be compared with both theoretical and empirical data and improved. Recall that in assessing the quality of a model the following indicators are usually analyzed:

  • a) the adjusted coefficient of determination (adjusted R²);
  • b) the t-statistics;
  • c) the Durbin-Watson statistic DW;
  • d) consistency of the signs of the coefficients with theory;
  • e) the predictive qualities (errors) of the model.

If all these indicators are satisfactory, the model can be proposed for describing the real process under study. If any of them is unsatisfactory, there is reason to doubt the quality of the model: the functional form of the equation may be wrong, an important explanatory variable may be missing, or the equation may contain an explanatory variable with no significant effect on the dependent variable. In that case the specification should be revised.
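Two of the listed indicators, the adjusted R² and the Durbin-Watson statistic, can be computed directly from the residuals. A minimal sketch on toy data:

```python
import numpy as np

def quality_indicators(y, y_hat, m):
    """Adjusted R^2 and Durbin-Watson statistic for a fitted model.

    y: observed values, y_hat: fitted values, m: number of explanatory
    variables in the equation."""
    e = y - y_hat
    n = len(y)
    r2 = 1 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - m - 1)
    dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)  # ~2 means no autocorrelation
    return r2_adj, dw

# Toy check: an almost exact linear relation with a small wiggle.
x = np.arange(20, dtype=float)
y = 1.5 * x + 4 + 0.1 * np.sin(x)
y_hat = 1.5 * x + 4
r2_adj, dw = quality_indicators(y, y_hat, m=1)
print(round(r2_adj, 4), round(dw, 2))
```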

  • Adding an insignificant variable. In some cases too many explanatory variables are included in a regression equation, and not always justifiably. For example, suppose the researcher replaces the true theoretical model with a more complex one by adding an explanatory variable X2 that has no real impact on Y. In this case the error of adding an insignificant variable is made.



MINISTRY OF EDUCATION AND SCIENCE OF RUSSIA

Federal State Budgetary Educational Institution

of Higher Vocational Education

"Tver State Technical University"

(TVGTU)

Institute of Additional Professional Education

Department of "Accounting, analysis and audit"

course project

By discipline: "Econometrics"

On the topic: "Comparative analysis of econometric regression models"

COMPLETED: 3rd year student

Institute of DPO and P

Group RBAiA-37-12

Zamyatin

Christina Dmitrievna

(full name of the student)

CHECKED:

Konovalova A.S.

(Full name of teacher)

Rzhev 2015

INTRODUCTION

CHAPTER 1. ANALYTICAL PART

1.1. Fundamentals of econometric research of regression models.

1.2. Technology of econometric study of regression models.

CHAPTER 2. DESIGN PART

2.1. Information and methodological support of econometric research.

2.2. Pair and multiple regression.

CONCLUSION

LIST OF USED SOURCES

INTRODUCTION

Econometrics is a science that studies quantitative patterns and interdependencies in the economy using the methods of mathematical statistics. The basis of econometrics is the construction of an econometric model and the determination of how that model can be used to describe, analyze and predict real economic processes.

Econometric analysis underlies economic analysis and forecasting, making well-founded economic decisions possible.

In any field of economics, a specialist's work requires the use of modern methods based on econometric models, concepts and techniques.

The subject of the econometric research in this course project is the number of people who arrived in the EU countries for permanent residence. Migration processes are an extremely important factor in assessing the prospects for the development of society; the relevance of the topic is determined by the growing social significance of these processes in the modern world.

An economic study of migration processes is an essential factor in increasing the efficiency of countries' development. The history of human development is inextricably linked with changes in population dynamics. In Europe, rapid population growth is primarily due to socio-economic changes, i.e. it follows economic growth and changes in the social sphere.

The objectives of the course project are the development of design solutions for information and methodological support of research in the field of econometric modeling, as well as obtaining practical skills in building and researching econometric models.

The objective of the course project is the practical use of knowledge and skills in the construction and study of econometric models for econometric data analysis.

The ultimate applied goal of econometric modeling of real socio-economic processes in this course project is the forecasting of economic and socio-economic indicators that characterize the state and development of the analyzed system, i.e. the determination of trends in migration processes in the EU countries and their dependence on the available factors taken into account when constructing the econometric models.

CHAPTER 1. ANALYTICAL PART

1.1. Fundamentals of econometric research of regression models.

Econometrics is the economic discipline concerned with the development and application of statistical methods to measure relationships between economic variables; it is a combination of economic theory, statistics and mathematics.

Econometric data are not the results of a controlled experiment. Econometrics deals with specific economic data and with the quantitative description of specific relationships: it replaces coefficients stated in general form with specific numerical values. Special methods of analysis are developed in econometrics to reduce the influence of measurement errors on the results obtained.

The main tool of econometrics is the econometric model, a formalized description of quantitative relationships between variables. The modeling methodology carries great potential for self-development: modeling is a cyclic process in which each cycle can be followed by the next, knowledge about the object under study expands and is refined, and the original model is gradually improved. Deficiencies found after a modeling cycle, whether due to limited knowledge of the object or to errors in model building, can be corrected in subsequent cycles.

There are three classes of econometric models:

Time-series (temporal data) models;

Regression model with one equation;

System of simultaneous equations.

Classification of tasks solved with the help of the econometric model: 1) according to the final applied goals:

Forecast of econometric and socio-economic indicators characterizing the state and development of the analyzed system;

Imitation of possible scenarios for the socio-economic development of the system.

2) by hierarchy level:

Macro-level tasks (country as a whole);

Meso-level tasks (regions, industries, corporations);

Micro-level tasks (family, enterprise, firm).

3) according to the profile of the econometric system, aimed at studying:

Market;

Investment, financial or social policy;

Pricing;

Distribution relations;

Demand and consumption;

A complex of problems.

Main stages of econometric modeling:

Stage 1 - staging. Determination of the final goals of the model, a set of factors and indicators participating in it, their role. The main objectives of the research are: analysis of the state and behavior of an economic object, forecast of its economic indicators, simulation of the development of an object, development of management decisions.

Stage 2 - a priori. Analysis of the essence of the object under study, the formation and formalization of information known before the start of modeling.

Stage 3 - parameterization. The choice of the general view of the model, the composition and form of its constituent links. The main task of this stage is the choice of the function f(X).

Stage 4 - informational. Collection of necessary statistical information.

Stage 5 - model identification. Statistical analysis of the model and estimation of its parameters. The main body of econometric research.

Stage 6 - verification of the model. Checking the adequacy of the model and assessing the accuracy of model data. It is determined how successfully the problems of specification and identification have been solved and what the accuracy of calculations with this model is, i.e. how well the constructed model corresponds to the real economic object or process being modeled.

When modeling economic processes in econometric models, they use:

1. Spatial data - a set of information on different objects taken over the same period of time.

2. Temporal data - a set of information characterizing the same object, but for different periods of time.

A set of information is a set of features that characterize the object of study. Features can act in one of two roles: the role of the resulting feature and the role of the factor feature.

Variables are divided into:

Exogenous, the values ​​of which are set from the outside;

Endogenous, the values ​​of which are determined within the model;

Lagged: endogenous or exogenous variables of the econometric model dated to previous points in time and appearing in an equation together with current variables;

Predefined: exogenous variables tied to past, current and future points in time, together with lagged endogenous variables whose values are already known at the present moment.

Econometrics mainly considers model specification errors, assuming that measurement errors are kept to a minimum.

Model specification is the selection of the type of functional dependence (of the regression equation). The magnitude of the random errors differs across specifications, and the specification that keeps the residuals to a minimum is preferred.

In addition to the choice of model specification, it is equally important to correctly describe the structure of the model. The value of the resulting attribute may not depend on the actual value of the explanatory variable, but on the value that was expected in the previous period.

The simplest regression model with only two variables is included in the class of regression models with one equation, in which one explained variable is represented as a function of several independent (explanatory) variables and parameters. This class includes multiple regression models.

Simpler still are time-series models, which explain the behavior of a series solely on the basis of its previous values. These include models of:

trend,

seasonality,

adaptive forecast,

moving average, etc.

More general are systems of simultaneous equations, in which the right-hand sides may contain, in addition to explanatory variables, explained variables from other equations, i.e. variables other than the explained variable on the left-hand side of the given equation.

When separate regression equations are used, it is assumed that the factors can be changed independently of one another. In reality their changes are not independent: a change in one variable most often entails changes in the whole system of attributes, because they are interconnected. It is therefore necessary to be able to describe the structure of relationships between variables by a system of simultaneous (structural) equations.

Statistical and mathematical models of economic phenomena and processes are determined by the specifics of a particular area of ​​economic research. The theory and practice of expert assessments is an important section of econometrics, since expert assessments are used to solve a number of economic problems.

Better known in theoretical and educational publications are various econometric models designed to predict macroeconomic indicators. These are usually models aimed at predicting a multivariate time series. They represent a system of linear dependencies between past and present values ​​of variables. In such tasks, the structure of the model is evaluated, i.e. the form of the relationship between the values ​​of the known coordinates of the vector at the previous moments of time and their values ​​at the predicted moment, as well as the coefficients included in this relationship. The structure of such a model is an object of non-numerical nature. Each area of ​​economic research has its own econometric models.

1.2. Technology of econometric study of regression models.

The study and quantitative assessment of objectively existing relationships and dependencies between economic phenomena is the main task of econometrics.

A causal relationship is such a relationship between phenomena in which a change in one of them, called the cause, leads to a change in the other, called the effect. Therefore, the cause always precedes the effect.

Cause-and-effect relationships between phenomena are of the greatest interest to the researcher, which makes it possible to identify the factors that have the main influence on the variation of the phenomena and processes under study.

Causal relationships in socio-economic phenomena have the following features:

1. Cause X and effect Y do not interact directly but through intermediate factors, which are omitted in the analysis.

2. Socio-economic phenomena develop and form as a result of the simultaneous influence of a large number of factors. One of the main problems in studying these phenomena is to identify the main causes and abstract from secondary ones.

According to the direction of communication change, they are divided into:

1. direct (the resultant and factor attributes change in the same direction),

2. inverse (the resultant and factor attributes change in opposite directions).

According to the nature of the manifestation, they distinguish:

1. functional relationship - a relationship in which a given value of the factor attribute corresponds to one and only one value of the resultant attribute; it manifests itself in every case of observation and for every specific unit of the population under study, and is studied mainly in the natural sciences.

2. stochastic dependence - a causal dependence that manifests itself not in each individual case but in general, over a large number of observations: the same values of the factor attributes correspond, as a rule, to different values of the resultant attribute, yet over the whole set of observations a definite relationship between the values of the attributes can be noted. A special case of stochastic dependence is correlation, in which a change in the average value of the resultant attribute is caused by a change in the factor attributes.

According to the analytical expression, the connections are distinguished:

1. linear: the change in the resultant attribute is directly proportional to the change in the factor attributes.

2. non-linear.

Analytically, a linear stochastic relationship between phenomena can be represented by an equation of a straight line on a plane, or by an equation of a hyperplane in an n-dimensional space (in the presence of n factorial variables).

Building an econometric model is the basis of econometric research. The degree of reliability of the analysis results and their applicability depends on how well the resulting model describes the studied patterns between economic processes.

The construction of an econometric model begins with the specification of the model, which consists in obtaining an answer to two questions:

1) what economic indicators should be included in the model;

2) what form does the analytical relationship between the selected features have.

In studies devoted to the development of methods for forecasting such financial indicators as exchange rates, securities, indices, models based on the assumption that the dynamics of these processes are completely determined by internal conditions are widely used.

After identifying the set of variables under consideration, the next step is to determine the specific type of model that best suits the phenomenon under study.

According to the nature of the relationships between factors and the variable, the models are divided into linear and non-linear. According to the properties of their parameters, models are divided into models with a constant and variable structure.

Systems of interrelated econometric equations constitute a special kind of models.

If, on the basis of a preliminary qualitative analysis of the phenomenon under consideration, it is not possible to unambiguously select the most appropriate type of model, then several alternative models are considered, among which, in the process of research, the one that best suits the phenomenon under study is selected.

In the general case, the procedure for constructing an econometric model can be represented as the following steps:

1. Specification of the model, i.e., the choice of a class of models that are most suitable for describing the phenomena and processes under study.

This stage involves the solution of two tasks:

a) selection of significant factors for their subsequent inclusion in the model;

b) the choice of the type of model, i.e. the choice of the type of analytical dependence linking the variables included in the model.

2. Estimation of the model parameters, i.e. obtaining numerical values ​​of the model constants. In this case, a previously obtained array of initial data is used.

3. Checking the quality of the constructed model and substantiating the possibility of its further use. The most complex and time-consuming in econometric research is the stage of estimating model parameters, where methods of probability theory and mathematical statistics are applied.

When solving the problem of choosing the type of analytical dependence, various considerations can be used:

The conclusions of analytical studies on the qualitative nature of dependence,

Description of the properties of various analytical dependencies,

The goals of building a model.

The choice of the type of econometric model is based, first of all, on the results of a preliminary qualitative or meaningful analysis carried out by methods of economic theory. The nature of the alleged dependence is substantiated on the basis of theoretical assumptions about the nature of the pattern of development of the phenomenon or process under study.

Another approach is based on the analysis of the array of initial data, which makes it possible to identify some characteristics of the expected dependencies and, on this basis, to formulate, as a rule, several assumptions about the form of the analytical relationship. The constructed model is used to formulate assumptions about the nature of the regularity in the development of the phenomenon under study, which are verified in the course of further research.

Linear models have found the greatest use in econometrics.

This is due to several reasons:

Effective methods exist for building such models.

In a small range of factor values, linear models can approximate real nonlinear dependencies with sufficient accuracy.

The model parameters have a clear economic interpretation.

Forecasts based on linear models are characterized by a lower risk of significant forecast error.

An important component of the process of building an econometric model is the selection of factors that significantly affect the indicator under study and are to be included in the model being developed. The optimal set of factors is determined on the basis of a qualitative and quantitative analysis.

At the stage of setting the task and meaningful economic analysis of the economic model, factors are selected whose influence should be taken into account when building the model. In some cases, the set of factors is determined unambiguously or with a high degree of certainty. In more complex cases, at the next stage, using formal statistical methods, the expediency of including each factor in the model is checked. First of all, the factors are checked for the presence of a close linear correlation between them, the existence of which leads to unreliable estimates of the model parameters.

To overcome strong interfactor correlation, one applies:

– exclusion of one or more factors from the model (of two correlated factors, the one more strongly correlated with the remaining factors is excluded);

– transformation of the factors, which reduces the correlation between them.
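A simple way to detect strong interfactor correlation is to inspect the pairwise correlation matrix of the candidate factors. A sketch with illustrative data and an illustrative 0.8 threshold:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# Three candidate factors; x2 nearly duplicates x1 (strong interfactor correlation).
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Pairwise correlations between factors; flag pairs above a threshold.
R = np.corrcoef(X, rowvar=False)
threshold = 0.8
pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)
         if abs(R[i, j]) > threshold]
print(pairs)  # x1 and x2 are candidates for exclusion or transformation
```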

One of the criteria for including factors in the model is the degree of their isolated influence on the resulting attribute.

Two methods for determining the optimal set of factors:

1. The inclusion method. A regression equation is built with the single most influential factor; then further factors are introduced sequentially: the best pair of factors is determined, then a third factor is added to the best two, and so on. At each step a regression model is built and the significance of the factors is checked. Only significant factors are kept in the model; significance can be tested with Student's t-test or with the partial F-test. The process ends when no factors remain that could be included in the model.

2. The exclusion method. A regression equation is constructed with the full set of factors, from which the insignificant or least significant factors are then excluded one by one. Only one factor is excluded at each step, because after a factor is removed another factor that was previously insignificant may become significant. The process ends when no factors remain that should be eliminated.

The inclusion and exclusion methods do not guarantee the optimal set of factors, but in most cases they give results that are optimal or close to optimal. It is not recommended to include very many factors in the model: this can obscure qualitative patterns and increases the risk of including insignificant random factors. To obtain reliable parameter estimates, it is desirable that the number of observations exceed the number of estimated parameters by at least 6-7 times.

After selecting the factors and choosing the type of analytical dependence, the model parameters are evaluated. When estimating the model parameters, a previously prepared array of observations is used as the initial data. The quality of estimates is determined by the presence of such properties as unbiasedness, consistency, and efficiency. An estimate of a parameter is said to be unbiased if its mathematical expectation is equal to the parameter being estimated. An estimate of a parameter is called consistent if it converges in probability to the estimated parameter as the number of observations increases. A parameter estimate is said to be efficient if it has the smallest variance among possible unbiased parameter estimates calculated from samples of the same size n.

CHAPTER 2. DESIGN PART

2.1 Information and methodological support of econometric research.

The methodology of econometric research includes the following stages: specification, parametrization, verification, and additional research.

1. The specification of the pair and multiple regression equation models includes an analysis of the correlation dependence of the dependent variable on each explanatory variable. Based on the results of the analysis, a conclusion is made about the model of the regression equation. As a result of the stage, the model of the regression equation is determined.

2. Parametrization of the paired regression equation involves estimating the regression parameters and giving them a socio-economic interpretation. For parametrization it is recommended to use the "Regression" tool from the "Data Analysis" add-in of MS Excel. Based on the results of the automated regression analysis, the regression parameters are determined and interpreted.

Thus, the econometric study of paired regression includes: calculation of the parameters of the regression equations; estimation of the error variances and of the variances of the model parameters; assessment of the strength of the relationship between factor and result using the elasticity coefficient; assessment of the closeness of the relationship; assessment of the quality of the equation using the average approximation error; and assessment of the statistical reliability of the regression equation using Fisher's F-test.

To build and analyze the paired regression, data on the twenty largest countries were selected from the European Union statistical yearbook: the number of people who arrived in each country for permanent residence and the nominal annual wages of employees.

The correlation coefficient is calculated by the formula

r_xy = Σ(x_i − x̄)(y_i − ȳ) / √(Σ(x_i − x̄)² · Σ(y_i − ȳ)²).

The correlation coefficient shows the closeness of the relationship between the phenomena under study.
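The standard formula can be sketched directly in code (the data are illustrative):

```python
import numpy as np

def correlation(x, y):
    """Pearson correlation coefficient r_xy from the standard formula."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    return sxy / np.sqrt(np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]   # nearly y = 2x: correlation close to 1
print(round(correlation(x, y), 4))
```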

To construct a paired regression equation, it is necessary to consider possible regression equations:

  1. linear dependence
  2. exponential dependence
  3. quadratic dependence
  4. cubic dependence

To estimate the regression parameters for all these models, we apply the method of least squares (OLS).

The idea of the method is to obtain the best approximation of the set of observations (x_i, y_i), i = 1, …, n, by a linear function, in the sense of minimizing the functional

S(a, b) = Σ (y_i − a − b·x_i)² → min.

To calculate the parameters a and b of the linear regression, the following system of normal equations is solved with respect to a and b:

n·a + b·Σx_i = Σy_i,
a·Σx_i + b·Σx_i² = Σx_i·y_i,

from which the estimates of the parameters a and b are determined.
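The normal equations can be solved directly. A minimal sketch on exact toy data (so the recovered parameters are known in advance):

```python
import numpy as np

def pair_ols(x, y):
    """Solve the normal equations of paired linear regression y = a + b*x:

        n*a      + b*sum(x)   = sum(y)
        a*sum(x) + b*sum(x^2) = sum(x*y)
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    A = np.array([[n, x.sum()], [x.sum(), (x ** 2).sum()]])
    rhs = np.array([y.sum(), (x * y).sum()])
    a, b = np.linalg.solve(A, rhs)
    return a, b

x = np.array([1., 2., 3., 4., 5.])
y = 3 + 2 * x                       # exact line: a = 3, b = 2
a, b = pair_ols(x, y)
print(round(a, 6), round(b, 6))
```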

The significance of the parameters is assessed with Student's t-test. The null hypothesis H0 is put forward that the indicator is of a random nature, i.e. does not differ significantly from zero: H0: β = 0.

The construction of the exponential curve y = a·b^x is preceded by a linearization of the variables, taking the logarithm of both sides of the equation:

ln y = ln a + x·ln b.

The parameters of the linearized equation are then found by the usual least-squares formulas.

A linear equation is obtained. Substituting the actual values of X into the fitted equation, one obtains theoretical values of the result, from which the indicator of the closeness of the relationship, the index of correlation, is calculated.
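Assuming the exponential dependence has the form y = a·b^x (the form is not spelled out in the text), the log-linearization can be sketched as:

```python
import numpy as np

def exp_fit(x, y):
    """Fit y = a * b**x by linearization: ln(y) = ln(a) + x*ln(b),
    then ordinary least squares on (x, ln y)."""
    x, ly = np.asarray(x, float), np.log(np.asarray(y, float))
    slope, intercept = np.polyfit(x, ly, 1)   # highest degree first
    return np.exp(intercept), np.exp(slope)   # back-transform to a, b

x = np.array([0., 1., 2., 3., 4.])
y = 2.0 * 1.5 ** x                  # exact curve: a = 2, b = 1.5
a, b = exp_fit(x, y)
print(round(a, 6), round(b, 6))
```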

This coefficient is tested for significance using Student's t-test.

Estimates of the error variance and of the variances of the model parameters are then calculated.

The quadratic curve equation is constructed by the substitution x1 = x, x2 = x², which reduces it to a linear equation. Substituting the actual values of X into the fitted equation yields theoretical values of the result.

This coefficient is tested for significance using Student's t-test.

Estimates of the error variance and of the variances of the model parameters are then calculated.

The cubic curve equation is constructed by replacing

A linear equation is obtained.

Substituting the actual values of X into this equation, one can obtain the theoretical values of the result. From them the indicator of the closeness of the relationship, the correlation index, is calculated.
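The quadratic and cubic fits reduce, via the substitutions x2 = x² and x3 = x³, to a linear least-squares problem. Below is a self-contained sketch that forms and solves the normal equations (X'X)b = X'y by Gaussian elimination; the data used to verify it are illustrative.

```python
def _solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def polyfit_ls(x, y, degree):
    """Least-squares polynomial fit: treat x, x**2, ..., x**degree as new
    'linear' regressors and solve the normal equations (X'X)b = X'y."""
    cols = [[xi ** p for xi in x] for p in range(degree + 1)]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(c * yi for c, yi in zip(ci, y)) for ci in cols]
    return _solve(xtx, xty)  # coefficients b0, b1, ..., b_degree
```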

This coefficient is checked for significance using Student's t-test.

Estimates of the error variance and of the variances of the model parameters are calculated by the following formulas:

The average coefficient of elasticity shows how many percent on average the result y will change from its average value when the factor x changes by 1% from its average value:
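For the linear model y = a + b·x this definition gives the textbook formula E = b·x̄/ȳ; the sketch below assumes that form, and the numbers in the checks are illustrative.

```python
def mean_elasticity_linear(b, x_mean, y_mean):
    """Average elasticity for y = a + b*x: the percent change in y,
    measured at the sample means, caused by a 1% change in x."""
    return b * x_mean / y_mean
```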

The coefficient of determination gives an estimate of the quality of the constructed model. The coefficient of determination characterizes the proportion of the variance of the resulting feature y, explained by regression, in the total variance of the resulting feature.

The coefficient of determination is equal to the square of the correlation index. The closer it is to unity, the better the fit, i.e. the more accurately the model approximates y.

The average approximation error is the average deviation of the calculated values from the actual ones:

The permissible limit is no more than 8-10%.
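The average approximation error defined above is a mean absolute percentage error; a minimal sketch:

```python
def average_approximation_error(y_actual, y_fitted):
    """Mean of |y - y_hat| / |y|, expressed in percent. Values above
    roughly 8-10% are conventionally treated as unacceptable."""
    n = len(y_actual)
    return 100.0 / n * sum(abs(ya - yf) / abs(ya)
                           for ya, yf in zip(y_actual, y_fitted))
```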

The significance of the regression equation is assessed using Fisher's F-test. The null hypothesis asserts the equality of the factor and residual variances, and hence that the factor x has no effect on y, i.e.

H0: D_fact = D_rest

To do this, the actual value of the F-statistic is compared with its critical (table) value. The actual value is determined as the ratio of the factor and residual variances:

The table value is the maximum possible value of the statistic that could arise under the influence of random factors for the given degrees of freedom and significance level. The significance level is the probability of rejecting the null hypothesis when it is in fact true.

If F_fact > F_table, H0 is rejected and the statistical significance and reliability of the regression equation are recognized; otherwise H0 is accepted and the regression equation is concluded to be insignificant.
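For a paired regression the F-statistic can be written through the coefficient of determination, F = R²/(1−R²) · (n−m)/(m−1) with m = 2 estimated parameters. This reproduces, to rounding, the linear-model value reported later in the text (R² = 0.504652547², n = 20 gives F ≈ 6.15):

```python
def f_statistic(r2, n, m):
    """F-test of overall significance via the coefficient of determination;
    m is the number of estimated parameters (m = 2 for paired regression)."""
    return (r2 / (1.0 - r2)) * (n - m) / (m - 1)
```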

3. Parametrization of the multiple regression equation involves the estimation of the regression parameters and their socio-economic interpretation. For parametrization, it is recommended to use the "Regression" tool of the "Data Analysis" add-in of MS Excel. Based on the results of the automated regression analysis, the regression parameters are determined and interpreted.

Verification of the regression equation is carried out on the basis of the results of automated regression analysis.

Thus, the econometric study of multiple regression includes: building the multiple regression equation; calculating elasticity coefficients for each factor and comparatively assessing the strength of the relationship of each factor with the result; economically interpreting the constructed model; building the correlation matrix; calculating the multiple correlation coefficient; calculating estimates of the model error variances and of the model parameters; building confidence intervals for the model coefficients at a chosen significance level; checking the significance of each coefficient; assessing the closeness of the relationship; and assessing the statistical reliability of the regression equation using Fisher's F-test.

To build and analyze a multiple regression, several more indicators are introduced into the model so as to take into account additional factors that affect the number of people who arrived in the country for permanent residence, namely the number of unemployed and the country's GDP.

Multiple regression is an equation relating the dependent variable to several explanatory variables:

where y is the dependent variable (resultant attribute),

and the remaining variables are the independent variables (factors).

To construct a multiple regression equation, a linear function written in matrix form is used:

where,

To estimate the parameters of the multiple regression equation, the least squares method is used:

The following system of equations is constructed, the solution of which allows obtaining estimates of the regression parameters:

Its explicit solution is usually written in matrix form, otherwise it becomes too cumbersome.

Model parameter estimates in matrix form are determined by the expression:

where X is the matrix of values of the explanatory variables;

Y is the vector of values of the dependent variable.
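The matrix expression b = (X'X)⁻¹X'y can be sketched without external libraries by solving the normal equations directly. In this sketch the intercept column of ones is prepended automatically, and the data used to verify it are illustrative:

```python
def _solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols_multiple(x_rows, y):
    """b = (X'X)^-1 X'y for y = b0 + b1*x1 + ... + bm*xm, computed by
    solving the normal equations rather than forming an explicit inverse."""
    rows = [[1.0] + list(r) for r in x_rows]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return _solve(xtx, xty)
```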

To identify the dependence of the number of arrivals for permanent residence on the nominal annual salary of hired workers, the number of unemployed and the level of GDP, we construct a multiple regression equation in the form:

To characterize the relative strength of the influence of the factors on y, average elasticity coefficients are calculated. For linear regression they are calculated by the formulas:

With a linear dependence, the multiple correlation coefficient can be determined through the matrix of paired correlation coefficients:

where is the determinant of the matrix of paired correlation coefficients;

is the determinant of the interfactor correlation matrix.

Matrix of paired correlation coefficients:

Interfactor correlation matrix:

Calculation of estimates of error variances and variances of model parameters is carried out according to the following formulas:

To assess the statistical significance of the regression coefficients, Student's t-statistics and confidence intervals are calculated for each of the parameters. A hypothesis is put forward about the random nature of the indicators, i.e. about their insignificant difference from zero. We obtain the set of hypotheses:

H0: b0 = 0; b1 = 0; b2 = 0; b3 = 0

The t-test is carried out by comparing the computed values with the table value, a quantile of Student's distribution for the chosen significance level (the probability of rejecting the null hypothesis when it is in fact true).

The following formula is used to calculate confidence intervals:

The quality of the constructed model as a whole is estimated by the coefficient of determination. The coefficient of multiple determination is calculated as the square of the multiple correlation index.

The adjusted index of multiple determination contains a correction for the number of degrees of freedom and is calculated by the formula:

where n is the number of observations;

m is the number of factors.
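The adjusted index of multiple determination has the standard form R²_adj = 1 − (1 − R²)(n − 1)/(n − m − 1); a sketch with illustrative numbers:

```python
def adjusted_r2(r2, n, m):
    """Adjusted coefficient of multiple determination: penalizes R^2
    for the number of factors m given n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)
```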

The significance of the multiple regression equation as a whole, as in paired regression, is assessed using Fisher's F-test:

At the same time, a hypothesis is put forward about the insignificance of the regression equation:

In conclusion, a judgment is made about the quality of the regression equation.

4. A comparative analysis of regression models is carried out.

2.2. An example of an econometric study.

On the basis of statistical data, an econometric study is carried out in accordance with the methodology of clause 2.1.

All the necessary calculations are carried out in MS Excel manually; the results are then verified using the Regression tool of the Data Analysis package.

The linear pair correlation coefficient is:

0.504652547

The correlation coefficient is positive, indicating a moderate direct relationship between the indicator y and the factor x: with an increase in the average annual salary of a country's workers, the number of people who arrived in the country increases.

2. Construction and analysis of paired regression is carried out. The initial data are presented in Table 1.

Table 1. Initial data for the construction and analysis of paired regression

y is the number of people who arrived in the country for permanent residence, thousand people; x is the nominal annual salary of employees, thousand euros.

As a result of the analysis, it is necessary to establish how the wages of hired workers in the country affect the number of people who arrived in the country for permanent residence.

The parameters a and b are estimated.

Regression equation:

The regression coefficient b = 4.279 shows the average change in the result for a unit change in the factor: with an increase in the annual salary of employees by 1 thousand euros, the number of arrivals for permanent residence increases by an average of 4.279 thousand people. The positive value of the regression coefficient indicates a direct relationship.

The linear pair correlation coefficient is:

0.504652547

The relationship is direct and moderate.

Since 2.47 > T_table(0.05; 18) = 2.101, the coefficient is significant.

Estimates of error variances and variances of model parameters are calculated. Intermediate calculations are presented in Table 2.

10765.218; 1477.566815; 2.976774696

Construction of the equation of the exponential curve.

The values of the regression parameters are 0.068027 and 1.68049.

A linear equation is obtained.

After potentiation:

Correlation index.

This coefficient is checked for significance.

Since 2.15 > T_table(0.05; 18) = 2.101, the coefficient is significant.

Estimates of error variances and variances of model parameters are calculated. Intermediate calculations are presented in Table 3.

As a result, the following values ​​are obtained:

11483.75; 452.87517; 3.1754617

Table 2. Calculation of values ​​for the linear model

Table 3. Calculation of values ​​for the exponential model

An equation of a quadratic curve is constructed.

Equation parameters:

Correlation index.

This coefficient is checked for significance.

Since 3.41 > T_table(0.05; 18) = 2.101, the coefficient is significant.

Estimates of error variances and variances of model parameters are calculated. Intermediate calculations are presented in Table 4.

As a result, the following values ​​are obtained:

8760.35808; 743.283328; 0.00123901

The equation of a cubic curve is constructed.

Equation parameters:

The regression equation takes the form:

Correlation index.

This coefficient is checked for significance.

Since 4.38 > T_table(0.05; 18) = 2.101, the coefficient is significant.

Estimates of error variances and variances of model parameters are calculated. Intermediate calculations are presented in Table 5.

As a result, the following values ​​are obtained:

6978.45007; 514.7649432; 5.9851E-07

The degree of connection between the variables is highest in the model with the cubic dependence, since its correlation coefficient is closest to unity, and lowest in the exponential model. The error and parameter variances take their minimum values in the cubic model.

Table 4. Calculation of values ​​for the quadratic model

Table 5. Calculation of values ​​for the cubic model

The average coefficient of elasticity is found.

Linear dependence

With an increase in the annual wages of hired workers by 1%, the number of people who arrived in the country for permanent residence increases by 1.250028395 %.

Exponential dependence

With an increase in the annual wages of hired workers by 1%, the number of people who arrived in the country for permanent residence increases by 1.2083965 %.

Quadratic dependence

With an increase in the annual wages of hired workers by 1%, the number of people who arrived in the country for permanent residence increases by 1.24843054 %.

Cubic dependence

With an increase in the annual wages of hired workers by 1%, the number of people who arrived in the country for permanent residence increases by 0.938829224 %.

The values of the elasticity coefficients are given in Table 6.

All the constructed models confirm that the amount of wages of hired workers is a factor in increasing the number of people arriving in the country for permanent residence. The elasticity coefficient shows that the annual wages of hired workers have a greater effect on the number of people who arrived in the country for permanent residence with linear and quadratic dependencies. To a lesser extent, this relationship can be traced in the cubic dependence.

The coefficient of determination is found.

Linear dependence

The regression equation explains 25% of the variance of the effective feature, and the remaining factors account for 75% of its variance.

The linear dependency model does not approximate the original data well.

Exponential dependence =

The relationship between the indicators is as weak as in the linear model: only 20% of the variation in y is explained by the variation in x, while other factors account for 80%. The connection in this model is the weakest, so the quality of the model is unsatisfactory.

Quadratic dependence

The relationship between the indicators is slightly better than in the exponential and linear models: the variation in x explains only 40% of the variation in y. This model is also undesirable to use for forecasting.

Cubic dependence

The relationship between the indicators is better than in the previous models: 52% of the variation in y is explained by the variation in x.

The values of the coefficients of determination are presented in Table 6.

Table 6. Calculation of parameters and characteristics of models.

The quality of the constructed models is low; the model with the cubic dependence has the highest quality score, since its share of explained variation is 52%.

The average approximation error, i.e. the average deviation of the calculated values from the actual ones, is determined:

Linear model = 1153.261 %

On average, the calculated values deviate from the actual ones by 1153.261 %, which indicates a very large approximation error.

Exponential dependence = 396.93259 %

The approximation error is somewhat lower than that of other models, but is also unacceptable.

Quadratic dependence = 656.415018 %

A high approximation error is observed, which indicates a low quality of fit of the equation.

Cubic dependence = 409.3804652 %

In all the considered models, the average approximation error significantly exceeds the allowable values, and the quality of the fit of the models to the original data is very low.

3. Construction and analysis of multiple regression is carried out.

The initial data for building multiple regression are given in Table 7.

Table 7. Initial data for building multiple regression.

y - the number of people who arrived in the country for permanent residence, thousand people:

x 1 - nominal annual salary of employees, thousand euros.

x2 - the number of unemployed, thousand people.

x 3 - GDP, billion euros.

Estimates of the parameters of the regression equation:

Multiple regression equation:

Average coefficients of elasticity.

E1 = 0.12026241; E2 = -0.06319176; E3 = 0.86930458

The calculation of these values is given in Table 8.

With an increase in the annual wages of hired workers by 1% of the average level, with the other factors unchanged, the number of arrivals for permanent residence increases by 0.12 %.

With an increase in the number of unemployed by 1% of the average, with the other factors unchanged, the number of arrivals for permanent residence decreases by 0.06 %.

With an increase in GDP by 1% of the average, with the other factors unchanged, the number of arrivals for permanent residence increases by 0.87 %.

The change in the number of people who arrived in the country for permanent residence is directly dependent on the annual wages of hired workers and the level of the country's GDP and inversely on the number of unemployed, which does not contradict logical assumptions. Elasticity coefficients, as indicators of the strength of the connection, show that the largest change in the number of arrivals in the country is caused by the value of GDP, and the smallest - by the number of unemployed.

The coefficient of multiple correlation is calculated:

The value of the multiple correlation index ranges from 0 to 1.

The average approximation error is calculated:

372.353247 %

The value of the average approximation error indicates a poor fit of the model to the original data.

Table 8. Calculation of the values ​​of the characteristics of the multiple regression model

The combined influence of all the factors on the number of people arriving in the country for permanent residence is quite large. The closeness of the relationship between the indicator under consideration and the factors influencing it has increased compared with the paired regression (r_yx = 0.506). There is a fairly strong connection.

It should be taken into account that there is slight multicollinearity in the model, which may indicate its instability, since the determinant of the interfactor correlation matrix is quite far from 1. The maximum pair correlation coefficient is observed between the factors x 1 and x 3 (r_x1x3 = 0.595), which is understandable, since the average annual wage in a country should be directly related to its GDP.

Calculation of estimates of error variances and variances of model parameters:

n = 20 – number of observations, m =4 – number of parameters.

For the constructed model, the estimate of the error variance was:

6674.02207

Estimates of dispersions of model parameters:

Standard errors of model parameters:

Intermediate calculations of the obtained data are presented in Appendix 8.

The significance of the regression coefficients is estimated using Student's t-test.

If a computed value is less than the table value, the corresponding coefficient is statistically insignificant and differs from zero only by chance; if it is greater than the table value, the coefficient is statistically significant.

For the constructed model, the confidence intervals for the regression coefficients are:

All the obtained regression coefficients, except the coefficient on the GDP factor, are statistically insignificant, and their confidence intervals are quite wide, which may indicate insufficient quality of the model.

Multiple determination coefficient for the constructed model

This coefficient of determination shows that the quality of the model is satisfactory.

With the addition of one more variable, the coefficient of determination usually increases. To avoid a possible exaggeration of the closeness of the relationship, the adjusted coefficient of determination is applied. For a given number of observations, other things being equal, the adjusted coefficient of multiple determination decreases as the number of independent variables (parameters) increases. For the constructed model, the adjusted and unadjusted coefficients of determination do not differ significantly; but since the adjusted coefficient decreased slightly, it can be assumed that the increase in the share of explained variation from adding a new variable is insignificant, and that adding the variable is not practical.

The significance of the regression equation is estimated using Fisher's F-test.

F (0.05, m -1, n - m)= F (0.05,1.18)= 4.413873

Linear model = 6.150512218

Exponential dependence = 4.6394274

Quadratic dependence = 11.6775003

Cubic dependence = 19.25548322

In all the considered models F_fact > F_table, so the hypothesis H0 is rejected.

The significance of the multiple regression equation as a whole is assessed using Fisher's F-test:

Since F_table < F_fact, H0 is not accepted and the multiple regression equation is recognized as significant.

4. As a result of the study, the following conclusions can be drawn. All the resulting regression equations are significant. However, judging by the F-test together with the coefficient of determination and the average approximation error, none of the considered paired regression models is of good enough quality to be used for forecasting. The model that best describes the relationship between the annual salary of wage workers in the country and the number of people who arrived in the country for permanent residence is the one with the cubic dependence: it is significant, its coefficient of determination takes the largest value, and its average approximation error, although still above the permissible limit, is smaller than in the other models.

All four paired regression models are statistically significant; however, the rather small values of the coefficient of determination and the large average approximation errors indicate the poor quality of these models.

Comparing the parameters and characteristics of these equations, it is concluded that the model with a cubic dependence has the highest reliability and accuracy. This is evidenced by the highest value of the correlation index and, accordingly, the coefficient of determination, which is closest to 1 and confirms the best quality of the model in terms of data approximation, the results of the F-test, which recognized the model as significant, as well as the average approximation error, which is smaller than that of other models. The standard errors of the regression parameters and the standard error of the forecast for this model also take smaller values.

The multiple regression equation is significant, i.e. the hypothesis about the random nature of the estimated characteristics is rejected. The resulting model is statistically reliable.

CONCLUSION

As a result of the econometric study and data analysis, four paired regression equations were considered that establish the relationship between the average annual wages of wage workers in a country and the number of people who arrived in the country for permanent residence: the linear and exponential models and the models with quadratic and cubic dependence. All the constructed models confirm that the growth of the wages of hired workers is a factor in the increase in the number of people arriving in the country for permanent residence.

The indicator of the closeness of the relationship between the variables is highest in the model with the cubic dependence, since the coefficient of determination of the cubic model takes the highest value, which indicates the greatest reliability of the found regression equation. The cubic model therefore best describes the relationship between the number of people who arrived in the country for permanent residence and the annual wages of hired workers. In all the considered models, the average approximation error significantly exceeds the allowable values, which indicates a low quality of model fit. Nevertheless, the cubic model is the best in terms of data approximation and closeness of the relationship, since it has the largest share of explained variation compared with the other models, 52% (its coefficient of determination is closest to 1).

By all the criteria considered, the regression equation with the cubic dependence is the best of those examined, but it is still not suitable for practical use and forecasting. This is explained by the large scatter of the data and by the fact that the number of immigrants depends on many factors that cannot be taken into account in a paired regression.

The inadequate characteristics of the models may be caused by the presence in the initial data of units with anomalous values of the characteristics under study: in the UK, the number of arrivals for permanent residence significantly exceeds that of the other countries. To obtain a more accurate and reliable result, this country should perhaps be excluded from the sample.

As a result of building a multiple regression, the influence on the number of people who arrived in the country for permanent residence of such factors as the country's GDP, the number of unemployed and the average annual wage of hired workers was studied.

The change in the number of people who arrived in the country for permanent residence is in direct proportion to the annual wages of hired workers and the level of the country's GDP and inversely to the number of unemployed. The largest change in the number of arrivals in the country is caused by the value of GDP, and the smallest - by the number of unemployed.

The combined influence of all the factors on the number of arrivals in the country for permanent residence is quite large, since the multiple correlation index takes on a high value. However, this may partly be explained by the presence of multicollinearity.

All the obtained coefficients of the multiple regression equation, except the coefficient on the GDP factor, are statistically insignificant, and their confidence intervals are quite wide.

Despite this, the coefficient of determination shows that the quality of the model is satisfactory. The multiple regression equation is significant, i.e. the hypothesis about the random nature of the estimated characteristics is rejected.

However, heteroscedasticity may be present in the model, and the model may need to be corrected.

These results can be explained by the rather small sample size, especially given the global nature of the study, the presence of an anomalous value of the trait under study, the omission of some significant factors, and the fact that the number of immigrants to a country depends on a large number of non-quantitative, personal factors and individual preferences.

Despite the absence of an accurate result and of a high-quality regression equation suitable for forecasting and further research, the study found that the wages of wage workers in a country, the unemployment rate and GDP have an important impact on the number of people who arrive in the country for permanent residence.



The main goal of multiple regression is to build a model with a large number of factors while determining the influence of each factor separately on the result, as well as the cumulative effect of the factors on the modeled indicator.

The specification of a multiple regression model includes the selection of factors and the choice of the type of mathematical function (the type of regression equation). The factors included in the multiple regression should be quantitatively measurable and should not be intercorrelated, still less be in an exact functional relationship (i.e., they should influence each other as little as possible and the effective attribute as much as possible).

The factors included in the multiple regression should explain the variation in the dependent variable. For example, if a model is built with a set of p factors, the determination indicator R² is found for it, which fixes the share of the explained variation of the effective attribute due to those p factors.

The influence of other factors not accounted for in the model is estimated as 1 − R², with the corresponding residual variance.

When an additional factor is included in the model, the value of the determination index should increase, and the value of the residual variance should decrease. If this does not happen, then the additional factor does not improve the model and is practically redundant, and the introduction of such a factor can lead to statistical insignificance of the regression parameters according to Student's t-test.

The selection of factors for multiple regression is carried out in two stages:

1. Factors are selected based on the essence of the problem.

2. Based on the matrix of correlation indicators, t-statistics are determined for the regression parameters.

Correlation coefficients between explanatory variables, which are also called intercorrelation coefficients, allow you to exclude duplicate factors from the model.

Two variables are said to be explicitly collinear if the pair correlation coefficient between them is large in absolute value (conventionally, 0.7 or more).

If the variables are clearly collinear, then they are in a strong linear relationship.



In the presence of clearly collinear variables, preference is given not to the factor more closely related to the result, but to the factor that, with a sufficiently close connection with the result, has the least close relationship with the other factors.

The magnitude of the pair correlation coefficients reveals only explicit collinearity of the factors.

When using multiple regression, multicollinearity of factors may occur, i.e. more than two factors may be linearly related. In such cases, OLS becomes less reliable for assessing individual factors, making it difficult to interpret the multiple regression parameters as characteristics of the action of a factor in its pure form. The linear regression parameters lose their economic meaning, the parameter estimates are unreliable, and large standard errors arise that can change with the volume of observations, i.e. the model becomes unsuitable for analyzing and forecasting the economic situation. The following methods are used to assess the multicollinearity of factors:

1. Determination of the determinant of the matrix of paired correlation coefficients between the factors. For example, if a linear multiple regression model y = a + b1·x1 + b2·x2 + b3·x3 + ε is given, then the determinant of the matrix of paired coefficients takes the form Det R = |1 r_x1x2 r_x1x3; r_x2x1 1 r_x2x3; r_x3x1 r_x3x2 1|.

If the value of this determinant is equal to 1, then the factors are not collinear with each other.

If there is a complete linear relationship between the factors, then all pair correlation coefficients are equal to 1, and the determinant of such a matrix equals 0. The closer the determinant of the interfactor correlation matrix is to zero, the stronger the multicollinearity of the factors.
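As an illustrative sketch of this determinant check (the data series are made up for the example, with x3 an exact multiple of x1 so that the determinant collapses toward zero):

```python
import math

def mean(v):
    return sum(v) / len(v)

def corr(x, y):
    # Pair (Pearson) correlation coefficient.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def det3(m):
    # Determinant of a 3x3 matrix by cofactor expansion.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]    # loosely related to x1
x3 = [2, 4, 6, 8, 10]   # exactly 2*x1, so r(x1, x3) = 1

R = [[corr(a, b) for b in (x1, x2, x3)] for a in (x1, x2, x3)]
d = det3(R)
# Because x3 duplicates x1, two rows of R coincide and Det R is (near) 0.
```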

2. Method of testing the hypothesis of the independence of the variables. In this case, the null hypothesis is H0: Det R = 1; it can be shown that the statistic FG = −[n − 1 − (2m + 5)/6]·ln(Det R), where n is the number of observations and m the number of factors, has an approximate χ² distribution with m(m − 1)/2 degrees of freedom (the Farrar-Glauber test).

If the actual value of the statistic exceeds the tabulated χ² value, then the null hypothesis is rejected and the factors are considered multicollinear.

By determining and comparing the coefficients of multiple determination, using each of the factors in turn as the dependent variable, it is possible to identify the factors responsible for multicollinearity: the factor with the highest value of the determination coefficient is responsible.

There are the following ways to overcome strong cross-factorial correlation:

1) exclusion of one or more factors from the model;

2) transformation of factors to reduce the correlation;

3) transition to combined regression equations, which reflect not only the factors but also their interaction;

4) transition to reduced-form equations, etc.

When constructing a multiple regression equation, one of the most important stages is the selection of the factors included in the model. Various approaches to factor selection based on correlation indicators have led to various methods, among which the most applicable are:

1) Exclusion method: factors are screened out of the model;

2) Inclusion method: additional factors are introduced one by one;

3) Stepwise regression analysis: previously introduced factors may be eliminated.

When selecting factors, the following rule is used: the number of factors included is usually 6-7 times less than the volume of the population on which the model is built.

The parameter a is not subject to economic interpretation. In the power model (a nonlinear multiple regression equation), the coefficients b1, b2, …, bp are elasticity coefficients: they show by how much, on average, the result changes when the corresponding factor changes by 1%, with the influence of the other factors remaining unchanged.

The subject and method of econometrics.

Econometrics is a science that gives a quantitative expression of the interactions of economic phenomena and processes.

Econometrics is any application of mathematics or statistical methods to the study of economic phenomena.

Econometrics is the science of modeling economic phenomena, which makes it possible to explain and predict their development, to identify and measure the determining factors.

It arose as a result of the fusion of statistics, economic theory and mathematical methods.

The subject of econometrics is economic phenomena.

Tasks of econometrics:

1.Construction of econometric models convenient for analysis (specification)

2. Estimation of parameters that make the selected model adequate to real data (parametrization)

3. Checking the quality of the found parameters and the model itself as a whole (verification)

4. Using the built models to explain the studied econometric indicators, their forecasting (forecasting and interpretation)

Methodological tools:

1.methods of mathematical and statistical regression analysis

2. time series analysis

3. solution of systems of simultaneous equations

4. testing statistical hypotheses

5. Methods for solving problems of specification and identification of models

6. multivariate statistical methods

Paired regression model specification

Pair regression is an equation that describes the correlation between a pair of variables: the dependent variable y (result) and the independent variable x (factor). y=f(x)

The function can be linear or non-linear.

Any econometric study begins with the specification of the model, i.e. formulation of the type of model, based on the relevant theory and the relationship between variables.

In each individual case, the value of y is the sum of two terms: y_j = ŷ_xj + ε_j, where

y_j is the actual value of the result, ŷ_xj is the theoretical value of the result found from the corresponding function of y and x, and ε_j is a random variable characterizing the deviation of the actual value of the result from the calculated one.

The presence of a random variable in the model is associated with: the specification of the model, the selective nature of the initial data, and the features of the measurement of variables.

3.Linear regression and correlation.

Linear regression is an equation that describes the correlation between a pair of variables: the dependent variable y (the result) and the independent variable x (the factor): y = f(x). The function can be linear or non-linear. Any econometric study begins with the specification of the model, i.e. the type of model is formulated based on the corresponding theory and the relationship between the variables. In each individual case the value of y is the sum of two terms, y_j = ŷ_xj + ε_j, where y_j is the actual value of the result, ŷ_xj is the theoretical value of the result, and ε_j is a random variable. The presence of the random variable in the model is associated with: 1) the specification of the model; 2) the sample nature of the initial data; 3) the features of the measurement of the variables. Pair regression describing a linear relationship can be represented in the form y_i = α + β·x_i + ε_i, i = 1, 2, …, N, where y_i is the i-th value of the dependent variable, α and β are the population parameters of the paired linear regression, and N is the size of the general population. The practical regression is built from sample data and is written as y_i = a + b·x_i + e_i, i = 1, 2, …, n, where n is the sample size and a and b are sample estimates of the population parameters of the paired regression. The parameters a and b are estimated by the method of least squares (LSM), with the help of which a system of linear equations for a and b is built.
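A minimal sketch of the least-squares estimation just described, using hypothetical data (the closed-form expressions below are equivalent to solving the system of normal equations for a and b):

```python
def ols_pair(x, y):
    # Least-squares estimates of a and b in y = a + b*x.
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 5.9, 8.1, 10.0]   # roughly y = 2x
a, b = ols_pair(x, y)            # b close to 2, a close to 0
```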



b is the regression coefficient; it shows the average change in the result when the factor changes by one unit. The sign of the coefficient b shows the direction of the relationship: if b > 0, the relationship is direct; if b < 0, it is inverse.

a is the free term (intercept) of the regression, the value of y at x = 0. This parameter has no economic content; only the sign in front of it can be interpreted: if a > 0, the relative change in the result is slower than the change in the factor; if a < 0, the change in the result outpaces the change in the factor. The regression equation is always supplemented with an indicator of the closeness of the relationship, which here is the linear correlation coefficient r_yx.

r_yx = (mean(xy) − mean(x)·mean(y)) / (σ_x·σ_y), where σ_x = sqrt(Σ(x − x̄)²/n) and σ_y = sqrt(Σ(y − ȳ)²/n).
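The same formula can be computed directly. A sketch with hypothetical data lying exactly on a straight line, so the coefficient comes out equal to 1:

```python
import math

def corr_xy(x, y):
    # r = (mean(x*y) - mean(x)*mean(y)) / (sigma_x * sigma_y),
    # with the sigmas taken over n (population form), as above.
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    mxy = sum(a * b for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return (mxy - mx * my) / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]      # exact line y = 1 + 2x, so r = 1
r = corr_xy(x, y)
```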



The linear correlation coefficient lies within −1 ≤ r_xy ≤ 1; for b > 0, 0 ≤ r_xy ≤ 1; for b < 0, −1 ≤ r_xy ≤ 0. To assess the quality of the linear function, the coefficient of determination D = r²_xy is calculated. After the linear regression equation is found, the significance of the equation as a whole and of its individual parameters is assessed. To assess the significance of the regression equation, Fisher's F-test is used: F = (r² / (1 − r²))·(n − 2). The calculated value is compared with the table value at the significance level α = 0.05. If F_calc > F_table, the regression equation is recognized as significant. In linear regression, the significance of not only the model as a whole but also of its individual parameters is evaluated; for this purpose the standard errors of the parameters, m_a and m_b, are determined.

m_b = sqrt( (Σ(y − ŷ)² / (n − 2)) / Σ(x − x̄)² )

m_a = sqrt( S²·Σx² / (n·Σ(x − x̄)²) ); t_a = a / m_a. The significance of the correlation coefficient is checked from the magnitude of its error m_r = sqrt((1 − r²)/(n − 2)); the actual value of Student's t-test is t_r = r·sqrt(n − 2) / sqrt(1 − r²). The model parameters are significant if t_calc > t_table; otherwise they are not.
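A sketch putting the standard-error and t-test formulas above together on hypothetical data (the fit itself is done inline by least squares):

```python
import math

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.2, 3.9, 6.1, 8.0, 9.8, 12.2, 13.9, 16.1]   # hypothetical
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# OLS fit of y = a + b*x.
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
a = my - b * mx

# Residual variance on n - 2 degrees of freedom.
s2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

m_b = math.sqrt(s2 / sxx)                                   # error of b
m_a = math.sqrt(s2 * sum(xi ** 2 for xi in x) / (n * sxx))  # error of a
t_b = b / m_b
t_a = a / m_a
# t_b is compared with the tabulated Student value for n - 2 df;
# here the slope is strongly significant.
```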

Econometrics cheat sheets.

No. 1. MODEL SPECIFICATION

Simple regression is a regression between two variables, y and x, i.e. a model of the form y = f(x) + ε, where y is the effective (dependent) attribute and x is the factor attribute.

Multiple regression is a regression of the effective attribute on two or more factors, i.e. a model of the form y = f(x1, x2, …, xp) + ε.

Model specification is the formulation of the type of model based on the relevant theory of the relationship between variables. In the regression equation, the essentially correlational relationship of the attributes is represented as a functional relationship expressed by the corresponding mathematical function: y_j = ŷ_xj + ε_j, where y_j is the actual value of the effective attribute;

ŷ_xj is the theoretical value of the effective attribute;

ε_j is a random variable that characterizes the deviation of the real value of the effective attribute from the theoretical one.

The random variable ε is also called the disturbance. It includes the influence of factors not taken into account in the model, random errors, and measurement features.

The magnitude of the random errors depends on a correctly chosen model specification: they are the smaller, the better the theoretical values of the effective attribute fit the actual data y.

Specification errors include the wrong choice of mathematical function for ŷ and the omission of a significant factor from the regression equation, i.e. the use of paired regression instead of multiple regression.

Sampling errors - the researcher most often deals with sample data when establishing a regular relationship between features.

Measurement errors practically negate all efforts to quantify the relationship between features. The focus of econometric research is on model specification errors.

In pair regression, the choice of the type of mathematical function can be carried out by three methods: graphical, analytical and experimental.

The graphical method is based on the correlation field. Analytical method is based on the study of the material nature of the connection of the studied features.

The experimental method is carried out by comparing the values of the residual variance D_res calculated for different models. If the actual values of the resulting attribute coincide with the theoretical ones (y = ŷ), then D_res = 0. If there are deviations of the actual data from the theoretical ones (y − ŷ ≠ 0), then D_res > 0.

The smaller the residual variance, the better the regression equation fits the original data. The number of observations should be 6 - 7 times greater than the number of calculated parameters for the variable x.

#2 LINEAR REGRESSION AND CORRELATION: MEANING AND ASSESSMENT OF PARAMETERS.

Linear regression is reduced to finding an equation of the form ŷ = a + b·x (or y = a + b·x + ε).

Such an equation allows one, for given values of the factor x, to obtain theoretical values of the effective attribute by substituting the actual values of the factor into it.

The construction of a linear regression is reduced to estimating its parameters a and b.

Linear regression parameter estimates can be found by different methods.

1. b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² (equivalently, b = cov(x, y) / σ²_x);

2. a = ȳ − b·x̄.

The parameter b is called the regression coefficient. Its value shows the average change in the result when the factor changes by one unit.

Formally, a is the value of y at x = 0. If the factor attribute does not and cannot have a zero value, then this interpretation of the free term does not make sense. The parameter a may have no economic content; attempts to interpret it economically can lead to absurdity, especially when a < 0.

Only the sign of the parameter a can be interpreted. If a > 0, then the relative change in the result is slower than the change in the factor.

The regression equation is always supplemented with an indicator of the tightness of the relationship. When using linear regression, such an indicator is the linear correlation coefficient r xy . There are various modifications of the linear correlation coefficient formula.

The linear correlation coefficient lies within the limits −1 ≤ r_xy ≤ 1. The closer r is to 0, the weaker the correlation; conversely, the closer r is to 1 or −1, the stronger the correlation, i.e. the dependence of x and y is close to linear. If r is exactly 1 or −1, all points lie on the same straight line. If the regression coefficient b > 0, then 0 ≤ r_xy ≤ 1; conversely, for b < 0, −1 ≤ r_xy ≤ 0. The correlation coefficient reflects only the degree of linear dependence between the values and may miss a pronounced dependence of another type.

To assess the quality of the selection of a linear function, the square of the linear correlation coefficient is calculated, called determination coefficient. The coefficient of determination characterizes the proportion of the variance of the resulting feature y, explained by the regression. The corresponding value characterizes the proportion of dispersion y, caused by the influence of other factors not taken into account in the model.

No. 3. MNK.

LSM allows one to obtain estimates of the parameters a and b for which the sum of the squared deviations of the actual values of the resulting attribute y from the calculated (theoretical) values ŷ is a minimum: Σ(y − ŷ)² → min.

In other words, from the entire set of lines, the regression line on the graph is chosen so that the sum of the squares of the vertical distances between the points and this line is minimal. The system of normal equations is solved: Σy = n·a + b·Σx; Σxy = a·Σx + b·Σx².

No. 4. ASSESSMENT OF THE SIGNIFICANCE OF LINEAR REGRESSION AND CORRELATION PARAMETERS.

The assessment of the significance of the regression equation as a whole is given using Fisher's F-test. In this case, a null hypothesis is put forward that the regression coefficient is equal to zero, i.e. b = 0, and hence the factor X does not affect the result y.

The direct calculation of the F-criterion is preceded by an analysis of variance. Central to it is the expansion of the total sum of squared deviations of the variable y from its average value ȳ into two parts, "explained" and "unexplained":

Total sum of squared deviations: Σ(y − ȳ)² = Σ(ŷ − ȳ)² + Σ(y − ŷ)²,

i.e. the sum of squared deviations explained by the regression plus the residual sum of squared deviations.

Any sum of squared deviations is related to the number of degrees of freedom, i.e. the number of independent variations of the attribute. The number of degrees of freedom is related to the number of units of the population n and the number of constants determined from it. With regard to the problem under study, the number of degrees of freedom should show how many independent deviations out of n possible are required to form a given sum of squares.

Variance per one degree of freedom: D = sum of squared deviations / number of degrees of freedom.

F-ratio (F-criterion): F = D_fact / D_res.

If the null hypothesis is true, the factor and residual variances do not differ from each other. To refute H0, the factor variance must exceed the residual variance several times. The American statistician Snedecor developed tables of critical values of F-ratios for different significance levels of the null hypothesis and different numbers of degrees of freedom. The tabulated value of the F-criterion is the maximum value of the ratio of variances that can occur by chance at a given probability level under the null hypothesis. The calculated value of the F-ratio is recognized as reliable if it is greater than the tabulated value; in this case the null hypothesis about the absence of a relationship between the attributes is rejected and a conclusion is drawn about the significance of this relationship: F_fact > F_table, so H0 is rejected.

If the value is less than the tabulated one (F_fact < F_table), then the probability of the null hypothesis is above the given level, and it cannot be rejected without serious risk of drawing an incorrect conclusion about the presence of a relationship. In this case the regression equation is considered statistically insignificant: H0 is not rejected.
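The variance decomposition and F-ratio described above can be sketched numerically (hypothetical data; the 5% critical value for 1 and 4 degrees of freedom, 7.71, is the standard tabulated figure):

```python
x = [1, 2, 3, 4, 5, 6]
y = [1.9, 4.1, 6.0, 7.9, 10.1, 12.0]   # hypothetical data
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
    / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx
yhat = [a + b * xi for xi in x]

ss_total = sum((yi - my) ** 2 for yi in y)
ss_explained = sum((yh - my) ** 2 for yh in yhat)
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
# Decomposition: total = explained + residual.

F = (ss_explained / 1) / (ss_residual / (n - 2))
# F greatly exceeds the 5% table value for 1 and 4 df, so H0 is rejected.
```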

Standard error of the regression coefficient: m_b = sqrt( S²_res / Σ(x − x̄)² ), where S²_res = Σ(y − ŷ)² / (n − 2).

To assess the significance of the regression coefficient, its value is compared with its standard error, i.e. the actual value of Student's t-test is determined: t_b = b / m_b, which

then it is compared with the table value at a certain level of significance and the number of degrees of freedom (n-2).

Standard error of the parameter a: m_a = sqrt( S²_res·Σx² / (n·Σ(x − x̄)²) ).

The significance of the linear correlation coefficient is tested on the basis of the magnitude of the correlation coefficient error m_r: m_r = sqrt( (1 − r²) / (n − 2) ), t_r = r / m_r.

The total variance of the attribute x: σ²_x = Σ(x − x̄)² / n.

The regression coefficient b: its value shows the average change in the result when the factor changes by one unit.

Approximation error: Ā = (1/n)·Σ|(y − ŷ)/y|·100%.

No. 5. FORECAST INTERVALS FOR THE LINEAR REGRESSION EQUATION

Assessment of the statistical significance of the regression parameters is carried out using Student's t-statistics and by calculating a confidence interval for each indicator. A hypothesis H0 is put forward that the indicators do not differ statistically from zero: a = b = r = 0. The standard errors of the parameters a, b, r are calculated together with the actual values of Student's t-criterion.

The statistical significance of the parameters is determined:

t_a > t_table: a is statistically significant;

t_b > t_table: b is statistically significant.

The boundaries of confidence intervals are found.

An analysis of the upper and lower boundaries of the confidence intervals leads to the conclusion that the parameters a and b, lying within the stated boundaries, do not take zero values, i.e. they are not statistically insignificant and differ significantly from 0.

No. 6. NONLINEAR REGRESSION. TYPES OF MODELS

If there are non-linear relationships between economic phenomena, then they are expressed using the corresponding non-linear functions: for example, the equilateral hyperbola y = a + b/x, the second-degree parabola y = a + b·x + c·x², etc.

There are two classes of non-linear regressions:

regressions that are non-linear with respect to the explanatory variables included in the analysis, but linear with respect to the estimated parameters;

Regressions that are non-linear in the estimated parameters.
The following functions can serve as an example of non-linear regression on the explanatory variables included in it:

Polynomials of various degrees, e.g. y = a + b·x + c·x² + d·x³ + ε

Equilateral hyperbola: y = a + b/x + ε

Nonlinear regressions by estimated parameters include the following functions:

Power: y = a·x^b·ε

Exponential: y = a·b^x·ε

Exponential (base e): y = e^(a + b·x)·ε

No. 7. THE MEANING OF THE REGRESSION COEFFICIENT.

The parameter b is called the regression coefficient. Its value shows the average change in the result when the factor changes by one unit. An estimate of the regression coefficient can be obtained without resorting to the method of least squares: an alternative estimate of the parameter b can be found based on the content of this coefficient, by comparing the change in the result with the change in the factor, b = Δy / Δx.

The total sum of squared deviations of the individual values of the resulting attribute y from the average value ȳ is caused by the influence of many factors. We conditionally divide the entire set of causes into two groups: the studied factor x and all other factors.

If the factor does not affect the result, then the regression line on the graph is parallel to the Ox axis and ŷ = ȳ. Then the entire variance of the resulting attribute is due to the influence of other factors, and the total sum of squared deviations coincides with the residual sum. If other factors do not affect the result, then y is related to x functionally and the residual sum of squares is zero. In this case, the sum of squared deviations explained by the regression coincides with the total sum of squares.

Since not all points of the correlation field lie on the regression line, their scatter is always due partly to the influence of the factor x, i.e. the regression of y on x, and partly to the action of other causes (unexplained variation). The suitability of the regression line for prediction depends on how much of the total variation of the attribute y falls on the explained variation.

Obviously, if the sum of squared deviations due to regression is greater than the residual sum of squares, then the regression equation is statistically significant and the factor X has a significant impact on the outcome

Any sum of squared deviations is related to the number of degrees of freedom, i.e. the number of independent variations of the attribute. The number of degrees of freedom is related to the number of units of the population n and the number of constants determined from it. In relation to the problem under study, the number of degrees of freedom should show how many independent deviations out of n possible are required to form a given sum of squares.

No. 8. APPLICATION OF LSM TO NONLINEAR MODELS WITH RESPECT TO INCLUDED VARIABLES AND EVALUATED PARAMETERS.

Nonlinear regression in the included variables presents no difficulties in estimating its parameters: they are determined, as in linear regression, by the method of least squares (LSM), because these functions are linear in the parameters. Thus, in a parabola of the second degree y = a0 + a1·x + a2·x² + ε, replacing the variables x = x1, x² = x2, we get a two-factor linear regression equation: y = a0 + a1·x1 + a2·x2 + ε.

A parabola of the second degree is appropriate when, over a certain interval of factor values, the nature of the relationship between the attributes under consideration changes: a direct relationship changes to an inverse one, or an inverse one to a direct one. In this case, the value of the factor at which the maximum (or minimum) value of the effective attribute is achieved is found by setting the first derivative of the second-degree parabola y = a + b·x + c·x² to zero: y′ = b + 2c·x = 0, whence x = −b/(2c).
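As a small numerical sketch of the extremum condition just derived (the coefficient values here are illustrative, not taken from the text):

```python
def parabola_extremum(b, c):
    # Setting y' = b + 2*c*x to zero gives x* = -b / (2*c).
    return -b / (2 * c)

# y = 1 + 4x - 0.5x^2: the direct relationship turns inverse at x* = 4.
x_star = parabola_extremum(4.0, -0.5)
```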

The use of least squares to estimate the parameters of a second-degree parabola leads to the following system of normal equations: Σy = n·a + b·Σx + c·Σx²; Σxy = a·Σx + b·Σx² + c·Σx³; Σx²y = a·Σx² + b·Σx³ + c·Σx⁴.

It can be solved by the determinant method:

In models that are non-linear in the estimated parameters but reducible to linear form, LSM is applied to the transformed equations. If in a linear model and in models non-linear in the variables the parameters are estimated from the criterion Σ(y − ŷ)² → min, then in models non-linear in the estimated parameters the LSM requirement is applied not to the initial values of the resulting attribute but to their transformed values, e.g. ln y or 1/y. Thus, in the power function, the method of least squares is applied to the transformed equation ln y = ln α + β·ln x + ln ε. This means that the parameter estimation is based on minimizing the sum of squared deviations in logarithms: Σ(ln y − ln ŷ)² → min. As a result, the parameter estimates turn out to be somewhat biased.
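A minimal sketch of this log-transformation approach for the power model y = α·x^β: regress ln y on ln x by ordinary least squares and recover α = exp(intercept). The data are hypothetical, generated exactly from y = 3·x², so the recovered parameters should be β = 2, α = 3:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0 * xi ** 2 for xi in x]   # exact power-law data

# Transform to log-linear form: ln y = ln(alpha) + beta * ln x.
lx = [math.log(v) for v in x]
ly = [math.log(v) for v in y]
n = len(lx)
mlx, mly = sum(lx) / n, sum(ly) / n

beta = sum((a - mlx) * (b - mly) for a, b in zip(lx, ly)) \
       / sum((a - mlx) ** 2 for a in lx)
alpha = math.exp(mly - beta * mlx)
# Exact data -> beta = 2 and alpha = 3 up to floating-point error.
```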

No. 9. ELASTICITY COEFFICIENTS FOR DIFFERENT TYPES OF REGRESSION MODELS.

1. Linear: y = a + b·x + ε; y′ = b; E = b·x / (a + b·x).

2. Second-order parabola: y = a + b·x + c·x² + ε; y′ = b + 2c·x; E = (b + 2c·x)·x / (a + b·x + c·x²).

3. Hyperbola: y = a + b/x + ε; y′ = −b/x²; E = −b / (a·x + b).

4. Exponential: y = a·b^x + ε; y′ = a·b^x·ln b; E = x·ln b.

5. Power: y = a·x^b + ε; y′ = a·b·x^(b−1); E = b.

6. Semilogarithmic: y = a + b·ln x + ε; y′ = b/x; E = b / (a + b·ln x).

7. Logistic: y = a / (1 + b·e^(−cx)) + ε; y′ = a·b·c·e^(−cx) / (1 + b·e^(−cx))²; E = b·c·x·e^(−cx) / (1 + b·e^(−cx)).

8. Inverse: y = 1 / (a + b·x) + ε; y′ = −b / (a + b·x)²; E = −b·x / (a + b·x).
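As a quick numerical check of the point elasticity E = y′(x)·x / y(x) for two rows of the table above (the coefficient values are illustrative):

```python
def elasticity_linear(a, b, x):
    # y = a + b*x  ->  E = b*x / (a + b*x)
    return b * x / (a + b * x)

def elasticity_power(b):
    # y = a * x^b  ->  E = b, the same for any x
    return b

E_lin = elasticity_linear(a=2.0, b=3.0, x=4.0)   # 12/14, about 0.857
E_pow = elasticity_power(b=0.8)                  # constant elasticity
```

The power model is the only one in the table with a constant elasticity, which is why its coefficient b is read directly as a percentage response.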

#10 CORRELATION INDICATORS

1. The correlation index (determination index) R = sqrt(1 − Σ(y − ŷ)² / Σ(y − ȳ)²). The value of this indicator is within the limits 0 ≤ R ≤ 1: the closer it is to 1, the closer the relationship of the attributes under consideration and the more reliable the found regression equation.

2. The determination index is used to check the significance of the non-linear regression equation as a whole according to Fisher's F-criterion: F = (R² / (1 − R²))·((n − m − 1) / m), where m is the number of parameters for the variables x.

No. 11. MULTIPLE REGRESSION. MODEL SPECIFICATION. SELECTION OF FACTORS IN CONSTRUCTION OF THE MODEL.

Regression can give a good result in modeling if the influence of other factors affecting the object of study can be neglected. The behavior of individual economic variables cannot be controlled, i.e. it is not possible to ensure the equality of all other conditions for assessing the influence of one factor under study. In this case, one should try to identify the influence of other factors by introducing them into the model, i.e. build a multiple regression equation: y = a + b1·x1 + b2·x2 + … + bp·xp + ε. This kind of equation can be used in the study of consumption. Then the coefficients bj are partial derivatives of consumption y with respect to the relevant factors xi, assuming that all the other xi are constant. In the 1930s Keynes formulated his consumption function hypothesis, and since then researchers have repeatedly addressed the problem of its improvement. The modern consumption function is most often thought of as a model of the form C = f(y, P, M, Z), where C is consumption; y is income; P is the price level (cost-of-living index); M is cash money; Z is liquid assets; at the same time, 0 < ∂C/∂y < 1. The main goal of multiple regression is to build a model with a large number of factors, while determining the influence of each of them individually, as well as their cumulative impact on the modeled indicator. The specification of the model includes two sets of questions: the selection of factors and the choice of the type of regression equation. Requirements for the factors: 1) they must be quantitatively measurable; if it is necessary to include a qualitative factor that has no quantitative measurement, it must be given quantitative definiteness (for example, in a yield model, soil quality is given in the form of points); 2) they must not be intercorrelated, still less be in an exact functional relationship. The inclusion in the model of factors with high intercorrelation, when the correlation between the factors exceeds their correlation with the result, can, for the dependence y = a + b1·x1 + b2·x2 + … + bp·xp + ε, lead to undesirable consequences and entail instability and unreliability of the estimates of the regression coefficients.
If there is a high correlation between the factors, then it is impossible to determine their isolated influence on the performance indicator, and the parameters of the regression equation are not interpreted.

The factors included in the multiple regression should explain the variation in the dependent variable. If a model is built with a set of p factors, the determination indicator R² is calculated for it, which fixes the proportion of the explained variation of the resulting attribute due to the p factors considered in the regression. The influence of other factors not taken into account in the model is estimated as 1 − R², with the corresponding residual variance S². With the additional inclusion of a (p + 1)-th factor in the regression, the coefficient of determination should increase and the residual variance should decrease. Saturating the model with unnecessary factors not only fails to reduce the residual variance and increase the determination index, but also leads to statistical insignificance of the regression parameters according to Student's t-test.

Thus, although theoretically the regression model allows one to take into account any number of factors, in practice this is not necessary. The selection of factors is based on qualitative theoretical and economic analysis, which is usually carried out in two stages: at the first stage, factors are selected based on the nature of the problem; at the second stage, t-statistics for the regression parameters are determined based on the correlation indicators. Intercorrelation coefficients (i.e. correlations between explanatory variables) allow duplicative factors to be eliminated from the model. Two variables are considered explicitly collinear, i.e. linearly related to each other, if the pair correlation coefficient between them is large in absolute value (conventionally, 0.7 or more). If the factors are clearly collinear, they duplicate each other, and it is recommended to exclude one of them from the regression. In this case, preference is given not to the factor more closely related to the result, but to the factor that, with a sufficiently close connection with the result, has the least close connection with the other factors. This requirement reveals the specificity of multiple regression as a method of studying the complex impact of factors under conditions of their independence from one another. The greatest difficulties in using the apparatus of multiple regression arise in the presence of multicollinearity of factors, when more than two factors are interconnected by a linear relationship. The presence of factor multicollinearity may mean that some factors always act in unison. As a result, the variation in the original data is no longer completely independent, and it is impossible to assess the impact of each factor separately. The stronger the multicollinearity of the factors, the less reliable the least-squares (LSM) estimate of the distribution of the explained variation over the individual factors.
The inclusion of multicollinear factors in the model is undesirable because of the following consequences: 1) it is difficult to interpret the parameters of the multiple regression as characteristics of the action of the factors in a "pure" form, because the factors are correlated, and the linear regression parameters lose their economic meaning; 2) the parameter estimates are unreliable, exhibit large standard errors, and change with the volume of observations. To assess the multicollinearity of factors, one can use the determinant of the matrix of paired correlation coefficients between the factors.

If the factors were not correlated with each other, the matrix of pairwise correlation coefficients between them would be an identity matrix. For an equation with three explanatory variables, y = a + b1x1 + b2x2 + b3x3 + e, the matrix of inter-factor correlation coefficients would then have a determinant equal to one: Det = 1, because r x1x1 = r x2x2 = r x3x3 = 1 and r x1x2 = r x1x3 = r x2x3 = 0. If, on the contrary, there is a complete linear dependence between the factors and all the correlation coefficients equal one, the determinant of such a matrix equals zero. The closer the determinant of the inter-factor correlation matrix is to zero, the stronger the multicollinearity of the factors and the less reliable the results of the multiple regression; conversely, the closer the determinant is to one, the weaker the multicollinearity of the factors.
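As a rough sketch (the correlation values below are invented for illustration), this determinant check can be coded directly:

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0]*(m[1][1]*m[2][2] - m[1][2]*m[2][1])
          - m[0][1]*(m[1][0]*m[2][2] - m[1][2]*m[2][0])
          + m[0][2]*(m[1][0]*m[2][1] - m[1][1]*m[2][0]))

# Hypothetical pairwise correlations between three factors.
r12, r13, r23 = 0.10, 0.20, 0.15            # weakly correlated factors
weak = [[1, r12, r13], [r12, 1, r23], [r13, r23, 1]]

r12, r13, r23 = 0.95, 0.90, 0.92            # strongly collinear factors
strong = [[1, r12, r13], [r12, 1, r23], [r13, r23, 1]]

print(round(det3(weak), 3))    # close to 1: little multicollinearity
print(round(det3(strong), 3))  # close to 0: severe multicollinearity
```

The closer the printed determinant is to zero, the stronger the multicollinearity.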

No. 12. WHAT DOES THE INTERACTION OF FACTORS MEAN AND HOW CAN IT BE PRESENTED GRAPHICALLY?

One way to take the internal correlation of factors into account is to pass to combined regression equations, that is, to equations that reflect not only the influence of the factors but also their interaction. Thus, if y = f(x1, x2, x3), a combined equation of the following form can be constructed: y = a + b1x1 + b2x2 + b3x3 + b12x1x2 + b13x1x3 + b23x2x3 + e. This equation includes first-order interactions (interactions of two factors). Interactions of higher order can be included in the model if their statistical significance according to Fisher's F-test is established. If the analysis of the combined equation shows that only the interaction of the factors x1 and x3 is significant, the equation takes the form: y = a + b1x1 + b2x2 + b3x3 + b13x1x3 + e. An interaction between the factors x1 and x3 means that at different levels of the factor x3 the influence of the factor x1 on y is not the same, i.e., it depends on the value of x3. In the figure, the interaction of factors is represented by non-parallel lines relating x1 to the result y; conversely, parallel lines of influence of the factor x1 on y at different levels of the factor x3 mean the absence of interaction between x1 and x3. Charts:

a - x1 affects y, and this influence is the same both for x3 = B1 and for x3 = B2 (the regression lines have the same slope), which means there is no interaction between the factors x1 and x3; b - as x1 grows, the resulting attribute y increases at x3 = B1, while as x1 grows, y decreases at x3 = B2; between x1 and x3 there is interaction. Combined regression equations are built, for example, when studying the effect of different types of fertilizers on yield. The problem of eliminating the multicollinearity of factors can also be addressed by passing to reduced-form equations: for this purpose, the factor under consideration is substituted into the regression equation through its expression from another equation.

No. 13. INTERPRETATION OF REGRESSION COEFFICIENTS OF LINEAR CONSUMPTION MODEL. MEANING OF THE SUM b i IN PRODUCTION FUNCTIONS AND THE VALUE OF THE SUM b i >1 . COEFFICIENTS USED TO ASSESS THE COMPARATIVE STRENGTH OF THE IMPACT OF FACTORS ON THE RESULT.

Consumption function: C = K·y + L, where C is consumption, y is income, and K and L are the parameters of the function (y = C + I, where I is the volume of investment). Suppose the consumption function is C = 1.9 + 0.65·y. The regression coefficient characterizes the propensity to consume: it shows that out of every thousand rubles of income, an average of 650 rubles is spent on consumption and 350 rubles is invested. In production functions:

where P is the quantity of product produced by m production factors (F1, F2, ..., Fm); each b is a parameter equal to the elasticity of the quantity of output with respect to the quantity of the corresponding production factor.

Not only does the coefficient b of each factor make economic sense, but so does their sum, i.e., the sum of the elasticities: B = b1 + b2 + ... + bm. This value gives a generalized characteristic of the elasticity of production.

In practical calculations B is not always equal to one; it can be either greater or less than unity. In this case the value B gives an approximate estimate of the elasticity of output when each factor of production increases by 1% under increasing (B > 1) or decreasing (B < 1) returns to scale. Thus, if P = 2.4·F1^0.3 · F2^0.7 · F3^0.2, then as the value of each production factor increases by 1%, overall output increases by approximately 1.2%.
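A minimal numerical check of this statement (the factor levels are chosen arbitrarily; the function is the one quoted above, with the first exponent taken as 0.3 so that the exponents sum to 1.2):

```python
def output(f1, f2, f3, a=2.4, b1=0.3, b2=0.7, b3=0.2):
    """Cobb-Douglas style production function P = a * F1^b1 * F2^b2 * F3^b3."""
    return a * f1**b1 * f2**b2 * f3**b3

p0 = output(10, 20, 30)
p1 = output(10*1.01, 20*1.01, 30*1.01)   # every factor raised by 1%
growth = (p1/p0 - 1) * 100
print(round(growth, 2))   # approximately 1.2, i.e. the elasticity sum B = 0.3+0.7+0.2
```

The result does not depend on the chosen factor levels: scaling every factor by 1.01 scales output by 1.01^B.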

No. 14. THE PURPOSE OF PARTIAL CORRELATION WHEN CONSTRUCTING A MULTIPLE REGRESSION MODEL. The factors involved in a multiple linear regression can be ranked using standardized regression coefficients or, for linear relationships, partial correlation coefficients. With a non-linear relationship between the features under study, this function is performed by partial indices of determination. In addition, partial correlation indicators are widely used in solving the problem of factor selection: the expediency of including a particular factor in the model is judged by the value of the partial correlation indicator.

Partial Coefficients (or Indices) of Correlation characterize the closeness of the relationship between the result and the corresponding factor when eliminating the influence of other factors included in the regression equation.

Partial correlation indicators are the ratio of the reduction in residual variance due to the additional inclusion of a new factor in the analysis to the residual variance that took place before its introduction into the model.

Partial correlation coefficients, measuring the impact of the factor x i on y with the other factors held at a constant level, can be determined by the formula:

With two factors and i=1, this formula will take the form:

Partial correlation coefficients vary from -1 to 1.
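For the two-factor case this can be sketched as follows (the data are invented; the formula used is the standard expression of the first-order partial correlation coefficient through the pairwise coefficients):

```python
import math

def pearson(a, b):
    """Ordinary (pairwise) linear correlation coefficient."""
    n = len(a)
    ma, mb = sum(a)/n, sum(b)/n
    cov = sum((u - ma)*(v - mb) for u, v in zip(a, b))
    va = sum((u - ma)**2 for u in a)
    vb = sum((v - mb)**2 for v in b)
    return cov / math.sqrt(va*vb)

def partial_corr(y, x1, x2):
    """r_{y x1 . x2}: closeness of the y-x1 relationship with x2 eliminated."""
    ryx1, ryx2, rx1x2 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (ryx1 - ryx2*rx1x2) / math.sqrt((1 - ryx2**2)*(1 - rx1x2**2))

# Invented data: y is almost linear in x1; x2 adds little on its own.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y  = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
r = partial_corr(y, x1, x2)
print(round(r, 3))   # stays within [-1, 1], like any correlation coefficient
```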

No. 15. THE PARTIAL F-TEST, ITS DIFFERENCE FROM THE SEQUENTIAL F-TEST, AND THE CONNECTION BETWEEN STUDENT'S t-TEST FOR ASSESSING THE SIGNIFICANCE OF b i AND THE PARTIAL F-TEST.

Because the factors are correlated with one another, the significance of one and the same factor may differ depending on the sequence of its introduction into the model. The measure for evaluating the inclusion of a factor in the model is the partial F-test, i.e., F x i. In general, for the factor x i the partial F-test is defined as:

If we consider the equation y = a + b1x1 + b2x2 + b3x3 + e, the F-tests are determined sequentially: first the F-test for an equation with the single factor x1, then the F-test for the additional inclusion of the factor x2 in the model, i.e., for the transition from a one-factor to a two-factor regression equation, and finally the F-test for the additional inclusion of the factor x3, i.e., an assessment of the significance of x3 after the factors x1 and x2 are already in the model. Here the F-test for including x2 after x1 is a sequential F-test, in contrast to the F-test for the additional inclusion of the factor x3, which is a partial F-test, because it evaluates the significance of a factor under the assumption that the factor is included in the model last. It is the partial F-test that is connected with Student's t-test. The sequential F-test may interest the researcher at the stage of model formation. For the equation y = a + b1x1 + b2x2 + b3x3 + e, assessing the significance of the regression coefficients b1, b2, b3 involves calculating three inter-factor coefficients of determination, namely:

From the relationship between b i and its partial F-test we obtain:
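A small sketch of the partial F-test computed from the increment of R² on adding the last factor, together with the link t² = F (the R² values, n and m below are invented for the example):

```python
def partial_F(r2_full, r2_reduced, n, m):
    """Partial F-test for the factor added last.
    r2_full: R^2 of the model with the factor; r2_reduced: R^2 without it;
    n: number of observations; m: number of factors in the full model."""
    return (r2_full - r2_reduced) / ((1 - r2_full) / (n - m - 1))

# Hypothetical values: adding x3 lifts R^2 from 0.72 to 0.78; n = 30, m = 3.
F = partial_F(0.78, 0.72, 30, 3)
t = F ** 0.5     # Student's t for the coefficient b3 satisfies t^2 = F_partial
print(round(F, 2), round(t, 2))
```

If F exceeds the tabulated critical value, the additional inclusion of the factor is statistically justified.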

No. 16. PREREQUISITES OF OLS.

When estimating the parameters of the regression equation, OLS is used. Certain prerequisites are imposed on the random component e, which is an unobservable quantity.

Residual analysis involves checking the following five OLS prerequisites: 1) the random nature of the residuals; 2) a zero mean value of the residuals, independent of x i;

3) homoscedasticity: the variance of each deviation e i is the same for all values of x; 4) absence of autocorrelation of the residuals, i.e., the residual values e i are distributed independently of one another; 5) the residuals follow a normal distribution.

1. The randomness of the residuals e i is checked by plotting the residuals against the theoretical values of the resulting attribute. If the plot shows a horizontal band, the residuals e i are random variables and OLS is justified: the theoretical values y x approximate the actual values of y well. Otherwise, one must either apply a different function or introduce additional information and rebuild the regression equation until the residuals e i become random variables.

2. The second OLS premise, concerning the zero mean of the residuals, means that Σ(y − y x) = 0. This holds for linear models and for models that are non-linear only in the included variables. To check it, along with the above plot of the residuals against the theoretical values y x, the random residuals are plotted against each factor x i included in the regression. If the residuals on the plot form a horizontal band, they are independent of the values of x i. If the plot shows a dependence of the residuals on x i, the model is inadequate; the reasons for inadequacy may vary.

3. The third OLS premise requires the variance of the residuals to be homoscedastic: for each value of the factor x i the residuals e i have the same variance. If this condition for applying OLS is not met, heteroscedasticity occurs. The presence of heteroscedasticity can be seen clearly from the correlation field.

4. Absence of autocorrelation of the residuals, i.e., the residual values are distributed independently of one another. Autocorrelation of the residuals means the presence of a correlation between the residuals of the current and previous (or subsequent) observations. The absence of autocorrelation of the residuals ensures the consistency and efficiency of the estimates of the regression coefficients.
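The second premise can be verified mechanically: with an intercept, OLS residuals sum to zero by construction. A sketch on invented data:

```python
def ols(x, y):
    """Simple OLS for y = a + b*x, closed-form estimates."""
    n = len(x)
    mx, my = sum(x)/n, sum(y)/n
    b = sum((u - mx)*(v - my) for u, v in zip(x, y)) / sum((u - mx)**2 for u in x)
    a = my - b*mx
    return a, b

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.2, 3.9, 6.1, 8.0, 9.7, 12.3, 13.8, 16.1]   # invented observations
a, b = ols(x, y)
resid = [v - (a + b*u) for u, v in zip(x, y)]

# Premise 2: the residuals of an OLS fit with an intercept sum to zero.
print(abs(sum(resid)) < 1e-9)   # True
```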

No. 17. THE ESSENCE OF RESIDUAL ANALYSIS FOR A REGRESSION MODEL. HOW TO CHECK FOR HOMO- OR HETEROSCEDASTICITY OF THE RESIDUALS. ASSESSING THE ABSENCE OF AUTOCORRELATION OF THE RESIDUALS WHEN CONSTRUCTING A STATISTICAL REGRESSION MODEL.

For this purpose, a plot of the residuals e i against the theoretical values of the resulting attribute is constructed:

If the plot shows a horizontal band, the residuals e i are random variables, OLS is justified, and the theoretical values y x approximate the actual values of y well.

The following cases are possible if e i depends on y x: 1) the residuals e i are not random; 2) the residuals e i do not have constant variance; 3) the residuals e i are systematic: negative values of e i correspond to low values of y x, and positive values to high ones. In these cases, one must either use another function or introduce additional information.

How can one test for homo- or heteroscedasticity of the residuals? Homoscedasticity of the residuals means that the variance of the residuals e i is the same for every value of x. If this condition for applying OLS is not met, heteroscedasticity occurs, which can be seen clearly from the correlation field: a - the variance of the residuals increases as x increases; b - the variance of the residuals reaches its maximum at the middle values of x and decreases at the minimum and maximum values of x; c - the variance of the residuals is greatest at small values of x and becomes uniform as the values of x increase. Plots of homo- and heteroscedasticity.

Assessing the absence of autocorrelation of the residuals (i.e., that the residual values e i are distributed independently). Autocorrelation of the residuals means the presence of a correlation between the residuals of the current and previous (or subsequent) observations. The correlation coefficient between e i and e j, where e i are the residuals of the current observations and e j the residuals of the previous observations, can be determined by the usual formula of the linear correlation coefficient. If this coefficient turns out to be significantly different from zero, the residuals are autocorrelated, and the probability density function F(e) depends on the observation point and on the distribution of the residual values at the other observation points. For regression models on static information, the autocorrelation of the residuals can be calculated if the observations are ordered by the factor x. The absence of autocorrelation of the residuals ensures the consistency and efficiency of the estimates of the regression coefficients. Compliance with this OLS premise is especially important when constructing regression models on time series, where, owing to the presence of a trend, subsequent levels of the series as a rule depend on their previous levels.

No. 18. THE MEANING OF GENERALIZED LEAST SQUARES (GLS).

When homoscedasticity is violated and the errors are autocorrelated, it is recommended to replace traditional OLS with the generalized method. Generalized least squares (GLS) is applied to transformed data and makes it possible to obtain estimates that are not only unbiased but also have smaller sampling variances. GLS for correcting heteroscedasticity works as follows. Consider the equation y i = a + b·x i + e i, in which the variance of the residuals is proportional to a known coefficient K i, so the residuals are heteroscedastic. Assuming they are not autocorrelated, one can pass to an equation with homoscedastic residuals by dividing all the variables recorded in the i-th observation by √K i; the variance of the residuals then becomes a constant value. From the regression of y on x we pass to a regression on the new variables y/√K i and x/√K i. Relative to the ordinary regression, the equation in the new, transformed variables is a weighted regression, in which the variables y and x are taken with weights, and the regression coefficient b is a weighted estimate relative to ordinary OLS with weights 1/K i. A similar approach is possible not only for the paired equation but also for multiple regression.

The transformed equation does not contain a free term. Applying ordinary OLS to it, we find that the use of GLS leads observations with smaller values of the transformed variables to carry relatively more weight in determining the regression parameters than they would with the original variables.
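A sketch of this weighted regression, assuming Var(e i) = σ²·K i with known K i (all data invented); the weighted closed-form estimate below is equivalent to OLS on the variables divided by √K i:

```python
def wls(x, y, K):
    """Weighted least squares with weights 1/K_i, equivalent to OLS
    applied to the variables divided by sqrt(K_i)."""
    w = [1/k for k in K]
    S = sum(w)
    sx  = sum(wi*u for wi, u in zip(w, x))
    sy  = sum(wi*v for wi, v in zip(w, y))
    sxx = sum(wi*u*u for wi, u in zip(w, x))
    sxy = sum(wi*u*v for wi, u, v in zip(w, x, y))
    b = (sxy - sx*sy/S) / (sxx - sx*sx/S)
    a = (sy - b*sx) / S
    return a, b

x = [1, 2, 3, 4, 5]
y = [2.5, 4.8, 7.9, 10.1, 13.0]
K = [1, 2, 3, 4, 5]    # assumed variance proportions Var(e_i) ~ K_i
a, b = wls(x, y, K)
print(round(a, 2), round(b, 2))
```

Observations with small K i (small error variance) get large weight 1/K i and so influence the estimates more.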

No. 19. SYSTEMS OF ECONOMETRIC EQUATIONS. PROBLEM OF IDENTIFICATION.

Complex economic processes are described using systems of interrelated equations. There are several types of systems of equations: 1. A system of independent equations, in which each dependent variable y is considered as a function of the same set of factors x:

y 1 = a 11 *x 1 + a 12 *x 2 + ... + a 1m *x m + e 1
...
y n = a n1 *x 1 + a n2 *x 2 + ... + a nm *x m + e n

To solve this system and find its parameters, ordinary OLS is used.

2. A system of recursive equations, in which the dependent variable y of one equation acts as a factor x in another equation:

y 1 =a 11 *x 1 +a 12 *x 2 +…+a 1m *x m +e 1

y 2 =b 21 *y 1 +a 21 *x 1 +a 22 *x 2 +…+a 2m *x m +e 2

y 3 =b 31 *y 1 +b 32 *y 2 +a 31 *x 1 +a 32 *x 2 +…+a 3m *x m +e 3

The least squares method is used to solve this system and find its parameters.

3. A system of interdependent (simultaneous) equations, in which the same dependent variables appear on the left side in some equations and on the right side in others:

y 1= b 12 *y 2 +b 13 *y 3 +…+b 1n *y n +a 11 *x 1 +a 12 *x 2 +…+a 1m *x m +e 1

y 2 =b 21 *y 1 +b 23 *y 3 +…+b 2n *y n +a 21 *x 1 +a 22 *x 2 +…+a 2m *x m +e 2

y n =b n1 *y 1 +b n2 *y 2 +…+b nn-1 *y n-1 +a n1 *x 1 +a n2 *x 2 +…+a nm *x m +e n

Such a system of equations is called the structural form of the model. Endogenous variables are interrelated variables determined within the model (system), y. Exogenous variables are independent variables determined outside the system, x. Predefined variables are the exogenous variables together with the lagged (previous-period) endogenous variables of the system. The coefficients a and b of the variables are the structural coefficients of the model. The system of linear functions expressing the endogenous variables through all the predefined variables of the system is the reduced form of the model.

where the coefficients are those of the reduced form of the model.

A necessary condition for identification is the fulfillment of the counting rule:

D+1=H – the equation is identifiable;

D+1&lt;H – the equation is unidentifiable;

D+1>H – the equation is over-identifiable.

Where H is the number of endogenous variables in the equation, D is the number of predefined variables that are not in the equation but are present in the system.

A sufficient condition for identification: the determinant of the matrix composed of the coefficients of the variables absent from the equation under study is not equal to zero, and the rank of this matrix is not less than the number of endogenous variables of the system minus one. To solve an exactly identified equation, indirect least squares (ILS) is used; to solve overidentified equations, two-stage least squares (2SLS) is used.
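The counting rule itself is trivially mechanical; a sketch:

```python
def order_condition(H, D):
    """Necessary (counting) condition for identification of one equation.
    H: number of endogenous variables in the equation;
    D: number of predefined variables absent from the equation
       but present in the system."""
    if D + 1 == H:
        return "identified"
    if D + 1 < H:
        return "unidentified"
    return "overidentified"

print(order_condition(H=2, D=1))   # identified
print(order_condition(H=3, D=1))   # unidentified
print(order_condition(H=1, D=1))   # overidentified
```

The condition is necessary but not sufficient: the rank condition on the excluded variables' coefficients must still be checked.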

No. 20. INDIRECT LEAST SQUARES (ILS). It is used in the case of an exactly identified model. The ILS procedure involves the following steps: 1) compose the reduced form of the model and determine the numerical values of the parameters of each of its equations by ordinary OLS; 2) by means of algebraic transformations, pass from the reduced form back to the equations of the structural form of the model, thereby obtaining numerical estimates of the structural parameters.

No. 21. TWO-STAGE LEAST SQUARES (2SLS).

The main idea of 2SLS is to obtain, on the basis of the reduced form of the model, the theoretical values of the endogenous variables contained in the right-hand side of an overidentified equation. Then, substituting these for the actual values, one can apply ordinary OLS to the structural form of the overidentified equation. The method is called two-stage least squares because OLS is used twice: at the first stage, when determining the reduced form of the model and finding on its basis estimates of the theoretical values of the endogenous variable,

and at the second stage, when it is applied to the structural overidentified equation to determine the structural coefficients of the model from the theoretical (calculated) values of the endogenous variables.

An overidentified structural model can be of two types:

All equations of the system are overidentifiable;

the system contains exactly identified equations along with overidentified ones.

If all the equations of the system are overidentified, 2SLS is used to estimate the structural coefficients of each equation. If the system also contains exactly identified equations, their structural coefficients are found from the system of reduced-form equations.

Let us apply 2SLS to the simplest overidentified model:

This model can be derived from a previous identifiable model:

if we impose a restriction on its parameters, namely b 12 = a 11.

As a result, the first equation became overidentified: H = 1 (y 1), D = 1 (x 2), and D + 1 > H. The second equation has not changed and is exactly identified: H = 2 and D = 1, so D + 1 = H.

At the first stage, we find the reduced form of the model, and

2SLS is the most general and widely used method for solving systems of simultaneous equations.

Despite the importance of systems of econometric equations, in practice some relationships are often ignored, and the application of traditional OLS to one or several equations is also widespread in econometrics. In particular, when constructing production functions, demand analysis can be carried out using ordinary OLS.
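A toy sketch of the two stages for a single equation with one endogenous regressor y2 and one instrument z (all numbers invented): stage 1 regresses y2 on z; stage 2 regresses y1 on the fitted values instead of the actual y2.

```python
def ols(x, y):
    """Simple OLS for y = a + b*x, closed-form estimates."""
    n = len(x)
    mx, my = sum(x)/n, sum(y)/n
    b = sum((u - mx)*(v - my) for u, v in zip(x, y)) / sum((u - mx)**2 for u in x)
    return my - b*mx, b

# Invented data: z is exogenous, y2 is an endogenous regressor for y1.
z  = [1, 2, 3, 4, 5, 6]
y2 = [1.9, 4.2, 5.8, 8.1, 9.9, 12.1]
y1 = [3.1, 5.0, 6.4, 8.2, 9.0, 10.4]

# Stage 1: regress the endogenous regressor y2 on the instrument z.
a1, b1 = ols(z, y2)
y2_hat = [a1 + b1*u for u in z]

# Stage 2: regress y1 on the fitted values y2_hat instead of y2.
a2, b2 = ols(y2_hat, y1)
print(round(b2, 2))   # the 2SLS estimate of the structural coefficient
```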

No. 22. MAIN ELEMENTS OF A TIME SERIES.

A time series is a set of values of some indicator for several consecutive moments or periods of time. Each level of a time series is formed under the influence of a large number of factors, which can be conditionally divided into three groups:

Factors that shape the trend of the series;

Factors that form the cyclic fluctuations of the series;

random factors.

With various combinations of these factors in the phenomenon or process under study, the dependence of the levels of the series on time can take various forms. First, most time series of economic indicators have a trend characterizing the cumulative long-term impact of many factors on the dynamics of the indicator under study. Taken separately, these factors may affect the indicator in different directions, but together they form its increasing or decreasing trend (Fig. 1).

Second, the indicator under study may be subject to cyclical fluctuations. These fluctuations can be seasonal in nature, since the economic activity of a number of sectors of the economy depends on the time of year (Fig. 3).

In most cases, the actual level of a time series can be represented as the sum or the product of its trend, cyclical, and random components. A model in which the time series is presented as the sum of these components is called an additive time series model; a model in which it is represented as their product is called a multiplicative time series model. The main task of the econometric study of an individual time series is to identify and quantify each of these components in order to use the information obtained to predict future values of the series or to build models of the relationship between two or more time series.

No. 23. AUTOCORRELATION OF TIME SERIES LEVELS

The correlation dependence between successive levels of a time series is called autocorrelation of the levels of the series. It can be measured quantitatively using the linear correlation coefficient between the levels of the original time series and the levels of the same series shifted by several steps in time. The correlation coefficient has the form:

Similarly, one can determine autocorrelation coefficients of the second and higher orders. Thus, the second-order autocorrelation coefficient characterizes the closeness of the relationship between the levels y t and y t-2 and is determined by the formula:

The number of periods over which the autocorrelation coefficient is calculated is called the lag. As the lag increases, the number of pairs of values used to calculate the autocorrelation coefficient decreases.

We note two important properties of the autocorrelation coefficient. First, it is constructed by analogy with the linear correlation coefficient and thus characterizes the closeness of only a linear relationship between the current and previous levels of the series.

Secondly, by the sign of the autocorrelation coefficient, it is impossible to draw a conclusion about an increasing or decreasing trend in the levels of the series.

The sequence of autocorrelation coefficients of the first, second, and higher orders is called the autocorrelation function of the time series. The plot of its values against the magnitude of the lag is called a correlogram.
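A sketch of the autocorrelation function on an invented trending series (a correlogram is simply the plot of these values against the lag):

```python
import math

def autocorr(y, lag):
    """Lag-k autocorrelation: correlation of the pairs (y_t, y_{t-k})."""
    a, b = y[lag:], y[:-lag]
    n = len(a)
    ma, mb = sum(a)/n, sum(b)/n
    cov = sum((u - ma)*(v - mb) for u, v in zip(a, b))
    va = sum((u - ma)**2 for u in a)
    vb = sum((v - mb)**2 for v in b)
    return cov / math.sqrt(va*vb)

# Invented series with a clear trend: first-order autocorrelation is high.
y = [10, 12, 13, 15, 17, 18, 20, 22, 23, 25]
acf = [autocorr(y, k) for k in (1, 2, 3)]
print([round(r, 2) for r in acf])
```

Note how each higher lag is computed from fewer pairs, as the text states.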

No. 24. TIME SERIES TREND MODELING (TIME SERIES ANALYTICAL ALIGNMENT)

One of the most common ways to model the trend of a time series is to construct an analytical function characterizing the dependence of the levels of the series on time, i.e., the trend. This method is called analytical alignment of the time series.

Since time dependence can take many forms, various kinds of functions can be used to formalize it. The following functions are most often used to build trends:

Linear trend:

Hyperbola:

Exponential trend:

Power function trend:

Parabola of the second and higher orders:

The parameters of each of the trends listed above can be determined by ordinary OLS, using time t = 1, 2, ..., n as the independent variable and the actual levels of the time series y t as the dependent variable. There are several ways to determine the type of trend. The most common are a qualitative analysis of the process under study, the construction and visual analysis of a plot of the levels of the series against time, and the calculation of some basic indicators of dynamics. The autocorrelation coefficients of the levels of the series can be used for the same purpose: the type of trend can be determined by comparing first-order autocorrelation coefficients calculated from the original and the transformed levels of the series. If the time series has a linear trend, its neighboring levels y t and y t-1 are closely correlated, and the first-order autocorrelation coefficient of the levels of the original series is high. If the time series contains a non-linear trend, for example an exponential one, then the first-order autocorrelation coefficient calculated from the logarithms of the levels of the original series will be higher than the corresponding coefficient calculated from the levels themselves. The more pronounced the non-linear trend in the time series under study, the more these coefficients will differ.

When the series contains a non-linear trend, the best equation can be chosen by enumerating the main trend forms, calculating the adjusted coefficient of determination R² for each equation, and selecting the trend equation with the maximum value of the adjusted coefficient of determination.
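A sketch of this enumeration for two trend forms, linear and exponential (fitted as a linear trend in log y), on an invented, roughly exponential series; the form with the larger adjusted R² wins:

```python
import math

def ols(x, y):
    """Simple OLS for y = a + b*x, closed-form estimates."""
    n = len(x)
    mx, my = sum(x)/n, sum(y)/n
    b = sum((u - mx)*(v - my) for u, v in zip(x, y)) / sum((u - mx)**2 for u in x)
    return my - b*mx, b

def adj_r2(y, fitted, m):
    """Adjusted R^2 for a trend with m parameters besides the intercept."""
    n = len(y)
    my = sum(y)/n
    sse = sum((v - f)**2 for v, f in zip(y, fitted))
    sst = sum((v - my)**2 for v in y)
    return 1 - (1 - (1 - sse/sst)) * (n - 1)/(n - m - 1)

t = list(range(1, 9))
y = [3.1, 4.4, 6.3, 9.2, 13.0, 18.6, 26.3, 37.9]   # invented, near-exponential

a, b = ols(t, y)                                    # linear trend y_t = a + b*t
lin_fit = [a + b*u for u in t]

la, lb = ols(t, [math.log(v) for v in y])           # exponential via log y
exp_fit = [math.exp(la + lb*u) for u in t]

print(round(adj_r2(y, lin_fit, 1), 3), round(adj_r2(y, exp_fit, 1), 3))
```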

No. 25. METHODS FOR ELIMINATING A TREND. THE METHOD OF DEVIATIONS FROM THE TREND.

The essence of all trend-elimination methods is to remove or fix the influence of the time factor on the formation of the levels of the series. The main methods can be divided into two groups:

methods based on transforming the levels of the original series into new variables that do not contain the trend. The resulting variables are then used to analyze the relationship between the time series under study. These methods directly eliminate the trend component T from each level of the time series; the two main methods in this group are the method of successive differences and the method of deviations from the trend;

methods based on studying the relationship between the levels of the original time series while eliminating the impact of the time factor on the dependent and independent variables of the model. First of all, this is the method of including the time factor in a regression model on time series.

Let us consider in more detail the application, advantages, and disadvantages of each of these methods. The method of deviations from the trend.

Let there be two time series x t and y t, each containing a trend component T and a random component e. Analytical alignment of each of these series allows us to find the parameters of the corresponding trend equations and to determine the levels calculated from the trend. These calculated values can be taken as an estimate of the trend component T of each series. The influence of the trend can therefore be eliminated by subtracting the calculated values of the levels of the series from the actual ones. This procedure is performed for each time series in the model, and the further analysis of the relationship between the series is carried out using not the original levels but the deviations from the trend, provided that the latter contain no trend.

No. 26. THE METHOD OF SUCCESSIVE DIFFERENCES.

In some cases, instead of analytical alignment of the time series, a simpler method can be applied to eliminate the trend: the method of successive differences.

If the time series contains a pronounced linear trend, it can be eliminated by replacing the original levels of the series with chain absolute increments (first differences).

Let y t = a + b·t + e t.  (1)

Then ∆ t = y t − y t−1 = b + (e t − e t−1).

The coefficient b is a constant that does not depend on time.

If the time series contains a trend in the form of a second-order parabola, then to eliminate it, you can replace the original levels of the series with the second differences.

Let relation (1) hold, but with the trend in the form of a second-order parabola: y t = a + b·t + c·t² + e t.

Then ∆ t = y t − y t−1 = b + c·(2t − 1) + (e t − e t−1).

As this relation shows, the first differences ∆ t depend directly on the time factor t and therefore contain a trend.

Let us define the second differences: ∆ t ² = ∆ t − ∆ t−1 = 2c + (e t − 2e t−1 + e t−2).

Obviously, the second differences ∆ t ² do not contain a trend, so when the original levels contain a trend in the form of a second-order parabola, the second differences can be used for further analysis. If the trend of the time series is exponential or a power function, the method of successive differences should be applied not to the original levels of the series but to their logarithms.
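The two statements above are easy to verify numerically: first differences of a linear trend are the constant b, and second differences of a parabolic trend are the constant 2c (trend coefficients below are chosen arbitrarily):

```python
def diffs(y):
    """First (chain) differences of a series."""
    return [b - a for a, b in zip(y, y[1:])]

# Linear trend y_t = 5 + 3t: the first differences equal the constant b = 3.
lin = [5 + 3*t for t in range(1, 9)]
print(diffs(lin))           # [3, 3, 3, 3, 3, 3, 3]

# Parabolic trend y_t = 1 + 2t + t^2: the second differences equal 2c = 2.
par = [1 + 2*t + t*t for t in range(1, 9)]
print(diffs(diffs(par)))    # [2, 2, 2, 2, 2, 2]
```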

No. 27. INCLUDING THE TIME FACTOR INTO THE REGRESSION MODEL.

In correlation-regression analysis, the influence of a factor can be eliminated if the influence of this factor on the result and on the other factors included in the model is fixed. This technique is widely used in time series analysis, where the trend is fixed by including the time factor in the model as an independent variable.

A model of this kind belongs to the group of models that include the time factor. Obviously, the number of independent variables in such a model can be greater than one, and the model can include not only current but also lagged values of the independent variable, as well as lagged values of the resulting variable. The advantage of this model over the methods of deviations from the trend and of successive differences is that it uses all the information contained in the original data, since the values y t and x t are the levels of the original time series. Moreover, the model is built on the entire set of data for the period under consideration, unlike the method of successive differences, which loses observations. The parameters a and b of a model that includes the time factor are determined by ordinary OLS.

The system of normal equations has the form:

No. 28. AUTOCORRELATION IN THE RESIDUALS. THE DURBIN-WATSON TEST.

There are two common methods for detecting autocorrelation of the residuals. The first is to plot the residuals against time and determine visually whether autocorrelation is present or absent. The second is to use the Durbin-Watson test and calculate the value

d = Σ(e t − e t−1)² / Σ e t². Thus, d is the ratio of the sum of squared differences of successive residual values to the residual sum of squares of the regression model. For large n it can be assumed that

the first-order autocorrelation coefficient of the residuals is defined as r 1 = Σ e t ·e t−1 / Σ e t².

Taking this into account, we have: d ≈ 2(1 − r 1).

Thus, if there is complete positive autocorrelation in the residuals and r 1 = 1, then d = 0. If there is complete negative autocorrelation, then r 1 = −1 and, therefore, d = 4. If there is no autocorrelation of the residuals, then r 1 = 0 and d = 2. Hence 0 ≤ d ≤ 4.

The algorithm for detecting autocorrelation of the residuals based on the Durbin-Watson test is as follows. The hypothesis H0 of no autocorrelation of the residuals is put forward. The alternative hypotheses H1 and H1* consist, respectively, in the presence of positive or negative autocorrelation in the residuals. Then, from special tables, the critical values of the Durbin-Watson statistic d L and d U are determined for the given number of observations n, the number of independent variables in the model k, and the significance level α. These values divide the interval [0; 4] into five segments. If the actual value of the Durbin-Watson statistic falls into the zone of uncertainty, in practice the existence of autocorrelation of the residuals is assumed and the hypothesis H0 is rejected.
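A sketch of the d statistic on two invented residual series, one with strong negative and one with strong positive autocorrelation:

```python
def durbin_watson(e):
    """d = sum of squared successive differences / residual sum of squares."""
    num = sum((b - a)**2 for a, b in zip(e, e[1:]))
    den = sum(v*v for v in e)
    return num / den

# Alternating residuals: negative autocorrelation, d well above 2.
neg = [1, -1, 1, -1, 1, -1, 1, -1]
# Slowly drifting residuals: positive autocorrelation, d close to 0.
pos = [1, 1.1, 1.2, 1.1, 1, 0.9, 0.8, 0.9]
print(round(durbin_watson(neg), 2), round(durbin_watson(pos), 2))
```

Values near 2 indicate no autocorrelation; the decision bands are set by the tabulated d L and d U.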

№ 29. GENERAL CHARACTERISTICS OF MODELS WITH DISTRIBUTED LAG. INTERPRETATION OF THE PARAMETERS OF MODELS WITH DISTRIBUTED LAG.

The value l, which characterizes the delay in the impact of a factor on the result, is called a lag in econometrics, and time series of the factor variables themselves, shifted by one or more moments in time, are called lag variables.

Econometric modeling is carried out using models containing not only current but also lag values of the factor variables. Such models are called models with distributed lag. A model of the form

y t = a + b 0 x t + b 1 x t-1 + … + b l x t-l + ε t

is an example of a distributed lag model.

Along with lag values of the independent, or factor, variables, the value of the dependent variable in the current period can be influenced by its values in past moments or periods of time. Such processes are usually described by regression models that contain lag values of the dependent variable as factors; these are called autoregressive models. A model of the form

y t = a + b 0 x t + c 1 y t-1 + ε t

refers to autoregressive models. Building models with a distributed lag and autoregressive models has its own specifics. First, the parameters of autoregressive models, and in most cases even of models with a distributed lag, cannot be estimated by ordinary least squares, because its prerequisites are violated; special statistical methods are required. Second, researchers have to solve the problems of choosing the optimal lag length and determining its structure. Third, there is a certain relationship between distributed lag models and autoregressive models, and in some cases it is necessary to pass from one type of model to the other.

Interpretation of the parameters of models with distributed lag. Consider a model with a distributed lag in general form, assuming that the maximum lag l is finite:

y t = a + b 0 x t + b 1 x t-1 + … + b l x t-l + ε t

This model says that if at some moment of time t the independent variable x changes, then this change will affect the values of the variable y during the following l moments of time.

The regression coefficient b 0 of the variable x t characterizes the average absolute change in y t when x t changes by one unit of its measurement at a fixed moment of time t, without taking into account the impact of the lag values of the factor x. This coefficient is called the short-term multiplier.

At the moment (t + 1) the total impact of the factor variable x t on the result y t will be (b 0 + b 1 ) conventional units; at the moment (t + 2) this impact is characterized by the sum (b 0 + b 1 + b 2 ), and so on. The sums obtained in this way are called intermediate multipliers.

Let us introduce the following notation:

b 0 +b 1 +…+b l =b

The value b is called the long-term multiplier. It shows the absolute change in the long run (at the moment t + l) of the result y under the influence of a change of one unit in the factor x.

Suppose

β j = b j / b, j = 0, 1, …, l.

We call the obtained quantities the relative coefficients of the model with distributed lag. The average lag is determined by the weighted arithmetic mean formula

average lag = Σ j·β j (j = 0, …, l)

and represents the average period during which the result changes under the influence of a change in the factor at the moment t. A small value of the average lag indicates a relatively quick response of the result to a change in the factor; a high value indicates that the effect of the factor on the result will be felt over a long period of time. The median lag is the lag value h for which

β 0 + β 1 + … + β h ≥ 0.5.

This is the period of time during which, starting from the moment t, half of the total impact of the factor on the result will be realized.
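The multipliers and lag measures described above follow directly from the coefficients b 0 , …, b l ; a sketch with hypothetical coefficient values (chosen only for illustration):

```python
# Hypothetical distributed-lag coefficients b_0..b_3 (l = 3)
b = [4.0, 3.0, 2.0, 1.0]

long_run = sum(b)                                  # long-term multiplier
intermediate = [sum(b[:j + 1]) for j in range(len(b))]  # intermediate multipliers
beta = [bj / long_run for bj in b]                 # relative coefficients, sum to 1
mean_lag = sum(j * beta[j] for j in range(len(beta)))   # weighted arithmetic mean

# Median lag: smallest h with beta_0 + ... + beta_h >= 0.5
cum = 0.0
for h, bj in enumerate(beta):
    cum += bj
    if cum >= 0.5:
        median_lag = h
        break

print(long_run, intermediate, round(mean_lag, 2), median_lag)
```

For these weights half of the total effect is already realized at lag 1, which the median lag reports.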

№ 30. ALMON'S METHOD.

Almon's method assumes that the weights of the current and lag values of the explanatory variable follow a polynomial distribution: b j = c 0 + c 1 j + c 2 j 2 + … + c k j k .

The regression equation takes the form y t = a + c 0 z 0 + c 1 z 1 + c 2 z 2 + … + c k z k + ε t , where z i = Σ j i ·x t-j (summation over j = 0, 1, …, l; i = 0, 1, …, k). The parameters of a model with a distributed lag are calculated according to the following scheme:

1. The maximum lag length l is set.

2. The degree k of the polynomial describing the lag structure is determined.

3. The values of the variables z 0 to z k are calculated.

4. The parameters of the linear regression of y t on the variables z i are determined.

5. The parameters of the original model with a distributed lag are calculated.
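The scheme above can be sketched as follows, assuming l = 3, a quadratic polynomial (k = 2), and noise-free simulated data (coefficients c 0 , c 1 , c 2 are illustrative), so that the recovered lag weights coincide with the true ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, k = 60, 3, 2
c_true = np.array([3.0, -1.0, 0.1])                    # b_j = c0 + c1*j + c2*j^2
b_true = np.array([c_true @ [1.0, j, j**2] for j in range(l + 1)])

x = rng.normal(size=n + l)
# Matrix of current and lagged x: column j holds x_{t-j}
X_lags = np.column_stack([x[l - j: n + l - j] for j in range(l + 1)])
y = X_lags @ b_true                                    # noise-free for clarity

# Almon transform: z_i = sum_j j^i * x_{t-j}, i = 0..k
Z = np.column_stack([X_lags @ [float(j**i) for j in range(l + 1)]
                     for i in range(k + 1)])
c_hat = np.linalg.lstsq(Z, y, rcond=None)[0]           # step 4: regress y on z_i
# Step 5: recover the original lag weights from the polynomial
b_hat = np.array([sum(c_hat[i] * j**i for i in range(k + 1)) for j in range(l + 1)])
print(np.round(b_hat, 4))
```

Only k + 1 = 3 parameters are estimated instead of l + 1 = 4 lag weights, which is the whole point of the polynomial restriction.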

№ 31. KOYCK'S METHOD.

The Koyck distribution assumes that the coefficients of the lag values of the explanatory variable decrease exponentially: b l = b 0 λ l ; l = 0, 1, 2, …; 0 ≤ λ < 1. The regression equation takes the form

y t = a + b 0 x t + b 0 λx t-1 + b 0 λ 2 x t-2 + … + ε t .

After simple transformations (subtracting λ·y t-1 from y t ), we obtain an equation for estimating the parameters of the original equation:

y t = a(1 − λ) + b 0 x t + λy t-1 + (ε t − λε t-1 ).
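A sketch of the Koyck idea on simulated data: y is generated by the transformed recursion, and an OLS regression of y t on x t and y t-1 recovers b 0 and λ (parameters are illustrative and the data are noise-free for clarity; with a real error term the lagged regressor would require the special methods mentioned above):

```python
import numpy as np

# Illustrative parameters of the Koyck scheme
a, b0, lam, n = 2.0, 1.5, 0.6, 200
rng = np.random.default_rng(1)
x = rng.normal(size=n)

# Generate y by the transformed equation y_t = a(1-lam) + b0*x_t + lam*y_{t-1}
y = np.empty(n)
y[0] = a                                  # arbitrary starting level
for t in range(1, n):
    y[t] = a * (1 - lam) + b0 * x[t] + lam * y[t - 1]

# OLS on [1, x_t, y_{t-1}] recovers b0 and lambda
X = np.column_stack([np.ones(n - 1), x[1:], y[:-1]])
const, b0_hat, lam_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
print(round(b0_hat, 4), round(lam_hat, 4))
```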

№ 32. THE METHOD OF PRINCIPAL COMPONENTS.

The essence of the method is to reduce the set of explanatory variables to the most significantly influencing factors. Principal component analysis is used to eliminate or reduce multicollinearity among the explanatory variables of a regression. The reduction is achieved by a linear transformation of all explanatory variables x i (i = 1, …, n) into new variables, the so-called principal components. It is required that the first principal component account for the maximum of the total variance of all explanatory variables x i ; the second component accounts for the maximum of the remaining variance after the influence of the first principal component is eliminated, and so on.
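A sketch of extracting principal components from standardized explanatory variables via the eigendecomposition of their correlation matrix; the data are simulated so that two regressors are nearly collinear, which is exactly the situation the method targets:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
z = rng.normal(size=n)
x1 = z + 0.05 * rng.normal(size=n)        # x1 and x2 nearly collinear
x2 = z + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)                   # independent regressor
X = np.column_stack([x1, x2, x3])

# Standardize, then decompose the correlation matrix
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
eigval, eigvec = np.linalg.eigh(np.corrcoef(Xs, rowvar=False))
order = np.argsort(eigval)[::-1]          # sort components by variance
explained = eigval[order] / eigval.sum()  # share of total variance
components = Xs @ eigvec[:, order]        # the principal components themselves
print(np.round(explained, 3))
```

Because x1 and x2 carry almost the same information, the first component absorbs most of the total variance and the last one is nearly degenerate.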

№ 33. AUTOREGRESSIVE MODELS.

Models containing lag values of the dependent variable as factors are called autoregressive models, for example y t = a + b 0 x t + c 1 y t-1 + ε t . As in the model with a distributed lag, b 0 in this model characterizes the short-term change in y t under the influence of a change in x t by one unit. The long-term multiplier in the autoregressive model is calculated as the sum of the short-term and intermediate multipliers: b = b 0 + b 0 c 1 + b 0 c 1 2 + b 0 c 1 3 + … = b 0 (1 + c 1 + c 1 2 + c 1 3 + …) = b 0 /(1 − c 1 ).

Note that this interpretation of the coefficients of the autoregressive model and the calculation of the long-term multiplier are based on the assumption of an infinite lag in the impact of the current value of the dependent variable on its future values.

One possible method for estimating the parameters of the autoregressive equation is the method of instrumental variables. The essence of this method is to replace the variable on the right-hand side of the model for which the LSM assumptions are violated with a new variable whose inclusion in the regression model does not violate those assumptions. For autoregressive models, the variable y t-1 must be removed from the right-hand side of the model. The new variable introduced into the model instead of y t-1 must have two properties: first, it must be closely correlated with y t-1 ; second, it must not be correlated with the residuals u t .
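A sketch of the two-stage mechanics of instrumental variables for y t = a + b 0 x t + c 1 y t-1 + ε t , taking x t-1 as the instrument for y t-1 (simulated data and illustrative parameters; x t-1 is one common instrument choice, correlated with y t-1 through the model but not with the current error):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
a, b0, c1 = 1.0, 2.0, 0.5
x = rng.normal(size=n)
y = np.empty(n)
y[0] = a
for t in range(1, n):
    y[t] = a + b0 * x[t] + c1 * y[t - 1] + 0.05 * rng.normal()

# Stage 1: project y_{t-1} on the instrument x_{t-1} (and a constant)
Z = np.column_stack([np.ones(n - 1), x[:-1]])
proj = Z @ np.linalg.lstsq(Z, y[:-1], rcond=None)[0]

# Stage 2: regress y_t on x_t and the fitted values of y_{t-1}
X2 = np.column_stack([np.ones(n - 1), x[1:], proj])
a_hat, b0_hat, c1_hat = np.linalg.lstsq(X2, y[1:], rcond=None)[0]
print(np.round([a_hat, b0_hat, c1_hat], 2))
```

The fitted values from stage 1 are, by construction, a function of the instrument only, so they satisfy both properties required of the replacement variable.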

Another method that can be applied to estimate the parameters of autoregressive models is the maximum likelihood method.

№ 34. VERIFICATION OF STATISTICAL HYPOTHESES.

№ 35. THE MOVING AVERAGE METHOD.

The simple moving average method consists in calculating the indicator for the forecast moment of time by averaging the values of this indicator over several previous moments of time:

f k = (x k-1 + x k-2 + … + x k-n ) / n,

where x k-i is the actual value of the indicator at the moment t k-i ,

n is the number of previous moments of time used in the calculation,

f k is the forecast at the moment t k .
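A minimal sketch of the method on a made-up series (values illustrative):

```python
# Simple moving average forecast: the forecast for the next moment is
# the mean of the n most recent actual values.
def moving_average_forecast(series, n):
    window = series[-n:]
    return sum(window) / len(window)

sales = [10, 12, 11, 13, 12, 14]
f = moving_average_forecast(sales, n=3)   # mean of 13, 12, 14
print(f)
```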

№ 36. THE EXPONENTIAL SMOOTHING METHOD.

The deviations of the previous forecast from the actual indicator are taken into account, and the calculation itself is carried out by the following formula:

f k = f k-1 + α·(x k-1 − f k-1 ),

where x k-1 is the actual value of the indicator at the moment t k-1 ,

f k is the forecast at the moment t k ,

α is the smoothing constant.

Note: the value of α satisfies the condition 0 < α < 1; it determines the degree of smoothing and is usually chosen by trial and error.
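A minimal sketch of the recursion, assuming an illustrative series, α = 0.5, and an initial forecast equal to the first actual value:

```python
# Exponential smoothing: each new forecast corrects the previous one
# by a fraction alpha of its observed error.
def exponential_smoothing(series, alpha, f0):
    forecasts = [f0]
    for x_prev in series:
        f_prev = forecasts[-1]
        forecasts.append(f_prev + alpha * (x_prev - f_prev))
    return forecasts          # forecasts[k] is the forecast for period k

demand = [100, 110, 105, 115]
f = exponential_smoothing(demand, alpha=0.5, f0=100)
print(f[-1])                  # forecast for the period after the data ends
```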

№ 37. THE TREND PROJECTION METHOD.

The main idea of the linear trend projection method is to construct a straight line that, on average, deviates least from the array of points given by the time series. The straight line is sought in the form x = a·t + b (a and b are constants). The values a and b satisfy the following linear system:

a·Σt² + b·Σt = Σt·x,
a·Σt + b·n = Σx.
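The system above has a closed-form solution; a sketch on an exactly linear toy series:

```python
# Solve the least-squares normal equations for the line x = a*t + b.
def linear_trend(ts, xs):
    n = len(ts)
    st, sx = sum(ts), sum(xs)
    stt = sum(t * t for t in ts)
    stx = sum(t * x for t, x in zip(ts, xs))
    a = (n * stx - st * sx) / (n * stt - st * st)
    b = (sx - a * st) / n
    return a, b

t = [1, 2, 3, 4, 5]
x = [3, 5, 7, 9, 11]          # exactly x = 2t + 1
a, b = linear_trend(t, x)
print(a, b)
```

A forecast for a future moment T is then simply a·T + b.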

№ 38. CAUSAL FORECASTING METHODS. QUALITATIVE FORECASTING METHODS.