x values and the y values are [latex]\displaystyle\overline{{x}}[/latex] and [latex]\overline{{y}}[/latex]. To graph the best-fit line, press the "\(Y =\)" key and type the equation \(-173.5 + 4.83X\) into equation Y1. At RegEq: press VARS and arrow over to Y-VARS. The third exam score, x, is the independent variable and the final exam score, y, is the dependent variable. When \(r\) is positive, the \(x\) and \(y\) will tend to increase and decrease together. A regression line, or a line of best fit, can be drawn on a scatter plot and used to predict outcomes for the \(x\) and \(y\) variables in a given data set or sample data. Graphing the Scatterplot and Regression Line The standard error of. <>
Make sure you have done the scatter plot. So, if the slope is 3, then as X increases by 1, Y increases by 1 X 3 = 3. For one-point calibration, one cannot be sure that if it has a zero intercept. In simple words, "Regression shows a line or curve that passes through all the datapoints on target-predictor graph in such a way that the vertical distance between the datapoints and the regression line is minimum." The distance between datapoints and line tells whether a model has captured a strong relationship or not. argue that in the case of simple linear regression, the least squares line always passes through the point (mean(x), mean . Simple linear regression model equation - Simple linear regression formula y is the predicted value of the dependent variable (y) for any given value of the . Use your calculator to find the least squares regression line and predict the maximum dive time for 110 feet. It is used to solve problems and to understand the world around us. Regression In we saw that if the scatterplot of Y versus X is football-shaped, it can be summarized well by five numbers: the mean of X, the mean of Y, the standard deviations SD X and SD Y, and the correlation coefficient r XY.Such scatterplots also can be summarized by the regression line, which is introduced in this chapter. I notice some brands of spectrometer produce a calibration curve as y = bx without y-intercept. If you know a person's pinky (smallest) finger length, do you think you could predict that person's height? During the process of finding the relation between two variables, the trend of outcomes are estimated quantitatively. ). Make your graph big enough and use a ruler. Brandon Sharber Almost no ads and it's so easy to use. If you are redistributing all or part of this book in a print format, Linear Regression Formula For each data point, you can calculate the residuals or errors, \(y_{i} - \hat{y}_{i} = \varepsilon_{i}\) for \(i = 1, 2, 3, , 11\). and you must attribute OpenStax. Interpretation: For a one-point increase in the score on the third exam, the final exam score increases by 4.83 points, on average. As an Amazon Associate we earn from qualifying purchases. Instructions to use the TI-83, TI-83+, and TI-84+ calculators to find the best-fit line and create a scatterplot are shown at the end of this section. SCUBA divers have maximum dive times they cannot exceed when going to different depths. %
1999-2023, Rice University. The absolute value of a residual measures the vertical distance between the actual value of \(y\) and the estimated value of \(y\). 6 cm B 8 cm 16 cm CM then D Minimum. For Mark: it does not matter which symbol you highlight. My problem: The point $(\\bar x, \\bar y)$ is the center of mass for the collection of points in Exercise 7. The regression problem comes down to determining which straight line would best represent the data in Figure 13.8. The[latex]\displaystyle\hat{{y}}[/latex] is read y hat and is theestimated value of y. It's also known as fitting a model without an intercept (e.g., the intercept-free linear model y=bx is equivalent to the model y=a+bx with a=0). Why dont you allow the intercept float naturally based on the best fit data? For the example about the third exam scores and the final exam scores for the 11 statistics students, there are 11 data points. Therefore the critical range R = 1.96 x SQRT(2) x sigma or 2.77 x sgima which is the maximum bound of variation with 95% confidence. The third exam score,x, is the independent variable and the final exam score, y, is the dependent variable. Answer is 137.1 (in thousands of $) . column by column; for example. (Be careful to select LinRegTTest, as some calculators may also have a different item called LinRegTInt. In the STAT list editor, enter the \(X\) data in list L1 and the Y data in list L2, paired so that the corresponding (\(x,y\)) values are next to each other in the lists. M = slope (rise/run). It is not generally equal to \(y\) from data. intercept for the centered data has to be zero. If \(r = 1\), there is perfect positive correlation. Sorry to bother you so many times. Hence, this linear regression can be allowed to pass through the origin. The regression line (found with these formulas) minimizes the sum of the squares . Scroll down to find the values \(a = -173.513\), and \(b = 4.8273\); the equation of the best fit line is \(\hat{y} = -173.51 + 4.83x\). JZJ@` 3@-;2^X=r}]!X%" If r = 1, there is perfect negativecorrelation. In other words, it measures the vertical distance between the actual data point and the predicted point on the line. Interpretation of the Slope: The slope of the best-fit line tells us how the dependent variable (y) changes for every one unit increase in the independent (x) variable, on average. As you can see, there is exactly one straight line that passes through the two data points. B Positive. \(r\) is the correlation coefficient, which is discussed in the next section. Why the least squares regression line has to pass through XBAR, YBAR (created 2010-10-01). 0 <, https://openstax.org/books/introductory-statistics/pages/1-introduction, https://openstax.org/books/introductory-statistics/pages/12-3-the-regression-equation, Creative Commons Attribution 4.0 International License, In the STAT list editor, enter the X data in list L1 and the Y data in list L2, paired so that the corresponding (, On the STAT TESTS menu, scroll down with the cursor to select the LinRegTTest. Here the point lies above the line and the residual is positive. This site uses Akismet to reduce spam. Graphing the Scatterplot and Regression Line, Another way to graph the line after you create a scatter plot is to use LinRegTTest. If the scatter plot indicates that there is a linear relationship between the variables, then it is reasonable to use a best fit line to make predictions for \(y\) given \(x\) within the domain of \(x\)-values in the sample data, but not necessarily for x-values outside that domain. points get very little weight in the weighted average. For one-point calibration, it is indeed used for concentration determination in Chinese Pharmacopoeia. T Which of the following is a nonlinear regression model? In regression, the explanatory variable is always x and the response variable is always y. The value of \(r\) is always between 1 and +1: 1 . An issue came up about whether the least squares regression line has to
That means that if you graphed the equation -2.2923x + 4624.4, the line would be a rough approximation for your data. Equation\ref{SSE} is called the Sum of Squared Errors (SSE). (mean of x,0) C. (mean of X, mean of Y) d. (mean of Y, 0) 24. The given regression line of y on x is ; y = kx + 4 . It also turns out that the slope of the regression line can be written as . The least-squares regression line equation is y = mx + b, where m is the slope, which is equal to (Nsum (xy) - sum (x)sum (y))/ (Nsum (x^2) - (sum x)^2), and b is the y-intercept, which is. For each data point, you can calculate the residuals or errors, For Mark: it does not matter which symbol you highlight. In general, the data are scattered around the regression line. Data rarely fit a straight line exactly. There are several ways to find a regression line, but usually the least-squares regression line is used because it creates a uniform line. In the diagram above,[latex]\displaystyle{y}_{0}-\hat{y}_{0}={\epsilon}_{0}[/latex] is the residual for the point shown. This best fit line is called the least-squares regression line. It is like an average of where all the points align. After going through sample preparation procedure and instrumental analysis, the instrument response of this standard solution = R1 and the instrument repeatability standard uncertainty expressed as standard deviation = u1, Let the instrument response for the analyzed sample = R2 and the repeatability standard uncertainty = u2. This model is sometimes used when researchers know that the response variable must . It is: y = 2.01467487 * x - 3.9057602. This type of model takes on the following form: y = 1x. In the situation(3) of multi-point calibration(ordinary linear regressoin), we have a equation to calculate the uncertainty, as in your blog(Linear regression for calibration Part 1). The two items at the bottom are r2 = 0.43969 and r = 0.663. Besides looking at the scatter plot and seeing that a line seems reasonable, how can you tell if the line is a good predictor? Line Of Best Fit: A line of best fit is a straight line drawn through the center of a group of data points plotted on a scatter plot. This is called aLine of Best Fit or Least-Squares Line. Legal. For now we will focus on a few items from the output, and will return later to the other items. The second line saysy = a + bx. This is called theSum of Squared Errors (SSE). Therefore R = 2.46 x MR(bar). [Hint: Use a cha. In other words, there is insufficient evidence to claim that the intercept differs from zero more than can be accounted for by the analytical errors. Equation of least-squares regression line y = a + bx y : predicted y value b: slope a: y-intercept r: correlation sy: standard deviation of the response variable y sx: standard deviation of the explanatory variable x Once we know b, the slope, we can calculate a, the y-intercept: a = y - bx (This is seen as the scattering of the points about the line. 1 0 obj
Using the training data, a regression line is obtained which will give minimum error. 35 In the regression equation Y = a +bX, a is called: A X . Could you please tell if theres any difference in uncertainty evaluation in the situations below: The idea behind finding the best-fit line is based on the assumption that the data are scattered about a straight line. [latex]\displaystyle{a}=\overline{y}-{b}\overline{{x}}[/latex]. are licensed under a, Definitions of Statistics, Probability, and Key Terms, Data, Sampling, and Variation in Data and Sampling, Frequency, Frequency Tables, and Levels of Measurement, Stem-and-Leaf Graphs (Stemplots), Line Graphs, and Bar Graphs, Histograms, Frequency Polygons, and Time Series Graphs, Independent and Mutually Exclusive Events, Probability Distribution Function (PDF) for a Discrete Random Variable, Mean or Expected Value and Standard Deviation, Discrete Distribution (Playing Card Experiment), Discrete Distribution (Lucky Dice Experiment), The Central Limit Theorem for Sample Means (Averages), A Single Population Mean using the Normal Distribution, A Single Population Mean using the Student t Distribution, Outcomes and the Type I and Type II Errors, Distribution Needed for Hypothesis Testing, Rare Events, the Sample, Decision and Conclusion, Additional Information and Full Hypothesis Test Examples, Hypothesis Testing of a Single Mean and Single Proportion, Two Population Means with Unknown Standard Deviations, Two Population Means with Known Standard Deviations, Comparing Two Independent Population Proportions, Hypothesis Testing for Two Means and Two Proportions, Testing the Significance of the Correlation Coefficient, Mathematical Phrases, Symbols, and Formulas, Notes for the TI-83, 83+, 84, 84+ Calculators. What the SIGN of r tells us: A positive value of r means that when x increases, y tends to increase and when x decreases, y tends to decrease (positive correlation). If you suspect a linear relationship between \(x\) and \(y\), then \(r\) can measure how strong the linear relationship is. The regression line is calculated as follows: Substituting 20 for the value of x in the formula, = a + bx = 69.7 + (1.13) (20) = 92.3 The performance rating for a technician with 20 years of experience is estimated to be 92.3. In measurable displaying, regression examination is a bunch of factual cycles for assessing the connections between a reliant variable and at least one free factor. It has an interpretation in the context of the data: The line of best fit is[latex]\displaystyle\hat{{y}}=-{173.51}+{4.83}{x}[/latex], The correlation coefficient isr = 0.6631The coefficient of determination is r2 = 0.66312 = 0.4397, Interpretation of r2 in the context of this example: Approximately 44% of the variation (0.4397 is approximately 0.44) in the final-exam grades can be explained by the variation in the grades on the third exam, using the best-fit regression line. x\ms|$[|x3u!HI7H& 2N'cE"wW^w|bsf_f~}8}~?kU*}{d7>~?fz]QVEgE5KjP5B>}`o~v~!f?o>Hc# SCUBA divers have maximum dive times they cannot exceed when going to different depths. The tests are normed to have a mean of 50 and standard deviation of 10. Chapter 5. The variable \(r\) has to be between 1 and +1. ;{tw{`,;c,Xvir\:iZ@bqkBJYSw&!t;Z@D7'ztLC7_g The independent variable, \(x\), is pinky finger length and the dependent variable, \(y\), is height. Another approach is to evaluate any significant difference between the standard deviation of the slope for y = a + bx and that of the slope for y = bx when a = 0 by a F-test. We also acknowledge previous National Science Foundation support under grant numbers 1246120, 1525057, and 1413739. In the equation for a line, Y = the vertical value. The third exam score, x, is the independent variable and the final exam score, y, is the dependent variable. every point in the given data set. Any other line you might choose would have a higher SSE than the best fit line. In linear regression, uncertainty of standard calibration concentration was omitted, but the uncertaity of intercept was considered. In this equation substitute for and then we check if the value is equal to . The least squares estimates represent the minimum value for the following
This best fit line is called the least-squares regression line . The second line says y = a + bx. . The graph of the line of best fit for the third-exam/final-exam example is as follows: The least squares regression line (best-fit line) for the third-exam/final-exam example has the equation: Remember, it is always important to plot a scatter diagram first. Making predictions, The equation of the least-squares regression allows you to predict y for any x within the, is a variable not included in the study design that does have an effect bu/@A>r[>,a$KIV
QR*2[\B#zI-k^7(Ug-I\ 4\"\6eLkV Using the Linear Regression T Test: LinRegTTest. (Be careful to select LinRegTTest, as some calculators may also have a different item called LinRegTInt. The term[latex]\displaystyle{y}_{0}-\hat{y}_{0}={\epsilon}_{0}[/latex] is called the error or residual. The goal we had of finding a line of best fit is the same as making the sum of these squared distances as small as possible. In this situation with only one predictor variable, b= r *(SDy/SDx) where r = the correlation between X and Y SDy is the standard deviatio. Optional: If you want to change the viewing window, press the WINDOW key. Scroll down to find the values a = -173.513, and b = 4.8273; the equation of the best fit line is = -173.51 + 4.83 x The two items at the bottom are r2 = 0.43969 and r = 0.663. Use your calculator to find the least squares regression line and predict the maximum dive time for 110 feet. Regression through the origin is when you force the intercept of a regression model to equal zero. Therefore, approximately 56% of the variation (1 0.44 = 0.56) in the final exam grades can NOT be explained by the variation in the grades on the third exam, using the best-fit regression line. (0,0) b. b. 2.01467487 is the regression coefficient (the a value) and -3.9057602 is the intercept (the b value). Data rarely fit a straight line exactly. Then arrow down to Calculate and do the calculation for the line of best fit.Press Y = (you will see the regression equation).Press GRAPH. In both these cases, all of the original data points lie on a straight line. The regression line approximates the relationship between X and Y. The number and the sign are talking about two different things. r = 0. INTERPRETATION OF THE SLOPE: The slope of the best-fit line tells us how the dependent variable (\(y\)) changes for every one unit increase in the independent (\(x\)) variable, on average. Slope, intercept and variation of Y have contibution to uncertainty. Press Y = (you will see the regression equation). r F5,tL0G+pFJP,4W|FdHVAxOL9=_}7,rG& hX3&)5ZfyiIy#x]+a}!E46x/Xh|p%YATYA7R}PBJT=R/zqWQy:Aj0b=1}Ln)mK+lm+Le5. Press ZOOM 9 again to graph it. Use the correlation coefficient as another indicator (besides the scatterplot) of the strength of the relationship between x and y. You'll get a detailed solution from a subject matter expert that helps you learn core concepts. The data in the table show different depths with the maximum dive times in minutes. Accessibility StatementFor more information contact us atinfo@libretexts.orgor check out our status page at https://status.libretexts.org. Conversely, if the slope is -3, then Y decreases as X increases. The term \(y_{0} \hat{y}_{0} = \varepsilon_{0}\) is called the "error" or residual. The regression line always passes through the (x,y) point a. Using calculus, you can determine the values ofa and b that make the SSE a minimum. When regression line passes through the origin, then: (a) Intercept is zero (b) Regression coefficient is zero (c) Correlation is zero (d) Association is zero MCQ 14.30 The calculated analyte concentration therefore is Cs = (c/R1)xR2. The calculations tend to be tedious if done by hand. D. Explanation-At any rate, the View the full answer You can specify conditions of storing and accessing cookies in your browser, The regression Line always passes through, write the condition of discontinuity of function f(x) at point x=a in symbol , The virial theorem in classical mechanics, 30. Determine the rank of MnM_nMn . Slope: The slope of the line is \(b = 4.83\). In addition, interpolation is another similar case, which might be discussed together. endobj
If \(r = -1\), there is perfect negative correlation. A simple linear regression equation is given by y = 5.25 + 3.8x. Creative Commons Attribution License The criteria for the best fit line is that the sum of the squared errors (SSE) is minimized, that is, made as small as possible. It tells the degree to which variables move in relation to each other. The variable r has to be between 1 and +1. 2. ), On the LinRegTTest input screen enter: Xlist: L1 ; Ylist: L2 ; Freq: 1, On the next line, at the prompt \(\beta\) or \(\rho\), highlight "\(\neq 0\)" and press ENTER, We are assuming your \(X\) data is already entered in list L1 and your \(Y\) data is in list L2, On the input screen for PLOT 1, highlight, For TYPE: highlight the very first icon which is the scatterplot and press ENTER. You could use the line to predict the final exam score for a student who earned a grade of 73 on the third exam. The process of fitting the best-fit line is called linear regression. Then arrow down to Calculate and do the calculation for the line of best fit. Regression investigation is utilized when you need to foresee a consistent ward variable from various free factors. Except where otherwise noted, textbooks on this site The slope consent of Rice University. Maybe one-point calibration is not an usual case in your experience, but I think you went deep in the uncertainty field, so would you please give me a direction to deal with such case? It is not an error in the sense of a mistake. OpenStax, Statistics, The Regression Equation. The correlation coefficient, r, developed by Karl Pearson in the early 1900s, is numerical and provides a measure of strength and direction of the linear association between the independent variable x and the dependent variable y. Regression 2 The Least-Squares Regression Line . An observation that lies outside the overall pattern of observations. I dont have a knowledge in such deep, maybe you could help me to make it clear. The residual, d, is the di erence of the observed y-value and the predicted y-value. Statistical Techniques in Business and Economics, Douglas A. Lind, Samuel A. Wathen, William G. Marchal, Daniel S. Yates, Daren S. Starnes, David Moore, Fundamentals of Statistics Chapter 5 Regressi. This book uses the The size of the correlation \(r\) indicates the strength of the linear relationship between \(x\) and \(y\). Computer spreadsheets, statistical software, and many calculators can quickly calculate the best-fit line and create the graphs. Of course,in the real world, this will not generally happen. %PDF-1.5
If (- y) 2 the sum of squares regression (the improvement), is large relative to (- y) 3, the sum of squares residual (the mistakes still . We recommend using a Enter your desired window using Xmin, Xmax, Ymin, Ymax. The line always passes through the point ( x; y). The line of best fit is: \(\hat{y} = -173.51 + 4.83x\), The correlation coefficient is \(r = 0.6631\), The coefficient of determination is \(r^{2} = 0.6631^{2} = 0.4397\). The size of the correlation rindicates the strength of the linear relationship between x and y. Consider the following diagram. The sum of the difference between the actual values of Y and its values obtained from the fitted regression line is always: (a) Zero (b) Positive (c) Negative (d) Minimum. In the diagram in Figure, \(y_{0} \hat{y}_{0} = \varepsilon_{0}\) is the residual for the point shown. y=x4(x2+120)(4x1)y=x^{4}-\left(x^{2}+120\right)(4 x-1)y=x4(x2+120)(4x1). This means that, regardless of the value of the slope, when X is at its mean, so is Y. Advertisement . Any other line you might choose would have a higher SSE than the best fit line. So I know that the 2 equations define the least squares coefficient estimates for a simple linear regression. It has an interpretation in the context of the data: Consider the third exam/final exam example introduced in the previous section. This means that if you were to graph the equation -2.2923x + 4624.4, the line would be a rough approximation for your data. distinguished from each other. Correlation coefficient's lies b/w: a) (0,1) squares criteria can be written as, The value of b that minimizes this equations is a weighted average of n
Find the \(y\)-intercept of the line by extending your line so it crosses the \(y\)-axis. You should NOT use the line to predict the final exam score for a student who earned a grade of 50 on the third exam, because 50 is not within the domain of the \(x\)-values in the sample data, which are between 65 and 75. In both these cases, all of the original data points lie on a straight line. One of the approaches to evaluate if the y-intercept, a, is statistically significant is to conduct a hypothesis testing involving a Students t-test. sr = m(or* pq) , then the value of m is a . In the figure, ABC is a right angled triangle and DPL AB. One-point calibration in a routine work is to check if the variation of the calibration curve prepared earlier is still reliable or not. Please note that the line of best fit passes through the centroid point (X-mean, Y-mean) representing the average of X and Y (i.e. Do you think everyone will have the same equation? Press 1 for 1:Y1. The Regression Equation Learning Outcomes Create and interpret a line of best fit Data rarely fit a straight line exactly. This gives a collection of nonnegative numbers. Press 1 for 1:Y1. To graph the best-fit line, press the "Y=" key and type the equation 173.5 + 4.83X into equation Y1. Calculus comes to the rescue here. line. The coefficient of determination r2, is equal to the square of the correlation coefficient. the least squares line always passes through the point (mean(x), mean . It is important to interpret the slope of the line in the context of the situation represented by the data. For situation(2), intercept will be set to zero, how to consider about the intercept uncertainty? Values of \(r\) close to 1 or to +1 indicate a stronger linear relationship between \(x\) and \(y\). quite discrepant from the remaining slopes). It is not generally equal to y from data. 2 0 obj
(0,0) b. There is a question which states that: It is a simple two-variable regression: Any regression equation written in its deviation form would not pass through the origin. We can use what is called aleast-squares regression line to obtain the best fit line. Therefore, approximately 56% of the variation (\(1 - 0.44 = 0.56\)) in the final exam grades can NOT be explained by the variation in the grades on the third exam, using the best-fit regression line. \(\varepsilon =\) the Greek letter epsilon. However, computer spreadsheets, statistical software, and many calculators can quickly calculate \(r\). If each of you were to fit a line "by eye," you would draw different lines. Press \(Y = (\text{you will see the regression equation})\). Both x and y must be quantitative variables. This is because the reagent blank is supposed to be used in its reference cell, instead. In one-point calibration, the uncertaity of the assumption of zero intercept was not considered, but uncertainty of standard calibration concentration was considered. emphasis. At RegEq: press VARS and arrow over to Y-VARS. For each set of data, plot the points on graph paper. If the observed data point lies above the line, the residual is positive, and the line underestimates the actual data value fory. According to your equation, what is the predicted height for a pinky length of 2.5 inches? The sample means of the A random sample of 11 statistics students produced the following data, wherex is the third exam score out of 80, and y is the final exam score out of 200. For now we will focus on a few items from the output, and will return later to the other items. c. For which nnn is MnM_nMn invertible? I love spending time with my family and friends, especially when we can do something fun together. minimizes the deviation between actual and predicted values. ), On the LinRegTTest input screen enter: Xlist: L1 ; Ylist: L2 ; Freq: 1, We are assuming your X data is already entered in list L1 and your Y data is in list L2, On the input screen for PLOT 1, highlightOn, and press ENTER, For TYPE: highlight the very first icon which is the scatterplot and press ENTER. For now, just note where to find these values; we will discuss them in the next two sections. (Be careful to select LinRegTTest, as some calculators may also have a different item called LinRegTInt. (3) Multi-point calibration(no forcing through zero, with linear least squares fit). Scatter plots depict the results of gathering data on two . equation to, and divide both sides of the equation by n to get, Now there is an alternate way of visualizing the least squares regression
The confounded variables may be either explanatory We plot them in a. However, computer spreadsheets, statistical software, and many calculators can quickly calculate r. The correlation coefficient ris the bottom item in the output screens for the LinRegTTest on the TI-83, TI-83+, or TI-84+ calculator (see previous section for instructions). For the case of one-point calibration, is there any way to consider the uncertaity of the assumption of zero intercept? If the observed data point lies above the line, the residual is positive, and the line underestimates the actual data value for \(y\). The calculations tend to be tedious if done by hand. If the slope is found to be significantly greater than zero, using the regression line to predict values on the dependent variable will always lead to highly accurate predictions a. But I think the assumption of zero intercept may introduce uncertainty, how to consider it ? In statistics, Linear Regression is a linear approach to model the relationship between a scalar response (or dependent variable), say Y, and one or more explanatory variables (or independent variables), say X. Regression Line: If our data shows a linear relationship between X . <>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 595.32 841.92] /Contents 4 0 R/Group<>/Tabs/S/StructParents 0>>
slope values where the slopes, represent the estimated slope when you join each data point to the mean of
on the variables studied. The \(\hat{y}\) is read "\(y\) hat" and is the estimated value of \(y\). You are right. And regression line of x on y is x = 4y + 5 . The standard deviation of the errors or residuals around the regression line b. all integers 1,2,3,,n21, 2, 3, \ldots , n^21,2,3,,n2 as its entries, written in sequence, (b) B={xxNB=\{x \mid x \in NB={xxN and x+1=x}x+1=x\}x+1=x}, a straight line that describes how a response variable y changes as an, the unique line such that the sum of the squared vertical, The distinction between explanatory and response variables is essential in, Equation of least-squares regression line, r2: the fraction of the variance in y (vertical scatter from the regression line) that can be, Residuals are the distances between y-observed and y-predicted.