Home Case Studies MS Office Tips and Tricks Sample Code Helpful Videos Custom Development  Custom Training Join Us Contact Us
Search Our Site:  

Charts And Statistical Analysis

Do large car engines (cubic feet of displacement –cid) consume more gasoline (miles per gallon –mpg) than small engines? Does the crime rate (crimes per 100,000 persons) increase with population density (inhabitants per square mile)? Does increase exercise decrease the rate of heart attacks?

If the problem you are trying to analyze is similar to the simple examples we just gave, then you might consider using the statistical technique we are about to describe: simple linear regression.

In the examples mentioned, we are trying to determine if the two attributes (variables) are related: engine size and miles per gallon, crime rate and population density, exercise and heart attacks. In order to use simple linear regression several requirements must be met. Let’s start by mentioning two:

So, is there a relationship between the square footage of a house and its selling price? The answer seems obvious in the case of real estate, but other relationships are not so simple to establish. As an example, let’s analyze data collected on a sample of single-family homes in the San Jose, CA area.

Before determining whether or not it is appropriate to use linear regression to assess the relationship between square footage and selling price, we must ask ourselves: are the data linear? One way to find out is to use the scatter plot capabilities of Excel (XY Scatter Chart). Plot the independent variable on the x-axis (horizontal) and the dependent variable on the y-axis (vertical).
 




We can immediately see that the data look linear. If you are experienced in statistics, you will also be able to see that the data have been manipulated to illustrate a point. Actual real estate data are not nearly so linear!

Once the data have been graphed, you can go to the Chart menu, click on Add Trendline… and you’ll see the dialog box below.



The first tab (“Type”), above, allows you to choose the type of regression you’d like to use. If you feel the data set you are trying to model are logarithmic or polynomial, you may choose that type. Our visual inspection of the scatter plot has led us to conclude that a linear association seems to exist between the two variables we are analyzing, so we select “Linear”.

A Word From Our Sponsor

Online Courses For CPE Credit
CPE Link is a new provider of web-based continuing professional education (CPE) for accounting professionals.


The second tab is Options (see below). On this tab we’ll select “Display equation on chart” and “Display R-squared value on chart”.



The equation on the chart below describes a linear relationship. It has the form y=mx+b. The predicted value of the dependant variable (price) is in terms of a given value of the independent variable (square footage). The coefficient m, the slope, is the average increase in price for every one unit increase in square footage. In our case, the price of the house increases by $165.25 for every square foot. The coefficient b is called the y-intercept, or the value of when x equals zero. Here, an empty lot (0 square feet of housing) still costs $360,633!



The trend line describes the values (the predicted prices based on the equation) given the x values (square footage) in the data set. Because the observed values (the ones collected from the sample of homes) gather closely near the trend line, the model is a good fit for the data.

The final value obtained by using Excel’s “Add Trendline” function is R2, also known as the coefficient of determination: it measures how much of the variance in prices (the dependent variable) is accounted for by square footage (the independent variable). Variance is a term used by statisticians to express how much variation (spread) there is in the data. So R2 tells us how well square footage “explains” the differences in prices among single-family homes in the San Jose area. Specifically, it tells us that over 96% of the variance in price can be accounted for by square footage. In other words, all you need to know to figure out the price of a single-family home in that area is the square footage.

Of course, this is a simplified example used to illustrate a point. Unfortunately, or fortunately, depending on your point of view, reality, as you imagine, is far more complex. It is rare that only one independent variable would “explain” all of the variation in some dependent variable. Surely, there are many other factors that account for price differences among homes…but that’s for another discussion…
 

References

Journal of Accountancy Articles

Tweaking The Numbers

Block That Spreadsheet Error

Excel Security Issues


 

Clients

Do you need help with Microsoft Office? Contact us

See how we've helped companies like yours develop effective business processes

Client Success Stories


 

Consultants

I Get It! Consultants set their own hours, work with their own clients, and choose their own projects.

If you're a Microsoft Office guru and would like to explore being a full-time consultant, please see our consultant information pages.