Home Case Studies MS Office Tips and Tricks Sample Code Helpful Videos Custom Development  Custom Training Join Us Contact Us
Search Our Site:  

Charts And Statistical Analysis

Do large car engines (cubic feet of displacement –cid) consume more gasoline (miles per gallon –mpg) than small engines? Does the crime rate (crimes per 100,000 persons) increase with population density (inhabitants per square mile)? Does increase exercise decrease the rate of heart attacks?

If the problem you are trying to analyze is similar to the simple examples we just gave, then you might consider using the statistical technique we are about to describe: simple linear regression.

In the examples mentioned, we are trying to determine if the two attributes (variables) are related: engine size and miles per gallon, crime rate and population density, exercise and heart attacks. In order to use simple linear regression several requirements must be met. Let’s start by mentioning two:
  • The variables used in the analysis must be expressed in a meaningful numerical scale (miles per gallon, degrees Fahrenheit, etc.).
  • We expect some time precedence of one variable over the other: thus we anticipate that one attribute has an effect on the other, such as the size of a house has an effect on the selling price of that property, not vice versa. Thus, we call one factor the independent variable (square feet) and the other the dependent variable (price).

So, is there a relationship between the square footage of a house and its selling price? The answer seems obvious in the case of real estate, but other relationships are not so simple to establish. As an example, let’s analyze data collected on a sample of single-family homes in the San Jose, CA area.

Before determining whether or not it is appropriate to use linear regression to assess the relationship between square footage and selling price, we must ask ourselves: are the data linear? One way to find out is to use the scatter plot capabilities of Excel (XY Scatter Chart). Plot the independent variable on the x-axis (horizontal) and the dependent variable on the y-axis (vertical).
 




We can immediately see that the data look linear. If you are experienced in statistics, you will also be able to see that the data have been manipulated to illustrate a point. Actual real estate data are not nearly so linear!

Once the data have been graphed, you can go to the Chart menu, click on Add Trendline… and you’ll see the dialog box below.



The first tab (“Type”), above, allows you to choose the type of regression you’d like to use. If you feel the data set you are trying to model are logarithmic or polynomial, you may choose that type. Our visual inspection of the scatter plot has led us to conclude that a linear association seems to exist between the two variables we are analyzing, so we select “Linear”.

Are You Ready For Office 365?


Office 365 is

  • Email and calendaring powered by Microsoft Exchange
  • Office Web Apps - the office productivity apps you already know, tailored for the web
  • Websites powered by Microsoft SharePoint both for internal-facing intranet and customer-facing website
  • Instant Messaging and Online Meetings powered by Microsoft Lync
  • All with a financially-backed 99.9% uptime guarantee

Sign up for a free trial:

Name
Company
Email Address

Your information will not be shared with any other party. You must provide a valid email in order to recieve your login information and starting information kit.

Watch this video  to see how Office 365 can help your business.

The second tab is Options (see below). On this tab we’ll select “Display equation on chart” and “Display R-squared value on chart”.



The equation on the chart below describes a linear relationship. It has the form y=mx+b. The predicted value of the dependant variable (price) is in terms of a given value of the independent variable (square footage). The coefficient m, the slope, is the average increase in price for every one unit increase in square footage. In our case, the price of the house increases by $165.25 for every square foot. The coefficient b is called the y-intercept, or the value of when x equals zero. Here, an empty lot (0 square feet of housing) still costs $360,633!



The trend line describes the values (the predicted prices based on the equation) given the x values (square footage) in the data set. Because the observed values (the ones collected from the sample of homes) gather closely near the trend line, the model is a good fit for the data.

The final value obtained by using Excel’s “Add Trendline” function is R2, also known as the coefficient of determination: it measures how much of the variance in prices (the dependent variable) is accounted for by square footage (the independent variable). Variance is a term used by statisticians to express how much variation (spread) there is in the data. So R2 tells us how well square footage “explains” the differences in prices among single-family homes in the San Jose area. Specifically, it tells us that over 96% of the variance in price can be accounted for by square footage. In other words, all you need to know to figure out the price of a single-family home in that area is the square footage.

Of course, this is a simplified example used to illustrate a point. Unfortunately, or fortunately, depending on your point of view, reality, as you imagine, is far more complex. It is rare that only one independent variable would “explain” all of the variation in some dependent variable. Surely, there are many other factors that account for price differences among homes…but that’s for another discussion…
 
Please share your comments on this article!

 
Name (opt)
Email Address (opt)
Your Questions/
Comments

I Get It! Development does not share your email address or other information, nor do we add you to any mailing lists unless you specifically request to be added by checking the box below.



SEE ALSO
You may also find the following articles useful:

Calculate Growth Rates In Excel

Spreadsheet vs. Database

Pimp My Spreadsheet!

Amazing Excel Comments

Deduplicate Excel Data

Excel - Access Import Problems

Circular References

Custom Formats In Excel

Auto Update Charts

Data Quick Check In Excel

Advanced Sorting In Excel

Excel Text/Date Formulas

Named Ranges

Paste Special Tricks

Subtotals In Excel

References

Journal of Accountancy Articles

Tweaking The Numbers

Block That Spreadsheet Error

Excel Security Issues


 

Clients

Do you need help with Microsoft Office? Contact us

See how we've helped companies like yours develop effective business processes

Client Success Stories


 

Consultants

I Get It! Consultants set their own hours, work with their own clients, and choose their own projects.

If you're a Microsoft Office guru and would like to explore being a full-time consultant, please see our consultant information pages.