The problems with data extrapolation

I've been doing a lot of curve fitting lately, mostly to find patterns I can use to improve software, or just for school. But I've noticed that it's pretty easy to find the curve you want, rather than the curve you need.

Take the housing market. Plenty of people out there have a vested interest in it picking up again. But plenty of other people have an interest in it staying down. I get it. Some people are underwater and want/need it to go up again. And plenty of other people would like to buy and want it to go down even further to make that easier. Or just want to be right in their feeling that the market is still overinflated.

Take Michael David White, over at HousingStory.net. Here's his graph for where he thinks prices are going:
I was curious how he came up with this fit, so I exported the data myself (you can too, from the Case-Shiller home index website) and played with it a little. For fun I also pulled some data from the government's various census sites, which you'll see later.

Case-Shiller has only been collecting data since 1986. I grabbed the data for my hometown county, which looks pretty similar to his curve but is probably more representative of a high-cost-of-living area than the whole nation averaged together. If you assume everything from about 1997 on is an aberration, do a linear fit from 1986 to 1997, and extrapolate it into the future, you get a line that looks very similar to what White came up with.
Clearly that would mean a huge drop in home prices if you were looking only at the first 10 years of data. That yellow line is inflation; this fit would seem to say inflation should rise faster than home prices. Well, what if you fit the whole thing?

Fitting the whole collection of data would seem to say homes are now underpriced and we should expect a pickup soon. I don't think anyone's anticipating that much of an increase in home prices either. That's the danger again: whether you look at the whole data set, the first 10 years, the last 10, or whatever portion you pick, the answer changes with the window. No matter what, trying to predict this way can seem pretty silly.
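To make the point concrete, here's a minimal sketch of how much the fitting window alone can move the forecast. The numbers are made up for illustration (a toy index with a slow climb, a boom, then a partial decline); they are not the Case-Shiller data, and the helper names `toy_index` and `fit_and_extrapolate` are my own. The fit itself is just an ordinary least-squares line via NumPy's `polyfit`.

```python
import numpy as np

def toy_index(year):
    """Made-up price index: slow climb, boom, partial decline. NOT real data."""
    if year <= 1997:
        return 100.0 + 2.0 * (year - 1986)
    if year <= 2006:
        return 122.0 + 12.0 * (year - 1997)
    return 230.0 - 8.0 * (year - 2006)

def fit_and_extrapolate(years, values, start, end, target_year):
    """Fit a least-squares line to the points with start <= year <= end,
    then evaluate that line at target_year."""
    mask = (years >= start) & (years <= end)
    coeffs = np.polyfit(years[mask], values[mask], 1)  # [slope, intercept]
    return float(np.polyval(coeffs, target_year))

years = np.arange(1986, 2012)                    # 1986 through 2011
values = np.array([toy_index(y) for y in years])

# Same data, same method, same target year; only the window differs.
early = fit_and_extrapolate(years, values, 1986, 1997, 2015)
full = fit_and_extrapolate(years, values, 1986, 2011, 2015)

print(f"last observed value (2011):   {values[-1]:.1f}")
print(f"forecast from 1986-1997 only: {early:.1f}")
print(f"forecast from all the data:   {full:.1f}")
```

With this toy series the early-window line lands below the last observed value (a "crash is coming" story), while the full-window line lands well above it (a "homes are underpriced" story). Nothing changed but which years went into the fit.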

Here I just grabbed inflation with both the average and the median. Looks like White could have grabbed the average for his data as well, but as it's rising slower than inflation you have to wonder at that. And what does inflation mean anyway? Should we assume home prices will rise any faster or slower than inflation? What about food prices? There's been a lot of talk that food has stayed relatively cheap for Americans, so what if we compare total food inflation to home prices?

If we look at the rise in food prices, home prices don't look that far off or that inflated. Does White think there's a food-price bubble? Does he think food prices are going to come crashing down anytime soon? What about healthcare?

I don't hear anyone speculating that healthcare costs are going to go down anytime soon. There's been plenty of talk about the "housing bubble" or even the "higher education" bubble. But you don't hear anyone saying you should pull out of healthcare stocks or just wait for that big crash in healthcare prices that's going to happen anytime soon.

And that's the danger of data extrapolation. It's plenty fun to graph some data and draw some lines. If you have a specific agenda, it's not hard to find something to back it up. But the truth is it's just not a good way to reach a conclusion. I expect Mr. White gets a lot of press and blog hits from people who want to buy right now and are really hoping prices will drop even more. They might or they might not. But these lines could tell any story you wanted them to.


  1. I run into this a lot. A director saw one day of data (because it was the only day available...it's a long story...files got corrupted; he's impatient...) and decided to extrapolate it out for an entire month. That month happened to be December which is already an anomaly. And then he took that month and extended it to a whole year. Long story short: what should have been cautious projections for a possible new product have now been blown entirely out of proportion because this director lost his head and ran off with crazy numbers. It's all because people became impatient and had an agenda.

