
Statistician, Heal Thyself


It’s a cliché to point out that we live in the era of big data: from the piles of personal data collected by the web geniuses at Google and Facebook, to the financial data compiled by credit rating agencies, to the new government-sponsored economic databases open to any number of firms, including Beacon Economics. Combine this with unprecedented desktop processing power and we have ways of dissecting what is going on in our world like never before. Very exciting!

But in the midst of all this excitement, we sometimes forget that cutting-edge data can still be flawed. Much of this data allows for analyses that go beyond anything social scientists used to dream about, yet it can carry significant flaws that skew your perception of the world if taken at face value. We’ve recently seen an unfortunate example of this in the City of Los Angeles, which I’ll get to in a moment.

If you imagine Beacon Economics as a sleek economic analysis vehicle, say a Ferrari, then the data we use is the engine that propels us. I would love to tell you that under the hood is a 565 HP turbo engine. Unfortunately, many times it seems we are driving with a 1967 2-cylinder Peugeot air-cooled engine that we hold together with gum, duct tape, and a prayer. But if that is what you have to work with, to paraphrase former U.S. Defense Secretary Donald Rumsfeld: you go to work with the data you have, not the data you want.

Why is so much of the economic data we rely on so messy? The problem is simply that collecting very high quality economic information is expensive and can often be quite personally invasive. Our official data creators in the nation (mainly the government) rely on small samples, self-reported data, and simple proxies to try to reach a reasonable approximation of what is actually going on.

What that means is that to be honest in any analysis, we also need to be skeptical of the data being used. At Beacon, we always try to answer an analytic problem using two or three different estimation techniques to make sure the answers line up. If they don’t, we know we may have to find other, more clever ways to reach the most accurate conclusion.

A classic example of how not being skeptical enough of data can lead to warped views of the world comes from the recent debate here in Los Angeles over a measure to raise the minimum wage for some workers in the hotel industry. Those who wrote analyses supporting the measure painted a dismal picture of hotel workers. According to the reports, the median wage earned by full-time workers was less than $13 per hour. The average wage was just over $14 per hour. With these poverty-level wages, no wonder hotel workers need a raise.

But the data used comes from an annual dataset called the American Community Survey (ACS). It’s a very rich dataset, based on what was once the long-form U.S. Census survey sent out every ten years. Much of the data is self-reported, rather than being verified through other means. As such, simple human misperceptions can weigh heavily on the results. For example, people tend to overestimate the number of hours they worked the previous year. They also tend to underestimate their gross income. Because an hourly wage is calculated as income divided by hours worked, both errors push the result in the same direction, which implies that hourly wages are liable to be biased down as well.
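To see how the two reporting errors compound, here is a minimal sketch in Python. The worker, the income and hours figures, and the error rates are all hypothetical, chosen only for illustration; none of them come from the ACS or from the reports discussed here.

```python
def implied_hourly_wage(annual_income, annual_hours):
    """Hourly wage implied by a year's income and a year's hours worked."""
    if annual_hours <= 0:
        raise ValueError("annual_hours must be positive")
    return annual_income / annual_hours


# Hypothetical worker: these figures are illustrative assumptions only.
true_income = 30_000.0   # actual gross annual income
true_hours = 1_800.0     # actual hours worked in the year

# Assumed reporting errors (made-up rates, not estimates from any survey).
income_understatement = 0.05   # income reported 5% too low
hours_overstatement = 0.10     # hours reported 10% too high

reported_income = true_income * (1 - income_understatement)
reported_hours = true_hours * (1 + hours_overstatement)

true_wage = implied_hourly_wage(true_income, true_hours)
reported_wage = implied_hourly_wage(reported_income, reported_hours)

# The ratio depends only on the two error rates: (1 - 0.05) / (1 + 0.10).
wage_ratio = reported_wage / true_wage

print(f"True hourly wage:     ${true_wage:.2f}")
print(f"Reported hourly wage: ${reported_wage:.2f}")
print(f"Reported / true:      {wage_ratio:.3f}")
```

Under these assumed error rates, the wage implied by the survey answers comes out roughly 14% below the true wage, even though neither individual error is larger than 10%.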

And they do appear to be. When Beacon Economics pulled the ACS data, we arrived at similar results in terms of average and median hourly pay. But when we looked closer, it turned out that a large number of workers (one-quarter of the entire hotel industry in Los Angeles County) gave numbers that suggested they were paid less than the legal minimum wage in California for that year!
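A plausibility check of this kind can be sketched as follows. This is not Beacon's actual code: the field names, the sample records, the survey weights, and the minimum wage value are all hypothetical placeholders, and a real analysis would use the survey's own variables and the legal minimum for the year in question.

```python
def share_below_minimum(records, minimum_wage):
    """Weighted share of workers whose implied hourly wage is below minimum_wage.

    Each record is a dict with hypothetical keys:
    annual_earnings, weekly_hours, weeks_worked, weight.
    """
    total_weight = 0.0
    below_weight = 0.0
    for r in records:
        annual_hours = r["weekly_hours"] * r["weeks_worked"]
        if annual_hours <= 0:
            continue  # no usable hours, so no hourly wage can be implied
        wage = r["annual_earnings"] / annual_hours
        total_weight += r["weight"]
        if wage < minimum_wage:
            below_weight += r["weight"]
    return below_weight / total_weight if total_weight else 0.0


# Made-up survey responses, for illustration only.
sample = [
    {"annual_earnings": 14_000, "weekly_hours": 40, "weeks_worked": 50, "weight": 1.0},
    {"annual_earnings": 26_000, "weekly_hours": 40, "weeks_worked": 50, "weight": 1.0},
    {"annual_earnings": 30_000, "weekly_hours": 40, "weeks_worked": 50, "weight": 1.0},
    {"annual_earnings": 40_000, "weekly_hours": 40, "weeks_worked": 50, "weight": 1.0},
]

PLACEHOLDER_MINIMUM_WAGE = 8.00  # assumed value, not a figure from this article

share = share_below_minimum(sample, PLACEHOLDER_MINIMUM_WAGE)
print(f"Share implied below minimum wage: {share:.0%}")
```

A large share here is a warning sign about the reported hours and earnings rather than proof of widespread underpayment, which is why it is worth comparing against an independent source.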

If that doesn’t sound believable, well, it isn’t. Better data from actual payroll records that the government collects as part of the unemployment insurance program suggest the average hotel worker is making almost 20% more than the self-reported ACS data indicate. And of course this fails to include the healthy tip income that many workers in the hospitality space earn, including waiters, valets, bellhops, and bartenders. In other words, there clearly are some low-income workers in the hotel field, but not nearly as many as the researchers’ initial estimates would have you believe.

And given that, the entire case for a hotel-industry-specific minimum wage is called into doubt. Unfortunately, these critical ‘data’ doubts were never raised, and the measure was passed by the City Council.

Moreover, the flawed ACS data hasn’t gone away, and these skewed numbers are now being used to promote minimum wage campaigns at the local level across California, not the least of which is the Mayor of Los Angeles’ proposal to raise the City’s minimum wage to $13.25 an hour.
