Thursday, July 15, 2010

About that long form census...

It goes without saying that the Harper government's decision to eliminate the mandatory long-form census, replacing it with a voluntary survey, is beyond stupid. They claim that the mandatory survey is "coercive" and an invasion of privacy. Or something:
In a written statement Tuesday, Clement said the short-form census, which is still mandatory, will provide a sufficient demographic picture of the country.

"The government does not think it is necessary for Canadians to provide Statistics Canada with the number of bedrooms in their home, or what time of day they leave for work or how long it takes them to get there," Clement said. "The government does not believe it is appropriate to force Canadians to divulge detailed personal information under threat of prosecution."

Garneau argued Clement does not understand how scientific data is gathered or used, and that the census has to be mandatory to get responses from a wide variety of people. He said that information from the long-form census is essential for the development of sound government policies.
I'll leave aside the fundamental reasons why this is a stupid idea and address Clement's comments directly. First, it's worth noting that census results are reported in aggregate, so nothing in the published data will, in the end, reveal any individual's answers. Second, Clement's own examples seem rather important to me - commuting time, for example, is a good proxy for the likes of traffic congestion and patterns of daily movement. The question of where people work vs. where they live, and how far they must travel daily, is of pretty obvious relevance to urban planning, highway construction, public transit, and a host of social policy issues. I'm not sure what about such questions is overly "personal" either.

So what's the real problem? Well, while it's nice to think that a voluntary survey would achieve a sufficiently high response rate, this strikes me as doubtful. Currently the mandatory survey is sent to 20% of Canadian households (I presume these households are chosen randomly). The line from the government seems to be that a voluntary survey might reach a larger number of Canadians, and that this would be superior to the mandatory survey of 20%. This is, however, wrong. Any kind of survey of a population contains some element of sampling error. You can predict the size of this error from the sample size, and keep it as small as you need by choosing an appropriate sample size. Now, 20% of Canadian households is a big sample. REALLY big. The sampling error implied by such a large sample is very small, so small in fact that increasing its size to 25% or even 50% of households will not make a meaningful difference. Happily, because this survey is mandatory, there is minimal non-response, and so the results can be treated as essentially "unbiased".
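To put rough numbers on that, here is a minimal sketch of the standard margin-of-error calculation for an estimated proportion under simple random sampling, with a finite population correction. The household count is a hypothetical round figure chosen for illustration, not an official one, and the function name and the worst-case p = 0.5 are my own assumptions. The real census design is more complicated than simple random sampling, so treat this as an order-of-magnitude illustration only.

```python
import math

# Hypothetical round number of households, for illustration only.
HOUSEHOLDS = 12_000_000


def margin_of_error(sample_size, population_size, p=0.5, z=1.96):
    """Approximate 95% margin of error for an estimated proportion.

    Assumes simple random sampling without replacement and full response.
    p=0.5 is the worst case (largest variance); z=1.96 gives ~95% confidence.
    """
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    # Finite population correction: sampling a large share of the
    # population shrinks the error further.
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * fpc


for fraction in (0.20, 0.25, 0.50):
    n = int(HOUSEHOLDS * fraction)
    moe = margin_of_error(n, HOUSEHOLDS)
    print(f"{fraction:.0%} sample: +/- {moe * 100:.3f} percentage points")
```

Under these assumptions the national-level margin of error is already well under a tenth of a percentage point at 20%, so going to 25% or 50% has very little left to improve. (The big sample earns its keep when you break results down to small areas and subgroups, where the effective sample is much smaller.)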

Unfortunately, a voluntary survey leads to an unpredictable degree of non-response bias in the results. Bias is a form of structural error in a statistic that cannot be eliminated by increasing the sample size (which, as noted, is already more than large enough). We can imagine that poor people (or wealthy ones) or anarchists or libertarians or contrarians or any other sort of socioeconomic or political group might be more or less likely to respond to a voluntary survey. That's bias. And it means that the results of the survey may not - and almost certainly will not - be representative of, or generalizable to, the Canadian population. That's a demonstrably inferior survey sample - less reliable, less accurate, and in ways that are difficult to quantify. We can estimate sampling error fairly readily, but that does not hold for non-response bias, since correcting for it requires a great deal of information about who failed to respond and why. Of course, without the mandatory long-form survey, it seems unlikely we'd know why particular people failed to respond to a voluntary survey. Quite the Catch-22.
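A toy simulation makes the point that a bigger sample does not cure bias. Everything in it is hypothetical: the "true" share of low-income households, the response rates for each group, and the function name are all made-up numbers and names for illustration, not estimates of anything real. The mandatory case is simplified to full response.

```python
import random

# All three values below are hypothetical, chosen only for illustration.
TRUE_SHARE = 0.30        # true share of low-income households
RESPONSE_LOW = 0.50      # voluntary response rate, low-income households
RESPONSE_OTHER = 0.80    # voluntary response rate, everyone else


def estimate_share(sample_size, voluntary, rng):
    """Estimate the low-income share from a random sample of households.

    If voluntary is True, each sampled household responds only with the
    probability assigned to its group; otherwise everyone responds.
    """
    responses = []
    for _ in range(sample_size):
        low_income = rng.random() < TRUE_SHARE
        if voluntary:
            p_respond = RESPONSE_LOW if low_income else RESPONSE_OTHER
            if rng.random() >= p_respond:
                continue  # household did not return the form
        responses.append(low_income)
    return sum(responses) / len(responses)


rng = random.Random(2010)  # fixed seed so the run is repeatable
for n in (1_000, 10_000, 100_000):
    mandatory = estimate_share(n, voluntary=False, rng=rng)
    volunteer = estimate_share(n, voluntary=True, rng=rng)
    print(f"n={n:>7}: mandatory={mandatory:.3f}  voluntary={volunteer:.3f}")
```

With these made-up response rates, the voluntary estimate settles near 0.15 / (0.15 + 0.56), or about 0.21, rather than the true 0.30. As n grows, the mandatory estimate closes in on the truth while the voluntary one just becomes a more precise estimate of the wrong number. And in real life, of course, nobody hands you the response rates.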

So, Tony Clement, with respect to this:
“I am not saying it's every Canadian, but I am saying there are Canadians [who complained] and we should try to accommodate their concerns in a balanced way,” he said.

He added that he took the privacy concerns to Statistics Canada and asked they be incorporated into the next census. “They gave me options and we chose one of those options,” he said.

“This is a methodology that Statistics Canada offered to us and if it's good enough for Statistics Canada, it should be good enough for some of our critics.”
As I happen to know a number of statisticians, I'm rather skeptical that StatCan thinks this change is "good enough". At this point, I should note that I do indeed have some kind of degree: an MMath (Statistics-Biostatistics) from Waterloo. I think that beats Tony's poli sci undergrad (I have one too) and law degree - at least insofar as these things go.
