Tuesday, June 24, 2008

Basic Biological Change

Some excellent (as per usual) articles today at Wired magazine nicely illustrate how the way we do biology is changing. Not content with the "Information Age," Wired editors are calling this the "Petabyte Age," and boldly proclaim that "More isn't just more - more is different."

When it comes to biology, they couldn't be more right. The scientific method starts with observations. It was accepted early on that no person could observe everything, and a big part of scientific study and training is learning where (and how) to make observations. In biology, for example, it's only been roughly 50 years since we figured out what gene expression was. 30 years ago, you could look at one gene at a time - but it had better be highly active. 15 years ago you could do loads of genes from the same sample, but they had to be done one at a time. Today, we are close to being able to look at the expression patterns of every single gene in a given sample - whether we know what the gene is/does or not - simultaneously.

These tremendously powerful approaches, which are becoming more commonplace all the time, generate massive datasets that humans simply can't deal with. Not that brains don't have the computing power to do it, but they're usually busy with things like sensory input and breathing. The development of new methods for generating and analyzing "big" data will trump traditional scientific approaches for a number of important reasons.

The first of these is the simple volume of data. No longer shall we have to pick and choose endpoints to examine or the means to analyze them. Where possible, we measure everything (which is a bit frightening), then feed it all to a system that starts doing correlations and cluster analyses and crazy matrix algebra and tells us what's important. Which brings up the second thing - a total lack of bias in the interpretation.
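For the curious, here is a toy sketch of what "doing correlations and cluster analyses" can look like at its very simplest. Everything in it is an assumption made up for illustration: the gene names, the expression numbers, the 0.9 cutoff, and the choice of grouping method (genes are lumped together whenever their expression profiles are strongly correlated). Real expression analyses use far more data and far more careful statistics.

```python
from itertools import combinations
from math import sqrt

# Made-up expression levels for five hypothetical genes across six samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.1, 3.9, 6.2, 8.1, 9.8, 12.0],
    "geneC": [6.0, 5.1, 3.9, 3.0, 2.2, 0.9],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
    "geneE": [5.9, 5.0, 4.1, 2.9, 2.0, 1.1],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def cluster(data, threshold=0.9):
    """Group genes whose profiles correlate above the (arbitrary) threshold.

    This is a simple single-linkage grouping done with a union-find structure:
    any two genes correlated above the threshold end up in the same cluster.
    """
    parent = {gene: gene for gene in data}

    def find(gene):
        # Follow parent links to the cluster representative, shortening paths as we go.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for g1, g2 in combinations(data, 2):
        if pearson(data[g1], data[g2]) > threshold:
            parent[find(g1)] = find(g2)

    groups = {}
    for gene in data:
        groups.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in groups.values())


clusters = cluster(expression)
print(clusters)
```

Nobody told the program which genes belong together; the grouping falls out of the numbers alone, which is the point of the paragraph above. With thousands of genes the idea is the same, only the bookkeeping gets heavier.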

Bias has been an important tool in science in the sense that hypotheses and models lead you to look at certain things and not at others. This "framing," as it is called in the artificial intelligence community, is very difficult for computers. But maybe they don't have to bother, and we can take advantage of this to see things that we would never have looked for (or even considered), no thanks to the tunnel vision of so-called scientific expertise. If we could work out a way to normalize the data from experiments across labs, we could then make the datasets public - as I mentioned GlaxoSmithKline doing last week - preferably in a common, searchable resource, and produce more information than any single scientist could create in a career. And we'll find out that more isn't just more, it is qualitatively different. Of course, this difference means there will be a whole new set of problems to solve as these analyses evolve.

The irony is that mathematicians and computer scientists will be able to do PhDs in biology from their armchairs in their spare time, and risk being more successful at it than the biologists.

* * *

Quote of the month: "The 23andMes of the world are more in the entertainment realm..." This is Andy Gores, CEO of HairDX, commenting on why he was disappointed about the company's recent decision to stop selling genetic tests without a doctor's approval in California and New York. 23andMe will sequence your genome for about a thousand dollars, and tell you whether you have elevated risk for a given condition based on your DNA. HairDX, one presumes, does something similar to genetically test your chances of hair loss. People would do well to remember this.
