Big Data Without Statisticians: BD2K Symposium At UCLA

UCLA is having a big data conference on Thursday and Friday, Mar 27-28, 2014. The conference is organized by four computer science and genomic biology types. Speakers cluster [one of the rare appropriate uses of cluster analysis I know of] into three types of folks: big biologists [they must be big, they're doing big data] doing big data, genomic stuff; computer scientists doing topic models; and a few math modelers who, as far as I know, don't usually look at data at all.

The conference [well, it's called a workshop] is idiosyncratic [unique] in that it has no statisticians involved. The four organizers hail from Radiological Sciences, Physiology, Microbiology and Computer Science. If I read and recall their backgrounds correctly, two come from Comp Sci, two from Biology backgrounds, three are working in genomics of some sort and one in imaging. It seems probable that all are doing statistics, likely very complex statistics, and that's fine; there aren't enough statisticians around, and there never have been. Scientists have always had to do their own data analysis.

It is a bit hard to imagine who might be interested in the entire conference other than me and perhaps a few friends of mine. The genomicists may be interested in other genomicists but not in the math modelers. The comp sci folks may be interested in other topic modelers [a new class of perfectly good models] but will they be interested in genomics? And I haven't a clue what other talks the math modelers will want to hear.

My main interest as a statistician is perhaps to hear about the dynamic fringes of statistical application. The edges of statistical applications are where a lot of the fun stuff is going to be happening. And to defend a bit of statistics/biostatistics turf. I certainly want to hear about the topic models and from the topic modelers. I want to hear from the math modelers in case they've stumbled into some data; I happen to know a bit about how to combine data and math models and I care about the problem too. But I can't make progress without help. I'm not so interested in the genomics; I'm not not interested in the genomics, mind you, but the problem is too much data and not enough time to investigate it.

Another interest of mine is getting to know who the players are. The big data movement is going to get political shortly on campus and you can't tell the players without a scorecard. 

I imagine eventually the big data movement will devolve into a fight over resources. There is a ton of data out there and few statisticians. People will want resources to analyze their data, and will want those resources supplied by central campus, but won't want the resources controlled by stat or biostat departments. Minnesota is currently having a nice campus-wide initiative (or fight) about big data; apparently the powers that be didn't want statistics or biostatistics involved at all.

There are lots of misconceptions about statisticians and statistics. I'm expecting to hear some and I'll try to keep track of what I hear at the conference. Some of these are sheer prejudice: that statisticians don't care about big data or that we don't care about someone's specific type of data. This is partly a forest/tree problem: individual statisticians are busy and may not have time for someone else's data or problems. Another issue is way too much data and way too few statisticians. Statisticians code up lots of solutions to problems in canned software, but that will never handle the bleeding edge of science and statistical problems. So we all get to develop new methods and apply old methods in new ways to new sorts of data. That's why statistics is so much fun. Welcome to our world.

I'll see if I can stomach the entire conference or not. There are serious difficulties in communicating across disciplinary boundaries and I've had my share of successes and failures, possibly more than most given that biostatistics is naturally poly-disciplinary and that I have a tendency to be apian and flit from science to science. Often people don't understand their audience's background and they don't communicate well because of that. That applies to me, and it applies to scientists that I sometimes try to talk to or listen to. Even people who spend their careers on the boundaries can easily make mistakes: mistakes in communicating, mistakes in talking about the tools that come from outside their immediate scientific background, mistakes in assessing what an audience needs to hear to follow the research. Thus it is quite possible that most speakers will be talking to their own close colleagues. It will be of interest to see whether some, or indeed any, of the speakers are prepared to speak to the broad audience this conference seems designed to attract.
