Halloween party ideas 2015

For those who don't know the term:  Moneyball is the title of a book and a movie about the 2002 Oakland Athletics baseball team, a team with a payroll in the bottom 10% of major league baseball at the time.   They used a data-intensive, analytics-based strategy called sabermetrics to find "hidden value" and "market inefficiencies", to put together a very competitive team despite their very limited financial resources.   A recent (very fun if you're a baseball fan) book along the same lines is this one.  (It also has a wonderful discussion of confirmation bias!)

A couple of years ago there was a flurry of articles (like this one and the academic paper on which it was based) about whether a similar data-driven approach could be used in scientific academia - to predict success of individuals in research careers, perhaps to put together a better department or institute (a "roster") by getting a competitive edge at identifying likely successful researchers.

The central problems in trying to apply this philosophy to academia are the lack of really good metrics and the timescales involved in research careers.  Baseball is a paradise for people who love statistics.  The rules have been (largely) unchanged for over a hundred years; the seasons are very long (formerly 154 games, now 162), and in any game an everyday player can get multiple opportunities to show their offensive or defensive skills.   With modern tools it is possible to get quantitative information about every single pitched ball and batted ball.  As a result, the baseball stats community has come up with a huge number of quantitative metrics for evaluating performance in different aspects of the game, and they have a gigantic database against which to test their models.  They even have devised metrics to try and normalize out the effects of local environment (baseball park-neutral or adjusted stats).

Fig. 1, top panel, from this article.  x-axis = # of citations.
The mean of the distribution is strongly affected by the outliers.
In scientific research, there are very few metrics (publications; citation count; impact factor of the journals in which articles are published), and the total historical record available on which to base some evaluation of an early career researcher is practically the definition of what a baseball stats person would call "small sample size".   An article in Nature this week highlights the flaws with impact factor as a metric.  I've written before about this (here and here), pointing out that impact factor is a lousy statistic because it's dominated by outliers, and now I finally have a nice graph (fig. 1 in the article; top panel shown here) to illustrate this.  

So, in academia, the tantalizing fact is that there is almost certainly a lot of "hidden value" out there missed by traditional evaluation approaches.  Just relying on pedigree (where did so-and-so get their doctorate?) and high impact publications (person A must be better than person B because person A published a paper as a postdoc in a high impact glossy journal) almost certainly misses some people who could be outstanding researchers.  However, the lack of good metrics, the small sample sizes, the long timescales associated with research, and enormous local environmental influence (it's just easier to do cutting-edge work at Harvard than at Northern Michigan), all mean that it's incredibly hard to come up with a way to find these people via some analytic approach.  

Diberdayakan oleh Blogger.