More on Ranking Law Schools, and What Can Be Learned from the Ranking of Sports Teams | Vikram David Amar | Verdict

In my last column, Part One in this series, I offered some observations about the increasingly loud and frequent criticisms of the US News rankings systems for law (and medical) schools that have been voiced in recent months by some of the nation’s best-known (and generally most highly rated) professional schools themselves. I explained that while some of the grounds for complaint are quite powerful, others may be self-contradictory and perhaps also self-serving. Some of the problem, I suggested, arises not just from the US News methodology in particular, but also from the fact that most every ranked institution feels underappreciated and thus will have qualms about any ranking that is not tailor-made to its own preferences. But I also pointed out that rankings controversies and criticisms (in a variety of arenas) are nothing new, and for that reason the professional-school-ranking world may benefit from looking at how rankings methods are evolving in the world of sports. In the space below, I suggest at least four ways in which rankings of academic institutions can borrow from innovations in college sports rankings.

To set the stage for these potential lessons, let’s begin by noting that both academic-institutional rankings and sports rankings make significant use of surveys or polls of (presumably) knowledgeable “experts” to evaluate and compare the overall strength of competing institutions. But such surveys of experts (in all popular domains of rankings) suffer from many flaws, including the fact that human perceptions (even perceptions of presumed experts) suffer from feedback loops, anchoring effects, and recency bias, as well as from the inability of each expert to know a lot about all the institutions being ranked. Take, for example, the US News law rankings’ reputational surveys conducted among deans and professors of other law schools, as well as among a small number of lawyers and judges throughout the country. Are the several hundred academics who receive and return their surveys really the most knowledgeable folks about all 200 or so ABA-approved schools? Do lawyers and judges in some parts of the country really know much about very many law schools in other regions? And when survey respondents don’t really know much, are they inclined to over-rely on the bottom-line rankings from the previous year, creating a self-fulfilling (and often unrealistic) ordinal sequence in which schools’ (overall) rankings generally remain static over time, despite significant changes that the schools might have made over a several-year period? And, relatedly, do those rankings, in turn, help perpetuate the (relatively) static ordinal ranking by encouraging well-qualified prospective students and faculty members to choose to attend or work at the same old group of highly ranked schools?

These same kinds of flaws plague college sports rankings as well. How many basketball games can each of the 62 Associated Press voters, whose rankings come out weekly, really watch, especially when each of these voters is a journalist who is busy cranking out content about the local team s/he covers? It is undeniable that many voters lack deep knowledge about many of the teams they evaluate. And such AP voters are undoubtedly overly influenced by last year’s (or last week’s) outcomes when they vote, even as the rosters of college teams change tremendously between years and even within each year.

So what can be done to address these problems? In the college sports world, journalist voters have increasingly been introduced to, and encouraged to make use of, numerical, analytic metrics to help in the assessment, especially of teams that a voter likely has not had a chance to observe (in person or even on TV) over a large sample of games. Prominent metrics systems—which focus on statistics rather than human perceptions—in college basketball include the NET rankings, KenPom rankings, and Sagarin rankings. These systems rank teams (and sometimes individual players) based on many categories of offensive and defensive efficiency (e.g., how many points per possession a team has scored or given up over a large number of games), the frequency with which a team rebounds its own missed shots (enabling the possibility of so-called second-chance points), the margins of victory and loss rather than just win-loss records, and the like. And all these statistics purport to take account of the quality of the opponents (again, as measured by statistics) against whom each team has competed, and whether the statistics in each game were achieved at a team’s home venue (where crowd support and perhaps friendlier refereeing help), on the road at opponents’ arenas (whose hostility itself varies by venue and may be taken into account), or at so-called neutral sites.
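To give a concrete flavor of what these systems do, here is a minimal sketch, in Python, of how an opponent- and venue-adjusted offensive rating might be computed. This is not the actual NET, KenPom, or Sagarin math (those models are far more elaborate); every team statistic, the league average, and the home-edge constant below are hypothetical.

```python
# A toy opponent- and venue-adjusted offensive rating (illustrative only).
# Hypothetical game log: (points, possessions, opponent's average
# points-per-possession allowed, venue).
games = [
    (78, 70, 0.98, "home"),
    (65, 68, 1.02, "away"),
    (81, 72, 1.05, "neutral"),
]

LEAGUE_AVG_PPP = 1.00  # assumed league-wide points per possession
HOME_EDGE = 0.02       # assumed home-court boost, in PPP terms

def adjusted_offensive_rating(games):
    total = 0.0
    for points, possessions, opp_def_ppp, venue in games:
        raw_ppp = points / possessions
        # Scale performances up against stingy defenses, down against weak ones.
        adj = raw_ppp * (LEAGUE_AVG_PPP / opp_def_ppp)
        # Strip out (or restore) the assumed home-court advantage.
        if venue == "home":
            adj -= HOME_EDGE
        elif venue == "away":
            adj += HOME_EDGE
        total += adj
    return total / len(games)

print(f"Adjusted offensive PPP: {adjusted_offensive_rating(games):.3f}")
```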

Even for journalists/voters who make use of metrics, there is a tendency for experts to think they can “beat the market.” Here’s one (admittedly anecdotal) illustration of how analytic metrics may warrant even more attention by analysts/journalists/AP voters. Here in Champaign, Illinois, a local newspaper (whose beat writer is one of the 62 AP voters for college basketball) previews and predicts the outcome and score of each game for the University of Illinois men’s team (which as of now has an overall record of 19-10). Putting aside predictions of actual scores—which are almost impossible to predict accurately—the local paper’s outcome record this year, as of the drafting of this column (yes, an admittedly limited sample size of just one season), is 16-13. That is, of the 29 games played, the local expert predicted the correct outcome for the Illinois team 16 times. If you exclude the six home games against very weak opponents from much less competitive conferences (games in which the Illini were favored by double digits and in which getting the outcome correct in favor of Illinois takes very little knowledge), the record is 10-13. If, instead, one had in each Illinois contest simply picked the team that the ESPN Power Index predictor (yet a fourth analytics ranking system) said, based on its number-crunching algorithm, had a 50+% chance of winning, one would have a prediction record of 23-6 (or 17-6 if you exclude the truly non-competitive matchups).
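For readers who want to see the bookkeeping behind records like 16-13 and 23-6, here is a minimal sketch that scores an expert’s picks and a probability model’s picks against actual outcomes. The game data below is invented for illustration; only the tallying logic is the point.

```python
# Score an expert's picks and a model's picks against actual outcomes.
games = [
    # (actual_winner, expert_pick, model_prob_that_ILL_wins, weak_home_game)
    ("ILL", "ILL", 0.93, True),
    ("OPP", "ILL", 0.44, False),
    ("ILL", "OPP", 0.58, False),
]

def record(picks, actuals):
    wins = sum(pick == actual for pick, actual in zip(picks, actuals))
    return f"{wins}-{len(actuals) - wins}"

actuals = [g[0] for g in games]
expert_picks = [g[1] for g in games]
# The model "picks" whichever side it gives at least a 50% chance of winning.
model_picks = ["ILL" if g[2] >= 0.5 else "OPP" for g in games]

print("Expert record:", record(expert_picks, actuals))
print("Model record: ", record(model_picks, actuals))

# Excluding the lopsided home games, as in the column's 10-13 figure:
close_games = [g for g in games if not g[3]]
print("Expert (competitive games only):",
      record([g[1] for g in close_games], [g[0] for g in close_games]))
```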

To be sure, aspects of the various metrics methodologies themselves are open to criticism, and the algorithms are being refined (and hopefully improved) year by year. But the trend regarding their importance and use is quite clear. Even if the so-called “eyeball test” remains important in sports rankings, and even though individual voters are not yet replaceable by automated assessment systems (in part because each team doesn’t play all other teams each year, and teams get better or worse during each season, such that fully controlling for quality of opponent is impossible), more and more decisions that matter—e.g., the selection of the field of 68 March Madness teams—are deeply influenced by the numbers.

So my first suggestion is that academic-institutional ratings should make better use of numerical data as well, and that the “voters”—those who fill out academic reputational surveys—should consult such data with greater frequency and sophistication when casting ballots. But just as controlling for things like strength of schedule is hard in sports rankings, so too comparing numerical assessments of academic-institutional performance can be challenging. Two examples drawn from the law-school world are job-placement numbers and bar-passage numbers. The ABA collects, and US News weighs somewhat heavily, the percentage of a law school’s graduating class that is employed in full-time, long-term (that is, slated to last a year or more) jobs that require, or benefit greatly from, a law degree. Seems fair enough; law schools ought to be launching not just good careers but distinctively legal careers. (While some small number of graduates may for personal reasons prefer to work part-time after graduation, there is no reason to believe that percentage won’t be pretty similar among all law schools, and thus a non-factor.) But what about law graduates who want to continue pursuing formal education (say, a PhD or an advanced degree in a particular field of law) rather than do legal work right after graduation? The percentage of these folks does vary greatly across law schools. For many years up until the current one, US News had been counting such folks as not being fully employed. Rightly (from my perspective), US News saw fit in the coming rankings to eliminate that discrimination against graduate students. US News also recently decided to stop discriminating against jobs that are funded by a graduate’s own law school or parent university. Here too the change makes some sense; if a school has the resources to provide good jobs that last a year or more and that offer additional on-the-job training in specific fields (like public interest law) for recent graduates, why should such graduates’ jobs count for less?
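A small sketch may help show why these counting-rule changes matter. The employment buckets and headcounts below are hypothetical and far coarser than the ABA’s actual reporting categories; the point is only how the headline rate moves when graduate study and school-funded jobs are counted.

```python
# How the counting rule changes a school's headline employment rate.
graduates = {
    "ft_lt_law_jobs": 140,     # full-time, long-term, law-related jobs
    "further_degrees": 10,     # pursuing a PhD or other advanced degree
    "school_funded_jobs": 8,   # positions funded by the school/university
    "other_or_seeking": 42,
}
total = sum(graduates.values())

old_rule = graduates["ft_lt_law_jobs"]
new_rule = old_rule + graduates["further_degrees"] + graduates["school_funded_jobs"]

print(f"Old counting rule: {old_rule / total:.1%} employed")
print(f"New counting rule: {new_rule / total:.1%} employed")
```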

The difficulty here, of course, is whether these jobs all do in fact provide good work and training opportunities (and a fair wage) rather than makeweight tasks (and a pittance) fashioned merely to boost a school’s employment numbers. If a law school funds very low-pay, low-training-value positions for its graduates, should such jobs be counted the same way as all other employment? Probably not. But this observation reveals a much bigger problem (and one US News has not yet addressed) that comes from simply counting how many people are employed without looking more carefully at the types of jobs the graduates have. There is a big difference between working at the most sought-after law firms or in the most coveted judicial clerkships in cities where the supply of highly regarded law graduates exceeds demand, on the one hand, and jobs that offer less sophisticated work (and much lower pay) in much less sought-after cities and institutions, on the other. Of course, some graduates would rather work in Peoria, Illinois, than in Chicago, but, in all honesty, it is much harder to secure a good job in the latter most of the time. For this reason, schools that are located in states (like California) where highly pedigreed graduates from all over the country are vying for jobs in tight markets (like San Francisco) are going to have lower placement rates than schools in less populous states where job seekers are not competing against nearly as many top-performing law graduates. Trying to account for these differences in markets is not easy, but not doing it makes meaningful comparison hard too.

Or consider bar-passage rates (another criterion on which US News compares schools). Different states have bar exams that vary in difficulty; the so-called “cut score” (the score needed to pass) differs by state. Until recently, US News compared each school’s bar-passage rate by looking only at the state in which a plurality of the school’s graduates sat for the exam. So if 70% of a school’s graduating class sat for, say, the Missouri bar, and had a high pass rate (in part because Missouri has among the lowest cut scores), that school would look very good even if the other 30% of its graduates had a much tougher time passing bar exams in other states. Happily, US News now looks at where each graduate takes a bar exam and assesses performance against the backdrop of each state’s overall pass rate. But even that correction doesn’t level the playing field. Why? Because not only do pass rates differ among states; so too does the academic strength of the pool of bar takers. So, for example, California not only has a high cut score (and thus a lower pass rate on that account, something US News has now controlled for), but it also has a pool of test takers that is much stronger than the national average (because many of the most ambitious and talented graduates around the country want to live there). As of now, US News does not account for that latter factor, and so schools that have a large number of graduates who take the California bar are at a disadvantage (as to both the bar-exam and placement-rate aspects of US News).
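Here is a rough sketch of the state-baseline comparison just described, along with the blind spot it leaves. All pass rates and headcounts are hypothetical, and this is only my gloss on the revised approach, not US News’s actual formula.

```python
# Compare a school's bar results to each state's overall pass rate.
state_overall_pass_rate = {"MO": 0.85, "CA": 0.52}

# (state, takers, passers) for one graduating class
school_results = [("MO", 70, 63), ("CA", 30, 18)]

for state, takers, passers in school_results:
    school_rate = passers / takers
    baseline = state_overall_pass_rate[state]
    print(f"{state}: school {school_rate:.0%} vs. state {baseline:.0%} "
          f"({school_rate - baseline:+.0%} relative to baseline)")

# The remaining blind spot: this baseline controls for each state's cut
# score, but not for the strength of that state's pool of test takers.
```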

The second way in which academic rankings can learn from sports rankings concerns the timing of rankings. The College Football Playoff Rankings (which in recent years have determined which four teams vie for the two playoff games that lead to a national championship game) do not come out until the second half of the season. This is because the rankers understand that until there is a body of actual evidence about how good each team is in a given year, it is counterproductive to rank teams, especially when doing so would be unduly influenced by the previous year (and by media hype going into each season), and would also risk an unfair stickiness that rewards the teams that were expected to be good but that may not be delivering on expectations. What does this mean for law school rankings? Perhaps that they shouldn’t be done every year. How much real change occurs year to year anyway? Perhaps ranking schools every three or five years (using data averages drawn from the whole three- or five-year period) would be sounder. US News doesn’t rank most of a university’s academic departments each year. It does so only for professional schools, but there is no reason to think professional-school (relative) quality changes more quickly than the relative quality of other departments. (I understand that ranking less frequently might mean less revenue for US News, but it might also bolster the journal’s credibility—even moving to every-other-year rather than every-year rankings would be an improvement.)
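For what it is worth, the multi-year averaging suggested above is computationally trivial, as this small sketch (with hypothetical yearly figures) shows.

```python
# Smooth a yearly metric over a three- or five-year window.
employment_rate_by_year = {2019: 0.88, 2020: 0.81, 2021: 0.90, 2022: 0.93, 2023: 0.91}

def windowed_average(series, end_year, window):
    values = [series[year] for year in range(end_year - window + 1, end_year + 1)]
    return sum(values) / len(values)

print(f"3-year average: {windowed_average(employment_rate_by_year, 2023, 3):.1%}")
print(f"5-year average: {windowed_average(employment_rate_by_year, 2023, 5):.1%}")
```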

That leads me to a third suggestion: if averaging rankings criteria over a period of time (see above) makes sense, so too does averaging rankings across different methodologies. In college basketball, for example, the March Madness tournament selection committee makes use of multiple analytic systems (and is understandably somewhat guarded about its own processes) as well as “eyeball” tests like the AP rankings. Just as in politics a poll of polls is often more accurate than most individual polls, so too the answer to dissatisfaction with the US News rankings perhaps should be the support of various other rankings, so that no one rankings system dominates.
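A “poll of polls” for rankings could be as simple as averaging each school’s ordinal rank across systems, as in this sketch. The system names, school names, and ranks are all hypothetical.

```python
# Average each school's ordinal rank across several independent systems.
rankings = {
    "System A": {"School X": 1, "School Y": 2, "School Z": 3},
    "System B": {"School X": 2, "School Y": 1, "School Z": 3},
    "System C": {"School X": 1, "School Y": 3, "School Z": 2},
}

schools = {school for system in rankings.values() for school in system}
average_rank = {
    school: sum(system[school] for system in rankings.values()) / len(rankings)
    for school in schools
}

for school, rank in sorted(average_rank.items(), key=lambda item: item[1]):
    print(f"{school}: average rank {rank:.2f}")
```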

All of that leads me to my final suggestion, also drawn from college basketball. One of the great things about KenPom and other analytic rankings of college hoops is that each consumer can, with the push of a button, rank teams based on the criteria that he or she finds most important. That is, each consumer can create a personalized ranking. One big drawback of US News is not just that some of its component factors might be flawed, but also that the weighting of the various factors (while perhaps defensible) is somewhat arbitrary. How many students really care how many volumes are in a law school’s library? And yet that traditionally was a factor in the rankings. If a prospective student cares about not just overall job-placement rates but placement rates at large firms, or in prominent public interest organizations, or in clerkships at federal courts, shouldn’t the student easily be able to adjust the relative weight of different factors? Or if a prospective student (or faculty member) cares about the frequency with which faculty at a given school are cited in legal periodicals or in judicial opinions, or are downloaded on SSRN, or about the diversity of the student body (things US News currently does not include at all), shouldn’t it be easy for that consumer to adjust the weights of the competing variables? Just as the answer to bad speech in America should usually be more and better speech, so too the answer to bad rankings might be more (not fewer), better, and more well-tailored rankings.
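As a proof of concept, here is a minimal sketch of such consumer-adjustable weighting. The factor names, scores, and weights are hypothetical; a real tool would need defensible underlying data for each factor.

```python
# Personalized ranking: let each consumer re-weight the component factors.
school_scores = {
    "School X": {"big_firm_placement": 0.9, "clerkships": 0.6, "faculty_citations": 0.8},
    "School Y": {"big_firm_placement": 0.7, "clerkships": 0.9, "faculty_citations": 0.6},
    "School Z": {"big_firm_placement": 0.8, "clerkships": 0.7, "faculty_citations": 0.9},
}

def personalized_ranking(scores, weights):
    total_weight = sum(weights.values())
    composite = {
        school: sum(weights[f] * v for f, v in factors.items()) / total_weight
        for school, factors in scores.items()
    }
    return sorted(composite.items(), key=lambda item: item[1], reverse=True)

# A clerkship-focused applicant might dial up that factor:
my_weights = {"big_firm_placement": 0.2, "clerkships": 0.6, "faculty_citations": 0.2}
for school, score in personalized_ranking(school_scores, my_weights):
    print(f"{school}: {score:.3f}")
```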