
Statistics, Bias and the Draft, Part 1


What should Redskins fans expect from the second overall pick?

COLLEGE FOOTBALL: SEP 21 Miami OH at Ohio State Photo by Adam Lacy/Icon Sportswire via Getty Images

Well, my fellow Redskins fans, it’s happened again. After a long offseason filled with hope, highlighted this time by one of the most highly-rated draft hauls in the league, featuring a potential franchise quarterback, an athletic freak pass rusher who had no business still being on the board at 26, and what looks like the team’s first true #1 receiver drafted since Art Monk, the regular season had to start. And by about the third quarter of the season opener, it began to dawn on Redskins fans that this team was not going to live up to their ever-optimistic 10-6 predictions, which eventually swirled into a 3-13 vortex of dysfunction and incompetence that gave dumpster fires a bad name. And just like about 18 of the 20 previous seasons, hope turned into despair.

But this time there is at least a silver lining for those few of us who have stuck with our team through all of that. The 2019 season turned out so spectacularly awful that Bruce Allen was finally fired and the new management team and coaching staff are heading into the draft holding the number two overall pick. Last year at this time, facing smoking wreckage at the quarterback position, a major theme of draft discussions among Redskins fans was whether the team should draft for need and pick a quarterback at number 15, or pick the best player available, maybe TJ Hockenson or Montez Sweat.

But this year, with #2 overall in play and a small handful of quarterback-needy teams eyeing the two to three quarterback prospects after Joe Burrow, the talk is instead about whether the team should take the consensus best player in the draft, Chase Young, or trade down to add a franchise-building haul of additional picks.

It’s not a no-brainer by any means for Kyle Smith or whoever is running the Redskins’ draft this year. On the one hand, Chase Young is a clear standout prospect at a high-impact position, where the current starter appears to be entering the declining phase of his career. If Chase lives up to the hype, would you want to be the GM who passed on the next J.J. Watt? On the other hand, the analytics crowd have been telling us for a while that NFL teams overvalue high first-round draft picks, which, if true, creates an opportunity to trade down with a team chasing a quarterback (usually), for multiple draft picks representing greater value to the team in the long run than a single elite prospect.

I am originally a neuroscientist by training, as well as being an unabashed total homer Redskins fan of 50+ years. While I know less about football than probably most regular posters on Hogs Haven, in my scientific past I had to deal with problems that have a lot of mathematical similarity to the ones facing NFL teams during the draft. Last draft season, I couldn’t help but notice that discussions about picking the best player available were missing something very important – a proper recognition of the level of uncertainty associated with draft evaluations and how that impacts teams’ ability to make meaningful distinctions between players on their draft boards.

That led me to write an article for Hogs Haven in which I used a branch of statistics known as signal detection theory to estimate the level of precision of draft selection. The results suggested that NFL teams picking in the middle of the first round have limited ability to distinguish between the top player on their board and the next 15 to 30 prospects, which makes a bit of a mess of a strict interpretation of BPA.

This year I’d like to keep to a similar theme and use more basic draft statistics to examine the reality behind fans’ expectations of high first-round draft picks that are commonly expressed in discussions about what to do with the #2 pick on Hogs Haven. I’ll have a good look at the “sure thing” draft prospect, the day-one impact player, and how picks are valued for draft trades. That won’t all fit in a single article. As a starting point, in this article I will have a look at how the basic statistical properties of the draft compare to fans’ expectations and how the distributions of player outcomes compare across picks at the top end of the draft.


What should we really expect at #2?

A common theme in HH draft discussions this year is that we should expect to get an elite, maybe generational talent such as Chase Young at #2. It’s often implied that coming away with anything less would be a major failure. A fair number of posters have stated expectations of future Hall of Famers, perennial All-Pros, or at least day-one impact players.

There is some basis to that. A lot of really good players have been selected at #2. The 30-year sample I pulled together for last year’s article included a Hall of Famer (Marshall Faulk) and 6 other players who also made 1st team All-Pro more than once (3x Julius Peppers, Von Miller, Ndamukong Suh, Tony Boselli, Calvin Johnson; 2x Bill Fralic), as well as Donovan McNabb, who didn’t win that honor despite being very good. But the thing is, those are the best players selected at #2 in 30 years.

I don’t want to be the one to cast shade on people’s optimism, but from a statistical perspective, basing your expectations on the best-case scenario is setting yourself up for disappointment. A more sane approach is to set your expectations on the average of past outcomes. That way you will be pleasantly surprised as often as you are disappointed.

What then does the average player selected at #2 overall really look like? To answer that question, let’s take a look at the distribution of player outcomes at draft pick #2 from 1985-2015. Like last year, this snapshot makes use of Career AV (CarAV), an advanced statistic developed by Pro Football Reference, which attempts to place a single value on each NFL player’s career, regardless of position, through a weighted sum of their best seasons emphasizing peak performance. While CarAV has its limitations, it provides a useful metric to compare large samples of players at different positions. I’m limited here to using players drafted more than 4 years ago, because it takes about 5 years for CarAV to build up.

Draft outcomes at pick #2, 1985-2015

The 31-year distribution of CarAV values is shown in the first figure. The players I mentioned above mostly occupy the two bins on the far right, with Marshall Faulk maxing things out at a whopping CarAV = 133. Bill Fralic just missed out on the top 12, at #14 with a respectable CarAV of 58. You may notice the distribution is somewhat skewed, with a relatively thinner tail on the right, made up of a smaller number of players with very high CarAV values, and a big peak to the left of center made up of players with lower CarAV values. The median of the distribution, marking the half-way point between low and high values, is CarAV = 48, Chris Long (DE, STL). The four players flanking the median (two on either side), representing the average range, are Tony Casillas (DT, ATL, CarAV 57), Eric Turner (DB, CLE, CarAV 51), LaVar Arrington (LB, WAS, CarAV 46) and Marcus Mariota (QB, TEN, CarAV 44). Those are some good players for the most part, and adding most of them in their primes would upgrade a position on the Redskins roster, although in some cases that might be a backup (e.g. Mariota > Colt). But they are not exactly the elite talents that I think we’re all hoping for and many fans seem to expect at #2 overall.

And those are the average talents of the group. Keep in mind that half of the players selected at #2 had lower CarAV values than Chris Long. Below Marcus Mariota (ranked 18/31 players), we start to see players whose labels range from “didn’t perform to his draft position” to “outright bust”, like Robert Gallery (T, OAK, CarAV 38, rank 21/31) and RG3 (CarAV 36, rank 22/31), who in reality are not that far below the average of players selected at #2. And rounding out the bottom is a player whose name has come to be synonymous with draft bust, Ryan Leaf (CarAV 1).

What could be going on here? Why do fans and most media commentators seem to expect much better outcomes from high draft selections than NFL front offices are able to achieve on a regular basis? Perhaps NFL GMs picking at the top of the draft, who tend to work for the worst teams, are just really bad at what they do. I suppose that could be a factor, but as we’ll see, the variation in outcomes doesn’t get a lot better as we go deeper into the draft. I believe the answer lies in a well-known psychological phenomenon and persistent bugbear for scientists, selection bias.

Psychologists and scientists use the term bias somewhat differently to common usage. In scientific discussion, bias refers to an innate, usually unconscious, tendency we all have to see things in a way that supports our world view. The most common bias in science is the tendency to select inputs to experiments or cherry pick results to favor the preferred hypothesis (mainly because it’s nearly impossible to publish negative results). Good scientific practice involves careful precautions to prevent bias from influencing the results, such as random and blind sampling. The form of bias that NFL fans appear to be exhibiting when they expect better results than have been achieved historically is a sub-type of selection bias, known as Survivorship Bias. From Wikipedia:

“Survivorship bias or survival bias is the logical error of concentrating on the people or things that made it past some selection process and overlooking those that did not, typically because of their lack of visibility. This can lead to false conclusions in several different ways…

… Survivorship bias can lead to overly optimistic beliefs because failures are ignored, such as when companies that no longer exist are excluded from analyses of financial performance.”

Getting back to the draft, when Redskins fans think about how selecting Chase Young at number 2 overall will transform the defense, we tend to forget that a lot of prospects before him had similar reviews heading into the draft, but ended up in the lower half of the CarAV distribution. For example, Robert Gallery was widely heralded as the best offensive tackle prospect to enter the draft in years and received the same level of “sure thing” hype that Chase Young is getting now. Looking back, the pre-draft profiles bear an uncanny similarity to what is being written about Chase Young now. Here is what legendary Redskins offensive line coach Joe Bugel had to say about Robert Gallery heading into the draft:

“Whoever drafts him is going to get a 15-year Pro Bowl player, … I can’t find anything wrong with him.”

In fairness, he had a decent career at guard. He just never became the generational left tackle that pretty much every media analyst had him slated to be ahead of the draft. Even the prototypical draft bust, Ryan Leaf, was seen as competitive with Peyton Manning heading into the 1998 draft. Although consensus tended to favor Manning as the draft neared, there was little doubt among analysts that Leaf was a worthy choice at #2. And between RG3 and Ryan Leaf in this ranking there are a bunch of players that most fans have never heard of, but that some NFL team’s draft crew thought were worth picking at #2.


How does #2 compare to lower first round picks?

If the Redskins aren’t absolutely guaranteed to land a generational talent at #2, as some of us have come to believe, maybe it’s worth exploring trade offers. Assuming that Burrow to Cincinnati at #1 is a lock, that only leaves two QBs with first-round grades to feed the two or more QB-needy teams picking in the top 10 and maybe three or four other teams that might look to address the position in the first round. That’s a pretty good setup for a bidding war for the Redskins’ pick. And if any teams have Chase Young rated like the media analysts do, I suppose it’s always possible that someone might want to trade up to pick a generational edge rusher. The most likely team to trade up would seem to be Miami, who are reported to be fixated on Tua Tagovailoa and might fear losing him to the Chargers or Detroit. With that in mind, I’d like to take a look at how the Redskins’ #2 overall pick compares to some of the picks that Miami might include in a trade deal.

A lot of fans’ and draft analysts’ thinking about the relative value of draft picks seems to be influenced by the Jimmy Johnson Trade Value Chart (TVC). It is said that NFL teams use it, although its validity is being increasingly questioned with the rise of more modern analytics-based approaches. As far as I can tell, it’s never been stated where Jimmy, or his colleague Mike McCoy, got the numbers from, but the TVC places very high values on high first-round picks. According to the TVC, the Redskins’ #2 pick is worth 2600 points. Miami holds picks #5, 18, and 26 in the first round, worth 1700, 900, and 700 points respectively, and picks #39 and 56 in the second, worth 510 and 340. Let’s see how those picks compare in terms of historical average CarAV, as shown in figure 2.

Draft outcome distributions across pick #s, 1985-2015

Here I’m using box and whisker plots, a type of statistical graph designed for comparing distributions. The boxes show the middle peak of each distribution, containing the middle half of the values around the median, which is represented by the line through the middle of each box. The whiskers extend from the boxes to the maximum and minimum CarAV values at each pick number, excluding outliers: points that sit well outside their distribution, shown as little circles, like the point at CarAV 158 at pick #26. That little dot is Ray Lewis. The x in each box is the average CarAV value. These tend to be a bit higher than the medians because, as in Figure 1, most players are bunched at lower values while a long tail of a few very high values pulls the average up.
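For readers who want to see what goes into one of those boxes, here is a minimal sketch in Python. The sample values are made up for illustration (they are not the real CarAV numbers behind the figure), and the 1.5 x IQR rule for flagging outliers is the common textbook convention, which I am assuming here rather than taking from the plotting software used for the figure.

```python
import statistics

def box_summary(values, fence=1.5):
    """Summarize a sample the way a box-and-whisker plot does.

    `fence` is the multiple of the interquartile range beyond which a
    point is treated as an outlier (1.5 is an assumed, conventional value).
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - fence * iqr, q3 + fence * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,                      # bottom edge of the box
        "median": median,              # line through the box
        "q3": q3,                      # top edge of the box
        "mean": statistics.mean(data), # the "x" in the box
        "whisker_low": inside[0],      # lowest non-outlier value
        "whisker_high": inside[-1],    # highest non-outlier value
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Hypothetical CarAV values for a single pick number (NOT real draft data).
sample = [5, 12, 18, 22, 25, 28, 30, 33, 36, 40, 45, 52, 158]
summary = box_summary(sample)
print(summary)
```

In this made-up sample the single very high value is flagged as an outlier and drags the mean above the median, which is the same pattern described above.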

Two things stand out to me about this graph. First, while the median and average CarAV values do generally fall with increasing pick number (with one exception at #18), there is a huge amount of overlap between the player values at all pick numbers within round 1. That means that many players picked late in Round 1 turned out better than players picked in the top 5. Maybe that shouldn’t come as a huge surprise. While pick #2 only produced one Hall of Famer in the 31-year sample, pick #5 produced three (LaDainian Tomlinson, Junior Seau, Deion Sanders) and the same number of multiple-time AP 1st team All-Pros (Tomlinson 3, Seau 6, Sanders 6, Patrick Peterson 3, Khalil Mack 3, Eric Berry 3) with several more total appearances between them. The best player across all these picks, according to CarAV, is our outlier at pick #26, Ray Lewis, who is widely considered to have been the critical addition that led the 2000 Baltimore Ravens to the club’s first Super Bowl championship, powered by one of the greatest defenses of the modern era.

The second thing that stands out is something that Bill-in-Bangkok wrote about on Hogs Haven about this time last year. The average CarAV values at later first and second-round picks don’t fall as steeply as we might expect based on the Trade Value Chart. To illustrate that more clearly, I’ve plotted the average CarAV values and the TVC values for the same picks on a common scale in the next figure:

To convert the TVC to the scale of the CarAV plot, I divided their values by their maximum (2600 at pick 2) and then multiplied them by the maximum median CarAV value (48 at pick 2). As we’ve seen before, in a slightly different format, the TVC values drop very steeply from the beginning to end of Round 1, while the CarAV values hold their value at later draft picks better than the TVC would predict. To the extent that median or average CarAV represents our expectation about the value of future draft picks, this suggests that the TVC undervalues later draft picks relative to earlier ones. For example, according to the trade value chart, Pick #2 has 3.7 times the value of Pick #26, where the most optimistic of us might hope to land the next Ray Lewis; while according to measured historical CarAV values, it is only 1.6 times more valuable.
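The rescaling step is simple enough to sketch in a few lines of Python. The TVC points and the median CarAV of 48 at pick #2 are the figures quoted in this article; the function and variable names are just illustrative.

```python
# Trade Value Chart points quoted in the article, keyed by pick number.
TVC_POINTS = {2: 2600, 5: 1700, 18: 900, 26: 700, 39: 510, 56: 340}

# Median CarAV at pick #2, as reported in the article.
MAX_MEDIAN_CARAV = 48

def rescale_tvc(points, target_max):
    """Divide each TVC value by the largest one, then multiply by target_max."""
    top = max(points.values())
    return {pick: pts / top * target_max for pick, pts in points.items()}

scaled = rescale_tvc(TVC_POINTS, MAX_MEDIAN_CARAV)
for pick, value in scaled.items():
    print(f"pick #{pick}: TVC on the CarAV scale = {value:.1f}")

# Ratio of pick #2 to pick #26 according to the chart.
tvc_ratio = TVC_POINTS[2] / TVC_POINTS[26]
print(f"TVC ratio, pick #2 vs pick #26: {tvc_ratio:.1f}")
```

On this scale the chart puts pick #26 at roughly 13, and the ratio works out to the 3.7 quoted above.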

Based on this comparison, it would appear that a team holding the #2 overall pick could maximize its chance of adding quality players by accepting a trade down offer, valued based on the TVC, from a team offering a package of later first and/or second-round picks.
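To get a feel for what a TVC-valued offer could look like, the sketch below lists every combination of Miami’s five picks mentioned earlier whose chart points add up to at least the 2600 assigned to pick #2. This is only arithmetic on the chart values quoted in this article; it says nothing about what Miami would actually offer, and the function name is made up for the example.

```python
from itertools import combinations

# TVC points for Miami's first- and second-round picks, as quoted in the article.
MIAMI_PICKS = {5: 1700, 18: 900, 26: 700, 39: 510, 56: 340}
PICK_2_POINTS = 2600  # TVC value of the Redskins' #2 pick

def qualifying_packages(picks, target):
    """Return (picks, total_points) for every package worth at least `target`."""
    packages = []
    for size in range(1, len(picks) + 1):
        for combo in combinations(sorted(picks), size):
            total = sum(picks[p] for p in combo)
            if total >= target:
                packages.append((combo, total))
    # Cheapest packages first, smaller packages breaking ties.
    return sorted(packages, key=lambda item: (item[1], len(item[0])))

packages = qualifying_packages(MIAMI_PICKS, PICK_2_POINTS)
for combo, total in packages:
    print(f"picks {combo}: {total} points")
```

The cheapest package that balances the chart is #5 plus #18, which adds up to exactly 2600 points.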


Uncertainty rules

What we’ve seen thus far is that, while the average value of players selected falls steadily as the draft progresses, there is so much variability at each pick number that it’s not very easy to separate picks in the first round. While the chance of picking an all-star player is higher at pick #2 than at pick #26, the difference does not appear to be as big as people think. Conversely, there’s a much higher chance of picking a dud in the top 5 than anyone wants to admit. So why is that? Shouldn’t professional talent evaluators be better at spotting elite talents when they are picking early in the first round? And how is it that in every draft, starting-quality players last until the later rounds or go undrafted?

The answer to those questions gets to what draft selection really is and isn’t. And I’ll tell you at the start, the problem is not with the talent evaluators but with the task they’ve been given. To illustrate that, I need to introduce some terminology: deterministic and probabilistic.

The behaviour of a deterministic system is governed by a finite number of well-defined and easily measured variables. An example is rocket trajectory. The landing spot of a rocket is determined by a few simple physical parameters. If a guy at NASA knows the weight of the rocket, the force of the engine, and the direction it’s pointed, and the wind direction and speed, he can plug those variables into a formula to predict where it will land with a high level of accuracy and repeatability. Actually, the wind doesn’t always stay constant, and that can introduce some variability into the landing spots, which reduces the precision of the NASA guy’s predictions. The more changeable the wind becomes, the more probabilistic the system becomes.

Complex systems, like the stock market, tend to be highly probabilistic. The behavior of a probabilistic system is often determined by many variables, at least some of which are hard to measure or unknown. Highly probabilistic systems lend themselves to partially or weakly predictive models. The best you can hope for in this case is a model that predicts outcomes correctly more often than it gets them wrong. A good example of a probabilistic system is the weather.
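The rocket example can be turned into a toy simulation. This is a deliberately crude sketch: it uses the textbook range formula for a projectile with no air resistance, and it treats wind as a random push added to the landing spot. Both are simplifying assumptions made for illustration, not how NASA (or anyone) models a real rocket.

```python
import math
import random
import statistics

G = 9.81  # gravitational acceleration in m/s^2

def ideal_range(speed, angle_deg):
    """Deterministic part: the same inputs always give the same landing spot."""
    return speed ** 2 * math.sin(2 * math.radians(angle_deg)) / G

def landing_spots(speed, angle_deg, gust_sd, n=1000, seed=1):
    """Probabilistic part: add a random wind drift (in metres) to each launch.

    gust_sd is a made-up knob for how changeable the wind is.
    """
    rng = random.Random(seed)
    base = ideal_range(speed, angle_deg)
    return [base + rng.gauss(0, gust_sd) for _ in range(n)]

for gust_sd in (0.0, 1.0, 5.0):
    spots = landing_spots(speed=100.0, angle_deg=45.0, gust_sd=gust_sd)
    spread = statistics.pstdev(spots)
    print(f"wind variability {gust_sd}: spread of landing spots = {spread:.2f} m")
```

With no wind variability every launch lands in the same place. As the wind gets more changeable, the spread of landing spots grows even though the formula hasn’t changed.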

You’ve probably guessed what kind of system the NFL draft is, but in case you have any doubt take a look at the next figure. This is a plot of the CarAV numbers for all the players drafted in rounds 1 to 3 from 1986 to 2016, a bit over 2,900 players.

All draft picks in first three rounds, 1986-2016

The red line is a best fit curve to the mean CarAV at each pick number. I’ve labeled some of the highest valued players at various pick numbers (PM = Peyton Manning; RL = Ray Lewis; DB = Drew Brees; BF = Brett Favre; MS = Michael Strahan; RB = Ronde Barber; JT = Jason Taylor; WS = Will Shields).

As we’ve been seeing, there’s a trend to higher average player career values earlier in the draft and lower values later in the draft. But there is so much variability that it’s kind of hard to see. This plot represents the total output of the collective draft decision process by NFL front offices. And we can use that best fit curve to get a rough estimate of the predictive power of their selection process. Curve fitting finds the line that sits as close as possible to the CarAV values overall, typically by minimizing the sum of the squared distances between the line and the data points. The line is described by an equation, which allows us to calculate a predicted value of CarAV if we plug in the pick number, and a value called the fit coefficient (R), which quantifies how well the fit line predicts CarAV. If I had fit the data to a straight line, R would be the correlation coefficient. But a logarithmic function provides a better fit, so I’m sticking with it. That doesn’t really matter.

What is important is that the square of the fit coefficient (R²) tells us the proportion of the variance in CarAV values that’s explained by draft pick # (if we assume a logarithmic relationship), which is called the explained variance. In this case R² = 0.15. In other words, the decision process that NFL teams use to select players explains 15% of the variance in player outcomes. Or, looking at it the other way, 85% of the total variation in draft outcomes is not explained by draft selection, as reflected by pick order. That’s what you call a weakly predictive model (or models, since each NFL team does its own thing).
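For anyone who wants to try this at home, here is a minimal Python sketch of a logarithmic fit and its explained variance. It assumes an ordinary least-squares fit of CarAV against the natural log of pick number, which may differ in detail from the tool used for the figure. The data are simulated with made-up parameters, so the number it prints is not the 0.15 reported above.

```python
import math
import random

def fit_log(picks, values):
    """Least-squares fit of values ~ a + b * ln(pick).

    Returns (a, b, r_squared), where r_squared is the share of the
    variance in `values` explained by the fitted curve.
    """
    xs = [math.log(p) for p in picks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, values))
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    return a, b, 1 - ss_res / ss_tot

def simulate_draft(n_picks=96, n_years=30, noise_sd=25.0, seed=0):
    """Fake draft classes: a gentle log trend buried in noise (all made up)."""
    rng = random.Random(seed)
    picks, values = [], []
    for _ in range(n_years):
        for pick in range(1, n_picks + 1):
            expected = 50 - 8 * math.log(pick)  # hypothetical trend
            picks.append(pick)
            values.append(max(0.0, expected + rng.gauss(0, noise_sd)))
    return picks, values

picks, values = simulate_draft()
a, b, r_squared = fit_log(picks, values)
print(f"CarAV ~ {a:.1f} + ({b:.1f}) * ln(pick), R^2 = {r_squared:.2f}")
```

Turning noise_sd down makes pick number a much better predictor and pushes R² toward 1. Turning it up does the opposite, which is the situation the real draft data appear to be in.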

That result might come as a surprise to some HH readers. The discussion threads lately are full of comments suggesting that teams have to stay in the top 5 (or 10) to have a decent shot at the elite talents, and that players available in the late first and later rounds are destined to be just solid contributors. Those sentiments imply that NFL talent evaluations have much more predictive power than the measured outcomes indicate they do.

Before I go much further with that line of thinking, though, I have to throw up a big caveat. The 15% explained variance I calculated almost certainly underestimates teams’ predictive ability. There are two main reasons for that. The first has to do with draft strategy. My analysis requires an assumption that each team is trying to pick the best player available when they are on the clock, regardless of position value (since R² quantifies how well draft position predicts CarAV, and CarAV is designed to give equivalent values to players regardless of position), and we know that’s not always true.

Teams certainly factor position value into player rankings, which adds to the unexplained variance, since players at high value positions (QB, LT, edge, etc) will be selected earlier than they would be based on talent evaluation alone, and players at low value positions (P, K, LS, maybe others) will be picked later than they otherwise would be. Teams reaching to fill needs will have a similar effect. And finally, teams looking for instant starters in the draft might sacrifice long term upside to pick a more NFL-ready player. All of these practices will tend to blur the relationship between pick number and CarAV.

The second confounding factor is that accidents happen. There is a good deal of attrition of drafted players due to injuries, unpredictable events and life decisions outside of football which all have the effect of lowering the CarAV values of highly talented players, such as Bo Jackson (1986 pick #1, CarAV 22), Sean Taylor (2004 pick #5, CarAV 31), and Chris Borland (2014 pick #77, CarAV 6).

But even if NFL teams are twice or three times as good at selecting players as I have estimated, that still leaves the majority of the variance in player outcomes unexplained by the selection process. The exact figure isn’t really that important. The main point is that it’s clearly large – meaning that the draft has a major element of unpredictability. Teams can do a lot to increase their chance of picking the right player through scouting, measurements and interviews, but a lot of the information they would need to really accurately predict how well players will perform in the NFL is either unknowable or very hard to measure. A few of the major sources of unpredictability deserve comment:

Projecting out of range. Perhaps the biggest problem facing NFL scouts is that all of their film study is based on the prospects they are interested in playing against college kids, and more often than not in college systems. Predicting how they will play in the NFL based on watching college tape adds whole layers of uncertainty. Extrapolating how players will perform in an NFL system, facing NFL-caliber talent, based on observations of college tape requires a fair amount of guesswork.

Comparing apples to oranges. To put together a big board, NFL teams have to rank players at different positions on the same scale. But most of the skills and attributes that make a left tackle and a cornerback good at what they do are completely different. Ask any statistician to come up with a valid method to do that and they’ll just laugh and hope you go away. NFL teams have to find some way to do it, no matter how dodgy, but any method that tries to rank players based on fundamentally different attributes is bound to be subject to questionable assumptions and huge levels of uncertainty.

Growth and development. Many draft prospects are still growing and nearly all are still learning how to play football. NFL teams have to project how they will develop physically, in their understanding of the game and processing speed, adding further levels of uncertainty.

Intangibles. Mental and character attributes are extremely important for determining NFL success, but notoriously difficult to measure. This is something I know very well, having spent more than my fair share of time as the “hard scientist” in psychology departments, and much of that time laughing at my psychology colleagues’ small effect sizes. Those guys get really excited if they can explain 5% of the variance in an experiment. How do you quantify a quarterback’s ability to inspire confidence in his teammates when they are 10 points down with 4 minutes left, or predict whether a player with questionable judgment will mature and grow out of it? The answer is probably “not very well.”


Summary and Conclusions

To sum things up, the main point that I hope to have got across is that there is a huge amount of variability in player outcomes at all pick positions in the draft, including high first-round picks. The variability of draft outcomes suggests that NFL teams have a fairly limited ability to identify the best players in the draft, although the strength of that conclusion is diluted to some extent by the fact that teams are not always simply pursuing the highest rated players on their boards and by confounding effects of unpredictable events like injuries. Nevertheless, at every pick in the top half of the first round we can find plenty of players who just didn’t live up to their draft projections; and plenty of future all-stars managed to avoid being selected until the later rounds or went undrafted.

To help understand what is going on, I discussed two kinds of systems. A deterministic system produces highly predictable results, while a probabilistic system has a significant element of unpredictability. The behavior of deterministic systems is usually determined by a small number of variables that are easily measured. Deterministic systems lend themselves to statements like “Chase Young is guaranteed to be a perennial All-Pro. It would be insane to pass him up for some picks in the late first round and second round”.

The observed draft outcomes are consistent with the behavior of a probabilistic system. The outcomes of a probabilistic system are more unpredictable because it is not possible to know everything you need to know to eliminate all the sources of uncertainty. A GM who realizes he’s working with a probabilistic system might be thinking more along the lines of, “Chase Young has the best chance of being an All-Pro of any player on my board, but Isaiah Simmons is not that far behind, and Xavier McKinney could turn out to be the best of the bunch.”

The expectations of high first-round draft picks expressed by many NFL fans and media commentators appear to be filtered by survivorship bias, resulting in overvaluation of early draft picks. We tend to remember the high first-round picks that have become NFL superstars, but seem to forget about the larger number who had just good or worse NFL careers. And very few media outlets ever revisit their pre-draft writeups of the next “generational” prospects, who seem to enter the draft every few years.

My main purpose here was to lay the groundwork by describing some of the basic statistical properties of the draft. Over the coming weeks I hope to examine some of the more common assertions about elite draft prospects and high first-round draft picks in more detail.

Poll

Given how uncertain draft outcomes actually are, and multiple needs across the roster, what should the Redskins do with the second overall pick?

This poll is closed

  • 12%
    Screw your statistics, Chase Young is a generational talent. No brainer.
    (82 votes)
  • 25%
    Chase Young is the best player in the draft, take him.
    (163 votes)
  • 51%
    Trade down in the top 10 and add additional 1st/2nd round picks.
    (328 votes)
  • 3%
    Trade down for a 1st rounder and a large package of later round picks.
    (22 votes)
  • 0%
    I like someone better at #2, explain in the comments.
    (3 votes)
  • 0%
    Trade #2 for a vet, explain in the comments.
    (1 vote)
  • 0%
    Something else
    (2 votes)
  • 5%
    I don’t have strong feelings either way. We can’t possibly screw this one up.
    (35 votes)
636 votes total