Hello Hogs Haven readers. By way of introduction, I am a longtime Hogs Haven reader, and over the last few months I have started commenting in discussion threads under the handle MattInBrisVegas. In a previous life, I was a neuroscientist and worked a fair bit with “signal detection theory”, a branch of statistics devoted to measuring how well humans and other classifiers detect faint signals in noise and make fine distinctions between similar but different things. As I have been reading articles and comment thread discussions, it has struck me that a major part of draft selection is a discrimination problem (that is, distinguishing between players on the basis of estimated future NFL career potential), which is precisely the type of problem that signal detection theory was developed to address.
In this article, I will explain the basic principles of signal detection theory in terms of the NFL draft process, and hopefully use them to shed light on a few of the recurring controversies in draft strategy discussions. For example, should teams stick to a strict Best Player Available (BPA) strategy, is it OK to reach down the board to select a player at a position of need, and if so, how far down the board can teams go before making significant compromises on player quality?
For any readers with painful memories of high school or college statistics classes, don’t worry. The basic premise of signal detection analysis is really quite simple, and I will do my best to avoid overly technical terminology as far as possible.
Using Signal Detection Theory to Measure Draft Discrimination Performance
What is signal detection theory and what does it have to do with the NFL draft? Signal detection theory got its start in World War II, as the US Navy signal corps was trying to figure out how to distinguish Japanese planes from flocks of birds on their newly developed radar screens. It really took off after the war, as scientists at Bell Labs were trying to figure out just how cheaply they could make phone lines that were still usable. Since then, it has established itself as the best method to describe performance in any task where a classifier (e.g. human observer, diagnostic test, machine learning algorithm) has to detect a signal in noise, or detect slight differences between relatively similar groups of things.
Now consider the task facing an NFL GM on draft night. His team has digested scouting information on 300+ players, used that information to rate each player’s future NFL potential, and then ranked those players based on their ratings to form a big board. His team is on the clock at, say, pick #15 and he’s facing a dilemma. The two top ranked players left on his board are defensive linemen, but that’s the one position group that’s well stocked on the roster, while the next player after that is an edge rusher where he’s got a current vacancy, and eight players down the board is one of the top QB prospects, which is both the most important position and his biggest need. Should he pass up the BPA to pick the edge rusher he really needs, or reach down to select the QB? If he does reach for the QB is he really compromising his team’s future talent to address an immediate need?
The answer to these questions comes down to how well his rankings are able to predict differences in career potential between players. If he can’t actually distinguish between players ranked 8 positions apart, there’s no real reason not to take the QB or the edge rusher. The best way to get a handle on that is to look at his past performance in predicting outcomes between the players he picked and the ones he passed up, and signal detection analysis provides the tools to do just that.
Sadly, the data we’d need to look at performance of individual GMs just isn’t available. For that we would need to have access to teams’ draft boards, which are closely guarded secrets. We’d also need data going back 30-plus years to get a sample large enough to be considered adequate for statistical analysis, which is longer than most working GMs’ tenures. But to get some idea of how precise or imprecise draft rankings might be, what we can do is look at the performance of the league as a whole by looking at outcomes of the draft selection process.
To get an idea how this works, let’s look at some numbers. Thanks to helpful pointers from Hogs Haven’s resident analytics guru, James Dorsett, I’ve compiled Career Approximate Value (CarAV) statistics for the first four rounds of the draft from 1986 to 2016. AV (Approximate Value) is a statistic developed by Pro Football Reference that attempts to place a single value on each player-year, to make it possible (more or less) to compare players across years and positions. It’s not perfect, but to do any better we’d have to roll our own metric and that’s a job I’ll leave to guys like James. CarAV is a weighted sum of a player’s season-by-season AV values, with the heaviest weights on his best seasons, so it gives the greatest emphasis to peak performance.
A 30-year range was picked to give a large enough sample size to be able to see statistical effects, and 2016 was used as the cut-off to give time for the most recently drafted players to accumulate AV. Ideally, I would have picked 50 years or more, but the further you go back, the more drafting behaviour is likely to have changed, as well as the structure of the league and game schedules, so I wanted to keep it as recent as possible.
The first figure shows the average CarAV value (black dots) for players selected at each pick number during the 30-year period. The numbers bounce around a fair bit, but overall they provide a very good fit to an exponential trend line. There is a sharp drop-off in average CarAV following the first overall pick, but the drop-off flattens out a lot by the middle of the first round and continues to get flatter as the draft progresses. I’ve also plotted the standard deviations of the CarAV values (gray dots) to give an idea of just how much variation there is at each draft position and, as you can see, there is a lot.
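For anyone who wants to build the same kind of summary, here is a minimal sketch of how the per-pick averages and standard deviations could be computed. The function name `summarize_by_pick` and the toy numbers are hypothetical, and the use of the sample standard deviation is an assumption rather than a statement of how the figure was actually produced.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_pick(records):
    """Group (pick, car_av) pairs by pick and return {pick: (mean, sd)}."""
    by_pick = defaultdict(list)
    for pick, car_av in records:
        by_pick[pick].append(car_av)
    # stdev (sample SD, an assumption) needs at least two values,
    # so picks with a single player are skipped.
    return {
        pick: (mean(values), stdev(values))
        for pick, values in sorted(by_pick.items())
        if len(values) >= 2
    }


# Made-up CarAV values for illustration only; not real draft data.
toy_records = [(1, 80), (1, 40), (1, 60), (2, 50), (2, 30), (2, 70)]
summary = summarize_by_pick(toy_records)
```

With 30-odd drafts in the sample, each pick number would contribute roughly 30 CarAV values to its own mean and standard deviation.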
Now, let’s use signal detection theory to look at how good NFL teams have been at predicting career outcomes in the draft. Sticking with the example above, we want to see if the 30 players in our sample at pick 15, as a group, are significantly better than the players drafted at picks 16 to 25. The basic idea of signal detection analysis is really very simple. Two groups of players, for example those drafted at picks 15 and 23, are discriminable if the difference in average CarAV values at the two positions is large relative to the variation in CarAV values at the two positions. This concept gives us the “Sensitivity Index”, better known as d’ (pronounced “d prime”), which provides a metric of how well the league’s GMs did at discriminating between the two groups of players based on career potential. Sensitivity Index is calculated using the following formula:
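Written out from the terms described in the text, and assuming the common equal-weight pooling of the two standard deviations (reasonable when both groups contain about the same number of players), the formula takes this form:

$$
d' = \frac{m_{CarAV1} - m_{CarAV2}}{\sqrt{\tfrac{1}{2}\left(sd_1^{2} + sd_2^{2}\right)}}
$$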
Got that? OK, let me explain. The discriminability (d’) of the two sets of players’ CarAV values is equal to the difference in their average CarAV values (mCarAV1, mCarAV2) divided by their pooled standard deviation (computed from sd1 and sd2). So, it’s really just the ratio of the difference in average CarAV to the spread of CarAV values at the two draft positions.
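As a rough sketch of that calculation, the snippet below computes d’ from two means and two standard deviations. The function name `d_prime` is hypothetical, the pooling rule (root mean square of the two standard deviations) is an assumption, and the input numbers are invented for illustration rather than taken from the CarAV data.

```python
import math


def d_prime(mean1, sd1, mean2, sd2):
    """Sensitivity index: difference in means over the pooled standard deviation."""
    # Assumption: equal-weight pooling, i.e. the root mean square of the two SDs.
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd


# Made-up numbers for illustration only, not values from the CarAV data set.
example = d_prime(mean1=50.0, sd1=30.0, mean2=40.0, sd2=30.0)
```

In this toy case a 10-point gap in average CarAV against a spread of 30 points gives a d’ of about 0.33, which shows how a sizeable difference in averages can still be swamped by player-to-player variation.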
I realize d’ values can be kind of tricky to interpret, because they are in dimensionless units, but fortunately, signal detection theory allows us to relate them back to an easy-to-interpret value. d’ is mathematically related to the metric that the Navy signal corps started out with when they were trying to figure out how to interpret radar signals, which is known as Percent Correct. As the name implies, this is approximately the percentage of correct choices a GM would make when picking the better player out of pairs of players, with one player drawn from each of the two groups.
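A minimal sketch of the conversion, assuming the standard two-alternative relationship in which Percent Correct is the normal cumulative distribution evaluated at d’ divided by the square root of 2. The function name `percent_correct` is hypothetical, and the approach treats CarAV values as roughly normally distributed, which is only an approximation.

```python
import math


def percent_correct(d_prime):
    """Expected percent correct when picking the better of two players, one from each group."""
    # Standard normal CDF evaluated at d'/sqrt(2), written with math.erf:
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), so Phi(d'/sqrt(2)) = 0.5 * (1 + erf(d'/2)).
    return 100 * 0.5 * (1 + math.erf(d_prime / 2))
```

Under this assumption a d’ of 0 gives exactly 50% (a coin flip), a d’ of 0.33 gives about 59%, and a d’ of 1 gives about 76%.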
The Precision of Draft Selection
Now, let’s see how the NFL GMs did. The next two plots show discrimination performance for GMs picking first overall and at pick #15. To do this I have calculated d’ between players drafted at those picks and each successive draft position for over a full round. Discrimination performance at pick #1 starts out above chance. The d’ value between picks 1 and 2 is 0.33, which corresponds to approximately 59% correct. 59% might sound pretty good at first. But remember, here we are estimating how good the GMs are at picking the better of two similarly ranked players, and chance performance is 50%. So, at a d’ of 0.33 they are performing slightly better than drawing names from a hat.
How far out do we have to go before we can say that GMs can reliably pick the better player? That depends on what you call reliable. A d’ value of 1 corresponds to approximately 76% correct, or slightly better than halfway between a coin flip and perfect performance. To get to that level of performance we have to ride the trend line all the way out to pick number 30, or as little as pick 21 if you are comfortable picking the first point where d’ is greater than 1. Is that surprising? The league as a whole is half-way decent at detecting differences in career potential between players it selects first overall and players selected at pick #30 (or #21).
At pick #15 it’s even harder to pick the differences between players. This time, discrimination performance starts at d’ values around 0, which equates to random guessing. The league’s GMs are completely unable to tell the difference between their guys at #15 and about the next 6 to 9 players. Sixty-four positions later, yes two whole rounds, the trend line hasn’t even reached a d’ value of 1. This is actually what you’d expect if you look back to the first figure I showed, because pick #15 is about where the slope of the plot of CarAV vs draft position transitions from steep to shallow. As the slope gets less steep, the differences between average CarAV of players taken at successive positions become smaller and smaller. As a result, the further the draft progresses, the harder it becomes to pick the better player.
Calculating the BPA window
So, how far down the draft board can a GM look to fill a need without reaching? In other words, how far can he go before he can reliably detect a decrease in career potential? The answer depends on two things. The first, as we’ve just seen, is where he’s picking. The later in the draft he’s picking, the bigger the window gets. The second is what he defines as a reach.
To illustrate, I’m going to calculate the discrimination limits, or the “BPA window”, for two GMs. The first GM, Bill, is a dyed-in-the-wool BPA guy, and the last thing he’d want to do is reach for a less talented player. For Bill, I’m going to set a really minimal threshold for detecting a decrease in career potential of d’ = 0.3. Bill is going to stick to the top of his board at the slightest hint that the next player along is a lesser talent.
Mark is a value vs need guy, so he’s less concerned about a slight difference in career potential if the next player along represents greater value to his team. For Mark we’ll set a higher discrimination threshold of d’ = 0.5, meaning he needs stronger evidence of a drop in career potential before he considers a pick a reach.
The final graph plots discrimination limits/BPA windows for Bill (blue) and Mark (orange) at draft positions 1 to 33. To get these figures, I’ve plotted d’ vs draft position at each pick # shown on the horizontal axis (like in the two previous figures), fit the scatter of d’ values to a trend line to smooth out the noise, and determined the pick # at which the trend line exceeds Bill’s and Mark’s discrimination thresholds. The resulting discrimination limits are essentially Bill’s and Mark’s BPA windows – the number of picks each GM can comfortably go from the top of his board without getting worried that he’s reaching.
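The procedure just described can be sketched roughly as follows. Everything named here (`d_prime`, `fit_line`, `bpa_window`, `toy_stats`) is hypothetical, and two choices are assumptions: the pooled standard deviation is taken as the root mean square of the two standard deviations, and the trend line is a straight least-squares line, which may differ from the curve used for the graph. The input numbers are invented purely to exercise the code.

```python
import math


def d_prime(mean1, sd1, mean2, sd2):
    # Assumption: equal-weight pooling of the two standard deviations.
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (straight-line trend, an assumption)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def bpa_window(stats, pick, threshold, span=40):
    """Number of picks after `pick` at which the fitted d' trend first reaches `threshold`.

    stats maps pick number -> (mean CarAV, SD of CarAV).
    Returns None if there are too few picks to fit a line, or if the trend
    never reaches the threshold within `span` picks.
    """
    gaps = [g for g in range(1, span + 1) if pick + g in stats]
    if len(gaps) < 2:
        return None
    d_primes = [d_prime(*stats[pick], *stats[pick + g]) for g in gaps]
    slope, intercept = fit_line(gaps, d_primes)
    for g in gaps:
        if slope * g + intercept >= threshold:
            return g
    return None


# Made-up inputs for illustration only: mean CarAV falls 2.2 points per pick, SD fixed at 20.
toy_stats = {p: (100.0 - 2.2 * p, 20.0) for p in range(1, 42)}
bill = bpa_window(toy_stats, pick=1, threshold=0.3)
mark = bpa_window(toy_stats, pick=1, threshold=0.5)
```

With these toy inputs the stricter threshold (0.3) is reached sooner than the higher one (0.5), so the first GM ends up with the narrower window, which is the qualitative pattern in the graph.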
At the first overall pick the window is 1 pick for Bill – true BPA – and about 5 picks for Mark (or 10 if you go with the trend line). By pick 15, Bill’s window is up to about 15 picks and Mark’s is up to 30. And by the beginning of the second round, Bill’s window is approaching 20 picks and Mark’s is up to 35. The BPA windows are expected to grow as the draft progresses, but at an ever-decreasing rate as the difference between CarAV values at successive positions gets smaller and smaller.
Conclusion and Discussion
What does this mean for draft strategy? My main conclusion is that, with the possible exception of the first overall pick, a strict interpretation of BPA, in which a GM is compelled to take the first player left on his board, regardless of how that player fits his roster needs, is nonsense.
The methods that NFL teams use to rank players, as reflected in the outcomes of past draft selections, do not appear to have anything close to that level of precision. Instead, the results of this analysis suggest that, as the draft progresses, a GM is facing an ever-growing window of players, near the top of his board, who are practically indistinguishable in terms of estimated career potential. Exactly how large that window is depends on a number of factors and is ultimately difficult to determine.
I would caution against taking the actual figures I have provided literally, because there are several limitations to the approach I have used. In particular, the CarAV statistic is only an approximate metric, and for purposes of this analysis, I have treated all players, positions and GMs equally, which they are not. A real GM seeking to develop a draft strategy that aims to maximize value of each selection, while recognising the limits of his ability to resolve differences between closely ranked players, would most likely settle on a more sophisticated strategy than simply setting a variable window of x players at each draft position. But I hope that the analysis I have presented has convinced at least a few readers that there are significant limits to the precision of draft selection and that this limitation has to be taken into account when considering draft strategy.
How a GM might incorporate these results into his process is probably a subject for another article, and I’d be happy to share my thoughts in the comments. But I would like to leave you with one final thought concerning the perennial BPA vs draft for need debate. Many readers clearly recognise that strict BPA is a bit of a straw man, and my results reinforce that view.
But I wouldn’t say that these results debunk the basic premise of BPA in any way. Instead, what they say is that a GM looking to identify the best player available is out of luck, because the tools at his disposal lack the precision to distinguish that player from similarly ranked players. And I think that provides a way to reconcile the BPA and need-vs-value philosophies. Provided that he picks from within that BPA window, he’s probably got the option to select a player that coincides with a roster need without reaching to fill that need.
What should Bruce Allen do when he’s on the clock at #15?
Pick the player at the top of his board, regardless of position or need
Pick the best edge defender, interior OL, QB, or WR, since those are his biggest needs
Option 2, but scratch the OL and WR, since they don’t match where the value is in the first round of this draft
Pick the player near the top of his board who represents the best value for the team
Trade down to the late 20’s for more picks, because draft outcomes at pick #15 are practically indistinguishable from pick #25
Reach for a QB like Drew Lock or Daniel Jones
Pick the player who had the best measurables at the combine
Step out for a Coors Light and leave the hard choices to the trained professionals
It’s a trick question, he already traded up for Kyler Murray