Micah is a mathematician who likes to use pictures to understand things. He runs a website, hockeyviz.com, where he stores pictures about hockey. He lives in Halifax, Nova Scotia with his wife and his two children.

Episode Notes

Micah | Twitter | Site

Bubble physics
Python
Beautiful Soup
svgwrite
Matplotlib
Line-width illusion

Related Episodes

Episode #238: Jeremy Ney
Episode #237: Tristan Gullevin
Episode #194: Charlie Smart

Sponsor

Use my special link (https://zen.ai/policyviz12) to save 12% at blendjet.com. The discount will be applied at checkout!

New Ways to Support the Show!

With more than 200 guests and eight seasons of episodes, the PolicyViz Podcast is one of the longest-running data visualization podcasts around. You can support the show by downloading and listening, following the work of my guests, and sharing the show with your networks. I’m grateful to everyone who listens and supports the show, and now I’m offering new exciting ways for you to support the show financially. You can check out the special paid version of my newsletter, receive text messages with special data visualization tips, or go to the simplified Patreon platform. Whichever you choose, you’ll be sure to get great content to your inbox or phone every week!

Transcript

This is Jack Schwabish, and I want to do the ad read for this week’s episode of the PolicyViz podcast. After baseball practice, I’m very hungry, but lucky for me, I use my BlendJet 2 to make a delicious shake. The BlendJet 2 is portable, so you can make a smoothie at home or a protein shake at the gym. It is small enough to fit in a cup holder, but powerful enough to blast through tough ingredients like ice and frozen fruit with ease. It lasts for 15 plus blends and recharges quickly via USB-C. It is whisper quiet, so you can make your morning smoothie without waking up the whole house. Best of all, BlendJet 2 cleans itself, just blend water with a drop of soap, and you’re good to go. What are you waiting for? Go to blendjet.com and grab yours today. Be sure to use the promo code POLICYVIZ12 to get 12% off your order and free two-day shipping. No other portable blender on the market comes close to the quality, power, and innovation of the BlendJet 2. They’ll guarantee it or your money back. Blend anytime, anywhere with the BlendJet 2 portable blender. Go to blendjet.com and use the code POLICYVIZ12 to get 12% off your order, and free two-day shipping. Shop today and get the best deal ever.

Welcome back to the PolicyViz podcast. I am your host, Jon Schwabish. On this week’s episode of the show, we explore the interaction between data, data visualization, and hockey. I am joined by the creator behind the website hockeyviz.com, Micah McCurdy, who pulls in real-time NHL data to create a variety of exciting data visualizations and data tools around hockey data. We have a really exciting conversation about the tools that he uses to collect the data, to clean the data, and, of course, to visualize the data. We also talk a lot about the balance between static visualizations and interactive visualizations, and why he focuses his attention on static visualizations. And then, of course, we talk about Connor McDavid, the current playoffs picture, and what he thinks is going to happen to a couple of teams in the next couple of years. So I hope you’ll enjoy this interesting episode on the intersection between data and DataViz and hockey. And so, here’s my conversation with Micah McCurdy.

Jon Schwabish: Hey, Micah, welcome to the show. Great to have you on. 

Micah McCurdy: Thanks for having me. 

JS: Very excited. We are in early May when we’re recording, so the second round of the playoffs is just getting started, very exciting. I mean, less exciting for me, because the Caps didn’t quite have the season that we had expected; even after a great December, they kind of fell apart after that. But I want to get to your predictions, and some other things about playoffs in a little bit, but I wanted to start with maybe having you talk a little bit about yourself and how you got into this intersection of hockey data, DataViz, and have this pretty exciting site where you have this interesting Venn diagram going on.

MM: So I’m from Halifax, which is where I live now, but I did not stay here my whole life by any means. I grew up here, and I was sort of a casual hockey fan like a lot of Canadians are. It’s in the water, you don’t need to seek it out or do anything, it’s just there. And so, I was sort of a casual Senators fan, but I watched a number of other teams too when I was a kid. And then, I went to Australia to do my doctorate in mathematics, and at the time, I was completely sure I was going to be a research mathematician; that was my career. And I discovered, when I went to Australia, that all of a sudden there was no more hockey. They’re sports mad, of course, but ice hockey, as they insist on calling it, to my incredible annoyance, is not – it was just starting to get on the like, we’re going to fill a channel with some reruns in the middle of the night, because it’s live in Canada or in the States kind of thing. And I discovered that I was really homesick on the other side of the world, trying to do my PhD, and simultaneous to that, I was struggling with doing purely pure mathematics without doing any kind of hands-on anything.

And so, I got more and more into hockey, especially because I could watch it while I worked, and when I wanted a distraction to do some different kind of work, I could run little simulations. Say, like, well, let’s make a model of what are the Senators going to do on this California road trip, how many points are they going to take. Well, the Kings are really good this year, and the Sharks are really bad, okay, how would that be. And you get into that, and what are you going to do, I might as well do it for all 30 teams. And then, you get into 32 now, of course. And so, that scratched an itch, both in terms of homesickness, and in terms of wanting to do some quantitative work. People always laugh when I tell them that my mathematics PhD contains essentially no quantitative work, because it’s all – but it’s true, it’s almost entirely, well, and then, of course, the material of the PhD is something that, by definition almost, no one else cares about. But it happened to have this kind of peculiar consequence, namely, I was working on graphical languages, so doing calculations with pictures. So not illustrating the calculations, but actually being the calculations, a two-dimensional string diagram calculus. You could actually work with the ribbons, and so, it sounds more abstract than it was, but the essence of it was doing things visually instead of doing things with symbols.

And so then later, years later, when I decided to turn the hobby that I picked up in Australia into a job, so almost a decade later, it was very easy to think of it, not as an exercise in hockey or statistics, but as an exercise in data visualization, because I find that looking at things, seeing pictures of things activates a totally different part of my brain than trying to process things symbolically. And so, that’s where the two threads, where the math thread and the hockey thread kind of overlapped and became this entirely different thing centered around DataViz. 
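The road-trip question Micah describes (how many points will a team take from a run of games?) can be sketched as a small Monte Carlo simulation. This is only an illustration, not his model: the opponents' strengths are invented probabilities, the Ducks are added to round out a California trip, and the names `ROAD_TRIP`, `simulate_trip`, and `points_distribution` are hypothetical. The only hockey fact it relies on is the standings rule of two points for a win and one for an overtime or shootout loss.

```python
import random

# Hypothetical per-game probabilities for one team on a three-game trip.
# Each entry is (opponent, P(win), P(overtime or shootout loss)).
# The numbers are invented for illustration, not taken from any real model.
ROAD_TRIP = [
    ("Kings", 0.35, 0.10),   # strong opponent
    ("Ducks", 0.50, 0.10),   # added here only to make it a three-game trip
    ("Sharks", 0.65, 0.10),  # weak opponent
]


def simulate_trip(games, rng):
    """Return the standings points earned in one simulated trip."""
    points = 0
    for _opponent, p_win, p_otl in games:
        roll = rng.random()
        if roll < p_win:
            points += 2  # any win is worth two points
        elif roll < p_win + p_otl:
            points += 1  # overtime or shootout loss is worth one point
        # otherwise a regulation loss, worth nothing
    return points


def points_distribution(games, n_sims=10_000, seed=0):
    """Estimate P(points = k) for k = 0 .. 2 * len(games) by simulation."""
    rng = random.Random(seed)
    counts = [0] * (2 * len(games) + 1)
    for _ in range(n_sims):
        counts[simulate_trip(games, rng)] += 1
    return [count / n_sims for count in counts]


dist = points_distribution(ROAD_TRIP)
expected_points = sum(k * p for k, p in enumerate(dist))
```

Running the same function over every team's schedule is the "might as well do it for all 30 teams" step; the interesting modelling work is in where the per-game probabilities come from, which this sketch deliberately leaves out.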

JS: Right. It’s interesting when you tell that story, because it sounds like you started not by just doing kind of in-depth deeper dives of just tabulations and crosstabs, but actually doing predictions. 

MM: Yeah, I started with predictions, and that was because I wanted to make simulations. And so, it was kind of the wrong way round, if you like. If you’re doing science, you’d think about here’s the thing I really want to understand. And I had done a little bit of that – in my undergrad, I did a little bit of simulation work, I published a paper on bubble physics, and where you’re looking at soap froth squeezed between two glass plates, and I was simulating what they would look like with a computer. And that’s a really interesting scientific technique, but then, of course, the interest is, here’s this thing, and I can attack it experimentally, I can attack it theoretically, or you can attack it in the middle. And I discovered I really enjoyed that approach, and so, that became my focus, you know, how can I find another project using those tools, I just enjoy using them, and hockey was sort of naturally close to hand. And then afterwards, I thought, I don’t really know what I’m doing, I should look at the data better, am I doing this right. And so, in the very first instance, DataViz was a debugging tool for that, I thought. If I look at the data, then I’ll be able to see if it’s good, because I process information very well that way, relative to just looking at numerical output. And so, I had this little gadget, and I think it kind of works all right, but it became this need to make diagnostics to understand what it was that I was doing, led to this focus on DataViz. 

JS: So let’s talk about the DataViz piece of it, so I’m curious about your process, so you’re pulling in a lot of data, and for folks who haven’t checked out the site HockeyViz, I’ll leave links to it, and you absolutely should. You’ve got a ton of data, and a ton of visualizations. So what is your process? Where are you pulling data from? What is your toolkit to process the data? And then, what are you doing on the visualization side? And I want to dig in a little bit deeper on the visualization side as well, but what does that whole data workflow for you look like? 

MM: So for NHL data, specifically, it’s actually an enormous pain in the neck. I shouldn’t be too upset about it. The league after all puts it up for free, and they know that I use it and rearrange it and sell it on my website, and they don’t get too angry with me. But it’s definitely not presented in a way that is straightforward in any way. And so, the data engineering aspect of it is a real pain, a lot of it is just scraping HTML. So there are some pretty detailed HTML reports that are put out after every game by the league – in fact, during each game, they update every few minutes. And so, you can even do some rudimentary live stuff. And then, there are some machine-readable JSON endpoints that are not particularly sophisticated, and the two data sources don’t quite line up, even though they both allegedly come from the league. And so, you have to do quite a bit of chicanery to make them line up.

In the old days, people used to know about even more data that used to leak out through league partners of one kind or another. There was a time when the ESPN website could be relied upon for some information that you couldn’t get from the league, which is no longer true. But there’s a lot of tricks along those lines, where you have to do quite a bit more than just take in an endpoint and store it locally. The scraper, in particular, is one of those things where that’s a piece of software that’s carefully honed with lots of different if-thens for silly circumstances.
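To make the "two sources that don't quite line up" problem concrete, here is a minimal sketch of one way such a reconciliation could work. Everything in it is an assumption for illustration: the record fields, the sample events, and the rule of pairing events by period, type, and a small clock tolerance are invented, and the real reports and endpoints are far messier.

```python
# Hypothetical, heavily simplified event records. Real league reports carry
# many more fields; these names are made up for the example.
html_events = [
    {"period": 1, "time": "05:12", "type": "SHOT", "player": 8},
    {"period": 1, "time": "11:40", "type": "GOAL", "player": 77},
]
json_events = [
    {"period": 1, "seconds": 313, "type": "SHOT", "x": 62, "y": -14},
    {"period": 1, "seconds": 700, "type": "GOAL", "x": 81, "y": 3},
    {"period": 2, "seconds": 45, "type": "SHOT", "x": -55, "y": 20},
]


def clock_to_seconds(clock):
    """Convert an 'MM:SS' game clock string to elapsed seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def reconcile(html_events, json_events, tolerance=2):
    """Pair each HTML event with the closest unused JSON event.

    Two events are candidates for pairing when they share a period and an
    event type and their clocks differ by at most `tolerance` seconds.
    Returns (merged, unmatched_html, unused_json).
    """
    unused = list(json_events)
    merged, unmatched = [], []
    for event in html_events:
        t = clock_to_seconds(event["time"])
        candidates = [
            j for j in unused
            if j["period"] == event["period"]
            and j["type"] == event["type"]
            and abs(j["seconds"] - t) <= tolerance
        ]
        if candidates:
            best = min(candidates, key=lambda j: abs(j["seconds"] - t))
            unused.remove(best)
            merged.append({**event, "x": best["x"], "y": best["y"]})
        else:
            unmatched.append(event)
    return merged, unmatched, unused


merged, unmatched, leftover = reconcile(html_events, json_events)
```

The two leftover lists are where the "if-thens for silly circumstances" would accumulate: each unmatched event is a case that needs either a looser rule or a hand-written exception.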

JS: Right, for a whole bunch of edge cases that just pop up.

MM: Yeah, and there’s not just the usual data we know, well, this data happens to be corrupt for reasons that are known possibly to no one. But then there’s also all kinds of weird things like Rich Peverley had a heart attack in a game after he scored, and then, they restarted that game and they credited him with the goal before the opening faceoff. And he also didn’t play, he got hurt. And then, sorry, it’s not Peverley who scored, some other guy scored and he got hurt. And so, then he didn’t get to play in the replay, and so, you make assumptions like do you assume that all the goal scorers in the game dressed in that game, and you get it wrong. So there’s a lot of data engineering along those lines as well. 

JS: And what are you doing on – what’s the tool you use to do the actual scraping? 

MM: Beautiful Soup, my stack is all Python.
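For readers who have not used Beautiful Soup, a minimal sketch of scraping a table out of an HTML report might look like the following. The HTML fragment, the `plays` table id, the column layout, and the `parse_plays` helper are all invented for this example and do not reflect the league's actual report format; the blank clock on the last row stands in for the kind of oddity a real scraper has to tolerate.

```python
from bs4 import BeautifulSoup

# Invented fragment standing in for a play-by-play report.
REPORT_HTML = """
<table id="plays">
  <tr><th>Per</th><th>Time</th><th>Event</th><th>Description</th></tr>
  <tr><td>1</td><td>05:12</td><td>SHOT</td><td>Wrist shot, 34 ft.</td></tr>
  <tr><td>1</td><td>11:40</td><td>GOAL</td><td>Snap shot, 18 ft.</td></tr>
  <tr><td>1</td><td></td><td>STOP</td><td>Icing</td></tr>
</table>
"""


def parse_plays(html):
    """Extract one dict per data row from the hypothetical 'plays' table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table", id="plays").find_all("tr")
    plays = []
    for row in rows[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) != 4:
            continue  # tolerate malformed rows instead of crashing
        period, clock, event, description = cells
        plays.append({
            "period": int(period),
            "time": clock or None,  # some rows have no clock value
            "event": event,
            "description": description,
        })
    return plays


plays = parse_plays(REPORT_HTML)
```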

JS: All Python, okay. So you’re pulling it directly from the NHL API, and you’re not using, like, Hockey Reference, because Hockey Reference seems a little more aggregated, as far as I can tell.

MM: That’s right. I don’t have any partnerships with any other hockey websites. And I’m also sort of neurotic on this level, where, if there are mistakes I want to, at least, be able to say that’s my mistake or that’s the NHL’s mistake. That’s because what I’m doing is not just for my own benefit, but it’s public facing. I find it’s easier to say this is exactly who made a mistake here. 

JS: Yeah, so you control the entire workflow from A to Z, yeah. 

MM: Yeah, and I also don’t step on anybody else’s toes except [inaudible 00:11:29]. And so, hobbyists, there’s a certain amount of competition among people, but also a certain amount of professional respect for anybody else who’s working with NHL data where you say, like, again, you say, I have corrupt stuff from here, and the league hasn’t fixed it, do you have a copy that wasn’t corrupt, and you share things among practitioners. 

JS: So it’s interesting, so there is sort of a behind the scenes kind of community of, in this case, the hockey data folks? 

MM: Yeah, it’s probably about, like, 10 or 12 people that I can think of that I consider the community. In fact, it’s large enough to have sort of its own little petty squabbles with people who don’t like one another. And, of course, everybody privately thinks that their way is best, it is best, and their models are best, and my data pipeline is definitely not the best. And so, it can be occasionally a little bit testy, but mostly people who have actually done a modicum of work and know what it’s like, are quite friendly with one another, and have to [inaudible 00:12:25].

JS: Right. They all know the challenges in these weird edge cases. 

MM: Yeah.

JS: So you’ve pulled the data in Python, you’ve cleaned it in Python, so what about the DataViz side, and maybe you could talk a little bit about your – I don’t know, do you have like, I mean, I obviously follow your stuff, and you’ve got a lot of different types of visuals, but do you have a few that you kind of market or share as like the kind of exemplar stuff of your work? 

MM: There is one for sure. In fact, I reworked it just this year, it’s a Sankey diagram, which shows the probabilities of each team as they move through the playoff rounds. 

JS: Yeah, I just saw the updated one yesterday, yeah.

MM: And so this one, in fact, I found a great blog post by another guy, I forget who, sadly, explaining how he had really admired Sankeys that were made with circle segments and straight lines. So with no – with only horizontal and vertical lines relative to some axis, and then, circle segments when you needed to turn. And so, I reworked mine, which had been based on lines with very different angles, which, of course, have this optical illusion, where thick lines can appear thin when they’re very steep, and which finally started to really annoy me. And so, I decided to rework it this way. And speaking of tech stacks, that one graph that I show people all the time, it used to be called in its old incarnation, and fans of mine decided to call it the Rainbow Death Crab. It had a sort of crab-like shape, and it was full of many colors, and the death is colored more due to people use, it has to do with how hockey fans, even the ones who love it, in fact, especially the ones who love it, best of all, find that that’s extremely stressful.

In fact, I briefly sold some T-shirts – technically, you can still buy them and nobody has in a long time, that say we may win, but I may die. And so, I sort of leaned into that with the name of the thing, and that graph is the only one that I make with the Python library called svgwrite. It’s just a fairly gentle layer on top of plain SVG. And so, in fact, in a past life, in a totally different application, I used to write a lot of XML and XSLT stuff, and so, I find that quite natural, because I really wanted pinpoint control of actually making every single element. But that’s a little bit unusual, almost all of the other stuff I do is all Matplotlib, which I’ve got reasonably good at bending to my will. And, in fact, from time to time people say, well, Micah, you should try this other tool, or you should try this other library. And I nearly never do, because I’m persnickety about having a lot of pinpoint control, and I don’t mind spending a great deal of time to get it to look exactly the way I want. And so, the process can be quite laborious in places, and quite intricate. But generally, it does come out exactly the way I want it once it’s done.
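The geometry behind the line-width illusion, and the circle-segment alternative, can be sketched in a few lines. A straight band drawn with a fixed vertical thickness is narrower when measured across the band, by a factor of the cosine of its slope, which is why steep bands look thin. A stroked path made only of horizontal runs, vertical runs, and quarter circles keeps the same width across the band everywhere. The sketch below writes plain SVG path data by hand rather than using svgwrite, so that it runs with the standard library alone; `perpendicular_width`, `ribbon_path`, and `ribbon_svg` are hypothetical helpers, not Micah's code.

```python
import math


def perpendicular_width(vertical_thickness, slope_degrees):
    """Width measured across a straight band of fixed vertical thickness."""
    return vertical_thickness * math.cos(math.radians(slope_degrees))


def ribbon_path(x0, y0, x1, y1, radius):
    """SVG path data for a left-to-right centreline using only horizontal
    runs, one vertical run, and two quarter-circle turns.

    Coordinates follow SVG conventions (y grows downward).
    """
    dy = y1 - y0
    if dy == 0:
        return f"M {x0},{y0} H {x1}"
    if abs(dy) < 2 * radius or x1 - x0 < 2 * radius:
        raise ValueError("not enough room for two quarter-circle turns")
    step = radius if dy > 0 else -radius
    xm = (x0 + x1) / 2  # x position of the vertical run
    # In SVG, sweep-flag 1 draws the arc clockwise on screen.
    first_sweep = 1 if dy > 0 else 0
    second_sweep = 1 - first_sweep
    return (
        f"M {x0},{y0} "
        f"H {xm - radius} "
        f"A {radius},{radius} 0 0 {first_sweep} {xm},{y0 + step} "
        f"V {y1 - step} "
        f"A {radius},{radius} 0 0 {second_sweep} {xm + radius},{y1} "
        f"H {x1}"
    )


def ribbon_svg(path_data, stroke_width, colour="steelblue"):
    """Wrap path data in a minimal SVG document with a constant-width stroke."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 120">'
        f'<path d="{path_data}" fill="none" stroke="{colour}" '
        f'stroke-width="{stroke_width}"/></svg>'
    )


steep_width = perpendicular_width(20, 60)
path = ribbon_path(0, 20, 200, 100, 15)
svg = ribbon_svg(path, 20)
```

With these numbers, a band 20 units thick vertically and sloped at 60 degrees is only about 10 units across, while the stroked ribbon stays 20 units across along its whole length, turns included.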

JS: And you’re doing all that in code, like, do you pull any of it into as an SVG, and pull it into another tool to add annotations or do any manual stuff, or is it all – I mean, obviously, not automated, but is it all done in code? 

MM: It is entirely done in code, which I take as a kind of article of faith. If I was interested, like, every now and again, you come up with something and you say, oh, I see, no, that’s a case I didn’t quite expect, and doesn’t look quite as elegant, I could hand-tweak that. And if I was, I don’t know, like, if you commissioned me to do something for National Geographic or something, then maybe I would – I would tweak it and make sure that it was absolutely ideal. But I consider that as a rule, not as a virtue, but as a distraction from thinking at the right level of generality, where sort of, where the pure math background comes back out, where I think about solving the problem, and solving the problem in general. Which means it has to be done in code, without ever tweaking after the fact, even if that means that sometimes you have to put out something which is a pixel off from what it might be.

JS: So during an evening, we got playoffs now, so there’s only two or three games a night, but during the regular season, multiple games going on, are you constantly updating, or do you have it running at a particular cadence, like, how is the updating process working? And also, how does it update through to the website so that when I log on, I can go grab last night’s, or yesterday’s shot diagram for the Caps? 

MM: That is all automated. Occasionally, stuff goes wrong, and I have to go fix it, and there’s always a certain amount of putting your finger in the hole in the dike. But almost all of that is automated, on a web server running on a server that I rent from a company in Toronto. And so, that’s all stashed there.

JS: Right. The other thing that’s interesting about the site, so for folks who haven’t logged in, they should, and check it out, because you get into the page, and there are, I’d say, on the front main page, like 16-20 graphs, something like that. And I’d say maybe two of them are animated, and the rest are static, I don’t think there’s a lot of interactive stuff on the site. What is your thought process about interactive versus static?

MM: The few interactive Vizes I’ve made, they’re very, I don’t know, two or three, I think, compared to thousands of other types. And I consider the interactive ones to be failures on my part, where I think of DataViz in a genre sense as being akin to photography. And so, if you make photography that moves, in some sense, you have failed to understand what you’re doing, like, I consider the process where you take something, and you say, this is what goes in the rectangle, this is what it is I want people to look at. Then if you make something interactive, in some sense, failure might not be the word, abdication might be the word. You’ve declined to solve the problem. You leave it to your readers to solve the problem. And there’s an obvious virtue there, which is that agency brings something, but also an obvious vice, where you haven’t done the work, the editorial work that it requires, where one of the things that people are looking for, and this is something that I think is a real virtue of having something static, is that it gives a certain kind of finality where you can say that, and even if there’s something else you want to see, there is something that you can see here, and I have shown all of it to you, where you say that implicitly just in the static choice, well before you actually look at the data.

And then, of course, every now and then people say, no, they disagree with you, and sometimes they will yell at you on Twitter and say, this Viz shows this, and it ought to show that. And I say, well, it doesn’t, because it’s not for that, it’s for this, the first thing I chose. And then, you can get into an argument about choices, why did you choose to show this, and those arguments are always interesting to me, especially because I do a lot of that work. In fact, I consider that to be, in some sense, the most interesting work, much more interesting than the color choices, the composition choices, but like what is it you want to show here. And I find that restricting myself to static Vizes, with again, a few exceptions, crystallizes that process in my own mind, and by the time I’m done, I feel like I’ve gained a lot by knowing exactly what it is I want to show.

JS: I agree with that, I think 90% of that, but in your particular case of this rich hockey data, for example, if I wanted to go in, and I could see the shot map for the Caps last game – I’m just going to focus on the Caps, even though we’re done, I’m just… 

MM: Hopefully, you don’t have to talk about the sudden news. 

JS: If I want to look at that shot map, but then filter by Ovechkin versus Oshie versus Carlson, now, on your site, I believe you can select the different maps for each of the different shooters. But do you think there is value in having an interactive version of that overall static where I could filter and click inside the visualization?

MM: So no matter where you decide to draw a line, this is what I’m going to call static. And, of course, every interaction has that inside it, right? Unless the pieces move when you’re not touching them, there’s always a static render, where you take your finger off the slider, and you look at it, whatever it is that you as a user together with the author of the Viz has created. And so, no matter where you sit on the spectrum, you’re going to have a certain amount of staticness, and you’re going to have a certain amount of interactivity because no one is going to look at any particular thing for longer than they need to. And I know a good Viz, as always, of any type, you look at it, you see something interesting, you think, oh, I wonder about, and now you’re doing the interactivity in your head now. You’re thinking about a new question. And, of course, I’m running a web server, which means that there’s something inherently interactive at the level of the web, where you’re going to click on a link, you’re going to click on something else, like, there’s links all over the website, which is something if not interactivity.

And so, the question becomes, where do you put the interactivity. Even if you’re being incredibly old school, and you’re going to have a book of maps, your map doesn’t move when you look at it, but then you think, oh, I wonder if that’s the same in this other country, and you turn the page to the other country. Right? You interact with the gadget, which contains the static [inaudible 00:22:13]. And so, I’ve become quite comfortable with that for letting the interactivity be at the level of the webserver, especially because it builds on a lot of existing technologies. My interest technologically is quite minimal. I appreciate very much that other people have made any number of tools, but I don’t have a particular interest in the technology as such. And so, I prefer to use the simpler technologies whenever I can. And for me, there’s another angle to a more [inaudible 00:22:52] angle, if you will, which is that, in terms of getting customers, in terms of getting engagement, in terms of forming a community to put it in a slightly nicer light, it’s valuable to me that all my work be shareable.

And so, one of the things about having just PNG outputs for stuff, for almost everything is that other people, or more to the point, me, can click on something, paste it into a tweet, put some sort of comment, linking it topically to whatever’s going on in whatever conversation, whatever sphere, and then, people can digest it right away, and you don’t have to – I mean, of course, you can take screenshots and stuff too, but there’s something about – something that’s going back to what we were talking about earlier, something about the presentation of the thing as static. Yeah, if you take a screenshot of something that’s interactive, even if it’s your own thing, even if it’s curated as carefully as you’d like, there’s something not quite final about it that encourages or permits your readers to look at it as a little bit more transient. But if you’ve written something, even if you’ve written something in code, it’s generating this for [inaudible 00:24:02] it’s generating that, that stuff is generating this for all the players, even if you haven’t gone over them one by one with a little hammer to say, this is exactly the way I want it, it still has that finished quality that comes across in the grain of the material, if you like. It feels slightly abstract to say that but you know about what’s just a set of pixels, like any other pixels, but you still get that, like, oh this feels like this, this feels like that, where if you made the table out of wood, it’s different than if you made it out of lead.

JS: Yeah, I think that’s right on a lot of levels. I appreciate your point about what is the level of interactivity – is it two separate images that are layered? Is it a click? Is it a filter? But it is all – it is a movie at the end of the day, right? It’s a set of static images stitched together. I wanted to get into some more data questions, but go ahead, I don’t know if you had another thing to add about the, yeah.

MM: I did want to, just using the word movie made me think about it. There’s a third angle too about interactivity versus staticness that I really like, specifically for DataViz about sports. Because the sport itself is never pictures, the sport itself is always moving, it’s extremely interactive. It’s unavoidably happening through time. The space is, in fact, the time also, but space is very deliberately constrained. Every sport has its court that you play in, not out of, and also fairly tight times. But the time element is precisely the – is the one which is difficult to deal with. The space element you deal with in a pretty straightforward way. You take – it’s just a matter of scale for every sport. You want to make a picture of the rink, you do. And the pictures that I put out are five inches high, just that much percent, you know, you could write it down exactly how much smaller it is than a regular rink. It’s just compression in space, and it’s so similar. You try to keep the proportions the same, you mark the rink, and you mark the Viz the same way, or mostly the same way that you mark it on the ice, and for the same reason, so that people know where they are.

But the compression in time is really fundamentally different, where hockey games last an hour and a bit of game time, two or three hours of real-life time, but the Viz does not last in time. It’s compressed, and that compression I think is extremely important. And so, that’s part of what it is, I think that the creator has to do, where, in fact, frequently, you’re compressing not just a single game, you’re compressing entire seasons, maybe entire careers, maybe multiple careers, so that you can put them together in space. And that to me is the really fundamental aspect of what DataViz is at all, and it really comes through strongly when you’re doing sports stuff is to take variation in time, and turn it into variation in space instead. And so, if you say, well, we’ll just let the user click through this, you’re not doing all of that compression in time, which I think is the sort of first part of the job description.
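The spatial half of that compression really can be written down exactly. As a worked example, assume a standard NHL rink of 200 by 85 feet drawn in full, lengthwise, with the 85-foot dimension mapped to the five-inch picture height. Both the full-rink layout and the coordinate convention (origin at centre ice) are assumptions made for this sketch, and `picture_scale` and `to_picture` are hypothetical names.

```python
# Standard NHL rink dimensions in feet.
RINK_LENGTH_FT = 200.0
RINK_WIDTH_FT = 85.0


def picture_scale(picture_height_in, rink_height_ft=RINK_WIDTH_FT):
    """Ratio of picture size to real size, with both measured in inches."""
    return picture_height_in / (rink_height_ft * 12.0)


def to_picture(x_ft, y_ft, scale):
    """Map rink coordinates in feet to inches from the picture's top-left.

    Assumes the origin is at centre ice, x runs along the length of the
    rink, and y runs across its width.
    """
    x_in = (x_ft + RINK_LENGTH_FT / 2) * 12.0 * scale
    y_in = (RINK_WIDTH_FT / 2 - y_ft) * 12.0 * scale
    return x_in, y_in


scale = picture_scale(5.0)                        # 5 / 1020
picture_width_in = RINK_LENGTH_FT * 12.0 * scale  # just under 12 inches
centre_ice = to_picture(0.0, 0.0, scale)
```

Under those assumptions the picture is 204 times smaller than the rink in each direction. There is no equally simple ratio for the time dimension, which is the point Micah goes on to make.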

JS: Right. You’re giving them a snapshot of two hours of their life. 

MM: And it’s not just, like, you don’t want to – this is part of why people say, oh, you know, you should just watch the games where you get all this pushback from typical people, because, of course, they are putting in the time. And a lot of the pushback you get about having control or having power over the sport is because you’re getting pushback from people who are investing their entire lives. And right there, you have, and, of course, in my own way, I’m investing my entire life. I mean, this is my, like, professional career in a certain way. But it’s not the same level, and it’s, in particular, you get that extra level when players, ex-players especially can be a little bit twitchy about this as well, because they’ve invested a lot, not just the playing time, in addition to the watching time that somebody might, but also the physical pain, the trial, the exercise, the physical attributes, etc. 

And so that – but I think so far from trying to fight anybody who criticizes you on those terms, I think you have to accentuate the differences and say, we are not replaying this game. If you want to watch the game, you can watch the game. We are absolutely – and here I’m using the sort of royal mathematical we, the author and the reader together, we are compressing the game. And that’s deliberate, because I am showing you in a specific restricted context, what matters, and I am taking down what doesn’t matter. And that requires a knowledge about the subject matter, and this is true for everything, it’s not just for sports, it just happens to be more vivid for sports, because you have this dichotomy of who currently has a lot of power and who is gaining power at their expense. But that aspect of we are condensing these – these things are being removed, and these things remain, and the process is completed, I think is really important to not shy away from.

JS: Yeah, it’s an interesting way to think about, I mean, it is, in a lot of ways, real time data. Right? I mean, it’s not stock market data, but it’s real time data, and try to collapse that down, and it is always something that bothers me, when I’m stressed out watching my team playing someone who’s not in the sport says, oh, don’t worry about it. It’s like, no, I’ve invested my time and my energy, you just don’t understand. 

MM: No, I intend to worry about it. 

JS: Yeah, that’s right. So as we’ve been talking, I originally reached out to you because I was going through some of your shot visualizations, so for folks who haven’t seen these, these are more or less heat maps of where a team or a player shoots on the ice. And I reached out because I was curious about having just data that’s just where players are on the ice, because I have a hypothesis, which I’ll go through quickly, but I do have a question coming. My hypothesis is that there’s a lot of play happening on the side of the ice that’s opposite from the benches. And so, there’s a lot of play that we don’t see, because the cameras tend to face the benches. And so, that was why I reached out, and you were telling me that the data kind of doesn’t really exist. And so, I’m curious what data doesn’t exist that you wish you had?

MM: Well, that data in particular that you mentioned is in this most maddening middle ground where it does exist, I just don’t get to have it. 

JS: Okay.

MM: And even that, of course, is reasonably new; ask the same question a few years ago, and it simply didn’t exist at all. But player and puck tracking data has been available internally to teams for a couple of years now. And I happen to know just because of some connections with the industry that some teams are dealing with it quite well, and some teams are struggling, because the data, in fact, every now and again, I certainly get my eyes full of stars, when I think about what I could do with it, but it would be, among other things, I would have to take out some sort of loans and get a hold of a team of people, and it’s one thing to do everything yourself, and already, it’s all I can do for a full-time job to actually make what I’m already making, even though it’s essentially all automated. But that kind of granularity, that kind of data where every player and the puck is on the ice with whatever, however many times a second resolution you’re looking at, all of a sudden, you don’t just need a rented computer and a half-decent command of Python, you need a team of professionals. And not all hockey teams do. In fact, many even professional teams are run on what look like shoestring budgets once you take out player salaries. And so, there’ll be some movement from that, as, you know, over the next bunch of years, but it’s going to be quite slow, because that kind of data hasn’t been broadly used in this space.

JS: Right. I mean, it’s interesting, do you think hockey, in general, in terms of the analytics is behind the other major, we’ll just stick to North American sports like baseball and football? 

MM: It’s behind all four. It’s the fourth of the big four; and arguably also behind how things are going in soccer, although soccer has its own history with a lot of important established people who don’t like data-driven approaches. But now and again, I referee basketball papers, statistical papers when they’re submitted to journals, and it is evident that they are miles ahead of hockey. And baseball, of course, has its own little curiosity, because the sport is so different from the others, and because it has a data history that goes back very, very far, thanks to cricket. In some sense, baseball has always been this place where a particular kind of statistically inclined fan has gravitated, because they’re so scrupulous about keeping so much data over such a long history. But it doesn’t lend itself to the same kind of techniques, because it’s not a continuous open play kind of game. Whereas basketball is the one that I’ve looked to most closely, because it has a lot of formal similarities to hockey, you know, the frequent substitutions, the generally open play, the constant contest for possession – a lot of the analogs. The restricted space, compared to the number of people, where you don’t have that territory aspect that you have in the NFL or the CFL, but even there, the NFL, as a data culture within the sport, is miles ahead. 

JS: Yeah, I mean, I find it really interesting, this past season, all of a sudden, you are seeing real time replays of a play with the icons of each player moving. And that seemed to be, I mean, even within – I think that that was new this past season. So just the growth has been kind of amazing. It would be interesting to see if we could, I mean, I just remember being a kid, like, going through box scores in the physical newspaper, which maybe many readers or listeners now don’t know what a physical newspaper is, like, parsing through the box scores, and just whether that’s a predictor of people being interested in data and DataViz today in your sort of interesting Venn diagram. 

MM: Well, I was one of these kids too. I definitely pored over the standings too. It’s also worth noticing that despite the fact that hockey is really far behind, as I said, the improvements in the broader culture, in the last, say, five years are enormous. One of the nice things about having Twitter around is that you can see what people were arguing about five or six years ago. Every now and again people come up, and every now and again, even your own tweets, you think, boy, was that really where I was.

JS: I know, yeah. 

MM: [inaudible 00:34:55] you realize that as a culture, as a community, the improvement is enormous, even if we are still in fourth, I don’t know if we are gaining on anybody, but I don’t think of it as competition, I try to think of it as the culture improving, and ideally quickly, and I think the answer is yes, in both cases.

JS: Well, it is really interesting because at least here in Washington, obviously, the Commanders, the football team, are being sold, and the leader of what looks to be the group that’s buying the Commanders, he owns a part of the Devils, the New Jersey Devils hockey team, and part of the Philadelphia Sixers basketball team. And so, it will be really interesting, I think, I mean, obviously, this is behind the scenes, and whatever billionaires are able to do, but interesting to see whether there’s sharing going on in the technical piece of these data analytics. I mean, he’ll own parts of teams, or I don’t know if he knows how much he owns of each of these teams, of three major sports in North America, and how they’re sharing analytics and techniques and processes and data and all that sort of thing. 

MM: I think that’s definitely going to take even more hold. I know that the Kroenke Sports Group has a handful of sports teams across different leagues. I know that there’s also another couple of groups, like the Glazer group that owns a handful of teams, and some of them for sure are at least trying to build out cross-disciplinary, technical [inaudible 00:36:27]. 

JS: Sure. 

MM: I mean, it’s a lot easier said than done. 

JS: Oh absolutely, but to your point, I mean, basketball and hockey are two great examples. I mean, they’re similar in sort of the underlying structure, and so, if you have basketball data that’s real time of where everybody is on the floor at any one particular time, and you do whatever analysis and visualizations with that, and you have the same structural data for hockey, I would suspect that those teams can be talking to one another, it’s pretty interesting. 

MM: Yeah, and this is one of those areas where there’s a lot of ability to cross over, where you can say something like, especially because you have to deal with the culture in all cases, where you have an existing established culture that’s not data-driven. And so, I was talking earlier about how I turned to this, because I’m not particularly moved or convinced by numbers, by which, I mean, physically, when you write numbers on a page, that doesn’t translate into information in my brain easily. And people, you know, traditional owners, coaches, players of major sports are like that, only 10 times more so, and so, being able to put Viz in front of them, where you can make a point quickly and say, look, this is how this helps us win, that’s an enormous gain. And so, if you have great technical skills, but not a lot of visualization skills specifically, and you have someone else on the team who can help you out with that, this is sort of a front-facing version of what you can get out of, oh, you know, I don’t have any database administration abilities, and so, I’m going to get a DB tech who actually works in healthcare, I’m going to get her to come over and build out a system for me in sports. There the technology is sufficiently generic that everybody understands if you can build a database for this, you can build a database for that. But, in fact, the Viz skills are just as transferable. And so, that’s, I think, where some teams are going to find some traction there, some teams of teams, if you like.

JS: Right, teams of teams, yeah, absolutely. Okay, we’ve talked about the data, and your process for grabbing it and cleaning it, and we talked about the Viz. Let’s talk about some hockey and wrap this up. So we are now, I think, one game into each of the second-round series, so where we stand today, so it’s early May, what are your models saying right now, who’s the favorite to win in both conferences, what do you have? 

MM: So, all up, I favor the Carolina Hurricanes. They’re one up in their current series, but I favored them even before the second round started, in fact, even before the first round started. And in the Western Conference, I prefer Dallas over all of the others, which is a slightly unusual choice. In fact, both of them are a little bit unusual. The Hurricanes especially have a very unusual style. The most important thing is to shoot the puck well, and that’s what they do not do. They do all of the other things extremely well. They just have the puck all the time, and they’re very good at turning it into shots. And the goals sort of come out eventually on the side, like, if you press enough almonds, you’ll get a certain amount of oil leaking out the side of the tank. 

JS: Yeah. What about Dallas? I’m curious about Dallas. 

MM: So Dallas is a more conventional construction, actually. But there’s something about being in the southern part of the United States, and also being out west, you know, west is here relative to Eastern Time. Well, and, of course, West is a question of when you play and they do play late at night to the chagrin of a lot of local fans actually. But that somehow manages to make you fly under some radar a little bit, and they have a fairly devoted but quite small fan base that also, you know, teams which have big fan bases like the Leafs, you can’t – they can’t do anything. You can’t change the sheets on the bed in a hotel without people making a big deal about it, however relevant it might be. Whereas in Dallas, you can assemble an extremely strong roster, and have only a handful of people notice. 

JS: Yeah, that’s interesting. 

MM: Well, so one of the greatest examples is their best forward, Jason Robertson, is, I think, very nearly as good as somebody like McDavid, who has an incredibly high profile – I mean, very deservedly so, he’s one of the best to ever play the game. But Robertson is only a little bit off, and yet people don’t think of him as even remotely comparable. And the market, so that’s just one player, they have a whole stable of great players, but they’re much more traditional, they have a great goalie, they have great team defense, they have a handful of good shooters. That’s sort of the standard blueprint. But somehow people didn’t notice them doing it. 

JS: Yeah. So I’m curious about McDavid. I mean, my Caps are out, but I’m kind of rooting for Edmonton primarily because I just like watching him play. So kind of a two-parter, so, one, why doesn’t Edmonton go further every year with Draisaitl and McDavid? I just feel like they would go further every year, and is it just like a playoff problem, like, what is it? 

MM: I don’t think it’s a playoff problem, specifically. I do think it’s a dysfunction in the team as a management structure problem, where they’ve been unable to surround those two with the depth that is required because of their previous bad decisions. “They” here actually covers more people than just the current managers, because you can have these decisions where, you know, in a hard cap, you commit to a particular player for a really long time, and then, even if you buy them out, the buyout itself leaves marks, and the shadows can be cast for a really long time. And if you have poor management choices like that, you can hamstring even the very best players. Part of it too, of course, is that we’re talking about one of the differences in analytics between hockey and the other major sports, you also have differences in the game structure itself, where individual brilliance does not count for as much in hockey as it does in all of the other sports, where the position structure in football is so incredibly disparate, where you need to have a quarterback of a particular quality in order to go a particular distance; and if you do, you probably will, even if your other players are only so-so. And there’s no analog to that in hockey, even goalie, which is the closest you can get, just doesn’t have that same amount of leverage.

And then basketball, which, of course, has no – much, much flatter positional structure, all five – even though there’s technical positions, of course, but the differences between the five different positions on the court are so small, minuscule compared to NFL. But there, the time pattern is so different, where the best players play such an incredibly large fraction of the game. But in hockey, you just… 

JS: Right, you just can’t, yeah.

MM: You go so fast, no humans, no matter how Herculean, have the oxygen to supply their muscles to skate like that, for much longer than they’re already skating. And so, that really limits how much you can do, like McDavid is a physical freak, he can play for 30 minutes a night, that’s still only half the game. 

JS: Right. 

MM: And so, you can’t, you know, you just cannot squeeze a single player, or, in this case, two incredible players for that much. 

JS: Yeah. I had the – I was fortunate enough to be able to see Edmonton here in DC, in November, one of the rare games where we won, and, I mean, just watching McDavid play is just a wonder. I mean, he is incredible. 

MM: He’s confounding statistically, he’s one of these players who’s so good where you look at, you know, at outputs of graphs and think you’ve got that wrong. And then, you go back and you look, and you say, no, I just have to move those axes. 

JS: Yeah. Do you think he is, right now, the or one of the most, I mean, certainly one of – do you think he’s the most dominant individual athlete in a team sport in, let’s say, of the big four we’ve been talking about?

MM: I know so little about the specific identities of anybody else who could be, like, if it’s not him, who might it be, and I don’t have the list at my fingertips of who it could be. I feel like structurally the constraints we were talking about before mean that he can’t be somehow, just because he’s a hockey player, that no hockey player sort of like [inaudible 00:45:12] on skates, could ever have that kind of role. And I get – there’s something very pure about just saying I love watching this guy play, because it’s great, and there’s plenty of players who I have loved in the same sort of way. Karlsson has occupied a similar kind of position mentally and spiritually for a lot of Senators fans, and then Sharks fans over the years. But this year, in particular, they were – the Sharks as a team were dreadful. And yeah, there’s plenty of Sharks fans who say, well, it was all still worth it, because I got to watch our [inaudible 00:45:41] have a historic season. But that’s understood that you can have a player doing titanic things and still lose. In fact, it’s reasonably common. And certainly, you watched Ovechkin take better, I mean, obviously, he got that one cup with that one team, but he took better teams, at considerably shorter distances into the past… 

JS: The Presidents’ Trophy curse, right? I mean, we saw it this year. 

MM: Well, you don’t, you know, nothing is promised, nothing is written as they say. 

JS: Right. So I want to ask two more, you already mentioned I was going to ask you who the most underrated offensive player is, I think you said – Robertson would be your guy?

MM: Yeah, I think so. I try not to talk too much about underrated and overrated because it gets a little snippy about who knows stuff and who doesn’t.

JS: Oh yeah, sure. 

MM: But I try and think about it in the sense of like do I have a bottle of wine that’s really good, and it’s only 18 bucks, and I can only give you a glass, but I can say, oh, by the way, it’s only 18 bucks, and then, you will enjoy how they can achieve… 

JS: Right. 

MM: That’s sort of – I try to take it in this positive spirit, if I can. 

JS: Yeah. Okay, so let me rephrase my last question for you. Which team that’s not in the playoffs this year, do you think has the brightest future in the next couple of years? 

MM: I think Ottawa certainly tracks. Buffalo I think also looks really bright. They’ve acquired a number of – both of those teams have a number of extremely good players who are not just coming into their prime, but still a couple of years off it, you know, where you say, if you’re this good now at 21, at 22, what are you going to be like when you’re 23, 24. Those are the two that really stick out. 

JS: Yeah.

MM: And, of course, part of the fun is that both of those teams are together, and they both just missed the playoffs this year. And they’re together in the same division, and their division, at least this year, was the strongest one in the sport. So even if they do improve the way that I expect both will, it’s going to be a knife fight, just to even make the playoffs. And, in fact, probably one or both of them are going to lose in the first round of the playoffs just because they might have to play one another. 

JS: That’s right. 

MM: So it’s going to be – it’s part of the what they call business fun.

JS: Little fun.

MM: Yeah. But also kind of nerve-racking, it’s quite possible to have a dominant season, and to simply have to play your best opponent right away. 

JS: Right away, yeah, absolutely. Micah, this was great. Thank you so much for coming on the show. I really appreciate it, and I’ll look forward to seeing where we end up in the next few weeks over the playoffs. 

MM: Absolutely. Thanks for having me, Jon. 

And thanks, everyone, for tuning into this week’s episode of the show. I hope you’ll check out Micah’s website and learn a lot about hockey, and I hope you’re enjoying the hockey playoffs, of course. And if you haven’t yet checked out my new book, Data Visualization in Excel, it is available on Amazon, at the Routledge Publishers site, and wherever you get your books. It’s a step-by-step guide of how to create more than 20 different non-standard graphs in Excel from heat maps to mosaic charts, to strip plots, and ring plots and dot plots and slope charts. So I hope you’ll check it out. Let me know what you think. There are downloadable Excel files that go along with it that you can use in your own work. So I hope you enjoyed this episode of the show. I hope you’re enjoying the NHL playoffs. And so until next time, this has been the PolicyViz podcast. Thanks so much for listening. 

A number of people help bring you the PolicyViz podcast. Music is provided by the NRIs. Audio editing is provided by Ken Skaggs. Design and promotion is created with assistance from Sharon Stotsky Ramirez. And each episode is transcribed by Jenny Transcription Services. If you’d like to help support the podcast, please share it and review it on iTunes, Stitcher, Spotify, YouTube, or wherever you get your podcasts. The PolicyViz podcast is ad free and supported by listeners. If you’d like to help support the show financially, please visit our PayPal page or our Patreon page at patreon.com/policyviz.