charts

Hide and seek during a lockdown

Foot traffic has fallen dramatically because of the coronavirus. Obvious things have stopped, like air travel and professional sports. But what about lower-profile activities? Ones that aren’t explicitly banned and could even count as exercise.

Geocaching is kind of like a global game of hide and seek. Someone hides a container somewhere, publishes coordinates or clues and others try to find it. When you find a geocache you “log” it through an app or a website, maybe with tips and photos.

Geocaching should be the perfect social distancing activity. Geocaches are usually off the beaten track. It can be done solo or just with your household. Geocache logs are also a pretty clean indicator of non-essential movement – nobody has to go geocaching or log finds for work.

I’ve hidden a few geocaches. One under a bridge on the Gold Coast in Australia and another in a Sri Lankan park. Even now I get occasional alerts that they’ve been found. But I scraped the logs of 300 geocaches around the Gold Coast and there has been a 50% drop in geocaches found from March to April. April is down 45% from the previous year.
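For anyone wanting to run the same comparison, here is a minimal sketch of how the month-on-month and year-on-year drops can be worked out from a list of log dates. The dates below are made up purely for illustration (they are not the scraped data), and `monthly_finds` and `pct_change` are hypothetical helper names.

```python
from collections import Counter
from datetime import date


def monthly_finds(log_dates):
    """Count logged finds per (year, month)."""
    return Counter((d.year, d.month) for d in log_dates)


def pct_change(new, old):
    """Percentage change from old to new; negative means a drop."""
    return (new - old) / old * 100


# Made-up log dates, purely for illustration (not the scraped data).
logs = (
    [date(2019, 4, 10)] * 20
    + [date(2020, 3, 15)] * 22
    + [date(2020, 4, 12)] * 11
)

counts = monthly_finds(logs)
month_on_month = pct_change(counts[(2020, 4)], counts[(2020, 3)])
year_on_year = pct_change(counts[(2020, 4)], counts[(2019, 4)])
print(f"March to April: {month_on_month:.0f}%")
print(f"April vs April last year: {year_on_year:.0f}%")
```

Counting by (year, month) keeps the same month in different years separate, which is what the year-on-year comparison needs.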

I wanted to make sure this isn’t an anomaly, or that there isn’t some state bias here. So I also scraped 300 geocaches from Adelaide, Sydney and Melbourne. The numbers in Adelaide aren’t as dramatic but the effect still appears. There were 38% fewer finds in April than April last year.

The effect is even clearer in Sydney and Melbourne. Finds in April are about a quarter of the previous month. There were only a couple hundred finds in April, down from almost 1500 last year.

My dataset goes back almost a decade. April is normally a solid month, with a couple of public holidays and the weather starting to turn. There’s also a general upward trend over the decade, probably due to an accumulating number of geocaches but maybe also smartphone uptake. Apart from the months that follow a massive outlier, this kind of drop-off seems anomalous.

This is a pretty clear sign of how hunkered down everyone is. It hasn’t fallen off completely because some people probably use it as exercise – I often plan my walks around where geocaches are present. But the marginal users have completely fallen away.

Does it pay to win the toss?

Something that has always bugged me about cricket is that the coin toss seems to have a huge impact. That’s the framing, anyway. The entire first morning of a test match is usually taken up with what the winner should do – bat or bowl first.

Innumerable factors play into this decision, including weather, recent games, psychology and schedule. It sometimes seems more art than science.

But does it matter? Between 2000 and 2018 the toss winner won about 40% of games and the loser about 35%, according to noted cricket statistician Ric Finlay. Considering the sheer number of games, this seems pretty significant. I decided to scrape Cricinfo’s stat page to see if there’s anything else to tease out.
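Whether a 40/35 split is "significant" depends on how many matches sit behind it, which isn't stated here. As a rough sketch, one way to check is to drop the draws and ask how likely the split is if the toss made no difference (a two-sided exact binomial test against 50/50). The match totals below are hypothetical, picked only to match the 40% and 35% figures, and `binom_two_sided_p` is a helper name I've made up.

```python
from math import comb


def binom_two_sided_p(k, n):
    """Two-sided exact binomial p-value for k successes in n fair trials.

    With p = 0.5 the distribution is symmetric, so the two-sided value
    is twice the smaller tail, capped at 1.
    """
    lo = min(k, n - k)
    tail = sum(comb(n, i) for i in range(lo + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical match totals, chosen only to match the 40% / 35% split.
for total in (800, 2000):
    toss_winner_wins = round(total * 0.40)
    toss_loser_wins = round(total * 0.35)
    decided = toss_winner_wins + toss_loser_wins
    p = binom_two_sided_p(toss_winner_wins, decided)
    print(f"{total} matches, {decided} decided: p = {p:.3f}")
```

The same percentage split gives a much smaller p-value with the larger total, so the number of games matters as much as the gap itself.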

Firstly, as you’d expect, the results of the coin toss itself are about 50/50. Here’s Australia’s record at home:

[Chart: Australia's coin toss record at home]

But let’s go a bit deeper and break it down by country. The results of test matches played in Australia roughly line up with what Finlay says. But, perhaps counter-intuitively, it seems winning the toss is slightly more advantageous in the shorter formats. I would have thought the opposite, as pitches deteriorate and there’s more time for poor weather etc. in a test match.

[Chart: results for the toss winner in Australia, by format]

Some of this is probably noise. There have been significantly fewer T20s than test matches played, for instance. There may be more to unpack in the ODIs.
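To get a feel for how much noise a small sample carries, here is a sketch using the Wilson score interval, a standard way to put a range around a proportion. The win counts are made up: both are a 55% win rate, one from 20 matches and one from 400.

```python
from math import sqrt


def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win proportion."""
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half


# Made-up counts: the same 55% win rate from a small and a large sample.
small = wilson_interval(11, 20)
large = wilson_interval(220, 400)
print(f"20 matches:  {small[0]:.2f} to {small[1]:.2f}")
print(f"400 matches: {large[0]:.2f} to {large[1]:.2f}")
```

The interval for the small sample is several times wider, so an eye-catching percentage from a handful of T20s could easily be chance.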

Funnily enough, India is pretty dire for my theory. The toss advantage is even weaker for test matches in India and even stronger for T20 matches. But, again, there have been relatively few T20 matches. There have also been significantly fewer test matches played in India than in Australia, so that’s one to watch.

[Chart: results for the toss winner in India, by format]

Let’s look at England. This one is a little closer to what we saw in Australia, which makes me think the quantity of matches played is important. It also makes me question the connection between the toss and weather.

[Chart: results for the toss winner in England, by format]

All of this is roughly in line with what Finlay says, which makes me think there’s something to winning the toss. And the advantage for one-dayers is pretty consistent across these countries. I’m not prepared to call it yet. But there could be a marginal effect here. Gonna keep exploring.

The most boring heatmap

With the NBA season about to resume there has been a lot of talk of "rust". The idea is that with the long break and short lead-up teams will be sloppier than normal. They'll turn the ball over more. Not pass as much. Take easier shots (more three-pointers rather than driving to the hoop etc.).

It's a conversation had at the beginning of every season. And one I've always wanted to nail down. I believe it's a phenomenon. But are we all being fooled? Is this just survivorship bias? After all, we don't talk about it when teams don't play sloppily.

I decided to scrape the team box score for every Boston Celtics game going back to 1946. That's some 6,000 games. And, I can't really find it.

The median, for instance, is about 16 turnovers per game over 70-odd seasons. The standard deviation is roughly one. Across those seasons, six of the first ten games of the season are within one standard deviation in terms of turnovers per game.

That's 60% of the initial games within one standard deviation, which is roughly what you'd expect. Over seventy seasons there's little difference if you compare all the first games and all the last games.
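As a rough sketch of that check: count how many values sit within one standard deviation of the centre, then compare with the share a normal distribution would give (about 68%). The ten turnover figures are made up for illustration, and I'm assuming a centre of 16 and a spread of 1 as described above.

```python
from statistics import NormalDist


def share_within(values, centre, spread):
    """Fraction of values no further than `spread` from `centre`."""
    inside = [v for v in values if abs(v - centre) <= spread]
    return len(inside) / len(values)


# Made-up turnover figures for the first ten games (not the scraped data).
first_ten = [16.2, 17.4, 15.1, 16.8, 14.7, 16.0, 17.3, 15.6, 18.1, 16.5]

share = share_within(first_ten, centre=16.0, spread=1.0)
# Share of a normal distribution lying within one standard deviation.
normal_share = NormalDist().cdf(1) - NormalDist().cdf(-1)
print(f"observed: {share:.0%}, normal benchmark: {normal_share:.0%}")
```

With only ten values, 60% and 68% are too close to tell apart, which fits the "roughly what you'd expect" reading.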

I wanted to make a heatmap to visualise this. I grouped all the games by game number (so all the 1st games of the season are together, as are all the 10th games etc.). I did this for five basic game stats: turnovers, assists, three-point percentage, field goal percentage and three-point attempts.

Just for easy comparison I've normalised everything to a range from 0 to 1.
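The grouping and rescaling can be sketched like this. I'm assuming each stat is averaged per game number across seasons and then min-max scaled on its own, which may not be exactly how the heatmap was built. The rows are made up, and `average_by_game_number` and `min_max` are hypothetical helper names.

```python
from collections import defaultdict
from statistics import mean


def average_by_game_number(games):
    """Average a stat across seasons for each game number.

    `games` is an iterable of (season, game_number, value) tuples.
    """
    grouped = defaultdict(list)
    for _season, game_number, value in games:
        grouped[game_number].append(value)
    return {n: mean(vals) for n, vals in sorted(grouped.items())}


def min_max(values):
    """Rescale values to the 0 to 1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up turnover rows: (season, game number, turnovers).
rows = [
    (1990, 1, 18), (1991, 1, 16),
    (1990, 2, 15), (1991, 2, 15),
    (1990, 3, 17), (1991, 3, 15),
]

averages = average_by_game_number(rows)
normalised = min_max(list(averages.values()))
print(averages)
print(normalised)
```

One thing to keep in mind with min-max scaling is that it stretches every stat to fill the whole 0 to 1 range, so a stat that barely moves can look as spiky as one that swings a lot.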

You can see roughly what I found with turnovers. Some spikes here and there. Field goal percentage appears to get somewhat better as the season progresses. But it's mostly uniform.

You may be able to make a better case for some other, advanced stats. Like true shooting percentage and effective field goal percentage. But my data on the advanced metrics was a little iffy so I decided to put it aside.

I'm still not entirely sold this isn't a phenomenon. It just makes too much sense not to be true. So I'm not gonna call it yet. One theory may be that individual players have a greater variance than the team itself, especially over a long period. This would explain the heavy focus by commentators, as starters and stars draw outsized attention.

So I might scrape the players' box scores next. But if anyone can point me at someone else taking a deep dive it would be appreciated.

Why I'm not a professional sportsperson, maybe?

Following up on yesterday’s deep dive into NBA birthdays I’ve been reading more about the relative age effect. This is the apparent phenomenon whereby “older” players are over-represented in professional sports. By older I mean that professional athletes are more likely to be born at the beginning of the year.

There’s actually a remarkable amount of research on this phenomenon, and it appears to hold for some sports, in some countries, especially in Europe. One paper notes:

During the last three decades, researchers have identified overrepresentations of athletes born in the first quartile of the selection year (i.e. January to March if the cut-off date is 1 January) across cultural contexts in sports such as football, ice hockey, handball, baseball, basketball, rugby, volleyball, tennis, ski sports and swimming. Till et al. demonstrated the possible extent of such over representations of relatively older players in rugby: 47.0 % of the regional and 55.7 % of the national junior representative players were born in the first 3 months of the selection year

It does not appear to hold for the NBA, as I discovered yesterday. That basketball finding was in France. And many of these studies found the phenomenon reduces with age. It’s more prevalent in teams of teenagers than the higher grades, for instance.

But what about cricket? Again, a thought that hit me today as I was planning to attend some upcoming test matches. One study of Australian cricket players again found no significant difference in relative age among state level players, but did for lower grades.

Players born in the first quartile of the cricket season were significantly over-represented in both male Under-15, Under-17, Under-19 and female Under-15 and Under-18 levels. However, there was no significant difference at the senior state level for either male or female cricketers.

What about Sri Lanka? Where schools play an outsized role in player development and the club system is a mess. I haven’t been able to find research on this so I decided to scrape the birthdays for all 150 Sri Lankan test players from Wikipedia:

[Chart: birthdays of Sri Lankan test players]

Not much to go on here. Not much of a difference. And not enough data points to say the variation is much more than randomness. I may try to find a larger database of Sri Lankan club players.
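One rough way to put a number on "not much more than randomness" is a chi-square goodness-of-fit statistic against an even spread of birthdays across the four quarters. The counts below are made up (they just add to 150), and the even-spread baseline is an assumption: it ignores uneven month lengths and any seasonality in births.

```python
def chi_square_uniform(counts):
    """Chi-square statistic against equal expected counts per category."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Made-up births per quarter for 150 players (not the Wikipedia data).
quarter_counts = [42, 38, 36, 34]

stat = chi_square_uniform(quarter_counts)
# 5% critical value for chi-square with 3 degrees of freedom.
critical = 7.815
print(f"chi-square = {stat:.2f}, critical value = {critical}")
print("looks like noise" if stat < critical else "worth a closer look")
```

With only 150 players, even a visible tilt towards the first quarter like this one falls well short of the threshold, which is why a larger database of club players would help.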

But have I been cursed for my October birth? It seems to depend on a lot. Given I grew up in Australia, probably not.