In this article, I will discuss the setup and results of an experiment in gathering data through the API offered by Foursquare. If you have never heard of Foursquare, it is a (smartphone) application that allows users to share their whereabouts with the rest of the world, also known as checking in. I can’t really say why I would do this myself, but apparently plenty of people love the idea and are checking in literally everywhere to conquer badges and street-cred…
While waiting in line at the local supermarket (Colruyt), I had the idea of using this publicly available crowd-sourced data to graph supermarket visits over time. I wondered where I could find the necessary data to make this graph, and after digging into the Foursquare API, I got an idea of how it could be done.
Using the Foursquare API, we can search for venues that match given criteria, such as containing the word “Colruyt”. This search is limited to results within a 100 kilometer radius of the required “ll” (location) parameter, so covering a larger area takes only a couple of measurement points (at least for Belgium). Keeping a list of unique venue ids to monitor would also be an option, but that would need too many requests, since the script would then have to send one request per venue. The search function, on the other hand, returns the number of checkins for every venue in its results.
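To make the search step more concrete, here is a minimal sketch of how such a request could be built and how the checkin counts could be pulled out of the response. It is not the code from my script: the endpoint URL, the `v` version parameter and the `response.venues[].stats.checkinsCount` field names are assumptions based on the (legacy) v2 venue search API, and the function names are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed legacy v2 search endpoint; check the current API docs before use.
SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat, lon, query, oauth_token, version="20120101"):
    """Build a venue search URL around one measurement point.

    The parameter names (ll, query, oauth_token, v) are assumptions.
    """
    params = {
        "ll": f"{lat},{lon}",        # required location parameter
        "query": query,              # e.g. "Colruyt"
        "oauth_token": oauth_token,  # OAuth2 token for authentication
        "v": version,                # API version date (assumed format YYYYMMDD)
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def extract_checkins(payload):
    """Return (venue_id, checkin_count) pairs from a parsed JSON response.

    Assumes the shape {"response": {"venues": [{"id": ..., "stats":
    {"checkinsCount": ...}}]}}; venues without stats count as 0.
    """
    venues = payload.get("response", {}).get("venues", [])
    return [(v["id"], v.get("stats", {}).get("checkinsCount", 0)) for v in venues]
```

The URL can then be fetched with any HTTP client and the decoded JSON passed to `extract_checkins`, so one request yields the counts for every matching venue around that point.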
Based on this idea, I wrote a script that takes as input a set of locations (lat/lon), a time interval n, a query (such as “Colruyt”) and an OAuth2 token (needed for Foursquare API authentication). Every n seconds, the script sends a search request to the Foursquare API and dumps the results (venueID, number of checkins) to a file. Once the monitoring is done, this file is aggregated and plotted.
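The polling loop itself could look roughly like the sketch below. Again, this is an illustration and not the actual fsquerystats.py: `fetch` is a hypothetical callable that performs one search and returns (venue id, checkin count) pairs, the CSV layout is my own choice here, and merging duplicate venues is an assumption that follows from the search areas of neighbouring locations overlapping.

```python
import csv
import time


def collect_rows(fetch, locations, query, timestamp):
    """Query every location once and return one row per unique venue.

    fetch(lat, lon, query) is a hypothetical callable returning
    (venue_id, checkin_count) pairs. Venues found from several
    locations are stored only once.
    """
    seen = {}
    for lat, lon in locations:
        for venue_id, checkins in fetch(lat, lon, query):
            seen[venue_id] = checkins
    return [(timestamp, venue_id, count) for venue_id, count in sorted(seen.items())]


def append_rows(path, rows):
    """Append (timestamp, venue_id, checkins) rows to a CSV file."""
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerows(rows)


def monitor(fetch, locations, query, interval, path, rounds):
    """Poll `rounds` times, sleeping `interval` seconds between polls."""
    for i in range(rounds):
        append_rows(path, collect_rows(fetch, locations, query, int(time.time())))
        if i + 1 < rounds:
            time.sleep(interval)
```

Passing `fetch` in as an argument keeps the loop independent of the HTTP details, which also makes it easy to try out with canned data before pointing it at the real API.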
I created two tools to get the statistics:
- fsquerystats.py : the generic monitoring script that will poll the Foursquare search results and store the number of checkins.
- fsquerystats_export.py : aggregates the results of the other script and creates graphs using matplotlib.
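As an illustration of the aggregation step, here is a minimal sketch of how stored counts could be turned into visits per weekday and hour. It is not the real fsquerystats_export.py. It assumes that the stored number is a cumulative checkin total per venue, so the visits in an interval are the difference between two consecutive samples, and the function name and row layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def visits_per_weekday_hour(rows):
    """Sum new checkins per (weekday, hour) bucket.

    rows: iterable of (datetime, venue_id, cumulative_checkins).
    Weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    last_seen = {}
    totals = defaultdict(int)
    for when, venue_id, count in sorted(rows):
        previous = last_seen.get(venue_id)
        last_seen[venue_id] = count
        # Skip the first sample of a venue and any counter that went down.
        if previous is None or count < previous:
            continue
        totals[(when.weekday(), when.hour)] += count - previous
    return dict(totals)


sample = [
    (datetime(2012, 1, 5, 9, 0), "a", 10),
    (datetime(2012, 1, 5, 9, 15), "a", 12),
    (datetime(2012, 1, 5, 9, 0), "b", 5),
    (datetime(2012, 1, 5, 10, 0), "b", 6),
]
print(visits_per_weekday_hour(sample))  # {(3, 9): 2, (3, 10): 1}
```

The resulting buckets can be handed to matplotlib, for example one line per weekday with the hour on the x-axis.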
The fsquerystats script was run with the following arguments:
The script ran for 3 weeks and polled the search results for the term “Colruyt” in a large part of Belgium, storing the number of checkins to a file. 118 establishments were watched every 15 minutes, which resulted in 14860724 datapoints.
Results & Chart
Now it gets interesting: here are some charts.
Here are the visit statistics for each day individually, Sunday not included for obvious reasons.
Here is a graph with visits per weekday:
Here is a combined graph for each day, you can clearly see some trends here.
From these charts we can conclude some things:
- The calmest time to visit should be on Thursday, 09:00.
- Saturday tends to be the busiest day, especially in the morning.
- Visitor count seems to decrease in the early afternoon after 13:00 until it rises again at 15:30.
- You can clearly see the store is open an extra hour on Friday (purple line).
- A lot of people seem to visit between 18:00 and 19:00, after work.
These numbers are of course not really representative, since Foursquare is used by only a small (but sufficient) number of people, and its usage is probably restricted to technology enthusiasts, which excludes most elderly people. The numbers are not that big, but I had expected them to be a lot lower than this. Some trends are clearly visible, even though I only had three weeks of data.
Don’t hesitate to contact me with any questions or feedback, thanks for reading!
Published on 07 Jan 2012