<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://www.marcolussetti.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.marcolussetti.com/" rel="alternate" type="text/html" /><updated>2026-01-17T21:15:12+00:00</updated><id>https://www.marcolussetti.com/feed.xml</id><title type="html">Marco Lussetti</title><subtitle>Personal website of Marco Lussetti</subtitle><entry><title type="html">Import Spotify Extended Streaming History into ListenBrainz</title><link href="https://www.marcolussetti.com/misc/2025/07/03/listenbrainz-extended-history-import.html" rel="alternate" type="text/html" title="Import Spotify Extended Streaming History into ListenBrainz" /><published>2025-07-03T00:30:00+00:00</published><updated>2025-07-03T00:30:00+00:00</updated><id>https://www.marcolussetti.com/misc/2025/07/03/listenbrainz-extended-history-import</id><content type="html" xml:base="https://www.marcolussetti.com/misc/2025/07/03/listenbrainz-extended-history-import.html"><![CDATA[<p>ListenBrainz / MusicBrainz is a service that tracks music listens (it has many other components). It can integrate with Spotify so that Spotify notifies it when a song is played, but that only “starts” with songs from that point in time (well, it actually imports the last 30 days before then, but that’s it).</p>

<p>Spotify however allows users to export the full (-ish, as I only see things since 2014) history as a set of JSONs. This is actually a very cool dataset as it includes even brief plays, so you can track skipped tracks and things like that.</p>

<p>But for our purposes here, I wanted to get those pre-2021 stats into ListenBrainz. There are a number of tools to do this, but currently all of them are missing a bit on the filter side (or are buggy? or I am buggy?), so this is my quick way to get those stats in.</p>

<p><a href="/assets/posts/20250702-listenbrainz.png"><img src="/assets/posts/20250702-listenbrainz.png" alt="ListenBrainz for myself after the import" /></a></p>

<!--more-->

<h2 id="spotify-extended-streaming-history">Spotify Extended Streaming History</h2>

<p>This is a zip file you can request from Spotify. More details on the process and what the data is available <a href="https://support.spotify.com/ca-en/article/understanding-my-data/">from the Spotify website</a>.</p>

<p>To get it, go to your <a href="https://www.spotify.com/us/account/privacy/">Account -&gt; Privacy</a> -&gt; Download your data -&gt; Tick the box that says Extended streaming history. Do be aware that it may take 30 days to get it! You’ll get an email from Spotify eventually and you can flip back to this page then :D</p>

<p>Eventually, you’ll get a zip file that contains a bunch of JSON files. There are actually quite a few cool tools or examples to analyze some of this data (I have not tested these):</p>

<ul>
  <li><a href="https://explorify.link/">Explorify</a></li>
  <li><a href="https://spotify-history.streamlit.app/">Spotify History</a></li>
  <li><a href="https://ericchiang.github.io/post/spotify/">Eric Chiang’s Pandas work on this</a></li>
</ul>

<p>But this is beside the point here (I should at some point finish my project that works in this space, but that’s neither here nor there).</p>

<p>Of interest to us are the files called <code class="language-plaintext highlighter-rouge">Streaming_History_Audio_&lt;some_year(s)&gt;_&lt;number&gt;.json</code>. The number indicates the order, so it’s easy to work either backwards or forwards from there.</p>
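<p>If you’d rather enumerate the files in Python, here is a small sketch (the regex on the trailing sequence number is my assumption about the naming, and the helper name is made up) to sort them in play order rather than lexicographically:</p>

```python
import re

# Hypothetical helper: order the export's audio history files by their
# trailing sequence number (a plain lexicographic sort puts _10 before _2).
def sort_history_files(filenames):
    def seq(name):
        match = re.search(r"_(\d+)\.json$", name)
        return int(match.group(1)) if match else 0
    return sorted(filenames, key=seq)

files = [
    "Streaming_History_Audio_2020-2022_10.json",
    "Streaming_History_Audio_2014-2016_1.json",
    "Streaming_History_Audio_2018-2019_2.json",
]
print(sort_history_files(files))
# ['Streaming_History_Audio_2014-2016_1.json',
#  'Streaming_History_Audio_2018-2019_2.json',
#  'Streaming_History_Audio_2020-2022_10.json']
```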

<h2 id="filtering">Filtering</h2>

<p>To import the data into ListenBrainz, we need to convert from the “custom” Spotify JSON to the <code class="language-plaintext highlighter-rouge">jsonl</code> expected by ListenBrainz. A tool called <a href="https://github.com/kellnerd/elbisaur?tab=readme-ov-file#parsing-spotify-extended-streaming-history">elbisaur</a> has a pretty good convert and importer, but unfortunately right now its filter is not working.</p>

<p>So the idea here is going to be to use <a href="https://jqlang.org/">jq</a> to filter out the json prior to passing them to <code class="language-plaintext highlighter-rouge">elbisaur</code>.</p>

<p>We want to filter on three things:</p>

<ol>
  <li>Tracks played for less than X, to remove any tracks we just briefly played. This is subjective of course, but we are using 30 seconds here to match the defaults.</li>
  <li>Tracks that are missing an artist or track name. I assume these may be tracks that Spotify no longer has?</li>
  <li>Tracks that you already imported into ListenBrainz. There’s not really an easy way to do this if you have overlapping history, but if you do not, it’s easy to just use a timestamp. This is what we’re doing here.</li>
</ol>

<p><br /></p>

<p>So, ensure you have <code class="language-plaintext highlighter-rouge">jq</code> installed, and prepare some filters. I used:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>jq <span class="s1">'[.[] | select(.master_metadata_track_name != null and .master_metadata_album_artist_name != null and .ms_played &gt;= 30000)]'</span> Streaming_History_Audio_2018-2019_2.json <span class="o">&gt;</span> filtered_2018_2019_2.json
</code></pre></div></div>

<p>This excludes any missing track names or artist names, and requires <code class="language-plaintext highlighter-rouge">ms_played</code> to be at least 30,000 (30 seconds).</p>
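<p>If you prefer Python to jq, the same filter can be sketched like this (the field names are the ones Spotify uses in the extended history export; the inline sample stands in for a real file):</p>

```python
import json

# Same predicate as the jq filter: keep entries with both names present
# and at least 30 seconds of playtime.
def keep(entry, min_ms=30000):
    return (
        entry.get("master_metadata_track_name") is not None
        and entry.get("master_metadata_album_artist_name") is not None
        and entry.get("ms_played", 0) >= min_ms
    )

# entries = json.load(open("Streaming_History_Audio_2018-2019_2.json"))
entries = [  # inline sample standing in for a real export file
    {"master_metadata_track_name": "Song A",
     "master_metadata_album_artist_name": "Artist", "ms_played": 45000},
    {"master_metadata_track_name": None,
     "master_metadata_album_artist_name": "Artist", "ms_played": 45000},
    {"master_metadata_track_name": "Song B",
     "master_metadata_album_artist_name": "Artist", "ms_played": 1000},
]
filtered = [e for e in entries if keep(e)]
print(len(filtered))  # 1
# json.dump(filtered, open("filtered_2018_2019_2.json", "w"))
```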

<p><br /></p>

<p>If you need to also filter by timestamp (I did need this for my most recent file), pick the first timestamp you see in ListenBrainz (e.g. click on Oldest in the listens history) and convert it to UTC (ListenBrainz displays the local timestamp, but Spotify stores UTC). You can then add it to the filter pretty easily; for instance I did:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>jq <span class="s1">'[.[]
  | select(
      .master_metadata_track_name != null and
      .master_metadata_album_artist_name != null and
      .ms_played &gt;= 30000 and
      .ts &lt; "2021-02-10T00:00:00Z"
  )]'</span>   Streaming_History_Audio_2020-2022_5.json <span class="o">&gt;</span> filtered_2020_2021.json
</code></pre></div></div>

<p>The important part here is <code class="language-plaintext highlighter-rouge">.ts &lt;</code>; you can also combine <code class="language-plaintext highlighter-rouge">&gt;</code> and <code class="language-plaintext highlighter-rouge">&lt;</code> filters to get a range.</p>
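<p>The local-to-UTC conversion for the cutoff takes only a couple of lines of Python; a sketch (the UTC-8 offset is just an example, substitute your own zone):</p>

```python
from datetime import datetime, timezone, timedelta

# Oldest ListenBrainz listen shown as local time (example: Pacific, UTC-8);
# convert it to the UTC string format Spotify uses in .ts.
local = datetime(2021, 2, 9, 16, 0, tzinfo=timezone(timedelta(hours=-8)))
cutoff = local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
print(cutoff)  # 2021-02-10T00:00:00Z
```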

<p>You may also want to look into excluding Podcasts from your import if you did listen to podcasts on Spotify (I have not so did not explore this).</p>

<p>This gets us one or more JSON files that we can pass to <code class="language-plaintext highlighter-rouge">elbisaur</code> to get JSONL.</p>

<h2 id="convert-to-jsonl">Convert to JSONL</h2>

<p>For this, I used Deno to run elbisaur. With Deno, we will need to grant specific permissions or we will be prompted for them. I chose to grant most permissions, but leave the y/n prompt on the final upload.</p>

<p>It also needs environment variables configured in the folder you run it from for the ListenBrainz token and username. I believe it requires these even for parsing, not just for importing.
So you either need a .env file in the local folder formatted like:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LD_TOKEN=&lt;token&gt;
LD_USER=&lt;your_username&gt;
</code></pre></div></div>

<p>or to set the environment variable as part of your run command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">LD_TOKEN</span><span class="o">=</span>&lt;token&gt; <span class="nv">LD_USER</span><span class="o">=</span>&lt;your_username&gt; deno run .... <span class="o">[</span>commands are below]
</code></pre></div></div>

<p>The token can be found on <a href="https://listenbrainz.org/settings/">your ListenBrainz settings page</a>.</p>

<p><br /></p>

<p>First, we should check that we can see the songs correctly with the preview mode:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>deno run <span class="nt">--allow-read</span> <span class="nt">--allow-env</span> <span class="nt">--allow-write</span> jsr:@kellnerd/elbisaur parse <span class="nt">--preview</span> filtered_2020_2021.json
</code></pre></div></div>

<p>This will surface any errors if additional filters are required that I didn’t need (I’m not too sure what else can be missing).</p>

<p><br /></p>

<p>Now to generate the real jsonl:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>deno run <span class="nt">--allow-read</span> <span class="nt">--allow-env</span> <span class="nt">--allow-write</span> jsr:@kellnerd/elbisaur parse filtered_2020_2021.json
</code></pre></div></div>

<p>This will create a jsonl at <code class="language-plaintext highlighter-rouge">filtered_2020_2021.json.jsonl</code>, which is fair enough.</p>
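<p>Before uploading, a quick sanity check on the JSONL can’t hurt. This is a sketch based on my assumption of the minimal listen shape ListenBrainz expects (a <code class="language-plaintext highlighter-rouge">listened_at</code> timestamp plus artist and track names under <code class="language-plaintext highlighter-rouge">track_metadata</code>); the inline sample stands in for the generated file:</p>

```python
import json

# lines = open("filtered_2020_2021.json.jsonl").read().splitlines()
lines = [  # inline sample standing in for the generated JSONL
    json.dumps({"listened_at": 1612915200,
                "track_metadata": {"artist_name": "Artist",
                                   "track_name": "Song"}}),
]

# Every line should parse on its own and carry the expected fields.
for line in lines:
    listen = json.loads(line)
    assert "listened_at" in listen
    assert "artist_name" in listen["track_metadata"]
    assert "track_name" in listen["track_metadata"]
print(f"{len(lines)} listens look OK")
```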

<h2 id="upload-to-listenbrainz">Upload to ListenBrainz</h2>

<p>We can again use elbisaur to do the import:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>deno run <span class="nt">--allow-read</span> <span class="nt">--allow-env</span> <span class="nt">--allow-write</span> jsr:@kellnerd/elbisaur import filtered_2020_2021.json.jsonl
</code></pre></div></div>

<p>It will take a bit depending on how many songs you have, but it will do the import.</p>

<h2 id="results">Results</h2>

<p>Now repeat the jq filter → convert to JSONL → import steps for each audio history file you have.</p>

<p>You will see your history update in ListenBrainz over the next couple of days. First you will see the Top Artists / Top Albums / Top Songs update, then the song count at the top of your stats page. And then eventually your name in any top 9 pages for artists.</p>

<p>There’s a little bit more on how stats work on the <a href="https://listenbrainz.readthedocs.io/en/latest/general/data-update-intervals.html">ListenBrainz docs</a> but it does not specify anything for artists. I coincidentally did this at the same time as the monthly rollover process (and do not need to do it again :D), so your experience may vary.</p>]]></content><author><name></name></author><category term="misc" /><category term="listenbrainz" /><summary type="html"><![CDATA[ListenBrainz / MusicBrainz is a service that tracks music listens (it has many other components). It can integrate with Spotify so that Spotify notifies it when a song is played, but that only “starts” with songs from that point in time (well, it actually imports the last 30 days before then, but that’s it). Spotify however allows users to export the full (-ish, as I only see things since 2014) history as a set of JSONs. This is actually a very cool dataset as it includes even brief plays, so you can track skipped tracks and things like that. But for our purposes here, I wanted to get those pre-2021 stats into ListenBrainz. There are a number of tools to do this, but currently all of them are missing a bit on the filter side (or are buggy? or I am buggy?), so this is my quick way to get those stats in.]]></summary></entry><entry><title type="html">Ikea Locations Voronoi Diagram</title><link href="https://www.marcolussetti.com/dataviz/2025/06/26/ikea-voronoi.html" rel="alternate" type="text/html" title="Ikea Locations Voronoi Diagram" /><published>2025-06-26T05:29:00+00:00</published><updated>2025-06-26T05:29:00+00:00</updated><id>https://www.marcolussetti.com/dataviz/2025/06/26/ikea-voronoi</id><content type="html" xml:base="https://www.marcolussetti.com/dataviz/2025/06/26/ikea-voronoi.html"><![CDATA[<p>Today <a href="https://etm.id.au/">Evan</a> reminded me that Voronoi diagrams are a thing. Now, I’ve been looking for an excuse to play around with OSM for a while, so here we are.</p>

<p>I’ve been looking at things like overlaps of IKEA locations and Costco locations (clearly the two essentials of North American life for me), and I figured it may be fun to start dividing the world into Ikea locations.</p>

<p>This was a quick and quite fun project using Python/Jupyter Notebooks, and I’d like to do more in this direction in the next little bit!</p>

<p>Here is the final map:</p>

<p><a href="/assets/posts/20250625-final-map.png"><img src="/assets/posts/20250625-final-map.png" alt="Final map" /></a></p>

<!--more-->

<p>I will use the rest of this post to explain how I made it. Full code for this is available as a <a href="https://gist.github.com/marcolussetti/0c16d591b2411126561bddd4fe135bd9">gist</a>.</p>

<p>A note on dependencies: I originally tried to use <code class="language-plaintext highlighter-rouge">geovoronoi</code> for this, but it has not been updated in some time and seems quite challenging to get working with modern versions of <code class="language-plaintext highlighter-rouge">geopandas</code>. I switched to <code class="language-plaintext highlighter-rouge">shapely</code> for the Voronoi portion which was much nicer to work with.</p>

<h2 id="openstreetmap">OpenStreetMap</h2>

<p>To get data out of OpenStreetMap, I originally used OSMPythonTools with the Overpass API; however, I quickly discovered that the query was too large and timed out.</p>

<p>Here is the original code:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">OSMPythonTools.overpass</span> <span class="kn">import</span> <span class="n">Overpass</span><span class="p">,</span> <span class="n">overpassQueryBuilder</span>
<span class="n">overpass</span> <span class="o">=</span> <span class="nc">Overpass</span><span class="p">()</span>

<span class="n">the_world</span> <span class="o">=</span> <span class="p">[</span><span class="o">-</span><span class="mi">90</span><span class="p">,</span> <span class="o">-</span><span class="mi">180</span><span class="p">,</span> <span class="mi">90</span><span class="p">,</span> <span class="mi">180</span><span class="p">]</span>
<span class="n">overpass_query</span> <span class="o">=</span> <span class="nf">overpassQueryBuilder</span><span class="p">(</span>
    <span class="n">bbox</span><span class="o">=</span><span class="n">the_world</span><span class="p">,</span>
    <span class="n">elementType</span><span class="o">=</span><span class="p">[</span><span class="sh">'</span><span class="s">node</span><span class="sh">'</span><span class="p">,</span> <span class="sh">"</span><span class="s">way</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">relation</span><span class="sh">"</span><span class="p">],</span>
    <span class="n">selector</span><span class="o">=</span><span class="p">[</span><span class="sh">'"</span><span class="s">shop</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span><span class="s">furniture</span><span class="sh">"'</span><span class="p">,</span> <span class="sh">'"</span><span class="s">brand</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span><span class="s">IKEA</span><span class="sh">"'</span><span class="p">],</span>
    <span class="n">includeCenter</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">overpass_results</span> <span class="o">=</span> <span class="n">overpass</span><span class="p">.</span><span class="nf">query</span><span class="p">(</span><span class="n">overpass_query</span><span class="p">)</span>
</code></pre></div></div>

<p>This worked fine when I just queried nodes, but as soon as I expanded this to ways and relations I needed the center point, and that’s when it started timing out.</p>

<p><br /></p>

<p>I switched to <a href="https://overpass-turbo.eu/s/271t">overpass turbo (run query from here)</a> with this query and it worked quite well:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[out:json][timeout:60];
(
  node["shop"="furniture"]["brand"="IKEA"];
  way["shop"="furniture"]["brand"="IKEA"];
  relation["shop"="furniture"]["brand"="IKEA"];
);
out center;
</code></pre></div></div>

<p>Already at this stage it’s quite interesting to see the concentration in certain areas.</p>

<p>I remembered hearing about Ikea locations in South Africa from a friend, and those are missing?
It turns out there isn’t really an Ikea in South Africa, but rather something called <a href="https://www.homeswedehome.co.za/">Home Swede Home</a> which is a third party importer of IKEA products. To whoever came up with that name: I appreciate you!</p>

<p><br /></p>

<p>With the Overpass Turbo results I needed a GeoJSON to be able to import it into <code class="language-plaintext highlighter-rouge">geopandas</code> which can be had via Export -&gt; GeoJSON -&gt; Download. I put my copy <a href="https://gist.github.com/marcolussetti/0c16d591b2411126561bddd4fe135bd9#file-ikea-geojson">on the gist</a>.</p>

<p>We can just use the downloaded .geojson file directly, or we can fetch it with requests from the gist:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">os</span>
<span class="kn">import</span> <span class="n">requests</span>

<span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="nf">exists</span><span class="p">(</span><span class="sh">"</span><span class="s">ikea.geojson</span><span class="sh">"</span><span class="p">):</span>
    <span class="n">res</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">https://gist.githubusercontent.com/marcolussetti/66bd134ad82c27b0f9da4d9ae55f85fb/raw/9fef8f3f5795ad6401fd07e0ef7dcb2a9fdf3439/ikea.geojson</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">with</span> <span class="nf">open</span><span class="p">(</span><span class="sh">"</span><span class="s">ikea.geojson</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">wb</span><span class="sh">"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">f</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="n">res</span><span class="p">.</span><span class="n">content</span><span class="p">)</span>
</code></pre></div></div>

<p><br /></p>

<p>GeoPandas reads it pretty easily, but we have to be careful about the coordinate system (<code class="language-plaintext highlighter-rouge">epsg</code>). The points come from OpenStreetMap, which uses the GPS coordinate system (<a href="https://osmdata.openstreetmap.de/info/projections.html">source</a>), or <code class="language-plaintext highlighter-rouge">4326</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">geopandas</span> <span class="k">as</span> <span class="n">gpd</span>
<span class="n">stores_gdf</span> <span class="o">=</span> <span class="n">gpd</span><span class="p">.</span><span class="nf">read_file</span><span class="p">(</span><span class="sh">"</span><span class="s">ikea.geojson</span><span class="sh">"</span><span class="p">)</span>
<span class="n">stores_gdf</span> <span class="o">=</span> <span class="n">stores_gdf</span><span class="p">.</span><span class="nf">set_crs</span><span class="p">(</span><span class="n">epsg</span><span class="o">=</span><span class="mi">4326</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>

<p><br /></p>

<p>The tiles we will overlay will be using a Mercator projection (queasy as I feel about this :D) as that is what shapely supports for Voronoi. So we need to convert the points to the target Mercator system, or <code class="language-plaintext highlighter-rouge">3395</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stores_3395</span> <span class="o">=</span> <span class="n">stores_gdf</span><span class="p">.</span><span class="nf">to_crs</span><span class="p">(</span><span class="n">epsg</span><span class="o">=</span><span class="mi">3395</span><span class="p">)</span>
<span class="nf">len</span><span class="p">(</span><span class="n">stores_3395</span><span class="p">)</span>
</code></pre></div></div>

<p><br /></p>

<h2 id="tilesgeometry">Tiles/Geometry</h2>

<p>For the tiles I just used a common low-fidelity dataset, <a href="https://www.naturalearthdata.com/">Natural Earth Vector</a>’s 1:110m. You can download it from their website or fetch the individual files needed from GitHub, which is what I did. In retrospect, I wonder if I should have gone for at least one of their higher resolution datasets, given I ended up wanting to export the map quite large to be able to make out at least some of the points (which are so close together!).
The main reason I used such a low-fidelity dataset is to emphasize that this is not an overly precise or serious project – we’re just having a bit of fun here.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">geopandas</span> <span class="k">as</span> <span class="n">gpd</span>
<span class="kn">import</span> <span class="n">os</span>
<span class="kn">import</span> <span class="n">requests</span>
<span class="kn">import</span> <span class="n">zipfile</span>

<span class="n">naturalearth_url</span> <span class="o">=</span> <span class="sh">"</span><span class="s">https://github.com/nvkelso/natural-earth-vector/raw/refs/heads/master/110m_cultural/</span><span class="sh">"</span>

<span class="n">base_file</span> <span class="o">=</span> <span class="sh">"</span><span class="s">ne_110m_admin_0_countries</span><span class="sh">"</span>
<span class="n">extensions</span> <span class="o">=</span> <span class="p">[</span><span class="sh">"</span><span class="s">.shp</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">.shx</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">.dbf</span><span class="sh">"</span><span class="p">]</span>

<span class="k">for</span> <span class="n">ext</span> <span class="ow">in</span> <span class="n">extensions</span><span class="p">:</span>
    <span class="n">file_name</span> <span class="o">=</span> <span class="n">base_file</span> <span class="o">+</span> <span class="n">ext</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="nf">exists</span><span class="p">(</span><span class="n">file_name</span><span class="p">):</span>
        <span class="n">res</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">naturalearth_url</span> <span class="o">+</span> <span class="n">file_name</span><span class="p">)</span>
        <span class="k">with</span> <span class="nf">open</span><span class="p">(</span><span class="n">file_name</span><span class="p">,</span> <span class="sh">"</span><span class="s">wb</span><span class="sh">"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
            <span class="n">f</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="n">res</span><span class="p">.</span><span class="n">content</span><span class="p">)</span>
</code></pre></div></div>

<p><br /></p>

<p>If all we needed were the tiles, we could make do with just the <code class="language-plaintext highlighter-rouge">.shp</code> and <code class="language-plaintext highlighter-rouge">.shx</code>, but we actually need to filter out Antarctica. I don’t yet understand why, but when you render Antarctica with this dataset, it goes all crazy:</p>

<p><img src="/assets/posts/20250625-enormous-antarctica.png" alt="Antarctica rendering issue" /></p>

<p>That’s hilarious but not entirely useful! Downloading all three files lets us fix the issue, since the <code class="language-plaintext highlighter-rouge">.dbf</code> carries the attributes we filter on, and all three are picked up automatically when we read the file with geopandas (see below).</p>

<p><br /></p>

<p>This is where the geo<strong>pandas</strong> part turns out to be useful as we can just “filter out” Antarctica like any normal DataFrame:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">world</span> <span class="o">=</span> <span class="n">gpd</span><span class="p">.</span><span class="nf">read_file</span><span class="p">(</span><span class="sh">"</span><span class="s">ne_110m_admin_0_countries.shp</span><span class="sh">"</span><span class="p">)</span>
<span class="n">world</span> <span class="o">=</span> <span class="n">world</span><span class="p">.</span><span class="nf">set_crs</span><span class="p">(</span><span class="n">epsg</span><span class="o">=</span><span class="mi">4326</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">world</span> <span class="o">=</span> <span class="n">world</span><span class="p">[</span><span class="n">world</span><span class="p">[</span><span class="sh">"</span><span class="s">NAME</span><span class="sh">"</span><span class="p">]</span> <span class="o">!=</span> <span class="sh">"</span><span class="s">Antarctica</span><span class="sh">"</span><span class="p">]</span>
<span class="n">area</span> <span class="o">=</span> <span class="n">world</span><span class="p">.</span><span class="nf">to_crs</span><span class="p">(</span><span class="n">epsg</span><span class="o">=</span><span class="mi">3395</span><span class="p">)</span>
<span class="n">area</span><span class="p">[</span><span class="sh">'</span><span class="s">geometry</span><span class="sh">'</span><span class="p">]</span> <span class="o">=</span> <span class="n">area</span><span class="p">[</span><span class="sh">'</span><span class="s">geometry</span><span class="sh">'</span><span class="p">].</span><span class="nf">buffer</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="n">area_shp</span> <span class="o">=</span> <span class="n">area</span><span class="p">.</span><span class="nf">union_all</span><span class="p">()</span>
</code></pre></div></div>

<p>I do not know why, but this is very amusing to me. Also note here how we have to set the <code class="language-plaintext highlighter-rouge">epsg</code> again to eventually get Voronoi working. The dataset technically is also 4326, but again shapely apparently requires Mercator, so here we are.</p>

<h2 id="shapely-prep">Shapely prep</h2>

<p>We now have the points and the area, and they’re both in the Mercator projection. Now it’s a matter of putting it together using Shapely to generate the Voronoi regions (and then plot it).</p>

<p>Before we can generate the regions, we need to convert the points into a format acceptable to Shapely (its own <code class="language-plaintext highlighter-rouge">Point</code> class) and filter out any duplicates. In our set there’s only one duplicate, but still.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="n">shapely.geometry</span> <span class="kn">import</span> <span class="n">Point</span>

<span class="n">stores_3395_arr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">(</span><span class="nf">list</span><span class="p">(</span><span class="nf">set</span><span class="p">([(</span><span class="n">p</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">p</span><span class="p">.</span><span class="n">y</span><span class="p">)</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">stores_3395</span><span class="p">.</span><span class="n">geometry</span><span class="p">])))</span>
<span class="n">points</span> <span class="o">=</span> <span class="p">[</span><span class="nc">Point</span><span class="p">(</span><span class="n">p</span><span class="p">)</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">stores_3395_arr</span><span class="p">]</span>
<span class="n">multipoints</span> <span class="o">=</span> <span class="n">gpd</span><span class="p">.</span><span class="nc">GeoSeries</span><span class="p">(</span><span class="n">points</span><span class="p">,</span> <span class="n">crs</span><span class="o">=</span><span class="sh">"</span><span class="s">EPSG:3395</span><span class="sh">"</span><span class="p">).</span><span class="nf">union_all</span><span class="p">()</span>
</code></pre></div></div>

<p>We can actually plot the points (without regions) at this point with a quick GeoPandas Plot:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>

<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span> <span class="mi">9</span><span class="p">),</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">100</span><span class="p">)</span>
<span class="n">gpd</span><span class="p">.</span><span class="nc">GeoSeries</span><span class="p">(</span><span class="n">area_shp</span><span class="p">).</span><span class="nf">plot</span><span class="p">(</span><span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">facecolor</span><span class="o">=</span><span class="sh">'</span><span class="s">white</span><span class="sh">'</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="sh">'</span><span class="s">gray</span><span class="sh">'</span><span class="p">)</span>
<span class="n">gpd</span><span class="p">.</span><span class="nc">GeoSeries</span><span class="p">(</span><span class="n">points</span><span class="p">).</span><span class="nf">plot</span><span class="p">(</span><span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="sh">'</span><span class="s">red</span><span class="sh">'</span><span class="p">,</span> <span class="n">markersize</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_aspect</span><span class="p">(</span><span class="sh">'</span><span class="s">equal</span><span class="sh">'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">title</span><span class="p">(</span><span class="sh">"</span><span class="s">All IKEA locations (no Voronoi)</span><span class="sh">"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">show</span><span class="p">()</span>
</code></pre></div></div>

<p><a href="/assets/posts/20250625-dots-only.png"><img src="/assets/posts/20250625-dots-only.png" alt="All IKEA locations but no Voronoi" /></a></p>

<h2 id="finally-voronoi">Finally Voronoi</h2>

<p>Now to generate the Voronoi regions, all we have to do is call <code class="language-plaintext highlighter-rouge">voronoi_polygons</code>, ensure none of the regions escape the boundaries (probably not affecting anything in this iteration, but it took me a while to get here so :D), and turn the regions into a GeoDataFrame we can use to plot things.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">shapely</span> <span class="kn">import</span> <span class="n">voronoi_polygons</span>
<span class="n">regions</span> <span class="o">=</span> <span class="nf">voronoi_polygons</span><span class="p">(</span><span class="n">multipoints</span><span class="p">,</span> <span class="n">extend_to</span><span class="o">=</span><span class="n">area_shp</span><span class="p">)</span>
<span class="n">regions_clipped</span> <span class="o">=</span> <span class="p">[</span><span class="n">r</span><span class="p">.</span><span class="nf">intersection</span><span class="p">(</span><span class="n">area_shp</span><span class="p">)</span> <span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">regions</span><span class="p">.</span><span class="n">geoms</span><span class="p">]</span>

<span class="n">gdf_voronoi</span> <span class="o">=</span> <span class="n">gpd</span><span class="p">.</span><span class="nc">GeoDataFrame</span><span class="p">(</span><span class="n">geometry</span><span class="o">=</span><span class="n">regions_clipped</span><span class="p">,</span> <span class="n">crs</span><span class="o">=</span><span class="sh">"</span><span class="s">EPSG:3395</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">len</span><span class="p">(</span><span class="n">gdf_voronoi</span><span class="p">)</span>
</code></pre></div></div>

<p>Still 574 regions, so we’re good!</p>

<h2 id="plotting">Plotting</h2>

<p>Plotting turned out to be both easier and harder than I thought. The initial plot was pretty easy, but colouring the regions was a bit harder.</p>

<p>It’s very interesting to me that so many standard tools from the Python data-science stack work well here, because GeoDataFrames are… well, DataFrames.</p>
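<p>As a tiny, self-contained illustration of that point (toy squares standing in for the real regions): plain pandas methods like <code class="language-plaintext highlighter-rouge">nlargest</code> work unchanged on a GeoDataFrame, right alongside the geometry-aware accessors.</p>

```python
import geopandas as gpd
from shapely.geometry import box

# Toy GeoDataFrame: three nested squares (purely illustrative data)
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[box(0, 0, 1, 1), box(0, 0, 2, 2), box(0, 0, 3, 3)],
)

# Geometry-aware accessor mixed with an ordinary pandas method
gdf["area"] = gdf.geometry.area
biggest = gdf.nlargest(1, "area")["name"].iloc[0]  # plain DataFrame method
```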

<p>So we can use matplotlib to plot things (I normally love turning to <code class="language-plaintext highlighter-rouge">plotnine</code> but good luck doing that for this, I think :D). For the colours, we’re using <code class="language-plaintext highlighter-rouge">networkx</code> to look at neighbouring regions and try to avoid giving adjacent regions the same colour.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="n">matplotlib.cm</span> <span class="k">as</span> <span class="n">cm</span>
<span class="kn">import</span> <span class="n">matplotlib.colors</span> <span class="k">as</span> <span class="n">colors</span>
<span class="kn">import</span> <span class="n">networkx</span>
</code></pre></div></div>

<p><br /></p>

<p>We can build a network graph with networkx from the Voronoi geometries by checking whether they touch.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">network_graph</span> <span class="o">=</span> <span class="n">networkx</span><span class="p">.</span><span class="nc">Graph</span><span class="p">()</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">polygon_i</span> <span class="ow">in</span> <span class="nf">enumerate</span><span class="p">(</span><span class="n">gdf_voronoi</span><span class="p">.</span><span class="n">geometry</span><span class="p">):</span>
    <span class="n">network_graph</span><span class="p">.</span><span class="nf">add_node</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">j</span><span class="p">,</span> <span class="n">polygon_j</span> <span class="ow">in</span> <span class="nf">enumerate</span><span class="p">(</span><span class="n">gdf_voronoi</span><span class="p">.</span><span class="n">geometry</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">:],</span> <span class="n">start</span><span class="o">=</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">polygon_i</span><span class="p">.</span><span class="nf">touches</span><span class="p">(</span><span class="n">polygon_j</span><span class="p">):</span>
            <span class="n">network_graph</span><span class="p">.</span><span class="nf">add_edge</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span>
</code></pre></div></div>
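<p>The pairwise loop above is O(n²), which is fine for 574 regions, but for larger inputs GeoPandas’ spatial index can find the touching pairs in bulk. A sketch on a toy 2×2 grid of squares (this assumes a recent GeoPandas where <code class="language-plaintext highlighter-rouge">sindex.query</code> accepts an array of geometries and a predicate):</p>

```python
import geopandas as gpd
import networkx
from shapely.geometry import box

# Toy data: a 2x2 grid of unit squares; every pair touches (edge or corner)
squares = [box(x, y, x + 1, y + 1) for x in (0, 1) for y in (0, 1)]
gdf = gpd.GeoDataFrame(geometry=squares)

# Bulk spatial-index query: parallel arrays of (query index, tree index)
left, right = gdf.sindex.query(gdf.geometry, predicate="touches")

graph = networkx.Graph()
graph.add_nodes_from(range(len(gdf)))
# Keep i < j so each undirected edge is added only once
graph.add_edges_from((i, j) for i, j in zip(left, right) if i < j)
```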

<p><br /></p>

<p>Now we need to turn this into a colormap that can be used in matplotlib. I ended up using <code class="language-plaintext highlighter-rouge">tab20</code> as the base colormap, but you could easily vary this to any colormap you wish.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">nx_colormap</span> <span class="o">=</span> <span class="n">networkx</span><span class="p">.</span><span class="n">coloring</span><span class="p">.</span><span class="nf">greedy_color</span><span class="p">(</span><span class="n">network_graph</span><span class="p">,</span> <span class="n">strategy</span><span class="o">=</span><span class="n">networkx</span><span class="p">.</span><span class="n">coloring</span><span class="p">.</span><span class="n">strategy_largest_first</span><span class="p">)</span>
<span class="n">max_color</span> <span class="o">=</span> <span class="nf">max</span><span class="p">(</span><span class="n">nx_colormap</span><span class="p">.</span><span class="nf">values</span><span class="p">())</span>
<span class="n">colormap</span> <span class="o">=</span> <span class="n">cm</span><span class="p">.</span><span class="nf">get_cmap</span><span class="p">(</span><span class="sh">"</span><span class="s">tab20</span><span class="sh">"</span><span class="p">,</span> <span class="n">max_color</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
<span class="n">gdf_voronoi</span><span class="p">[</span><span class="sh">"</span><span class="s">color</span><span class="sh">"</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="n">colors</span><span class="p">.</span><span class="nf">to_hex</span><span class="p">(</span><span class="nf">colormap</span><span class="p">(</span><span class="n">nx_colormap</span><span class="p">[</span><span class="n">i</span><span class="p">]))</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">gdf_voronoi</span><span class="p">))]</span>
</code></pre></div></div>
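<p>One caveat: <code class="language-plaintext highlighter-rouge">cm.get_cmap</code> was deprecated and then removed in Matplotlib 3.9. On newer versions, the equivalent spelling goes through the colormap registry, with <code class="language-plaintext highlighter-rouge">resampled()</code> standing in for the second argument:</p>

```python
from matplotlib import colormaps, colors

# Registry-based replacement for cm.get_cmap("tab20", n) on Matplotlib >= 3.6
n = 8  # stand-in for max_color + 1
cmap = colormaps["tab20"].resampled(n)
hex_colors = [colors.to_hex(cmap(k)) for k in range(n)]
```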

<p><br /></p>

<p>With the colormap in hand, the last thing needed is to turn this into a plot. The <code class="language-plaintext highlighter-rouge">dpi</code> is pretty arbitrary here: I started with something like 120 and kept blowing it up so I could see more detail.</p>

<p>I am still not entirely happy with the colours, but for the most part it works. The main case where this fails is the run of adjacent regions sharing a colour between <code class="language-plaintext highlighter-rouge">Reykjavik, Iceland</code> and <code class="language-plaintext highlighter-rouge">Quebec City, Quebec, Canada</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span><span class="p">,</span> <span class="n">axes</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span> <span class="mi">9</span><span class="p">),</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">600</span><span class="p">)</span>
<span class="n">axes</span><span class="p">.</span><span class="nf">set_aspect</span><span class="p">(</span><span class="sh">"</span><span class="s">equal</span><span class="sh">"</span><span class="p">)</span>
<span class="n">axes</span><span class="p">.</span><span class="nf">axis</span><span class="p">(</span><span class="sh">'</span><span class="s">off</span><span class="sh">'</span><span class="p">)</span>
<span class="n">gpd</span><span class="p">.</span><span class="nc">GeoSeries</span><span class="p">(</span><span class="n">area_shp</span><span class="p">).</span><span class="nf">plot</span><span class="p">(</span><span class="n">ax</span><span class="o">=</span><span class="n">axes</span><span class="p">,</span> <span class="n">facecolor</span><span class="o">=</span><span class="sh">"</span><span class="s">none</span><span class="sh">"</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="sh">"</span><span class="s">gray</span><span class="sh">"</span><span class="p">)</span>
<span class="n">gdf_voronoi</span><span class="p">.</span><span class="nf">plot</span><span class="p">(</span><span class="n">ax</span><span class="o">=</span><span class="n">axes</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.4</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="sh">"</span><span class="s">black</span><span class="sh">"</span><span class="p">,</span> <span class="n">facecolor</span><span class="o">=</span><span class="n">gdf_voronoi</span><span class="p">[</span><span class="sh">"</span><span class="s">color</span><span class="sh">"</span><span class="p">])</span>
<span class="n">gpd</span><span class="p">.</span><span class="nc">GeoSeries</span><span class="p">(</span><span class="n">points</span><span class="p">).</span><span class="nf">plot</span><span class="p">(</span><span class="n">ax</span><span class="o">=</span><span class="n">axes</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="sh">"</span><span class="s">red</span><span class="sh">"</span><span class="p">,</span> <span class="n">markersize</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">title</span><span class="p">(</span><span class="sh">"</span><span class="s">The world divided by nearest IKEA (Voronoi diagram)</span><span class="sh">"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">show</span><span class="p">()</span>
</code></pre></div></div>
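<p>One note on getting the full-resolution file out: <code class="language-plaintext highlighter-rouge">plt.show()</code> renders inline in the notebook, but for the downloadable PNG something like <code class="language-plaintext highlighter-rouge">savefig</code> with an explicit <code class="language-plaintext highlighter-rouge">dpi</code> is handy (sketch with a hypothetical filename):</p>

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend is enough for file export
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(16, 9))
ax.set_title("demo")
path = os.path.join(tempfile.mkdtemp(), "map.png")
# High dpi gives the zoomable copy; tight bbox trims the whitespace margin
fig.savefig(path, dpi=150, bbox_inches="tight")
```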

<h2 id="final-map">Final map</h2>

<p>Here is the final map (click on it for the full resolution copy):</p>

<p><br /></p>

<p><a href="/assets/posts/20250625-final-map.png"><img src="/assets/posts/20250625-final-map.png" alt="Final map" /></a></p>

<h2 id="conclusions">Conclusions</h2>

<p>I was quite surprised by how easy it was to get Voronoi diagrams once the data was in place. The code to generate the regions was just three lines; I’m very impressed with the strength of the libraries in this ecosystem.</p>

<p>The map was interesting, but it covers too large an area to fit comfortably in a single PNG. An interactive map would have been way better, and I may try to redo this with Folium at some point, at least for my own exploration.</p>

<p>In terms of the data, I was quite surprised by how many IKEAs there were in the Canary Islands. Despite my relative proximity to them (I was born in Italy), I didn’t realize over two million people live there, but even with that in mind, that’s quite the density (especially compared to Canada!).</p>

<p>European density was a challenge, and showing it properly would require a separate, more focused diagram.</p>
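<p>A Europe-only version would mostly be a matter of clipping before plotting; GeoPandas’ <code class="language-plaintext highlighter-rouge">.cx</code> coordinate indexer slices a GeoDataFrame by bounding box. A sketch with made-up coordinates (the real bounds would need to be in the map’s EPSG:3395 metres):</p>

```python
import geopandas as gpd
from shapely.geometry import Point

# Toy points standing in for store locations
gdf = gpd.GeoDataFrame(geometry=[Point(0, 0), Point(10, 10), Point(50, 50)])

# .cx[xmin:xmax, ymin:ymax] keeps only geometries intersecting the box
subset = gdf.cx[0:20, 0:20]
```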

<h2 id="bonus-map-aritzia-stores">Bonus map: Aritzia stores</h2>

<p>As Evan was involved, we figured it would only be natural to generate a matching map of Aritzia locations.
The changes needed were quite minimal: a new Overpass Turbo query, and a different boundary so that only Canada and the United States are included.</p>

<p>Here is the <a href="https://overpass-turbo.eu/s/271z">overpass query</a>:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[out:json][timeout:60];
(
  node["shop"="clothes"]["name"~"Aritzia", i];
  way["shop"="clothes"]["name"~"Aritzia", i];
  relation["shop"="clothes"]["name"~"Aritzia", i];
);
out center;
</code></pre></div></div>

<p><br /></p>

<p>The other change is the filtering. You could choose to filter by <code class="language-plaintext highlighter-rouge">world["CONTINENT"] == "North America"</code> or by the individual countries:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">world</span> <span class="o">=</span> <span class="n">gpd</span><span class="p">.</span><span class="nf">read_file</span><span class="p">(</span><span class="sh">"</span><span class="s">ne_110m_admin_0_countries.shp</span><span class="sh">"</span><span class="p">)</span>
<span class="n">world</span> <span class="o">=</span> <span class="n">world</span><span class="p">.</span><span class="nf">set_crs</span><span class="p">(</span><span class="n">epsg</span><span class="o">=</span><span class="mi">4326</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">world</span> <span class="o">=</span> <span class="n">world</span><span class="p">[</span><span class="n">world</span><span class="p">[</span><span class="sh">"</span><span class="s">NAME</span><span class="sh">"</span><span class="p">].</span><span class="nf">isin</span><span class="p">([</span><span class="sh">"</span><span class="s">Canada</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">United States of America</span><span class="sh">"</span><span class="p">])]</span>
</code></pre></div></div>

<p>So here is the result (click to make full size):</p>

<p><a href="/assets/posts/20250625-aritzia.png"><img src="/assets/posts/20250625-aritzia.png" alt="Aritzia locations" /></a></p>

<p>Fun fact: there are two Aritzia locations in Edmonton, 10 in the Metro Vancouver area, and 19 in the Greater Toronto Area. This produces some pretty bizarre splits.</p>]]></content><author><name></name></author><category term="dataviz" /><category term="maps" /><category term="voronoi" /><summary type="html"><![CDATA[Today Evan reminded me that Voronoi diagrams are a thing. Now, I’ve been looking for an excuse to play around with OSM for a while, so here we are. I’ve been looking at things like overlaps of IKEA locations and Costco locations (clearly the two essentials of North American life for me), and I figured it may be fun to start dividing the world into Ikea locations. This was a quick and quite fun project using Python/Jupyter Notebooks, and I’d like to do more in this direction in the next little bit! Here is the final map:]]></summary></entry></feed>