Map tiles from OpenStreetMap to PBF

Gerry Ntabuhashe

When we started to work on Safia, we thought the hard part would be writing the code. We soon figured out that the real fight would be the infrastructure, especially finding the best way to deliver the map.

We made the decision to use a custom map based on OpenStreetMap (OSM), and we had a few requirements for it. We wanted it to be:

  1. Fast to deliver
  2. Cheap to operate
  3. Scalable

While looking for the best solution, we went through almost every format available, .osm.pbf, .mbtiles and .pmtiles, before finally settling on the format that proved the most robust for us: .pbf.

In this article, I'll walk you through our journey, explaining how we tested each format and why we ended up choosing .pbf.

Why we gave up on the "Better" formats

If you look up how to host vector maps, you'll find many conflicting solutions. Here is a breakdown of what we tried, why each option wasn't the best fit for our specific use case, and why the format we selected was the right one for us.

The resource-intensive MBTiles format

MBTiles is the standard output for most map generators. It is an open file format for storing map tiles in a single SQLite database file, designed for packaging and sharing map data.

  • The Plus: Easy to generate.
  • The Bad: The problem with this format is that you can't just host the file on a static server. Since it's basically an SQLite database, you also need a service running that reads it and exposes the tiles on the fly. For that purpose, an open source solution such as TileServer GL can be used.
  • The Result: This meant paying for one or more VPS, managing Node.js or Docker processes, and dealing with latency. We did not want to deal with that. We needed a solution that would let us sleep at night without worrying about whether the tile server had crashed.

As stated earlier, generating the MBTiles format is easy. And since the other formats are generated from this one anyway, starting here seemed to us the easiest path. Here is how to generate this format from the .osm.pbf files available on Geofabrik.

docker run -e JAVA_TOOL_OPTIONS="-Xmx92g" \
    -v $(pwd)/data:/data ghcr.io/onthegomap/planetiler:latest \
    --download --area=planet output="/data/planet.mbtiles"

The other advantage of this format is that, as it's an SQLite file, it can be queried. This feature helped us decide which regions to support at launch. Counting tiles comes down to a simple SQL query against the tiles table defined by the MBTiles spec; here is a minimal sketch using the sqlite3 CLI:
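sqlite3 data/planet.mbtiles "SELECT count(*) FROM tiles;"

# Or break the count down per zoom level
sqlite3 data/planet.mbtiles \
    "SELECT zoom_level, count(*) FROM tiles GROUP BY zoom_level;"

As an example, the following table shows how many tiles exist in the extract for each region.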

| Region | Technical zone name | Number of tiles |
| --- | --- | --- |
| South America | south-america | 25,016,020 |
| Central America | central-america | 3,346,044 |
| Africa | africa | 26,368,417 |
| Europe | europe | 22,690,435 |
| Mexico | mexico | 3,138,875 |
| US North East | us-northeast | 310,896 |
| US Midwest | us-midwest | 929,178 |
| US West | us-west | 1,883,228 |
| US South | us-south | 1,412,192 |
| US Pacific | us-pacific | 45,349,269 |
| Alaska | alaska | 23,759,080 |
| Hawaii | us/hawai | 2,288,892 |
| North America | north-america | 132,795,233 |
| Australia - Oceania | australia-oceania | 82,138,548 |
| Asia | asia | 136,569,674 |
| India | india | 1,878,664 |
| New Zealand | new-zealand | 37,199,242 |
| Australia | australia | 16,293,369 |
| Canada | canada | 29,291,767 |
| Planet | planet | 274,803,489 |

Note that the number of tiles for the whole planet is lower than the sum of the tiles for Asia, North America, and Europe: the regional extracts overlap at their edges, and each one includes the low-zoom tiles that cover it, so the same tile can be counted in several extracts.

The monolithic PMTiles format

The PMTiles format is the most recent of the three. It's a single-file archive containing all the tiles, accessible using HTTP range requests. Here is what we found out regarding this format.

  • The Plus: It's just one file for the whole planet map (approximately 80GB). This file can be hosted on any static server that supports HTTP range requests, such as NGINX, but also on S3. Even though the archive is this big, the client won't download the whole 80GB, only the byte range containing the tile the map needs to display.
  • The Bad: It's worse at caching tiles. For our setup, caching was a headache: if you update the map, you invalidate one massive archive. We also ran into limitations in how ETag headers were generated for range requests. Ranges were never cached, which means that even if a tile had already been fetched from the server, it would be requested again whenever the user reloaded a portion of the map that needed it.
  • The Result: We wanted granular control over caching, and this format couldn't give it to us.

Even though we decided not to use this format, we must acknowledge it's the best fit when the goal is to bundle a small region within the application.

To generate the .pmtiles file, we used the .mbtiles generated in the previous section. The pmtiles CLI provided by Protomaps was very helpful here.

pmtiles convert data/planet.mbtiles data/planet.pmtiles 

This command will run for a long time, depending on the size of the planet.mbtiles file and the resources available on the machine performing the conversion.
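Once the conversion finishes, the same CLI is useful for sanity checks, and it also covers the bundle-a-small-region use case mentioned above. A quick sketch, where the output name and bounding box are purely illustrative:

# Print the archive header and metadata to verify the conversion
pmtiles show data/planet.pmtiles

# Cut out a small region, e.g. to ship inside a mobile app
# (--bbox is min_lon,min_lat,max_lon,max_lat)
pmtiles extract data/planet.pmtiles data/city.pmtiles \
    --bbox=4.24,50.76,4.50,50.92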

The static PBF format

The PBF format is built on Protocol Buffers, which Google has used internally since 2001 and released publicly in 2008. This means broad compatibility with many libraries. Also, with this format, you don't need a database.
By extracting the raw tiles inside the database into a directory structure (/z/x/y.pbf), we could host the map on a simple server like Nginx or Apache, or on any standard object storage such as Cloudflare R2, AWS S3, etc., and expose the files directly.

What did this mean to us?

  • No Database: Zero CPU usage to serve a tile (see the serving sketch after this list).
  • CDN Magic: We can put Cloudflare in front of it. In this architecture, each file is cached individually, which means invalidating a tile only invalidates that tile, not the whole map.
  • Cost: Almost zero to operate. We will talk about cost in another article to give a bigger picture of what to consider.
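To make the "No Database" claim concrete, here is a minimal Nginx sketch of what serving such a directory can look like. This is an illustration rather than our exact configuration, and it assumes the tiles on disk are gzip-compressed, which is what the extraction tool we use later (tile-join) produces by default:

server {
    listen 80;

    location /tiles/ {
        # Tiles live on disk at /var/www/tiles/z/x/y.pbf
        root /var/www;
        default_type application/x-protobuf;
        # The files are already gzipped; declare it instead of re-compressing
        add_header Content-Encoding gzip;
        # Each tile is cached individually by the browser and the CDN
        add_header Cache-Control "public, max-age=86400";
    }
}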

How to go from .osm.pbf to .pbf?

Here is the workflow we used to convert raw OpenStreetMap data into static PBF files ready for the web. It is not necessarily the easiest one, but through trial and error it proved effective.

Prerequisites

You will need the following tools installed on either your local machine or a build server.

  1. Docker is needed to run the tools that will take care of generating the vector tile database.
  2. Planetiler is the tool that will perform the extraction of the vector tiles.
  3. Tippecanoe exposes the tile-join command that we need to extract the tiles.

Depending on the size of the region you want to process, you will need an appropriately sized machine. For the whole planet, an 8 vCPU virtual machine with 128GB of RAM and 600GB of storage did the work in less than an hour. The map does not take 600GB to host, but the process needs that much scratch space.

From .osm.pbf to MBTiles

For this purpose, we are using Planetiler. After multiple attempts, we found the easiest way was running it with Docker.

In this example, we will extract tiles for the whole planet. If you need a specific country like Belgium instead, adapt the command by replacing planet with belgium.

docker run -e JAVA_TOOL_OPTIONS="-Xmx92g" \
    -v $(pwd)/data:/data ghcr.io/onthegomap/planetiler:latest \
    --download --area=planet output="/data/planet.mbtiles"

Pay attention to the -e JAVA_TOOL_OPTIONS="-Xmx92g". As stated earlier, the extraction of the whole planet is large, and we need to raise the RAM available to Java; otherwise the extraction will fail. An extraction for a small country like Belgium takes a few minutes and requires less than 4GB of RAM dedicated to Java.
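For reference, here is the same command adapted to a small country as described above. The exact heap value is our assumption, but per the numbers above, around 4GB should be plenty for Belgium:

docker run -e JAVA_TOOL_OPTIONS="-Xmx4g" \
    -v $(pwd)/data:/data ghcr.io/onthegomap/planetiler:latest \
    --download --area=belgium output="/data/belgium.mbtiles"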

From MBTiles to .pbf files

We decided to use the tile-join command from Tippecanoe for this purpose. The choice was driven by our goal: extract zoom levels 0 - 10 from the whole-planet database, then extract zoom levels greater than or equal to 11 from each specific zone or continent.

Using the planet.mbtiles extracted earlier, let's use the following command to generate the expected .pbf files in the directory ./tiles/planet.

MAX_ZOOM=10
MIN_ZOOM=0
mkdir -p ./tiles/planet
# -e writes tiles as individual files; -pk lifts the tile size limit
tile-join -e ./tiles/planet -z $MAX_ZOOM -Z $MIN_ZOOM -pk --force ./data/planet.mbtiles

Of course, it's important to remember that higher zoom levels contain far more tiles than lower ones: extracting zoom levels 0 - 10 will take less time than extracting zoom 14 alone.
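For the second step described earlier (zoom levels 11 and up per zone), the command is the same, just pointed at a regional database. A sketch, assuming a hypothetical africa.mbtiles generated like planet.mbtiles but with --area=africa:

MIN_ZOOM=11
MAX_ZOOM=14
mkdir -p ./tiles/africa
tile-join -e ./tiles/africa -z $MAX_ZOOM -Z $MIN_ZOOM -pk --force ./data/africa.mbtiles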

The following table shows how many tiles each zoom level contains.

| Zoom level | Number of tiles |
| --- | --- |
| 0 | 1 |
| 1 | 4 |
| 2 | 16 |
| 3 | 64 |
| 4 | 256 |
| 5 | 1,024 |
| 6 | 4,096 |
| 7 | 13,950 |
| 8 | 55,097 |
| 9 | 218,186 |
| 10 | 860,573 |
| 11 | 3,378,771 |
| 12 | 13,258,263 |
| 13 | 52,159,923 |
| 14 | 204,853,265 |
| Total | 274,803,489 |

Extracting the whole planet generates each tile as a single file, and each file consumes an inode. For the complete planet, having hundreds of millions of free inodes is therefore mandatory. On our side, during this research we had 300GB of free space but ran out of inodes. Make sure the number of inodes available on your filesystem is greater than the number of tiles.
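On Linux, a quick way to check this before launching the extraction is df with its inode flag; the IFree column must be greater than the number of tiles you are about to write:

# Check free inodes on the filesystem that will hold the tiles
df -i ./tiles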

Summary

The decision to use static .pbf files seems to be the right one for the long run. We went from managing a tile-serving process to managing plain files, which are simple both to update and to serve.

After generating these tiles, we made the drastic decision not to host them ourselves on an Nginx server but to rely on Cloudflare R2.

But transferring 87 million tiles to a remote server, even S3, is not that simple. In the next article, we will look into exactly that, especially the cost and the right tools to use.