With the release of DuckDB 1.1.1, we now have support for reading GeoParquet files! With this exciting update, we can query rich datasets from Overture Maps using Python via Ibis with the performance of DuckDB.
But the good news doesn’t stop there: since Ibis 9.2, lonboard can plot data directly from an Ibis table, adding more simplicity and speed to your geospatial analysis.
Let’s dive into how these tools come together.
Installation
First make sure you have duckdb>=1.1.1, then install Ibis with the dependencies needed to work with geospatial data using DuckDB.
$ pip install 'duckdb>=1.1.1'
$ pip install 'ibis-framework[duckdb,geospatial]' lonboard
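If you are unsure which version ended up in your environment, a quick check along these lines can help. This is a minimal sketch using only the standard library; the helper names are made up for illustration, and the version parsing is deliberately simplified (it stops at the first non-numeric part, so pre-release suffixes are ignored).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple[int, ...]:
    """Turn '1.1.1' into (1, 1, 1); stop at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so '1.10.0' counts as newer than '1.1.1'."""
    return parse_release(installed) >= parse_release(minimum)


def check_duckdb(minimum: str = "1.1.1") -> str:
    """Report whether the installed duckdb package satisfies the minimum."""
    try:
        installed = version("duckdb")
    except PackageNotFoundError:
        return "duckdb is not installed"
    status = "ok" if meets_minimum(installed, minimum) else "too old"
    return f"duckdb {installed}: {status}"


print(check_duckdb())
```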
Motivation
Overture Maps is an open-source initiative that provides high-quality, interoperable map data by integrating contributions from leading companies and open data sources to support a wide range of applications.
Overture Maps offers a variety of datasets to query. For example, there is plenty of information about power infrastructure.
Let’s create some plots of the U.S. power infrastructure. We’ll look into power plants and power lines for the lower 48 states (excluding Hawaii and Alaska for simplicity of the bounding box).
Download data
First we import Ibis and its deferred expression object _, and we use our default backend, DuckDB:
import ibis
from ibis import _
con = ibis.get_backend()  # default duckdb backend
With Ibis and DuckDB we can be more specific about the data we want thanks to filter pushdown. For example, if we want to select only a few columns and look only at the power infrastructure, we can do this as follows.
# look into type infrastructure
url = (
    "s3://overturemaps-us-west-2/release/2024-07-22.0/theme=base/type=infrastructure/*"
)
t = con.read_parquet(url, table_name="infra-usa")

# filter for USA bounding box, subtype="power", and selecting only few columns
expr = t.filter(
    _.bbox.xmin > -125.0,
    _.bbox.ymin > 24.8,
    _.bbox.xmax < -65.8,
    _.bbox.ymax < 49.2,
    _.subtype == "power",
).select(["names", "geometry", "bbox", "class", "sources", "source_tags"])
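The spatial part of this filter can be read as a containment test: a feature is kept only when its bbox lies strictly inside the query box for the lower 48 states. A minimal plain-Python sketch of that logic follows. The BBox type, the inside helper, and the sample coordinates are hypothetical and chosen only for illustration; this is not how DuckDB evaluates the filter internally, and the subtype condition is left out.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


# Query box matching the values used in the filter (lower 48 states).
LOWER_48 = BBox(xmin=-125.0, ymin=24.8, xmax=-65.8, ymax=49.2)


def inside(feature: BBox, query: BBox = LOWER_48) -> bool:
    """True when the feature's bbox lies strictly inside the query box."""
    return (
        feature.xmin > query.xmin
        and feature.ymin > query.ymin
        and feature.xmax < query.xmax
        and feature.ymax < query.ymax
    )


# Hypothetical features; coordinates are illustrative, not real records.
california = BBox(-118.2, 35.0, -118.1, 35.1)
alaska = BBox(-150.0, 61.1, -149.8, 61.3)
straddling = BBox(-66.0, 44.0, -65.0, 45.0)  # crosses the eastern edge

for name, box in [("california", california), ("alaska", alaska), ("straddling", straddling)]:
    print(name, inside(box))
```

Note that a feature whose bbox only partly overlaps the query box is dropped, which is usually acceptable for a country-scale overview.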
If you inspect expr, you can see that the filters and projections get pushed down, meaning you only download the data that you asked for.
con.to_parquet(expr, "power-infra-usa.geoparquet")
Now that we have the data, let’s explore it in Ibis interactive mode and make some beautiful maps.
Data exploration
To explore the data interactively we turn on the interactive mode:
ibis.options.interactive = True
usa_power_infra = con.read_parquet("power-infra-usa.geoparquet")
usa_power_infra
Let’s quickly rename the class column, since class is a reserved word in Python and causes conflicts when using the deferred operator:
usa_power_infra = usa_power_infra.rename(infra_class="class")
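To see why the rename helps, here is a small standard-library sketch showing that class cannot be used with attribute-style access, while infra_class can. The usable_as_attribute helper is made up for this illustration.

```python
import keyword


def usable_as_attribute(name: str) -> bool:
    """True when `name` can follow a dot, as in `_.name`."""
    return name.isidentifier() and not keyword.iskeyword(name)


for column in ["class", "infra_class", "geometry"]:
    print(column, usable_as_attribute(column))

# `_.class` is rejected by the Python parser before Ibis ever sees it.
try:
    compile("_.class", "<example>", "eval")
    parsed = True
except SyntaxError:
    parsed = False

print("_.class parses:", parsed)
```

Bracket-style access on the deferred object, such as _["class"], should also avoid the keyword clash, but renaming keeps the later expressions easier to read.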
We take a look at the different classes of infrastructure under the subtype power:
usa_power_infra.infra_class.value_counts().order_by(
    ibis.desc("infra_class_count")
).preview(max_rows=15)
Looks like we have plant, power_line and minor_line among others.
plants = usa_power_infra.filter(_.infra_class == "plant")
power_lines = usa_power_infra.filter(_.infra_class == "power_line")
minor_lines = usa_power_infra.filter(_.infra_class == "minor_line")
Plotting with Lonboard
Lonboard is a Python plotting library optimized for efficient visualizations of large geospatial data. It integrates well with Ibis and DuckDB, making interactive plotting scalable.
You can try this on your machine. To keep the file size of this blog post small, we will show screenshots of the visualizations.
import lonboard
from lonboard.basemap import CartoBasemap  # to choose color of basemap
Let’s visualize the power plants:
lonboard.viz(
    plants,
    scatterplot_kwargs={"get_fill_color": "red"},
    polygon_kwargs={"get_fill_color": "red"},
    map_kwargs={
        "basemap_style": CartoBasemap.Positron,
        "view_state": {"longitude": -100, "latitude": 36, "zoom": 3},
    },
)
If you are visualizing this on your machine, you can zoom in and see some of the geometry where the plants are located. As an example, we can plot a small area of California:
plants_CA = plants.filter(
    _.bbox.xmin.between(-118.6, -117.9), _.bbox.ymin.between(34.5, 35.3)
).select(_.names.primary, _.geometry)
lonboard.viz(
    plants_CA,
    scatterplot_kwargs={"get_fill_color": "red"},
    polygon_kwargs={"get_fill_color": "red"},
    map_kwargs={
        "basemap_style": CartoBasemap.Positron,
    },
)
We can also visualize the power_lines and the minor_lines together by doing:
lonboard.viz([minor_lines, power_lines])
And that’s how you can visualize ~7 million coordinates from the comfort of your laptop.
>>> power_lines.geometry.n_points().sum()
5329836
>>> minor_lines.geometry.n_points().sum()
1430042
With Ibis and DuckDB, working with geospatial data has never been easier or faster. We saw how to query a dataset from Overture Maps with the simplicity of Python and the performance of DuckDB. Last but not least, we saw how simply and quickly Lonboard got us from query to plot. Together, these libraries make exploring and handling geospatial data a breeze.
Resources
Chat with us on Zulip: