7  Import GeoPackage into Delta Lake with DuckDB

Note: This example focuses on using DuckDB to parse the GeoPackage. For more complex GeoPackages, you may need to install GDAL yourself and use the GDAL command-line tools.

In fact, this example works for any spatial format that DuckDB Spatial supports through its GDAL integration; see the output of ST_Drivers() for the full list.

%pip install duckdb --quiet
import duckdb

duckdb.sql("install spatial; load spatial")
GPKG_URL = "https://service.pdok.nl/kadaster/bestuurlijkegebieden/atom/v1_0/downloads/BestuurlijkeGebieden_2025.gpkg"

CATALOG = "workspace"
SCHEMA = "default"
VOLUME = "default"
# List the layers in the GeoPackage and the name of each layer's geometry column
layers = duckdb.sql(
    f"""
with t as (
    select unnest(layers) layer
     from st_read_meta('{GPKG_URL}'))
select
    layer.name layer_name,
    layer.geometry_fields[1].name geom_field
from t"""
).df()

layers

# Returns:
#         layer_name geom_field
# 0   gemeentegebied       geom
# 1       landgebied       geom
# 2  provinciegebied       geom
# pick a layer to read
layer_name, geom_field = layers.loc[0, ["layer_name", "geom_field"]]

# Write the layer to Parquet in the Unity Catalog volume, converting the geometry column to WKB
duckdb.sql(
    f"""copy (
  select * replace(st_aswkb({geom_field}) as {geom_field})
  from
    st_read(
      '{GPKG_URL}',
      layer='{layer_name}')
  ) to '/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/{layer_name}.parquet' (format parquet)"""
)
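# Read the Parquet file back with Spark; the f prefix is required so the placeholders are filled in
spark.read.parquet(
    f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/{layer_name}.parquet"
).display()  # noqa: S108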
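The same /Volumes path is built twice, once as the COPY target and once for spark.read.parquet, so it can help to compute it once and reuse it; a forgotten f prefix on one copy otherwise leaves the braces unformatted. The helper volume_path below is hypothetical, a minimal sketch assuming the /Volumes/<catalog>/<schema>/<volume>/<file> layout used above.

```python
def volume_path(catalog: str, schema: str, volume: str, file_name: str) -> str:
    """Hypothetical helper: build a Unity Catalog volume path for file_name."""
    return f"/Volumes/{catalog}/{schema}/{volume}/{file_name}"


CATALOG = "workspace"
SCHEMA = "default"
VOLUME = "default"
layer_name = "gemeentegebied"

parquet_path = volume_path(CATALOG, SCHEMA, VOLUME, f"{layer_name}.parquet")
print(parquet_path)  # /Volumes/workspace/default/default/gemeentegebied.parquet
```

In the notebook, parquet_path can then be interpolated as the COPY target (to '{parquet_path}') and passed directly to spark.read.parquet(parquet_path), so both steps are guaranteed to use the same file.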

You can then save the Spark DataFrame above as a Delta Lake table, for example with DataFrame.write.saveAsTable().