8  Export Delta Lake table to other formats with DuckDB

8.1 Setup

%pip install duckdb --quiet

import duckdb

duckdb.sql("install spatial; load spatial")
CATALOG = "workspace"
SCHEMA = "default"
VOLUME = "default"

GEOMETRY_COLUMN = "geometry"

spark.sql(f"create volume if not exists {CATALOG}.{SCHEMA}.{VOLUME}")

Let’s first create an example table with GEOMETRY columns:

%sql
create or replace table tmp_geometries as
select
  st_point(0, 0, 4326) as geometry,
  "Null Island" as name
union all
select
  st_transform(st_point(155000, 463000, 28992), 4326) as geometry,
  "Onze Lieve Vrouwetoren" as name
union all
select
  st_makepolygon(
    st_makeline(
      array(
        st_point(-80.1935973, 25.7741566, 4326),
        st_point(-64.7563086, 32.3040273, 4326),
        st_point(-66.1166669, 18.4653003, 4326),
        st_point(-80.1935973, 25.7741566, 4326)
      )
    )
  ) as geometry,
  "Bermuda Triangle" as name;

select
  *
from
  tmp_geometries
-- Returns:

-- _sqldf:pyspark.sql.connect.dataframe.DataFrame
-- geometry:geometry(OGC:CRS84)
-- name:string

-- geometry name
-- SRID=4326;POINT(0 0) Null Island
-- SRID=4326;POINT(5.3872035084137675 52.15517230119224)    Onze Lieve Vrouwetoren
-- SRID=4326;POLYGON((-80.1935973 25.7741566,-64.7563086 32.3040273,-66.1166669 18.4653003,-80.1935973 25.7741566)) Bermuda Triangle

8.2 Parquet files

We’ll use DuckDB Spatial to write the Geoparquet file, so first we export the above Delta Lake table as a directory of Parquet files, using lon/lat coordinates.

(You could also use the Databricks Temporary Table Credentials API to read the Delta Lake table directly with the DuckDB Delta extension instead.)
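
If you wanted to go that route instead, a rough sketch could look like the following. Treat it as untested: the exact response fields of the credentials API (and the AWS-specific mapping to a DuckDB secret) are assumptions here, Azure/GCP workspaces return different fields, and it assumes tmp_geometries lives in CATALOG.SCHEMA. The main flow of this notebook continues with the Parquet export below.

import requests

host = "https://<workspace-host>"  # placeholder
token = "<pat-or-oauth-token>"     # placeholder
headers = {"Authorization": f"Bearer {token}"}

# Look up the table id and storage location via Unity Catalog
# (assumes the table was created in CATALOG.SCHEMA).
tbl = requests.get(
    f"{host}/api/2.1/unity-catalog/tables/{CATALOG}.{SCHEMA}.tmp_geometries",
    headers=headers,
).json()

# Request short-lived, table-scoped credentials.
creds = requests.post(
    f"{host}/api/2.1/unity-catalog/temporary-table-credentials",
    headers=headers,
    json={"table_id": tbl["table_id"], "operation": "READ"},
).json()

con = duckdb.connect()
con.sql("install delta; load delta; install httpfs; load httpfs")
# Field names below assume an AWS workspace.
aws = creds["aws_temp_credentials"]
con.sql(f"""
create secret delta_read (
    type s3,
    key_id '{aws['access_key_id']}',
    secret '{aws['secret_access_key']}',
    session_token '{aws['session_token']}'
)""")
con.sql(f"select * from delta_scan('{tbl['storage_location']}')").show()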

from pyspark.sql import functions as F

spark.table("tmp_geometries").withColumn(
    "geometry", F.expr("st_transform(geometry, 4326)")
).write.mode("overwrite").parquet(
    f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/geometries.parquet"
)

We will use the above Parquet export as a stepping stone to produce the other formats below.
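
As a quick sanity check (a small sketch, no extensions needed): the exported files should contain the geometry as a plain WKB blob, which is why the next steps re-parse it with st_geomfromwkb.

duckdb.sql(f"""
describe select *
from read_parquet('/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/geometries.parquet/part-*.parquet')
""").show()
# The geometry column typically shows up as a BLOB here, i.e. raw WKB.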

8.3 Geoparquet

We can use DuckDB to transform the Parquet files into a valid Geoparquet file:

Note

If you don’t load the DuckDB Spatial extension, the copy below will still succeed, but the Geoparquet metadata will not be written.

query = f"""
load spatial;
copy (
select 
    * replace (st_geomfromwkb({GEOMETRY_COLUMN}) as geometry)
from
    read_parquet('/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/geometries.parquet/part-*.parquet')
) to '/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/geometries_geo.parquet' (format parquet)"""
duckdb.sql(query)

There are more details around writing Geoparquet, such as specifying custom CRSs or defining a “covering” using bounding boxes, but the example above already produces valid Geoparquet. For example, if your QGIS build supports the Parquet format (as of Aug 2025, the latest Windows version does, but the latest macOS version doesn’t), you can open this file in QGIS:

geoparquet in qgis

(In fact, the GDAL Parquet reader used by QGIS can even open Parquet files that are not valid Geoparquet, as long as they have a WKB or WKT column and the column name and CRS either match the expected defaults or are correctly defined.)
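
If you don’t have a suitable QGIS build at hand, you can also confirm that the Geoparquet (“geo”) file metadata was written, for example with PyArrow (a minimal sketch; PyArrow is normally available on Databricks runtimes):

import json

import pyarrow.parquet as pq

# Geoparquet stores its metadata as JSON under the "geo" key of the Parquet file metadata.
schema = pq.read_schema(f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/geometries_geo.parquet")
geo = json.loads(schema.metadata[b"geo"])
print(geo["primary_column"], geo["columns"][GEOMETRY_COLUMN]["encoding"])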

8.4 Flatgeobuf

Exporting to Flatgeobuf is very similar to the above. Flatgeobuf as a format has two key advantages here:

- It is faster to render (e.g. in QGIS) than Geoparquet, and
- It can act as input to tippecanoe (see below), which we’ll use to produce PMTiles, which is even better suited for web mapping.

query = f"""
load spatial;
copy (
select 
    * replace (st_geomfromwkb({GEOMETRY_COLUMN}) as geometry)
from
    read_parquet('/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/geometries.parquet/part-*.parquet')
) to '/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/geometries.fgb'
(FORMAT GDAL, DRIVER flatgeobuf, LAYER_CREATION_OPTIONS 'TEMPORARY_DIR=/tmp/')"""
duckdb.sql(query)
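
As a quick round-trip check, DuckDB Spatial can read the file back through its GDAL-based st_read table function (a small sketch):

# Read the FlatGeobuf back through GDAL to confirm the export worked.
duckdb.sql(f"""
load spatial;
select * from st_read('/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/geometries.fgb')
""").show()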

9 PMTiles

For PMTiles, while theoretically we could keep using DuckDB Spatial with GDAL, we’ll instead use tippecanoe.

%sh
# ~5 min
cd /tmp && git clone https://github.com/felt/tippecanoe.git
cd tippecanoe
make -j
make install PREFIX=$HOME/.local

import os

HOME = os.environ["HOME"]

# see https://github.com/felt/tippecanoe/blob/main/README.md#try-this-first and e.g.
# https://github.com/OvertureMaps/overture-tiles/blob/main/scripts/2024-07-22/places.sh
# for possible options
!{HOME}/.local/bin/tippecanoe -zg -rg -o /tmp/geometries.pmtiles --drop-densest-as-needed --extend-zooms-if-still-dropping --maximum-tile-bytes=2500000 --progress-interval=10 -l geometries --force /Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/geometries.fgb
# NOTE: this mv will emit an error related to updating metadata ("mv: preserving
# permissions for ‘[...]’: Operation not permitted"), this can be ignored.
!mv /tmp/geometries.pmtiles /Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/geometries.pmtiles
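
Before uploading anywhere, a lightweight sanity check is to look at the file header: per the PMTiles v3 spec, every archive starts with the 7-byte ASCII magic PMTiles (a small sketch):

# PMTiles v3 archives start with the 7-byte ASCII magic "PMTiles".
pmtiles_path = f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/geometries.pmtiles"
with open(pmtiles_path, "rb") as f:
    magic = f.read(7)
assert magic == b"PMTiles", f"unexpected header: {magic!r}"
print(magic, os.path.getsize(pmtiles_path), "bytes")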

To visualize, download the PMTiles file from Volumes and upload it to https://pmtiles.io/ (see the screenshot below). To visualize it directly via Databricks Apps without this manual download step, see TODO:.

pmtiles_io

To be clear: the advantage of the PMTiles format is being able to visualize very large datasets, such as all of OpenStreetMap. This notebook only uses a very simple example, but see TODO: for a larger case.

9.1 Cleanup

%sql
-- drop table tmp_geometries