6 Storing spatial data in Delta Lake as WKB
You can store geometry or geography data in a Delta Lake table in a BINARY column as Well-known binary (WKB or EWKB). This is a more compact representation than well-known text (WKT), and widely supported incl. in the Geoparquet specification. On the other hand, unlike the newer GEOMETRY and GEOGRAPHY types, there is no higher level semantic support possible such as automatic spatial indexing (as of Aug 2025, coming soon). Also, you need to use the conversion function st_geomfromwkb or st_geomfromwekb before any other ST function.
Another example of Delta Lake tables with WKB columns are the Overture Maps datasets prepared by CARTO, available via the Databricks Marketplace. Follow the previous link to add any them (at no cost) to your catalog, if you haven’t yet. For example, for the below query, use Divisions (borders of countries and other administrative divisions):
%sql
select
names:primary as name,
geometry
from
carto_overture_maps_divisions.carto.division_area
where
subtype = 'country'
-- Returns:
-- name geometry
-- """Australia""" AQYAAAAjAAAAAQMAAAABAAAA (truncated)
-- """Timor-Leste""" AQYAAAAEAAAAAQMAAAABAAAA (truncated)
-- """Vanuatu""" AQYAAABlAAAAAQMAAAABAAAA (truncated)
-- ... ...
These CARTO tables also show one pattern to organize and cluster tables with geometries: they include the bounding box columns __carto_xmin
, __carto_xmax
, __carto_ymin
, ___carto_ymax
and are clustered by these colums.
Another pattern would be to make use of spatial indexing such as H3.
6.1 Example usage with ST functions
NOTE: as of 2025-08-03, ST functions are only available on DBR 17.1+ and thus not available on Databricks Free Edition, but this limitation might be lifted soon.
But what if you wanted to use
show srid
%sql
-- The distance between the UK and France
with countries as (
select
country,
st_geomfromwkb(geometry) wkb_geometry
from
carto_overture_maps_divisions.carto.division_area
where
subtype = 'country'
and class = 'land'
),
uk as (
select
wkb_geometry
from
countries
where
country = 'GB'
),
fr as (
select
wkb_geometry
from
countries
where
country = 'FR'
)
select
st_distancespheroid(uk.wkb_geometry, fr.wkb_geometry) / 1e3 distance_km
from
uk, fr
Now the question is, if the Strait of Dover is about 32 kms narrow, why did we get more above? For this, we’d need to find out where exactly the shortest line is between the two countries. There is no st_shortestline yet in Databricks ST functions as of 2025-08-03, but we can use a Spark UDF with DuckDB spatial to fill this gap, see here TODO: .