7 Import GeoPackage into Delta Lake with DuckDB
NOTE: This example focuses on using DuckDB to parse the GeoPackage. For more complex GeoPackages, you may need to install GDAL yourself and use the GDAL command-line tools (such as ogr2ogr).
In fact, this example works equally well for any spatial format that DuckDB Spatial can read through its GDAL integration; see the output of ST_Drivers() for the list.
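To check which formats your DuckDB build can open, you can filter the output of ST_Drivers(). The sketch below assumes a DuckDB connection (or the `duckdb` module itself) with the spatial extension already installed and loaded. The helper name `readable_drivers` is hypothetical and not part of DuckDB.

```python
def readable_drivers(con) -> list[str]:
    """Return the short names of the GDAL drivers DuckDB Spatial can open.

    `con` is assumed to be a DuckDB connection (or the `duckdb` module)
    with the spatial extension loaded; this helper is illustrative only.
    """
    # ST_Drivers() is a DuckDB Spatial table function; the can_open column
    # marks drivers that st_read can use to read files of that format
    rel = con.sql(
        "select short_name from ST_Drivers() where can_open order by short_name"
    )
    return [row[0] for row in rel.fetchall()]
```

For example, after `con.install_extension("spatial")` and `con.load_extension("spatial")`, the list should include `GPKG`, the GDAL driver for GeoPackage.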
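import duckdb

# assumes `layers` (one row per layer), GPKG_URL, CATALOG, SCHEMA and VOLUME
# were defined in earlier steps and that the spatial extension is loaded
# pick a layer to read
layer_name, geom_field = layers.loc[0, ["layer_name", "geom_field"]]

# read the layer with st_read, convert its geometry column to WKB with
# st_aswkb, and write the result as Parquet into the Unity Catalog volume
duckdb.sql(
    f"""copy (
    select * replace(st_aswkb({geom_field}) as {geom_field})
    from
        st_read(
            '{GPKG_URL}',
            layer='{layer_name}')
    ) to '/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/{layer_name}.parquet' (format parquet)"""
)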
The COPY statement writes the layer to a Parquet file in the volume; you can read that file into a Spark DataFrame and store it as a Delta Lake table as needed.
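A minimal sketch of that last step, assuming a Databricks notebook where `spark` is the active SparkSession and the Parquet file was written by the COPY statement above. The helper names and the `catalog.schema.layer_name` table naming are hypothetical choices, and the layer name is assumed to be a valid table identifier.

```python
def layer_parquet_path(catalog: str, schema: str, volume: str, layer_name: str) -> str:
    # same path pattern as the COPY ... TO target used above
    return f"/Volumes/{catalog}/{schema}/{volume}/{layer_name}.parquet"


def layer_table_name(catalog: str, schema: str, layer_name: str) -> str:
    # hypothetical convention: one Unity Catalog table per GeoPackage layer
    return f"{catalog}.{schema}.{layer_name}"


def save_layer_as_delta(spark, catalog: str, schema: str, volume: str, layer_name: str):
    """Read the exported Parquet file with Spark and save it as a Delta table."""
    sdf = spark.read.parquet(layer_parquet_path(catalog, schema, volume, layer_name))
    # the geometry column arrives as WKB bytes (binary), since st_aswkb was applied
    (
        sdf.write.format("delta")
        .mode("overwrite")  # replace the table if this cell is re-run
        .saveAsTable(layer_table_name(catalog, schema, layer_name))
    )
    return sdf
```

In the notebook this would be called as `save_layer_as_delta(spark, CATALOG, SCHEMA, VOLUME, layer_name)`. Using `mode("overwrite")` keeps the step idempotent; switch to `"append"` if you load several files into one table.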