How to install ogr2ogr in a Databricks notebook (incl. Parquet support)
This was tested on a serverless notebook (environment version 2) as well as on a classic notebook with DBR 15.4 LTS.
ogr2ogr is a geospatial file conversion tool that is part of GDAL. For example, you can use it to read a directory of GML (geo XML) files and write them out to GeoPackage (.gpkg), or even GeoParquet. You could first try the DuckDB way (which also uses GDAL under the hood), but in some more complex cases you might need to use the GDAL CLI yourself.
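As a rough illustration of such a conversion, the sketch below wraps an ogr2ogr call in Python so it can run from a notebook cell. It assumes ogr2ogr is already on the PATH; the helper names and the file paths in the usage note are hypothetical. GPKG and Parquet are the GDAL short names of the two output drivers mentioned above.

```python
import subprocess


def ogr2ogr_command(src, dst, driver):
    """Build an ogr2ogr call. Note that the destination precedes the source."""
    return ["ogr2ogr", "-f", driver, str(dst), str(src)]


def convert(src, dst, driver):
    """Run the conversion and raise CalledProcessError if ogr2ogr fails."""
    subprocess.run(ogr2ogr_command(src, dst, driver), check=True)
```

For example, `convert("/tmp/parcels.gml", "/tmp/parcels.gpkg", "GPKG")` would write a GeoPackage, and passing `"Parquet"` as the driver with a `.parquet` destination would write GeoParquet, provided the Parquet driver is available.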
Note that this installation method only installs gdal on the driver, so it won’t take advantage of a multi-node cluster, but you can still run gdal command line tools on a single-node cluster if needed.
In theory, on classic (non-serverless) compute, we could just run apt-get install -y gdal-bin, but as of Aug 2025 that package does not include Parquet support. Instead, the libgdal-arrow-parquet extension package that we need can be installed from conda-forge. So let’s first install conda from conda-forge [1]:
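A minimal sketch of this step, assuming the Miniforge installer published on conda-forge's GitHub releases page and a hypothetical install prefix of /tmp/miniforge3. The function names are illustrative, and the download uses curl through subprocess so that it can be run from a Python cell.

```python
import platform
import subprocess

PREFIX = "/tmp/miniforge3"  # hypothetical install location on the driver


def miniforge_url(system=None, machine=None):
    """URL of the latest Miniforge installer for this OS and CPU architecture."""
    system = system or platform.system()      # e.g. "Linux"
    machine = machine or platform.machine()   # e.g. "x86_64"
    return (
        "https://github.com/conda-forge/miniforge/releases/latest/download/"
        f"Miniforge3-{system}-{machine}.sh"
    )


def install_miniforge(prefix=PREFIX, installer="/tmp/miniforge.sh"):
    """Download the installer with curl and run it non-interactively."""
    subprocess.run(["curl", "-fsSL", "-o", installer, miniforge_url()], check=True)
    # -b: batch mode (no prompts), -p: install prefix.
    # The installer refuses to run if the prefix directory already exists.
    subprocess.run(["bash", installer, "-b", "-p", prefix], check=True)
```

Calling `install_miniforge()` in a notebook cell performs the download and installation on the driver only.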
Now we can add arrow/parquet support:
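One way this could look, assuming conda lives under the hypothetical prefix /tmp/miniforge3 and that gdal itself is installed in the same step as libgdal-arrow-parquet (an assumption, since it may already be present). Both package names are conda-forge packages; the helper names are illustrative.

```python
import subprocess

PREFIX = "/tmp/miniforge3"  # hypothetical; must match the conda install location


def conda_install_command(prefix=PREFIX):
    """conda call that installs GDAL plus its Arrow/Parquet plugin into prefix."""
    return [
        f"{prefix}/bin/conda", "install", "-y",
        "-p", prefix,            # target environment
        "-c", "conda-forge",     # channel providing both packages
        "gdal", "libgdal-arrow-parquet",
    ]


def install_gdal_with_parquet(prefix=PREFIX):
    """Run the install and raise CalledProcessError if conda fails."""
    subprocess.run(conda_install_command(prefix), check=True)
```

Running `install_gdal_with_parquet()` should place ogr2ogr under the prefix's bin directory.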
And that’s it:
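To confirm that the Parquet driver is actually available, one option is to list the supported formats and look for it. The sketch below assumes ogr2ogr was installed under the hypothetical path /tmp/miniforge3/bin and relies on the general GDAL `--formats` option, whose output lists one driver per line with the short name first.

```python
import subprocess

OGR2OGR = "/tmp/miniforge3/bin/ogr2ogr"  # hypothetical path to the conda-installed binary


def has_driver(formats_output, name):
    """True if any line of `ogr2ogr --formats` output starts with the driver name."""
    for line in formats_output.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == name:
            return True
    return False


def parquet_available(ogr2ogr=OGR2OGR):
    """Ask ogr2ogr for its driver list and check for the Parquet driver."""
    result = subprocess.run(
        [ogr2ogr, "--formats"], capture_output=True, text=True, check=True
    )
    return has_driver(result.stdout, "Parquet")
```

If `parquet_available()` returns True, ogr2ogr can be called with `-f Parquet` to write GeoParquet.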
[1] The curl download link comes from conda-forge and their installation instructions on GitHub.