How to install ogr2ogr in a Databricks notebook (incl. Parquet support)

Note

This was tested on a serverless notebook (environment version 2) and on a classic notebook with DBR 15.4 LTS.

ogr2ogr is a geospatial file conversion tool that is part of GDAL. For example, you can use it to read a directory of GML (Geography Markup Language, an XML-based format) files and write them out to GeoPackage (.gpkg) or even GeoParquet. You could first try the DuckDB way (its spatial extension also uses GDAL under the hood), but in some more complex cases you might need to call the GDAL command-line tools yourself.
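The sketch below makes the GML conversion example concrete. It calls ogr2ogr once per file from Python and assumes the following:

- ogr2ogr is already on PATH, installed as described in this post.
- Each input file should become its own output file.
- The helper names build_ogr2ogr_commands and run_commands are hypothetical.

Only the `ogr2ogr -f <driver> <destination> <source>` call shape and the GDAL driver short names GPKG and Parquet are standard ogr2ogr usage.

```python
import subprocess
from pathlib import Path

# GDAL output driver short name -> file extension for the output files
EXTENSIONS = {"GPKG": ".gpkg", "Parquet": ".parquet"}


def build_ogr2ogr_commands(src_dir, out_dir, driver="GPKG"):
    """Build one ogr2ogr command per *.gml file in src_dir (hypothetical helper).

    Each input is written to its own file in out_dir, named after the
    input file's stem plus the extension for the chosen driver.
    """
    ext = EXTENSIONS[driver]
    commands = []
    for gml in sorted(Path(src_dir).glob("*.gml")):
        dst = Path(out_dir) / (gml.stem + ext)
        # ogr2ogr -f <driver> <destination> <source>
        commands.append(["ogr2ogr", "-f", driver, str(dst), str(gml)])
    return commands


def run_commands(commands):
    """Run each command; check=True raises CalledProcessError if ogr2ogr fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, run_commands(build_ogr2ogr_commands("/tmp/gml_in", "/tmp/gml_out", driver="Parquet")) would write one GeoParquet file per GML file. Both paths are placeholders, and the output directory must already exist. Building the argument lists separately from running them keeps the commands easy to inspect before anything touches the disk.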

Note

This installation method only installs GDAL on the driver node, so it won't take advantage of a multi-node cluster. You can still run the GDAL command-line tools on a single-node cluster if needed.

In theory, on classic (non-serverless) compute we could just run apt-get install -y gdal-bin, but as of Aug 2025 the GDAL build it installs does not include Parquet support. Instead, we can get the libgdal-arrow-parquet extension package we need from conda-forge. So let's first install conda via Miniforge, which comes preconfigured for the conda-forge channel 1:

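import os

# Download the Miniforge installer for this OS/architecture (e.g. Linux-x86_64)
!curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
# Install non-interactively (-b) into ~/miniforge (-p), then delete the installer
!bash Miniforge3-$(uname)-$(uname -m).sh -b -p ~/miniforge
!rm Miniforge3-$(uname)-$(uname -m).sh

# Put conda (and later ogr2ogr) on PATH; expand ~ explicitly, since not
# every shell or program expands a literal ~ inside PATH entries
os.environ["PATH"] = os.path.expanduser("~/miniforge/bin") + os.pathsep + os.environ["PATH"]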
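The $(uname)-$(uname -m) part of the installer URL selects the build for the driver's OS and CPU architecture. The minimal sketch below does the same URL construction and PATH update in plain Python. It uses os.uname(), whose sysname and machine fields correspond to uname -s and uname -m on Linux. The function names are hypothetical and not part of Miniforge.

```python
import os

MINIFORGE_RELEASES = "https://github.com/conda-forge/miniforge/releases/latest/download"


def miniforge_installer_url(sysname=None, machine=None):
    """Return the Miniforge installer URL, mirroring $(uname)-$(uname -m)."""
    if sysname is None or machine is None:
        uts = os.uname()  # POSIX only; Databricks drivers run Linux
        sysname = sysname or uts.sysname  # e.g. "Linux"
        machine = machine or uts.machine  # e.g. "x86_64" or "aarch64"
    return f"{MINIFORGE_RELEASES}/Miniforge3-{sysname}-{machine}.sh"


def prepend_to_path(directory, env=os.environ):
    """Prepend a tilde-expanded directory to env["PATH"] and return the new PATH."""
    directory = os.path.expanduser(directory)
    env["PATH"] = directory + os.pathsep + env.get("PATH", "")
    return env["PATH"]
```

Passing an explicit env dict instead of os.environ lets you check the result without modifying the notebook's real environment.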

Now we can add Arrow/Parquet support:

!conda install -y libgdal-arrow-parquet

# We never activate the conda environment, so point PROJ at its data directory ourselves
os.environ["PROJ_LIB"] = os.path.expanduser("~/miniforge/share/proj")
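Setting PROJ_LIB fails silently if the path is wrong. The hedged sketch below checks that the PROJ database (proj.db) actually exists under the Miniforge prefix before setting the variable. It also sets PROJ_DATA, the name PROJ has used since version 9.1; PROJ_LIB is still accepted as a fallback. The helper name configure_proj is hypothetical.

```python
import os
from pathlib import Path


def configure_proj(prefix="~/miniforge", env=os.environ):
    """Point PROJ at <prefix>/share/proj, raising if proj.db is missing.

    Sets both PROJ_LIB (older name) and PROJ_DATA (PROJ >= 9.1).
    """
    proj_dir = Path(os.path.expanduser(prefix)) / "share" / "proj"
    if not (proj_dir / "proj.db").is_file():
        raise FileNotFoundError(f"proj.db not found in {proj_dir}")
    env["PROJ_LIB"] = env["PROJ_DATA"] = str(proj_dir)
    return str(proj_dir)
```

Calling configure_proj() right after the conda install surfaces a broken install immediately, instead of as confusing CRS errors later during a conversion.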

And that’s it:

!ogr2ogr --formats | grep parquet
# Returns:
#   Parquet -vector- (rw+v): (Geo)Parquet (*.parquet)
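Sometimes you'd rather fail loudly in Python than eyeball grep output, for example at the top of a scheduled job. The sketch below parses ogr2ogr --formats instead. It assumes each driver line has the "NAME -types- (flags): description" shape shown above. parse_formats and require_driver are hypothetical helper names.

```python
import re
import subprocess

# Matches driver lines such as "  Parquet -vector- (rw+v): (Geo)Parquet (*.parquet)";
# the lazy name group also allows names with spaces, e.g. "ESRI Shapefile".
DRIVER_LINE = re.compile(r"^\s*(.+?)\s+-([a-z,]+)-\s+\(([^)]*)\):\s*(.*)$")


def parse_formats(text):
    """Map driver name -> (list of types, flags string, description)."""
    drivers = {}
    for line in text.splitlines():
        m = DRIVER_LINE.match(line)
        if m:
            name, types, flags, desc = m.groups()
            drivers[name] = (types.split(","), flags, desc)
    return drivers


def require_driver(name, ogr2ogr="ogr2ogr"):
    """Raise RuntimeError unless `ogr2ogr --formats` lists the named driver."""
    out = subprocess.run([ogr2ogr, "--formats"], capture_output=True,
                         text=True, check=True).stdout
    drivers = parse_formats(out)
    if name not in drivers:
        raise RuntimeError(f"GDAL driver {name!r} is not available")
    return drivers[name]
```

With the Parquet support installed, require_driver("Parquet") should return the types, flags and description from the line above. Without that support, it raises before any conversion starts.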

  1. The curl download link comes from the Miniforge installation instructions in the conda-forge/miniforge repository on GitHub.