16  How to install ogr2ogr in a Databricks notebook (incl. Parquet support)

[!NOTE] This was tested on a Serverless notebook (Environment version 2) and on a classic notebook with DBR 15.4 LTS. The setup is meant for single-node use.

ogr2ogr is a geospatial file conversion tool, part of GDAL. For example, you can use it to read in a directory of GML (geo XML) files, and write them out to GeoPackage (.gpkg), or even GeoParquet.
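
To make the GML-to-GeoPackage example concrete, here is a minimal sketch of how such a conversion could be driven from Python once ogr2ogr is installed. The helper names `gml_to_gpkg_commands` and `convert` are hypothetical, and the sketch assumes that all `.gml` files sit directly in one directory and that appending every file to the same GeoPackage is what you want. It uses only the standard ogr2ogr options `-f`, `-update` and `-append`.

```python
from pathlib import Path
import subprocess


def gml_to_gpkg_commands(src_dir, dst_path):
    """Build one ogr2ogr call per GML file in src_dir, all writing to dst_path."""
    commands = []
    for i, gml in enumerate(sorted(Path(src_dir).glob("*.gml"))):
        cmd = ["ogr2ogr", "-f", "GPKG"]
        if i > 0:
            # Later files are appended to the GeoPackage created by the first call.
            cmd += ["-update", "-append"]
        # ogr2ogr expects the destination first, then the source.
        cmd += [str(dst_path), str(gml)]
        commands.append(cmd)
    return commands


def convert(src_dir, dst_path):
    """Run the commands one by one; requires ogr2ogr on PATH."""
    for cmd in gml_to_gpkg_commands(src_dir, dst_path):
        subprocess.run(cmd, check=True)
```

Building the commands separately from running them makes it easy to print and inspect them before anything is written.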

As of June 2025, libgdal-arrow-parquet cannot be installed via apt. Mirroring the Colab version, the apt install below therefore gives you GDAL without Parquet support:

%sh
# if http traffic is blocked, we need to use https for `apt` sources
sed -i 's|http://|https://|g' /etc/apt/sources.list.d/ubuntu.sources

# following https://r-spatial.github.io/sf/#ubuntu
sudo apt -y update && sudo apt install -y gdal-bin libgdal-dev

The libgdal-arrow-parquet extension package that we need can be installed from conda-forge. So let’s first install Miniforge, the conda-forge installer for conda [1]:

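import os
# Download the Miniforge installer for this OS and CPU architecture
!curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
# Install non-interactively (-b) into ~/miniforge (-p), then delete the installer
!bash Miniforge3-$(uname)-$(uname -m).sh -b -p ~/miniforge
!rm Miniforge3-$(uname)-$(uname -m).sh

# Expand ~ explicitly: a literal "~" in PATH is not reliably resolved
os.environ["PATH"] = os.path.expanduser("~/miniforge/bin") + ":" + os.environ["PATH"]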
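
If the cell that sets PATH is run more than once, the Miniforge directory ends up in PATH several times. The following sketch shows one way to prepend it idempotently. The helper name `prepend_to_path` is hypothetical and not part of any library. It also expands `~`, since not every program resolves a literal tilde inside PATH.

```python
import os


def prepend_to_path(directory, env=os.environ):
    """Put directory at the front of PATH, expanding ~ and avoiding duplicates."""
    directory = os.path.expanduser(directory)
    # Drop empty entries and any earlier copy of the same directory.
    parts = [p for p in env.get("PATH", "").split(os.pathsep)
             if p and p != directory]
    env["PATH"] = os.pathsep.join([directory] + parts)
    return env["PATH"]


# In the notebook: prepend_to_path("~/miniforge/bin")
# Afterwards, shutil.which("conda") should point into the Miniforge directory.
```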

Now we can add arrow/parquet support:

!conda install libgdal-arrow-parquet -y

os.environ["PROJ_LIB"] = f"{os.path.expanduser('~')}/miniforge/share/proj"
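
PROJ_LIB only helps if it points at a directory that actually contains PROJ's database file, `proj.db`. A small check like the one below can catch a wrong path early. The function name `proj_data_dir_ok` is hypothetical, and the sketch assumes the Miniforge layout used above.

```python
import os


def proj_data_dir_ok(directory):
    """Return True if directory contains proj.db, the database PROJ looks up."""
    directory = os.path.expanduser(directory)
    return os.path.isfile(os.path.join(directory, "proj.db"))


# In the notebook: proj_data_dir_ok(os.environ["PROJ_LIB"])
```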

And that’s it:

!ogr2ogr --formats | grep parquet
# Returns:
#   Parquet -vector- (rw+v): (Geo)Parquet (*.parquet)
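
The same check can be done from Python, for example to fail fast at the top of a notebook. This is a sketch with hypothetical helper names (`has_driver`, `ogr2ogr_formats`). It assumes that each driver line of `ogr2ogr --formats` starts with the driver's short name, as in the output shown above.

```python
import subprocess


def has_driver(formats_output, driver_name):
    """Return True if a line of `ogr2ogr --formats` output starts with driver_name."""
    for line in formats_output.splitlines():
        # Lines look like "  Parquet -vector- (rw+v): (Geo)Parquet (*.parquet)".
        first_token = line.strip().split(" ", 1)[0]
        if first_token.lower() == driver_name.lower():
            return True
    return False


def ogr2ogr_formats():
    """Capture the output of `ogr2ogr --formats`; requires ogr2ogr on PATH."""
    result = subprocess.run(["ogr2ogr", "--formats"],
                            capture_output=True, text=True, check=True)
    return result.stdout


# In the notebook: assert has_driver(ogr2ogr_formats(), "Parquet")
```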

[1] The curl download link comes from conda-forge and their installation instructions on GitHub.