Creating a development environment

To test out code changes, you’ll need to build pandas from source, which requires a C/C++ compiler and a Python environment. If you’re only making documentation changes, you can skip to contributing to the documentation, but without a development environment you won’t be able to build the documentation locally before pushing your changes. Installing the pre-commit hooks is also recommended.

Step 1: install a C compiler

How to do this depends on your platform. If you choose to use Docker in the next step, you can skip this step.

Windows

You will need Build Tools for Visual Studio 2022.

Note

You DO NOT need to install Visual Studio 2022. You only need “Build Tools for Visual Studio 2022”, found by scrolling down to “All downloads” -> “Tools for Visual Studio”. In the installer, select the “Desktop development with C++” workload.

Alternatively, you can install the necessary components from the command line using vs_BuildTools.exe.

Alternatively, you could use the WSL and follow the Linux instructions below.

macOS

To use the mamba-based compilers, you will need to install the Developer Tools using xcode-select --install. Otherwise, information about compiler installation can be found here: https://devguide.python.org/setup/#macos

Linux

For Linux-based mamba installations, you won’t have to install any additional components outside of the mamba environment. The instructions below are only needed if your setup isn’t based on mamba environments.

Some Linux distributions will come with a pre-installed C compiler. To find out which compilers (and versions) are installed on your system:

# for Debian/Ubuntu:
dpkg --list | grep compiler
# for Red Hat/RHEL/CentOS/Fedora:
yum list installed | grep -i --color compiler

GCC (GNU Compiler Collection) is a widely used compiler that supports C and a number of other languages. If GCC is listed as an installed compiler, nothing more is required.
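If you prefer a check that does not depend on the distribution's package manager, the following minimal sketch looks for common compiler executables on your PATH using only the Python standard library. The candidate names (gcc, cc, clang) and the helper name find_c_compilers are assumptions made for illustration; they are not part of pandas.

```python
import shutil

# Candidate compiler executables. This list is an assumption; adjust it for your system.
CANDIDATES = ("gcc", "cc", "clang")


def find_c_compilers(candidates=CANDIDATES):
    """Return a dict mapping each compiler name found on PATH to its location."""
    found = {}
    for name in candidates:
        path = shutil.which(name)  # None when the executable is not on PATH
        if path is not None:
            found[name] = path
    return found


if __name__ == "__main__":
    compilers = find_c_compilers()
    if compilers:
        for name, path in compilers.items():
            print(f"{name}: {path}")
    else:
        print("No C compiler found on PATH")
```

Finding an executable on PATH only shows that a compiler is present. It does not confirm the version or that the required headers are installed.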

If no C compiler is installed, or you wish to upgrade, or you’re using a different Linux distribution, consult your favorite search engine for compiler installation/update instructions.

Let us know if you have any difficulties by opening an issue or reaching out on our contributor community Slack.

Step 2: create an isolated environment

Before we begin, please:

  • Make sure that you have cloned the repository

  • cd to the pandas source directory you just created with the clone command

Option 2: using pip

You’ll need to have at least the minimum Python version that pandas supports. You also need to have setuptools 51.0.0 or later to build pandas.
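A quick way to confirm the setuptools requirement is to compare the installed version against 51.0.0. The sketch below is one possible check, assuming a plain numeric comparison of the release segments is sufficient; the helper names release_tuple and meets_minimum are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED_SETUPTOOLS = "51.0.0"  # minimum stated in the text above


def release_tuple(version_string):
    """Convert '65.5.0' to (65, 5, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, required=REQUIRED_SETUPTOOLS):
    """Return True if the installed release is at least the required release."""
    a, b = release_tuple(installed), release_tuple(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '51.0' compares equal to '51.0.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        installed = version("setuptools")
    except PackageNotFoundError:
        print("setuptools is not installed in this environment")
    else:
        status = "OK" if meets_minimum(installed) else "too old"
        print(f"setuptools {installed}: {status}")
```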

Unix/macOS with virtualenv

# Create a virtual environment
# Use an ENV_DIR of your choice. We'll use ~/virtualenvs/pandas-dev
# Any parent directories should already exist
python3 -m venv ~/virtualenvs/pandas-dev

# Activate the virtualenv
. ~/virtualenvs/pandas-dev/bin/activate

# Install the build dependencies
python -m pip install -r requirements-dev.txt
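Before installing dependencies, it can help to confirm that the virtual environment is actually active, so that packages do not end up in your system Python. The sketch below relies on the standard behaviour of venv, where sys.prefix differs from sys.base_prefix inside an environment. The function name in_virtual_environment is hypothetical, and this check does not detect conda or mamba environments.

```python
import sys


def in_virtual_environment(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return True when the interpreter prefix differs from the base prefix."""
    return prefix != base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run the activate script first")
```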

Unix/macOS with pyenv

Consult the docs for setting up pyenv here.

# Create a virtual environment
# Use an ENV_DIR of your choice. We'll use ~/Users/<yourname>/.pyenv/versions/pandas-dev
pyenv virtualenv <version> <name-to-give-it>

# For instance:
pyenv virtualenv 3.9.10 pandas-dev

# Activate the virtualenv
pyenv activate pandas-dev

# Now install the build dependencies in the cloned pandas repo
python -m pip install -r requirements-dev.txt

Windows

Below is a brief overview of how to set up a virtual environment with PowerShell under Windows. For details, please refer to the official virtualenv user guide.

Use an ENV_DIR of your choice. We’ll use ~\virtualenvs\pandas-dev, where ~ is the folder pointed to by either the $env:USERPROFILE (PowerShell) or %USERPROFILE% (cmd.exe) environment variable. Any parent directories should already exist.

# Create a virtual environment
python -m venv $env:USERPROFILE\virtualenvs\pandas-dev

# Activate the virtualenv. Use activate.bat for cmd.exe
~\virtualenvs\pandas-dev\Scripts\Activate.ps1

# Install the build dependencies
python -m pip install -r requirements-dev.txt

Option 3: using Docker

pandas provides a Dockerfile in the root directory to build a Docker image with a full pandas development environment.

Docker Commands

Build the Docker image:

# Build the image
docker build -t pandas-dev .

Run Container:

# Run a container and bind your local repo to the container
# This command assumes you are running from your local repo
# but if not alter ${PWD} to match your local repo path
docker run -it --rm -v ${PWD}:/home/pandas pandas-dev
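If you are not running from the repository root, the bind mount needs the absolute path of your clone in place of ${PWD}. As a rough sketch, the argument list for the same docker run command can be assembled from any path like this. The function name docker_run_command is hypothetical; the image name and mount point are taken from the commands above.

```python
from pathlib import Path


def docker_run_command(repo_path=".", image="pandas-dev", mount_point="/home/pandas"):
    """Build the argument list for `docker run` with the repo bound into the container."""
    repo = Path(repo_path).resolve()  # Docker bind mounts need an absolute host path
    return ["docker", "run", "-it", "--rm", "-v", f"{repo}:{mount_point}", image]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(docker_run_command(".")))
```

The returned list can be passed to subprocess.run if you want to launch the container from a script.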

Even easier, you can integrate Docker with the following IDEs:

Visual Studio Code

You can use the Dockerfile to launch a remote session with Visual Studio Code, a popular free IDE, using the .devcontainer.json file. See https://code.visualstudio.com/docs/remote/containers for details.

PyCharm (Professional)

Enable Docker support and use the Services tool window to build and manage images as well as run and interact with containers. See https://www.jetbrains.com/help/pycharm/docker.html for details.

Option 4: using Gitpod

Gitpod is an open-source platform that automatically creates the correct development environment right in your browser, reducing the need to install local development environments and deal with incompatible dependencies.

If you are a Windows user, unfamiliar with using the command line, or building pandas for the first time, it is often faster to build with Gitpod. Here are the in-depth instructions for building pandas with Gitpod.

Step 3: build and install pandas

You can now run:

# Build and install pandas
python setup.py build_ext -j 4
python -m pip install -e . --no-build-isolation --no-use-pep517

At this point you should be able to import pandas from your locally built version:

$ python
>>> import pandas
>>> print(pandas.__version__)  # note: the exact output may differ
2.0.0.dev0+880.g2b9e661fbb.dirty
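The version string alone does not show where the imported package lives. One way to confirm that Python picked up your local build rather than a previously installed copy is to check that pandas.__file__ sits inside your clone. The sketch below assumes you run it from the repository root; the helper name is_local_build is hypothetical.

```python
from pathlib import Path


def is_local_build(module_file, repo_root):
    """Return True if module_file is located inside repo_root."""
    module_path = Path(module_file).resolve()
    root = Path(repo_root).resolve()
    return root in module_path.parents


if __name__ == "__main__":
    try:
        import pandas
    except ImportError:
        print("pandas could not be imported in this environment")
    else:
        # "." assumes the current directory is the root of your pandas clone.
        where = "local build" if is_local_build(pandas.__file__, ".") else "installed elsewhere"
        print(f"pandas {pandas.__version__} from {pandas.__file__} ({where})")
```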

These steps install pandas only into the environment you created in step 2. They do not touch any of your other environments or any existing Python installation.

Note

You will need to repeat this step each time the C extensions change, for example if you modified any file in pandas/_libs or if you did a fetch and merge from upstream/main.
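To decide whether a rebuild is needed after pulling changes, you could inspect the list of changed files, for example the output of git diff --name-only. The following sketch flags only paths under pandas/_libs, the example directory named in the note. That rule is a simplifying assumption, since other changes may also require a rebuild, and the function name needs_rebuild is hypothetical.

```python
from pathlib import PurePosixPath

# Directory named in the note above; other locations may also matter (assumption).
EXTENSION_DIR = PurePosixPath("pandas/_libs")


def needs_rebuild(changed_files):
    """Return True if any changed path lies under pandas/_libs."""
    for name in changed_files:
        if EXTENSION_DIR in PurePosixPath(name).parents:
            return True
    return False


if __name__ == "__main__":
    changed = ["pandas/core/frame.py", "pandas/_libs/lib.pyx"]
    print("rebuild needed" if needs_rebuild(changed) else "no rebuild needed")
```

When in doubt, rebuilding is the safe choice, because a stale extension can produce confusing import errors.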