Contributing to pandas#

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

Bug reports and enhancement requests#

Bug reports and enhancement requests are an important part of making pandas more stable and are curated through GitHub issues. When reporting an issue or request, please select the appropriate category and fill out the issue form completely so that others and the core development team can understand the scope of the issue.

The issue will then show up to the pandas community and be open to comments/ideas from others.

Finding an issue to contribute to#

If you are brand new to pandas or open-source development, we recommend searching the GitHub “issues” tab to find issues that interest you. Unassigned issues labeled Docs and good first issue are typically good for newer contributors.

Once you’ve found an interesting issue, it’s a good idea to assign the issue to yourself, so nobody else duplicates the work on it. On the GitHub issue, add a comment with the exact text take to automatically assign the issue to yourself (this takes a few seconds and may require refreshing the page to see it).

If for whatever reason you are not able to continue working on the issue, please unassign it, so other people know it’s available again. You can also check the list of assigned issues, since people may not be working on them anymore. If you want to work on one that is assigned, feel free to kindly ask the current assignee if you can take it (please allow at least a week of inactivity before considering work on the issue discontinued).

We have several communication channels for the contributor community, which you are welcome to join and use to ask questions as you figure things out. Among them are regular meetings for new contributors, dev meetings, a dev mailing list, and a Slack for the contributor community. All pandas contributors are welcome to these spaces, where they can connect with each other. Even maintainers who have been with us for a long time felt just like you when they started out, and are happy to welcome you and support you as you get to know how we work, and where things are. Take a look at the next sections to learn more.

Submitting a pull request#

Version control, Git, and GitHub#

pandas is hosted on GitHub, and to contribute, you will need to sign up for a free GitHub account. We use Git for version control to allow many people to work together on the project.

If you are new to Git, you can reference some of these resources for learning Git. Feel free to reach out to the contributor community for help if needed:

Also, the project follows a forking workflow further described on this page whereby contributors fork the repository, make changes and then create a pull request. So please be sure to read and follow all the instructions in this guide.

If you are new to contributing to projects through forking on GitHub, take a look at the GitHub documentation for contributing to projects. GitHub provides a quick tutorial using a test repository that may help you become more familiar with forking a repository, cloning a fork, creating a feature branch, pushing changes and making pull requests.

Below are some useful resources for learning more about forking and pull requests on GitHub:

Getting started with Git#

GitHub has instructions for installing git, setting up your SSH key, and configuring git. All these steps need to be completed before you can work seamlessly between your local repository and GitHub.

Create a fork of pandas#

You will need your own copy of pandas (aka fork) to work on the code. Go to the pandas project page and hit the Fork button. Please uncheck the box to copy only the main branch before selecting Create Fork. Then clone your fork to your machine:

git clone https://github.com/your-user-name/pandas.git pandas-yourname
cd pandas-yourname
git remote add upstream https://github.com/pandas-dev/pandas.git
git fetch upstream

This creates the directory pandas-yourname and connects your repository to the upstream (main project) pandas repository.

Note

Performing a shallow clone (with --depth=N, for some N greater than or equal to 1) might break some tests and features such as pd.show_versions(), because the version number can no longer be computed.
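
If you are unsure whether an existing clone is shallow, the following minimal sketch may help. It assumes a standard checkout in which .git is a directory, and it relies on git recording shallow-clone boundaries in a file named shallow inside that directory. The function name looks_shallow is hypothetical and not part of pandas or git, and the demonstration runs on a throwaway directory rather than a real clone.

```python
import tempfile
from pathlib import Path


def looks_shallow(repo_dir):
    """Return True if repo_dir contains a .git/shallow file.

    Assumption: a regular (non-worktree, non-bare) clone where .git is a
    directory. git writes this file when history has been truncated.
    """
    return (Path(repo_dir) / ".git" / "shallow").is_file()


# Demonstration on a temporary directory that only mimics the layout.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    git_dir.mkdir()
    before = looks_shallow(tmp)  # no shallow file yet
    (git_dir / "shallow").write_text("0" * 40 + "\n")  # placeholder commit id
    after = looks_shallow(tmp)

print(before, after)
```

On a real clone you would call looks_shallow("pandas-yourname"). If it reports a shallow clone, running git fetch --unshallow is the usual way to retrieve the full history.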

Creating a feature branch#

Your local main branch should always reflect the current state of the pandas repository. First, ensure it is up to date with the main pandas repository:

git checkout main
git pull upstream main --ff-only

Then, create a feature branch for making your changes. For example

git checkout -b shiny-new-feature

This changes your working branch from main to the shiny-new-feature branch. Keep any changes in this branch specific to one bug or feature so it is clear what the branch brings to pandas. You can have many feature branches and switch between them using the git checkout command.

When you want to update the feature branch with changes in main after you created the branch, check the section on updating a PR.

Making code changes#

Before modifying any code, ensure you follow the contributing environment guidelines to set up an appropriate development environment.

Then, once you have made code changes, you can see all the changes you’ve currently made by running:

git status

For files you intended to modify or add, run:

git add path/to/file-to-be-added-or-changed.py

Running git status again should display

On branch shiny-new-feature

     modified:   /relative/path/to/file-to-be-added-or-changed.py

Finally, commit your changes to your local repository with an explanatory commit message

git commit -m "your commit message goes here"

Pushing your changes#

When you want your changes to appear publicly on your GitHub page, push your forked feature branch’s commits

git push origin shiny-new-feature

Here origin is the default name given to your remote repository on GitHub. You can list the remote repositories with:

git remote -v

If you added the upstream repository as described above you will see something like

origin  git@github.com:yourname/pandas.git (fetch)
origin  git@github.com:yourname/pandas.git (push)
upstream        git://github.com/pandas-dev/pandas.git (fetch)
upstream        git://github.com/pandas-dev/pandas.git (push)
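
To check the remote setup programmatically, the listing printed by git remote -v can be parsed with a few lines of Python. This is a minimal sketch, not part of the pandas tooling: the function name parse_remotes is hypothetical, and the sample text reuses the HTTPS URLs from the clone step earlier in this guide instead of calling git.

```python
def parse_remotes(remote_output):
    """Map each remote name to its {"fetch": url, "push": url} entries."""
    remotes = {}
    for line in remote_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, url, kind = parts
        # kind looks like "(fetch)" or "(push)"; drop the parentheses
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


# Sample text in the format printed by `git remote -v` (illustrative URLs).
sample = """\
origin  https://github.com/your-user-name/pandas.git (fetch)
origin  https://github.com/your-user-name/pandas.git (push)
upstream        https://github.com/pandas-dev/pandas.git (fetch)
upstream        https://github.com/pandas-dev/pandas.git (push)
"""

remotes = parse_remotes(sample)
has_upstream = "upstream" in remotes
print(sorted(remotes), has_upstream)
```

In a real checkout, the text passed to parse_remotes could come from subprocess.run(["git", "remote", "-v"], capture_output=True, text=True).stdout. If upstream is missing, add it as described in the section on creating a fork.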

Now your code is on GitHub, but it is not yet a part of the pandas project. For that to happen, a pull request needs to be submitted on GitHub.

Making a pull request#

Once you have finished your code changes, they will need to follow the pandas contribution guidelines to be accepted.

If everything looks good, you are ready to make a pull request. A pull request is how code from your local repository becomes available to the GitHub community to be reviewed and merged into the project so that it appears in the next release. To submit a pull request:

  1. Navigate to your repository on GitHub

  2. Click on the Compare & pull request button

  3. You can then click on Commits and Files Changed to make sure everything looks okay one last time

  4. Write a descriptive title that includes prefixes. pandas uses a convention for title prefixes. Here are some common ones along with general guidelines for when to use them:

    • ENH: Enhancement, new functionality

    • BUG: Bug fix

    • DOC: Additions/updates to documentation

    • TST: Additions/updates to tests

    • BLD: Updates to the build process/scripts

    • PERF: Performance improvement

    • TYP: Type annotations

    • CLN: Code cleanup

  5. Write a description of your changes in the Preview Discussion tab

  6. Click Send Pull Request.

This request then goes to the repository maintainers, and they will review the code.
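
The title prefix convention from step 4 can be checked before you open the pull request. The snippet below is a minimal sketch under stated assumptions: check_pr_title is a hypothetical helper, not a pandas utility, and its table contains only the common prefixes listed above, so a prefix missing from it is not necessarily invalid.

```python
from typing import Optional

# Prefixes copied from the list in this guide. The guide calls these
# "common ones", so this table is illustrative rather than exhaustive.
COMMON_PREFIXES = {
    "ENH": "Enhancement, new functionality",
    "BUG": "Bug fix",
    "DOC": "Additions/updates to documentation",
    "TST": "Additions/updates to tests",
    "BLD": "Updates to the build process/scripts",
    "PERF": "Performance improvement",
    "TYP": "Type annotations",
    "CLN": "Code cleanup",
}


def check_pr_title(title: str) -> Optional[str]:
    """Return the recognized prefix of a PR title, or None if there is none.

    A title counts as prefixed when it has the form "PREFIX: description",
    with PREFIX in COMMON_PREFIXES and a non-empty description.
    """
    prefix, sep, rest = title.partition(":")
    if sep and prefix in COMMON_PREFIXES and rest.strip():
        return prefix
    return None


print(check_pr_title("BUG: fix off-by-one in rolling window"))
print(check_pr_title("fix stuff"))
```

The example titles are made up for illustration. The first one has a recognized prefix and the second does not, so the helper returns None for it.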

Updating your pull request#

Based on the review you get on your pull request, you will probably need to make some changes to the code. You can follow the code committing steps again to address any feedback and update your pull request.

It is also important that updates in the pandas main branch are reflected in your pull request. To update your feature branch with changes in the pandas main branch, run:

git checkout shiny-new-feature
git fetch upstream
git merge upstream/main

If there are no conflicts (or they could be fixed automatically), a file with a default commit message will open, and you can simply save and quit this file.

If there are merge conflicts, you need to resolve them. See, for example, https://help.github.com/articles/resolving-a-merge-conflict-using-the-command-line/ for an explanation of how to do this.

Once the conflicts are resolved, run:

  1. git add -u to stage any files you’ve updated;

  2. git commit to finish the merge.

Note

If you have uncommitted changes at the moment you want to update the branch with main, you will need to stash them prior to updating (see the stash docs). This will effectively store your changes and they can be reapplied after updating.

After the feature branch has been updated locally, you can update your pull request by pushing to the branch on GitHub:

git push origin shiny-new-feature

Any git push will automatically update your pull request with your branch’s changes and restart the Continuous Integration checks.

Updating the development environment#

It is important to periodically update your local main branch with updates from the pandas main branch and update your development environment to reflect any changes to the various packages that are used during development.

If using mamba, run:

git checkout main
git fetch upstream
git merge upstream/main
mamba activate pandas-dev
mamba env update -f environment.yml --prune

If using pip, run:

git checkout main
git fetch upstream
git merge upstream/main
# activate the virtual environment based on your platform
python -m pip install --upgrade -r requirements-dev.txt

Tips for a successful pull request#

If you have made it to the Making a pull request phase, one of the core contributors may take a look. Please note however that a handful of people are responsible for reviewing all of the contributions, which can often lead to bottlenecks.

To improve the chances of your pull request being reviewed, you should:

  • Reference an open issue for non-trivial changes to clarify the PR’s purpose

  • Ensure you have appropriate tests. These should be the first part of any PR

  • Keep your pull requests as simple as possible. Larger PRs take longer to review

  • Ensure that CI is in a green state. Reviewers may not even look otherwise

  • Keep Updating your pull request, either by request or every few days