pandas.DataFrame.query

DataFrame.query(expr, *, inplace=False, **kwargs)

Query the columns of a DataFrame with a boolean expression.

This method can run arbitrary code, which makes you vulnerable to code injection if you pass user input to it.
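One way to reduce this risk is to keep untrusted values out of the expression string and reference them with the @ prefix instead. The following is a minimal sketch, not a complete defence; the name user_value is hypothetical and stands in for input that arrives from outside the program.

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 6), "B": range(10, 0, -2)})

# Hypothetical untrusted input; in real code this might come from a form or CLI.
user_value = "3"

# Convert and validate first, so only a plain integer reaches the query.
threshold = int(user_value)

# The expression string is a fixed literal; the value is referenced with @
# rather than being formatted into the string.
safe_result = df.query("A <= @threshold")
print(safe_result)
```

Building the expression with string formatting (for example an f-string containing user_value) would let the input become part of the evaluated code, which is the situation the warning above describes.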

Parameters:
expr : str

The query string to evaluate.

See the documentation for eval() for details of supported operations and functions in the query string.

See the documentation for DataFrame.eval() for details on referring to column names and variables in the query string.

inplace : bool, default False

Whether to modify the DataFrame in place rather than creating a new one.

**kwargs

See the documentation for eval() for complete details on the keyword arguments accepted by DataFrame.query().

Returns:
DataFrame or None

DataFrame resulting from the provided query expression or None if inplace=True.
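The difference between the two return values can be sketched as follows. The frame and the names filtered, df_copy and returned are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 6), "B": range(10, 0, -2)})

# Default (inplace=False): a new DataFrame is returned and df is unchanged.
filtered = df.query("A > 3")

# inplace=True: the rows are filtered in df_copy itself and None is returned.
df_copy = df.copy()
returned = df_copy.query("A > 3", inplace=True)

print(len(df), len(filtered), returned, len(df_copy))
```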

See also

eval

Evaluate a string describing operations on DataFrame columns.

DataFrame.eval

Evaluate a string describing operations on DataFrame columns.

Notes

The result of the evaluation of this expression is first passed to DataFrame.loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to DataFrame.__getitem__().

This method uses the top-level eval() function to evaluate the passed query.

The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, the and and or keywords. The expression is still syntactically valid Python, but its semantics are different.
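As a small illustration of this precedence change, the sketch below compares a query string that uses & without parentheses against the equivalent boolean mask written in ordinary Python. The frame is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 6), "B": range(10, 0, -2)})

# In a query string, & binds like "and", so no parentheses are needed
# around the two comparisons.
via_query = df.query("A > 1 & B < 8")

# In plain Python, & binds more tightly than the comparisons, so the
# parentheses are required to express the same filter.
via_mask = df[(df["A"] > 1) & (df["B"] < 8)]

print(via_query.equals(via_mask))
```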

You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.
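A brief sketch of both keyword arguments follows. It assumes that, under parser='python', each comparison has to be parenthesised explicitly because & keeps its ordinary Python precedence; the frame is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 6), "B": range(10, 0, -2)})

# engine="python" evaluates the expression without numexpr;
# the default pandas parser is still used here.
python_engine = df.query("A > 1 & B < 8", engine="python")

# With parser="python", & has its usual Python precedence, so each
# comparison is wrapped in parentheses.
python_parser = df.query("(A > 1) & (B < 8)", parser="python", engine="python")

print(python_engine.equals(python_parser))
```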

The DataFrame.index and DataFrame.columns attributes of the DataFrame instance are placed in the query namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for the frame index; you can also use the name of the index to identify it in a query. Please note that Python keywords may not be used as identifiers.
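For instance, the index can be filtered either through the identifier index or through its own name. In this sketch the index name row_id is a made-up example.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": range(1, 6), "B": range(10, 0, -2)},
    index=pd.Index([10, 20, 30, 40, 50], name="row_id"),
)

# The identifier "index" refers to the frame's index.
by_keyword = df.query("index >= 40")

# A named index can also be referenced by its name.
by_name = df.query("row_id >= 40")

# Index and column conditions can be combined in one expression.
combined = df.query("index >= 20 and A < 4")

print(combined)
```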

For further details and examples see the query documentation in indexing.

Backtick quoted variables

Backtick quoted variables are parsed as literal Python code and are converted internally to a valid Python identifier. This can lead to the following problems.

During parsing a number of disallowed characters inside the backtick quoted string are replaced by strings that are allowed as a Python identifier. These characters include all operators in Python, the space character, the question mark, the exclamation mark, the dollar sign, and the euro sign.

A literal backtick inside a backtick quoted name can be escaped by doubling it.

See also the Python documentation about lexical analysis in combination with the source code in pandas.core.computation.parsing.
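As a short sketch of backtick quoting with a space in the column name, the following compares a query against the equivalent boolean mask. The column names "unit price" and "qty" are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"unit price": [1.5, 2.0, 3.5], "qty": [10, 0, 4]})

# "unit price" is not a valid Python identifier, so it is wrapped in backticks.
via_query = df.query("`unit price` > 1.75 and qty > 0")

# The same filter written with ordinary boolean indexing.
via_mask = df[(df["unit price"] > 1.75) & (df["qty"] > 0)]

print(via_query.equals(via_mask))
```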

Examples

>>> df = pd.DataFrame(
...     {"A": range(1, 6), "B": range(10, 0, -2), "C&C": range(10, 5, -1)}
... )
>>> df
   A   B  C&C
0  1  10   10
1  2   8    9
2  3   6    8
3  4   4    7
4  5   2    6
>>> df.query("A > B")
   A  B  C&C
4  5  2    6

The previous expression is equivalent to

>>> df[df.A > df.B]
   A  B  C&C
4  5  2    6

For columns whose names are not valid Python identifiers (for example, names containing spaces or operators such as &), you can use backtick quoting.

>>> df.query("B == `C&C`")
   A   B  C&C
0  1  10   10

The previous expression is equivalent to

>>> df[df.B == df["C&C"]]
   A   B  C&C
0  1  10   10

Using a local variable, referenced with the @ prefix:

>>> local_var = 2
>>> df.query("A <= @local_var")
   A   B  C&C
0  1  10   10
1  2   8    9