pandas.DataFrame.query
DataFrame.query(self, expr, inplace=False, **kwargs)
Query the columns of a DataFrame with a boolean expression.
Parameters:
- expr : str
  The query string to evaluate. You can refer to variables in the environment by prefixing them with an ‘@’ character like @a + b.
  New in version 0.25.0.
  You can refer to column names that contain spaces by surrounding them in backticks.
  For example, if one of your columns is called a a and you want to sum it with b, your query should be `a a` + b.
- inplace : bool
  Whether the query should modify the data in place or return a modified copy.
- **kwargs
  See the documentation for eval() for complete details on the keyword arguments accepted by DataFrame.query().
  New in version 0.18.0.
Returns:
- DataFrame
  DataFrame resulting from the provided query expression.
See also
eval - Evaluate a string describing operations on DataFrame columns.
DataFrame.eval - Evaluate a string describing operations on DataFrame columns.
Notes
The result of the evaluation of this expression is first passed to DataFrame.loc, and if that fails because of a multidimensional key (e.g., a DataFrame), the result is then passed to DataFrame.__getitem__().
This method uses the top-level eval() function to evaluate the passed query.
The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins (and, or). This is syntactically valid Python; however, the semantics are different.
You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.
The DataFrame.index and DataFrame.columns attributes of the DataFrame instance are placed in the query namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for the frame index; you can also use the name of the index to identify it in a query. Please note that Python keywords may not be used as identifiers.
For further details and examples see the query documentation in indexing.
Examples
>>> df = pd.DataFrame({'A': range(1, 6),
...                    'B': range(10, 0, -2),
...                    'C C': range(10, 5, -1)})
>>> df
   A   B  C C
0  1  10   10
1  2   8    9
2  3   6    8
3  4   4    7
4  5   2    6
>>> df.query('A > B')
   A  B  C C
4  5  2    6
The previous expression is equivalent to
>>> df[df.A > df.B]
   A  B  C C
4  5  2    6
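As a further sketch, the snippet below illustrates two points from the Parameters and Notes sections: referring to a variable in the environment with ‘@’, and combining conditions with & under the default parser, where & takes the precedence of and. The variable name threshold and the particular conditions are illustrative assumptions, not part of the original examples.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 6),
                   'B': range(10, 0, -2),
                   'C C': range(10, 5, -1)})

# `threshold` is a hypothetical local variable; '@' pulls it into the query.
threshold = 3
above = df.query('A > @threshold')

# With the default parser, & has the precedence of `and`,
# so the two comparisons do not need parentheses.
both = df.query('A > 1 & B > 4')

# Ordinary boolean indexing requires the parentheses explicitly.
both_explicit = df[(df.A > 1) & (df.B > 4)]
```

Here both and both_explicit should select the same rows; the query form simply avoids the parentheses that standard Python operator precedence would require.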
For columns with spaces in their name, you can use backtick quoting.
>>> df.query('B == `C C`')
   A   B  C C
0  1  10   10
The previous expression is equivalent to
>>> df[df.B == df['C C']]
   A   B  C C
0  1  10   10
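The Notes mention that the identifier index refers to the frame index, and the inplace parameter controls whether the frame is modified directly. A minimal sketch of both, assuming a small frame with a default integer index (the names tail, filtered and result are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})

# `index` inside the query string refers to the frame's index.
tail = df.query('index > 2')

# With inplace=True the frame itself is filtered and the call returns None.
# A copy is used here so that `df` stays unchanged.
filtered = df.copy()
result = filtered.query('A > B', inplace=True)
```

Because the in-place call returns None, assigning its result (as done above for demonstration) is usually a mistake in real code; use either the returned copy or inplace=True, not both.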