pandas.merge_asof¶
- 
pandas.merge_asof(left, right, on=None, left_on=None, right_on=None, left_index=False, right_index=False, by=None, left_by=None, right_by=None, suffixes=('_x', '_y'), tolerance=None, allow_exact_matches=True, direction='backward')[source]¶
- Perform an asof merge. This is similar to a left-join except that we match on nearest key rather than equal keys. - Both DataFrames must be sorted by the key. - For each row in the left DataFrame: - A “backward” search selects the last row in the right DataFrame whose ‘on’ key is less than or equal to the left’s key.
- A “forward” search selects the first row in the right DataFrame whose ‘on’ key is greater than or equal to the left’s key.
- A “nearest” search selects the row in the right DataFrame whose ‘on’ key is closest in absolute distance to the left’s key.
 - The default is “backward” and is compatible in versions below 0.20.0. The direction parameter was added in version 0.20.0 and introduces “forward” and “nearest”. - Optionally match on equivalent keys with ‘by’ before searching with ‘on’. - New in version 0.19.0. - Parameters: - left : DataFrame - right : DataFrame - on : label - Field name to join on. Must be found in both DataFrames. The data MUST be ordered. Furthermore this must be a numeric column, such as datetimelike, integer, or float. On or left_on/right_on must be given. - left_on : label - Field name to join on in left DataFrame. - right_on : label - Field name to join on in right DataFrame. - left_index : boolean - Use the index of the left DataFrame as the join key. - New in version 0.19.2. - right_index : boolean - Use the index of the right DataFrame as the join key. - New in version 0.19.2. - by : column name or list of column names - Match on these columns before performing merge operation. - left_by : column name - Field names to match on in the left DataFrame. - New in version 0.19.2. - right_by : column name - Field names to match on in the right DataFrame. - New in version 0.19.2. - suffixes : 2-length sequence (tuple, list, ...) - Suffix to apply to overlapping column names in the left and right side, respectively. - tolerance : integer or Timedelta, optional, default None - Select asof tolerance within this range; must be compatible with the merge index. - allow_exact_matches : boolean, default True - If True, allow matching with the same ‘on’ value (i.e. less-than-or-equal-to / greater-than-or-equal-to)
- If False, don’t match the same ‘on’ value (i.e., stricly less-than / strictly greater-than)
 - direction : ‘backward’ (default), ‘forward’, or ‘nearest’ - Whether to search for prior, subsequent, or closest matches. - New in version 0.20.0. - Returns: - merged : DataFrame - See also - Examples - >>> left a left_val 0 1 a 1 5 b 2 10 c - >>> right a right_val 0 1 1 1 2 2 2 3 3 3 6 6 4 7 7 - >>> pd.merge_asof(left, right, on='a') a left_val right_val 0 1 a 1 1 5 b 3 2 10 c 7 - >>> pd.merge_asof(left, right, on='a', allow_exact_matches=False) a left_val right_val 0 1 a NaN 1 5 b 3.0 2 10 c 7.0 - >>> pd.merge_asof(left, right, on='a', direction='forward') a left_val right_val 0 1 a 1.0 1 5 b 6.0 2 10 c NaN - >>> pd.merge_asof(left, right, on='a', direction='nearest') a left_val right_val 0 1 a 1 1 5 b 6 2 10 c 7 - We can use indexed DataFrames as well. - >>> left left_val 1 a 5 b 10 c - >>> right right_val 1 1 2 2 3 3 6 6 7 7 - >>> pd.merge_asof(left, right, left_index=True, right_index=True) left_val right_val 1 a 1 5 b 3 10 c 7 - Here is a real-world times-series example - >>> quotes time ticker bid ask 0 2016-05-25 13:30:00.023 GOOG 720.50 720.93 1 2016-05-25 13:30:00.023 MSFT 51.95 51.96 2 2016-05-25 13:30:00.030 MSFT 51.97 51.98 3 2016-05-25 13:30:00.041 MSFT 51.99 52.00 4 2016-05-25 13:30:00.048 GOOG 720.50 720.93 5 2016-05-25 13:30:00.049 AAPL 97.99 98.01 6 2016-05-25 13:30:00.072 GOOG 720.50 720.88 7 2016-05-25 13:30:00.075 MSFT 52.01 52.03 - >>> trades time ticker price quantity 0 2016-05-25 13:30:00.023 MSFT 51.95 75 1 2016-05-25 13:30:00.038 MSFT 51.95 155 2 2016-05-25 13:30:00.048 GOOG 720.77 100 3 2016-05-25 13:30:00.048 GOOG 720.92 100 4 2016-05-25 13:30:00.048 AAPL 98.00 100 - By default we are taking the asof of the quotes - >>> pd.merge_asof(trades, quotes, ... on='time', ... by='ticker') time ticker price quantity bid ask 0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96 1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98 2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93 3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93 4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN - We only asof within 2ms betwen the quote time and the trade time - >>> pd.merge_asof(trades, quotes, ... on='time', ... by='ticker', ... tolerance=pd.Timedelta('2ms')) time ticker price quantity bid ask 0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96 1 2016-05-25 13:30:00.038 MSFT 51.95 155 NaN NaN 2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93 3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93 4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN - We only asof within 10ms betwen the quote time and the trade time and we exclude exact matches on time. However prior data will propogate forward - >>> pd.merge_asof(trades, quotes, ... on='time', ... by='ticker', ... tolerance=pd.Timedelta('10ms'), ... allow_exact_matches=False) time ticker price quantity bid ask 0 2016-05-25 13:30:00.023 MSFT 51.95 75 NaN NaN 1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98 2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93 3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93 4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN