pandas.merge_asof¶
-
pandas.
merge_asof
(left, right, on=None, left_on=None, right_on=None, by=None, suffixes=('_x', '_y'), tolerance=None, allow_exact_matches=True)[source]¶ Perform an asof merge. This is similar to a left-join except that we match on nearest key rather than equal keys.
For each row in the left DataFrame, we select the last row in the right DataFrame whose ‘on’ key is less than or equal to the left’s key. Both DataFrames must be sorted by the key.
Optionally perform group-wise merge. This searches for the nearest match on the ‘on’ key within the same group according to ‘by’.
New in version 0.19.0.
Parameters: left : DataFrame
right : DataFrame
on : label
Field name to join on. Must be found in both DataFrames. The data MUST be ordered. Furthermore this must be a numeric column, such as datetimelike, integer, or float. On or left_on/right_on must be given.
left_on : label
Field name to join on in left DataFrame.
right_on : label
Field name to join on in right DataFrame.
by : column name
Group both the left and right DataFrames by the group column; perform the merge operation on these pieces and recombine.
suffixes : 2-length sequence (tuple, list, ...)
Suffix to apply to overlapping column names in the left and right side, respectively
tolerance : integer or Timedelta, optional, default None
select asof tolerance within this range; must be compatible to the merge index.
allow_exact_matches : boolean, default True
- If True, allow matching the same ‘on’ value (i.e. less-than-or-equal-to)
- If False, don’t match the same ‘on’ value (i.e., stricly less-than)
Returns: merged : DataFrame
See also
Examples
>>> left a left_val 0 1 a 1 5 b 2 10 c
>>> right a right_val 0 1 1 1 2 2 2 3 3 3 6 6 4 7 7
>>> pd.merge_asof(left, right, on='a') a left_val right_val 0 1 a 1 1 5 b 3 2 10 c 7
>>> pd.merge_asof(left, right, on='a', allow_exact_matches=False) a left_val right_val 0 1 a NaN 1 5 b 3.0 2 10 c 7.0
For this example, we can achieve a similar result thru
pd.merge_ordered()
, though its not nearly as performant.>>> (pd.merge_ordered(left, right, on='a') ... .ffill() ... .drop_duplicates(['left_val']) ... ) a left_val right_val 0 1 a 1.0 3 5 b 3.0 6 10 c 7.0
Here is a real-world times-series example
>>> quotes time ticker bid ask 0 2016-05-25 13:30:00.023 GOOG 720.50 720.93 1 2016-05-25 13:30:00.023 MSFT 51.95 51.96 2 2016-05-25 13:30:00.030 MSFT 51.97 51.98 3 2016-05-25 13:30:00.041 MSFT 51.99 52.00 4 2016-05-25 13:30:00.048 GOOG 720.50 720.93 5 2016-05-25 13:30:00.049 AAPL 97.99 98.01 6 2016-05-25 13:30:00.072 GOOG 720.50 720.88 7 2016-05-25 13:30:00.075 MSFT 52.01 52.03
>>> trades time ticker price quantity 0 2016-05-25 13:30:00.023 MSFT 51.95 75 1 2016-05-25 13:30:00.038 MSFT 51.95 155 2 2016-05-25 13:30:00.048 GOOG 720.77 100 3 2016-05-25 13:30:00.048 GOOG 720.92 100 4 2016-05-25 13:30:00.048 AAPL 98.00 100
By default we are taking the asof of the quotes
>>> pd.asof_merge(trades, quotes, ... on='time', ... by='ticker') time ticker price quantity bid ask 0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96 1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98 2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93 3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93 4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
We only asof within 2ms betwen the quote time and the trade time
>>> pd.asof_merge(trades, quotes, ... on='time', ... by='ticker', ... tolerance=pd.Timedelta('2ms')) time ticker price quantity bid ask 0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96 1 2016-05-25 13:30:00.038 MSFT 51.95 155 NaN NaN 2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93 3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93 4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
We only asof within 10ms betwen the quote time and the trade time and we exclude exact matches on time. However prior data will propogate forward
>>> pd.asof_merge(trades, quotes, ... on='time', ... by='ticker', ... tolerance=pd.Timedelta('10ms'), ... allow_exact_matches=False) time ticker price quantity bid ask 0 2016-05-25 13:30:00.023 MSFT 51.95 75 NaN NaN 1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98 2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93 3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93 4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN