pandas.to_datetime
- pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)
- Convert argument to datetime.
- This function converts a scalar, array-like, Series or DataFrame/dict-like to a pandas datetime object.
- Parameters
- arg : int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like
- The object to convert to a datetime. If a DataFrame is provided, the method expects at minimum the following columns: "year", "month", "day".
- errors : {'ignore', 'raise', 'coerce'}, default 'raise'
- If 'raise', then invalid parsing will raise an exception.
- If 'coerce', then invalid parsing will be set as NaT.
- If 'ignore', then invalid parsing will return the input.
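The difference between 'coerce' and 'raise' can be sketched as follows. This is a minimal illustration with made-up sample strings, assuming pandas is imported as pd; it catches ValueError on the assumption that the parsing errors pandas raises are ValueError subclasses.

```python
import pandas as pd

# Hypothetical sample data: one valid date string and one unparseable entry.
values = ["2021-03-04", "not a date"]

# errors='coerce' replaces the unparseable entry with NaT.
coerced = pd.to_datetime(values, errors="coerce")

# errors='raise' (the default) raises instead of returning a result.
try:
    pd.to_datetime(values, errors="raise")
    raised = False
except ValueError:
    raised = True
```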
 
- dayfirst : bool, default False
- Specify a date parse order if arg is str or is list-like. If True, parses dates with the day first, e.g. "10/11/12" is parsed as 2012-11-10.
- Warning: dayfirst=True is not strict, but will prefer to parse with day first. If a delimited date string cannot be parsed in accordance with the given dayfirst option, e.g. to_datetime(['31-12-2021']), then a warning will be shown.
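A small sketch of how dayfirst changes the reading of an ambiguous string. The sample value is chosen for illustration, with a four-digit year so that only the day/month order is in question.

```python
import pandas as pd

# An ambiguous string: both leading fields are valid as either day or month.
ambiguous = "10/11/2012"

# Default behaviour: the first field is read as the month.
month_first = pd.to_datetime(ambiguous)

# dayfirst=True: the first field is read as the day.
day_first = pd.to_datetime(ambiguous, dayfirst=True)
```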
- yearfirst : bool, default False
- Specify a date parse order if arg is str or is list-like.
- If True, parses dates with the year first, e.g. "10/11/12" is parsed as 2010-11-12.
- If both dayfirst and yearfirst are True, yearfirst takes precedence (same as dateutil).
- Warning: yearfirst=True is not strict, but will prefer to parse with year first.
- utc : bool, default None
- Control timezone-related parsing, localization and conversion.
- If True, the function always returns a timezone-aware UTC-localized Timestamp, Series or DatetimeIndex. To do this, timezone-naive inputs are localized as UTC, while timezone-aware inputs are converted to UTC.
- If False or left as None (the default), inputs will not be coerced to UTC. Timezone-naive inputs will remain naive, while timezone-aware ones will keep their time offsets. Limitations exist for mixed offsets (typically, daylight saving time); see the Examples section for details.
- See also: pandas general documentation about timezone conversion and localization.
- format : str, default None
- The strftime format used to parse time, e.g. "%d/%m/%Y". Note that "%f" will parse all the way up to nanoseconds. See the strftime documentation for more information on choices.
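As an illustration of an explicit format, including the note about "%f", a short sketch with made-up input strings:

```python
import pandas as pd

# Parse a day-first date with an explicit strftime pattern.
ts = pd.to_datetime("25/12/2021", format="%d/%m/%Y")

# "%f" accepts up to nine fractional digits, i.e. nanosecond resolution.
ts_ns = pd.to_datetime("2021-12-25 10:00:00.123456789",
                       format="%Y-%m-%d %H:%M:%S.%f")
```

The nanosecond part is available separately from the microsecond part, through ts_ns.microsecond and ts_ns.nanosecond.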
- exact : bool, default True
- Control how format is used:
- If True, require an exact format match.
- If False, allow the format to match anywhere in the target string.
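A hedged sketch of the two modes. The input string is hypothetical, and only the case of extra text after the date is shown here.

```python
import pandas as pd

# Hypothetical input: a date followed by extra text.
text = "25/12/2021 (provisional)"

# exact=False lets the format match a part of the string.
loose = pd.to_datetime(text, format="%d/%m/%Y", exact=False)

# exact=True (the default) rejects the leftover text.
try:
    pd.to_datetime(text, format="%d/%m/%Y", exact=True)
    strict_failed = False
except ValueError:
    strict_failed = True
```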
 
- unit : str, default 'ns'
- The unit of arg (D, s, ms, us, ns), where arg is an integer or float number. The value is interpreted relative to origin. For example, with unit='ms' and origin='unix' (the default), arg is read as the number of milliseconds since the start of the Unix epoch.
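To make the role of unit concrete, the sketch below reads the same integer under two different units, both relative to the default Unix origin. The number is arbitrary.

```python
import pandas as pd

# 1500 milliseconds after the Unix epoch (1970-01-01 00:00:00).
from_ms = pd.to_datetime(1500, unit="ms")

# The same integer read as whole days instead.
from_days = pd.to_datetime(1500, unit="D")
```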
- infer_datetime_format : bool, default False
- If True and no format is given, attempt to infer the format of the datetime strings based on the first non-NaN element, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by ~5-10x.
- origin : scalar, default 'unix'
- Define the reference date. The numeric values are parsed as the number of units (defined by unit) since this reference date.
- If 'unix' (or POSIX) time, origin is set to 1970-01-01.
- If 'julian', unit must be 'D', and origin is set to the beginning of the Julian Calendar. Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC.
- If Timestamp convertible, origin is set to the Timestamp identified by origin.
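The origin options above can be sketched as follows. The reference date and offsets are illustrative; the Julian example relies on Julian day number 2440587.5 corresponding to the Unix epoch.

```python
import pandas as pd

# Days counted from a custom reference date.
custom = pd.to_datetime(10, unit="D", origin=pd.Timestamp("2000-01-01"))

# Julian day number 2440587.5 corresponds to 1970-01-01 00:00:00.
julian = pd.to_datetime(2440587.5, unit="D", origin="julian")
```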
 
- cache : bool, default True
- If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets. The cache is only used when there are at least 50 values. The presence of out-of-bounds values will render the cache unusable and may slow down parsing.
- Changed in version 0.25.0: changed default value from False to True.
 
- Returns
- datetime
- If parsing succeeded. Return type depends on input (types in parentheses correspond to the fallback in case of unsuccessful timezone or out-of-range timestamp parsing):
- scalar: Timestamp (or datetime.datetime)
- array-like: DatetimeIndex (or Index with object dtype containing datetime.datetime)
- Series: Series of datetime64 dtype (or Series of object dtype containing datetime.datetime)
- DataFrame: Series of datetime64 dtype (or Series of object dtype containing datetime.datetime)
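A quick way to see the mapping from input type to return type is to convert one value of each kind. The inputs below are illustrative and only exercise the non-fallback cases.

```python
import pandas as pd

# scalar -> Timestamp
scalar_out = pd.to_datetime("2021-01-01")

# array-like -> DatetimeIndex
list_out = pd.to_datetime(["2021-01-01", "2021-01-02"])

# Series -> Series with a datetime64 dtype
series_out = pd.to_datetime(pd.Series(["2021-01-01", "2021-01-02"]))

# DataFrame -> Series with a datetime64 dtype
frame_out = pd.to_datetime(
    pd.DataFrame({"year": [2021], "month": [1], "day": [1]})
)
```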
 
 
- Raises
- ParserError
- When parsing a date from string fails. 
- ValueError
- When another datetime conversion error happens. For example when one of the 'year', 'month', 'day' columns is missing in a DataFrame, or when a timezone-aware datetime.datetime is found in an array-like of mixed time offsets, and utc=False.
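The missing-column case can be reproduced with a small, hypothetical frame that lacks the 'day' column:

```python
import pandas as pd

# Hypothetical frame without the required 'day' column.
incomplete = pd.DataFrame({"year": [2021], "month": [6]})

try:
    pd.to_datetime(incomplete)
    error_message = None
except ValueError as exc:
    # Keep the message so the missing column can be inspected.
    error_message = str(exc)
```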
 
- See also
- DataFrame.astype
- Cast argument to a specified dtype. 
- to_timedelta
- Convert argument to timedelta. 
- convert_dtypes
- Convert dtypes. 
- Notes
- Many input types are supported, and lead to different output types:
- scalars can be int, float, str, or a datetime object (from the stdlib datetime module or numpy). They are converted to Timestamp when possible, otherwise they are converted to datetime.datetime. None/NaN/null scalars are converted to NaT.
- array-like can contain int, float, str, datetime objects. They are converted to DatetimeIndex when possible, otherwise they are converted to Index with object dtype, containing datetime.datetime. None/NaN/null entries are converted to NaT in both cases.
- Series are converted to Series with datetime64 dtype when possible, otherwise they are converted to Series with object dtype, containing datetime.datetime. None/NaN/null entries are converted to NaT in both cases.
- DataFrame/dict-like are converted to Series with datetime64 dtype. For each row a datetime is created by assembling the various DataFrame columns. Column keys can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same.
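Two of the notes above can be sketched briefly: null entries in a list-like become NaT, and a DataFrame with time-of-day columns is assembled row by row. The frame is hypothetical, and the 'hour' key is assumed to be accepted alongside the keys listed above.

```python
import pandas as pd

# Missing entries in a list-like become NaT.
with_gap = pd.to_datetime(["2021-01-01", None])

# Hypothetical frame with date and time-of-day components.
parts = pd.DataFrame({
    "year": [2015, 2016],
    "month": [2, 3],
    "day": [4, 5],
    "hour": [10, 23],
    "minute": [30, 59],
})

# Each row is assembled into a single datetime.
assembled = pd.to_datetime(parts)
```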
- The following causes are responsible for datetime.datetime objects being returned (possibly inside an Index or a Series with object dtype) instead of a proper pandas designated type (Timestamp, DatetimeIndex or Series with datetime64 dtype):
- when any input element is before Timestamp.min or after Timestamp.max, see timestamp limitations.
- when utc=False (default) and the input is an array-like or Series containing mixed naive/aware datetime, or aware with mixed time offsets. Note that this happens in the (quite frequent) situation when the timezone has a daylight saving policy. In that case you may wish to use utc=True.
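For the daylight saving case, one possible approach is to parse with utc=True and convert back to the local zone afterwards. This is a sketch: the sample instants and the use of DatetimeIndex.tz_convert with "Europe/Paris" are illustrative choices, and the conversion assumes the time zone database is available.

```python
import pandas as pd

# Two instants around the end of daylight saving time in Europe/Paris,
# written with different UTC offsets.
mixed = ["2020-10-25 02:00 +0200", "2020-10-25 04:00 +0100"]

# utc=True converts both to UTC, giving a proper DatetimeIndex.
as_utc = pd.to_datetime(mixed, utc=True)

# Optionally view the result in the original zone again.
as_paris = as_utc.tz_convert("Europe/Paris")
```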
- Examples
- Handling various input formats
- Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same.

>>> df = pd.DataFrame({'year': [2015, 2016],
...                    'month': [2, 3],
...                    'day': [4, 5]})
>>> pd.to_datetime(df)
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

- Passing infer_datetime_format=True can often speed up parsing when the input is not exactly in ISO 8601 format but is in a regular format.

>>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'] * 1000)
>>> s.head()
0    3/11/2000
1    3/12/2000
2    3/13/2000
3    3/11/2000
4    3/12/2000
dtype: object

>>> %timeit pd.to_datetime(s, infer_datetime_format=True)
100 loops, best of 3: 10.4 ms per loop

>>> %timeit pd.to_datetime(s, infer_datetime_format=False)
1 loop, best of 3: 471 ms per loop

- Using a unix epoch time

>>> pd.to_datetime(1490195805, unit='s')
Timestamp('2017-03-22 15:16:45')
>>> pd.to_datetime(1490195805433502912, unit='ns')
Timestamp('2017-03-22 15:16:45.433502912')

- Warning: For float arg, precision rounding might happen. To prevent unexpected behavior use a fixed-width exact type.

- Using a non-unix epoch origin

>>> pd.to_datetime([1, 2, 3], unit='D',
...                origin=pd.Timestamp('1960-01-01'))
DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'],
              dtype='datetime64[ns]', freq=None)

- Non-convertible date/times
- If a date does not meet the timestamp limitations, passing errors='ignore' will return the original input instead of raising any exception.
- Passing errors='coerce' will force an out-of-bounds date to NaT, in addition to forcing non-dates (or non-parseable dates) to NaT.

>>> pd.to_datetime('13000101', format='%Y%m%d', errors='ignore')
datetime.datetime(1300, 1, 1, 0, 0)
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='coerce')
NaT

- Timezones and time offsets
- The default behaviour (utc=False) is as follows:
- Timezone-naive inputs are converted to timezone-naive DatetimeIndex:

>>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 13:00:15'])
DatetimeIndex(['2018-10-26 12:00:00', '2018-10-26 13:00:15'],
              dtype='datetime64[ns]', freq=None)

- Timezone-aware inputs with constant time offset are converted to timezone-aware DatetimeIndex:

>>> pd.to_datetime(['2018-10-26 12:00 -0500', '2018-10-26 13:00 -0500'])
DatetimeIndex(['2018-10-26 12:00:00-05:00', '2018-10-26 13:00:00-05:00'],
              dtype='datetime64[ns, pytz.FixedOffset(-300)]', freq=None)

- However, timezone-aware inputs with mixed time offsets (for example issued from a timezone with daylight saving time, such as Europe/Paris) are not successfully converted to a DatetimeIndex. Instead a simple Index containing datetime.datetime objects is returned:

>>> pd.to_datetime(['2020-10-25 02:00 +0200', '2020-10-25 04:00 +0100'])
Index([2020-10-25 02:00:00+02:00, 2020-10-25 04:00:00+01:00],
      dtype='object')

- A mix of timezone-aware and timezone-naive inputs is converted to a timezone-aware DatetimeIndex if the offsets of the timezone-aware inputs are constant:

>>> from datetime import datetime
>>> pd.to_datetime(["2020-01-01 01:00 -01:00", datetime(2020, 1, 1, 3, 0)])
DatetimeIndex(['2020-01-01 01:00:00-01:00', '2020-01-01 02:00:00-01:00'],
              dtype='datetime64[ns, pytz.FixedOffset(-60)]', freq=None)

- Finally, mixing timezone-aware strings and timezone-aware datetime.datetime always raises an error, even if the elements all have the same time offset.

>>> from datetime import datetime, timezone, timedelta
>>> d = datetime(2020, 1, 1, 18, tzinfo=timezone(-timedelta(hours=1)))
>>> pd.to_datetime(["2020-01-01 17:00 -0100", d])
Traceback (most recent call last):
    ...
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

- Setting utc=True solves most of the above issues:
- Timezone-naive inputs are localized as UTC

>>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 13:00'], utc=True)
DatetimeIndex(['2018-10-26 12:00:00+00:00', '2018-10-26 13:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)

- Timezone-aware inputs are converted to UTC (the output represents the exact same datetime, but viewed from the UTC time offset +00:00).

>>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'],
...                utc=True)
DatetimeIndex(['2018-10-26 17:30:00+00:00', '2018-10-26 17:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)

- Inputs can contain both naive and aware, string or datetime; the above rules still apply

>>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 12:00 -0530',
...                 datetime(2020, 1, 1, 18),
...                 datetime(2020, 1, 1, 18,
...                          tzinfo=timezone(-timedelta(hours=1)))],
...                utc=True)
DatetimeIndex(['2018-10-26 12:00:00+00:00', '2018-10-26 17:30:00+00:00',
               '2020-01-01 18:00:00+00:00', '2020-01-01 19:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)