Pandas arrays¶
For most data types, pandas uses NumPy arrays as the concrete
objects contained with a Index
, Series
, or
DataFrame
.
For some data types, pandas extends NumPy’s type system.
Kind of Data | Pandas Data Type | Scalar | Array |
---|---|---|---|
TZ-aware datetime | DatetimeTZDtype |
Timestamp |
Datetime data |
Timedeltas | (none) | Timedelta |
Timedelta data |
Period (time spans) | PeriodDtype |
Period |
Timespan data |
Intervals | IntervalDtype |
Interval |
Interval data |
Nullable Integer | Int64Dtype , … |
(none) | Nullable integer |
Categorical | CategoricalDtype |
(none) | Categorical data |
Sparse | SparseDtype |
(none) | Sparse data |
Pandas and third-party libraries can extend NumPy’s type system (see Extension types).
The top-level array()
method can be used to create a new array, which may be
stored in a Series
, Index
, or as a column in a DataFrame
.
array (data, dtype, numpy.dtype, …) |
Create an array. |
Datetime data¶
NumPy cannot natively represent timezone-aware datetimes. Pandas supports this
with the arrays.DatetimeArray
extension array, which can hold timezone-naive
or timezone-aware values.
Timestamp
, a subclass of datetime.datetime
, is pandas’
scalar type for timezone-naive or timezone-aware datetime data.
Timestamp |
Pandas replacement for python datetime.datetime object. |
Properties¶
Timestamp.asm8 |
Return numpy datetime64 format in nanoseconds. |
Timestamp.day |
|
Timestamp.dayofweek |
Return day of whe week. |
Timestamp.dayofyear |
Return the day of the year. |
Timestamp.days_in_month |
Return the number of days in the month. |
Timestamp.daysinmonth |
Return the number of days in the month. |
Timestamp.fold |
|
Timestamp.hour |
|
Timestamp.is_leap_year |
Return True if year is a leap year. |
Timestamp.is_month_end |
Return True if date is last day of month. |
Timestamp.is_month_start |
Return True if date is first day of month. |
Timestamp.is_quarter_end |
Return True if date is last day of the quarter. |
Timestamp.is_quarter_start |
Return True if date is first day of the quarter. |
Timestamp.is_year_end |
Return True if date is last day of the year. |
Timestamp.is_year_start |
Return True if date is first day of the year. |
Timestamp.max |
|
Timestamp.microsecond |
|
Timestamp.min |
|
Timestamp.minute |
|
Timestamp.month |
|
Timestamp.nanosecond |
|
Timestamp.quarter |
Return the quarter of the year. |
Timestamp.resolution |
Return resolution describing the smallest difference between two times that can be represented by Timestamp object_state |
Timestamp.second |
|
Timestamp.tz |
Alias for tzinfo |
Timestamp.tzinfo |
|
Timestamp.value |
|
Timestamp.week |
Return the week number of the year. |
Timestamp.weekofyear |
Return the week number of the year. |
Timestamp.year |
Methods¶
Timestamp.astimezone (self, tz) |
Convert tz-aware Timestamp to another time zone. |
Timestamp.ceil (self, freq[, ambiguous, …]) |
return a new Timestamp ceiled to this resolution |
Timestamp.combine (date, time) |
date, time -> datetime with same date and time fields |
Timestamp.ctime () |
Return ctime() style string. |
Timestamp.date () |
Return date object with same year, month and day. |
Timestamp.day_name (self[, locale]) |
Return the day name of the Timestamp with specified locale. |
Timestamp.dst () |
Return self.tzinfo.dst(self). |
Timestamp.floor (self, freq[, ambiguous, …]) |
return a new Timestamp floored to this resolution |
Timestamp.freq |
|
Timestamp.freqstr |
Return the total number of days in the month. |
Timestamp.fromordinal (ordinal[, freq, tz]) |
passed an ordinal, translate and convert to a ts note: by definition there cannot be any tz info on the ordinal itself |
Timestamp.fromtimestamp (ts) |
timestamp[, tz] -> tz’s local time from POSIX timestamp. |
Timestamp.isocalendar () |
Return a 3-tuple containing ISO year, week number, and weekday. |
Timestamp.isoformat (self[, sep]) |
|
Timestamp.isoweekday () |
Return the day of the week represented by the date. |
Timestamp.month_name (self[, locale]) |
Return the month name of the Timestamp with specified locale. |
Timestamp.normalize (self) |
Normalize Timestamp to midnight, preserving tz information. |
Timestamp.now ([tz]) |
Return new Timestamp object representing current time local to tz. |
Timestamp.replace (self[, year, month, day, …]) |
implements datetime.replace, handles nanoseconds |
Timestamp.round (self, freq[, ambiguous, …]) |
Round the Timestamp to the specified resolution |
Timestamp.strftime () |
format -> strftime() style string. |
Timestamp.strptime (string, format) |
Function is not implemented. |
Timestamp.time () |
Return time object with same time but with tzinfo=None. |
Timestamp.timestamp () |
Return POSIX timestamp as float. |
Timestamp.timetuple () |
Return time tuple, compatible with time.localtime(). |
Timestamp.timetz () |
Return time object with same time and tzinfo. |
Timestamp.to_datetime64 () |
Return a numpy.datetime64 object with ‘ns’ precision. |
Timestamp.to_numpy () |
Convert the Timestamp to a NumPy datetime64. |
Timestamp.to_julian_date (self) |
Convert TimeStamp to a Julian Date. |
Timestamp.to_period (self[, freq]) |
Return an period of which this timestamp is an observation. |
Timestamp.to_pydatetime () |
Convert a Timestamp object to a native Python datetime object. |
Timestamp.today (cls[, tz]) |
Return the current time in the local timezone. |
Timestamp.toordinal () |
Return proleptic Gregorian ordinal. |
Timestamp.tz_convert (self, tz) |
Convert tz-aware Timestamp to another time zone. |
Timestamp.tz_localize (self, tz[, ambiguous, …]) |
Convert naive Timestamp to local time zone, or remove timezone from tz-aware Timestamp. |
Timestamp.tzname () |
Return self.tzinfo.tzname(self). |
Timestamp.utcfromtimestamp (ts) |
Construct a naive UTC datetime from a POSIX timestamp. |
Timestamp.utcnow () |
Return a new Timestamp representing UTC day and time. |
Timestamp.utcoffset () |
Return self.tzinfo.utcoffset(self). |
Timestamp.utctimetuple () |
Return UTC time tuple, compatible with time.localtime(). |
Timestamp.weekday () |
Return the day of the week represented by the date. |
A collection of timestamps may be stored in a arrays.DatetimeArray
.
For timezone-aware data, the .dtype
of a DatetimeArray
is a
DatetimeTZDtype
. For timezone-naive data, np.dtype("datetime64[ns]")
is used.
If the data are tz-aware, then every value in the array must have the same timezone.
arrays.DatetimeArray (values[, dtype, freq, copy]) |
Pandas ExtensionArray for tz-naive or tz-aware datetime data. |
DatetimeTZDtype ([unit, tz]) |
An ExtensionDtype for timezone-aware datetime data. |
Timedelta data¶
NumPy can natively represent timedeltas. Pandas provides Timedelta
for symmetry with Timestamp
.
Timedelta |
Represents a duration, the difference between two dates or times. |
Properties¶
Timedelta.asm8 |
Return a numpy timedelta64 array scalar view. |
Timedelta.components |
Return a components namedtuple-like. |
Timedelta.days |
Number of days. |
Timedelta.delta |
Return the timedelta in nanoseconds (ns), for internal compatibility. |
Timedelta.freq |
|
Timedelta.is_populated |
|
Timedelta.max |
|
Timedelta.microseconds |
Number of microseconds (>= 0 and less than 1 second). |
Timedelta.min |
|
Timedelta.nanoseconds |
Return the number of nanoseconds (n), where 0 <= n < 1 microsecond. |
Timedelta.resolution |
Return a string representing the lowest timedelta resolution. |
Timedelta.seconds |
Number of seconds (>= 0 and less than 1 day). |
Timedelta.value |
|
Timedelta.view () |
Array view compatibility. |
Methods¶
Timedelta.ceil (self, freq) |
return a new Timedelta ceiled to this resolution |
Timedelta.floor (self, freq) |
return a new Timedelta floored to this resolution |
Timedelta.isoformat () |
Format Timedelta as ISO 8601 Duration like P[n]Y[n]M[n]DT[n]H[n]M[n]S , where the [n] s are replaced by the values. |
Timedelta.round (self, freq) |
Round the Timedelta to the specified resolution |
Timedelta.to_pytimedelta () |
Convert a pandas Timedelta object into a python timedelta object. |
Timedelta.to_timedelta64 () |
Return a numpy.timedelta64 object with ‘ns’ precision. |
Timedelta.to_numpy () |
Convert the Timestamp to a NumPy timedelta64. |
Timedelta.total_seconds () |
Total duration of timedelta in seconds (to ns precision). |
A collection of timedeltas may be stored in a TimedeltaArray
.
arrays.TimedeltaArray (values[, dtype, freq, …]) |
Pandas ExtensionArray for timedelta data. |
Period¶
Period |
Represents a period of time |
Properties¶
Period.day |
Get day of the month that a Period falls on. |
Period.dayofweek |
Day of the week the period lies in, with Monday=0 and Sunday=6. |
Period.dayofyear |
Return the day of the year. |
Period.days_in_month |
Get the total number of days in the month that this period falls on. |
Period.daysinmonth |
Get the total number of days of the month that the Period falls in. |
Period.end_time |
|
Period.freq |
|
Period.freqstr |
|
Period.hour |
Get the hour of the day component of the Period. |
Period.is_leap_year |
|
Period.minute |
Get minute of the hour component of the Period. |
Period.month |
|
Period.ordinal |
|
Period.quarter |
|
Period.qyear |
Fiscal year the Period lies in according to its starting-quarter. |
Period.second |
Get the second component of the Period. |
Period.start_time |
Get the Timestamp for the start of the period. |
Period.week |
Get the week of the year on the given Period. |
Period.weekday |
Day of the week the period lies in, with Monday=0 and Sunday=6. |
Period.weekofyear |
|
Period.year |
Methods¶
Period.asfreq () |
Convert Period to desired frequency, either at the start or end of the interval |
Period.now () |
|
Period.strftime () |
Returns the string representation of the Period , depending on the selected fmt . |
Period.to_timestamp () |
Return the Timestamp representation of the Period at the target frequency at the specified end (how) of the Period |
A collection of timedeltas may be stored in a arrays.PeriodArray
.
Every period in a PeriodArray
must have the same freq
.
arrays.PeriodArray (values[, freq, dtype, copy]) |
Pandas ExtensionArray for storing Period data. |
PeriodDtype |
An ExtensionDtype for Period data. |
Interval data¶
Arbitrary intervals can be represented as Interval
objects.
Interval |
Immutable object implementing an Interval, a bounded slice-like interval. |
Properties¶
Interval.closed |
Whether the interval is closed on the left-side, right-side, both or neither |
Interval.closed_left |
Check if the interval is closed on the left side. |
Interval.closed_right |
Check if the interval is closed on the right side. |
Interval.is_empty |
Indicates if an interval is empty, meaning it contains no points. |
Interval.left |
Left bound for the interval |
Interval.length |
Return the length of the Interval |
Interval.mid |
Return the midpoint of the Interval |
Interval.open_left |
Check if the interval is open on the left side. |
Interval.open_right |
Check if the interval is open on the right side. |
Interval.overlaps () |
Check whether two Interval objects overlap. |
Interval.right |
Right bound for the interval |
A collection of intervals may be stored in an arrays.IntervalArray
.
arrays.IntervalArray |
Pandas array for interval data that are closed on the same side. |
IntervalDtype |
An ExtensionDtype for Interval data. |
Nullable integer¶
numpy.ndarray
cannot natively represent integer-data with missing values.
Pandas provides this through arrays.IntegerArray
.
arrays.IntegerArray (values, mask[, copy]) |
Array of integer (optional missing) values. |
Int8Dtype |
An ExtensionDtype for int8 integer data. |
Int16Dtype |
An ExtensionDtype for int16 integer data. |
Int32Dtype |
An ExtensionDtype for int32 integer data. |
Int64Dtype |
An ExtensionDtype for int64 integer data. |
UInt8Dtype |
An ExtensionDtype for uint8 integer data. |
UInt16Dtype |
An ExtensionDtype for uint16 integer data. |
UInt32Dtype |
An ExtensionDtype for uint32 integer data. |
UInt64Dtype |
An ExtensionDtype for uint64 integer data. |
Categorical data¶
Pandas defines a custom data type for representing data that can take only a
limited, fixed set of values. The dtype of a Categorical
can be described by
a pandas.api.types.CategoricalDtype
.
CategoricalDtype ([categories]) |
Type for categorical data with the categories and orderedness. |
CategoricalDtype.categories |
An Index containing the unique categories allowed. |
CategoricalDtype.ordered |
Whether the categories have an ordered relationship. |
Categorical data can be stored in a pandas.Categorical
Categorical (values[, categories, ordered, …]) |
Represent a categorical variable in classic R / S-plus fashion. |
The alternative Categorical.from_codes()
constructor can be used when you
have the categories and integer codes already:
Categorical.from_codes (codes[, categories, …]) |
Make a Categorical type from codes and categories or dtype. |
The dtype information is available on the Categorical
Categorical.dtype |
The CategoricalDtype for this instance |
Categorical.categories |
The categories of this categorical. |
Categorical.ordered |
Whether the categories have an ordered relationship. |
Categorical.codes |
The category codes of this categorical. |
np.asarray(categorical)
works by implementing the array interface. Be aware, that this converts
the Categorical back to a NumPy array, so categories and order information is not preserved!
Categorical.__array__ (self[, dtype]) |
The numpy array interface. |
A Categorical
can be stored in a Series
or DataFrame
.
To create a Series of dtype category
, use cat = s.astype(dtype)
or
Series(..., dtype=dtype)
where dtype
is either
- the string
'category'
- an instance of
CategoricalDtype
.
If the Series is of dtype CategoricalDtype
, Series.cat
can be used to change the categorical
data. See Categorical accessor for more.
Sparse data¶
Data where a single value is repeated many times (e.g. 0
or NaN
) may
be stored efficiently as a SparseArray
.
SparseArray (data[, sparse_index, index, …]) |
An ExtensionArray for storing sparse data. |
SparseDtype (dtype, numpy.dtype, …) |
Dtype for data stored in SparseArray . |
The Series.sparse
accessor may be used to access sparse-specific attributes
and methods if the Series
contains sparse values. See
Sparse accessor for more.