Pandas arrays

For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.

For some data types, pandas extends NumPy’s type system.

Kind of Data          Pandas Data Type    Scalar       Array
TZ-aware datetime     DatetimeTZDtype     Timestamp    Datetime data
Timedeltas            (none)              Timedelta    Timedelta data
Period (time spans)   PeriodDtype         Period       Timespan data
Intervals             IntervalDtype       Interval     Interval data
Nullable Integer      Int64Dtype, …       (none)       Nullable integer
Categorical           CategoricalDtype    (none)       Categorical data
Sparse                SparseDtype         (none)       Sparse data

Pandas and third-party libraries can extend NumPy’s type system (see Extension types). The top-level array() function creates a new array, which may be stored in a Series, in an Index, or as a column in a DataFrame.

array(data[, dtype, copy]) Create an array.
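As a minimal sketch of the above, the snippet below builds two extension arrays with the top-level array() function and stores them in a Series and a DataFrame. The variable names and sample values are illustrative assumptions, not part of the reference.

```python
import pandas as pd

# Request an extension dtype explicitly through the dtype argument.
int_arr = pd.array([1, 2, None], dtype="Int64")        # nullable integers
cat_arr = pd.array(["a", "b", "a"], dtype="category")  # categorical data

# The resulting arrays can back a Series or a DataFrame column.
ser = pd.Series(int_arr)
df = pd.DataFrame({"ints": int_arr, "cats": cat_arr})

print(ser.dtype)   # Int64
print(df.dtypes)
```

Passing dtype explicitly keeps the example independent of how a given pandas version infers a dtype from the data.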

Datetime data

NumPy cannot natively represent timezone-aware datetimes. Pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.

Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data.

Timestamp Pandas replacement for the Python datetime.datetime object.

Properties

Timestamp.asm8 Return numpy datetime64 format in nanoseconds.
Timestamp.day
Timestamp.dayofweek Return day of the week.
Timestamp.dayofyear Return the day of the year.
Timestamp.days_in_month Return the number of days in the month.
Timestamp.daysinmonth Return the number of days in the month.
Timestamp.fold
Timestamp.hour
Timestamp.is_leap_year Return True if year is a leap year.
Timestamp.is_month_end Return True if date is last day of month.
Timestamp.is_month_start Return True if date is first day of month.
Timestamp.is_quarter_end Return True if date is last day of the quarter.
Timestamp.is_quarter_start Return True if date is first day of the quarter.
Timestamp.is_year_end Return True if date is last day of the year.
Timestamp.is_year_start Return True if date is first day of the year.
Timestamp.max
Timestamp.microsecond
Timestamp.min
Timestamp.minute
Timestamp.month
Timestamp.nanosecond
Timestamp.quarter Return the quarter of the year.
Timestamp.resolution Return resolution describing the smallest difference between two times that can be represented by a Timestamp object.
Timestamp.second
Timestamp.tz Alias for tzinfo
Timestamp.tzinfo
Timestamp.value
Timestamp.week Return the week number of the year.
Timestamp.weekofyear Return the week number of the year.
Timestamp.year

Methods

Timestamp.astimezone(self, tz) Convert tz-aware Timestamp to another time zone.
Timestamp.ceil(self, freq[, ambiguous, …]) Return a new Timestamp ceiled to this resolution.
Timestamp.combine(date, time) date, time -> datetime with same date and time fields
Timestamp.ctime() Return ctime() style string.
Timestamp.date() Return date object with same year, month and day.
Timestamp.day_name(self[, locale]) Return the day name of the Timestamp with specified locale.
Timestamp.dst() Return self.tzinfo.dst(self).
Timestamp.floor(self, freq[, ambiguous, …]) Return a new Timestamp floored to this resolution.
Timestamp.freq
Timestamp.freqstr Return the frequency of the Timestamp as a string.
Timestamp.fromordinal(ordinal[, freq, tz]) Construct a Timestamp from a proleptic Gregorian ordinal. Note: by definition, the ordinal itself cannot carry any tz information.
Timestamp.fromtimestamp(ts) timestamp[, tz] -> tz’s local time from POSIX timestamp.
Timestamp.isocalendar() Return a 3-tuple containing ISO year, week number, and weekday.
Timestamp.isoformat(self[, sep])
Timestamp.isoweekday() Return the day of the week represented by the date.
Timestamp.month_name(self[, locale]) Return the month name of the Timestamp with specified locale.
Timestamp.normalize(self) Normalize Timestamp to midnight, preserving tz information.
Timestamp.now([tz]) Return new Timestamp object representing current time local to tz.
Timestamp.replace(self[, year, month, day, …]) implements datetime.replace, handles nanoseconds
Timestamp.round(self, freq[, ambiguous, …]) Round the Timestamp to the specified resolution
Timestamp.strftime() format -> strftime() style string.
Timestamp.strptime(string, format) Function is not implemented.
Timestamp.time() Return time object with same time but with tzinfo=None.
Timestamp.timestamp() Return POSIX timestamp as float.
Timestamp.timetuple() Return time tuple, compatible with time.localtime().
Timestamp.timetz() Return time object with same time and tzinfo.
Timestamp.to_datetime64() Return a numpy.datetime64 object with ‘ns’ precision.
Timestamp.to_numpy() Convert the Timestamp to a NumPy datetime64.
Timestamp.to_julian_date(self) Convert Timestamp to a Julian Date.
Timestamp.to_period(self[, freq]) Return a period of which this timestamp is an observation.
Timestamp.to_pydatetime() Convert a Timestamp object to a native Python datetime object.
Timestamp.today(cls[, tz]) Return the current time in the local timezone.
Timestamp.toordinal() Return proleptic Gregorian ordinal.
Timestamp.tz_convert(self, tz) Convert tz-aware Timestamp to another time zone.
Timestamp.tz_localize(self, tz[, ambiguous, …]) Convert naive Timestamp to local time zone, or remove timezone from tz-aware Timestamp.
Timestamp.tzname() Return self.tzinfo.tzname(self).
Timestamp.utcfromtimestamp(ts) Construct a naive UTC datetime from a POSIX timestamp.
Timestamp.utcnow() Return a new Timestamp representing UTC day and time.
Timestamp.utcoffset() Return self.tzinfo.utcoffset(self).
Timestamp.utctimetuple() Return UTC time tuple, compatible with time.localtime().
Timestamp.weekday() Return the day of the week represented by the date.
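The following sketch exercises a few of the properties and methods listed above on a single Timestamp. The date, the time zone names, and the variable names are arbitrary choices made for illustration.

```python
import pandas as pd

ts = pd.Timestamp("2019-03-15 13:45:30")  # timezone-naive

# Calendar properties
print(ts.dayofweek)     # 4 (Monday=0)
print(ts.day_name())    # Friday
print(ts.is_leap_year)  # False

# Attach a time zone, then convert to another one.
aware = ts.tz_localize("UTC")
tokyo = aware.tz_convert("Asia/Tokyo")
print(tokyo.hour)       # 22

# Truncate to a coarser resolution.
floored = ts.floor("D")

# Round-trip to the standard-library type.
py_dt = ts.to_pydatetime()
```

tz_localize() interprets the naive wall time in the given zone, whereas tz_convert() keeps the instant fixed and changes only its representation, so aware and tokyo compare equal.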

A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of a DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.

If the data are tz-aware, then every value in the array must have the same timezone.

arrays.DatetimeArray(values[, dtype, freq, copy]) Pandas ExtensionArray for tz-naive or tz-aware datetime data.
DatetimeTZDtype([unit, tz]) An ExtensionDtype for timezone-aware datetime data.
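To illustrate the dtype difference described above, this sketch obtains a DatetimeArray from a DatetimeIndex through its .array attribute, once without and once with a time zone. The dates and names are assumptions for the example; the exact datetime64 unit reported for the naive case may depend on the pandas version, so the code does not rely on it.

```python
import pandas as pd

dates = ["2019-01-01", "2019-01-02"]

# Timezone-naive values are backed by a plain NumPy datetime64 dtype.
naive = pd.DatetimeIndex(dates).array

# Timezone-aware values use the DatetimeTZDtype extension dtype,
# and every element shares the same time zone.
aware = pd.DatetimeIndex(dates, tz="UTC").array

print(type(naive).__name__, naive.dtype)
print(type(aware).__name__, aware.dtype)
print(aware.dtype.tz)
```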

Timedelta data

NumPy can natively represent timedeltas. Pandas provides Timedelta for symmetry with Timestamp.

Timedelta Represents a duration, the difference between two dates or times.

Properties

Timedelta.asm8 Return a numpy timedelta64 array scalar view.
Timedelta.components Return a components namedtuple-like.
Timedelta.days Number of days.
Timedelta.delta Return the timedelta in nanoseconds (ns), for internal compatibility.
Timedelta.freq
Timedelta.is_populated
Timedelta.max
Timedelta.microseconds Number of microseconds (>= 0 and less than 1 second).
Timedelta.min
Timedelta.nanoseconds Return the number of nanoseconds (n), where 0 <= n < 1 microsecond.
Timedelta.resolution Return a string representing the lowest timedelta resolution.
Timedelta.seconds Number of seconds (>= 0 and less than 1 day).
Timedelta.value
Timedelta.view() Array view compatibility.

Methods

Timedelta.ceil(self, freq) Return a new Timedelta ceiled to this resolution.
Timedelta.floor(self, freq) Return a new Timedelta floored to this resolution.
Timedelta.isoformat() Format Timedelta as ISO 8601 Duration like P[n]Y[n]M[n]DT[n]H[n]M[n]S, where the [n]s are replaced by the values.
Timedelta.round(self, freq) Round the Timedelta to the specified resolution.
Timedelta.to_pytimedelta() Convert a pandas Timedelta object into a python timedelta object.
Timedelta.to_timedelta64() Return a numpy.timedelta64 object with ‘ns’ precision.
Timedelta.to_numpy() Convert the Timedelta to a NumPy timedelta64.
Timedelta.total_seconds() Total duration of timedelta in seconds (to ns precision).

A collection of timedeltas may be stored in a TimedeltaArray.

arrays.TimedeltaArray(values[, dtype, freq, …]) Pandas ExtensionArray for timedelta data.
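A short sketch of the Timedelta scalar and its array counterpart follows. The durations and variable names are made up for illustration.

```python
import datetime

import pandas as pd

td = pd.Timedelta(days=1, hours=2, minutes=30)

# days and seconds are separate components; total_seconds() combines them.
print(td.days)             # 1
print(td.seconds)          # 9000 (2 h 30 min, excluding the whole day)
print(td.total_seconds())  # 95400.0

# Convert to the standard-library type.
py_td = td.to_pytimedelta()

# A collection of timedeltas, obtained from a TimedeltaIndex.
tds = pd.to_timedelta(["1 days", "2 days 12:00:00"]).array
print(type(tds).__name__)  # TimedeltaArray
```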

Timespan data

Pandas represents spans of times as Period objects.

Period

Period Represents a period of time

Properties

Period.day Get day of the month that a Period falls on.
Period.dayofweek Day of the week the period lies in, with Monday=0 and Sunday=6.
Period.dayofyear Return the day of the year.
Period.days_in_month Get the total number of days in the month that this period falls on.
Period.daysinmonth Get the total number of days of the month that the Period falls in.
Period.end_time
Period.freq
Period.freqstr
Period.hour Get the hour of the day component of the Period.
Period.is_leap_year
Period.minute Get minute of the hour component of the Period.
Period.month
Period.ordinal
Period.quarter
Period.qyear Fiscal year the Period lies in according to its starting-quarter.
Period.second Get the second component of the Period.
Period.start_time Get the Timestamp for the start of the period.
Period.week Get the week of the year on the given Period.
Period.weekday Day of the week the period lies in, with Monday=0 and Sunday=6.
Period.weekofyear
Period.year

Methods

Period.asfreq() Convert Period to desired frequency, either at the start or end of the interval
Period.now()
Period.strftime() Returns the string representation of the Period, depending on the selected fmt.
Period.to_timestamp() Return the Timestamp representation of the Period at the target frequency at the specified end (how) of the Period

A collection of periods may be stored in an arrays.PeriodArray. Every period in a PeriodArray must have the same freq.

arrays.PeriodArray(values[, freq, dtype, copy]) Pandas ExtensionArray for storing Period data.
PeriodDtype An ExtensionDtype for Period data.
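The sketch below shows a monthly Period, a few of its properties, and a PeriodArray in which all elements share one freq. The chosen month and the variable names are illustrative assumptions.

```python
import pandas as pd

p = pd.Period("2019-03", freq="M")  # the whole of March 2019

print(p.start_time)     # first instant of the span
print(p.end_time)       # last instant of the span
print(p.days_in_month)  # 31

# Convert to a daily period anchored at the start of the month.
first_day = p.asfreq("D", how="start")

# A collection of periods with a common monthly frequency.
parr = pd.period_range("2019-01", periods=3, freq="M").array
print(type(parr).__name__, parr.dtype)
```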

Interval data

Arbitrary intervals can be represented as Interval objects.

Interval Immutable object implementing an Interval, a bounded slice-like interval.

Properties

Interval.closed Whether the interval is closed on the left-side, right-side, both or neither
Interval.closed_left Check if the interval is closed on the left side.
Interval.closed_right Check if the interval is closed on the right side.
Interval.is_empty Indicates if an interval is empty, meaning it contains no points.
Interval.left Left bound for the interval
Interval.length Return the length of the Interval
Interval.mid Return the midpoint of the Interval
Interval.open_left Check if the interval is open on the left side.
Interval.open_right Check if the interval is open on the right side.
Interval.overlaps() Check whether two Interval objects overlap.
Interval.right Right bound for the interval

A collection of intervals may be stored in an arrays.IntervalArray.

arrays.IntervalArray Pandas array for interval data that are closed on the same side.
IntervalDtype An ExtensionDtype for Interval data.
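As an illustrative sketch, the code below creates a single Interval, inspects the properties listed above, and builds an IntervalArray from a list of break points. The bounds are arbitrary example values.

```python
import pandas as pd

iv = pd.Interval(0, 5, closed="right")  # the interval (0, 5]

print(iv.length)  # 5
print(iv.mid)     # 2.5
print(5 in iv)    # True: the right bound is included
print(0 in iv)    # False: the left bound is excluded

# (3, 8] shares the points in (3, 5] with iv.
print(iv.overlaps(pd.Interval(3, 8)))  # True

# Three adjacent intervals, all closed on the same side.
arr = pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3])
print(arr.closed)  # right
```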

Nullable integer

numpy.ndarray cannot natively represent integer data with missing values. Pandas provides this through arrays.IntegerArray.

arrays.IntegerArray(values, mask[, copy]) Array of integer (optional missing) values.
Int8Dtype An ExtensionDtype for int8 integer data.
Int16Dtype An ExtensionDtype for int16 integer data.
Int32Dtype An ExtensionDtype for int32 integer data.
Int64Dtype An ExtensionDtype for int64 integer data.
UInt8Dtype An ExtensionDtype for uint8 integer data.
UInt16Dtype An ExtensionDtype for uint16 integer data.
UInt32Dtype An ExtensionDtype for uint32 integer data.
UInt64Dtype An ExtensionDtype for uint64 integer data.
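The contrast with NumPy-backed storage can be sketched as follows: without an explicit dtype, a missing value forces a float Series, whereas the "Int64" string alias (capital I) selects the nullable extension dtype. The sample values are assumptions for the example.

```python
import pandas as pd

values = [1, None, 3]

# NumPy-backed: the missing value pushes the data to float64.
as_float = pd.Series(values)
print(as_float.dtype)     # float64

# Extension-backed: the integers stay integers, with a missing-value mask.
as_nullable = pd.Series(values, dtype="Int64")
print(as_nullable.dtype)  # Int64
print(as_nullable.isna().tolist())  # [False, True, False]

# Reductions skip the missing entry by default.
print(as_nullable.sum())  # 4
```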

Categorical data

Pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a pandas.api.types.CategoricalDtype.

CategoricalDtype([categories]) Type for categorical data with the categories and orderedness.
CategoricalDtype.categories An Index containing the unique categories allowed.
CategoricalDtype.ordered Whether the categories have an ordered relationship.

Categorical data can be stored in a pandas.Categorical.

Categorical(values[, categories, ordered, …]) Represent a categorical variable in classic R / S-plus fashion.

The alternative Categorical.from_codes() constructor can be used when you have the categories and integer codes already:

Categorical.from_codes(codes[, categories, …]) Make a Categorical type from codes and categories or dtype.
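A minimal sketch of from_codes(), assuming a hypothetical three-level ordered category: each integer code is a position in the categories list.

```python
import pandas as pd

categories = ["low", "mid", "high"]  # hypothetical labels
codes = [0, 1, 0, 2]                 # positions into `categories`

cat = pd.Categorical.from_codes(codes, categories=categories, ordered=True)

print(list(cat))            # ['low', 'mid', 'low', 'high']
print(cat.codes.tolist())   # [0, 1, 0, 2]
print(cat.dtype.ordered)    # True
```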

The dtype information is available on the Categorical:

Categorical.dtype The CategoricalDtype for this instance
Categorical.categories The categories of this categorical.
Categorical.ordered Whether the categories have an ordered relationship.
Categorical.codes The category codes of this categorical.

np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so the categories and order information are not preserved.

Categorical.__array__(self[, dtype]) The numpy array interface.
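The loss of category information can be seen in a small sketch like the one below; the values and the unused category "c" are assumptions chosen to make the effect visible.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["b", "a", "b"], categories=["a", "b", "c"], ordered=True)

arr = np.asarray(cat)

print(type(arr).__name__)        # ndarray
print(arr.tolist())              # ['b', 'a', 'b']

# The plain ndarray holds only the values: the unused category "c"
# and the ordering are gone.
print(hasattr(arr, "categories"))  # False
```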

A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype), where dtype is either:

  • the string 'category'
  • an instance of CategoricalDtype.

If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical accessor for more.
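Both ways of specifying the dtype, and the Series.cat accessor, can be sketched as below. The sample data and the reversed category order are illustrative assumptions.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(["a", "b", "a", "c"])

# Option 1: the string 'category' infers the categories from the data.
cat1 = s.astype("category")
print(cat1.cat.categories.tolist())  # ['a', 'b', 'c']

# Option 2: a CategoricalDtype instance fixes categories and ordering.
dtype = CategoricalDtype(categories=["c", "b", "a"], ordered=True)
cat2 = s.astype(dtype)
print(cat2.min())  # 'c', the lowest category in the declared order

# The .cat accessor returns a modified copy of the categorical data.
extended = cat2.cat.add_categories(["d"])
print(extended.cat.categories.tolist())  # ['c', 'b', 'a', 'd']
```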

Sparse data

Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as a SparseArray.

SparseArray(data[, sparse_index, index, …]) An ExtensionArray for storing sparse data.
SparseDtype([dtype, fill_value]) Dtype for data stored in SparseArray.

The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse accessor for more.
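A brief sketch of sparse storage and the accessor follows. It assumes a pandas version in which the class is exposed as pd.arrays.SparseArray; the data and the fill value of 0 are example choices.

```python
import pandas as pd

# Only the values that differ from fill_value are physically stored.
arr = pd.arrays.SparseArray([0, 0, 1, 0, 0, 2], fill_value=0)
ser = pd.Series(arr)

print(ser.dtype)              # a SparseDtype
print(ser.sparse.npoints)     # 2 stored (non-fill) values
print(ser.sparse.fill_value)  # 0
print(ser.sparse.density)     # fraction of stored values, 2 / 6

# Convert back to an ordinary dense Series.
dense = ser.sparse.to_dense()
```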
