pandas.DataFrame.select_dtypes#
- DataFrame.select_dtypes(include=None, exclude=None)[source]#
Return a subset of the DataFrame’s columns based on the column dtypes.
This method allows for filtering columns based on their data types. It is useful when working with heterogeneous DataFrames where operations need to be performed on a specific subset of data types.
- Parameters:
- include, excludescalar or list-like
A selection of dtypes or strings to be included/excluded. At least one of these parameters must be supplied.
- Returns:
- DataFrame
The subset of the frame including the dtypes in
includeand excluding the dtypes inexclude.
- Raises:
- ValueError
If both of
includeandexcludeare emptyIf
includeandexcludehave overlapping elements
- TypeError
If any kind of string dtype is passed in.
See also
DataFrame.dtypesReturn Series with the data type of each column.
Notes
To select all numeric types, use
np.numberor'number'To select strings you must use the
objectdtype, but note that this will return all object dtype columns. Withpd.options.future.infer_stringenabled, using"str"will work to select all string columns.See the numpy dtype hierarchy
To select datetimes, use
np.datetime64,'datetime'or'datetime64'To select timedeltas, use
np.timedelta64,'timedelta'or'timedelta64'To select Pandas categorical dtypes, use
'category'To select Pandas datetimetz dtypes, use
'datetimetz'or'datetime64[ns, tz]'
Examples
>>> df = pd.DataFrame( ... {"a": [1, 2] * 3, "b": [True, False] * 3, "c": [1.0, 2.0] * 3} ... ) >>> df a b c 0 1 True 1.0 1 2 False 2.0 2 1 True 1.0 3 2 False 2.0 4 1 True 1.0 5 2 False 2.0
>>> df.select_dtypes(include="bool") b 0 True 1 False 2 True 3 False 4 True 5 False
>>> df.select_dtypes(include=["float64"]) c 0 1.0 1 2.0 2 1.0 3 2.0 4 1.0 5 2.0
>>> df.select_dtypes(exclude=["int64"]) b c 0 True 1.0 1 False 2.0 2 True 1.0 3 False 2.0 4 True 1.0 5 False 2.0