当前位置:网站首页>df.describe() 详解+用法+示例
df.describe() 详解+用法+示例
2022-07-22 01:56:00 【懒笑翻】
Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.22.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 7.22.0
Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] on win32
describe(self: 'FrameOrSeries', percentiles=None, include=None, exclude=None, datetime_is_numeric=False) -> 'FrameOrSeries'
percentile:列出像0-1之间的数字的数据类型以返回各自的百分位数
include:描述 DataFrame 时要包括的数据类型列表。默认为无
exclude:描述 DataFrame 时要排除的数据类型列表。默认为无
# 读取Excel文件
df = pd.read_excel('./data/all.xlsx')
# 数据的基本描述
view = df.describe(percentiles=[], include='all').T
view.to_excel('./data/result111.xlsx')
all.xlsx 数据展示:
result111.xlsx 数据展示:
Generate descriptive statistics.
Descriptive statistics include those that summarize the central
tendency, dispersion and shape of a
dataset's distribution, excluding ``NaN`` values.
Analyzes both numeric and object series, as well
as ``DataFrame`` column sets of mixed data types. The output
will vary depending on what is provided. Refer to the notes
below for more detail.
参数详解:
----------
percentiles : list-like of numbers, optional
The percentiles to include in the output. All should
fall between 0 and 1. The default is
``[.25, .5, .75]``, which returns the 25th, 50th, and
75th percentiles.
include : 'all', list-like of dtypes or None (default), optional
A white list of data types to include in the result. Ignored
for ``Series``. Here are the options:
- 'all' : All columns of the input will be included in the output. include= ”all“则是对所有属性的描述。
- A list-like of dtypes : Limits the results to the
provided data types.
To limit the result to numeric types submit
``numpy.number``. To limit it instead to object columns submit the ``numpy.object`` data type. Strings can also be used in the style of
``select_dtypes`` (e.g. ``df.describe(include=['O'])``). To
select pandas categorical columns, use ``'category'``
- None (default) : The result will include all numeric columns.
exclude : list-like of dtypes or None (default), optional,
A black list of data types to omit from the result. Ignored
for ``Series``. Here are the options:
- A list-like of dtypes : Excludes the provided data types
from the result. To exclude numeric types submit
``numpy.number``. To exclude object columns submit the data
type ``numpy.object``. Strings can also be used in the style of
``select_dtypes`` (e.g. ``df.describe(include=['O'])``). To
exclude pandas categorical columns, use ``'category'``
- None (default) : The result will exclude nothing.
datetime_is_numeric : bool, default False
Whether to treat datetime dtypes as numeric. This affects statistics
calculated for the column. For DataFrame input, this also
controls whether datetime columns are included by default.
.. versionadded:: 1.1.0
返回:
Series or DataFrame
Summary statistics of the Series or Dataframe provided.
Notes:
-----
For numeric data, the result's index will include ``count``,
``mean``, ``std``, ``min``, ``max`` as well as lower, ``50`` and
upper percentiles. By default the lower percentile is ``25`` and the
upper percentile is ``75``. The ``50`` percentile is the
same as the median.
For object data (e.g. strings or timestamps), the result's index
will include ``count``, ``unique``, ``top``, and ``freq``. The ``top``
is the most common value. The ``freq`` is the most common value's
frequency. Timestamps also include the ``first`` and ``last`` items.
If multiple object values have the highest count, then the
``count`` and ``top`` results will be arbitrarily chosen from
among those with the highest count.
For mixed data types provided via a ``DataFrame``, the default is to
return only an analysis of numeric columns. If the dataframe consists
only of object and categorical data without any numeric columns, the
default is to return an analysis of both the object and categorical
columns. If ``include='all'`` is provided as an option, the result
will include a union of attributes of each type.
The `include` and `exclude` parameters can be used to limit
which columns in a ``DataFrame`` are analyzed for the output.
The parameters are ignored when analyzing a ``Series``.
示例:
Describing a numeric ``Series``.
>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
dtype: float64
Describing a categorical ``Series``.
>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count 4
unique 3
top a
freq 2
dtype: object
Describing a timestamp ``Series``.
>>> s = pd.Series([
... np.datetime64("2000-01-01"),
... np.datetime64("2010-01-01"),
... np.datetime64("2010-01-01")
... ])
>>> s.describe(datetime_is_numeric=True)
count 3
mean 2006-09-01 08:00:00
min 2000-01-01 00:00:00
25% 2004-12-31 12:00:00
50% 2010-01-01 00:00:00
75% 2010-01-01 00:00:00
max 2010-01-01 00:00:00
dtype: object
Describing a ``DataFrame``. By default only numeric fields
are returned.
>>> df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']),
... 'numeric': [1, 2, 3],
... 'object': ['a', 'b', 'c']
... })
>>> df.describe()
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Describing all columns of a ``DataFrame`` regardless of data type.
>>> df.describe(include='all') # doctest: +SKIP
categorical numeric object
count 3 3.0 3
unique 3 NaN 3
top f NaN a
freq 1 NaN 1
mean NaN 2.0 NaN
std NaN 1.0 NaN
min NaN 1.0 NaN
25% NaN 1.5 NaN
50% NaN 2.0 NaN
75% NaN 2.5 NaN
max NaN 3.0 NaN
Describing a column from a ``DataFrame`` by accessing it as
an attribute.
>>> df.numeric.describe()
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Name: numeric, dtype: float64
Including only numeric columns in a ``DataFrame`` description.
>>> df.describe(include=[np.number])
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Including only string columns in a ``DataFrame`` description.
>>> df.describe(include=[object]) # doctest: +SKIP
object
count 3
unique 3
top a
freq 1
Including only categorical columns from a ``DataFrame`` description.
>>> df.describe(include=['category'])
categorical
count 3
unique 3
top d
freq 1
Excluding numeric columns from a ``DataFrame`` description.
>>> df.describe(exclude=[np.number]) # doctest: +SKIP
categorical object
count 3 3
unique 3 3
top f a
freq 1 1
Excluding object columns from a ``DataFrame`` description.
>>> df.describe(exclude=[object]) # doctest: +SKIP
categorical numeric
count 3 3.0
unique 3 NaN
top f NaN
freq 1 NaN
mean NaN 2.0
std NaN 1.0
min NaN 1.0
25% NaN 1.5
50% NaN 2.0
75% NaN 2.5
max NaN 3.0
边栏推荐
- Silicon Valley classroom notes (Part 2)
- 《PyTorch深度学习实践》-1-Overview
- Redis的conf配置
- How to solve the "last mile of delivery" of community group purchase
- RPC core module summary
- 10 baseline papers required for entry recommendation system
- 等保合规2022系列 | 20余年来,等级保护在如何“与时俱进”?
- How to optimize this SQL?
- Worthington deoxyribonucleic acid and related research tools
- 你可知道“开源女王”是谁吗?--她面临被解雇的威胁,仍然坚持开源了某个著名项目
猜你喜欢
随机推荐
Paper reading | point voxel CNN for efficient 3D deep learning
Architecture design scheme (continuously updating ing)
ZCMU--1925: hx & xh‘s game(C语言)
Worthington peptide synthesis application chymotrypsin scheme
Redis的conf配置
MySQL练习一数据库的知识
92. (leaflet chapter) leaflet situation plotting - acquisition of attack direction
Redis master-slave replication
投票不能重复
Redis持久化(RDB和AOF)
MySQL Exercise one database Knowledge
Event handling of lvgl
Redis persistence (RDB and AOF)
Wafer thickness measurement
等保合规2022系列 | 一个中心+三重防护,助力企业等级保护建设更科学
硅谷课堂笔记(下)
NFC介绍(2)
MySQL exercise 1 knowledge of database
弹性蛋白酶丨Worthington 核心酶详细参考资料
定时的时间测试