df.describe(): detailed explanation, usage, and examples
2022-07-22 15:23:00 【Lazy smile】
Environment: Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] on win32, IPython 7.22.0 (PyDev console)
describe(self: 'FrameOrSeries', percentiles=None, include=None, exclude=None, datetime_is_numeric=False) -> 'FrameOrSeries'
percentiles: list of numbers between 0 and 1; the corresponding percentiles are returned.
include: list of data types to include when describing a DataFrame. Defaults to None.
exclude: list of data types to exclude when describing a DataFrame. Defaults to None.
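A minimal sketch of `percentiles` on a made-up Series (the values 0 to 100 are illustrative, not from the article's data). The median (50%) is added automatically even when it is not in the list.

```python
import pandas as pd

# Illustrative data: the integers 0..100
s = pd.Series(range(101))

# Request the 10th and 90th percentiles; pandas also adds the 50th
summary = s.describe(percentiles=[0.1, 0.9])
print(summary)
```

The percentile rows are labelled '10%', '50%' and '90%' and sit between `min` and `max`.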
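import pandas as pd

# Read the Excel file
df = pd.read_excel('./data/all.xlsx')
# Descriptive statistics for every column; percentiles=[] keeps only the median (50%),
# and .T transposes the result so each original column becomes a row
view = df.describe(percentiles=[], include='all').T
view.to_excel('./data/result111.xlsx')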
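The file paths above belong to the author's setup. To try the same call without an Excel file, here is a sketch on a small hypothetical DataFrame (`name`, `age` and `city` are made-up columns standing in for all.xlsx). With `percentiles=[]` only the 50% row remains among the percentiles, and `.T` turns each original column into a row, as in the exported sheet.

```python
import pandas as pd

# Hypothetical stand-in for ./data/all.xlsx
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "age": [23, 35, 35, 41],
    "city": ["Paris", "Rome", "Paris", "Oslo"],
})

# Same call as above: no extra percentiles, all columns, transposed
view = df.describe(percentiles=[], include="all").T
print(view)
```

Cells that do not apply to a column type (for example `mean` for `name`) are NaN. Writing the result with `to_excel` requires an Excel engine such as openpyxl to be installed.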
Figure: data in all.xlsx
Figure: data in result111.xlsx
Generate descriptive statistics.
Descriptive statistics include those that summarize the central
tendency, dispersion and shape of a
dataset's distribution, excluding ``NaN`` values.
Analyzes both numeric and object series, as well
as ``DataFrame`` column sets of mixed data types. The output
will vary depending on what is provided. Refer to the notes
below for more detail.
Parameters
----------
percentiles : list-like of numbers, optional
The percentiles to include in the output. All should
fall between 0 and 1. The default is
``[.25, .5, .75]``, which returns the 25th, 50th, and
75th percentiles.
include : 'all', list-like of dtypes or None (default), optional
A white list of data types to include in the result. Ignored
for ``Series``. Here are the options:
- 'all' : All columns of the input will be included in the output. In other words, include='all' describes every column (attribute).
- A list-like of dtypes : Limits the results to the
provided data types.
To limit the result to numeric types submit
``numpy.number``. To limit it instead to object columns submit the builtin ``object`` data type (the old alias ``numpy.object`` was removed in NumPy 1.24). Strings can also be used in the style of
``select_dtypes`` (e.g. ``df.describe(include=['O'])``). To
select pandas categorical columns, use ``'category'``
- None (default) : The result will include all numeric columns.
exclude : list-like of dtypes or None (default), optional
A black list of data types to omit from the result. Ignored
for ``Series``. Here are the options:
- A list-like of dtypes : Excludes the provided data types
from the result. To exclude numeric types submit
``numpy.number``. To exclude object columns submit the data
type ``object``. Strings can also be used in the style of
``select_dtypes`` (e.g. ``df.describe(exclude=['O'])``). To
exclude pandas categorical columns, use ``'category'``
- None (default) : The result will exclude nothing.
datetime_is_numeric : bool, default False
Whether to treat datetime dtypes as numeric. This affects statistics
calculated for the column. For DataFrame input, this also
controls whether datetime columns are included by default.
.. versionadded:: 1.1.0
Removed in pandas 2.0, where datetime data is always summarized as numeric.
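A small sketch of `include` and `exclude` using the string dtype names accepted by `select_dtypes` ('number', 'O' for object, 'category'). The DataFrame and its column names are made up for illustration.

```python
import pandas as pd

# Illustrative mixed-type DataFrame
df = pd.DataFrame({
    "grade": pd.Categorical(["x", "y", "x"]),
    "score": [1.5, 2.5, 3.5],
    "label": ["a", "b", "a"],
})

# Only numeric columns
num = df.describe(include=["number"])
# Only object (string) columns
obj = df.describe(include=["O"])
# Everything except categorical columns
no_cat = df.describe(exclude=["category"])

print(num.columns.tolist())     # ['score']
print(obj.columns.tolist())     # ['label']
print(no_cat.columns.tolist())  # ['score', 'label']
```

The selected columns keep their original order from `df`.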
Returns
-------
Series or DataFrame
Summary statistics of the Series or DataFrame provided.
Notes
-----
For numeric data, the result's index will include ``count``,
``mean``, ``std``, ``min``, ``max`` as well as lower, ``50`` and
upper percentiles. By default the lower percentile is ``25`` and the
upper percentile is ``75``. The ``50`` percentile is the
same as the median.
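A sketch with made-up numbers to check that the 50% row is the median and to show which standard deviation `std` reports: `describe` uses the sample standard deviation (ddof=1), so here it is about 2.14 rather than the population value 2.0.

```python
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])
d = s.describe()

# The 50% row equals Series.median()
print(d["50%"], s.median())
# std is the sample standard deviation (ddof=1), same as Series.std()
print(d["std"], s.std(ddof=1))
```

The quartiles use linear interpolation, so for this data 25% is 4.0 and 75% is 5.5.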
For object data (e.g. strings or timestamps), the result's index
will include ``count``, ``unique``, ``top``, and ``freq``. The ``top``
is the most common value. The ``freq`` is the most common value's
frequency. Timestamps also include the ``first`` and ``last`` items.
If multiple object values have the highest count, then the
``top`` result will be arbitrarily chosen from among those with
the highest count (``freq`` is the same whichever one is picked).
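A sketch of the tie case with made-up strings: 'a' and 'b' both occur twice, so `freq` is 2, but which of the two appears as `top` should not be relied on.

```python
import pandas as pd

# 'a' and 'b' tie for the most common value
s = pd.Series(["a", "b", "a", "b", "c"])
d = s.describe()
print(d)

# Robust checks: freq is fixed, top is one of the tied values
assert d["freq"] == 2
assert d["top"] in {"a", "b"}
```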
For mixed data types provided via a ``DataFrame``, the default is to
return only an analysis of numeric columns. If the dataframe consists
only of object and categorical data without any numeric columns, the
default is to return an analysis of both the object and categorical
columns. If ``include='all'`` is provided as an option, the result
will include a union of attributes of each type.
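The case in the middle of that paragraph (no numeric columns at all) is not covered by the examples further down, so here is a sketch on a made-up DataFrame holding only categorical and object columns. Calling `describe()` with no arguments then summarizes both columns.

```python
import pandas as pd

# No numeric columns: one categorical, one object
df = pd.DataFrame({
    "color": pd.Categorical(["red", "blue", "red"]),
    "shape": ["circle", "square", "circle"],
})

d = df.describe()  # no include/exclude given
print(d)
```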
The `include` and `exclude` parameters can be used to limit
which columns in a ``DataFrame`` are analyzed for the output.
The parameters are ignored when analyzing a ``Series``.
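A quick check of the last point, sketched on a made-up Series: passing `include` to `Series.describe` leaves the result unchanged.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# include only filters DataFrame columns; a Series ignores it
a = s.describe()
b = s.describe(include=["O"])
print(a.equals(b))  # True
```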
Examples
--------
Describing a numeric ``Series``.
>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
dtype: float64
Describing a string (object dtype) ``Series``.
>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count 4
unique 3
top a
freq 2
dtype: object
Describing a timestamp ``Series``.
>>> s = pd.Series([
... np.datetime64("2000-01-01"),
... np.datetime64("2010-01-01"),
... np.datetime64("2010-01-01")
... ])
>>> s.describe(datetime_is_numeric=True)
count 3
mean 2006-09-01 08:00:00
min 2000-01-01 00:00:00
25% 2004-12-31 12:00:00
50% 2010-01-01 00:00:00
75% 2010-01-01 00:00:00
max 2010-01-01 00:00:00
dtype: object
Describing a ``DataFrame``. By default only numeric fields
are returned.
>>> df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']),
... 'numeric': [1, 2, 3],
... 'object': ['a', 'b', 'c']
... })
>>> df.describe()
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Describing all columns of a ``DataFrame`` regardless of data type.
>>> df.describe(include='all') # doctest: +SKIP
categorical numeric object
count 3 3.0 3
unique 3 NaN 3
top f NaN a
freq 1 NaN 1
mean NaN 2.0 NaN
std NaN 1.0 NaN
min NaN 1.0 NaN
25% NaN 1.5 NaN
50% NaN 2.0 NaN
75% NaN 2.5 NaN
max NaN 3.0 NaN
Describing a column from a ``DataFrame`` by accessing it as
an attribute.
>>> df.numeric.describe()
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Name: numeric, dtype: float64
Including only numeric columns in a ``DataFrame`` description.
>>> df.describe(include=[np.number])
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Including only string columns in a ``DataFrame`` description.
>>> df.describe(include=[object]) # doctest: +SKIP
object
count 3
unique 3
top a
freq 1
Including only categorical columns from a ``DataFrame`` description.
>>> df.describe(include=['category'])
categorical
count 3
unique 3
top d
freq 1
Excluding numeric columns from a ``DataFrame`` description.
>>> df.describe(exclude=[np.number]) # doctest: +SKIP
categorical object
count 3 3
unique 3 3
top f a
freq 1 1
Excluding object columns from a ``DataFrame`` description.
>>> df.describe(exclude=[object]) # doctest: +SKIP
categorical numeric
count 3 3.0
unique 3 NaN
top f NaN
freq 1 NaN
mean NaN 2.0
std NaN 1.0
min NaN 1.0
25% NaN 1.5
50% NaN 2.0
75% NaN 2.5
max NaN 3.0