df.drop_duplicates(): explanation and usage
2022-07-22 15:24:00 【Lazy smile】
drop_duplicates()
- With no arguments, rows that duplicate an earlier row in every column are removed.
- With subset, rows that duplicate an earlier row in the specified column(s) are removed.
1. Code example:
import pandas as pd
# Ramen ratings; rows 0 and 1 are identical in every column
df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]})
print("--------------------- Original data:")
print(df)
print("------------------------ df.drop_duplicates()")
print(df.drop_duplicates())
print("------------------------ Drop rows that repeat a value in the 'brand' column")
print(df.drop_duplicates(subset='brand'))
print("------------------------ keep='first': keep the first of each set of duplicate rows, drop the rest")
print(df.drop_duplicates(keep="first"))
print("---------------------- inplace: bool, default False. Whether to drop duplicates directly on the original data or return a copy with the duplicates removed")
print("----------------- inplace=False: drop duplicates and return a copy")
print(df.drop_duplicates(inplace=False))
print("------------- df1 (df after inplace=False: unchanged)")
print(df)
print("----------------- inplace=True: drop duplicates directly on the original data")
# Prints None: with inplace=True the method modifies df and returns None
print(df.drop_duplicates(inplace=True))
print("------------- df2 (df after inplace=True: duplicates removed)")
print(df)
2. Running results:
--------------------- Original data:
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
------------------------ df.drop_duplicates()
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
------------------------ Drop rows that repeat a value in the 'brand' column
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
------------------------ keep='first': keep the first of each set of duplicate rows, drop the rest
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
---------------------- inplace: bool, default False. Whether to drop duplicates directly on the original data or return a copy with the duplicates removed
----------------- inplace=False: drop duplicates and return a copy
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
------------- df1 (df after inplace=False: unchanged)
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
----------------- inplace=True: drop duplicates directly on the original data
None
------------- df2 (df after inplace=True: duplicates removed)
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
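The None printed for inplace=True points at a common slip: assigning the result of an in-place call back to a variable. The following is a minimal sketch of that pitfall and two ways around it, reusing the ramen data from the example; the variable names lost, deduped and in_place are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]})

# Pitfall: with inplace=True the method returns None, so the assigned
# name ends up holding None instead of a DataFrame.
lost = df.copy().drop_duplicates(inplace=True)
print(lost)  # None

# Option 1: keep the default inplace=False and bind the returned copy.
deduped = df.drop_duplicates()

# Option 2: modify a DataFrame in place and ignore the return value.
in_place = df.copy()
in_place.drop_duplicates(inplace=True)

# Both routes give the same rows; the original df is untouched.
print(deduped.equals(in_place))  # True
print(len(df), len(deduped))     # 5 4
```

Either option works; binding the returned copy tends to be easier to follow because the original data stays available for comparison.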
3. Detailed explanation:
drop_duplicates(self, subset: 'Optional[Union[Hashable, Sequence[Hashable]]]' = None, keep: 'Union[str, bool]' = 'first', inplace: 'bool' = False, ignore_index: 'bool' = False)
Return DataFrame with duplicate rows removed.
Considering certain columns is optional. Indexes, including time indexes
are ignored.
Parameters
----------
subset : column label or sequence of labels, optional
    The column(s) in which to look for duplicates.
    Only consider certain columns for identifying duplicates, by
    default use all of the columns.
keep : {'first', 'last', False}, default 'first'
    Determines which duplicates (if any) to keep.
    - ``first`` : Drop duplicates except for the first occurrence.
    - ``last`` : Drop duplicates except for the last occurrence.
    - False : Drop all duplicates.
inplace : bool, default False
    Whether to drop duplicates in place or to return a copy.
    True: delete directly in the original data. False: leave the original data untouched and return a de-duplicated copy.
ignore_index : bool, default False
    If True, the resulting axis will be labeled 0, 1, …, n - 1.
    .. versionadded:: 1.0.0
Returns
-------
DataFrame or None
    DataFrame with duplicates removed or None if ``inplace=True``.
See Also
--------
DataFrame.value_counts: Count unique combinations of columns.
Examples
--------
Consider dataset containing ramen rating.
>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
By default, it removes duplicate rows based on all columns.
>>> df.drop_duplicates()
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
To remove duplicates on specific column(s), use ``subset``.
>>> df.drop_duplicates(subset=['brand'])
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
To remove duplicates and keep last occurrences, use ``keep``.
>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
     brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
4  Indomie  pack     5.0
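The examples above do not exercise keep=False or ignore_index. The sketch below tries both on the same ramen data and also uses DataFrame.duplicated(), a related pandas method not covered in this post, to show which rows get flagged. The variable names are illustrative, and ignore_index assumes pandas 1.0.0 or later, as the versionadded note says.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]})

# keep=False drops every row that has a duplicate, including the first
# occurrence, so both identical 'Yum Yum' rows (0 and 1) disappear.
no_repeats = df.drop_duplicates(keep=False)
print(no_repeats.index.tolist())  # [2, 3, 4]

# ignore_index=True relabels the surviving rows 0, 1, ..., n - 1
# instead of keeping their original index labels.
renumbered = df.drop_duplicates(subset=['brand'], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1]

# duplicated() returns the boolean mask of rows that would be dropped
# with the same subset/keep arguments.
mask = df.duplicated(subset=['brand', 'style'], keep='last')
print(mask.tolist())  # [True, False, False, True, False]

# Filtering with the inverted mask gives the same rows as drop_duplicates.
same = df[~mask].equals(
    df.drop_duplicates(subset=['brand', 'style'], keep='last'))
print(same)  # True
```

Checking the mask from duplicated() first is a cheap way to see what a given subset/keep combination will remove before actually dropping anything.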