df.drop_duplicates(): explanation and usage

2022-07-22 15:24:00 【Lazy smile】

drop_duplicates()

1. With no arguments, rows that are exact duplicates of an earlier row are dropped

2. Dropping duplicate rows

Contents

1. Code example

2. Running results

3. Detailed explanation

1. Code example:

import pandas as pd


df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]})
print("--------------------- Raw data:")
print(df)
# No arguments: rows are compared on all columns and the first occurrence is kept.
print("------------------------ df.drop_duplicates()")
print(df.drop_duplicates())
# subset: compare rows on the 'brand' column only.
print("------------------------ Drop rows with a duplicate value in the 'brand' column")
print(df.drop_duplicates(subset='brand'))
# keep="first" is the default, so the result matches the no-argument call.
print("------------------------ Keep the first of each set of duplicate rows, drop the rest")
print(df.drop_duplicates(keep="first"))
print("---------------------- inplace: bool, default False. Whether to drop duplicates directly on the original data or return a de-duplicated copy")
print("----------------- inplace=False: return a de-duplicated copy")
print(df.drop_duplicates(inplace=False))
print("------------- df after inplace=False")
print(df)
# With inplace=True the call returns None, so this print shows "None".
print("----------------- inplace=True: drop duplicates directly on the original data")
print(df.drop_duplicates(inplace=True))
print("------------- df after inplace=True")
print(df)

2. Running results:

--------------------- Raw data:
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
------------------------ df.drop_duplicates()
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
------------------------ Drop rows with a duplicate value in the 'brand' column
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
------------------------ Keep the first of each set of duplicate rows, drop the rest
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
---------------------- inplace: bool, default False. Whether to drop duplicates directly on the original data or return a de-duplicated copy
----------------- inplace=False: return a de-duplicated copy
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
------------- df after inplace=False
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
----------------- inplace=True: drop duplicates directly on the original data
None
------------- df after inplace=True
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
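
The results above show that the call with inplace=True prints None, so assigning its return value to a variable would lose the DataFrame. The following is a minimal sketch of the two usual patterns, reusing the ramen data from the example; the names `deduped`, `work`, and `result` are illustrative only and do not come from the original code.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]})

# Pattern 1: leave the original untouched and bind the de-duplicated copy to a new name.
deduped = df.drop_duplicates()
print(len(df), len(deduped))   # 5 4

# Pattern 2: modify a DataFrame in place. The call itself returns None,
# so its result should not be assigned back to the DataFrame variable.
work = df.copy()
result = work.drop_duplicates(inplace=True)
print(result)                  # None
print(work.equals(deduped))    # True
```

Either pattern ends with the same four rows; the first is often easier to follow because the original data stays available for comparison.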

3. Detailed explanation:

drop_duplicates(self, subset: 'Optional[Union[Hashable, Sequence[Hashable]]]' = None, keep: 'Union[str, bool]' = 'first', inplace: 'bool' = False, ignore_index: 'bool' = False)

    Return DataFrame with duplicate rows removed.

    Considering certain columns is optional. Indexes, including time indexes,
    are ignored.

    Parameters
    ----------
    subset : column label or sequence of labels, optional
        The column(s) to compare when identifying duplicates. By default,
        all of the columns are used.
    keep : {'first', 'last', False}, default 'first'
        Determines which duplicates (if any) to keep.
        - ``first`` : Drop duplicates except for the first occurrence.
        - ``last`` : Drop duplicates except for the last occurrence.
        - False : Drop all duplicates (no occurrence is kept).
    inplace : bool, default False
        Whether to drop duplicates in place or to return a copy.
        True modifies the original data directly; False leaves the original
        data untouched and returns a de-duplicated copy.
    ignore_index : bool, default False
        If True, the resulting axis will be labeled 0, 1, …, n - 1.

        .. versionadded:: 1.0.0
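
The code example earlier only exercises keep="first". The sketch below compares the three `keep` values on the same ramen data and then adds `ignore_index=True`; it assumes pandas 1.0.0 or later (per the versionadded note above), and the names `cols`, `first`, `last`, `none_kept`, and `renumbered` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]})

cols = ['brand', 'style']  # compare rows on these two columns only

first = df.drop_duplicates(subset=cols, keep='first')
last = df.drop_duplicates(subset=cols, keep='last')
none_kept = df.drop_duplicates(subset=cols, keep=False)

print(list(first.index))       # [0, 2, 3]
print(list(last.index))        # [1, 2, 4]
print(list(none_kept.index))   # [2]

# ignore_index=True relabels the surviving rows 0, 1, ..., n - 1.
renumbered = df.drop_duplicates(subset=cols, keep='last', ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2]
```

Note that keep=False leaves only row 2, the one (brand, style) combination that occurs exactly once.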

    Returns
    -------
    DataFrame or None
        DataFrame with duplicates removed or None if ``inplace=True``.

    See Also
    --------
    DataFrame.value_counts: Count unique combinations of columns.
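
Before dropping anything, it can help to inspect which rows count as duplicates. The sketch below uses `DataFrame.value_counts` from the See Also entry, plus `DataFrame.duplicated`, which the original text does not mention but which accepts the same `subset` and `keep` arguments. The `subset` argument of `value_counts` is assumed to be available (pandas 1.1 or later).

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]})

cols = ['brand', 'style']

# Count how often each (brand, style) combination occurs.
counts = df.value_counts(subset=cols)
print(int(counts.loc[('Yum Yum', 'cup')]))   # 2
print(int(counts.loc[('Indomie', 'cup')]))   # 1

# duplicated() flags the rows that drop_duplicates() would remove.
mask = df.duplicated(subset=cols, keep='first')
print(mask.tolist())   # [False, True, False, False, True]

# Filtering on the inverted mask selects the same rows as drop_duplicates().
same = df[~mask].equals(df.drop_duplicates(subset=cols))
print(same)            # True
```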

    Examples
    --------
    Consider a dataset containing ramen ratings.

    >>> df = pd.DataFrame({
    ...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    ...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    ...     'rating': [4, 4, 3.5, 15, 5]
    ... })
    >>> df
         brand style  rating
    0  Yum Yum   cup     4.0
    1  Yum Yum   cup     4.0
    2  Indomie   cup     3.5
    3  Indomie  pack    15.0
    4  Indomie  pack     5.0

    By default, it removes duplicate rows based on all columns.

    >>> df.drop_duplicates()
         brand style  rating
    0  Yum Yum   cup     4.0
    2  Indomie   cup     3.5
    3  Indomie  pack    15.0
    4  Indomie  pack     5.0

    To remove duplicates on specific column(s), use ``subset``.

    >>> df.drop_duplicates(subset=['brand'])
         brand style  rating
    0  Yum Yum   cup     4.0
    2  Indomie   cup     3.5

    To remove duplicates and keep last occurrences, use ``keep``.

    >>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
         brand style  rating
    1  Yum Yum   cup     4.0
    2  Indomie   cup     3.5
    4  Indomie  pack     5.0
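
One common way to combine `subset` and `keep` is to sort first, so that the row that survives is the one you want. The sketch below keeps the highest-rated row per brand. This pattern is not part of the original article, and the name `best` and the choice of a stable sort are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]})

# Sort ascending by rating so the best row of each brand comes last,
# then keep only the last occurrence of every brand.
best = (
    df.sort_values('rating', kind='stable')
      .drop_duplicates(subset='brand', keep='last')
)

print(dict(zip(best['brand'], best['rating'])))
# {'Yum Yum': 4.0, 'Indomie': 15.0}
```

The stable sort keeps tied rows in their original order, which makes the surviving row predictable when two ratings are equal.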

Original site

Copyright notice
This article was written by [Lazy smile]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/203/202207220155381197.html
