Guide to using SQL dedupe in Python
One of the most important processes a data engineer can master is deduplicating values in order to provide clean data for data consumers. Since raw data can vary in format and cleanliness, it is vital that data…

DataFrame.drop_duplicates

Return DataFrame with duplicate rows removed. Considering certain columns is optional. Indexes, including time indexes, are ignored.

Parameters

subset : column label or sequence of labels, optional
    Only consider certain columns for identifying duplicates; by default, use all of the columns.
keep : {'first', 'last', False}, default 'first'
    Determines which duplicates (if any) to keep.
    - 'first' : drop duplicates except for the first occurrence.
    - 'last' : drop duplicates except for the last occurrence.
    - False : drop all duplicates.
inplace : bool, default False
    Whether to drop duplicates in place or to return a copy.
ignore_index : bool, default False
    If True, the resulting axis will be labeled 0, 1, …, n - 1. New in version 1.0.0.

Returns

DataFrame with duplicates removed, or None if inplace=True.

Examples

Consider a dataset containing ramen ratings.

>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, it removes duplicate rows based on all columns.

>>> df.drop_duplicates()
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

To remove duplicates on specific column(s), use subset.

>>> df.drop_duplicates(subset=['brand'])
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5

To remove duplicates and keep the last occurrences, use keep.

>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
     brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
4  Indomie  pack     5.0

I'm looking to remove the names that are repeated. My code connects to SQL Server and retrieves the data into Python.
The artist_name values that I am retrieving from SQL are:
However, I'd only like to remove the duplicates in Python, without removing any rows from my SQL table:
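
One possible approach, sketched under assumptions: run a read-only SELECT and drop the repeats from the list that Python holds in memory, so the table itself is never touched. In the sketch below, an in-memory SQLite database stands in for the SQL Server connection, and the table name `artists` and the sample rows are hypothetical; only the column name `artist_name` comes from the question.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the SQL Server
# connection. The table name `artists` and the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (artist_name TEXT)")
conn.executemany(
    "INSERT INTO artists (artist_name) VALUES (?)",
    [("Artist A",), ("Artist B",), ("Artist A",), ("Artist C",), ("Artist B",)],
)
conn.commit()

# Read-only query: no rows in the table are modified or deleted.
rows = conn.execute(
    "SELECT artist_name FROM artists ORDER BY rowid"
).fetchall()
names = [row[0] for row in rows]

# dict.fromkeys keeps the first occurrence of each key and preserves
# insertion order (guaranteed since Python 3.7), so the repeats are
# dropped from the in-memory list only.
unique_names = list(dict.fromkeys(names))

# The table still holds every original row, duplicates included.
table_row_count = conn.execute("SELECT COUNT(*) FROM artists").fetchone()[0]

print(unique_names)
print(table_row_count)
```

If the query result is loaded into a pandas DataFrame instead, `df.drop_duplicates(subset=['artist_name'])` from the documentation above has the same effect, since it returns a new DataFrame and never writes back to the database. Adding `DISTINCT` to the SELECT is another option when the deduplication can happen in the query itself.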