Python - How to Remove Duplicates From a List

Learn how to remove duplicates from a List in Python.

Example

Remove any duplicates from a List:

mylist = ["a", "b", "a", "c", "c"]
mylist = list(dict.fromkeys(mylist))
print(mylist)


Example Explained

First we have a List that contains duplicates:

A List with Duplicates

mylist = ["a", "b", "a", "c", "c"]
mylist = list(dict.fromkeys(mylist))
print(mylist)

Create a dictionary, using the List items as keys. This will automatically remove any duplicates because dictionaries cannot have duplicate keys.

Create a Dictionary

mylist = ["a", "b", "a", "c", "c"]
mylist = list(dict.fromkeys(mylist))
print(mylist)
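To see the intermediate step on its own, you can print the dictionary before converting it back; a small sketch (mydict is just an illustrative name, and the values default to None):

mydict = dict.fromkeys(["a", "b", "a", "c", "c"])
print(mydict)
# {'a': None, 'b': None, 'c': None}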

Then, convert the dictionary back into a list:

Convert Into a List

mylist = ["a", "b", "a", "c", "c"]
mylist = list(dict.fromkeys(mylist))
print(mylist)

Now we have a List without any duplicates, and it has the same order as the original List.
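The order is kept because, from Python 3.7 onwards, dictionaries remember the order in which keys were inserted; a quick sketch with a differently ordered input:

print(list(dict.fromkeys(["c", "a", "c", "b"])))
# ['c', 'a', 'b']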

Print the List to demonstrate the result

Print the List

mylist = ["a", "b", "a", "c", "c"]
mylist = list(dict.fromkeys(mylist))
print(mylist)

Create a Function

If you want a function where you can send in your lists and get them back without duplicates, you can create a function and insert the code from the example above.

Example

def my_function(x):
  return list(dict.fromkeys(x))

mylist = my_function(["a", "b", "a", "c", "c"])

print(mylist)


Example Explained

Create a function that takes a List as an argument.

Create a Function

def my_function(x):
  return list(dict.fromkeys(x))

mylist = my_function(["a", "b", "a", "c", "c"])

print(mylist)

Create a dictionary, using the List items as keys.

Create a Dictionary

def my_function(x):
  return list(dict.fromkeys(x))

mylist = my_function(["a", "b", "a", "c", "c"])

print(mylist)

Convert the dictionary into a list.

Convert Into a List

def my_function(x):
  return list(dict.fromkeys(x))

mylist = my_function(["a", "b", "a", "c", "c"])

print(mylist)

Return the list

Return List

def my_function(x):
  return list(dict.fromkeys(x))

mylist = my_function(["a", "b", "a", "c", "c"])

print(mylist)

Call the function, with a list as a parameter:

Call the Function

def my_function(x):
  return list(dict.fromkeys(x))

mylist = my_function(["a", "b", "a", "c", "c"])

print(mylist)

Print the result:

Print the Result

def my_function(x):
  return list(dict.fromkeys(x))

mylist = my_function(["a", "b", "a", "c", "c"])

print(mylist)


This answer has two sections: two unique solutions, and a graph comparing the speed of several solutions.

Removing Duplicate Items

Most of the other answers only remove duplicate items that are hashable, but the question isn't restricted to hashable items, so I'll offer some solutions which don't require hashable items.

collections.Counter is a powerful tool in the standard library which could be perfect for this. Only one other solution even uses Counter, and that solution is also limited to hashable keys.
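For purely hashable items, Counter on its own is already enough, since it is a dict subclass and keeps insertion order; a minimal sketch:

from collections import Counter

mylist = ["a", "b", "a", "c", "c"]
print(list(Counter(mylist)))
# ['a', 'b', 'c']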

To allow unhashable keys in Counter, I made a Container class, which tries to use the object's default hash function and, if that fails, falls back to the object's identity. It also defines __eq__ and __hash__ methods. This is enough to allow unhashable items in our solution: unhashable objects are treated as if they were hashable. However, because this hash function uses identity for unhashable objects, two equal objects that are both unhashable will not be recognized as duplicates. I suggest you override this and change it to use the hash of an equivalent immutable type (like using hash(tuple(my_list)) if my_list is a list).
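For instance, a small sketch of that idea (the variable names are just for illustration): two equal lists hash the same once they are converted to equivalent tuples, which is what such an override would rely on:

a = [1, 2]
b = [1, 2]
print(hash(tuple(a)) == hash(tuple(b)))
# True - equal lists produce equal hashes via equivalent tuples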

I made two solutions. The second keeps the order of the items, using a subclass of both Counter and OrderedDict named 'OrderedCounter'. Now, here are the functions:

from collections import OrderedDict, Counter

class Container:
    # Wraps any object so it can be used as a Counter key,
    # falling back to identity when the object is unhashable.
    def __init__(self, obj):
        self.obj = obj
    def __eq__(self, obj):
        return self.obj == obj
    def __hash__(self):
        try:
            return hash(self.obj)
        except TypeError:
            return id(self.obj)

class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)

def remd(sequence):
    # Removes duplicates without preserving order.
    cnt = Counter()
    for x in sequence:
        cnt[Container(x)] += 1
    return [item.obj for item in cnt]

def oremd(sequence):
    # Removes duplicates while preserving first-seen order.
    cnt = OrderedCounter()
    for x in sequence:
        cnt[Container(x)] += 1
    return [item.obj for item in cnt]

remd does non-ordered deduplication, while oremd does ordered deduplication. The non-ordered version is slightly faster, since it doesn't store the order of the items.
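A hypothetical usage sketch (the variable names are just for illustration). As noted above, an unhashable object only counts as a duplicate of itself with the default Container hash, so the same list object is deliberately reused:

xs = [1, 2]                      # an unhashable item
data = [xs, "a", xs, "a", "b"]
print(oremd(data))               # [[1, 2], 'a', 'b'] - order preserved
print(remd(data))                # same items, but the order is not guaranteed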

Now, I also wanted to show a speed comparison of each answer, so I'll do that next.

Which Function is the Fastest?

For removing duplicates, I gathered 10 functions from a few answers. I calculated the speed of each function and put it into a graph using matplotlib.pyplot.

I divided this into three rounds of graphing. A hashable is any object which can be hashed, an unhashable is any object which cannot be hashed. An ordered sequence is a sequence which preserves order, an unordered sequence does not preserve order. Now, here are a few more terms:

Unordered Hashable was for any method which removed duplicates but didn't necessarily keep the order. It didn't have to work for unhashables, but it could.

Ordered Hashable was for any method which kept the order of the items in the list; it didn't have to work for unhashables, but it could.

Ordered Unhashable was for any method which kept the order of the items in the list and worked for unhashables.

On the y-axis is the number of seconds it took.

On the x-axis is the number used to generate the sequence the function was applied to.

I generated sequences for unordered hashables and ordered hashables with the following comprehension: [list(range(x)) + list(range(x)) for x in range(0, 1000, 10)]

For ordered unhashables: [[list(range(y)) + list(range(y)) for y in range(x)] for x in range(0, 1000, 10)]

Note there is a step in the range because, without it, this would have taken 10x as long. Also, in my opinion, it makes the graphs a little easier to read.
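The exact benchmarking script is not reproduced here, but a minimal sketch of how such a graph could be produced (timeit, the dedup_functions mapping, and the plotting details are assumptions, not the original code):

import timeit
import matplotlib.pyplot as plt

# Assumption: maps a legend label to a deduplication function.
dedup_functions = {'dict.fromkeys': lambda s: list(dict.fromkeys(s))}

xs = list(range(0, 1000, 10))
sequences = [list(range(x)) + list(range(x)) for x in xs]

for label, func in dedup_functions.items():
    # Time each function on each generated sequence.
    times = [timeit.timeit(lambda: func(seq), number=10) for seq in sequences]
    plt.plot(xs, times, label=label)

plt.xlabel('sequence size parameter')
plt.ylabel('seconds')
plt.legend()
plt.show()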

Also note that the keys in the legend are what I judged to be the most vital parts of each function's implementation. As for which function does the worst or best, the graphs speak for themselves.

With that settled, here are the graphs.

Unordered Hashables

(graph, with a zoomed-in view)

Ordered Hashables

(graph, with a zoomed-in view)

Ordered Unhashables

(graph, with a zoomed-in view)
