For many use cases, the answer you want is:
ys = set[y]
[item for item in x if item not in ys]
This is a hybrid between aaronasterling's answer and quantumSoup's answer.
aaronasterling's version does len[y]
item comparisons for each element in x
, so it takes quadratic time. quantumSoup's version uses sets, so it does a single constant-time set lookup
for each element in x
—but, because it converts both x
and y
into sets, it loses the order of your elements.
By converting only y
into a set, and iterating x
in order, you get the best of both worlds—linear time, and order preservation.*
However, this still has a problem from quantumSoup's version: It requires your elements to be hashable. That's pretty much built into the nature of sets.** If you're trying to, e.g., subtract a list of dicts from another list of dicts, but the list to subtract is large, what do you do?
If you can decorate your values in some way that they're hashable, that solves the problem. For example, with a flat dictionary whose values are themselves hashable:
ys = {tuple[item.items[]] for item in y}
[item for item in x if tuple[item.items[]] not in ys]
If your types are a bit more complicated [e.g., often you're dealing with JSON-compatible values, which are hashable, or lists or dicts whose values are recursively the same type], you can still use this solution. But some types just can't be converted into anything hashable.
If your items aren't, and can't be made, hashable, but they are comparable, you can at least get log-linear time [O[N*log M]
, which is a lot better than the O[N*M]
time of the list solution, but not as good as the O[N+M]
time of the set solution] by sorting and using bisect
:
ys = sorted[y]
def bisect_contains[seq, item]:
index = bisect.bisect[seq, item]
return index < len[seq] and seq[index] == item
[item for item in x if bisect_contains[ys, item]]
If your items are neither hashable nor comparable, then you're stuck with the quadratic solution.
* Note that you could also do
this by using a pair of OrderedSet
objects, for which you can find recipes and third-party modules. But I think this is simpler.
** The reason set lookups are constant time is that all it has to do is hash the value and see if there's an entry for that hash. If it can't hash the value, this won't work.
Created: November-16, 2020 | Updated: July-18, 2021 This tutorial demonstrates how
to perform the list subtraction, or in other words, list minus list in Python.set
to Perform List Subtraction in Python
As defined by the set theory in mathematics, the difference of two sets refers to the elements from one set that do not exist in the other set.
For example, if we declare these two lists:
list1 = [1, 2, 4]
list2 = [2, 3]
The difference of list1 - list2
would be [1, 4]
, while list2 - list1
would be [3]
.
Convert List to set
to Perform List Subtraction in Python
Set theory operations are supported in Python. However, only the set
data type support these operations. Therefore, to use the set
operation, lists have to be converted into sets. This is possible by wrapping a list around the function set[]
.
Note
Converting a list to a set will remove any type of order and remove duplicate values from the list.
listA = [1, 2, 4, 7, 9, 11, 11, 14, 14]
listB = [2, 3, 7, 8, 11, 13, 13, 16]
setA = set[listA]
setB = set[listB]
print['A - B = ', setA - setB]
Output:
A - B = {1, 4, 9, 14}
The result outputs the difference between the two sets and removes the duplicate values.
We can use the function list[]
to convert the result from a set
to a list.
listA = [1, 2, 4, 7, 9, 11, 11, 14, 14]
listB = [2, 3, 7, 8, 11, 13, 13, 16]
setA = set[listA]
setB = set[listB]
list_diff = list[setA - setB]
print['A - B: ', list_diff]
Output:
A - B: [1, 4, 9, 14]
Use List Comprehension to Get List Difference in Python
List comprehension can be used to check if an element exists only in the first list but does not exist in the second list. This solution makes it possible to perform the difference operation without converting the list to a set.
listA = [1, 2, 4, 7, 9, 11, 11, 14, 14]
listB = [2, 3, 7, 8, 11, 13, 13, 16]
listSub = [elem for elem in listA if elem not in listB]
print['A - B =', listSub]
Output:
A - B = [1, 4, 9, 14, 14]
This solution does not mess with the order of the list and removes duplicates.
However, the value 11
is repeated twice in listA
, and both iterations of 11
are removed from the result of A - B
since 11
exists in both sets. This behavior is as expected.