Hướng dẫn generate subsets python

I hadn't come across the more_itertools.powerset function and would recommend using that. I also recommend not using the default ordering of the output from itertools.combinations, often instead you want to minimise the distance between the positions and sort the subsets of items with shorter distance between them above/before the items with larger distance between them.

The itertools recipes page shows it uses chain.from_iterable

  • Note that the r here matches the standard notation for the lower part of a binomial coefficient, the s is usually referred to as n in mathematics texts and on calculators (“n Choose r”)
def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

The other examples here give the powerset of [1,2,3,4] in such a way that the 2-tuples are listed in "lexicographic" order (when we print the numbers as integers). If I write the distance between the numbers alongside it (i.e. the difference), it shows my point:

12 ⇒ 1
13 ⇒ 2
14 ⇒ 3
23 ⇒ 1
24 ⇒ 2
34 ⇒ 1

The correct order for subsets should be the order which 'exhausts' the minimal distance first, like so:

12 ⇒ 1
23 ⇒ 1
34 ⇒ 1
13 ⇒ 2
24 ⇒ 2
14 ⇒ 3

Using numbers here makes this ordering look 'wrong', but consider for example the letters ["a","b","c","d"] it is clearer why this might be useful to obtain the powerset in this order:

ab ⇒ 1
bc ⇒ 1
cd ⇒ 1
ac ⇒ 2
bd ⇒ 2
ad ⇒ 3

This effect is more pronounced with more items, and for my purposes it makes the difference between being able to describe the ranges of the indexes of the powerset meaningfully.

(There is a lot written on Gray codes etc. for the output order of algorithms in combinatorics, I don't see it as a side issue).

I actually just wrote a fairly involved program which used this fast integer partition code to output the values in the proper order, but then I discovered more_itertools.powerset and for most uses it's probably fine to just use that function like so:

from more_itertools import powerset
from numpy import ediff1d

def ps_sorter(tup):
    l = len(tup)
    d = ediff1d(tup).tolist()
    return l, d

ps = powerset([1,2,3,4])

ps = sorted(ps, key=ps_sorter)

for x in ps:
    print(x)

()
(1,)
(2,)
(3,)
(4,)
(1, 2)
(2, 3)
(3, 4)
(1, 3)
(2, 4)
(1, 4)
(1, 2, 3)
(2, 3, 4)
(1, 2, 4)
(1, 3, 4)
(1, 2, 3, 4)

I wrote some more involved code which will print the powerset nicely (see the repo for pretty printing functions I've not included here: print_partitions, print_partitions_by_length, and pprint_tuple).

  • Repo: ordered-powerset, specifically pset_partitions.py

This is all pretty simple, but still might be useful if you want some code that'll let you get straight to accessing the different levels of the powerset:

from itertools import permutations as permute
from numpy import cumsum

# http://jeromekelleher.net/generating-integer-partitions.html
# via
# https://stackoverflow.com/questions/10035752/elegant-python-code-for-integer-partitioning#comment25080713_10036764

def asc_int_partitions(n):
    a = [0 for i in range(n + 1)]
    k = 1
    y = n - 1
    while k != 0:
        x = a[k - 1] + 1
        k -= 1
        while 2 * x <= y:
            a[k] = x
            y -= x
            k += 1
        l = k + 1
        while x <= y:
            a[k] = x
            a[l] = y
            yield tuple(a[:k + 2])
            x += 1
            y -= 1
        a[k] = x + y
        y = x + y - 1
        yield tuple(a[:k + 1])

# https://stackoverflow.com/a/6285330/2668831
def uniquely_permute(iterable, enforce_sort=False, r=None):
    previous = tuple()
    if enforce_sort: # potential waste of effort (default: False)
        iterable = sorted(iterable)
    for p in permute(iterable, r):
        if p > previous:
            previous = p
            yield p

def sum_min(p):
    return sum(p), min(p)

def partitions_by_length(max_n, sorting=True, permuting=False):
    partition_dict = {0: ()}
    for n in range(1,max_n+1):
        partition_dict.setdefault(n, [])
        partitions = list(asc_int_partitions(n))
        for p in partitions:
            if permuting:
                perms = uniquely_permute(p)
                for perm in perms:
                    partition_dict.get(len(p)).append(perm)
            else:
                partition_dict.get(len(p)).append(p)
    if not sorting:
        return partition_dict
    for k in partition_dict:
        partition_dict.update({k: sorted(partition_dict.get(k), key=sum_min)})
    return partition_dict

def print_partitions_by_length(max_n, sorting=True, permuting=True):
    partition_dict = partitions_by_length(max_n, sorting=sorting, permuting=permuting)
    for k in partition_dict:
        if k == 0:
            print(tuple(partition_dict.get(k)), end="")
        for p in partition_dict.get(k):
            print(pprint_tuple(p), end=" ")
        print()
    return

def generate_powerset(items, subset_handler=tuple, verbose=False):
    """
    Generate the powerset of an iterable `items`.

    Handling of the elements of the iterable is by whichever function is passed as
    `subset_handler`, which must be able to handle the `None` value for the
    empty set. The function `string_handler` will join the elements of the subset
    with the empty string (useful when `items` is an iterable of `str` variables).
    """
    ps = {0: [subset_handler()]}
    n = len(items)
    p_dict = partitions_by_length(n-1, sorting=True, permuting=True)
    for p_len, parts in p_dict.items():
        ps.setdefault(p_len, [])
        if p_len == 0:
            # singletons
            for offset in range(n):
                subset = subset_handler([items[offset]])
                if verbose:
                    if offset > 0:
                        print(end=" ")
                    if offset == n - 1:
                        print(subset, end="\n")
                    else:
                        print(subset, end=",")
                ps.get(p_len).append(subset)
        for pcount, partition in enumerate(parts):
            distance = sum(partition)
            indices = (cumsum(partition)).tolist()
            for offset in range(n - distance):
                subset = subset_handler([items[offset]] + [items[offset:][i] for i in indices])
                if verbose:
                    if offset > 0:
                        print(end=" ")
                    if offset == n - distance - 1:
                        print(subset, end="\n")
                    else:
                        print(subset, end=",")
                ps.get(p_len).append(subset)
        if verbose and p_len < n-1:
            print()
    return ps

As an example, I wrote a CLI demo program which takes a string as a command line argument:

python string_powerset.py abcdef

a, b, c, d, e, f

ab, bc, cd, de, ef
ac, bd, ce, df
ad, be, cf
ae, bf
af

abc, bcd, cde, def
abd, bce, cdf
acd, bde, cef
abe, bcf
ade, bef
ace, bdf
abf
aef
acf
adf

abcd, bcde, cdef
abce, bcdf
abde, bcef
acde, bdef
abcf
abef
adef
abdf
acdf
acef

abcde, bcdef
abcdf
abcef
abdef
acdef

abcdef