Compression of two tuples in python



Description

Python 2's built-in function cmp() compares the elements of two tuples. (cmp() was removed in Python 3.)

Syntax

Following is the syntax for cmp() −

cmp(tuple1, tuple2)

Parameters

  • tuple1 − This is the first tuple to be compared

  • tuple2 − This is the second tuple to be compared

Return Value

If the elements are of the same type, compare them and return the result. If the elements are of different types, check whether they are numbers.

  • If numbers, perform numeric coercion if necessary and compare.

  • If either element is a number, then the other element is "larger" (numbers are "smallest").

  • Otherwise, types are sorted alphabetically by name.

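If the end of one tuple is reached first, the longer tuple is "larger." If both tuples are exhausted and all their elements are equal, the result is a tie and 0 is returned. Otherwise cmp() returns a negative value when tuple1 is smaller and a positive value when it is larger (-1 and 1 in the example below).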

Example

The following example shows the usage of cmp().

#!/usr/bin/python

tuple1, tuple2 = (123, 'xyz'), (456, 'abc')
print cmp(tuple1, tuple2)   # 123 < 456, so -1
print cmp(tuple2, tuple1)   # 456 > 123, so 1
tuple3 = tuple2 + (786,)
print cmp(tuple2, tuple3)   # tuple2 is a prefix of the longer tuple3, so -1

When we run the above program, it produces the following result −

-1
1
-1
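The example above only runs on Python 2. In Python 3 the built-in cmp() is gone; the Python 3 release notes suggest the expression (a > b) - (a < b) as an equivalent. A minimal sketch, assuming Python 3, where the helper name cmp is our own. Note that Python 3 raises TypeError when it has to compare items of unrelated types (such as 1 and 'a'), so the "types sorted by name" rule above no longer applies:

```python
def cmp(a, b):
    """Python 3 stand-in for the removed built-in cmp(); returns -1, 0 or 1."""
    # True/False act as 1/0, so this gives -1 (a < b), 0 (equal) or 1 (a > b)
    return (a > b) - (a < b)

tuple1, tuple2 = (123, 'xyz'), (456, 'abc')
tuple3 = tuple2 + (786,)
print(cmp(tuple1, tuple2))  # -1
print(cmp(tuple2, tuple1))  # 1
print(cmp(tuple2, tuple3))  # -1
```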


Is there a way to reversibly compress a tuple of integers in Python? I have a large number of 5-element tuples (values range from 0 to 100000) that I want to store in a more memory-efficient way, but I also need to recover their original values later.

If I had a tuple like this:

test_tuple = (520, 203, 9721, 12, 4839)

I'd like to be able to compress it to a single integer value, similar to Python's built-in hash() function, except that I need to be able to recreate the tuple from the integer, which is not possible with hash().

So something like:

compressed = compress(test_tuple)
og_tuple = decompress(compressed)

Where compressed is an integer (or other small memory) representation of test_tuple and og_tuple is the original tuple extracted from the compressed representation. Ideally the solution should also be fast.

martineau

asked Oct 23, 2021 at 20:09


This probably isn’t the fastest or smallest form of compression, but it’s probably the easiest to follow. Convert each integer to a string, then put the lengths of those strings in front of the concatenated digits:

test_tuple = (520, 203, 9721, 12, 4839)

strs = list(map(str, test_tuple))
# digit count of each number, followed by all the numbers' digits
compressed = int(''.join(map(str, map(len, strs))) + ''.join(strs))

Output:

334245202039721124839

To decompress: since every tuple contains 5 integers and no length is over 9, the first 5 digits are always the lengths, so use them to split the remaining digits back into integers. I used itertools.islice for this:

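from itertools import islice

s = str(compressed)
# the first 5 digits are the lengths; the rest are the concatenated numbers
idxs, nums = map(int, s[:5]), [iter(s[5:])]*5
# every entry of nums is the SAME iterator, so each islice resumes where the previous one stopped
decompressed = tuple(int(''.join(islice(n, i))) for i, n in zip(idxs, nums))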

Output:

(520, 203, 9721, 12, 4839)
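A different technique than the string approach above, fixed-width bit packing, also gives a single integer with the compress()/decompress() interface the question asks for. Because the stated values lie in 0 to 100000, and 100000 < 2**17, each value fits in a 17-bit field. A minimal sketch, assuming exactly 5 non-negative values per tuple; the constants BITS, MASK and LENGTH are our own choices:

```python
BITS = 17                  # assumption: every value is in 0..100000, and 100000 < 2**17
MASK = (1 << BITS) - 1
LENGTH = 5                 # assumption: every tuple has exactly 5 elements

def compress(tup):
    """Pack a tuple of small non-negative ints into one int, BITS bits per value."""
    packed = 0
    for value in tup:
        if not 0 <= value <= MASK:
            raise ValueError(f"value out of range: {value}")
        packed = (packed << BITS) | value   # shift earlier values left, add the new one
    return packed

def decompress(packed, length=LENGTH):
    """Reverse compress(): peel off BITS bits at a time, last element first."""
    values = []
    for _ in range(length):
        values.append(packed & MASK)
        packed >>= BITS
    return tuple(reversed(values))

test_tuple = (520, 203, 9721, 12, 4839)
compressed = compress(test_tuple)
print(decompress(compressed))  # (520, 203, 9721, 12, 4839)
```

Passing the length explicitly is what lets a leading 0 survive the round trip, since leading zero bits are otherwise indistinguishable from nothing.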

answered Oct 23, 2021 at 20:42

Jab


The array module stores fixed-length integers efficiently. You could use it to build a single bytes object holding all of your data, then use a compression algorithm like LZMA to slim it down further still.

import array
import lzma

test_list = [tuple(range(i, i+5)) for i in range(100_000)]
arr_bytes = b"".join(array.array("I", tup).tobytes() for tup in test_list)
compressed = lzma.compress(arr_bytes)

A longer version that also checks sizes and unpacks the data back into the original list of tuples:

import array
import lzma
import sys

# generate a test list and get its size
test_list = [tuple(range(i, i+5)) for i in range(100_000)]
sz = sum(sum(sys.getsizeof(i) for i in tup) for tup in test_list)
sz += sum(sys.getsizeof(tup) for tup in test_list)
sz += sys.getsizeof(test_list)
print("orig size" , sz)

# convert tuples to fixed size array of 4 byte ints, then concat to bytes
arr_format = "I"
arr_item_size = 4
assert array.array("I").itemsize == arr_item_size, "pack to 4 bytes"
arr_len = arr_item_size * 5
arr_bytes = b"".join(array.array(arr_format, tup).tobytes() for tup in test_list)
print("as bytes", len(arr_bytes), f"{len(arr_bytes)/sz*100:.2f}%")

# compress
compressed = lzma.compress(arr_bytes)
print("compressed", len(compressed), f"{len(compressed)/sz*100:.2f}%")

# sanity check that we got the same stuff back
arr_bytes_decompressed = lzma.decompress(compressed)
assert arr_bytes_decompressed == arr_bytes, "decompressed right"

test_list_decompressed = [tuple(array.array(arr_format, arr_bytes_decompressed[i:i+arr_len]))
        for i in range(0, len(arr_bytes_decompressed), arr_len)]
assert test_list_decompressed == test_list, "test list"

Running it, I get:

orig size 22800980
as bytes 2000000 8.77%
compressed 55972 0.25%

The compressed size depends on how random your integers are, but still, that ain't bad!
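The array typecode "I" only guarantees a minimum size of 2 bytes and its actual size depends on the platform, which is why the longer version asserts that itemsize is 4. If a fixed layout matters, the struct module with an explicit byte order is one option. A minimal sketch, assuming every tuple holds 5 unsigned values below 2**32; the names RECORD, pack_all and unpack_all are our own:

```python
import lzma
import struct

# assumption: 5 unsigned 32-bit ints per tuple, little-endian, standard sizes
RECORD = struct.Struct("<5I")

def pack_all(tuples):
    """Concatenate every tuple as a fixed 20-byte record, then LZMA-compress."""
    return lzma.compress(b"".join(RECORD.pack(*tup) for tup in tuples))

def unpack_all(blob):
    """Decompress, then split back into tuples one 20-byte record at a time."""
    raw = lzma.decompress(blob)
    return list(RECORD.iter_unpack(raw))

test_list = [tuple(range(i, i + 5)) for i in range(1000)]
blob = pack_all(test_list)
assert unpack_all(blob) == test_list
print(RECORD.size)  # 20
```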

answered Oct 23, 2021 at 21:38

tdelaney

Sometimes the obvious solution pops up after going through an answer. Since this is the best solution (IMHO), I'm posting it separately.

Just pickle it.

import pickle
import lzma

test_list = [tuple(range(i, i+5)) for i in range(100_000)]
compressed = lzma.compress(pickle.dumps(test_list))

A longer version with size measurements:

import lzma
import pickle
import sys

# generate a test list and get its size
test_list = [tuple(range(i, i+5)) for i in range(100_000)]
sz = sum(sum(sys.getsizeof(i) for i in tup) for tup in test_list)
sz += sum(sys.getsizeof(tup) for tup in test_list)
sz += sys.getsizeof(test_list)
print("orig size" , sz)

pickled = pickle.dumps(test_list)
print("pickled", len(pickled), f"{len(pickled)/sz*100:.2f}%")

compressed = lzma.compress(pickled)
print("compressed", len(compressed), f"{len(compressed)/sz*100:.2f}%")

Output

orig size 22800980
pickled 2143892 9.40%
compressed 152068 0.67%
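Neither snippet shows the way back. Reversing the two steps in the opposite order recovers the list, since pickle restores tuples as tuples. A minimal round-trip sketch; the smaller test list is our own choice to keep it quick. As the pickle documentation warns, only unpickle data you trust:

```python
import lzma
import pickle

test_list = [tuple(range(i, i + 5)) for i in range(1000)]
compressed = lzma.compress(pickle.dumps(test_list))

# undo the steps in reverse order: decompress first, then unpickle
restored = pickle.loads(lzma.decompress(compressed))
assert restored == test_list
print(type(restored[0]))  # <class 'tuple'>
```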

answered Oct 23, 2021 at 22:08

tdelaney

Can we compare 2 tuples in Python?

Tuples are compared position by position: the first item of the first tuple is compared to the first item of the second tuple; if they are not equal, this is the result of the comparison, else the second item is considered, then the third and so on.
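A short illustration in Python 3 (the tuples are arbitrary examples). In Python 3, if the comparison reaches a position holding unrelated types, such as 1 and 'a', it raises TypeError:

```python
# Python 3: tuples compare lexicographically, one position at a time
print((1, 2, 3) < (1, 2, 4))   # True: first difference is 3 < 4
print((1, 9) < (2, 0))         # True: settled by the first items alone
print((1, 2) < (1, 2, 0))      # True: a prefix is smaller than the longer tuple
print((3,) < (2, 100))         # False: 3 > 2
print((1, 2) == (1, 2))        # True
```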

Can you zip two tuples?

zip is a built-in function that takes two or more sequences and “zips” them together into tuples, each containing one element from each sequence; pairing stops at the end of the shortest input. In Python 2 it returns a list of tuples. In Python 3 it returns an iterator of tuples, which works in a for loop or can be passed to list() or tuple(), but unlike a list it can be traversed only once.
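Zipping two tuples in Python 3 (the values are arbitrary):

```python
first = (1, 2, 3)
second = ('a', 'b', 'c')

pairs = zip(first, second)          # a lazy iterator in Python 3
print(list(pairs))                  # [(1, 'a'), (2, 'b'), (3, 'c')]
print(list(pairs))                  # [] because the iterator is already exhausted
print(tuple(zip(first, second)))    # ((1, 'a'), (2, 'b'), (3, 'c'))
print(list(zip(first, ('x',))))     # [(1, 'x')]: stops at the shorter input
```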

How do you merge two tuples in Python?

Tuples are concatenated with the + operator and repeated with the * operator. Because tuples are immutable, + cannot modify an existing tuple; instead it combines two tuples into a new one, which makes it the usual way to merge them.
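A short illustration (the values are arbitrary):

```python
a = (1, 2)
b = (3, 4)

merged = a + b      # a new tuple; a and b are unchanged
print(merged)       # (1, 2, 3, 4)
print(a * 3)        # (1, 2, 1, 2, 1, 2)

a += b              # rebinds the name a to a NEW tuple; the old tuple is not mutated
print(a)            # (1, 2, 3, 4)
```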

Is there tuple comprehension in Python?

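There is no tuple comprehension in Python. Tuples are immutable, so they cannot be filled in item by item the way a comprehension fills a list, and the parenthesized syntax (expr for x in iterable) produces a generator expression, not a tuple. To build a tuple from a comprehension-style expression, pass a generator expression to tuple().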
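For example (the squares are an arbitrary illustration):

```python
squares = tuple(x * x for x in range(5))
print(squares)              # (0, 1, 4, 9, 16)

not_a_tuple = (x * x for x in range(5))
print(type(not_a_tuple))    # <class 'generator'>
```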