Is not empty in python?

Best way to check if a list is empty

For example, if passed the following:

a = []

How do I check to see if a is empty?

Short Answer:

Place the list in a boolean context (for example, with an if or while statement). It will test False if it is empty, and True otherwise. For example:

if not a:                           # do this!
    print('a is an empty list')
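The same truth test works in a while statement. The following is a minimal sketch, using the hypothetical names pending and processed, that drains a list until it tests False:

```python
# Hypothetical work queue; the names are illustrative only.
pending = ['a', 'b', 'c']
processed = []

# The loop condition is the list itself: it is truthy while items remain.
while pending:
    processed.append(pending.pop())

if not pending:
    print('pending is an empty list')
print(processed)
```

Because pop() removes items from the end, processed ends up in reverse order.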

PEP 8

PEP 8, the official Python style guide for Python code in Python's standard library, asserts:

For sequences, (strings, lists, tuples), use the fact that empty sequences are false.

Yes: if not seq:
     if seq:

No: if len(seq):
    if not len(seq):

We should expect that standard library code should be as performant and correct as possible. But why is that the case, and why do we need this guidance?

Explanation

I frequently see code like this from experienced programmers new to Python:

if len(a) == 0:                     # Don't do this!
    print('a is an empty list')

And users of lazy languages may be tempted to do this:

if a == []:                         # Don't do this!
    print('a is an empty list')

These are correct in their respective other languages. And this is even semantically correct in Python.

But we consider it un-Pythonic because Python supports these semantics directly in the list object's interface via boolean coercion.
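To make the "semantically correct" point concrete, here is a small sketch (the helper name emptiness_checks is made up for illustration) showing that all three spellings agree for lists:

```python
def emptiness_checks(a):
    """Return the result of each emptiness test for the list a."""
    return (len(a) == 0, a == [], not a)

# A list holding a falsy item is still non-empty, so every check is False for it.
samples = [[], [0], [[]], [None]]
for sample in samples:
    print(sample, emptiness_checks(sample))
```

Only the first sample produces True; the checks never disagree with each other on a list.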

From the docs (and note specifically the inclusion of the empty list, []):

By default, an object is considered true unless its class defines either a __bool__() method that returns False or a __len__() method that returns zero, when called with the object. Here are most of the built-in objects considered false:

  • constants defined to be false: None and False.
  • zero of any numeric type: 0, 0.0, 0j, Decimal(0), Fraction(0, 1)
  • empty sequences and collections: '', (), [], {}, set(), range(0)
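As a quick check of the list above, this sketch puts each documented false value through a truth test (the variable names are illustrative):

```python
from decimal import Decimal
from fractions import Fraction

# Every value named in the documentation excerpt above.
falsy_values = [
    None, False,
    0, 0.0, 0j, Decimal(0), Fraction(0, 1),
    '', (), [], {}, set(), range(0),
]

for value in falsy_values:
    print(repr(value), bool(value))

# any() is False only when no element is truthy.
all_falsy = not any(falsy_values)
print(all_falsy)
```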

And the datamodel documentation:

object.__bool__(self)

Called to implement truth value testing and the built-in operation bool(); should return False or True. When this method is not defined, __len__() is called, if it is defined, and the object is considered true if its result is nonzero. If a class defines neither __len__() nor __bool__(), all its instances are considered true.

and

object.__len__(self)

Called to implement the built-in function len(). Should return the length of the object, an integer >= 0. Also, an object that doesn’t define a __bool__() method and whose __len__() method returns zero is considered to be false in a Boolean context.
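The fallback order described in these two excerpts can be sketched with a few hypothetical classes (Basket, AlwaysTruthy and Plain are invented for this illustration):

```python
class Basket:
    """Hypothetical container that defines only __len__."""

    def __init__(self, items=()):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


class AlwaysTruthy(Basket):
    """Hypothetical subclass: __bool__ takes precedence over __len__."""

    def __bool__(self):
        return True


class Plain:
    """Defines neither method, so every instance is considered true."""


print(bool(Basket()))         # falls back to __len__, which returns 0
print(bool(Basket([1])))      # __len__ returns 1
print(bool(AlwaysTruthy()))   # __bool__ wins even though the length is 0
print(bool(Plain()))          # neither method is defined
```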

So instead of this:

if len(a) == 0:                     # Don't do this!
    print('a is an empty list')

or this:

if a == []:                         # Don't do this!
    print('a is an empty list')

Do this:

if not a:
    print('a is an empty list')

Doing what's Pythonic usually pays off in performance:

Does it pay off? (Note that less time to perform an equivalent operation is better.)

>>> import timeit
>>> min(timeit.repeat(lambda: len([]) == 0, repeat=100))
0.13775854044661884
>>> min(timeit.repeat(lambda: [] == [], repeat=100))
0.0984637276455409
>>> min(timeit.repeat(lambda: not [], repeat=100))
0.07878462291455435

For scale, here's the cost of calling the function and constructing and returning an empty list, which you might subtract from the costs of the emptiness checks used above:

>>> min(timeit.repeat(lambda: [], repeat=100))
0.07074015751817342

We see that either checking for length with the builtin function len compared to 0 or checking against an empty list is much less performant than using the builtin syntax of the language as documented.
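To reproduce the comparison on your own machine, a script along these lines may help. It is only a sketch: the labels, the reduced repeat count and the number=100_000 setting are choices made here so it finishes quickly, and absolute timings will differ between Python versions and hardware.

```python
import timeit

# The three emptiness checks from above, plus the list-construction baseline.
checks = {
    'len(a) == 0': lambda: len([]) == 0,
    'a == []': lambda: [] == [],
    'not a': lambda: not [],
    'baseline (build list only)': lambda: [],
}

# Take the minimum of several runs, as in the interactive session above.
timings = {
    label: min(timeit.repeat(func, repeat=5, number=100_000))
    for label, func in checks.items()
}

# Print fastest first.
for label, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f'{label:28} {seconds:.4f} s')
```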

Why?

For the len(a) == 0 check:

First Python has to check the globals to see if len is shadowed.

Then it must call the function, load 0, and do the equality comparison in Python (instead of with C):

>>> import dis
>>> dis.dis(lambda: len([]) == 0)
  1           0 LOAD_GLOBAL              0 (len)
              2 BUILD_LIST               0
              4 CALL_FUNCTION            1
              6 LOAD_CONST               1 (0)
              8 COMPARE_OP               2 (==)
             10 RETURN_VALUE

And for the [] == [] check, it has to build an unnecessary list and then, again, do the comparison operation in Python's virtual machine (as opposed to C):

>>> dis.dis(lambda: [] == [])
  1           0 BUILD_LIST               0
              2 BUILD_LIST               0
              4 COMPARE_OP               2 (==)
              6 RETURN_VALUE

The "Pythonic" way is a much simpler and faster check since the length of the list is cached in the object instance header:

>>> dis.dis(lambda: not [])
  1           0 BUILD_LIST               0
              2 UNARY_NOT
              4 RETURN_VALUE
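The three disassemblies can also be compared programmatically. The sketch below uses dis.get_instructions and a made-up helper, opnames. The exact opcode names and counts depend on the Python version (newer releases add instructions such as RESUME), so treat the output as illustrative rather than identical to the listings above.

```python
import dis


def opnames(func):
    """Return the bytecode instruction names for func, in order."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


len_check = opnames(lambda: len([]) == 0)
eq_check = opnames(lambda: [] == [])
not_check = opnames(lambda: not [])

for label, names in [
    ('len([]) == 0', len_check),
    ('[] == []', eq_check),
    ('not []', not_check),
]:
    print(f'{label:14} {len(names):2} instructions: {names}')
```

The not [] version needs no global name lookup and builds only one list, which is where the shorter instruction sequence comes from.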

Evidence from the C source and documentation

PyVarObject

This is an extension of PyObject that adds the ob_size field. This is only used for objects that have some notion of length. This type does not often appear in the Python/C API. It corresponds to the fields defined by the expansion of the PyObject_VAR_HEAD macro.
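As a rough illustration that the length really is stored in the object, the following sketch reads ob_size directly with ctypes. It is an exploration aid, not something to use in real code, and it rests on assumptions specific to CPython: that id() returns the object's memory address, and that ob_size sits immediately after the PyObject header, whose size is taken here from object.__basicsize__. The helper name ob_size is made up for this example.

```python
import ctypes
import sys


def ob_size(obj):
    """Read the ob_size field of a variable-size CPython object (e.g. a list)."""
    # Assumption: PyVarObject is a PyObject header followed directly by ob_size,
    # and object.__basicsize__ equals the size of that header.
    address = id(obj) + object.__basicsize__
    return ctypes.c_ssize_t.from_address(address).value


if sys.implementation.name == 'cpython':
    a = []
    print(ob_size(a), len(a))   # both report the empty list's length
    a.extend(range(5))
    print(ob_size(a), len(a))   # both change together after the list grows
```

Since the size is a stored field, testing a list for emptiness does not require walking its elements.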

From the C source in Include/listobject.h:

typedef struct {
    PyObject_VAR_HEAD
    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 
