In Python, you can generate a new list from a list of strings by extracting, replacing, or transforming elements that satisfy certain conditions with list comprehensions.
This article briefly explains list comprehensions and then describes the following operations with sample code.
- Extract strings that contain or do not contain a specific string
- Replace a specific string in a list
- Extract strings that begin or do not begin with a specific string
- Extract strings that end or do not end with a specific string
- Extract strings by uppercase or lowercase
- Convert uppercase and lowercase
- Extract strings by alphabetic or numeric
- Multiple conditions
- Regular expression (regex)
See the following articles for more information on how to extract and replace strings.
- Extract a substring from a string in Python (position, regex)
- Replace strings in Python (replace, translate, re.sub, re.subn)
List comprehensions
To generate a new list from an existing list, you can use a list comprehension, which is simpler to write than a for loop.
- List comprehensions in Python
[expression for variable_name in iterable if condition]
If you just want to extract elements that meet condition, you don't need to process them with expression; just use variable_name.
[variable_name for variable_name in iterable if condition]
If you change if condition to if not condition, you can extract elements that do not satisfy condition.
For more information about extracting, replacing, and converting list elements using list comprehensions, please refer to the following article.
- Extract, replace, convert elements of a list in Python
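The equivalence between a for loop and a list comprehension can be sketched as follows. This is only an illustration: the list l is the one used in the examples in this article, and the names l_loop and l_comp are chosen here for the comparison.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Build the list with an explicit for loop.
l_loop = []
for s in l:
    if 'XXX' in s:
        l_loop.append(s)

# Build the same list with a list comprehension.
l_comp = [s for s in l if 'XXX' in s]

print(l_loop == l_comp)
# True
```

Both forms produce the same list; the comprehension simply folds the loop, the condition, and the append into one expression.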
Extract strings that contain or do not contain a specific string
specific_string in target_string returns True if target_string contains specific_string. Use not in for negation.
- in operator in Python [for list, string, dictionary, etc.]
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Keep elements that contain 'XXX'.
l_in = [s for s in l if 'XXX' in s]
print(l_in)
# ['oneXXXaaa', 'twoXXXbbb']

# Keep elements that do not contain 'XXX'.
l_in_not = [s for s in l if 'XXX' not in s]
print(l_in_not)
# ['three999aaa', '000111222']
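The same idea can be extended to several substrings or to case-insensitive matching. The following is a minimal sketch, not part of the original examples: the keywords list is a hypothetical set of substrings, and the case-insensitive check simply compares against a lowercased copy of each element.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Hypothetical list of substrings to look for.
keywords = ['999', 'bbb']

# Keep elements that contain at least one of the keywords.
l_any = [s for s in l if any(k in s for k in keywords)]
print(l_any)
# ['twoXXXbbb', 'three999aaa']

# Case-insensitive check: search in a lowercased copy of each element.
l_ci = [s for s in l if 'xxx' in s.lower()]
print(l_ci)
# ['oneXXXaaa', 'twoXXXbbb']
```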
Replace a specific string in a list
To replace a substring in each element of a list, apply the string method replace() to each element in a list comprehension.
If an element does not contain the string to be replaced, replace() leaves it unchanged, so you don't need to select elements with if condition.
# Replace 'XXX' with 'ZZZ' in every element.
l_replace = [s.replace('XXX', 'ZZZ') for s in l]
print(l_replace)
# ['oneZZZaaa', 'twoZZZbbb', 'three999aaa', '000111222']
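If several substrings need to be replaced at once, one possible approach is to apply replace() repeatedly. The sketch below assumes a hypothetical replacements dictionary and a hypothetical helper replace_all(); neither appears in the original examples.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Hypothetical mapping of substrings to their replacements.
replacements = {'XXX': 'ZZZ', '999': '---'}


def replace_all(s, mapping):
    """Apply each replacement in mapping to s, one after another."""
    for old, new in mapping.items():
        s = s.replace(old, new)
    return s


l_replace_multi = [replace_all(s, replacements) for s in l]
print(l_replace_multi)
# ['oneZZZaaa', 'twoZZZbbb', 'three---aaa', '000111222']
```

Because the replacements are applied in order, a later replacement can affect text produced by an earlier one; that does not happen with the values used here.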
To replace an entire element containing a specific string, extract it with in and use conditional expressions (ternary operator), X if condition else Y.
- Conditional expressions in Python
Use conditional expressions for the expression part of list comprehensions.
- Extract, replace, convert elements of a list in Python
# Replace the whole element with 'ZZZ' if it contains 'XXX'.
l_replace_all = ['ZZZ' if 'XXX' in s else s for s in l]
print(l_replace_all)
# ['ZZZ', 'ZZZ', 'three999aaa', '000111222']
Adding parentheses may make the expression easier to read and help avoid mistakes. Syntactically, the parentheses are optional.
[('ZZZ' if ('XXX' in s) else s) for s in l]
Extract strings that begin or do not begin with a specific string
The string method startswith() returns True if the string starts with the specific string.
# Keep elements that start with 't'.
l_start = [s for s in l if s.startswith('t')]
print(l_start)
# ['twoXXXbbb', 'three999aaa']

# Keep elements that do not start with 't'.
l_start_not = [s for s in l if not s.startswith('t')]
print(l_start_not)
# ['oneXXXaaa', '000111222']
Extract strings that end or do not end with a specific string
The string method endswith() returns True if the string ends with the specific string.
# Keep elements that end with 'aaa'.
l_end = [s for s in l if s.endswith('aaa')]
print(l_end)
# ['oneXXXaaa', 'three999aaa']

# Keep elements that do not end with 'aaa'.
l_end_not = [s for s in l if not s.endswith('aaa')]
print(l_end_not)
# ['twoXXXbbb', '000111222']
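Both startswith() and endswith() also accept a tuple of strings, in which case they return True if any one of them matches. The following sketch uses prefixes and suffixes chosen only for illustration.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Keep elements that start with 'o' or '0'.
l_start_any = [s for s in l if s.startswith(('o', '0'))]
print(l_start_any)
# ['oneXXXaaa', '000111222']

# Keep elements that end with 'bbb' or '222'.
l_end_any = [s for s in l if s.endswith(('bbb', '222'))]
print(l_end_any)
# ['twoXXXbbb', '000111222']
```

Note that the argument must be a tuple; passing a list raises TypeError.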
Extract strings by uppercase or lowercase
You can use the string methods isupper() and islower() to determine whether a string is all uppercase or all lowercase.
# Keep elements whose letters are all lowercase.
l_lower = [s for s in l if s.islower()]
print(l_lower)
# ['three999aaa']
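The following sketch shows isupper() alongside islower(). It uses a slightly different list, l_case, which is an assumption made here so that one element is all uppercase. Strings with mixed case, or with no cased characters at all (such as '000111222'), return False for both methods.

```python
# Hypothetical list containing an all-uppercase element.
l_case = ['oneXXXaaa', 'TWOXXXBBB', 'three999aaa', '000111222']

l_is_upper = [s for s in l_case if s.isupper()]
print(l_is_upper)
# ['TWOXXXBBB']

l_is_lower = [s for s in l_case if s.islower()]
print(l_is_lower)
# ['three999aaa']

# Mixed-case strings and strings without letters satisfy neither check.
l_neither = [s for s in l_case if not s.isupper() and not s.islower()]
print(l_neither)
# ['oneXXXaaa', '000111222']
```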
Convert uppercase and lowercase
To convert all letters to uppercase or lowercase, use the string methods upper() or lower(). Other methods are also available, such as capitalize(), which capitalizes the first letter and lowercases the rest, and swapcase(), which swaps uppercase and lowercase.
Use conditional expressions to convert only those elements that satisfy the conditions.
# Convert every element to uppercase.
l_upper_all = [s.upper() for s in l]
print(l_upper_all)
# ['ONEXXXAAA', 'TWOXXXBBB', 'THREE999AAA', '000111222']

# Convert only the all-lowercase elements to uppercase.
l_lower_to_upper = [s.upper() if s.islower() else s for s in l]
print(l_lower_to_upper)
# ['oneXXXaaa', 'twoXXXbbb', 'THREE999AAA', '000111222']
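For completeness, capitalize() and swapcase() can be applied in the same way. This is a small sketch using the same list; the variable names are chosen here for illustration.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# capitalize() uppercases the first character and lowercases the rest.
l_capitalize = [s.capitalize() for s in l]
print(l_capitalize)
# ['Onexxxaaa', 'Twoxxxbbb', 'Three999aaa', '000111222']

# swapcase() turns uppercase into lowercase and vice versa.
l_swapcase = [s.swapcase() for s in l]
print(l_swapcase)
# ['ONExxxAAA', 'TWOxxxBBB', 'THREE999AAA', '000111222']
```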
Extract strings by alphabetic or numeric
You can use the string methods isalpha() and isnumeric() to determine whether a string is all alphabetic or all numeric.
- Check if a string is numeric, alphabetic, alphanumeric, or ASCII
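# Keep elements that consist only of alphabetic characters.
l_isalpha = [s for s in l if s.isalpha()]
print(l_isalpha)
# ['oneXXXaaa', 'twoXXXbbb']

# Keep elements that consist only of numeric characters.
l_isnumeric = [s for s in l if s.isnumeric()]
print(l_isnumeric)
# ['000111222']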
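Note that isnumeric() returns False for strings containing a sign or a decimal point. If such strings should also count as numbers, one possible approach is to try converting them with float(). The helper is_float() and the list l_num below are hypothetical names introduced only for this sketch.

```python
def is_float(s):
    """Return True if s can be converted to float, False otherwise."""
    try:
        float(s)
    except ValueError:
        return False
    return True


# Hypothetical list mixing integers, decimals, signs, and non-numbers.
l_num = ['123', '12.5', '-1', 'abc', '']

l_num_isnumeric = [s for s in l_num if s.isnumeric()]
print(l_num_isnumeric)
# ['123']

l_num_float = [s for s in l_num if is_float(s)]
print(l_num_float)
# ['123', '12.5', '-1']
```

Keep in mind that float() also accepts strings such as '1e3', 'inf', and 'nan', so this check is broader than a plain digit test.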
Multiple conditions
You can also specify multiple conditions using and and or in the condition part of list comprehensions. You can also use negation with not.
If you combine three or more conditions, enclosing each group in () is safer, because and binds more tightly than or and the result depends on how the conditions are grouped.
# Alphabetic elements that do not start with 't'.
l_multi = [s for s in l if s.isalpha() and not s.startswith('t')]
print(l_multi)
# ['oneXXXaaa']

# Same as above, or any element containing 'bbb'.
l_multi_or = [s for s in l if (s.isalpha() and not s.startswith('t')) or ('bbb' in s)]
print(l_multi_or)
# ['oneXXXaaa', 'twoXXXbbb']
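The effect of grouping can be seen in the following sketch, which uses the same three conditions with and without parentheses. The particular combination of conditions is chosen here only to illustrate operator precedence.

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Without parentheses, "and" is evaluated before "or", so this means:
# s.isalpha() or ('aaa' in s and s.startswith('t'))
l_no_paren = [s for s in l if s.isalpha() or 'aaa' in s and s.startswith('t')]
print(l_no_paren)
# ['oneXXXaaa', 'twoXXXbbb', 'three999aaa']

# Parentheses change the grouping and therefore the result.
l_paren = [s for s in l if (s.isalpha() or 'aaa' in s) and s.startswith('t')]
print(l_paren)
# ['twoXXXbbb', 'three999aaa']
```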
Regular expression (regex)
You can use regular expressions (regex) for more flexible processing.
- re — Regular expression operations — Python 3.9.7 documentation
re.match() returns a match object if the beginning of the string matches the pattern, or None if it does not.
Since match objects are evaluated as True and None as False, if you want to extract only the elements that match a regex pattern, apply re.match() in the condition part of the list comprehension, as in the previous examples.
import re

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Keep elements that match the pattern from the beginning of the string.
l_re_match = [s for s in l if re.match('.*XXX.*', s)]
print(l_re_match)
# ['oneXXXaaa', 'twoXXXbbb']
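Because re.match() only matches at the beginning of the string, the pattern above needs the leading .* to find XXX in the middle. As an alternative sketch, re.search() looks for a match anywhere in the string, and re.fullmatch() requires the whole string to match. The patterns below are chosen only for illustration.

```python
import re

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# re.match() checks only the beginning, so a bare 'XXX' matches nothing here.
l_match_bare = [s for s in l if re.match('XXX', s)]
print(l_match_bare)
# []

# re.search() finds 'XXX' anywhere in the string.
l_search = [s for s in l if re.search('XXX', s)]
print(l_search)
# ['oneXXXaaa', 'twoXXXbbb']

# re.fullmatch() requires the entire string to consist of digits.
l_fullmatch = [s for s in l if re.fullmatch('[0-9]+', s)]
print(l_fullmatch)
# ['000111222']
```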
You can also use re.sub() to replace the part that matches a regex pattern. If you want to extract and replace only matched elements, add if condition.
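# Swap the parts before and after 'XXX' in every element; unmatched elements are unchanged.
l_re_sub_all = [re.sub('(.*)XXX(.*)', r'\2---\1', s) for s in l]
print(l_re_sub_all)
# ['aaa---one', 'bbb---two', 'three999aaa', '000111222']

# Do the same, but keep only the elements that match.
l_re_sub = [re.sub('(.*)XXX(.*)', r'\2---\1', s) for s in l if re.match('.*XXX.*', s)]
print(l_re_sub)
# ['aaa---one', 'bbb---two']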
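The last example evaluates the pattern twice per element, once in re.match() and once in re.sub(). One possible way to avoid this, shown here as a sketch rather than as the approach of the original example, is to use subn() on a compiled pattern, which returns both the new string and the number of substitutions made. The names pattern, results, and l_re_subn are introduced only for this illustration.

```python
import re

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

# Compile the pattern once so it can be reused for every element.
pattern = re.compile('(.*)XXX(.*)')

# subn() returns a tuple: (new_string, number_of_substitutions).
results = [pattern.subn(r'\2---\1', s) for s in l]

# Keep only the strings in which at least one substitution was made.
l_re_subn = [new for new, count in results if count > 0]
print(l_re_subn)
# ['aaa---one', 'bbb---two']
```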