Regex to remove duplicate characters in Python

>>> import re
>>> re.sub(r'([a-z])\1+', r'\1', 'ffffffbbbbbbbqqq')
'fbq'

The () around the [a-z] specify a capture group, and the \1 (a backreference) in both the pattern and the replacement refers to the contents of the first capture group.

Thus, the regex reads "find a letter, followed by one or more occurrences of that same letter", and the entire matched portion is then replaced with a single occurrence of that letter.
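
To make the roles of the capture group and the backreference more concrete, here is a small sketch that prints what each match contains before substituting. The helper name collapse_runs and the use of (.) (any character, rather than only a-z) are assumptions for illustration, not part of the original answer.

```python
import re

# (.) captures one character; \1+ requires one or more repeats of that same character.
RUN = re.compile(r'(.)\1+')

def collapse_runs(text):
    """Replace every run of a repeated character with a single copy."""
    return RUN.sub(r'\1', text)

for m in RUN.finditer('ffffffbbbbbbbqqq'):
    # group(0) is the whole run, group(1) is the captured character.
    print(m.group(0), '->', m.group(1))

print(collapse_runs('ffffffbbbbbbbqqq'))  # fbq
```

Only consecutive repeats are collapsed: 'aabbaa' becomes 'aba', not 'ab'.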

On a side note...

Your example code for just 'a' is actually buggy:

>>> re.sub('a*', 'a', 'aaabbbccc')
'abababacacaca'

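You really want to use 'a+' for your regex instead of 'a*'. The * operator matches "0 or more" occurrences, so it also matches the empty string between two non-a characters, whereas the + operator matches "1 or more". (The output above is from Python versions before 3.7. From 3.7 on, an empty match directly after the 'aaa' run is replaced as well, giving 'aabababacacaca'.)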
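
The difference between the two quantifiers can be checked with a few calls. This is only a sketch; the sample strings are chosen for illustration.

```python
import re

text = 'aaabbbccc'

# 'a+' needs at least one 'a', so only the real run is matched and replaced.
print(re.findall('a+', text))    # ['aaa']
print(re.sub('a+', 'a', text))   # abbbccc

# 'a*' also matches the empty string; 'a+' does not.
print(re.fullmatch('a*', '') is not None)  # True
print(re.fullmatch('a+', '') is None)      # True

# With no 'a' in the input at all, 'a*' still matches (emptily) at every position.
print(re.sub('a*', 'a', 'bbb'))  # abababa
```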

We are given a string and need to remove all duplicate characters from it. What will the output be if the order of the characters matters? Examples:

Input : geeksforgeeks 
Output : efgkors

This problem has an existing solution; please refer to Remove all duplicates from a given string.

Method 1: 

Python3

from collections import OrderedDict

# Remove duplicates without preserving order: a set has no defined order.
def removeDupWithoutOrder(s):
    return "".join(set(s))

# Remove duplicates, keeping the first occurrence of each character in order.
def removeDupWithOrder(s):
    return "".join(OrderedDict.fromkeys(s))

if __name__ == "__main__":
    s = "geeksforgeeks"
    print("Without Order = ", removeDupWithoutOrder(s))
    print("With Order = ", removeDupWithOrder(s))

Output

Without Order =  foskerg
With Order =  geksfor

Time complexity: O(n)
Auxiliary Space: O(n)
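
As a possible alternative, the built-in dict has preserved insertion order since Python 3.7, so the same idea works without importing OrderedDict. Sorting the set gives the alphabetical result shown in the example output (efgkors). The function names below are hypothetical.

```python
def remove_dup_keep_order(s):
    # dict keys are unique and keep first-insertion order (Python 3.7+).
    return "".join(dict.fromkeys(s))

def remove_dup_sorted(s):
    # Sorting the set gives a deterministic, alphabetical result.
    return "".join(sorted(set(s)))

print(remove_dup_keep_order("geeksforgeeks"))  # geksfor
print(remove_dup_sorted("geeksforgeeks"))      # efgkors
```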

Method 2: 

Python3

def removeDuplicate(s):
    # Without order: a set drops duplicates but has no defined order.
    print("Without Order:", "".join(set(s)))
    # With order: append each character only the first time it is seen.
    t = ""
    for i in s:
        if i not in t:
            t = t + i
    # Print once, after the loop has finished.
    print("With Order:", t)

s = "geeksforgeeks"
removeDuplicate(s)

Output

Without Order: kogerfs
With Order: geksfor

Time complexity: O(n)
Auxiliary Space: O(n)
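
In Method 2, the membership test scans the result string and each concatenation copies it. A variant that tracks seen characters in a set and collects the result in a list avoids both. This is a sketch of that variant, with a hypothetical function name, not the article's own code.

```python
def remove_dup_with_seen(s):
    seen = set()   # characters already emitted (O(1) average membership test)
    result = []    # kept characters, in first-occurrence order
    for ch in s:
        if ch not in seen:
            seen.add(ch)
            result.append(ch)
    return "".join(result)

print(remove_dup_with_seen("geeksforgeeks"))  # geksfor
```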

What do OrderedDict and fromkeys() do?

An OrderedDict is a dictionary that remembers the order in which its keys were first inserted. If a new entry overwrites an existing entry, the original insertion position is left unchanged.

For example, see the code snippet below:

Python3

from collections import OrderedDict

# A built-in dictionary.
ordinary_dictionary = {}
ordinary_dictionary['a'] = 1
ordinary_dictionary['b'] = 2
ordinary_dictionary['c'] = 3
ordinary_dictionary['d'] = 4
ordinary_dictionary['e'] = 5
print(ordinary_dictionary)

# An OrderedDict filled in the same order.
ordered_dictionary = OrderedDict()
ordered_dictionary['a'] = 1
ordered_dictionary['b'] = 2
ordered_dictionary['c'] = 3
ordered_dictionary['d'] = 4
ordered_dictionary['e'] = 5
print(ordered_dictionary)

Output

{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)])

Time complexity: O(n)
Auxiliary Space: O(1)
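
The statement that overwriting an entry leaves its position unchanged can be checked with a short sketch. The keys and values here are made up for illustration.

```python
from collections import OrderedDict

od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['a'] = 99   # overwrite: 'a' keeps its original (first) position

print(list(od.items()))   # [('a', 99), ('b', 2)]

# move_to_end() is the explicit way to change a key's position.
od.move_to_end('a')
print(list(od.keys()))    # ['b', 'a']
```

This is why fromkeys() on a string keeps the first occurrence of each character: later duplicates only overwrite the value, never the position.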

fromkeys() creates a new dictionary with keys taken from seq and every value set to value, and returns that dictionary. The syntax of the method is fromkeys(seq[, value]). Parameters:

  • seq : The iterable of values used as the dictionary keys.
  • value : Optional. If provided, every key is mapped to this value; otherwise the values default to None.

For example, see the code snippet below:

Python3

from collections import OrderedDict

seq = ['name', 'age', 'gender']

# No value given: every key maps to None.
d = OrderedDict.fromkeys(seq)
print(str(d))

# Every key maps to the given value, 10.
d = OrderedDict.fromkeys(seq, 10)
print(str(d))

Output

OrderedDict([('name', None), ('age', None), ('gender', None)])
OrderedDict([('name', 10), ('age', 10), ('gender', 10)])

Time complexity: O(n)
Auxiliary Space: O(1)
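
When the sequence passed to fromkeys() contains repeated items, each key appears only once in the result, which is what makes it useful for removing duplicates. A small sketch, with "banana" as an arbitrary sample input:

```python
from collections import OrderedDict

# Repeated characters collapse to one key each, in first-seen order.
keys = OrderedDict.fromkeys("banana")
print(list(keys))   # ['b', 'a', 'n']
print(keys['b'])    # None (the default value)

# The same class method exists on the built-in dict.
counts = dict.fromkeys("banana", 0)
print(counts)       # {'b': 0, 'a': 0, 'n': 0}
```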

This article is contributed by Shashank Mishra (Gullu).


How do you remove duplicate characters in Python?

Program to remove duplicate characters from a given string in Python:

  • d := a dictionary whose keys are stored in insertion order.
  • For each character c in s: if c is not present in d, set d[c] := 0; then set d[c] := d[c] + 1.
  • Join the keys one after another, in order, to make the output string and return it.
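
The steps above can be sketched in Python as follows. The function name is hypothetical, and the sketch relies on the built-in dict preserving insertion order (Python 3.7+).

```python
def remove_duplicates_counting(s):
    d = {}                  # insertion-ordered in Python 3.7+
    for c in s:
        if c not in d:
            d[c] = 0
        d[c] = d[c] + 1     # occurrence count of c so far
    return "".join(d)       # iterating a dict yields its keys in order

print(remove_duplicates_counting("geeksforgeeks"))  # geksfor
```

The counts are not needed for the final string; they are kept only because the steps describe them.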

How do you remove all duplicates from a string in Python?

Given a string S, the task is to remove all the duplicates in the given string:

  • Sort the characters.
  • In a loop, remove duplicates by comparing the current character with the previous character.
  • Remove extra characters at the end of the resultant string.
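
A minimal Python sketch of the sort-then-compare approach is shown here. It builds a new list instead of editing the string in place, so the final "remove extra characters" step, which applies to in-place array versions, is not needed. The function name is hypothetical.

```python
def remove_duplicates_sorted(s):
    chars = sorted(s)    # equal characters become adjacent
    result = []
    for ch in chars:
        # Keep ch only if it differs from the previously kept character.
        if not result or result[-1] != ch:
            result.append(ch)
    return "".join(result)

print(remove_duplicates_sorted("geeksforgeeks"))  # efgkors
```

Note that this approach returns the characters in sorted order, not in their original order.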

How do you find duplicate characters in a string in python?

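Python:

string = "Great responsibility"
print("Duplicate characters in a given string: ")
# Count each character present in the string.
for i in range(0, len(string)):
    count = 1
    for j in range(i + 1, len(string)):
        if string[i] == string[j] and string[i] != ' ':
            count = count + 1
    # Assumed completion (the original snippet is cut off here): report a
    # character the first time it is seen with more than one occurrence.
    if count > 1 and string[i] not in string[:i]:
        print(string[i])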
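
A shorter way to get the same information is collections.Counter from the standard library. The sketch below is an alternative to the nested loops, not the snippet's own method, and the function name is hypothetical.

```python
from collections import Counter

def duplicate_characters(text):
    counts = Counter(text)   # maps each character to its number of occurrences
    # Keep characters seen more than once, ignoring spaces.
    return [ch for ch, n in counts.items() if n > 1 and ch != ' ']

print(duplicate_characters("Great responsibility"))  # ['r', 'e', 't', 's', 'i']
```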

How do I get rid of consecutive duplicates in Python?

How to remove consecutive duplicates from a string:

  • Iterate over each character in the string.
  • For each character, check whether it is the same as the previous character (stored in a variable). If it is, skip to the next iteration; otherwise add the character to the result string.
  • Return the result string.
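
These steps can be sketched as a plain loop, with an itertools.groupby version alongside as a possible shorthand. Both function names are hypothetical.

```python
from itertools import groupby

def remove_consecutive_duplicates(s):
    result = []
    previous = None          # the previous character seen
    for ch in s:
        if ch == previous:
            continue         # same as previous character: skip it
        result.append(ch)
        previous = ch
    return "".join(result)

def remove_consecutive_groupby(s):
    # groupby yields one (key, group) pair per run of equal characters.
    return "".join(key for key, _group in groupby(s))

print(remove_consecutive_duplicates("ffffffbbbbbbbqqq"))  # fbq
print(remove_consecutive_groupby("aabbaa"))               # aba
```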
