How do you count repeated letters in Python?
Grand Performance Comparison

Scroll to the end for a TL;DR graph.
Since I had "nothing better to do" (read: I actually had a lot of work), I decided to run a little performance contest. I assembled the most sensible or interesting answers and did some simple
Be aware that results might vary for different inputs, whether in the length of the string, the number of distinct characters, or the average number of occurrences per character.

Don't reinvent the wheel

Python has made it simple for us. The

(taken from @oefe, nice find)
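A minimal sketch of the "don't reinvent the wheel" route, assuming the class meant here is collections.Counter (the one named in the final verdict). The sample string and the function name count_with_counter are made up for illustration.

```python
from collections import Counter

def count_with_counter(s):
    # Counter tallies every hashable item of an iterable in a single pass.
    return Counter(s)

counts = count_with_counter("mississippi")
# Keep only the characters that occur more than once.
repeated = {ch: n for ch, n in counts.items() if n > 1}
print(repeated)
```

A Counter is a dict subclass, so it can be indexed, iterated, and compared like a plain dictionary.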
¿Dictionary, comprende?

Let's try using a simple

(I came up with this myself...)
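What a simple dictionary-based attempt might look like, assuming a plain dict comprehension over the string combined with str.count. The function name count_naive is hypothetical.

```python
def count_naive(s):
    # str.count scans the whole string, and it is called once for every
    # character of s, repeated characters included.
    return {ch: s.count(ch) for ch in s}

print(count_naive("mississippi"))
```

Repeated characters simply overwrite their own entry with the same value, so the result is correct, just wasteful.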
This will go through

(I came up with this myself, and so did @IrshadBhat)
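A sketch of the set-based variant referred to later as the "set-based dict comprehension", assuming it only differs from the simple version by iterating over the distinct characters. The function name count_distinct is hypothetical.

```python
def count_distinct(s):
    # set(s) holds each distinct character once, so str.count runs
    # once per distinct character instead of once per character.
    return {ch: s.count(ch) for ch in set(s)}

print(count_distinct("mississippi"))
```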
Better. But we still have to search through the string to count the occurrences, one search for each distinct character. That means we're going to read the string more than once. We can do better than that! But for that, we have to get off our declarativist high horse and descend into an imperative mindset.

Exceptional code

AKA Gotta catch 'em all!

(inspired by @anthony)
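A single-pass, exception-driven counter could look roughly like this. It is a sketch under the assumption that a missing key is handled by catching KeyError; the function name count_try_except is hypothetical.

```python
def count_try_except(s):
    counts = {}
    for ch in s:
        try:
            counts[ch] += 1      # works once ch has been seen before
        except KeyError:
            counts[ch] = 1       # first occurrence of ch
    return counts

print(count_try_except("mississippi"))
```

The string is read exactly once; the exception is raised only the first time each distinct character shows up.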
Well, it was worth a try. If you dig into the Python source (I can't say with certainty because I have never really done that), you will probably find that when you do

(made by @anthony)
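For comparison, a sketch of the same loop with the exception type left out, which is what "omitting the exception type check" in the summary further down appears to refer to. The function name count_bare_except is hypothetical, and the pattern is shown for illustration only.

```python
def count_bare_except(s):
    counts = {}
    for ch in s:
        try:
            counts[ch] += 1
        except:  # no exception type given: anything raised here is caught
            counts[ch] = 1
    return counts

print(count_bare_except("mississippi"))
```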
It does save some time, so one might be tempted to use this as some sort of optimization.

INTERLUDE 1
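A small illustrative sketch (not the original demonstration, and with hypothetical function names) of why a bare except is risky as an optimization: it also swallows exceptions that are not errors in your code, such as KeyboardInterrupt, which derives from BaseException rather than Exception.

```python
def swallowed_by_bare_except():
    try:
        raise KeyboardInterrupt  # simulates the user pressing Ctrl+C
    except:  # bare except also catches BaseException subclasses
        return True

def swallowed_by_except_exception():
    try:
        try:
            raise KeyboardInterrupt
        except Exception:  # KeyboardInterrupt is not an Exception subclass
            return True
    except KeyboardInterrupt:
        return False  # the interrupt propagated past `except Exception`

print(swallowed_by_bare_except(), swallowed_by_except_exception())
```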
You see? It catches

INTERLUDE 2
Now back to counting letters and numbers and other characters.

Playing catch-up

Exceptions aren't the way to go. You have to try hard to catch up with them, and when you finally do, they just throw up on you and then raise their eyebrows like it's your fault. Luckily, brave fellows have paved the way so we can do away with exceptions, at least in this little exercise.

(The credit goes to @Usman)
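One exception-free way to write the loop, assuming dict.get with a default of 0 is the technique meant here (the text does not name it). The function name count_with_get is hypothetical.

```python
def count_with_get(s):
    counts = {}
    for ch in s:
        # dict.get returns the default (0) when ch is not yet a key,
        # so no exception handling is needed.
        counts[ch] = counts.get(ch, 0) + 1
    return counts

print(count_with_get("mississippi"))
```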
Almost as fast as the set-based dict comprehension. On larger inputs, this one would probably be even faster.

Use the right tool for the job

For an at least mildly knowledgeable Python programmer, the first thing that comes to mind is probably

(hope @AlexMartelli won't crucify me for
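A sketch of what a purpose-built tool could look like here, assuming collections.defaultdict is the one meant (an assumption, since the name is missing from the text). The function name count_with_defaultdict is hypothetical.

```python
from collections import defaultdict

def count_with_defaultdict(s):
    # int() returns 0, so a missing key starts at zero automatically.
    counts = defaultdict(int)
    for ch in s:
        counts[ch] += 1
    return dict(counts)  # convert back to a plain dict for the caller

print(count_with_defaultdict("mississippi"))
```

The loop body shrinks to a single statement, which is where the readability gain comes from.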
Not that bad. I'd say the increase in execution time is a small tax to pay for the improved readability. However, we also favor performance, and we will not stop here. Let's take it further and prepopulate the dictionary with zeros. Then we won't have to check every time if the item is already there.

(hats off to @sqram)
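The prepopulation idea can be sketched like this, assuming dict.fromkeys is used to build the zeroed dictionary (the text does not say which constructor was used). The function name count_prepopulated is hypothetical.

```python
def count_prepopulated(s):
    # Every distinct character gets a zero entry up front...
    counts = dict.fromkeys(s, 0)
    # ...so the counting loop never has to test for a missing key.
    for ch in s:
        counts[ch] += 1
    return counts

print(count_prepopulated("mississippi"))
```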
That's good. Over three times as fast as
Practicality beats purity (except when it's not really practical)

Now for a slightly different kind of counter. @IdanK has come up with something interesting. Instead of using a hash table (a.k.a. dictionary a.k.a.

As a side note, this technique is used in a linear-time sorting algorithm known as count sort or counting sort. It's very efficient, but the range of values being sorted is limited, since each value has to have its own counter. To sort a sequence of 32-bit integers, 4.3 billion counters would be needed.
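A sketch of a counter that uses no hash table, assuming a list of 256 slots indexed by each character's code point, which matches the later remark that the approach only works for 8-bit characters. The function name count_with_list is hypothetical.

```python
def count_with_list(s):
    counts = [0] * 256           # one counter per possible 8-bit code
    for ch in s:
        counts[ord(ch)] += 1     # IndexError for code points above 255
    # Build a dictionary from the non-zero counters only.
    return {chr(code): n for code, n in enumerate(counts) if n}

print(count_with_list("mississippi"))
```

Note that the final comprehension has to inspect all 256 counters, however short the input is.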
Ouch! Not cool! Let's try and see how long it takes when we omit building the dictionary.
Still bad. But wait, what's
Considerably. Now let's put the dictionary back in.
Almost six times slower. Why does it take so long? Because when we
It probably won't get much better than that, at least not for such a small input. Plus it's only usable for 8-bit EASCII characters. О блять!

And the winner is...
Yep. Even if you have to check every time whether

That's all folks

This little exercise teaches us a lesson: when optimizing, always measure performance, ideally with your expected inputs. Optimize for the common case. Don't presume something is actually more efficient just because its asymptotic complexity is lower. And last but not least, keep readability in mind. Try to find a compromise between "computer-friendly" and "human-friendly".

UPDATE

I have been informed by @MartijnPieters of the function

    Help on built-in function _count_elements in module _collections:

    _count_elements(...)
        _count_elements(mapping, iterable) -> None

        Count elements in the iterable, updating the mappping

This function is implemented in C, so it should be faster, but this extra performance comes at a price. The price is incompatibility with Python 2 and possibly even future versions, since we're using a private function. From the documentation:
That said, if you still want to save those 620 nanoseconds per iteration:
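A sketch of how the private helper named in the help text could be called, assuming CPython 3, where the collections module exposes _count_elements. The wrapper name count_with_private_helper is hypothetical, and because the helper is private it may change or disappear without notice.

```python
from collections import _count_elements  # private helper, not a public API

def count_with_private_helper(s):
    counts = {}
    # Updates `counts` in place with one entry per distinct element of s.
    _count_elements(counts, s)
    return counts

print(count_with_private_helper("mississippi"))
```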
UPDATE 2: Large strings

I thought it might be a good idea to re-run the tests on some larger input, since a 16-character string is such a small input that all the possible solutions were quite comparably fast (1,000 iterations in under 30 milliseconds). I decided to use the complete works of Shakespeare as a testing corpus, which turned out to be quite a challenge (since it's over 5 MiB in size 😅). I just used the first 100,000 characters of it, and I had to reduce the number of iterations from 1,000,000 to 1,000.
Naïve Θ(n²) time dictionary comprehension simply doesn't work
Smart Θ(n) time dictionary comprehension works fine
Exceptions are clumsy and slow
Omitting the exception type check doesn't save time (since the exception is only thrown a few times)
Using
Leaving out the final conversion to
It doesn't matter how you construct the
If you convert
The
Final verdict: Use collections.Counter unless you cannot or don't want to :)

Appendix: NumPy

The

The way this method works is very different from all the above methods:
For the test input (first 100,000 characters of the complete works of Shakespeare), this method performs better than any other tested here. But note that on a different input, this approach might yield worse performance than the other methods. Pre-sortedness of the input and number of repetitions per element are important factors affecting the performance.
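To see why pre-sortedness could matter, here is a rough pure-Python analogue of a sort-then-count strategy. It rests on the assumption that the method discussed above sorts its input first and then measures runs of equal elements; it is not NumPy's actual implementation, and the function name count_by_sorting is hypothetical.

```python
from itertools import groupby

def count_by_sorting(s):
    # Sorting brings equal characters next to each other; groupby then
    # yields one run per distinct character, and we measure its length.
    return {ch: sum(1 for _ in run) for ch, run in groupby(sorted(s))}

print(count_by_sorting("mississippi"))
```

Here the sort dominates the cost, so the running time depends on how the sort behaves on the given input rather than on hashing.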
That considered, it seems reasonable to use

Appendix 2: A somewhat useful plot

I ran the 13 different methods above on prefixes of the complete works of Shakespeare and made an interactive plot. Note that in the plot, both prefix lengths and durations are displayed on a logarithmic scale (the prefixes used are of exponentially increasing length). Click on the items in the legend to show or hide them in the plot.

How do you count repeating letters in Python?

    string = "Great responsibility"
    print("Duplicate characters in a given string: ")
    # Counts each character present in the string.
    for i in range(0, len(string)):
        count = 1
        for j in range(i+1, len(string)):
            if string[i] == string[j] and string[i] != ' ':
                count = count + 1

How do you find the number of repeated letters in a string?

Approach:
Find the occurrences of character 'a' in the given string.
Find the No. of repetitions which are required to find the 'a' occurrences.
Multiply the single string occurrences to the No. ...
If given n is not the multiple of given string size then we will find the 'a' occurrences in the remaining substring.

How do I find a repeated character in a string Python?

First, we will find the duplicate characters of a string using the count method.
Initialize a string.
Initialize an empty list.
Loop over the string. Check whether the char frequency is greater than one or not using the count method.
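The last set of steps (initialize a list, loop over the string, check the frequency with the count method) can be sketched as follows. The function name find_duplicates and the choice to report each duplicate only once are assumptions made for illustration.

```python
def find_duplicates(s):
    duplicates = []                      # the initially empty list
    for ch in s:                         # loop over the string
        # str.count gives the frequency; skip characters already reported.
        if s.count(ch) > 1 and ch not in duplicates:
            duplicates.append(ch)
    return duplicates

print(find_duplicates("Great responsibility"))
```

This is the same quadratic pattern as the simple dictionary comprehension discussed earlier, so it is best kept for short strings.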