Extract words after a specific word in python

r'((?:\S+\s+){0,3}\bwhite\b\s*(?:\S+\s+){0,3})'

The result of the above regex for the sentence below is:

sentence = This is a white floral garment.
result = This is a white floral

But I want the result as

wanted result = This is a white floral garment

I want up to 3 words before and after the word white. If there are fewer than three words after white, then get at least all the words that are present.

asked Apr 6, 2018 at 8:50

You can fix it by adding a word boundary \b to the subpattern for the words after white and making the trailing whitespace optional:

((?:\S+\s+){0,3}\bwhite\b\s*(?:\S+\b\s*){0,3})
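A minimal sketch of how the two patterns behave on the sentence from the question, assuming they are run with re.search (the question does not show the calling code, so the variable names here are illustrative):

```python
import re

sentence = "This is a white floral garment."

# Pattern from the question: every word after "white" must be followed by whitespace.
original = r'((?:\S+\s+){0,3}\bwhite\b\s*(?:\S+\s+){0,3})'
# Pattern from this answer: a word ends at a word boundary; trailing whitespace is optional.
fixed = r'((?:\S+\s+){0,3}\bwhite\b\s*(?:\S+\b\s*){0,3})'

original_match = re.search(original, sentence).group(1)
fixed_match = re.search(fixed, sentence).group(1)

print(repr(original_match))  # 'This is a white floral '
print(repr(fixed_match))     # 'This is a white floral garment'
```

The original pattern drops "garment." because no whitespace follows the final period, while the word boundary lets the fixed pattern stop right after "garment".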

answered Apr 6, 2018 at 8:58

mrzasa

Try this:

((?:\S+\s+){3,}\bwhite\b\s*(?:\S+\b\s*){3,})

(?:\S+\s+){3,}\bwhite\b matches a minimum of 3 words before the word 'white'.

\s*(?:\S+\b\s*){3,} matches a minimum of 3 words after it.
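As a rough check, assuming re.search is used: the {3,} quantifier means "three or more", so this pattern finds nothing in the question's sentence (only two words follow white) and places no upper limit on longer input. The second sentence below is made up for illustration:

```python
import re

pattern = r'((?:\S+\s+){3,}\bwhite\b\s*(?:\S+\b\s*){3,})'

short = "This is a white floral garment."
# Hypothetical longer sentence with four words on each side of "white".
long_sentence = "She wore a long white floral summer garment today."

short_match = re.search(pattern, short)
long_match = re.search(pattern, long_sentence)

print(short_match)          # None
print(long_match.group(1))  # She wore a long white floral summer garment today
```

If the goal is "up to three words, or fewer when fewer exist", as in the question, the {0,3} quantifier is the closer fit.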

answered Apr 6, 2018 at 9:13

AssenKhan

Perhaps you could match one or more word characters \w+ followed by a whitespace character \s and repeat that 3 times {3} to match 3 words before the word "white".

Then match "white", and after that match a whitespace character followed by one or more word characters, repeated 0 to 3 times {0,3}, so that if 3 or fewer words follow, you still match them.

(?:\w+\s){3}white(?:\s\w+){0,3}
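A short sketch of this pattern in use, assuming re.search; the second input string is invented to show what the fixed {3} before "white" implies:

```python
import re

pattern = r'(?:\w+\s){3}white(?:\s\w+){0,3}'

full = re.search(pattern, "This is a white floral garment.")
print(full.group(0))  # This is a white floral garment

# Only one word precedes "white" here, so {3} cannot be satisfied.
too_few = re.search(pattern, "a white shirt")
print(too_few)  # None
```

If sentences with fewer than three leading words should also match, relaxing {3} to {0,3} is one possible adjustment, though the answer does not state this.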

answered Apr 6, 2018 at 9:57

The fourth bird

I used the above, but had issues if a comma came after the target word.

Playing around with the regex, this seemed to work on sentences like:

I want a white floral garment, etc.

((?:\S+\s+){0,3}\bwhite\b\w*(?:\W+\b\w*){0,3})
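One way to reuse this pattern for other target words is to wrap it in a small function. This is only a sketch: context_window and its parameters are hypothetical names, and re.escape is added on the assumption that the target may contain regex metacharacters.

```python
import re

def context_window(text, target, n=3):
    """Return up to n words on each side of target, or None if target is absent.

    context_window is a hypothetical helper name, not part of the answer.
    """
    pattern = (r'((?:\S+\s+){0,%d}\b%s\b\w*(?:\W+\b\w*){0,%d})'
               % (n, re.escape(target), n))
    match = re.search(pattern, text)
    return match.group(1) if match else None

print(context_window("I want a white floral garment, etc.", "white"))
# I want a white floral garment, etc

# A comma directly after the target word is skipped by \W+.
print(context_window("a white, floral garment here", "white"))
# a white, floral garment here
```

Because \W+ consumes any run of non-word characters between words, commas and other punctuation no longer stop the match early.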

answered Aug 8 at 19:50

We sometimes encounter situations where we need to get all the words present in a string, which can be tedious to do by hand. Having shorthands for this task is therefore useful. This article also covers the cases in which punctuation marks have to be ignored.
Method #1 : Using split() 
Using the split() function, we can split the string into a list of words. This is the most generic and recommended method for this task. The drawback is that it fails when the string contains punctuation marks.
 

Python3

test_string = "Geeksforgeeks is best Computer Science Portal"

print("The original string is : " + test_string)

# Split on whitespace to get the list of words
res = test_string.split()

print("The list of words is : " + str(res))

Output:
The original string is : Geeksforgeeks is best Computer Science Portal
The list of words is : ['Geeksforgeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
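To make the drawback concrete, here is a small sketch of split() applied to a string that contains punctuation (the test string is the one used in the later methods):

```python
test_string = "Geeksforgeeks,    is best @# Computer Science Portal.!!!"

# split() only separates on whitespace, so punctuation stays attached to the tokens.
res = test_string.split()
print(res)
# ['Geeksforgeeks,', 'is', 'best', '@#', 'Computer', 'Science', 'Portal.!!!']
```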
 

  
Method #2 : Using regex( findall() ) 
When the string contains special characters and punctuation marks, as discussed above, finding words with split() can fail, so regular expressions are needed. The findall() function returns a list of the words extracted from the string, ignoring punctuation marks.
 

Python3

import re

test_string = "Geeksforgeeks,    is best @# Computer Science Portal.!!!"

print("The original string is : " + test_string)

# \w+ matches each run of letters, digits and underscores, skipping punctuation
res = re.findall(r'\w+', test_string)

print("The list of words is : " + str(res))

Output:
The original string is : Geeksforgeeks,    is best @# Computer Science Portal.!!!
The list of words is : ['Geeksforgeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
 

  
Method #3 : Using re.sub() + string.punctuation 
This method also uses regular expressions, but string.punctuation, which holds all the punctuation characters, is used to strip the punctuation marks before the filtered string is split.
 

Python3

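import re
import string

test_string = "Geeksforgeeks,    is best @# Computer Science Portal.!!!"

print("The original string is : " + test_string)

# Remove every punctuation character, then split on whitespace;
# re.escape keeps characters such as ] and \ literal inside the class
res = re.sub('[' + re.escape(string.punctuation) + ']', '', test_string).split()

print("The list of words is : " + str(res))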

Output:
The original string is : Geeksforgeeks,    is best @# Computer Science Portal.!!!
The list of words is : ['Geeksforgeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
 


How do I extract a word from a specific word in python?

Using regular expressions to extract any specific word: we can use the search() method from the re module to find the first occurrence of the word and then obtain the word using slicing. The re.search() method takes the word to be extracted in regular expression form and the string as input and returns a re.
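A minimal sketch of the search-then-slice approach described above. The sample text and variable names are illustrative; re.search returns a match object for the first occurrence, or None when nothing matches.

```python
import re

text = "This is a white floral garment."

# re.search returns a match object for the first occurrence, or None if there is none.
match = re.search(r'\bwhite\b', text)
if match:
    # start() and end() give the slice positions of the matched word.
    word = text[match.start():match.end()]
    after = text[match.end():].strip()
    print(word)   # white
    print(after)  # floral garment.
```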

How do I extract certain text from a string in python?

Extract a substring from a string in Python (position, regex):
Extract a substring by specifying the position and number of characters. Extract a character by index. ...
Extract a substring with regular expressions: re.search(), re.findall().
Regular expression pattern examples. Wildcard-like patterns.
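The options listed above can be sketched as follows; the sample string and variable names are made up for illustration:

```python
import re

s = "order 66 shipped on 2018-04-06"

# Position and number of characters: 5 characters starting at index 0.
start, length = 0, 5
by_position = s[start:start + length]        # 'order'

# A single character by index.
first_char = s[0]                            # 'o'

# First regex match versus all matches.
first_number = re.search(r'\d+', s).group()  # '66'
all_numbers = re.findall(r'\d+', s)          # ['66', '2018', '04', '06']

print(by_position, first_char, first_number, all_numbers)
```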

How do I extract words from a string?

To extract words from a string vector in R, we can use the word function of the stringr package. For example, if we have a vector called x that contains 100 words, then the first 20 words can be extracted by using the command word(x,start=1,end=20,sep=fixed(" ")).
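A rough Python counterpart, assuming the goal is the same "first 20 words separated by single spaces"; the 100-word string is generated only for the example:

```python
# Made-up string of 100 words: "w1 w2 ... w100".
x = " ".join(f"w{i}" for i in range(1, 101))

# Split on a single space and keep the first 20 words.
words = x.split(" ")
first_20 = " ".join(words[:20])

print(len(words[:20]))  # 20
print(first_20[:8])     # w1 w2 w3
```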

How do you print each word in a string in python?

Approach: Split the string using the split() function. Iterate over the words of the string using a for loop. Calculate the length of each word using the len() function. If the length is even, then print the word.
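The approach above can be sketched as follows; the sentence is an arbitrary example, and even_words is collected only so the result is easy to inspect:

```python
sentence = "This is a white floral garment"

even_words = []
for word in sentence.split():
    # Print only the words whose length is even.
    if len(word) % 2 == 0:
        even_words.append(word)
        print(word)
```

To print every word rather than only the even-length ones, the length check can simply be dropped.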