Prerequisite: Python Regex
You may be familiar with searching for text by pressing Ctrl-F and typing in the words you are looking for. Regular expressions go one step further: they let you specify a pattern of text to search for. Regular expressions, called regexes for short, are descriptions for a pattern of text. For example, a \d in a regex stands for a digit character, that is, any single numeral 0 to 9.
- The following regex is used in Python to match a string of three numbers, a hyphen, three more numbers, another hyphen, and four numbers:
\d\d\d-\d\d\d-\d\d\d\d
- Regular expressions can be much more sophisticated. For example, adding a 3 in curly brackets ({3}) after a pattern is like saying, "Match this pattern three times." So the slightly shorter regex
\d{3}-\d{3}-\d{4}
also matches the correct phone number format.
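As a quick check of the claim above, the following minimal sketch compiles both spellings of the pattern and runs them over a few strings. The sample strings and the helper name matched_text are made up for illustration and are not part of the original text.

```python
import re

# Two spellings of the same phone-number pattern.
long_form = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
short_form = re.compile(r'\d{3}-\d{3}-\d{4}')

# Hypothetical sample strings, chosen only for illustration.
samples = [
    'My number is 415-555-4242.',
    '415-555-424',        # last block is one digit short
    'no digits here',
]


def matched_text(regex, text):
    """Return the matched substring, or None when nothing matches."""
    mo = regex.search(text)
    return mo.group() if mo else None


for text in samples:
    # Both patterns are expected to agree on every input.
    print(repr(text), '->', matched_text(long_form, text), matched_text(short_form, text))
```

Both patterns find 415-555-4242 in the first string and nothing in the other two.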
Creating Regex Objects
All the regex functions in Python are in the re module:
import re
To create a Regex object that matches the phone number pattern, enter the following into the interactive shell.
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
Now the phoneNumRegex variable contains a Regex object.
Matching Regex Objects
A Regex object's search() method searches the string it is passed for any matches to the regex. If the pattern is not found, search() returns None. Otherwise it returns a Match object, and Match objects have a group() method that returns the actual matched text from the searched string.
import re
# Compile the phone number pattern from a raw string.
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
# search() returns a Match object for the first match in the string.
mo = phoneNumRegex.search('My number is 415-555-4242.')
print('Phone number found: ' + mo.group())
Output:
Phone number found: 415-555-4242
Steps of Regular Expression Matching
While there are several steps to using regular expressions in Python, each step is fairly simple.
- Import the regex module with import re.
- Create a Regex object with the re.compile() function. (Remember to use a raw string.)
- Pass the string you want to search into the Regex object's search() method. This returns a Match object.
- Call the Match object's group() method to return a string of the actual matched text.
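The four steps above can be sketched as one small function. This is only an illustration: the names PHONE_REGEX and find_phone_number are hypothetical, and the None check is an addition that guards against strings with no match.

```python
import re  # Step 1: import the regex module.

# Step 2: compile the pattern from a raw string.
PHONE_REGEX = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')


def find_phone_number(text):
    """Return the first phone number found in text, or None."""
    # Step 3: search() returns a Match object, or None if nothing matched.
    mo = PHONE_REGEX.search(text)
    if mo is None:
        return None
    # Step 4: group() returns the text that actually matched.
    return mo.group()


print(find_phone_number('My number is 415-555-4242.'))  # 415-555-4242
print(find_phone_number('No number here.'))              # None
```

Checking for None before calling group() matters because calling group() on None raises an AttributeError.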
Grouping with Parentheses
- Matching objects: Say you want to separate the area code from the rest of the phone number. Adding parentheses will create groups in the regex: (\d\d\d)-(\d\d\d-\d\d\d\d). Then you can use the group() match object method to grab the matching text from just one group.
import re
# Group 1 is the area code, group 2 is the rest of the number.
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print(mo.group(1))
OUTPUT:
415
- If you would like to retrieve all the groups at once, use the groups() method; note the plural form of the name.
import re
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')
# groups() returns a tuple with the text of every group.
print(mo.groups())
OUTPUT:
('415', '555-4242')
- Because mo.groups() returns a tuple of multiple values, you can use the multiple-assignment trick to assign each value to a separate variable, as in the areaCode, mainNumber = mo.groups() line below.
import re
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')
# Unpack the tuple returned by groups() into two variables.
areaCode, mainNumber = mo.groups()
print(mainNumber)
OUTPUT:
555-4242
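To see the group-related calls side by side, here is a minimal sketch. The helper split_phone_number and the constant PHONE_REGEX are hypothetical names, and group(0), which returns the whole match, is standard Match object behavior that the text above does not mention.

```python
import re

# Group 1 is the area code, group 2 is the rest of the number.
PHONE_REGEX = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')


def split_phone_number(text):
    """Return (area_code, main_number), or None when there is no match."""
    mo = PHONE_REGEX.search(text)
    if mo is None:
        return None
    area_code, main_number = mo.groups()  # multiple assignment from the tuple
    return area_code, main_number


mo = PHONE_REGEX.search('My number is 415-555-4242.')
print(mo.group(0))   # 415-555-4242 (the entire match)
print(mo.group(1))   # 415
print(mo.group(2))   # 555-4242
print(split_phone_number('My number is 415-555-4242.'))  # ('415', '555-4242')
print(split_phone_number('No number here.'))              # None
```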
- Parentheses have a special meaning in regular expressions, but what do you do if you need to match a parenthesis in your text? For instance, maybe the phone numbers you are trying to match have the area code set in parentheses. In this case, you need to escape the ( and ) characters with a backslash. Enter the following into the interactive shell:
import re
# \( and \) match literal parentheses; the unescaped ( ) still create groups.
phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My phone number is (415) 555-4242.')
print(mo.group(1))
OUTPUT:
(415)
If you have a group that you want to repeat a specific number of times, follow the group in your regex with a number in curly brackets. For example, the regex (Ha){3} will match the string 'HaHaHa', but it will not match 'HaHa', since the latter has only two repeats of the (Ha) group.
Instead of one number, you can specify a range by writing a minimum, a comma, and a maximum between the curly brackets. For example, the regex (Ha){3,5} will match 'HaHaHa', 'HaHaHaHa', and 'HaHaHaHaHa'.
You can also leave out the first or second number in the curly brackets to leave the minimum or maximum unbounded. For example, (Ha){3,} will match three or more instances of the (Ha) group, while (Ha){,5} will match zero to five instances. Curly brackets can help make your regular expressions shorter. These two regular expressions match identical patterns:
(Ha){3}
(Ha)(Ha)(Ha)
And these two regular expressions also match identical patterns (the | separates alternatives):
(Ha){3,5}
((Ha)(Ha)(Ha))|((Ha)(Ha)(Ha)(Ha))|((Ha)(Ha)(Ha)(Ha)(Ha))
For example, (Ha){3} matches 'HaHaHa' but not 'Ha'. Because the regex does not match 'Ha', calling search() on that string returns None.
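The curly-bracket forms described above can be tried out with a short sketch like the one below. The variable names and the helper whole_string_matches are hypothetical. The helper uses fullmatch(), which is not mentioned in the text, so that each check asks whether the entire string fits the pattern rather than just a part of it.

```python
import re

ha_exactly_3 = re.compile(r'(Ha){3}')
ha_3_to_5 = re.compile(r'(Ha){3,5}')    # no space after the comma
ha_3_or_more = re.compile(r'(Ha){3,}')
ha_up_to_5 = re.compile(r'(Ha){,5}')

# search() finds three repeats in 'HaHaHa' but nothing in 'Ha'.
mo1 = ha_exactly_3.search('HaHaHa')
print(mo1.group())     # HaHaHa
mo2 = ha_exactly_3.search('Ha')
print(mo2 is None)     # True


def whole_string_matches(regex, text):
    """True when the entire string fits the pattern."""
    return regex.fullmatch(text) is not None


# Try every repeat count from 0 to 6 against each range form.
for count in range(7):
    text = 'Ha' * count
    print(count,
          whole_string_matches(ha_3_to_5, text),
          whole_string_matches(ha_3_or_more, text),
          whole_string_matches(ha_up_to_5, text))
```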
Optional Matching with the Question Mark
Sometimes there is a pattern that you want to match only optionally. That is, the regex should find a match whether or not that bit of text is there. The ? character flags the group that precedes it as an optional part of the pattern.
You can think of the ? as saying, “Match zero or one of the group preceding this question mark.”
If you need to match an actual question mark character, escape it with \?.
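A minimal sketch of optional matching follows. The pattern Bat(wo)?man and the sample strings are assumptions chosen to fit the Batman examples used elsewhere in this article, and question_regex is a hypothetical name.

```python
import re

# (wo)? makes the 'wo' group optional: zero or one occurrence.
bat_regex = re.compile(r'Bat(wo)?man')

mo1 = bat_regex.search('The Adventures of Batman')
print(mo1.group())     # Batman

mo2 = bat_regex.search('The Adventures of Batwoman')
print(mo2.group())     # Batwoman

# Two 'wo' groups are more than "zero or one", so nothing matches.
mo3 = bat_regex.search('The Adventures of Batwowoman')
print(mo3 is None)     # True

# An escaped \? matches a literal question mark.
question_regex = re.compile(r'Batman\?')
print(question_regex.search('Is that Batman?').group())   # Batman?
print(question_regex.search('That is Batman.') is None)   # True
```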
Matching Zero or More with the Star
The * (called the star or asterisk) means "match zero or more": the group that precedes the star can occur any number of times in the text. It can be completely absent or repeated over and over again.
If you need to match an actual star character, prefix the star in the regular expression with a backslash: \*.
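A short sketch of the star in action is given below. The pattern Bat(wo)*man and the sample strings are assumptions picked to line up with the plus-sign discussion that follows.

```python
import re

# (wo)* allows zero or more repeats of the 'wo' group.
bat_regex = re.compile(r'Bat(wo)*man')

mo1 = bat_regex.search('The Adventures of Batman')
print(mo1.group())     # Batman (zero repeats)

mo2 = bat_regex.search('The Adventures of Batwoman')
print(mo2.group())     # Batwoman (one repeat)

mo3 = bat_regex.search('The Adventures of Batwowowowoman')
print(mo3.group())     # Batwowowowoman (four repeats)
```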
Matching One or More with the Plus
While * means "match zero or more," the + (or plus) means "match one or more." Unlike the star, which does not require its group to appear in the matched string, the group preceding a plus must appear at least once. It is not optional.
The regex Bat(wo)+man will not match the string 'The Adventures of Batman', because at least one wo is required by the plus sign.
If you need to match an actual plus sign character, prefix the plus sign with a backslash to escape it: \+.
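The behavior described above can be checked with a sketch along these lines. The sample strings and the name plus_sign_regex are made up for illustration.

```python
import re

# (wo)+ requires at least one 'wo' group.
bat_regex = re.compile(r'Bat(wo)+man')

mo1 = bat_regex.search('The Adventures of Batwoman')
print(mo1.group())     # Batwoman

mo2 = bat_regex.search('The Adventures of Batwowowowoman')
print(mo2.group())     # Batwowowowoman

# No 'wo' at all: the plus makes the group mandatory, so there is no match.
mo3 = bat_regex.search('The Adventures of Batman')
print(mo3 is None)     # True

# An escaped \+ matches a literal plus sign.
plus_sign_regex = re.compile(r'1\+1')
print(plus_sign_regex.search('What is 1+1?').group())   # 1+1
```

Running the same three strings through Bat(wo)*man instead would also match 'Batman', which is the difference between the star and the plus.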