Can Python open docx files?

python-docx allows you to create new documents as well as make changes to existing ones. Actually, it only lets you make changes to existing documents; it's just that if you start with a document that doesn't have any content, it might feel at first like you're creating one from scratch.


This characteristic is a powerful one. A lot of how a document looks is determined by the parts that are left when you delete all the content. Things like styles and page headers and footers are stored separately from the main content, so you can place a good deal of customization in your starting document that then appears in the document you produce.

Let's walk through the steps to create a document one example at a time, starting with the two main things you can do with a document: open it and save it.

Opening a document

The simplest way to get started is to open a new document without specifying a file to open:

from docx import Document

document = Document()
document.save('test.docx')

This creates a new document from the built-in default template and saves it unchanged to a file named 'test.docx'. The so-called "default template" is actually just a Word file with no content, stored with the installed python-docx package. It's roughly the same as what you get by picking the Word Document template after selecting Word's File > New from Template… menu item.

Really opening a document

If you want more control over the final document, or if you want to change an existing document, you need to open one with a filename:

document = Document('existing-document-file.docx')
document.save('new-file-name.docx')

Things to note:

You can open any Word 2007 or later file this way (.doc files from Word 2003 and earlier won't work). While you might not be able to manipulate all the contents yet, whatever is already in there will load and save just fine. The feature set is still being built out, so you can't add or change things like headers or footnotes yet, but if the document has them, python-docx is polite enough to leave them alone and smart enough to save them without actually understanding what they are.
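Because .docx files are ZIP packages while legacy .doc files are not, a quick format check before opening can be sketched with the standard library alone. This is a minimal sketch, not part of python-docx: the helper name `looks_like_docx` is hypothetical, and it assumes the main document part sits at `word/document.xml`, which is the usual location but is not stated in this article.

```python
import io
import zipfile


def looks_like_docx(stream):
    """Return True if a binary stream looks like a Word 2007+ (.docx) package.

    Assumption: the main document part is stored at 'word/document.xml'.
    """
    if not zipfile.is_zipfile(stream):
        return False  # legacy .doc files are not ZIP archives
    stream.seek(0)
    with zipfile.ZipFile(stream) as archive:
        return "word/document.xml" in archive.namelist()


# Stand-in data for the demo: a ZIP with the expected member, and plain bytes.
zipped = io.BytesIO()
with zipfile.ZipFile(zipped, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")

not_a_zip = io.BytesIO(b"legacy bytes that are definitely not a ZIP archive")

print(looks_like_docx(zipped))
print(looks_like_docx(not_a_zip))
```

A check like this only rules out obviously wrong inputs; a file that passes can still be a damaged or unusual package.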

If you use the same filename to open and save the file, python-docx will obediently overwrite the original file without a peep. You'll want to make sure that's what you intend.
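One way to guard against an accidental overwrite is to pick an unused filename before saving. The helper below is a hypothetical sketch using only the standard library; the name `non_clobbering_path` and the `-1`, `-2` suffix scheme are assumptions for illustration, not something python-docx provides.

```python
from pathlib import Path


def non_clobbering_path(target):
    """Return `target` if no file exists there, else 'name-1.ext', 'name-2.ext', ..."""
    path = Path(target)
    candidate = path
    counter = 1
    while candidate.exists():
        # Keep the directory and extension, add a numeric suffix to the stem.
        candidate = path.with_name(f"{path.stem}-{counter}{path.suffix}")
        counter += 1
    return candidate


# Intended use (requires python-docx, so it is shown as a comment only):
#   document.save(str(non_clobbering_path('existing-document-file.docx')))
print(non_clobbering_path("surely-missing-file-for-demo.docx"))
```

The check and the save are two separate steps, so this does not protect against another process creating the file in between.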

Opening a 'file-like' document

python-docx can open a document from a so-called file-like object. It can also save to a file-like object. This can be handy when you want to get the source or target document over a network connection or from a database and don't want to (or aren't allowed to) interact with the file system. In practice this means you can pass an open file or a StringIO/BytesIO stream object (BytesIO on Python 3) to open or save a document like so:

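from io import BytesIO

from docx import Document

# Pass an open binary file object directly
f = open('foobar.docx', 'rb')
document = Document(f)
f.close()

# or

# Read the bytes into an in-memory stream first
with open('foobar.docx', 'rb') as f:
    source_stream = BytesIO(f.read())
document = Document(source_stream)
source_stream.close()
...
# Save into an in-memory stream instead of a file on disk
target_stream = BytesIO()
document.save(target_stream)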

The 'rb' file open mode parameter isn't required on all operating systems. The mode defaults to 'r', which is sometimes enough, but the 'b' (selecting binary mode) is required on Windows and at least some versions of Linux to allow ZipFile to open the file.
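The role of binary mode can be illustrated without python-docx. The sketch below writes a small stand-in ZIP file (not a real Word document), reads it back in 'rb' mode, and hands the raw bytes to `zipfile.ZipFile` through an in-memory stream; the file and member names are made up for the demo.

```python
import io
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.zip")

    # Build a tiny stand-in archive on disk.
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("hello.txt", "hi")

    # Binary mode returns the raw bytes, untouched by any text decoding.
    with open(path, "rb") as f:
        raw = f.read()

print(type(raw).__name__)
print(raw[:2])  # ZIP archives start with the signature bytes b'PK'

# The same bytes can be wrapped in a stream and read as an archive.
with zipfile.ZipFile(io.BytesIO(raw)) as archive:
    member_names = archive.namelist()
print(member_names)
```

On Python 3, text mode would try to decode the archive as text, so passing 'rb' explicitly is the safer habit on every platform.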

Okay, so you've got a document open and are pretty sure you can save it somewhere later. The next step is to get some content in there.

PDF and Word documents are binary files, which makes them much more complex than plaintext files. In addition to text, they store lots of font, color, and layout information. If you want your programs to read or write to PDFs or Word documents, you'll need to do more than simply pass their filenames to open().

Fortunately, there are Python modules that make it easy for you to interact with PDFs and Word documents. This chapter will cover two such modules: PyPDF2 and Python-Docx.

PDF Documents

PDF stands for Portable Document Format and uses the .pdf file extension. Although PDFs support many features, this chapter will focus on the two things you'll be doing most often with them: reading text content from PDFs and crafting new PDFs from existing documents.

The module you'll use to work with PDFs is PyPDF2. To install it, run pip install PyPDF2 from the command line. This module name is case sensitive, so make sure the y is lowercase and everything else is uppercase. (Check out Appendix A for full details about installing third-party modules.) If the module was installed correctly, running import PyPDF2 in the interactive shell shouldn't display any errors.
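If you would rather check for the module from a script than from the interactive shell, the standard library can look it up without importing it. This is a small sketch; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


print(is_installed("json"))    # part of the standard library
print(is_installed("PyPDF2"))  # result depends on your environment
```

The lookup is case sensitive in the same way the import statement is, so the name must be spelled exactly as the module is.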

Extracting Text from PDFs

PyPDF2 does not have a way to extract images, charts, or other media from PDF documents, but it can extract text and return it as a Python string. To start learning how PyPDF2 works, we'll use it on the example PDF shown in Figure 13-1.

Figure 13-1. The PDF page that we will be extracting text from

Download this PDF from http://nostarch.com/automatestuff/, and enter the following into the interactive shell:

   >>> import PyPDF2
   >>> pdfFileObj = open('meetingminutes.pdf', 'rb')
   >>> pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
❶ >>> pdfReader.numPages
   19
❷ >>> pageObj = pdfReader.getPage(0)
❸ >>> pageObj.extractText()
   'OOFFFFIICCIIAALL BBOOAARRDD MMIINNUUTTEESS Meeting of March 7, 2015
   \n     The Board of Elementary and Secondary Education shall provide leadership
   and create policies for education that expand opportunities for children,
   empower families and communities, and advance Louisiana in an increasingly
   competitive global market. BOARD of ELEMENTARY and SECONDARY EDUCATION '

First, import the PyPDF2 module. Then open meetingminutes.pdf in read binary mode and store it in pdfFileObj. To get a PdfFileReader object that represents this PDF, call PyPDF2.PdfFileReader() and pass it pdfFileObj. Store this PdfFileReader object in pdfReader.

The total number of pages in the document is stored in the numPages attribute of a PdfFileReader object. The example PDF has 19 pages, but let's extract text from only the first page.

To extract text from a page, you need to get a Page object, which represents a single page of a PDF, from a PdfFileReader object. You can get a Page object by calling the getPage() method on a PdfFileReader object and passing it the page number of the page you're interested in; in our case, 0.

PyPDF2 uses a zero-based index for getting pages: the first page is page 0, the second is page 1, and so on. This is always the case, even if pages are numbered differently within the document. For example, say your PDF is a three-page excerpt from a longer report, and its pages are numbered 42, 43, and 44. To get the first page of this document, you would want to call pdfReader.getPage(0), not getPage(42) or getPage(1).
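The mapping from a printed page number to the zero-based index can be written down as simple arithmetic. The helper below is a hypothetical sketch that assumes the printed numbers are consecutive; it does not call PyPDF2 at all.

```python
def page_index(printed_number, first_printed_number):
    """Convert a printed page number into a zero-based page index."""
    index = printed_number - first_printed_number
    if index < 0:
        raise ValueError("printed_number comes before the first page")
    return index


# A three-page excerpt whose pages are printed as 42, 43, and 44.
for printed in (42, 43, 44):
    print(printed, "->", page_index(printed, first_printed_number=42))
```

The value returned for printed page 42 is 0, which is the number you would pass to getPage() for the first page of the excerpt.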

Once you have your Page object, call its extractText() method to return a string of the page's text. The text extraction isn't perfect: the text Charles E. "Chas" Roemer, President from the PDF is absent from the string returned by extractText(), and the spacing is sometimes off. Still, this approximation of the PDF text content may be good enough for your program.
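When the spacing of extracted text is off, a light cleanup pass is often enough before further processing. The sketch below collapses runs of spaces and newlines into single spaces; the helper name `normalize_whitespace` is hypothetical, and the sample string is adapted from the output shown earlier rather than produced by PyPDF2 here.

```python
def normalize_whitespace(text):
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    return " ".join(text.split())


sample = "Meeting of March 7, 2015\n     The Board of Elementary and\nSecondary Education"
print(normalize_whitespace(sample))
```

This only tidies whitespace; it cannot restore text that the extraction dropped.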

Decrypting PDFs

Some PDF documents have an encryption feature that will keep them from being read until whoever is opening the document provides a password. Enter the following into the interactive shell with the PDF you downloaded, which has been encrypted with the password rosebud:

   >>> import PyPDF2
   >>> pdfReader = PyPDF2.PdfFileReader(open('encrypted.pdf', 'rb'))
❶ >>> pdfReader.isEncrypted
   True
   >>> pdfReader.getPage(0)
❷ Traceback (most recent call last):
     File "", line 1, in 
       pdfReader.getPage()
     --snip--
     File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 1173, in getObject
       raise utils.PdfReadError("file has not been decrypted")
   PyPDF2.utils.PdfReadError: file has not been decrypted
❸ >>> pdfReader.decrypt('rosebud')
   1
   >>> pageObj = pdfReader.getPage(0)

All PdfFileReader objects have an isEncrypted attribute that is True if the PDF is encrypted and False if it isn't ❶. Any attempt to call a function that reads the file before it has been decrypted with the correct password will result in an error ❷.

To read an encrypted PDF, call the decrypt() function and pass the password as a string ❸. After you call decrypt() with the correct password, you'll see that calling getPage() no longer causes an error. If given the wrong password, the decrypt() function will return 0 and getPage() will continue to fail. Note that the decrypt() method decrypts only the PdfFileReader object, not the actual PDF file. After your program terminates, the file on your hard drive remains encrypted. Your program will have to call decrypt() again the next time it is run.
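The check-then-decrypt sequence described above can be wrapped in a small helper. This is only a sketch: the function name decrypt_or_raise is hypothetical, and it assumes the object passed in behaves like the PdfFileReader shown in this section (an isEncrypted attribute and a decrypt() method that returns 0 for a wrong password).

```python
def decrypt_or_raise(pdf_reader, password):
    """Unlock an encrypted reader in place, or raise if the password is wrong.

    Assumption: pdf_reader behaves like a PyPDF2 PdfFileReader, with an
    isEncrypted attribute and a decrypt() method that returns 0 on failure.
    """
    if not pdf_reader.isEncrypted:
        return pdf_reader  # nothing to do for an unencrypted PDF
    if pdf_reader.decrypt(password) == 0:
        # getPage() would keep failing, so stop here with a clear error.
        raise ValueError('wrong password; the reader is still encrypted')
    return pdf_reader
```

With the encrypted.pdf example, you would pass the PdfFileReader object and 'rosebud' before calling getPage(). Only the in-memory reader is unlocked; the file on disk stays encrypted.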

Creating PDFs

PyPDF2's counterpart to PdfFileReader objects is PdfFileWriter objects, which can create new PDF files. But PyPDF2 cannot write arbitrary text to a PDF the way Python can with plaintext files. Instead, PyPDF2's PDF-writing capabilities are limited to copying pages from other PDFs, rotating pages, overlaying pages, and encrypting files.

PyPDF2 doesn't allow you to directly edit a PDF. Instead, you have to create a new PDF and then copy content over from an existing document. The examples in this section follow this general approach:

Open one or more existing PDFs (the source PDFs) into PdfFileReader objects.

Create a new PdfFileWriter object.

Copy pages from the PdfFileReader objects into the PdfFileWriter object.

Finally, use the PdfFileWriter object to write the output PDF.
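The copy and write steps above can be sketched as one function. The name combine_pdfs is hypothetical, and the sketch assumes the readers and writer expose the numPages, getPage(), addPage(), and write() names used in this section; opening the source files and creating the reader and writer objects is left to the caller.

```python
def combine_pdfs(pdf_readers, pdf_writer, output_file):
    """Copy every page from each reader into the writer, then write the result.

    Assumption: the readers and the writer behave like PyPDF2's
    PdfFileReader and PdfFileWriter objects. output_file must already be
    open in write-binary mode. Returns the number of pages copied.
    """
    pages_copied = 0
    for pdf_reader in pdf_readers:
        for page_num in range(pdf_reader.numPages):
            pdf_writer.addPage(pdf_reader.getPage(page_num))
            pages_copied += 1
    # Nothing reaches the disk until write() is called.
    pdf_writer.write(output_file)
    return pages_copied
```

Keep the source files open until write() has finished, as the interactive examples in this section do by closing them last.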

Creating a PdfFileWriter object creates only a value that represents a PDF document in Python. It doesn't create the actual PDF file. For that, you must call the PdfFileWriter's write() method.

The write() method takes a regular File object that has been opened in write-binary mode. You can get such a File object by calling Python's open() function with two arguments: the string of what you want the PDF's filename to be and 'wb' to indicate the file should be opened in write-binary mode.
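As a minimal illustration of write-binary mode on its own, the sketch below uses only the standard library. The filename output.pdf is a placeholder, and the bytes written are a stand-in, not a valid PDF; in real use the File object would be handed to the writer's write() method instead.

```python
import os
import tempfile

# Work in a temporary folder so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    out_path = os.path.join(folder, 'output.pdf')  # placeholder filename

    # 'wb' = write-binary: the file is created (or truncated) and takes bytes.
    with open(out_path, 'wb') as out_file:
        mode = out_file.mode
        out_file.write(b'%PDF-')  # placeholder bytes, not a complete PDF

    size = os.path.getsize(out_path)

print(mode, size)
```

A file opened in this mode accepts bytes rather than str, which is what a PDF writer needs because PDF content is binary.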

If this sounds a little confusing, don't worry; you'll see how it works in the following code examples.

Copying Pages

You can use PyPDF2 to copy pages from one PDF document to another. This allows you to combine multiple PDF files, cut unwanted pages, or reorder pages.
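Cutting and reordering both come down to choosing which page numbers to copy, and in what order. The helper below is a sketch under that reading: copy_selected_pages is a hypothetical name, and it assumes reader and writer objects with the numPages, getPage(), and addPage() names used in this chapter.

```python
def copy_selected_pages(pdf_reader, pdf_writer, page_numbers):
    """Add the listed pages (0-based) to the writer in the order given.

    Assumption: pdf_reader and pdf_writer behave like PyPDF2's
    PdfFileReader and PdfFileWriter objects.
    """
    for page_num in page_numbers:
        if not 0 <= page_num < pdf_reader.numPages:
            raise IndexError('page %d is out of range' % page_num)
        pdf_writer.addPage(pdf_reader.getPage(page_num))
```

Passing [2, 0] for a three-page source, for example, copies the third page followed by the first and leaves the second one out.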

Download meetingminutes.pdf and meetingminutes2.pdf from http://nostarch.com/automatestuff/ and place the PDFs in the current working directory. Enter the following into the interactive shell:

   >>> import PyPDF2
   >>> pdf1File = open('meetingminutes.pdf', 'rb')
   >>> pdf2File = open('meetingminutes2.pdf', 'rb')
❶ >>> pdf1Reader = PyPDF2.PdfFileReader(pdf1File)
❷ >>> pdf2Reader = PyPDF2.PdfFileReader(pdf2File)
❸ >>> pdfWriter = PyPDF2.PdfFileWriter()

   >>> for pageNum in range(pdf1Reader.numPages):
❹         pageObj = pdf1Reader.getPage(pageNum)
❺         pdfWriter.addPage(pageObj)

   >>> for pageNum in range(pdf2Reader.numPages):
❻         pageObj = pdf2Reader.getPage(pageNum)
❼         pdfWriter.addPage(pageObj)

❽ >>> pdfOutputFile = open('combinedminutes.pdf', 'wb')
   >>> pdfWriter.write(pdfOutputFile)
   >>> pdfOutputFile.close()
   >>> pdf1File.close()
   >>> pdf2File.close()

Open both PDF files in read-binary mode and store the two resulting File objects in pdf1File and pdf2File. Call PyPDF2.PdfFileReader() and pass it pdf1File to get a PdfFileReader object for meetingminutes.pdf ❶. Call it again and pass it pdf2File to get a PdfFileReader object for meetingminutes2.pdf ❷. Then create a new PdfFileWriter object, which represents a blank PDF document ❸.

Next, copy all the pages from the two source PDFs and add them to the PdfFileWriter object. Get the Page object by calling getPage() on a PdfFileReader object ❹. Then pass that Page object to your PdfFileWriter's addPage() method ❺. These steps are done first for pdf1Reader and then again for pdf2Reader. When you're done copying pages, write a new PDF called combinedminutes.pdf by passing a File object to the PdfFileWriter's write() method ❽.

   >>> import PyPDF2
   >>> minutesFile = open('meetingminutes.pdf', 'rb')
❷ >>> pdfReader = PyPDF2.PdfFileReader(minutesFile)
❷ >>> minutesFirstPage = pdfReader.getPage(0)
❸ >>> pdfWatermarkReader = PyPDF2.PdfFileReader(open('watermark.pdf', 'rb'))
❹ >>> minutesFirstPage.mergePage(pdfWatermarkReader.getPage(0))
❺ >>> pdfWriter = PyPDF2.PdfFileWriter()
❻ >>> pdfWriter.addPage(minutesFirstPage)

❼ >>> for pageNum in range(1, pdfReader.numPages):
           pageObj = pdfReader.getPage(pageNum)
           pdfWriter.addPage(pageObj)
   >>> resultPdfFile = open('watermarkedCover.pdf', 'wb')
   >>> pdfWriter.write(resultPdfFile)
   >>> minutesFile.close()
   >>> resultPdfFile.close()

9 object to the PdfFileWriter’s

   >>> import PyPDF2
   >>> minutesFile = open('meetingminutes.pdf', 'rb')
❷ >>> pdfReader = PyPDF2.PdfFileReader(minutesFile)
❷ >>> minutesFirstPage = pdfReader.getPage(0)
❸ >>> pdfWatermarkReader = PyPDF2.PdfFileReader(open('watermark.pdf', 'rb'))
❹ >>> minutesFirstPage.mergePage(pdfWatermarkReader.getPage(0))
❺ >>> pdfWriter = PyPDF2.PdfFileWriter()
❻ >>> pdfWriter.addPage(minutesFirstPage)

❼ >>> for pageNum in range(1, pdfReader.numPages):
           pageObj = pdfReader.getPage(pageNum)
           pdfWriter.addPage(pageObj)
   >>> resultPdfFile = open('watermarkedCover.pdf', 'wb')
   >>> pdfWriter.write(resultPdfFile)
   >>> minutesFile.close()
   >>> resultPdfFile.close()

7 method ❻.

Note

PyPDF2 cannot insert pages in the middle of a PdfFileWriter object; the addPage() method will only add pages to the end.

You have now created a new PDF file that combines the pages from meetingminutes.pdf and meetingminutes2.pdf into a single document. Remember that the File object passed to PyPDF2.PdfFileReader() needs to be opened in read-binary mode by passing 'rb' as the second argument to open(). Likewise, the File object passed to write() needs to be opened in write-binary mode with 'wb'.
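As a small illustration of the two file modes, the following sketch copies a file byte for byte using only the standard library. The helper name copy_binary is hypothetical and not part of PyPDF2; the point is only that 'rb' and 'wb' pass bytes through untouched, which is what a PDF library needs.

```python
def copy_binary(src_path, dst_path):
    """Copy src_path to dst_path byte for byte and return the byte count."""
    # 'rb' reads raw bytes; text mode would try to decode them as characters.
    with open(src_path, 'rb') as src:
        data = src.read()
    # 'wb' writes the bytes back without newline or encoding translation.
    with open(dst_path, 'wb') as dst:
        dst.write(data)
    return len(data)
```

The with statements close both files automatically, so the explicit close() calls used in the shell sessions are not needed here.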

Rotating Pages

The pages of a PDF can also be rotated in 90-degree increments with the rotateClockwise() and rotateCounterClockwise() methods. Pass one of the integers 90, 180, or 270 to these methods. Enter the following into the interactive shell, with the meetingminutes.pdf file in the current working directory:

   >>> import PyPDF2
   >>> minutesFile = open('meetingminutes.pdf', 'rb')
   >>> pdfReader = PyPDF2.PdfFileReader(minutesFile)
❶ >>> page = pdfReader.getPage(0)
❷ >>> page.rotateClockwise(90)
   {'/Contents': [IndirectObject(961, 0), IndirectObject(962, 0),
   --snip--
   }
   >>> pdfWriter = PyPDF2.PdfFileWriter()
   >>> pdfWriter.addPage(page)
❸ >>> resultPdfFile = open('rotatedPage.pdf', 'wb')
   >>> pdfWriter.write(resultPdfFile)
   >>> resultPdfFile.close()
   >>> minutesFile.close()

Here we use getPage(0) to select the first page of the PDF ❶, and then we call rotateClockwise(90) on that page ❷. We write a new PDF with the rotated page and save it as rotatedPage.pdf ❸.

The resulting PDF will have one page, rotated 90 degrees clockwise, as in Figure 13-2. The return values from rotateClockwise() and rotateCounterClockwise() contain a lot of information that you can ignore.

Figure 13-2. The rotatedPage.pdf file with the page rotated 90 degrees clockwise
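The rule that rotations come in 90-degree steps can be captured in a small helper. This is only a sketch: rotate_page is a hypothetical name, not a PyPDF2 function, and it assumes the page object exposes rotateClockwise() and rotateCounterClockwise() as described in this section. Treating a negative angle as a counterclockwise turn is a convention chosen here for illustration.

```python
ALLOWED_ANGLES = (90, 180, 270)

def rotate_page(page, degrees):
    """Rotate a PyPDF2-style page object and return it.

    Assumes `page` provides rotateClockwise() and rotateCounterClockwise(),
    as the legacy PyPDF2 Page objects in this section do.
    """
    angle = abs(degrees)
    if angle not in ALLOWED_ANGLES:
        raise ValueError('angle must be 90, 180, or 270 degrees')
    if degrees > 0:
        page.rotateClockwise(angle)
    else:
        # Hypothetical convention: a negative value means counterclockwise.
        page.rotateCounterClockwise(angle)
    return page

# Possible usage with the objects from the shell session above:
#     rotate_page(pdfReader.getPage(0), 90)
```

Checking the angle up front gives a clear error message instead of relying on whatever the library does with an unsupported value.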

Overlaying Pages

PyPDF2 can also overlay the contents of one page over another, which is useful for adding a logo, timestamp, or watermark to a page. With Python, it's easy to add watermarks to multiple files and only to the pages your program specifies.

Download watermark.pdf from http://nostarch.com/automatestuff/ and place the PDF in the current working directory along with meetingminutes.pdf. Then enter the following into the interactive shell:

   >>> import PyPDF2
   >>> minutesFile = open('meetingminutes.pdf', 'rb')
❶ >>> pdfReader = PyPDF2.PdfFileReader(minutesFile)
❷ >>> minutesFirstPage = pdfReader.getPage(0)
❸ >>> pdfWatermarkReader = PyPDF2.PdfFileReader(open('watermark.pdf', 'rb'))
❹ >>> minutesFirstPage.mergePage(pdfWatermarkReader.getPage(0))
❺ >>> pdfWriter = PyPDF2.PdfFileWriter()
❻ >>> pdfWriter.addPage(minutesFirstPage)

❼ >>> for pageNum in range(1, pdfReader.numPages):
           pageObj = pdfReader.getPage(pageNum)
           pdfWriter.addPage(pageObj)
   >>> resultPdfFile = open('watermarkedCover.pdf', 'wb')
   >>> pdfWriter.write(resultPdfFile)
   >>> minutesFile.close()
   >>> resultPdfFile.close()

Here we make a PdfFileReader object of meetingminutes.pdf ❶. We call getPage(0) to get a Page object for the first page and store this object in minutesFirstPage ❷. We then make a PdfFileReader object for watermark.pdf ❸ and call mergePage() on minutesFirstPage ❹. The argument we pass to mergePage() is a Page object for the first page of watermark.pdf.

Now that we've called mergePage() on minutesFirstPage, minutesFirstPage represents the watermarked first page. We make a PdfFileWriter object ❺ and add the watermarked first page ❻. Then we loop through the rest of the pages in meetingminutes.pdf and add them to the PdfFileWriter object ❼. Finally, we open a new PDF called watermarkedCover.pdf and write the contents of the PdfFileWriter to the new PDF.

Figure 13-3 shows the results. Our new PDF, watermarkedCover.pdf, has all the contents of meetingminutes.pdf, and the first page is watermarked.

Figure 13-3.
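To watermark only the pages a program specifies, the same calls can be wrapped in a loop. The sketch below is an illustration under stated assumptions: add_watermarked_pages is a hypothetical helper name, and the reader, page, and writer arguments are assumed to behave like the legacy PyPDF2 objects used in this section (numPages, getPage(), mergePage(), addPage()).

```python
def add_watermarked_pages(reader, watermark_page, writer, pages_to_mark):
    """Copy every page from reader to writer, watermarking selected ones.

    pages_to_mark holds zero-based page numbers. Assumes the legacy PyPDF2
    names used in this section: reader.numPages, reader.getPage(),
    page.mergePage() and writer.addPage().
    """
    pages_to_mark = set(pages_to_mark)
    for page_num in range(reader.numPages):
        page = reader.getPage(page_num)
        if page_num in pages_to_mark:
            # mergePage() overlays the watermark onto this page in place.
            page.mergePage(watermark_page)
        writer.addPage(page)
    return writer

# Possible usage with the objects from the shell session above:
#     add_watermarked_pages(pdfReader, pdfWatermarkReader.getPage(0),
#                           pdfWriter, [0])
```

Passing [0] reproduces the cover-only watermark from the example; a longer list marks additional pages without changing the loop.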

Encrypting PDFs

A PdfFileWriter object can also add encryption to a PDF document. Enter the following into the interactive shell:

   >>> import PyPDF2
   >>> pdfFile = open('meetingminutes.pdf', 'rb')
   >>> pdfReader = PyPDF2.PdfFileReader(pdfFile)
   >>> pdfWriter = PyPDF2.PdfFileWriter()
   >>> for pageNum in range(pdfReader.numPages):
           pdfWriter.addPage(pdfReader.getPage(pageNum))

❶ >>> pdfWriter.encrypt('swordfish')
   >>> resultPdf = open('encryptedminutes.pdf', 'wb')
   >>> pdfWriter.write(resultPdf)
   >>> resultPdf.close()

Before calling the write() method to save to a file, call the encrypt() method and pass it a password string ❶. PDFs can have a user password (allowing you to view the PDF) and an owner password (allowing you to set permissions for printing, commenting, extracting text, and other features). The user password and owner password are the first and second arguments to encrypt(), respectively. If only one string argument is passed to encrypt(), it will be used for both passwords.

In this example, we copied the pages of meetingminutes.pdf to a PdfFileWriter object. We encrypted the PdfFileWriter with the password swordfish, opened a new PDF called encryptedminutes.pdf, and wrote the contents of the PdfFileWriter to the new PDF. Before anyone can view encryptedminutes.pdf, they'll have to enter this password. You may want to delete the original, unencrypted meetingminutes.pdf file after ensuring its copy was correctly encrypted.
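One way to check the copy before deleting the original is to reopen it and try the password. The function below is a minimal sketch, assuming a reader object with the legacy PyPDF2 interface shown earlier in this chapter: an isEncrypted attribute and a decrypt() method that returns 0 when the password is wrong. The name copy_is_encrypted is hypothetical.

```python
def copy_is_encrypted(reader, password):
    """Return True if the reader's PDF is encrypted and password opens it.

    Assumes the legacy PyPDF2 reader interface: an isEncrypted attribute
    and a decrypt() method that returns 0 for a wrong password.
    """
    if not reader.isEncrypted:
        # The copy was written without encryption; keep the original.
        return False
    return reader.decrypt(password) != 0

# Possible usage (file name taken from the example above):
#     reader = PyPDF2.PdfFileReader(open('encryptedminutes.pdf', 'rb'))
#     if copy_is_encrypted(reader, 'swordfish'):
#         os.remove('meetingminutes.pdf')
```

Deleting the original only after this check returns True avoids losing the document if the encrypted copy turns out to be unreadable.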

Project: Combining Select Pages from Many PDFs

Say you have the boring job of merging several dozen PDF documents into a single PDF file. Each of them has a cover sheet as the first page, but you don't want the cover sheet repeated in the final result. Even though there are lots of free programs for combining PDFs, many of them simply merge entire files together. Let's write a Python program to customize which pages you want in the combined PDF.

At a high level, here's what the program will do:

Find all PDF files in the current working directory.
Sort the filenames so the PDFs are added in order.
Write each page, excluding the first page, of each PDF to the output file.

In terms of implementation, your code will need to do the following:

Call os.listdir() to find all the files in the working directory and remove any non-PDF files.
Call Python's sort() list method to alphabetize the filenames.
Create a PdfFileWriter object for the output PDF.
Loop over each PDF file, creating a PdfFileReader object for it.
Loop over each page (except the first) in each PDF file.
Add the pages to the output PDF.
Write the output PDF to a file named allminutes.pdf.

Đối với dự án này, hãy mở một cửa sổ Trình chỉnh sửa tệp mới và lưu nó dưới dạng CombinePdfs.py.combinePdfs.py.

Bước 1: Tìm tất cả các tệp PDF

Đầu tiên, chương trình của bạn cần có một danh sách tất cả các tệp với tiện ích mở rộng .pdf trong thư mục làm việc hiện tại và sắp xếp chúng. Làm cho mã của bạn trông giống như sau:.pdf extension in the current working directory and sort them. Make your code look like the following:

   #! python3
   # combinePdfs.py - Combines all the PDFs in the current working directory
   # into a single PDF.

❶ import PyPDF2, os

   # Get all the PDF filenames.
   pdfFiles = []
   for filename in os.listdir('.'):
       if filename.endswith('.pdf'):
❷         pdfFiles.append(filename)
❸ pdfFiles.sort(key=str.lower)

❹ pdfWriter = PyPDF2.PdfFileWriter()

   # TODO: Loop through all the PDF files.

   # TODO: Loop through all the pages (except the first) and add them.

   # TODO: Save the resulting PDF to a file.

After the shebang line and the descriptive comment about what the program does, this code imports the os and PyPDF2 modules ❶. The os.listdir('.') call returns a list of every file in the current working directory. The code loops over this list and adds only those files with the .pdf extension to pdfFiles ❷. Afterward, this list is sorted in alphabetical order with the key=str.lower keyword argument to sort() ❸.

A PdfFileWriter object is created to hold the combined PDF pages ❹. Finally, a few comments outline the rest of the program.
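The effect of the key=str.lower argument is easy to miss, so here is a small sketch of it. The filenames are made up for illustration and are not part of the project. A plain sort compares character codes, which places every uppercase letter before any lowercase one; sorting on lowercased copies gives a case-insensitive order instead.

```python
# Hypothetical filenames, chosen only to show the ordering difference.
filenames = ['b.pdf', 'A.pdf', 'C.pdf', 'a2.pdf']

# Default ordering compares character codes, so uppercase names come first.
plain = sorted(filenames)

# key=str.lower compares lowercased copies, giving a case-insensitive order.
folded = sorted(filenames, key=str.lower)

print(plain)   # ['A.pdf', 'C.pdf', 'a2.pdf', 'b.pdf']
print(folded)  # ['A.pdf', 'a2.pdf', 'b.pdf', 'C.pdf']
```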

Step 2: Open Each PDF

Now the program must read each PDF file in pdfFiles. In place of the first TODO comment, add a for loop over pdfFiles.

For each PDF, the loop opens the filename in read-binary mode by calling open() with 'rb' as the second argument. The open() call returns a File object, which gets passed to PyPDF2.PdfFileReader() to create a PdfFileReader object for that PDF file.
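As a rough sketch of this step, the loop could look like the following. It assumes an older PyPDF2 release that still provides PdfFileReader (versions before 3.0). The helper name open_pdf_readers is hypothetical, and PyPDF2 is imported inside the function so that the definition loads even where the module is not installed.

```python
def open_pdf_readers(pdf_files):
    """Open each filename in read-binary mode and wrap it in a PdfFileReader.

    Returns a list of (file_object, reader) pairs. The caller is expected to
    close the file objects once the output PDF has been written.
    """
    import PyPDF2  # assumption: PyPDF2 < 3.0, which still has PdfFileReader

    readers = []
    for filename in pdf_files:
        pdf_file_obj = open(filename, 'rb')              # read-binary mode
        pdf_reader = PyPDF2.PdfFileReader(pdf_file_obj)  # one reader per PDF
        readers.append((pdf_file_obj, pdf_reader))
    return readers
```

The input files are left open on purpose. The other PyPDF2 listings in this chapter close their input files only after the output has been written, and this sketch follows the same order.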

Step 3: Add Each Page

For each PDF, you'll want to loop over every page except the first. In place of the second TODO comment, add a second for loop nested inside the first.

The code inside the for loop copies each Page object individually to the PdfFileWriter object. Remember, you want to skip the first page. Since PyPDF2 considers 0 to be the first page, your loop should start at 1 and then go up to, but not include, the integer in pdfReader.numPages.
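A minimal sketch of the inner loop follows. It assumes reader and writer objects that expose numPages, getPage(), and addPage(), the PyPDF2 names used throughout this chapter. The function name add_pages_except_first is hypothetical.

```python
def add_pages_except_first(pdf_reader, pdf_writer):
    """Copy every page but the first from pdf_reader into pdf_writer."""
    # Page numbers start at 0, so starting the range at 1 skips the first page.
    for page_num in range(1, pdf_reader.numPages):
        page_obj = pdf_reader.getPage(page_num)
        pdf_writer.addPage(page_obj)
    # Report how many pages were copied (0 for a one-page PDF).
    return max(pdf_reader.numPages - 1, 0)
```

Because range() excludes its upper bound, a PDF with numPages equal to 4 contributes pages 1, 2, and 3.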

Step 4: Save the Results

After these nested for loops are done looping, the pdfWriter variable will contain a PdfFileWriter object with the pages for all the PDFs combined. The last step is to write this content to a file on the hard drive, in place of the last TODO comment.

Passing 'wb' to open() opens the output PDF file, allminutes.pdf, in write-binary mode. Then, passing the resulting File object to the write() method creates the actual PDF file. A call to the close() method finishes the program.
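Putting the four steps together, the whole program might be sketched as one function. This is an illustration under stated assumptions rather than the original listing: it assumes PyPDF2 before 3.0, the function name combine_pdfs and its parameters are hypothetical, and skipping a file named like the output is an extra precaution added here so that a second run does not merge the previous result.

```python
import os


def combine_pdfs(folder='.', output_name='allminutes.pdf'):
    """Combine all PDFs in folder, minus each first page, into output_name."""
    import PyPDF2  # assumption: PyPDF2 < 3.0 (PdfFileReader/PdfFileWriter)

    # Step 1: collect and sort the PDF filenames, skipping an earlier output.
    pdf_files = [name for name in os.listdir(folder)
                 if name.endswith('.pdf') and name != output_name]
    pdf_files.sort(key=str.lower)

    pdf_writer = PyPDF2.PdfFileWriter()
    open_files = []
    try:
        # Steps 2 and 3: open each PDF and copy all pages except the first.
        for filename in pdf_files:
            pdf_file_obj = open(os.path.join(folder, filename), 'rb')
            open_files.append(pdf_file_obj)
            pdf_reader = PyPDF2.PdfFileReader(pdf_file_obj)
            for page_num in range(1, pdf_reader.numPages):
                pdf_writer.addPage(pdf_reader.getPage(page_num))

        # Step 4: write the combined pages in write-binary mode.
        with open(os.path.join(folder, output_name), 'wb') as pdf_output:
            pdf_writer.write(pdf_output)
    finally:
        # Close the inputs only after the output has been written.
        for pdf_file_obj in open_files:
            pdf_file_obj.close()
    return pdf_files
```

Calling combine_pdfs() with no arguments works on the current working directory and returns the list of filenames that were merged.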

Ideas for Similar Programs

Being able to create PDFs from the pages of other PDFs will let you make programs that can do the following:

Cut out specific pages from PDFs.
Reorder pages in a PDF.
Create a PDF from only those pages that have some specific text, identified by extractText().
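As one example of the last idea, the sketch below finds the pages whose extracted text contains a search string. It assumes a reader object with numPages and getPage(), and page objects with extractText(), as in the PyPDF2 listings of this chapter. The function name pages_containing is hypothetical. Text extraction is imperfect, so some matching pages may be missed.

```python
def pages_containing(pdf_reader, search_text):
    """Return the page numbers whose extracted text contains search_text."""
    matches = []
    for page_num in range(pdf_reader.numPages):
        page_text = pdf_reader.getPage(page_num).extractText()
        if search_text in page_text:
            matches.append(page_num)
    return matches
```

Each returned page number can then be passed to getPage(), and the resulting Page object to a writer's addPage(), to build the new PDF.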

Word Documents

Python can create and modify Word documents, which have the .docx file extension, with the python-docx module. You can install the module by running pip install python-docx. (Appendix A has full details on installing third-party modules.)

Note

When using pip to first install Python-Docx, be sure to install python-docx, not docx. The installation name docx is for a different module that this book does not cover. However, when you are going to import the python-docx module, you'll need to run import docx, not import python-docx.

If you don't have Word, LibreOffice Writer and OpenOffice Writer are both free alternative applications for Windows, OS X, and Linux that can be used to open .docx files. You can download them from https://www.libreoffice.org and http://openoffice.org, respectively. The full documentation for Python-Docx is available at https://python-docx.readthedocs.org/. Although there is a version of Word for OS X, this chapter will focus on Word for Windows.

Compared to plaintext, .docx files have a lot of structure. This structure is represented by three different data types in Python-Docx. At the highest level, a Document object represents the entire document. The Document object contains a list of Paragraph objects for the paragraphs in the document. (A new paragraph begins whenever the user presses ENTER or RETURN while typing in a Word document.) Each of these Paragraph objects contains a list of one or more Run objects. The single-sentence paragraph in Figure 13-4 has four runs.

Figure 13-4. The Run objects identified in a Paragraph object

The text in a Word document is more than just a string. It has font, size, color, and other styling information associated with it. A style in Word is a collection of these attributes. A Run object is a contiguous run of text with the same style. A new Run object is needed whenever the text style changes.

Reading Word Documents

Let's experiment with the python-docx module. Download demo.docx from http://nostarch.com/automatestuff/ and save the document to the working directory. Then open the document from the interactive shell.

To open a .docx file in Python, call docx.Document() and pass the filename demo.docx. This will return a Document object, which has a paragraphs attribute that is a list of Paragraph objects. When we call len() on this paragraphs list, it returns 7, which tells us that there are seven Paragraph objects in this document. Each of these Paragraph objects has a text attribute that contains a string of the text in that paragraph (without the style information). Here, the first text attribute contains the text of the document's first paragraph, and the second contains the text of its second paragraph.
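The calls described above can be sketched as a small helper. The function name summarize_paragraphs is hypothetical, and the import sits inside the function so that the definition loads even where python-docx is not installed.

```python
def summarize_paragraphs(filename):
    """Return (paragraph_count, list_of_paragraph_texts) for a .docx file."""
    import docx  # provided by the python-docx package

    doc = docx.Document(filename)                    # a Document object
    texts = [para.text for para in doc.paragraphs]   # one string per Paragraph
    return len(doc.paragraphs), texts
```

For the demo.docx file described above, the count returned by summarize_paragraphs('demo.docx') should be 7.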

Each Paragraph object also has a runs attribute that is a list of Run objects. Run objects also have a text attribute, containing just the text in that particular run. Let's look at the text attributes in the second Paragraph object. Calling len() on this Paragraph object's runs list tells us that there are four Run objects. The first Run object contains the plain text that opens the paragraph. Then, the text changes to a bold style, so the bold text starts a new Run object. The text returns to an unbolded style after that, which results in a third Run object. Finally, the fourth and last Run object contains text in an italic style.

With Python-Docx, your Python programs will now be able to read the text from a .docx file and use it just like any other string value.
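To see where one run ends and the next begins, it can help to list each run's text next to its bold and italic settings. The sketch below assumes a Paragraph object as returned by python-docx, whose Run objects carry text, bold, and italic attributes. The function name describe_runs is hypothetical.

```python
def describe_runs(paragraph):
    """Return a (text, bold, italic) tuple for each Run in a Paragraph."""
    # bold and italic are True, False, or None; None means the run
    # inherits that setting from its style.
    return [(run.text, run.bold, run.italic) for run in paragraph.runs]
```

Applied to the second paragraph of demo.docx, this should return four tuples, one per run.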

Getting the Full Text from a .docx File

If you care only about the text, not the styling information, in the Word document, you can use a getText() function. It accepts a filename of a .docx file and returns a single string value of its text. Open a new file editor window, write the function there, and save the file as readDocx.py.

The getText() function opens the Word document, loops over all the Paragraph objects in the paragraphs list, and then appends their text to the list in fullText. After the loop, the strings in fullText are joined together with newline characters.

The readDocx.py program can be imported like any other module. Now if you just need the text from a Word document, you can enter the following:

>>> import readDocx
>>> print(readDocx.getText('demo.docx'))

You can also adjust getText() to modify the string before returning it. For example, to indent each paragraph, replace the append() call in readDocx.py with this:

fullText.append('  ' + para.text)

To add a double space between paragraphs, change the join() call to this:

return '\n\n'.join(fullText)

As you can see, it takes only a few lines of code to write functions that read a .docx file and return a string of its content to your liking.

Styling Paragraph and Run Objects

In Word for Windows, you can see the styles by pressing CTRL-ALT-SHIFT-S to display the Styles pane, which looks like Figure 13-5. On OS X, you can view the Styles pane by clicking the View▸Styles menu item.

Figure 13-5. Display the Styles pane by pressing CTRL-ALT-SHIFT-S on Windows.

Word and other word processors use styles to keep the visual presentation of similar types of text consistent and easy to change. For example, perhaps you want to set body paragraphs in 11-point, Times New Roman, left-justified, ragged-right text. You can create a style with these settings and assign it to all body paragraphs. If you later want to change the presentation of all body paragraphs in the document, you can just change the style, and all those paragraphs will be updated automatically.

For Word documents, there are three types of styles: paragraph styles can be applied to Paragraph objects, character styles can be applied to Run objects, and linked styles can be applied to both kinds of objects. You can give both Paragraph and Run objects styles by setting their style attribute to a string. This string should be the name of a style. If style is set to None, then there will be no style associated with the Paragraph or Run object.

The string values for the default Word styles are as follows:

'Normal'	'Heading5'	'ListBullet'	'ListParagraph'
'BodyText'	'Heading6'	'ListBullet2'	'MacroText'
'BodyText2'	'Heading7'	'ListBullet3'	'NoSpacing'
'BodyText3'	'Heading8'	'ListContinue'	'Quote'
'Caption'	'Heading9'	'ListContinue2'	'Subtitle'
'Heading1'	'IntenseQuote'	'ListContinue3'	'TOCHeading'
'Heading2'	'List'	'ListNumber'	'Title'
'Heading3'	'List2'	'ListNumber2'
'Heading4'	'List3'	'ListNumber3'

When setting the style attribute, do not use spaces in the style name. For example, while the style name may be Subtle Emphasis, you should set the style attribute to the string value 'SubtleEmphasis' instead of 'Subtle Emphasis'. Including spaces will cause Word to misread the style name and not apply it.

When using a linked style for a Run object, you will need to add 'Char' to the end of its name. For example, to set the Quote linked style for a Paragraph object, you would use paragraphObj.style = 'Quote', but for a Run object, you would use runObj.style = 'QuoteChar'.
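The two naming rules above can be captured in a small helper. This is only a sketch of the convention as this text describes it for Python-Docx 0.7.4; later releases may expect different strings. styleValue and its linkedOnRun parameter are hypothetical names, not part of python-docx.

```python
def styleValue(displayName, linkedOnRun=False):
    """Build the string to assign to a .style attribute under the rules above.

    displayName is the name shown in Word's Styles pane, such as 'Subtle Emphasis'.
    Set linkedOnRun to True when a linked style is applied to a Run object.
    """
    value = displayName.replace(' ', '')  # rule 1: no spaces in the name
    if linkedOnRun:
        value += 'Char'                   # rule 2: linked style on a Run gets 'Char'
    return value
```

For instance, styleValue('Quote') gives the string for a Paragraph object, while styleValue('Quote', linkedOnRun=True) gives the one for a Run object.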

In the current version of Python-Docx (0.7.4), the only styles that can be used are the default Word styles and the styles in the opened .docx. New styles cannot be created, though this may change in future versions of Python-Docx.

Creating Word Documents with Nondefault Styles

If you want to create Word documents that use styles beyond the default ones, you will need to open Word to a blank Word document and create the styles yourself by clicking the New Style button at the bottom of the Styles pane (Figure 13-6 shows this on Windows).

This will open the Create New Style from Formatting dialog, where you can enter the new style. Then, go back into the interactive shell and open this blank Word document with docx.Document(), using it as the base for your Word document. The name you gave this style will now be available to use with Python-Docx.

Figure 13-6. The New Style button (left) and the Create New Style from Formatting dialog (right)

Run Attributes

Runs can be further styled using text attributes. Each attribute can be set to one of three values: True (the attribute is always enabled, no matter what other styles are applied to the run), False (the attribute is always disabled), or None (the attribute defaults to whatever the run's style is set to).
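The three-valued behavior can be modeled in plain Python. The function below is an illustration of the rule only, not a python-docx call; effectiveSetting and its parameters are hypothetical names.

```python
def effectiveSetting(runValue, styleValue):
    """Resolve a tri-state run attribute such as bold or italic.

    True and False on the run always win; None means the run defers
    to whatever its style specifies.
    """
    if runValue is None:
        return styleValue
    return runValue
```

Comparing with `is None` rather than testing truthiness matters here, because False is a deliberate override and must not be treated as "unset".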

Table 13-1 lists the text attributes that can be set on Run objects.

Table 13-1. Run Object text Attributes

Attribute	Description
bold	The text appears in bold.
italic	The text appears in italic.
underline	The text is underlined.
strike	The text appears with strikethrough.
double_strike	The text appears with double strikethrough.
all_caps	The text appears in capital letters.
small_caps	The text appears in capital letters, with lowercase letters two points smaller.
shadow	The text appears with a shadow.
outline	The text appears outlined rather than solid.
rtl	The text is written right-to-left.
imprint	The text appears pressed into the page.
emboss	The text appears raised off the page in relief.

For example, you can change the styles of demo.docx from the interactive shell.
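The restyling steps can be sketched as a short script rather than a shell session. It assumes the third-party python-docx package is installed and that demo.docx starts with a title paragraph followed by a paragraph made of four runs, as this section describes. underlineRuns and restyle are illustrative names, and the 'QuoteChar' default follows the naming used in this text for version 0.7.4, so other releases may expect a different string.

```python
def underlineRuns(paragraph, runIndexes):
    """Underline the chosen runs and report (text, underline) for every run."""
    for index in runIndexes:
        paragraph.runs[index].underline = True
    return [(run.text, run.underline) for run in paragraph.runs]


def restyle(sourcePath, targetPath, runStyle='QuoteChar'):
    """Restyle the first two paragraphs of a document and save a copy."""
    import docx  # third-party python-docx package; assumed to be installed
    doc = docx.Document(sourcePath)
    doc.paragraphs[0].style = 'Normal'          # title paragraph becomes Normal
    doc.paragraphs[1].runs[0].style = runStyle  # first run gets the linked style
    report = underlineRuns(doc.paragraphs[1], [1, 3])  # second and fourth runs
    doc.save(targetPath)
    return report
```

Calling restyle('demo.docx', 'restyled.docx') would write the restyled copy and return the text and underline setting of each run in the second paragraph.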

Here, we use the text and style attributes to easily see what's in the paragraphs in our document. We can see that it's simple to divide a paragraph into runs and access each run individually. So we get the first, second, and fourth runs in the second paragraph, style each run, and save the results to a new document.

The words Document Title at the top of restyled.docx will have the Normal style instead of the Title style, the Run object for the text A plain paragraph with some will have the QuoteChar style, and the two Run objects for the words bold and italic will have their underline attributes set to True. Figure 13-7 shows how the styles of paragraphs and runs look in restyled.docx.

Figure 13-7. The restyled.docx file

You can find more complete documentation on python-docx's use of styles at https://python-docx.readthedocs.org/en/latest/user/styles.html.

Writing Word Documents

To create your own .docx file, call docx.Document() to return a new, blank Word Document object. The add_paragraph() document method adds a new paragraph of text to the document and returns a reference to the Paragraph object that was added. When you're done adding text, pass a filename string to the save() document method to save the Document object to a file.

Saving the document under the name helloworld.docx creates that file in the current working directory; when opened, it looks like Figure 13-8.

Figure 13-8. The Word document created using add_paragraph()

You can add more paragraphs by calling the add_paragraph() method again with the new paragraph's text. Or, to add text to the end of an existing paragraph, you can call the paragraph's add_run() method and pass it a string.

A document built this way will look like Figure 13-9. Note that the text This text is being added to the second paragraph. was added to the Paragraph object for the second paragraph added to the document. The add_paragraph() and add_run() functions return Paragraph and Run objects, respectively, to save you the trouble of extracting them as a separate step.

Keep in mind that as of python-docx version 0.5.3, new Paragraph objects can be added only to the end of the document, and new Run objects can be added only to the end of a Paragraph object.

The save() method can be called again to save the additional changes you've made.

Figure 13-9. The document with multiple Paragraph and Run objects added

Both add_paragraph() and add_run() accept an optional second argument that is a string of the Paragraph or Run object's style. For example, passing 'Title' as the second argument to add_paragraph() along with the text Hello world! adds a paragraph with the text Hello world! in the Title style.

Adding Headings

Calling add_heading() adds a paragraph with one of the heading styles. The arguments to add_heading() are a string of the heading text and an integer from 0 to 4. The integer 0 makes the heading the Title style, which is used for the top of the document. Integers 1 to 4 are for various heading levels, with 1 being the main heading and 4 the lowest subheading. The add_heading() function returns a Paragraph object to save you the step of extracting it from the Document object as a separate step.

A headings.docx file containing one heading at each level will look like Figure 13-10.

Figure 13-10. The headings.docx document with headings 0 to 4

Adding Line and Page Breaks

To add a line break (rather than starting a whole new paragraph), you can call the add_break() method on the Run object you want to have the break appear after. If you want to add a page break instead, you need to pass the page-break value WD_BREAK.PAGE (found in the docx.enum.text module in current python-docx releases) as a lone argument to add_break().

If you add a paragraph with the text This is on the first page!, insert a page break after its first run, and then add a paragraph with the text This is on the second page!, the result is a two-page Word document with This is on the first page! on the first page and This is on the second page! on the second. Even though there is still plenty of space on the first page after the text This is on the first page!, the page break inserted after the first run of the first paragraph forces the next paragraph to begin on a new page.

Adding Pictures

Document objects have an add_picture() method that will let you add an image to the end of the document. Say you have a file zophie.png in the current working directory. You can add zophie.png to the end of your document with a width of 1 inch and height of 4 centimeters (Word can use both imperial and metric units) by calling add_picture() with the filename and the desired width and height.

The first argument is a string of the image's filename. The optional width and height keyword arguments will set the width and height of the image in the document. If left out, the width and height will default to the normal size of the image.

You'll probably prefer to specify an image's height and width in familiar units such as inches and centimeters, so you can use the docx.shared.Inches() and docx.shared.Cm() functions when you're specifying the width and height keyword arguments.

Bản tóm tắt

Thông tin văn bản không chỉ dành cho các tệp Plaintext; Trên thực tế, nó rất có khả năng bạn đối phó với các tệp PDF và tài liệu từ thường xuyên hơn nhiều. Bạn có thể sử dụng mô -đun

   >>> import PyPDF2
   >>> pdfFileObj = open('meetingminutes.pdf', 'rb')
   >>> pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
❶ >>> pdfReader.numPages
   19
❷ >>> pageObj = pdfReader.getPage(0)
❸ >>> pageObj.extractText()
   'OOFFFFIICCIIAALL BBOOAARRDD MMIINNUUTTEESS Meeting of March 7, 2015
   \n     The Board of Elementary and Secondary Education shall provide leadership
   and create policies for education that expand opportunities for children,
   empower families and communities, and advance Louisiana in an increasingly
   competitive global market. BOARD of ELEMENTARY and SECONDARY EDUCATION '

6 để đọc và ghi tài liệu PDF. Thật không may, đọc văn bản từ các tài liệu PDF có thể không phải lúc nào cũng dẫn đến một bản dịch hoàn hảo sang một chuỗi vì định dạng tệp PDF phức tạp và một số PDF có thể không thể đọc được chút nào. Trong những trường hợp này, bạn đã hết may mắn trừ khi các bản cập nhật trong tương lai cho PYPDF2 hỗ trợ các tính năng PDF bổ sung.

Word documents are more reliable, and you can read them with the python-docx module. You can manipulate text in Word documents via Paragraph and Run objects. These objects can also be given styles, though they must be from the default set of styles or styles already in the document. You can add new paragraphs, headings, breaks, and pictures to the document, though only to the end.

Many of the limitations that come with working with PDFs and Word documents exist because these formats are meant to be nicely displayed for human readers, rather than easy to parse by software. The next chapter takes a look at two other common formats for storing information: JSON and CSV files. These formats are designed to be used by computers, and you'll see that Python can work with them much more easily.

Practice Questions

Q:	1. A string value of the PDF filename is not passed to the PyPDF2.PdfFileReader() function. What do you pass to the function instead?
Q:	2. What modes do the File objects for PdfFileReader() and PdfFileWriter() need to be opened in?
Q:	3. How do you acquire a Page object for a given page from a PdfFileReader object?
Q:	4. What PdfFileReader variable stores the number of pages in the PDF document?
Q:	5.
Q:	6. What methods do you use to rotate a page?
Q:	7. What method returns a Document object for a file named demo.docx?
Q:	8. What is the difference between a Paragraph object and a Run object?
Q:	9. How do you obtain a list of Paragraph objects for a Document object that's stored in a variable named doc?
Q:	10. What type of object has bold, underline, italic, strike, and outline variables?
Q:	11. What is the difference between setting the bold variable to True, False, or None?
Q:	12. How do you create a Document object for a new Word document?
Q:	13. How do you add a paragraph with the text 'Hello there!' to a Document object stored in a variable named doc?
Q:	14. What integers represent the levels of headings available in Word documents?

Practice Projects

For practice, write programs that do the following.

PDF Paranoia

Using the os.walk() function from Chapter 9, write a script that will go through every PDF in a folder (and its subfolders) and encrypt the PDFs using a password provided on the command line. Save each encrypted PDF with an _encrypted.pdf suffix added to the original filename. Before deleting the original file, have the program attempt to read and decrypt the file to ensure that it was encrypted correctly.

Then, write a program that finds all encrypted PDFs in a folder (and its subfolders) and creates a decrypted copy of each PDF using a provided password. If the password is incorrect, the program should print a message to the user and continue to the next PDF.
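One possible starting point for the first script is sketched below. It is not a full solution: the helper names (find_pdfs, encrypted_name, encrypt_pdf) are made up for illustration, and encrypt_pdf assumes the older PyPDF2 interface used throughout this chapter (PdfFileReader, PdfFileWriter, addPage), which PyPDF2 3.0 and later no longer provide.

```python
import os


def find_pdfs(root):
    """Yield the path of every .pdf file under root, including subfolders."""
    for folder, _subfolders, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(".pdf"):
                yield os.path.join(folder, filename)


def encrypted_name(path):
    """Return the output path: original name with an _encrypted.pdf suffix."""
    base, _extension = os.path.splitext(path)
    return base + "_encrypted.pdf"


def encrypt_pdf(path, password):
    """Write an encrypted copy of the PDF at path and return the new path."""
    # Imported here so the path helpers above work without PyPDF2 installed.
    # Assumption: a PyPDF2 release that still has the camelCase API.
    import PyPDF2

    target = encrypted_name(path)
    with open(path, "rb") as source:
        reader = PyPDF2.PdfFileReader(source)
        writer = PyPDF2.PdfFileWriter()
        for page_num in range(reader.numPages):
            writer.addPage(reader.getPage(page_num))
        writer.encrypt(password)
        with open(target, "wb") as result:
            writer.write(result)
    return target
```

A driver script could take the password from sys.argv, call encrypt_pdf() for each path from find_pdfs(), and delete the original only after reopening the encrypted copy and decrypting it successfully.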

Custom Invitations as Word Documents

Say you have a text file of guest names. This guests.txt file has one name per line.

Write a program that generates a Word document with custom invitations that look like Figure 13-11.

Since python-docx can use only those styles that already exist in the Word document, you will have to first add these styles to a blank Word file and then open that file with python-docx. There should be one invitation per page in the resulting Word document, so call add_break() to add a page break after the last paragraph of each invitation. This way, you will need to open only one Word document to print all of the invitations at once.

Figure 13-11. The Word document generated by your custom invite script

You can download a sample guests.txt file from http://nostarch.com/automatestuff/.
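The outline of such a program might look like the sketch below. The invitation wording, the function names, and the name_style parameter are all hypothetical; the style you pass must already exist in the template file you open, as described above.

```python
def read_guests(lines):
    """Return the non-empty guest names, one per input line."""
    return [line.strip() for line in lines if line.strip()]


def build_invitations(guests, template_path, output_path, name_style="Normal"):
    """Write one invitation per page into a copy of the template document."""
    # Imported here so read_guests() works without python-docx installed.
    from docx import Document
    from docx.enum.text import WD_BREAK

    document = Document(template_path)
    for index, guest in enumerate(guests):
        # Placeholder wording; replace with the text of your invitation.
        document.add_paragraph("It would be a pleasure to have the company of")
        document.add_paragraph(guest, style=name_style)
        last = document.add_paragraph("at our gathering.")
        # Page break after every invitation except the final one.
        if index < len(guests) - 1:
            last.add_run().add_break(WD_BREAK.PAGE)
    document.save(output_path)
```

With a guests.txt file in the working directory, usage could be as simple as reading the names with `read_guests(open('guests.txt'))` and passing the result to `build_invitations()` along with the template and output filenames.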

Brute-Force PDF Password Breaker

Say you have an encrypted PDF that you have forgotten the password to, but you remember it was a single English word. Trying to guess your forgotten password is quite a boring task. Instead, you can write a program that will decrypt the PDF by trying every possible English word until it finds one that works. This is called a brute-force password attack. Download the text file dictionary.txt from http://nostarch.com/automatestuff/. This dictionary file contains over 44,000 English words with one word per line.

Using the file-reading skills you learned in Chapter 8, create a list of word strings by reading this file. Then loop over each word in this list, passing it to the decrypt() method. If this method returns the integer 0, the password was wrong and your program should continue to the next password. If decrypt() returns 1, then your program should break out of the loop and print the hacked password. You should try both the uppercase and lowercase form of each word.
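The core loop can be sketched independently of any PDF library. In the sketch below, find_password is a made-up helper name, and decrypt stands for any callable that behaves like the decrypt() method described above, returning 0 for a wrong password and a nonzero integer on success.

```python
def find_password(decrypt, words):
    """Return the first candidate accepted by decrypt(), or None if none match."""
    for word in words:
        # Try both the lowercase and the uppercase form of each word.
        for candidate in (word.lower(), word.upper()):
            if decrypt(candidate) != 0:  # 0 means the password was wrong
                return candidate
    return None


# Possible usage, assuming the older PyPDF2 API used in this chapter:
#   reader = PyPDF2.PdfFileReader(open('encrypted.pdf', 'rb'))
#   words = open('dictionary.txt').read().split()
#   print(find_password(reader.decrypt, words))
```

Passing the method in as an argument keeps the search logic easy to try out with a stand-in function before pointing it at a real file.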

Có python

Win32com → Working with MS Word .doc files. Despite its ease of use, the python-docx module cannot read the aging .doc format, and believe it or not, .doc is still the go-to word processing format for many stakeholders (even though .docx has been around for over a decade).

Can Python read Word files?

The docx2txt package. This is a Python package that lets you scrape text and images from Word documents. Once docx2txt is imported, a single line of code is enough to read in the text from a Word document, for instance one containing the Zen of Python.
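For comparison, here is a dependency-free sketch that uses only the standard library instead of docx2txt. It relies on the fact that a .docx file is a ZIP archive whose main text lives in word/document.xml; the function name docx_paragraphs is made up, and the sketch ignores headers, footers, footnotes, and images.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main WordprocessingML elements (w:p, w:r, w:t).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the text of each paragraph in a .docx path or binary stream."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more w:t elements.
        text = "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

Calling `"\n".join(docx_paragraphs("zen.docx"))` on a hypothetical zen.docx would give the body text as a single string.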

How do I open a Word document in Python?

Practical Data Science using Python: to read a Word document, we use the module named docx. First install it (the package is published on PyPI as python-docx and imported as docx), then write a program that uses the functions in the docx module to read the entire file paragraph by paragraph.
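A minimal sketch of reading a file paragraph by paragraph with python-docx follows. The function names get_text and make_sample are made up for illustration, and the sample text is arbitrary; Document(), add_paragraph(), save(), and the paragraphs attribute are the python-docx calls being demonstrated.

```python
from io import BytesIO


def get_text(source):
    """Return the text of a .docx file, one paragraph per line."""
    # Imported here so the sketch loads even where python-docx
    # (pip install python-docx) is not installed.
    from docx import Document

    document = Document(source)  # a path or a binary file-like object
    return "\n".join(paragraph.text for paragraph in document.paragraphs)


def make_sample():
    """Build a two-paragraph document in memory to try get_text() on."""
    from docx import Document

    document = Document()
    document.add_paragraph("Beautiful is better than ugly.")
    document.add_paragraph("Explicit is better than implicit.")
    stream = BytesIO()
    document.save(stream)
    stream.seek(0)  # rewind so the stream can be read back
    return stream
```

For a file on disk, `get_text('demo.docx')` is all that is needed; the in-memory sample only avoids depending on a particular file being present.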

What is a DOCX document in Python?

python-docx is a Python library for creating and updating Microsoft Word (.docx) files.
