How to read a text file and split it in Python

Given this file:

$ cat words.txt
line1 word1 word2
line2 word3 word4
line3 word5 word6

If you just want one word at a time (ignoring the meaning of spaces vs line breaks in the file):

with open('words.txt','r') as f:
    for line in f:
        for word in line.split():
            print(word)

Prints:

line1
word1
word2
line2
...
word6 

Similarly, if you want to flatten the file into a single list of words, you might do something like this:

with open('words.txt') as f:
    flat_list=[word for line in f for word in line.split()]

>>> flat_list
['line1', 'word1', 'word2', 'line2', 'word3', 'word4', 'line3', 'word5', 'word6']

This can produce the same output as the first example with print('\n'.join(flat_list)).

Or, if you want a nested list of the words in each line of the file (for example, to create a matrix of rows and columns from a file):

with open('words.txt') as f:
    matrix=[line.split() for line in f]

>>> matrix
[['line1', 'word1', 'word2'], ['line2', 'word3', 'word4'], ['line3', 'word5', 'word6']]
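
Once the data is in this nested form, rows and columns can be picked out by index. The following is a minimal sketch that hard-codes the matrix shown above so it runs without the file; the names columns and cell are illustrative only.

```python
# The nested list produced from words.txt above, written out literally.
matrix = [['line1', 'word1', 'word2'],
          ['line2', 'word3', 'word4'],
          ['line3', 'word5', 'word6']]

# A single cell: row index first, then column index.
cell = matrix[1][2]

# zip(*matrix) regroups the rows into columns (assumes equal-length rows).
columns = [list(col) for col in zip(*matrix)]

print(cell)        # word4
print(columns[0])  # ['line1', 'line2', 'line3']
```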

If you want a regex solution, which would allow you to filter wordN vs lineN type words in the example file:

import re
with open("words.txt") as f:
    for line in f:
        for word in re.findall(r'\bword\d+', line):
            # wordN by wordN with no lineN
            print(word)

Or, if you want that to be a line by line generator with a regex:

with open("words.txt") as f:
    words = (word for line in f for word in re.findall(r'\w+', line))
    for word in words:  # consume the generator while the file is still open
        print(word)

Python is one of the most popular programming languages in the world. One reason for its popularity is that Python makes it easy to work with data.

Reading data from a text file is a routine task in Python. In this post, we’re going to look at the fastest way to read and split a text file using Python. Splitting the data will convert the text to a list, making it easier to work with.

We’ll also cover some other methods for splitting text files in Python, and explain how and when these methods are useful.

In the following examples, we’ll see how Python can help us master reading text data. Taking advantage of Python’s many built-in functions will simplify our tasks.

Introducing the split() method

The most direct way to split text in Python is with the string split() method. This is a built-in method that separates a string into its individual parts.

The split() method will return a list of the elements in a string. By default, Python uses whitespace to split the string, but you can provide a delimiter and specify what character(s) to use instead.
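
A quick sketch of the two modes, using made-up sample strings (the variable names are illustrative only):

```python
# Sample strings invented for illustration.
text = "  alpha   beta\tgamma\n"
csv_like = "a,b,,c"

# With no argument, split() treats any run of whitespace as one separator
# and ignores leading and trailing whitespace.
words = text.split()

# With an explicit separator, every occurrence splits, so empty fields survive.
fields = csv_like.split(",")

print(words)   # ['alpha', 'beta', 'gamma']
print(fields)  # ['a', 'b', '', 'c']
```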

For example, a comma (,) is often used to separate string data. This is the case with Comma Separated Value (CSV) files. Python splits the string on whatever separator you choose.
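
One caveat worth knowing: split(',') has no notion of quoting, so a field that itself contains a comma gets broken apart. The sketch below contrasts it with the standard library's csv module on a hypothetical line of data; io.StringIO stands in for an open file so the example runs on its own.

```python
import csv
import io

# Hypothetical CSV content; the second field contains a comma inside quotes.
sample = 'Janet,"Smith, Jr.",100\n'

# Plain split(',') is unaware of quoting and breaks the quoted field apart.
naive = sample.strip().split(",")

# csv.reader applies CSV quoting rules and keeps the field whole.
parsed = next(csv.reader(io.StringIO(sample)))

print(naive)   # ['Janet', '"Smith', ' Jr."', '100']
print(parsed)  # ['Janet', 'Smith, Jr.', '100']
```

For simple files without quoted fields, such as the ones in this post, split(',') is enough.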

Splitting text file with the split() method

In our first example, we have a text file of employee data, including the names of employees, their phone numbers, and occupations.

We’ll need to write a Python program that can read this randomly generated information and split the data into lists.

employee_data.txt
Lana Anderson 485-3094-88 Electrician
Elian Johnston 751-5845-87 Interior Designer
Henry Johnston 777-6561-52 Astronomer
Dale Johnston 248-1843-09 Journalist
Luke Owens 341-7471-63 Teacher
Amy Perry 494-3532-17 Electrician
Chloe Baker 588-7165-01 Interior Designer

After using a Python with statement to open the data file, we can iterate through the file’s contents with a for loop. Once the data is read, the split() method is used to separate the text into words.

In our case, the text is separated using whitespace, which is the default behavior of the split() method.

Example 1: Splitting employee data with Python

with open("employee_data.txt",'r') as data_file:
    for line in data_file:
        data = line.split()
        print(data)

Output

['Lana', 'Anderson', '485-3094-88', 'Electrician']
['Elian', 'Johnston', '751-5845-87', 'Interior', 'Designer']
['Henry', 'Johnston', '777-6561-52', 'Astronomer']
['Dale', 'Johnston', '248-1843-09', 'Journalist']
['Luke', 'Owens', '341-7471-63', 'Teacher']
['Amy', 'Perry', '494-3532-17', 'Electrician']
['Chloe', 'Baker', '588-7165-01', 'Interior', 'Designer']
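
Notice that the two-word occupation "Interior Designer" ends up as two separate items. One possible way to keep it together is the maxsplit argument of split(). This sketch assumes every record has exactly a first name, a last name, and a phone number before the occupation, which may not hold for real data.

```python
# One line copied from employee_data.txt above.
line = "Elian Johnston 751-5845-87 Interior Designer\n"

# Split at most 3 times: first name, last name, phone, then the remainder.
# strip() first, because the unsplit remainder keeps its trailing newline.
first, last, phone, occupation = line.strip().split(maxsplit=3)

print(occupation)  # Interior Designer
```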

Splitting strings with a comma

We provide an optional separator to the split() method to specify which character to split the string with. The default delimiter is whitespace.

In the next example, we’ll use a comma to split test score data read from a file.

grades.txt
Janet,100,50,69
Thomas,99,76,100
Kate,102,78,65

Example 2: Splitting grades with a comma

with open("grades.txt",'r') as file:
    for line in file:
        grade_data = line.strip().split(',')
        print(grade_data)

The strip() method is used here to remove the newline character (\n) from the end of each line. It also removes any other leading or trailing whitespace.

Output

['Janet', '100', '50', '69']
['Thomas', '99', '76', '100']
['Kate', '102', '78', '65']
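
Every item in these lists is still a string, including the scores. A minimal sketch of one way to convert them to numbers follows; the lines are hard-coded so it runs without grades.txt, and the names scores and averages are illustrative.

```python
# Lines as they appear in grades.txt above, held in a list so the
# sketch runs without the file.
raw_lines = ["Janet,100,50,69\n", "Thomas,99,76,100\n", "Kate,102,78,65\n"]

scores = {}
for line in raw_lines:
    # The first field is the name; the rest are grades.
    name, *grades = line.strip().split(",")
    scores[name] = [int(g) for g in grades]

# Average score per student.
averages = {name: sum(g) / len(g) for name, g in scores.items()}

print(scores["Janet"])    # [100, 50, 69]
print(averages["Janet"])  # 73.0
```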

Splitting a text file with splitlines()

The splitlines() method splits a string at line boundaries and returns a list of the lines. Applied to the full contents of a text file, it gives us a list of the lines in that file. For the next examples, we’ll pretend we run a website that’s dedicated to a theatre company. We’re reading script data from text files and pushing it to the company’s website.

juliet.txt
O Romeo, Romeo, wherefore art thou Romeo?
Deny thy father and refuse thy name.
Or if thou wilt not, be but sworn my love
And I’ll no longer be a Capulet.

We can read the file and split the lines into a list with the splitlines() method. Afterwards, a for loop can be used to print the contents of the text data.

Example 3: Using splitlines() to read a text file

with open("juliet.txt",'r') as script:
    speech = script.read().splitlines()

for line in speech:
    print(line)
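
To see why splitlines() is a convenient choice here, the sketch below compares it with split("\n") on a short made-up string that ends with a newline, as most text files do.

```python
# A made-up two-line string ending in a newline.
text = "first line\nsecond line\n"

# splitlines() drops the line endings and adds no trailing empty item.
clean = text.splitlines()

# split("\n") leaves an empty string after the final newline.
naive = text.split("\n")

# keepends=True preserves the line endings if they are needed.
with_endings = text.splitlines(keepends=True)

print(clean)         # ['first line', 'second line']
print(naive)         # ['first line', 'second line', '']
print(with_endings)  # ['first line\n', 'second line\n']
```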

Splitting a text file with a generator

In Python, a generator is a special kind of function that produces a sequence of values. It is similar to a function that returns a list, but it produces its values one at a time, on demand, instead of building the whole list in memory.

Generators use the yield keyword. When Python encounters a yield statement, it hands back a value, pauses the function, and saves its state until the next value is requested.
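
The pause-and-resume behavior is easiest to see on a tiny generator that has nothing to do with files. This is an illustrative sketch; count_up is a made-up name.

```python
def count_up(limit):
    """Yield 1, 2, ..., limit, pausing after each value."""
    n = 1
    while n <= limit:
        yield n  # execution pauses here until the next value is requested
        n += 1

gen = count_up(3)
first = next(gen)   # runs the function body up to the first yield
second = next(gen)  # resumes after the yield, with n remembered
rest = list(gen)    # collects whatever values remain

print(first, second, rest)  # 1 2 [3]
```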

In the next example, we’ll use a generator to read the beginning of Romeo’s famous speech from Shakespeare’s Romeo and Juliet. Because of the yield keyword, the state of our while loop is saved between iterations, so only one line is held in memory at a time. This can be useful when working with large files.

romeo.txt
But soft, what light through yonder window breaks?
It is the east, and Juliet is the sun.
Arise, fair sun, and kill the envious moon,
Who is already sick and pale with grief
That thou, her maid, art far more fair than she.

Example 4: Splitting a text file with a generator

def generator_read(file_name):
    # The with statement closes the file even if the caller stops early.
    with open(file_name,'r') as file:
        while True:
            line = file.readline()
            if not line:
                break
            yield line

file_data = generator_read("romeo.txt")
for line in file_data:
    print(line.split())
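
File objects in Python are already lazy iterators over their lines, so a generator can also be written with a plain for loop. The sketch below is one such variant, not the approach used in Example 4. The name split_lines is made up, and io.StringIO stands in for an open file so the code runs on its own.

```python
import io

def split_lines(lines):
    """Yield the list of words for each line, one line at a time."""
    for line in lines:
        yield line.split()

# io.StringIO behaves like an open text file for iteration purposes.
fake_file = io.StringIO("But soft, what light\nIt is the east\n")
result = list(split_lines(fake_file))

print(result)  # [['But', 'soft,', 'what', 'light'], ['It', 'is', 'the', 'east']]
```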

Reading File Data with List Comprehension

Python list comprehension provides an elegant solution for working with lists. We can take advantage of shorter syntax to write our code with list comprehension. In addition, list comprehension statements are usually easier to read.

In our previous examples, we’ve had to use a for loop to read the text files. We can exchange our for loop for a single line of code using list comprehension.

List Comprehension Syntax:
my_list = [expression for element in iterable]
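
As a small illustration of this syntax, the sketch below builds the same list twice, once with a for loop and once with a list comprehension, and then adds an optional if clause as a filter. The sample data is made up.

```python
# Made-up lines, including one that is blank apart from its newline.
lines = ["one two\n", "\n", "three\n"]

# The for-loop version.
stripped_loop = []
for line in lines:
    stripped_loop.append(line.strip())

# The equivalent list comprehension.
stripped_comp = [line.strip() for line in lines]

# An optional if clause filters items; here it skips blank lines.
non_empty = [line.strip() for line in lines if line.strip()]

print(stripped_comp)  # ['one two', '', 'three']
print(non_empty)      # ['one two', 'three']
```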

Once the lines have been collected via list comprehension, we use the split() method to separate each line into a list of words.

Using the same romeo.txt file from the previous example, let’s see how list comprehension can provide a more elegant approach to splitting a text file in Python.

Example 5: Using list comprehension to read file data

with open("romeo.txt",'r') as file:
    lines = [line.strip() for line in file]

for line in lines:
    print(line.split())

Split a Text File into Multiple Smaller Files

What if we have a large file that we’d like to split into smaller files? We can split a large file in Python using for loops and slicing.

With list slicing, we tell Python we want to work with a specific range of elements from a given list. This is done by providing a start point and end point for the slice.

In Python, a list can be sliced using a colon. In the following example, we’ll use list slicing to split a text file into multiple smaller files.
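
As a small illustration of the slice syntax itself, here it is on a made-up five-item list. Omitting the start means "from the beginning", and omitting the end means "to the end".

```python
# A made-up list standing in for the lines of a file.
lines = ["a", "b", "c", "d", "e"]

middle = len(lines) // 2      # floor division gives an integer index: 2

first_half = lines[:middle]   # items 0 up to, but not including, middle
second_half = lines[middle:]  # items from middle to the end

print(first_half)   # ['a', 'b']
print(second_half)  # ['c', 'd', 'e']
```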

Split a File with List Slicing

A list can be split using Python list slicing. To do so, we first read the file using the readlines() method. Next, a for loop writes the first half of the lines to a new file called romeo_A.txt, using a list slice to select them.

Using a second for loop, we’ll write the rest of the text to another file, romeo_B.txt. To compute the slice boundary, we need the len() function to find the total number of lines in the original file.

Lastly, the int() function converts the result of the division to an integer, since slice indices must be integers. Floor division, len(lines) // 2, would give the same result.

Example 6: Splitting a single text file into multiple text files

with open("romeo.txt",'r') as file:
    lines = file.readlines()

with open("romeo_A.txt",'w') as file:
    for line in lines[:int(len(lines)/2)]:
        file.write(line)

with open("romeo_B.txt",'w') as file:
    for line in lines[int(len(lines)/2):]:
        file.write(line)

Running this program in the same directory as romeo.txt will create the following text files.

romeo_A.txt
But soft, what light through yonder window breaks?
It is the east, and Juliet is the sun.

romeo_B.txt
Arise, fair sun, and kill the envious moon,
Who is already sick and pale with grief
That thou, her maid, art far more fair than she.
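
The same idea extends from two halves to any number of pieces. The following is a rough sketch, not part of the example above: the function name split_file, its lines_per_file parameter, and the _partN file naming are all made up for illustration. It runs against a throwaway file in a temporary directory so that nothing is left behind.

```python
import os
import tempfile

def split_file(path, lines_per_file):
    """Write consecutive chunks of `path` to numbered part files; return their paths."""
    with open(path, "r") as source:
        lines = source.readlines()

    base, ext = os.path.splitext(path)
    part_paths = []
    # Step through the list in strides of lines_per_file, slicing out each chunk.
    for start in range(0, len(lines), lines_per_file):
        part_path = f"{base}_part{start // lines_per_file + 1}{ext}"
        with open(part_path, "w") as part:
            part.writelines(lines[start:start + lines_per_file])
        part_paths.append(part_path)
    return part_paths

# Demonstrate on a throwaway five-line file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w") as f:
        f.write("1\n2\n3\n4\n5\n")

    parts = split_file(sample, 2)
    part_names = [os.path.basename(p) for p in parts]
    with open(parts[-1]) as f:
        last_part = f.read()

print(part_names)  # ['sample_part1.txt', 'sample_part2.txt', 'sample_part3.txt']
```

Like Example 6, this reads the whole file into memory with readlines(), so it suits files that fit comfortably in memory.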

We’ve seen how to use the split() method to split a text file. Additionally, our examples have shown how split() is used in tandem with Python generators and list comprehension to read large files more elegantly.

Taking advantage of Python’s many built-in methods, such as split() and readlines(), allows us to process text files more quickly. Using these tools will save us time and effort.

If you’re serious about mastering Python, it’s a good idea to invest some time in learning how to use these methods to prepare your own solutions.

How do you split data in a text file in Python?

We can use a for loop to iterate through the contents of the data file after opening it with Python's 'with' statement. After reading the data, the split() method is used to split the text into words. The split() method by default separates text using whitespace.

How do I read a delimited text file in Python?

To read a newline-delimited text file in Python:

a_file = open("sample.txt")
file_contents = a_file.read()
contents_split = file_contents.splitlines()
print(contents_split)
a_file.close()

How do you separate text in Python?

The split() method splits a string into a list. You can specify the separator; the default separator is any whitespace. Note: when maxsplit is specified, the list will contain at most maxsplit + 1 elements.
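
A brief sketch of maxsplit in action, using a record in the style of the grades data from earlier. rsplit() is the companion method that counts splits from the right-hand side.

```python
record = "Kate,102,78,65"

# Split once from the left: the name, then everything else.
name, rest = record.split(",", 1)

# Split once from the right: everything else, then the last score.
head, last_score = record.rsplit(",", 1)

# With fewer separators than maxsplit, the list is simply shorter.
short = "solo".split(",", 3)

print(name, rest)        # Kate 102,78,65
print(head, last_score)  # Kate,102,78 65
print(short)             # ['solo']
```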

How do I split a text file?

How to split a TXT document online:
Select and upload your TXT document for splitting.
Specify the desired page numbers and click the Split Now button.
Once your TXT document is split, click the Download Now button.
Use the Email button to send the download link by email.
