How to read a text file and split it in Python
Given this file:
If you just want one word at a time (ignoring the meaning of spaces vs line breaks in the file):
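A minimal sketch of the word-at-a-time approach. The file name `sample.txt` and its contents are stand-ins (the original file is not shown), written to a temporary directory so the snippet runs on its own.

```python
import os
import tempfile

# Hypothetical sample file; the file from the original post is not shown.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("one two three\nfour five\n")

words = []
with open(path) as f:
    for line in f:
        # split() with no arguments splits on any run of whitespace,
        # so spaces and line breaks are treated alike.
        for word in line.split():
            words.append(word)
            print(word)
```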
Similarly, if you want to flatten the file into a single flat list of words in the file, you might do something like this:
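One way this might look, using a nested list comprehension. The name `flat_list` and the sample file are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical sample file standing in for the one in the post.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("one two three\nfour five\n")

with open(path) as f:
    # Outer loop walks the lines, inner loop walks the words on each line.
    flat_list = [word for line in f for word in line.split()]

print(flat_list)  # ['one', 'two', 'three', 'four', 'five']
```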
Which can create the same output as the first example with
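A sketch, assuming the first example printed one word per line. Here `flat_list` is built from an in-memory string (hypothetical contents) rather than a file, to keep the snippet short.

```python
# Stand-in for text read from a file (hypothetical contents).
sample_text = "one two three\nfour five\n"
flat_list = [word for line in sample_text.splitlines() for word in line.split()]

# Looping over the flat list prints one word per line.
for word in flat_list:
    print(word)
```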
Or, if you want a nested list of the words in each line of the file (for example, to create a matrix of rows and columns from a file):
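A possible sketch of the nested-list version. The file `matrix.txt` and its contents are invented for the example. Note that the values stay as strings unless they are converted explicitly.

```python
import os
import tempfile

# Hypothetical file with one matrix row per line.
path = os.path.join(tempfile.mkdtemp(), "matrix.txt")
with open(path, "w") as f:
    f.write("1 2 3\n4 5 6\n")

with open(path) as f:
    # One inner list per line of the file.
    rows = [line.split() for line in f]

print(rows)  # [['1', '2', '3'], ['4', '5', '6']]
```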
If you want a regex solution, which would allow you to filter
Or, if you want that to be a line by line generator with a regex:
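A minimal sketch of a line-by-line generator built on a regex. The function name `words_by_line`, the pattern `\w+` (which drops punctuation), and the sample file are all assumptions; the original pattern is not shown.

```python
import os
import re
import tempfile

def words_by_line(path):
    """Yield a list of words for each line, one line at a time."""
    with open(path) as f:
        for line in f:
            # \w+ matches runs of letters, digits, and underscores,
            # so punctuation such as commas is filtered out.
            yield re.findall(r"\w+", line)

# Hypothetical sample file.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("Hello, world!\nsplit these words.\n")

for words in words_by_line(path):
    print(words)
```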
Python is one of the most popular programming languages in the world. One reason for its popularity is that Python makes it easy to work with data.
Reading data from a text file is a routine task in Python. In this post, we’re going to look at the fastest way to read and split a text file using Python. Splitting the data will convert the text to a list, making it easier to work with.
We’ll also cover some other methods for splitting text files in Python, and explain how and when these methods are useful.
In the following examples, we’ll see how Python can help us master reading text data. Taking advantage of Python’s many built-in functions will simplify our tasks.
Introducing the split() method
The fastest way to split text in Python is with the split() method. This is a built-in method that is useful for separating a string into its individual parts.
The split() method will return a list of the elements in a string. By default, Python uses whitespace to split the string, but you can provide a delimiter and specify what character(s) to use instead.
For example, a comma (,) is often used to separate string data, as in Comma-Separated Values (CSV) files. Python splits the string on whatever separator you choose.
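A short illustration of both behaviors. The strings are made up for the example.

```python
# With no argument, split() breaks on any run of whitespace.
text = "alpha beta   gamma"
default_parts = text.split()
print(default_parts)  # ['alpha', 'beta', 'gamma']

# With a separator, split() breaks on that exact string instead.
csv_row = "Jane,555-0100,Engineer"
comma_parts = csv_row.split(",")
print(comma_parts)  # ['Jane', '555-0100', 'Engineer']
```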
Splitting text file with the split() method
In our first example, we have a text file of employee data, including the names of employees, their phone numbers, and occupations.
We’ll need to write a Python program that can read this randomly generated information and split the data into lists.
After using a Python with statement to open the data file, we can iterate through the file’s contents with a for loop. Once the data is read, the split() method is used to separate the text into words.
In our case, the text is separated using whitespace, which is the default behavior of the split() method.
Example 1: Splitting employee data with Python
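The original listing is not available here, so the following is only a sketch of the steps described above. The file name `employees.txt` and the records in it are invented for illustration.

```python
import os
import tempfile

# Hypothetical employee data: name, phone number, occupation.
path = os.path.join(tempfile.mkdtemp(), "employees.txt")
with open(path, "w") as f:
    f.write("Ana 555-0101 Engineer\nBen 555-0102 Designer\n")

employees = []
# The with statement closes the file automatically when the block ends.
with open(path) as f:
    for line in f:
        # Default split(): separate each line on whitespace.
        employees.append(line.split())

print(employees)
# [['Ana', '555-0101', 'Engineer'], ['Ben', '555-0102', 'Designer']]
```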
Splitting strings with a comma
We can pass an optional separator to the split() method to specify which character(s) to split the string on. If no separator is given, the string is split on whitespace.
In the next example, we’ll use a comma to split test score data read from a file.
Example 2: Splitting grades with a comma
The strip() method is used here to remove the newline character (\n) from the end of the lines.
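As a sketch of how strip() and split(",") might be combined, assuming a hypothetical `grades.txt` with invented scores:

```python
import os
import tempfile

# Hypothetical test score data, one comma-separated record per line.
path = os.path.join(tempfile.mkdtemp(), "grades.txt")
with open(path, "w") as f:
    f.write("Emily,Math,95\nJamal,Science,88\n")

grades = []
with open(path) as f:
    for line in f:
        # strip() removes the trailing "\n"; without it the last
        # field of the first record would be "95\n" instead of "95".
        grades.append(line.strip().split(","))

print(grades)  # [['Emily', 'Math', '95'], ['Jamal', 'Science', '88']]
```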
Splitting a text file with splitlines()
The splitlines() method is used to get a list of the lines in a text file. For the next examples, we’ll pretend we run a website that’s dedicated to a theatre company. We’re reading script data from text files and pushing it to the company’s website.
We can read the file and split the lines into a list with the splitlines() method. Afterwards, a for loop can be used to print the contents of the text data.
Example 3: Using splitlines() to read a text file
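A minimal sketch of this approach. The file name `script.txt` and its lines are placeholders, not the original script data.

```python
import os
import tempfile

# Hypothetical script file.
path = os.path.join(tempfile.mkdtemp(), "script.txt")
with open(path, "w") as f:
    f.write("ACT I\nSCENE 1\nEnter two actors.\n")

with open(path) as f:
    # read() returns the whole file as one string; splitlines()
    # breaks it into lines and drops the newline characters.
    lines = f.read().splitlines()

for line in lines:
    print(line)
```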
In Python, a generator is a special kind of function that produces a sequence of values lazily. It is similar to a function that returns a list, but it hands back one element at a time instead of building the whole list in memory.
Generators use the yield keyword. When Python encounters a yield statement, it returns the value and saves the state of the function until the next value is requested, at which point execution resumes where it left off.
In the next example, we’ll use a generator to read the beginning of Romeo’s famous speech from Shakespeare’s Romeo and Juliet. Using the yield keyword ensures that the state of our while loop is saved during each iteration. This can be useful when working with large files.
Example 4: Splitting a text file with a generator
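The original listing is missing, so this is only a sketch of a generator with a while loop and yield. The function name `read_words` is an assumption, and the two lines written to `romeo.txt` stand in for the actual file contents.

```python
import os
import tempfile

def read_words(path):
    """Yield the words of each line, reading one line at a time."""
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:  # an empty string means end of file
                break
            # Execution pauses here until the next value is requested.
            yield line.split()

# Stand-in for romeo.txt.
path = os.path.join(tempfile.mkdtemp(), "romeo.txt")
with open(path, "w") as f:
    f.write("But soft, what light through yonder window breaks?\n")
    f.write("It is the east, and Juliet is the sun.\n")

for words in read_words(path):
    print(words)
```

Because only one line is held in memory at a time, the same function works unchanged on much larger files.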
Reading File Data with List Comprehension
Python list comprehension provides an elegant solution for working with lists. We can take advantage of shorter syntax to write our code with list comprehension. In addition, list comprehension statements are usually easier to read.
In our previous examples, we’ve had to use a for loop to read the text files. We can exchange our for loop for a single line of code using list comprehension.
List Comprehension Syntax:
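The general shape is `[expression for item in iterable if condition]`, where the `if` part is optional. A small illustration with made-up numbers:

```python
numbers = [1, 2, 3, 4, 5, 6]

# expression: n * n | item: n | iterable: numbers | condition: n is even
even_squares = [n * n for n in numbers if n % 2 == 0]

print(even_squares)  # [4, 16, 36]
```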
Once the data has been obtained via list comprehension, we use the split() method to separate the lines and add them to a new list.
Using the same romeo.txt file from the previous example, let’s see how list comprehension can provide a more elegant approach to splitting a text file in Python.
Example 5: Using list comprehension to read file data
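A sketch of the two steps described above, reading with a list comprehension and then splitting. The contents written to `romeo.txt` are a stand-in for the real file, and the variable names are illustrative.

```python
import os
import tempfile

# Stand-in for romeo.txt.
path = os.path.join(tempfile.mkdtemp(), "romeo.txt")
with open(path, "w") as f:
    f.write("But soft, what light through yonder window breaks?\n")
    f.write("It is the east, and Juliet is the sun.\n")

with open(path) as f:
    # The list comprehension replaces an explicit for loop over the file.
    data = [line for line in f]

# Split each line into words and collect the results in a new list.
split_lines = []
for line in data:
    split_lines.append(line.split())

print(split_lines)
```

The two steps could also be collapsed into a single comprehension, `[line.split() for line in f]`.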
Split a Text File into Multiple Smaller Files
What if we have a large file that we’d like to split into smaller files? We can split a large file in Python using for loops and slicing.
With list slicing, we tell Python we want to work with a specific range of elements from a given list. This is done by providing a start point and end point for the slice.
In Python, a list can be sliced using a colon. In the following example, we’ll use list slicing to split a text file into multiple smaller files.
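A quick illustration of the slice syntax on a made-up list. The start index is included and the end index is excluded.

```python
lines = ["line 1", "line 2", "line 3", "line 4", "line 5"]

first_two = lines[0:2]   # elements at index 0 and 1
from_third = lines[2:]   # omit the end point: slice to the end of the list
up_to_third = lines[:3]  # omit the start point: slice from the beginning

print(first_two)    # ['line 1', 'line 2']
print(from_third)   # ['line 3', 'line 4', 'line 5']
print(up_to_third)  # ['line 1', 'line 2', 'line 3']
```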
Split a File with List Slicing
A list can be split using Python list slicing. To do so, we first read the file using the readlines() method. Next, a for loop over a slice of that list writes the first half of the original file to a new file called romeo_A.txt.
Using a second for loop, we’ll write the rest of the text to another file. In order to perform the slice, we need the built-in len() function to find the total number of lines in the original file.
Lastly, the built-in int() function is used to convert the result of the division to an integer value.
Example 6: Splitting a single text file into multiple text files
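The original listing is not shown, so the following is a sketch of the steps described above. The name `romeo_B.txt` for the second output file is an assumption (only romeo_A.txt is named in the text), and the snippet writes a small stand-in `romeo.txt` to a temporary directory so it can run on its own.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "romeo.txt")
path_a = os.path.join(workdir, "romeo_A.txt")
path_b = os.path.join(workdir, "romeo_B.txt")  # hypothetical name

# Stand-in contents for romeo.txt.
with open(source, "w") as f:
    f.write("line 1\nline 2\nline 3\nline 4\n")

with open(source) as f:
    lines = f.readlines()

# int() turns the float result of the division into a usable index.
midpoint = int(len(lines) / 2)

# First half of the lines goes to romeo_A.txt.
with open(path_a, "w") as out:
    for line in lines[:midpoint]:
        out.write(line)

# The remaining lines go to the second file.
with open(path_b, "w") as out:
    for line in lines[midpoint:]:
        out.write(line)
```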
Running this program in the same directory as romeo.txt will create the following text files.
We’ve seen how to use the split() method to split a text file. Additionally, our examples have shown how split() is used in tandem with Python generators and list comprehension to read large files more elegantly.
Taking advantage of Python’s many built-in methods, such as split() and readlines(), allows us to process text files more quickly. Using these tools will save us time and effort.
If you’re serious about mastering Python, it’s a good idea to invest some time in learning how to use these methods to prepare your own solutions.
How do you split data in a text file in Python?
We can use a for loop to iterate through the contents of the data file after opening it with Python's 'with' statement. After reading the data, the split() method is used to split the text into words. The split() method by default separates text using whitespace.
How do I read a delimited text file in Python?
How to read a newline-delimited text file in Python:
a_file = open("sample.txt")
file_contents = a_file.read()
contents_split = file_contents.splitlines()
How do you separate text in Python?
Python String split() Method: The split() method splits a string into a list. You can specify the separator; the default separator is any whitespace. Note: When maxsplit is specified, the list will contain at most the specified number of elements plus one.
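A brief illustration of maxsplit with made-up strings. After the given number of splits, the rest of the string is left intact as the final element.

```python
# At most 2 splits on whitespace, so at most 3 elements.
two_splits = "a b c d".split(maxsplit=2)
print(two_splits)  # ['a', 'b', 'c d']

# At most 1 split on a comma.
one_split = "a,b,c,d".split(",", 1)
print(one_split)  # ['a', 'b,c,d']
```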
How do I split a text file?
How to split a TXT document online:
Select and upload your TXT document for splitting.
Specify desired page numbers and click the Split Now button.
Once your TXT document is split, click the Download Now button.
Use the Email button to send the download link over email.