Reading binary file in Python and looping over each byte
Python 3.5 added the read_bytes convenience method to the pathlib module (which itself arrived in Python 3.4). It reads a whole file in as bytes, which we can then iterate over. I consider this a decent (if quick and dirty) answer:
import pathlib
for byte in pathlib.Path(path).read_bytes():
    print(byte)
Interesting that this is the only answer to mention pathlib.
In Python 2, you probably would do this (as Vinay Sajip also suggests):
with open(path, 'rb') as file:
    for byte in file.read():
        print(byte)
In the case that the file may be too large to iterate over in memory, you would chunk it, idiomatically, using the iter function with the callable, sentinel signature - the Python 2 version:
with open(path, 'rb') as file:
    callable = lambda: file.read(1024)
    sentinel = bytes() # or b''
    for chunk in iter(callable, sentinel):
        for byte in chunk:
            print(byte)
(Several other answers mention this, but few offer a sensible read size.)
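On Python 3.8 and later, the same chunked loop can also be written with an assignment expression instead of iter. This is only a sketch: iter_bytes_chunked is a name made up for illustration, the 1024-byte default simply mirrors the read size above, and the demo runs against a throwaway temporary file.

```python
import os
import tempfile

def iter_bytes_chunked(path, chunk_size=1024):
    """Yield the file's bytes one at a time, reading chunk_size bytes per call.

    Hypothetical helper name; chunk_size=1024 mirrors the read size above.
    """
    with open(path, 'rb') as file:
        # file.read returns b'' at EOF, which is falsy and ends the loop.
        while chunk := file.read(chunk_size):
            yield from chunk

# Demo on a small temporary file of 3000 bytes, so several reads are needed.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(250)) * 12)
    demo_path = tmp.name

demo_bytes = list(iter_bytes_chunked(demo_path))
os.remove(demo_path)
print(len(demo_bytes))
```

In Python 3, iterating over a bytes chunk yields integers, so demo_bytes is a list of ints in the range 0 to 255.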
Best practice for large files or buffered/interactive reading
Let's create a function to do this, including idiomatic uses of the standard library for Python 3.5+:
from pathlib import Path
from functools import partial
from io import DEFAULT_BUFFER_SIZE

def file_byte_iterator(path):
    """given a path, return an iterator over the file
    that lazily loads the file
    """
    path = Path(path)
    with path.open('rb') as file:
        reader = partial(file.read1, DEFAULT_BUFFER_SIZE)
        file_iterator = iter(reader, bytes())
        for chunk in file_iterator:
            yield from chunk
Note that we use file.read1. file.read blocks until it gets all the bytes requested of it or EOF. file.read1 makes at most one call to the underlying raw stream, so it avoids that wait and can return more quickly. No other answer mentions this either.
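To make the difference concrete, here is a rough sketch with a deliberately slow source. TrickleRaw is a hypothetical class written only for this illustration; it imitates a pipe or socket by handing out at most 3 bytes per raw read, and the 3-byte limit and the payload are arbitrary.

```python
import io

class TrickleRaw(io.RawIOBase):
    """Hypothetical raw stream that delivers at most 3 bytes per read call,
    loosely imitating a pipe or socket that trickles data in."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, buffer):
        size = min(3, len(buffer))
        chunk = self._data[self._pos:self._pos + size]
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)  # 0 signals EOF

payload = b'abcdefghijklmnopqrstuvwxyz'

# read(10) keeps calling the raw stream until it has 10 bytes (or hits EOF).
full = io.BufferedReader(TrickleRaw(payload)).read(10)

# read1(10) makes at most one raw read, so it may come back short.
short = io.BufferedReader(TrickleRaw(payload)).read1(10)

print(len(full), len(short))
```

With a regular file on disk the two calls usually behave alike; the distinction matters mostly for interactive or slow streams.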
Demonstration of best practice usage:
Let's make a file with a megabyte (actually mebibyte) of pseudorandom data:
import random
import pathlib
path = 'pseudorandom_bytes'
pathobj = pathlib.Path(path)
pathobj.write_bytes(
    bytes(random.randint(0, 255) for _ in range(2**20)))
Now let's iterate over it and materialize it in memory:
>>> l = list(file_byte_iterator(path))
>>> len(l)
1048576
We can inspect any part of the data, for example, the last 100 and first 100 bytes:
>>> l[-100:]
[208, 5, 156, 186, 58, 107, 24, 12, 75, 15, 1, 252, 216, 183, 235, 6, 136, 50, 222, 218, 7, 65, 234, 129, 240, 195, 165, 215, 245, 201, 222, 95, 87, 71, 232, 235, 36, 224, 190, 185, 12, 40, 131, 54, 79, 93, 210, 6, 154, 184, 82, 222, 80, 141, 117, 110, 254, 82, 29, 166, 91, 42, 232, 72, 231, 235, 33, 180, 238, 29, 61, 250, 38, 86, 120, 38, 49, 141, 17, 190, 191, 107, 95, 223, 222, 162, 116, 153, 232, 85, 100, 97, 41, 61, 219, 233, 237, 55, 246, 181]
>>> l[:100]
[28, 172, 79, 126, 36, 99, 103, 191, 146, 225, 24, 48, 113, 187, 48, 185, 31, 142, 216, 187, 27, 146, 215, 61, 111, 218, 171, 4, 160, 250, 110, 51, 128, 106, 3, 10, 116, 123, 128, 31, 73, 152, 58, 49, 184, 223, 17, 176, 166, 195, 6, 35, 206, 206, 39, 231, 89, 249, 21, 112, 168, 4, 88, 169, 215, 132, 255, 168, 129, 127, 60, 252, 244, 160, 80, 155, 246, 147, 234, 227, 157, 137, 101, 84, 115, 103, 77, 44, 84, 134, 140, 77, 224, 176, 242, 254, 171, 115, 193, 29]
Don't iterate by lines for binary files
Don't do the following - this pulls a chunk of arbitrary size until it gets to a newline character - too slow when the chunks are too small, and possibly too large as well:
with open(path, 'rb') as file:
    for chunk in file: # text newline iteration - not for bytes
        yield from chunk
The above is only good for what are semantically human readable text files (like plain text, code, markup, markdown etc... essentially anything ascii, utf, latin, etc... encoded) that you should open without the 'b' flag.
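For contrast, this is roughly what line iteration looks like when it is appropriate, on a text file opened without the 'b' flag. The file contents and the utf-8 encoding are assumptions chosen for the demo.

```python
import os
import tempfile

# Write a small UTF-8 text file to a temporary location for the demo.
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.txt',
                                 delete=False) as tmp:
    tmp.write('first line\nsecond line\n')
    text_path = tmp.name

# No 'b' in the mode: the file yields str lines, decoded with the given encoding.
with open(text_path, encoding='utf-8') as file:
    text_lines = [line.rstrip('\n') for line in file]

os.remove(text_path)
print(text_lines)
```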
In this Python tutorial, we will learn how to read a binary file in Python. We will also cover these topics:
- How to read a binary file to an array in Python
- How to read a binary file into a byte array in Python
- How to read a binary file line by line in Python
- Python read a binary file to Ascii
- How to read a binary file into a NumPy array in Python
- How to read a binary file into CSV in Python
Here, we will see how to read a binary file in Python.
- Before reading a file we have to write one. In this example, I have opened a file using file = open("document.bin", "wb"); the "wb" mode opens the file for writing in binary mode.
- document.bin is the name of the file.
- I have taken a variable named sentence. To encode the text "This is good" as bytes, I have used sentence = bytearray("This is good".encode("ascii")).
- To write the sentence to the file, I have used the file.write() method.
- write() writes the given bytes to the file. Then, to close the file, I have used file.close().
Example to write the file:
file = open("document.bin", "wb")
sentence = bytearray("This is good".encode("ascii"))
file.write(sentence)
file.close()
- To read the file, I have opened the already created document.bin in "rb" mode, which reads the file as binary.
- I have used the read() method. read(n) returns at most n bytes from the file.
Example to read the file:
file = open("document.bin", "rb")
print(file.read(4))
file.close()
Here, print(file.read(4)) reads only the first four bytes of the sentence, so the output is b'This'.
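The same write and read steps can also be written with the with statement, which closes the file automatically. This is a minimal sketch; the temporary directory is an assumption added so that the example leaves no files behind.

```python
import os
import tempfile

# Work in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    bin_path = os.path.join(tmp_dir, 'document.bin')

    # The with block closes the file even if write() raises.
    with open(bin_path, 'wb') as file:
        file.write('This is good'.encode('ascii'))

    with open(bin_path, 'rb') as file:
        first_four = file.read(4)   # at most 4 bytes
        remainder = file.read()     # everything that is left

print(first_four, remainder)
```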
Python read a binary file to an array
Here, we can see how to read a binary file to an array in Python.
- In this example, I have opened a file named array.bin in "wb" mode to write the binary file.
- I have assigned a list as num = [2, 4, 6, 8, 10]. To convert it to bytes, I have used bytearray(), which returns a bytearray object (each value must be in the range 0 to 255).
- To write the array to the file, I have used file.write(), and file.close() to close the file.
Example to write an array to the file:
file = open("array.bin", "wb")
num = [2, 4, 6, 8, 10]
array = bytearray(num)
file.write(array)
file.close()
- To read the written array back, I have opened the same file with file = open("array.bin", "rb").
- The "rb" mode is used to read the file in binary mode.
- file.read(3) reads only the first three bytes from the file, and list() turns those bytes into a list of integers: number = list(file.read(3)).
- file.close() is used to close the file.
Example to read an array from the file:
file = open("array.bin", "rb")
number = list(file.read(3))
print(number)
file.close()
print(number) shows the output, [2, 4, 6].
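If a typed array is wanted rather than a list, the standard library's array module can read the bytes directly. This sketch is an alternative to the list() approach, not the method used above; the temporary directory and the variable names are assumptions for the demo.

```python
import array
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    array_path = os.path.join(tmp_dir, 'array.bin')

    with open(array_path, 'wb') as file:
        file.write(bytearray([2, 4, 6, 8, 10]))

    # Typecode 'B' means unsigned 8-bit integers, one per byte in the file.
    numbers = array.array('B')
    with open(array_path, 'rb') as file:
        # fromfile needs the item count; here it is the file size in bytes.
        numbers.fromfile(file, os.path.getsize(array_path))

print(numbers.tolist())
```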
Python read a binary file into a byte array
Now, we can see how to read a binary file into a byte array in Python.
- In this example, I have opened a file called sonu.bin in "rb" mode to read it as binary. I have already stored some data in sonu.bin.
- byte = file.read(3) reads up to 3 bytes from the file.
- The while loop keeps reading 3 bytes at a time until file.read(3) returns an empty bytes object at the end of the file.
Example:
file = open("sonu.bin", "rb")
byte = file.read(3)
while byte:
    print(byte)
    byte = file.read(3)
file.close()
print(byte) prints each 3-byte chunk as it is read.
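To end up with an actual bytearray holding the whole file, two common options are sketched here. The file contents are made up for the demo, and the temporary sonu.bin only stands in for the file used above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    data_path = os.path.join(tmp_dir, 'sonu.bin')  # stand-in for the file above
    with open(data_path, 'wb') as file:
        file.write(b'sample data')

    # Option 1: read everything, then copy into a mutable bytearray.
    with open(data_path, 'rb') as file:
        contents = bytearray(file.read())

    # Option 2: preallocate the bytearray and let readinto fill it in place.
    buffer = bytearray(os.path.getsize(data_path))
    with open(data_path, 'rb') as file:
        bytes_read = file.readinto(buffer)

contents[0:6] = b'SAMPLE'  # a bytearray can be modified, unlike bytes
print(contents, bytes_read)
```

readinto avoids the intermediate bytes object, which can matter for large files.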
Python read a binary file line by line
Here, we can see how to read a binary file line by line in Python.
- In this example, I have taken a list of lines as lines = [b"Welcome to python guides\n"] and opened a file with file = open("document1.txt", "wb"); document1.txt is the filename.
- "wb" is the mode used to write binary files, so the lines must be bytes objects (note the b prefix), not str. file.writelines(lines) writes the lines to the file.
- writelines() writes a sequence of bytes objects to the file without adding line separators. The file.close() method is used to close the file.
Example to write the file:
lines = [b"Welcome to python guides\n"]
file = open("document1.txt", "wb")
file.writelines(lines)
file.close()
- To read the written file, I have opened the same document1.txt with file = open("document1.txt", "rb"); the "rb" mode reads the file as binary. To read a line from the file, I have used line = file.readline().
- readline() returns one line from the file, including the trailing newline byte.
Example to read the file:
file = open("document1.txt", "rb")
line = file.readline()
print(line)
file.close()
print(line) shows the line as a bytes object, and file.close() closes the file.
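When the file holds more than one line, iterating over the file object reads them all. The sketch below assumes the data really is newline-delimited; the second line and the temporary directory are additions for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    lines_path = os.path.join(tmp_dir, 'document1.txt')

    with open(lines_path, 'wb') as file:
        file.writelines([b'Welcome to python guides\n', b'second line\n'])

    # Iterating a binary file splits on b'\n' and yields bytes objects.
    with open(lines_path, 'rb') as file:
        binary_lines = [line for line in file]

print(binary_lines)
```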
Python read a binary file to Ascii
Now, we can see how to read a binary file to ASCII in Python.
- In this example, I have opened a file named test.bin using file = open('test.bin', 'wb+'). The 'wb+' mode opens the binary file for both writing and reading; with plain 'wb', the later file.read() would fail. I have assigned sentence = 'Hello Python'.
- To encode the sentence, I have used file_encode = sentence.encode('ASCII'). To write the encoded sentence to the file, I have used file.write(file_encode).
- file.seek(0) moves back to the start of the file and returns the new position. To read the written data, I have used file.read(), which returns the file's contents as bytes.
- Then, to convert the bytes into an ASCII string, I have used new_sentence = bdata.decode('ASCII').
Example:
file = open('test.bin', 'wb+')
sentence = 'Hello Python'
file_encode = sentence.encode('ASCII')
file.write(file_encode)
file.seek(0)
bdata = file.read()
print('Binary sentence', bdata)
new_sentence = bdata.decode('ASCII')
print('ASCII sentence', new_sentence)
file.close()
print('Binary sentence', bdata) shows the raw bytes, b'Hello Python', and print('ASCII sentence', new_sentence) shows the decoded string, Hello Python.
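Real binary files often contain bytes outside the ASCII range, and decode('ASCII') raises an error on those. The errors parameter of bytes.decode controls what happens instead, as this short sketch shows; the sample bytes are made up for illustration.

```python
raw = b'Hello Python \xff'   # the final byte is outside the ASCII range

try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    strict_error = str(exc)

# errors='replace' substitutes U+FFFD for bytes that cannot be decoded;
# errors='ignore' drops them.
replaced = raw.decode('ascii', errors='replace')
ignored = raw.decode('ascii', errors='ignore')

print(repr(replaced), repr(ignored))
```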
Python read a binary file into a NumPy array
Here, we can see how to read a binary file into a NumPy array in Python.
- In this example, I have imported the NumPy module. array = np.array([2, 8, 7], dtype=np.int8) creates an array of 8-bit integers, and array.tofile("array.bin") writes the raw array data to the binary file array.bin. Note that tofile() returns None, so its result should not be assigned back to array.
- np.fromfile constructs an array from the data in the file. dtype=np.int8 is the data type used to interpret the bytes, and it has to match the type the data was written with. The output changes if we change np.int8 to int32 or int64, because the bytes are then interpreted as wider integers.
Example:
import numpy as np
array = np.array([2, 8, 7], dtype=np.int8)
array.tofile("array.bin")
print(np.fromfile("array.bin", dtype=np.int8))
With matching dtypes, print(np.fromfile("array.bin", dtype=np.int8)) shows the original values, [2 8 7].
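Why the element type matters can be illustrated without NumPy. The sketch below swaps in the standard library's struct module (it is not what the example above uses) and assumes 32-bit little-endian integers: the same twelve bytes read as three numbers or as twelve, depending on the width used to interpret them.

```python
import struct

# Three 32-bit little-endian integers, packed the way a 4-byte integer
# array would typically be laid out on a little-endian machine.
packed = struct.pack('<3i', 2, 8, 7)

as_bytes = list(packed)                        # one value per byte
as_int32 = list(struct.unpack('<3i', packed))  # one value per 4 bytes

print(as_bytes)   # [2, 0, 0, 0, 8, 0, 0, 0, 7, 0, 0, 0]
print(as_int32)   # [2, 8, 7]
```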
Python read a binary file into CSV
Here, we can see how to write and read CSV data in Python. Note that the example file, lock.bin, is a text file despite its extension.
- In this example, I have imported the csv module, which reads and writes tabular data in CSV (comma-separated values) format.
- I have opened a file called lock.bin in "w" mode to write it, and writer = csv.writer(f) creates the object that writes to the file. CSV is a text format, so the file is opened in text mode; the csv documentation recommends passing newline="" when opening it.
- csv.writer() returns a writer object that converts the data into delimited strings.
- writer.writerows writes all the rows to the file. To close the file, f.close() is used.
Example to write the csv file:
import csv
f = open("lock.bin", "w", newline="")
writer = csv.writer(f)
writer.writerows([["a", 1], ["b", 2], ["c", 3], ["d", 4]])
f.close()
To read the CSV file, I have opened lock.bin, in which data is already written, in 'r' mode. reader = csv.reader(file) returns an iterator that yields each row of the file as a list of strings.
Example to read the csv file:
import csv
with open('lock.bin', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)
print(row) prints each row as a list of strings, for example ['a', '1'].
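Converting genuinely binary data to CSV needs one more step: decoding the binary records first. The following is a minimal sketch under a made-up record layout (one ASCII character followed by one unsigned byte); RECORD_FORMAT, records.bin and records.csv are hypothetical names, and a real file would need its own format string.

```python
import csv
import os
import struct
import tempfile

# Assumed record layout: one ASCII character followed by one unsigned byte.
RECORD_FORMAT = '<cB'

with tempfile.TemporaryDirectory() as tmp_dir:
    records_path = os.path.join(tmp_dir, 'records.bin')
    csv_path = os.path.join(tmp_dir, 'records.csv')

    # Build a small binary file to convert.
    with open(records_path, 'wb') as file:
        for letter, number in [(b'a', 1), (b'b', 2), (b'c', 3)]:
            file.write(struct.pack(RECORD_FORMAT, letter, number))

    # Unpack each fixed-size record and write it out as a CSV row.
    with open(records_path, 'rb') as source, \
            open(csv_path, 'w', newline='') as target:
        writer = csv.writer(target)
        for letter, number in struct.iter_unpack(RECORD_FORMAT, source.read()):
            writer.writerow([letter.decode('ascii'), number])

    with open(csv_path, newline='') as file:
        csv_rows = list(csv.reader(file))

print(csv_rows)
```

csv.reader always returns strings, so the numbers come back as '1', '2' and '3' and need converting if integers are required.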
In this tutorial, we have learned how to read a binary file in Python. We have also covered these topics:
- Python read a binary file to an array
- Python read a binary file into a byte array
- Python read a binary file line by line
- Python read a binary file to Ascii
- Python read a binary file into a NumPy array
- Python read a binary file into CSV