How do I read bytes in Python?

Reading binary file in Python and looping over each byte

The pathlib module (added in Python 3.4) gained a convenience method in Python 3.5, Path.read_bytes, which reads a whole file in as bytes and lets us iterate over them. I consider this a decent (if quick and dirty) answer:

import pathlib

for byte in pathlib.Path(path).read_bytes():
    print(byte)

Interesting that this is the only answer to mention pathlib.
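One detail worth keeping in mind with the loop above: in Python 3, iterating over a bytes object yields integers in the range 0 to 255, not length-1 bytes objects. A minimal sketch, using a made-up in-memory value in place of a real file:

```python
# Hypothetical sample data; stands in for pathlib.Path(path).read_bytes().
data = b"\x00AB\xff"

values = list(data)          # iteration yields ints
first_as_int = data[0]       # indexing also yields an int
first_as_bytes = data[0:1]   # slicing yields a bytes object of length 1

print(values)
print(first_as_int, first_as_bytes)
```

If you need one-byte bytes objects instead of integers, slicing is one way to get them.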

In Python 2, you probably would do this (as Vinay Sajip also suggests):

with open(path, 'rb') as file:
    for byte in file.read():
        print(byte)

In the case that the file may be too large to iterate over in-memory, you would chunk it, idiomatically, using the iter function with the callable, sentinel signature - the Python 2 version:

with open(path, 'rb') as file:
    callable = lambda: file.read(1024)
    sentinel = bytes() # or b''
    for chunk in iter(callable, sentinel):
        for byte in chunk:
            print(byte)

(Several other answers mention this, but few offer a sensible read size.)

Best practice for large files or buffered/interactive reading

Let's create a function to do this, including idiomatic uses of the standard library for Python 3.5+:

from pathlib import Path
from functools import partial
from io import DEFAULT_BUFFER_SIZE

def file_byte_iterator(path):
    """given a path, return an iterator over the file
    that lazily loads the file
    """
    path = Path(path)
    with path.open('rb') as file:
        reader = partial(file.read1, DEFAULT_BUFFER_SIZE)
        file_iterator = iter(reader, bytes())
        for chunk in file_iterator:
            yield from chunk

Note that we use file.read1. file.read blocks until it gets all the bytes requested of it or reaches EOF. file.read1 makes at most one call to the underlying raw stream and returns whatever is available (up to the requested size), so it can return more quickly. No other answer mentions this either.
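The difference is easiest to see with a raw stream that only hands out a few bytes per call. The sketch below is an illustration only: TrickleRaw is a hypothetical class written for this example (it is not part of the standard library), and the 3-byte limit is arbitrary.

```python
import io


class TrickleRaw(io.RawIOBase):
    """Hypothetical raw stream that hands out at most 3 bytes per call."""

    def __init__(self, data, max_per_call=3):
        super().__init__()
        self._data = data
        self._pos = 0
        self._max = max_per_call

    def readable(self):
        return True

    def readinto(self, buffer):
        # Copy at most self._max bytes into the caller's buffer.
        n = min(len(buffer), self._max, len(self._data) - self._pos)
        buffer[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n  # returning 0 signals EOF


payload = b"abcdefghijklmnop"

# read() keeps calling the raw stream until 10 bytes are collected (or EOF).
full = io.BufferedReader(TrickleRaw(payload)).read(10)

# read1() makes at most one raw call, so it may return fewer bytes than asked.
one_call = io.BufferedReader(TrickleRaw(payload)).read1(10)

print(len(full), len(one_call))
```

With a regular file on disk the two calls usually return the same amount of data; the distinction matters mostly for pipes, sockets, and other interactive streams.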

Demonstration of best practice usage:

Let's make a file with a megabyte (actually mebibyte) of pseudorandom data:

import random
import pathlib
path = 'pseudorandom_bytes'
pathobj = pathlib.Path(path)

pathobj.write_bytes(
  bytes(random.randint(0, 255) for _ in range(2**20)))

Now let's iterate over it and materialize it in memory:

>>> l = list(file_byte_iterator(path))
>>> len(l)
1048576

We can inspect any part of the data, for example, the last 100 and first 100 bytes:

>>> l[-100:]
[208, 5, 156, 186, 58, 107, 24, 12, 75, 15, 1, 252, 216, 183, 235, 6, 136, 50, 222, 218, 7, 65, 234, 129, 240, 195, 165, 215, 245, 201, 222, 95, 87, 71, 232, 235, 36, 224, 190, 185, 12, 40, 131, 54, 79, 93, 210, 6, 154, 184, 82, 222, 80, 141, 117, 110, 254, 82, 29, 166, 91, 42, 232, 72, 231, 235, 33, 180, 238, 29, 61, 250, 38, 86, 120, 38, 49, 141, 17, 190, 191, 107, 95, 223, 222, 162, 116, 153, 232, 85, 100, 97, 41, 61, 219, 233, 237, 55, 246, 181]
>>> l[:100]
[28, 172, 79, 126, 36, 99, 103, 191, 146, 225, 24, 48, 113, 187, 48, 185, 31, 142, 216, 187, 27, 146, 215, 61, 111, 218, 171, 4, 160, 250, 110, 51, 128, 106, 3, 10, 116, 123, 128, 31, 73, 152, 58, 49, 184, 223, 17, 176, 166, 195, 6, 35, 206, 206, 39, 231, 89, 249, 21, 112, 168, 4, 88, 169, 215, 132, 255, 168, 129, 127, 60, 252, 244, 160, 80, 155, 246, 147, 234, 227, 157, 137, 101, 84, 115, 103, 77, 44, 84, 134, 140, 77, 224, 176, 242, 254, 171, 115, 193, 29]

Don't iterate by lines for binary files

Don't do the following - this pulls a chunk of arbitrary size until it gets to a newline character - too slow when the chunks are too small, and possibly too large as well:

    with open(path, 'rb') as file:
        for chunk in file: # text newline iteration - not for bytes
            yield from chunk

The above is only good for what are semantically human readable text files (like plain text, code, markup, markdown etc... essentially anything ascii, utf, latin, etc... encoded) that you should open without the 'b' flag.
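To see why line iteration is a poor fit for binary data, here is a small sketch using an in-memory stream. The payload is made up for illustration; the point is only that the chunk sizes depend on where the newline byte happens to occur.

```python
import io

# Hypothetical binary payload in which the byte 0x0A (newline) appears at arbitrary places.
payload = b"ab\ncdefgh\n\nxyz"

# Iterating a binary file object splits on b"\n", so chunk sizes are unpredictable.
line_chunks = list(io.BytesIO(payload))
print(line_chunks)

# Reading fixed-size chunks instead gives predictable sizes (here 4 bytes).
stream = io.BytesIO(payload)
fixed_chunks = list(iter(lambda: stream.read(4), b""))
print(fixed_chunks)
```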

Binary files are files that are not plain text, for example an image file. Like every file, they are stored on disk as a sequence of bytes, but those bytes do not represent characters in a text encoding, so they cannot be opened in the default text mode and read as text.

You can read a binary file by opening it in binary mode with open('filename', 'rb').

When working on problems such as image classification in machine learning, you may need to open a file in binary mode and read its bytes to build a model. In binary mode, no attempt is made to decode the bytes into characters. On the other hand, when you open a file in the default text mode, the bytes are decoded to a string based on the file encoding.
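The difference between the two modes can be sketched with a small temporary file. The file and its contents are invented for this example; the same bytes come back undecoded in binary mode and as a str in text mode.

```python
import os
import tempfile

# Write a small UTF-8 text file to a temporary location (for illustration only).
fd, tmp_path = tempfile.mkstemp()
os.close(fd)
with open(tmp_path, "w", encoding="utf-8") as f:
    f.write("café")

# Binary mode: the raw bytes, no decoding.
with open(tmp_path, "rb") as f:
    raw = f.read()

# Text mode: the bytes are decoded using the given encoding.
with open(tmp_path, "r", encoding="utf-8") as f:
    text = f.read()

os.remove(tmp_path)
print(type(raw).__name__, len(raw))
print(type(text).__name__, len(text))
```

The accented character takes two bytes in UTF-8, so the bytes object is one element longer than the string.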

If You’re in a Hurry…

You can open the file with the open() function, passing b in the mode string to open it in binary mode, and then read the file's bytes.

open('filename', "rb") opens the binary file in read mode.

r – Opens the file for reading.
b – Marks it as a binary file. No attempt is made to decode the bytes to a string.

Example

The example below reads the file one byte at a time and prints each byte.

try:
    # Raw string, so that \t and \B in the Windows path are not treated as escape sequences.
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:
        byte = f.read(1)
        while byte:
            # Do stuff with byte.
            print(byte)
            byte = f.read(1)
except IOError:
    print('Error While Opening the file!')

If You Want to Understand Details, Read on…

In this tutorial, you’ll learn how to read binary files in different ways.

  • Read binary file byte by byte
  • Python Read Binary File into Byte Array
  • Python read binary file into numpy array
  • Read binary file Line by Line
  • Read Binary File Fully in One Shot
  • Python Read Binary File and Convert to Ascii
  • Read binary file into dataframe
  • Read binary file skip header
  • Reading Binary file using Pickle
  • Conclusion

Read binary file byte by byte

In this section, you’ll learn how to read a binary file byte by byte and print it. This is the simplest way to read a binary file, although calling read() once per byte is slow for large files.

The file is opened using the open() function with the mode “rb”, which opens the file for reading in binary mode. In this case, the bytes are not decoded to a string; they are simply read as bytes.

The example below shows how the file is read byte by byte using the file.read(1) method.

The argument 1 means that at most one byte is read by each read() call.

Example

try:
    # Raw string, so that \t and \B in the Windows path are not treated as escape sequences.
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:
        byte = f.read(1)
        while byte:
            # Do stuff with byte.
            print(byte)
            byte = f.read(1)
except IOError:
    print('Error While Opening the file!')

Output

    b'\xff'
    b'\xd8'
    b'\xff'
    b'\xe0'
    b'\x00'
    b'\x10'
    b'J'
    b'F'
    b'I'
    b'F'
    b'\x00'
    b'\x01'
    b'\x01'
    b'\x00'
    b'\x00'
    b'\x01'
    b'\x00'
    b'\x01'
    b'\x00'
    b'\x00'
    b'\xff'
    b'\xed'
    b'\x00'
    b'|'
    b'P'
    b'h'
    b'o'
    b't'
    b'o'
    b's'
    b'h'
    b'o'
    b'p'
    b' '
    b'3'
    b'.'
    b'0'
    b'\xc6'
    b'\xb3'
    b'\xff'
    b'\xd9'

Python Read Binary File into Byte Array

In this section, you’ll learn how to read the binary files into a byte array.

First, the file is opened in the rb mode.

A byte array called mybytearray is initialized using the bytearray() method.

Then the file is read one byte at a time using f.read(1), and each byte is appended to the byte array using the += operator.

Finally, you can print the bytearray to display the bytes that were read.

Example

try:
    # Raw string, so that \t and \B in the Windows path are not treated as escape sequences.
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:

        mybytearray = bytearray()

        # Append the first six bytes of the file, one read at a time.
        for _ in range(6):
            mybytearray += f.read(1)

        print(mybytearray)

except IOError:
    print('Error While Opening the file!')

Output

    bytearray(b'\xff\xd8\xff\xe0\x00\x10')
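The example above appends single bytes for clarity. When the whole file is wanted, it is usually simpler to fill the bytearray in one call. A minimal sketch, assuming a small temporary file created just for this illustration:

```python
import os
import tempfile

# Hypothetical sample file so the sketch is self-contained.
fd, tmp_path = tempfile.mkstemp()
os.close(fd)
with open(tmp_path, "wb") as f:
    f.write(bytes(range(16)))

# Option 1: wrap the result of a single read() call.
with open(tmp_path, "rb") as f:
    whole = bytearray(f.read())

# Option 2: preallocate the buffer and let readinto() fill it in place.
buffer = bytearray(os.path.getsize(tmp_path))
with open(tmp_path, "rb") as f:
    filled = f.readinto(buffer)

os.remove(tmp_path)
print(whole == buffer, filled)
```

readinto() returns the number of bytes it stored, which is useful when the buffer is larger than the remaining data.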

Python read binary file into numpy array

In this section, you’ll learn how to read the binary file into a NumPy array.

First, import the NumPy library with import numpy as np.

Then specify the data type as unsigned 8-bit integers (one per byte) using np.dtype('B').

Next, open the binary file in reading mode.

Now, create the NumPy array using the np.fromfile() function.

Its parameters are the file object and the data type created above. This will create a NumPy array with one element per byte.

numpy_data = np.fromfile(f,dtype)

Example

import numpy as np

dtype = np.dtype('B')
try:
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:
        numpy_data = np.fromfile(f,dtype)
    print(numpy_data)
except IOError:
    print('Error While Opening the file!')    

Output

[255 216 255 ... 179 255 217]


The bytes are read into the numpy array and the bytes are printed.

Read binary file Line by Line

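In this section, you’ll learn how to read a binary file line by line.

You can read the file line by line using the readlines() method of the file object. In binary mode, a line is simply the run of bytes up to and including the next newline byte, so this is only meaningful for files that are actually organized in lines.

Each line will be stored as an item in the list. This list can be iterated to access each line of the file.

The rstrip() method is used to remove trailing whitespace bytes (including the newline) from the end of each line before printing it. Note that in real binary data those bytes may be part of the content.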

Example

# Raw string, so that \t and \B in the Windows path are not treated as escape sequences.
with open(r"c:\temp\Binary_File.jpg", 'rb') as f:
    lines = f.readlines()

for line in lines:
    print(line.rstrip())

Output

    b'\x07\x07\x07\x07'
    b''
    b''
    b''
    b''
    b''
    b'\x0c\x0f\x0c\x0c\x0c\x0c\x0c\x0c\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x12\x12\x12\x12\x12\x12\x15\x15\x15\x15\x15\x17\x17\x17\x17\x17\x17\x17\x17\x17\x17\xff\xdb\x00C\x01\x04\x04\x04\x06\x06\x06'
    b'\x06\x06'

Read Binary File Fully in One Shot

In this section, you’ll learn how to read a binary file in one shot.

You can do this by calling file.read() with no argument, or equivalently by passing -1. This reads the whole binary file in one shot, as shown below.

Example

try:
    with open(r"c:\temp\Binary_File.jpg", 'rb') as f:
        # read(-1), like read() with no argument, returns everything up to EOF.
        binarycontent = f.read(-1)
    print(binarycontent)
except IOError:
    print('Error While Opening the file!')

Output

 b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xed\x00|Photoshop 3.0\x008BIM\x04\x04\x00\x00\x00\x00\x00\x1c\x02(\x00ZFBMD2300096c010000fe0e000032160000051b00003d2b000055300000d6360000bb3c0000ce4100008b490000\x00\xff\xdb\x00C\x00\x03\x03\x03\x03\x03\x03\x05\x03\x03\x05\x07\x05\x05\x05\x07\n\x07\x07\x07\x07\n\x0c\n\n\n\n\n\x0c\x0f\x0c\x0c\x0c\x0c\x0c\x0c\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x12\x12\x12\x12\x12\x12\x15\x15\x15\x15\x15\x17\x17\x17\x17\x17\x17\x17\x17\x17\x17\xff\xdb\x00C\x01\x04\x04\x04\x06\x06\x06\n\x06\x06\n\x18\x11\x0e\x11\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18

Python Read Binary File and Convert to Ascii

In this section, you’ll learn how to read a binary file and convert it to ASCII using the binascii library. The bytes are uuencoded, which represents arbitrary binary data using only printable ASCII characters.

Read the file as binary as explained in the previous section.

Next, use the function binascii.b2a_uu(bytes). It converts the bytes into a line of uuencoded ASCII characters and returns it as a bytes object. The input is limited to 45 bytes per call, which is why the example reads only 45 bytes.

Then you can print this to check the ASCII characters.

Example

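import binascii

try:
    # Raw string, so that \t and \B in the Windows path are not treated as escape sequences.
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:
        # b2a_uu accepts at most 45 bytes per call.
        mybytes = f.read(45)
        data_bytes2ascii = binascii.b2a_uu(mybytes)
        print("Binary String to Ascii")
        print(data_bytes2ascii)
except IOError:
    print("Error While opening the file!")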

Output

 Binary String to Ascii
 b'M_]C_X  02D9)[email protected] ! 0   0 !  #_[0!\\4&AO=&]S:&]P(#,N,  X0DE-! 0 \n'
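If the goal is simply a readable ASCII view of the bytes, a hexadecimal dump is often easier to interpret than uuencoded text. Note that this swaps in a different technique (binascii.hexlify rather than binascii.b2a_uu). A minimal sketch, using the first bytes of the JPEG as they appear in the outputs above:

```python
import binascii

# The first few bytes of the JPEG/JFIF file, as seen in the earlier outputs.
header = b"\xff\xd8\xff\xe0\x00\x10JFIF"

# Two hex digits per byte, returned as ASCII text.
hex_text = binascii.hexlify(header).decode("ascii")

# unhexlify reverses the conversion.
round_trip = binascii.unhexlify(hex_text)

print(hex_text)
print(round_trip == header)
```

Unlike b2a_uu, hexlify has no 45-byte input limit, so it can be applied to the whole file contents.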

Read binary file into dataframe

In this section, you’ll learn how to read a binary file into a pandas dataframe.

First, you need to read the binary file into a NumPy array, because pandas has no method to read an arbitrary binary file into a dataframe directly.

Once you have the numpy array, then you can create a dataframe with the numpy array.

Pass the NumPy array data into the pd.DataFrame(). Then you’ll have the dataframe with the bytes read from the binary file.

Example

import numpy as np
import pandas as pd

try:
    # One unsigned 8-bit integer per byte in the file.
    dt = np.dtype('B')
    # Raw string, so that \t and \B in the Windows path are not treated as escape sequences.
    data = np.fromfile(r"c:\temp\Binary_File.jpg", dtype=dt)
    df = pd.DataFrame(data)
    print(df)
except IOError:
    print("Error while opening the file!")

Output

             0
    0      255
    1      216
    2      255
    3      224
    4        0
    ...    ...
    18822    0
    18823  198
    18824  179
    18825  255
    18826  217

    [18827 rows x 1 columns]

This is how you can read a binary file using NumPy and use that NumPy array to create the pandas dataframe.

You can also use the NumPy array to load the bytes into a dictionary.

Read binary file skip header

In this section, you’ll learn how to read a binary file while skipping its header line. Some binary files begin with an ASCII header.

This skip header method can be useful when reading binary files with ASCII headers.

You can call the readlines() method of the file object and slice the returned list with [1:]. This keeps the lines from index 1 onwards.

The ASCII header line at index 0 will be ignored.

Example

# Raw string, so that \t and \B in the Windows path are not treated as escape sequences.
with open(r"c:\temp\Binary_File.jpg", 'rb') as f:
    lines = f.readlines()[1:]

for line in lines:
    print(line.rstrip())

Output

    b'\x07\x07\x07\x07'
    b''
    b''
    b''
    b''
    b''
    b'\x0c\x0f\x0c\x0c\x0c\x0c\x0c\x0c\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x12\x12\x12\x12\x12\x12\x15\x15\x15\x15\x15\x17\x17\x17\x17\x17\x17\x17\x17\x17\x17\xff\xdb\x00C\x01\x04\x04\x04\x06\x06\x06'
    b'\x06\x06'

    b"\x93\x80\x18\x98\xc9\xdc\x8bm\x90&'\xc5U\xb18\x81\xc7y\xf0\x80\x00\x14\x1c\xceQd\x83\x13\xa0\xbf-D9\xe0\xae;\x8f\\LK\xb8\xc3\x8ae\xd4\xd1C\x10\x7f\x02\x02\xa6\x822K&D\x9a\x04\xd4\xc8\xfbC\x87\xf2\x8d\xdcN\xdes)rq\xbbI\x92\xb6\xeeu8\x1d\xfdG\xabv\xe8q\xa5\xb6\xb56\xe0\xa1\x06\x84n#\xf0\x1c\x86\xb0\x83\xee\x99\xe7\xc6\xaaN\xafY\xdf\xd9\xcfe\xd5\x84"

    b'\xd9\x0b\xc2\x1b0\xa1Q\x17\x88\xb4et\x81u8\xed\xf5\xe8\xd9#c\t\xf9\xc0\xa7\x06\xa2/={\x87l\x01K\x870\xe3\xa1\x024\xdc^\x11\x96\x96\xba\[email protected]\x91A\xd6U\xea\xe1\xbb\xb733'
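Many binary formats use a fixed-size header rather than a newline-terminated one. In that case, one option is to read (or seek past) a known number of bytes instead of slicing lines. The sketch below assumes a made-up file layout, invented purely for illustration: a 4-byte magic value and a 4-byte little-endian length, followed by the payload.

```python
import os
import struct
import tempfile

# Hypothetical file layout, invented for this sketch:
# a 4-byte magic number, a 4-byte little-endian payload length, then the payload.
HEADER = struct.Struct("<4sI")
payload = bytes([1, 2, 3, 4, 5])

fd, tmp_path = tempfile.mkstemp()
os.close(fd)
with open(tmp_path, "wb") as f:
    f.write(HEADER.pack(b"DEMO", len(payload)))
    f.write(payload)

# Option 1: parse the header, then read what follows it.
with open(tmp_path, "rb") as f:
    magic, length = HEADER.unpack(f.read(HEADER.size))
    body = f.read(length)

# Option 2: skip the header without parsing it.
with open(tmp_path, "rb") as f:
    f.seek(HEADER.size)
    skipped = f.read()

os.remove(tmp_path)
print(magic, length, body == skipped)
```

The header size and field layout depend entirely on the file format, so check its specification before hard-coding an offset.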

Reading Binary file using Pickle

In this section, you’ll learn how to read binary files in Python using pickle.

This is tricky because pickle.load() can only read files that were written by pickle, not arbitrary binary files. If you try to unpickle any other binary file, errors such as invalid load key may occur.

Hence this method is not recommended for reading general binary files.

Example

import pickle


file_to_read = open(r"c:\temp\Binary_File.jpg", "rb")

loaded_dictionary = pickle.load(file_to_read)

print(loaded_dictionary)

Output

    ---------------------------------------------------------------------------

    UnpicklingError                           Traceback (most recent call last)

     in 
          7 file_to_read = open("E:\Vikram_Blogging\Stack_Vidhya\Python_Notebooks\Read_Binary_File_Python\Binary_File.jpg", "rb")
          8 
    ----> 9 loaded_dictionary = pickle.load(file_to_read)
         10 
         11 print(loaded_dictionary)


    UnpicklingError: invalid load key, '\xff'.
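For contrast, here is a sketch of the case where pickle.load() does work: the file was produced by pickle.dump() in the first place. The dictionary contents and the temporary file are made up for illustration.

```python
import os
import pickle
import tempfile

record = {"name": "example", "values": [1, 2, 3]}  # made-up data for illustration

fd, tmp_path = tempfile.mkstemp()
os.close(fd)

# Write the object with pickle.dump() in binary mode.
with open(tmp_path, "wb") as f:
    pickle.dump(record, f)

# Only a file written this way can be read back with pickle.load().
# Only unpickle data you trust: loading a pickle can execute arbitrary code.
with open(tmp_path, "rb") as f:
    loaded = pickle.load(f)

os.remove(tmp_path)
print(loaded == record)
```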

Conclusion

Reading a binary file is an important functionality. For example, reading the bytes of an image file is very useful when you are working with image classification problems. In this case, you can read the image file as binary and read the bytes to create the model.

In this tutorial, you’ve learned the different methods available to read binary files in Python and the different libraries that support them.


How do you access bytes in Python?

The bytes() function returns a bytes object. It can convert objects into bytes objects, or create a zero-filled bytes object of the specified size. The difference between bytes() and bytearray() is that bytes() returns an object that cannot be modified, and bytearray() returns an object that can be modified.
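A short sketch of these points, using arbitrary example values:

```python
zeros = bytes(3)                 # three zero bytes
from_ints = bytes([72, 105])     # built from a list of integers
mutable = bytearray(from_ints)   # a modifiable copy

mutable[0] = 104                 # allowed: bytearray is mutable

try:
    from_ints[0] = 104           # not allowed: bytes is immutable
    bytes_is_mutable = True
except TypeError:
    bytes_is_mutable = False

print(zeros, from_ints, mutable, bytes_is_mutable)
```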

How do you read the first 10 bytes of a binary file in Python?

Open the file with the open() function, passing b in the mode string to open it in binary mode, then call read(10) on the file object. open('filename', "rb") opens the binary file in read mode, and read(10) returns at most the first 10 bytes.
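A minimal sketch, assuming a small temporary file created just for this example:

```python
import os
import tempfile

fd, tmp_path = tempfile.mkstemp()   # hypothetical sample file
os.close(fd)
with open(tmp_path, "wb") as f:
    f.write(b"0123456789ABCDEF")

with open(tmp_path, "rb") as f:
    first_ten = f.read(10)   # returns fewer bytes if the file is shorter

os.remove(tmp_path)
print(first_ten)
```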

How do you read first 10 bytes of a binary file?

Then (here, I use the special character ⛶ for: there was no newline):

get10 pdf 4 <$infile >$outfile
printf %b ${pdf[@]/#/\\x}
%PDF⛶
echo $(( $(stat -c %s $infile) - $(stat -c %s $outfile) ))
4
get10 test 8 <<<'Hello world'
rld!

What is bytes type in Python?

In short, the bytes type is an immutable sequence of integers in the range 0 to 255. Text must be encoded into bytes before it can be stored in memory or on disk. There are many encodings (utf-8, utf-16, windows-1255), and each maps characters to bytes differently. A bytes object can be decoded into a str, which is a sequence of Unicode characters.
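The relationship between str and bytes can be sketched as follows; the sample string is arbitrary.

```python
text = "héllo"

# The same text produces different bytes under different encodings.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

print(len(text), len(utf8), len(utf16))

# Decoding with the matching encoding recovers the original string.
print(utf8.decode("utf-8") == text)
```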