Can someone tell me if it is possible to read a csv file directly from Azure blob storage as a stream and process it using Python? I know it can be done using C#.Net (shown below) but wanted to know the equivalent library in Python to do this.
CloudBlobClient client = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("outfiles");
CloudBlob blob = container.GetBlobReference("Test.csv");
Jay Gong
asked Feb 20, 2018 at 8:57
Yes, it is certainly possible to do so. Check out the Azure Storage SDK for Python:
from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='myaccount', account_key='mykey')
block_blob_service.get_blob_to_path('mycontainer', 'myblockblob', 'out-sunset.png')
You can read the complete SDK documentation here: https://azure-storage.readthedocs.io.
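Once the blob text is in memory (for example via the legacy SDK's get_blob_to_text, whose .content attribute holds the decoded string), it can be processed as a CSV stream without touching disk. A minimal sketch, using a hard-coded sample string as a stand-in for the blob content since this snippet cannot reach real storage:

```python
import csv
import io

# Stand-in for what the (legacy) SDK call would return, e.g.:
#   blob_text = block_blob_service.get_blob_to_text('mycontainer', 'Test.csv').content
blob_text = "name,qty\napple,3\npear,5\n"

# Parse the in-memory text as a CSV stream, no file on disk needed
rows = list(csv.reader(io.StringIO(blob_text)))
print(rows)  # [['name', 'qty'], ['apple', '3'], ['pear', '5']]
```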
answered Feb 20, 2018 at 9:01
Gaurav Mantri
Here's a way to do it with the new version of the SDK (12.0.0):
from azure.storage.blob import BlobClient

blob = BlobClient(account_url="https://<storage_account>.blob.core.windows.net",
                  container_name="<container_name>",
                  blob_name="<blob_name>",
                  credential="<account_key>")

with open("example.csv", "wb") as f:
    data = blob.download_blob()
    data.readinto(f)
See here for details.
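The download doesn't have to go through a local file: in the v12 SDK, download_blob() returns a StorageStreamDownloader whose readall() gives the blob contents as bytes, which can be parsed in memory. A small sketch of that in-memory step, using hard-coded sample bytes as a stand-in for the downloaded content:

```python
import csv
import io

# Stand-in for blob.download_blob().readall(), which returns the blob contents as bytes
raw = b"id,value\n1,10\n2,20\n"

# Decode the byte stream on the fly and sum a column with csv.DictReader
text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
total = sum(int(row["value"]) for row in csv.DictReader(text_stream))
print(total)  # 30
```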
answered Nov 6, 2019 at 13:41
One can stream from a blob with Python like this:
from tempfile import NamedTemporaryFile
from azure.storage.blob.blockblobservice import BlockBlobService

entry_path = conf['entry_path']
container_name = conf['container_name']
blob_service = BlockBlobService(
    account_name=conf['account_name'],
    account_key=conf['account_key'])

def get_file(filename):
    local_file = NamedTemporaryFile()
    blob_service.get_blob_to_stream(container_name, filename, stream=local_file,
                                    max_connections=2)
    # Rewind so the caller can read the file from the beginning
    local_file.seek(0)
    return local_file
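A usage sketch for this pattern: get_file returns a binary temp file already rewound to the start, so its lines can be decoded and fed straight to csv.reader. Here the temp file is filled locally as a stand-in for the blob download:

```python
import csv
from tempfile import NamedTemporaryFile

# Stand-in for get_file(filename): a rewound temp file holding the blob's bytes
local_file = NamedTemporaryFile()
local_file.write(b"a,b\n1,2\n3,4\n")
local_file.seek(0)

# The temp file is opened in binary mode, so decode each line before parsing
rows = list(csv.reader(line.decode("utf-8") for line in local_file))
local_file.close()
print(rows)  # [['a', 'b'], ['1', '2'], ['3', '4']]
```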
answered May 27, 2019 at 10:08
Daniel R
Provide your Azure storage account name and access key as account_name and account_key here:
block_blob_service = BlockBlobService(account_name='$$$$$$', account_key='$$$$$$')
This will get the blob and save it in the current location as 'output.jpg':
block_blob_service.get_blob_to_path('your-container-name', 'your-blob', 'output.jpg')
This will get the content of the blob as bytes:
blob_item = block_blob_service.get_blob_to_bytes('your-container-name', 'blob-name')
blob_item.content
answered Sep 5, 2019 at 17:47
I recommend using smart_open.
import os
from azure.storage.blob import BlobServiceClient
from smart_open import open

connect_str = os.environ['AZURE_STORAGE_CONNECTION_STRING']
transport_params = {
    'client': BlobServiceClient.from_connection_string(connect_str),
}

# stream from Azure Blob Storage
with open('azure://my_container/my_file.txt', transport_params=transport_params) as fin:
    for line in fin:
        print(line)

# stream content *into* Azure Blob Storage (write mode):
with open('azure://my_container/my_file.txt', 'wb', transport_params=transport_params) as fout:
    fout.write(b'hello world')
answered Jul 27, 2020 at 13:46
pistolpete
Here is a simple way to read a CSV into Pandas from a blob:

import os
import pandas as pd
from azure.storage.blob import BlobServiceClient

service_client = BlobServiceClient.from_connection_string(os.environ['AZURE_STORAGE_CONNECTION_STRING'])
client = service_client.get_container_client("your_container")
bc = client.get_blob_client(blob="your_folder/yourfile.csv")
data = bc.download_blob()
with open("file.csv", "wb") as f:
    data.readinto(f)
df = pd.read_csv("file.csv")
answered Feb 18, 2021 at 10:31
Ilyas
Since I wasn't able to find what I needed in this thread, I wanted to follow up on @SebastianDziadzio's answer to retrieve the data without downloading it as a local file, which is what I was trying to do. Replace the with-open block with the following:
from io import BytesIO
import pandas as pd

with BytesIO() as input_blob:
    blob_client_instance.download_blob().download_to_stream(input_blob)
    input_blob.seek(0)
    df = pd.read_csv(input_blob, compression='infer', index_col=0)
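The pd.read_csv step above works because BytesIO is an ordinary file-like object. A runnable sketch of that part alone, with hard-coded sample bytes standing in for what download_to_stream would write into the buffer:

```python
import io
import pandas as pd

# Stand-in for the bytes that download_to_stream writes into input_blob
csv_bytes = b"id,score\na,1.5\nb,2.5\n"

with io.BytesIO(csv_bytes) as input_blob:
    input_blob.seek(0)
    df = pd.read_csv(input_blob, index_col=0)

print(df.loc["b", "score"])  # 2.5
```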
answered Aug 3 at 10:57
I know this is an old post, but in case someone wants to do the same, this is how I was able to access it.
Note: you need to set AZURE_STORAGE_CONNECTION_STRING, which can be obtained from the Azure Portal -> go to your storage account -> Settings -> Access keys; you will find the connection string there.
For Windows: setx AZURE_STORAGE_CONNECTION_STRING ""
For Linux: export AZURE_STORAGE_CONNECTION_STRING=""
For macOS: export AZURE_STORAGE_CONNECTION_STRING=""
import os
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__

connect_str = os.getenv('AZURE_STORAGE_CONNECTION_STRING')
print(connect_str)
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_client = blob_service_client.get_container_client("Your Container Name Here")
try:
    print("\nListing blobs...")
    # List the blobs in the container
    blob_list = container_client.list_blobs()
    for blob in blob_list:
        print("\t" + blob.name)
except Exception as ex:
    print('Exception:')
    print(ex)
answered May 8, 2021 at 18:12