I read the filenames in my S3 bucket by doing
s3 = boto3.client('s3')
objs = s3.list_objects(Bucket='my_bucket')
if 'Contents' in objs:
    for obj in objs['Contents']:
        filename = obj['Key']
Now I need to get the actual content of the file, similar to open(filename).readlines(). What is the best way?
asked Mar 24, 2016 at 16:41
boto3 offers a resource model that makes tasks like iterating through objects easier. Unfortunately, StreamingBody doesn't provide readline or readlines.
s3 = boto3.resource('s3')
bucket = s3.Bucket('test-bucket')

# Iterates through all the objects, doing the pagination for you. Each obj
# is an ObjectSummary, so it doesn't contain the body. You'll need to call
# get to get the whole body.
for obj in bucket.objects.all():
    key = obj.key
    body = obj.get()['Body'].read()
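Since the question asks for readlines()-style access, one approach (a sketch, assuming the object holds UTF-8 text) is to read the whole body and split it. The S3 bytes are simulated below so the snippet runs standalone; with boto3 the first line would be `body = obj.get()['Body'].read()`:

```python
# Simulated bytes standing in for obj.get()['Body'].read()
body = b"first line\nsecond line\nthird line\n"

# Rough equivalent of open(filename).readlines()
lines = body.decode("utf-8").splitlines()
print(lines)  # ['first line', 'second line', 'third line']
```

For very large objects you would stream instead of reading everything into memory, but for typical files this keeps the familiar list-of-lines shape.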
answered Mar 24, 2016 at 16:57
Jordon Phillips
You might consider the smart_open module, which supports iterators:
from smart_open import smart_open

# stream lines from an S3 object
for line in smart_open('s3://mybucket/mykey.txt', 'rb'):
    print(line.decode('utf8'))
and context managers:
with smart_open('s3://mybucket/mykey.txt', 'rb') as s3_source:
    for line in s3_source:
        print(line.decode('utf8'))
    s3_source.seek(0)  # seek to the beginning
    b1000 = s3_source.read(1000)  # read 1000 bytes
Find smart_open at https://pypi.org/project/smart_open/
answered Dec 14, 2018 at 18:30
caffreyd
Using the client instead of the resource:
s3 = boto3.client('s3')
bucket = 'bucket_name'
result = s3.list_objects(Bucket=bucket, Prefix='/something/')
for o in result.get('Contents', []):
    data = s3.get_object(Bucket=bucket, Key=o.get('Key'))
    contents = data['Body'].read()
    print(contents.decode('utf-8'))
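One caveat with this pattern: result.get('Contents') returns None when nothing matches the prefix, so iterating it directly raises a TypeError; passing a default of [] keeps the loop safe. A standalone sketch with a simulated response dict (no real bucket involved):

```python
# Simulated list_objects response for a prefix with no matches:
# the 'Contents' key is simply absent from the response
result = {'ResponseMetadata': {}, 'IsTruncated': False}

# The [] default means the loop body is skipped rather than crashing
keys = []
for o in result.get('Contents', []):
    keys.append(o['Key'])

print(keys)  # []
```

Note also that list_objects returns at most 1000 keys per call; for larger buckets, boto3's list_objects_v2 paginator handles continuation for you.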
answered Jan 27, 2021 at 18:08
When you want to read a file with a different configuration than the default one, feel free to use either mpu.aws.s3_read(s3path) directly or the copy-pasted code:
def s3_read(source, profile_name=None):
    """
    Read a file from an S3 source.

    Parameters
    ----------
    source : str
        Path starting with s3://, e.g. 's3://bucket-name/key/foo.bar'
    profile_name : str, optional
        AWS profile

    Returns
    -------
    content : bytes

    Raises
    ------
    botocore.exceptions.NoCredentialsError
        Botocore is not able to find your credentials. Either specify
        profile_name or add the environment variables AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN.
        See https://boto3.readthedocs.io/en/latest/guide/configuration.html
    """
    session = boto3.Session(profile_name=profile_name)
    s3 = session.client('s3')
    bucket_name, key = mpu.aws._s3_path_split(source)
    s3_object = s3.get_object(Bucket=bucket_name, Key=key)
    body = s3_object['Body']
    return body.read()
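mpu.aws._s3_path_split is private to the mpu package; a minimal stand-in (an assumption about its behavior inferred from the docstring example, not mpu's actual code) could look like this:

```python
def s3_path_split(source):
    # Hypothetical replacement for mpu.aws._s3_path_split:
    # 's3://bucket-name/key/foo.bar' -> ('bucket-name', 'key/foo.bar')
    path = source[len('s3://'):]
    bucket, _, key = path.partition('/')
    return bucket, key

print(s3_path_split('s3://bucket-name/key/foo.bar'))
# ('bucket-name', 'key/foo.bar')
```

Writing your own split avoids depending on a private helper that mpu could change without notice.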
answered Aug 23, 2018 at 19:36
Martin Thoma
If you already know the filename, you can use the boto3 builtin download_fileobj:
import boto3
from io import BytesIO

session = boto3.Session()
s3_client = session.client("s3")

f = BytesIO()
s3_client.download_fileobj("bucket_name", "filename", f)
print(f.getvalue())
answered Apr 6, 2020 at 2:33
reubano
import boto3

print("started")
s3 = boto3.resource('s3', region_name='region_name',
                    aws_access_key_id='your_access_id',
                    aws_secret_access_key='your_access_key')
obj = s3.Object('bucket_name', 'file_name')
data = obj.get()['Body'].read()
print(data)
answered Jun 16 at 8:33
This is tested, working code for accessing file contents in an S3 bucket using boto3; it works for me as of the date of posting.
import boto3
from botocore.handlers import disable_signing

def get_file_contents(bucket, prefix):
    s3 = boto3.resource('s3')
    # Allow anonymous access to a public bucket by disabling request signing
    s3.meta.client.meta.events.register('choose-signer.s3.*', disable_signing)
    bucket = s3.Bucket(bucket)
    for obj in bucket.objects.filter(Prefix=prefix):
        key = obj.key
        body = obj.get()['Body'].read()
        print(body)
    return body

get_file_contents('coderbytechallengesandbox', '__cb__')
answered Jul 4 at 1:08
bilalmohib
The best way for me is this:
result = s3.list_objects(Bucket=s3_bucket, Prefix=s3_key)
for file in result.get('Contents', []):
    data = s3.get_object(Bucket=s3_bucket, Key=file.get('Key'))
    contents = data['Body'].read()
    # Float types are not supported by DynamoDB; use Decimal types instead
    j = json.loads(contents, parse_float=Decimal)
    for item in j:
        timestamp = item['timestamp']
        table.put_item(
            Item={
                'timestamp': timestamp
            }
        )
Once you have the content, you can run it through another loop to write it to a DynamoDB table, for instance.
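The parse_float=Decimal part of this answer can be checked without S3 or DynamoDB; the sample JSON payload below is made up for illustration:

```python
import json
from decimal import Decimal

# Made-up payload standing in for the S3 object's contents
contents = b'[{"timestamp": 1638629460, "value": 3.14}]'

# Floats are parsed as Decimal, which DynamoDB's put_item accepts
j = json.loads(contents, parse_float=Decimal)
print(repr(j[0]['value']))  # Decimal('3.14')
```

Using Decimal here avoids the "Float types are not supported" error that boto3's DynamoDB resource raises for Python floats.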
answered Dec 4, 2021 at 15:31
aerioeus