How to read a file in S3 and store it in a String using Python and boto3

If you want to get a file from an S3 Bucket and then put it in a Python string, try the examples below.

boto3, the AWS SDK for Python, offers two distinct ways to access files or objects in Amazon S3: the client method and the resource method.

Option 1 uses the boto3.client('s3') method, while options 2 and 3 use the boto3.resource('s3') method.

All 3 options do the exact same thing, so pick the one you are most comfortable with or the one that best fits your use case.


Option 1: Reading a file in an S3 Bucket using boto3 S3 Client

import boto3

# Initialize boto3 to use the S3 client.
s3_client = boto3.client('s3')

try:
    # Get the file inside the S3 Bucket
    s3_response = s3_client.get_object(
        Bucket='radishlogic-bucket',
        Key='s3_folder/simple_file.txt'
    )

    # Get the Body object in the S3 get_object() response
    s3_object_body = s3_response.get('Body')

    # Read the data in bytes format and convert it to string
    content_str = s3_object_body.read().decode()

    # Print the file contents as a string
    print(content_str)

except s3_client.exceptions.NoSuchBucket as e:
    # S3 Bucket does not exist
    print('The S3 bucket does not exist.')
    print(e)

except s3_client.exceptions.NoSuchKey as e:
    # Object does not exist in the S3 Bucket
    print('The S3 object does not exist in the S3 bucket.')
    print(e)

Option 2: Reading a file in an S3 Bucket using boto3 S3 Resource

import boto3

# Initialize boto3 to use S3 resource
s3_resource = boto3.resource('s3')

try:

    # Get the object from the S3 Bucket
    s3_object = s3_resource.Object(
        bucket_name='radishlogic-bucket', 
        key='s3_folder/simple_file.txt'
    )

    # Get the response from get_object()
    s3_response = s3_object.get()

    # Get the Body object from the S3 get_object() response
    s3_object_body = s3_response.get('Body')

    # Read the data in bytes format and convert it to string
    content_str = s3_object_body.read().decode()

    # Print the file contents as a string
    print(content_str)

except s3_resource.meta.client.exceptions.NoSuchBucket as e:
    # S3 Bucket does not exist
    print('NO SUCH BUCKET')
    print(e)

except s3_resource.meta.client.exceptions.NoSuchKey as e:
    # Object does not exist in the S3 Bucket
    print('NO SUCH KEY')
    print(e)

Option 3: Reading a file in an S3 Bucket using boto3 S3 Resource alternative

This is basically the same as Option 2. The only difference is that it first creates a variable that represents the S3 Bucket (s3_bucket), then gets the S3 file/object (s3_object) from that variable and reads its contents.

import boto3

# Initialize boto3 to use S3 resource
s3_resource = boto3.resource('s3')

try:
    # Get the S3 Bucket
    s3_bucket = s3_resource.Bucket(name='radishlogic-bucket')

    # Get the S3 Object from the S3 Bucket
    s3_object = s3_bucket.Object(key='s3_folder/simple_file.txt')

    # Get the response from get_object()
    s3_response = s3_object.get()

    # Get the Body object from the S3 get_object() response
    s3_object_body = s3_response.get('Body')

    # Read the data in bytes format and convert it to string
    content_str = s3_object_body.read().decode()

    # Print the file contents as a string
    print(content_str)

except s3_resource.meta.client.exceptions.NoSuchBucket as e:
    # S3 Bucket does not exist
    print('NO SUCH BUCKET')
    print(e)

except s3_resource.meta.client.exceptions.NoSuchKey as e:
    # Object does not exist in the S3 Bucket
    print('NO SUCH KEY')
    print(e)

I’m including Option 3 so that readers can see how to start from a Bucket object with the boto3 S3 resource and reach the objects inside it from there.


boto3 resource and boto3 client

As mentioned earlier, boto3 is the AWS SDK for Python. It has two ways to access files or objects in Amazon S3: the client method and the resource method.

These methods allow developers to interact with S3 in different ways, depending on their specific needs and preferences.

The boto3.client() method offers a low-level interface that maps closely to the S3 API operations, which makes it more suitable for advanced use cases.

The boto3.resource() method provides a higher-level abstraction and is easier to use for common operations.

With both methods available, developers can choose the approach that best suits their requirements and coding style.

In the background, boto3.resource('s3') uses the boto3.client('s3'). In fact, you can access the client using boto3.resource('s3').meta.client.

import boto3

# Initialize boto3 to use boto3 S3 resource
s3_resource = boto3.resource('s3')

# Get the boto3 client in the boto3 s3 resource
s3_client = s3_resource.meta.client

# Check the type of s3_client
print(type(s3_client))

The .get_object() method

If you look closely at the three code examples, you will notice that the .get() method of an S3 Object from boto3.resource('s3') (Options 2 and 3) does the same thing as the .get_object() method of boto3.client('s3') (Option 1).

That is why after calling the .get() method or the .get_object() method and putting the returned dictionary in the s3_response variable, the code in any of the 3 options is basically the same.

# OPTION 1
# Get the file inside the S3 Bucket
s3_response = s3_client.get_object(
    Bucket='radishlogic-bucket',
    Key='s3_folder/simple_file.txt'
)

# OPTIONS 2 and 3
# Get the response from get_object()
s3_response = s3_object.get()

# Same for all Options 1, 2 & 3

# Get the Body object from the S3 get_object() response
s3_object_body = s3_response.get('Body')

# Read the data in bytes format and convert it to string
content_str = s3_object_body.read().decode()

# Print the file contents as a string
print(content_str)
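Because the shared part only depends on the response dictionary, it can be wrapped in a small helper that works with either the client or the resource. The sketch below is only an illustration: the function name response_to_str is hypothetical, and an in-memory io.BytesIO object stands in for the real Body so the code runs without AWS access.

```python
import io


def response_to_str(s3_response, encoding='utf-8'):
    """Return the text content of a get_object() or Object.get() response."""
    # Both the client and the resource return a dict whose 'Body' value
    # is a file-like object with a .read() method.
    s3_object_body = s3_response['Body']
    return s3_object_body.read().decode(encoding)


# Stand-in for a real response so this sketch runs without AWS access.
fake_response = {'Body': io.BytesIO(b'Hello world!\r\nHello earth!')}
text = response_to_str(fake_response)
print(text)
```

With a real response, the call would look like response_to_str(s3_client.get_object(Bucket=..., Key=...)) or response_to_str(s3_object.get()).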

The response['Body'] object and .decode() method

The returned dictionary of the S3 .get_object() method has a lot of metadata. The real content of the S3 file that we are retrieving is actually in the Body key of the boto3 client response dictionary.
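To give an idea of the metadata that sits next to Body, here is a minimal sketch that picks out a few keys found in a get_object() response (ContentType, ContentLength and LastModified). The helper name describe_response and the sample values are made up for illustration; the dictionary is built by hand so the code runs without AWS access.

```python
import datetime
import io


def describe_response(s3_response):
    """Pick a few metadata fields out of a get_object() response dict."""
    return {
        'content_type': s3_response.get('ContentType'),
        'content_length': s3_response.get('ContentLength'),
        'last_modified': s3_response.get('LastModified'),
    }


# Hand-built stand-in for a real response (all values are made up).
sample_response = {
    'Body': io.BytesIO(b'Hello world!'),
    'ContentType': 'text/plain',
    'ContentLength': 12,
    'LastModified': datetime.datetime(2023, 1, 1, tzinfo=datetime.timezone.utc),
}

info = describe_response(sample_response)
print(info)
```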

According to the S3 .get_object() documentation, the value of the ‘Body’ key is a StreamingBody object. You can run the code below to check.

import boto3

# Initialize boto3 to use the S3 client.
s3_client = boto3.client('s3')

# Get the file inside the S3 Bucket
s3_response = s3_client.get_object(
    Bucket='radishlogic-bucket',
    Key='s3_folder/simple_file.txt'
)

# Print the type of 'Body'
print(type(s3_response.get('Body')))

Output

<class 'botocore.response.StreamingBody'>

StreamingBody is documented in the botocore reference, under botocore.response.

StreamingBody has several methods, but to read the contents of the S3 file we need the .read() method. Called with no arguments, it reads and returns all the data in the file as bytes.

You can check that the .read() method returns data of type bytes with the code below.

import boto3

# Initialize boto3 to use the S3 client.
s3_client = boto3.client('s3')

# Get the file inside the S3 Bucket
s3_response = s3_client.get_object(
    Bucket='radishlogic-bucket',
    Key='s3_folder/simple_file.txt'
)

# Get the Body object (StreamingBody) in the S3 get_object() response
s3_object_body = s3_response.get('Body')

# Print the data type of .read()
print(type(s3_object_body.read()))

Output

<class 'bytes'>

If you were expecting a string, unfortunately it is not one. S3 can store any kind of file, and some files (such as images or videos) are not text files.

If you are curious what the output would be when we print a data type of bytes, then you can run the code below.

import boto3

# Initialize boto3 to use the S3 client.
s3_client = boto3.client('s3')

# Get the file inside the S3 Bucket
s3_response = s3_client.get_object(
    Bucket='radishlogic-bucket',
    Key='s3_folder/simple_file.txt'
)

# Get the Body object (StreamingBody) in the S3 get_object() response
s3_object_body = s3_response.get('Body')

# Read the data in bytes format
content_bytes = s3_object_body.read()

# Print content in bytes format
print(content_bytes)

Output

b'Hello world!\r\nHello earth!\r\nHello planet!'

The b at the start indicates that the data type is bytes. Also, observe that everything is printed on one line, but there are \r\n sequences in it. This is the Windows-style line ending (CRLF), which is what Windows Notepad writes.

This is the actual content of my simple_file.txt.

Hello world!
Hello earth!
Hello planet!

Since this post expects the S3 object to be a text file, we need a way to convert the data from bytes to a string. We can achieve that with the .decode() method of the bytes object.

By default, .decode() uses ‘utf-8’ as the encoding when converting to a string.

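In fact, each of the following lines produces the same result. They are alternatives: the body stream can only be read once, so a second .read() on the same object returns empty bytes.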

content_str = s3_object_body.read().decode()

content_str = s3_object_body.read().decode('utf-8')

content_str = s3_object_body.read().decode(encoding='utf-8')

Just use the one that you are comfortable using.
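If the file was not saved as UTF-8, .decode() raises a UnicodeDecodeError. One possible way to cope with that, shown here only as a sketch, is to try a short list of encodings in order. The helper name decode_text and the choice of ‘latin-1’ as the fallback are assumptions for illustration. Note that latin-1 accepts any byte sequence, so it never fails but may produce the wrong characters if the file actually uses another encoding.

```python
def decode_text(content_bytes, encodings=('utf-8', 'latin-1')):
    """Try each encoding in order and return the first successful decode."""
    for encoding in encodings:
        try:
            return content_bytes.decode(encoding)
        except UnicodeDecodeError:
            # This encoding does not fit the data, try the next one.
            continue
    raise ValueError('None of the given encodings could decode the data.')


# The same text stored with two different encodings.
utf8_bytes = 'Hello café!'.encode('utf-8')
latin1_bytes = 'Hello café!'.encode('latin-1')

print(decode_text(utf8_bytes))
print(decode_text(latin1_bytes))
```

With a real S3 object, the call would be decode_text(s3_object_body.read()).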


Reading and Processing S3 Text Files Line by Line

If you need to process the S3 text file line by line, you can use the .splitlines() method of the string. Below are code examples.

S3 Client

import boto3

# Initialize boto3 to use the S3 client.
s3_client = boto3.client('s3')

# Get the file inside the S3 Bucket
s3_response = s3_client.get_object(
    Bucket='radishlogic-bucket',
    Key='s3_folder/simple_file.txt'
)

# Get the Body object (StreamingBody) in the S3 get_object() response
s3_object_body = s3_response.get('Body')

# Read the data in bytes format and convert it to string
content_str = s3_object_body.read().decode()

# Split the content of the text file per line and store in a list of strings
content_str_line_list = content_str.splitlines()

# Print per line
for line_str in content_str_line_list:
    print(line_str)

S3 Resource

import boto3

# Initialize boto3 to use S3 resource
s3_resource = boto3.resource('s3')

# Get the S3 Bucket
s3_bucket = s3_resource.Bucket(name='radishlogic-bucket')

# Get the S3 Object from the S3 Bucket
s3_object = s3_bucket.Object(key='s3_folder/simple_file.txt')

# Get the response from get_object()
s3_response = s3_object.get()

# Get the Body object from the S3 get_object() response
s3_object_body = s3_response.get('Body')

# Read the data in bytes format and convert it to string
content_str = s3_object_body.read().decode()

# Split the content of the text file per line and store in a list of strings
content_str_line_list = content_str.splitlines()

# Print per line
for line_str in content_str_line_list:
    print(line_str)
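Both examples above load the whole object into memory before splitting it. For large files, an alternative worth considering is the .iter_lines() method of StreamingBody, which yields one line at a time as bytes. The sketch below is not part of the original examples: the function names iter_text_lines and print_s3_lines are hypothetical, and boto3 is imported inside the second function so that the first one can be tried without AWS access.

```python
def iter_text_lines(s3_object_body, encoding='utf-8'):
    """Yield the lines of a StreamingBody one at a time as strings."""
    # iter_lines() yields each line as bytes, without the line ending.
    for line_bytes in s3_object_body.iter_lines():
        yield line_bytes.decode(encoding)


def print_s3_lines(bucket_name, key):
    """Fetch an S3 object and print it line by line (needs AWS credentials)."""
    import boto3  # imported here so iter_text_lines() works without boto3

    s3_client = boto3.client('s3')
    s3_response = s3_client.get_object(Bucket=bucket_name, Key=key)

    for line_str in iter_text_lines(s3_response['Body']):
        print(line_str)
```

Calling print_s3_lines('radishlogic-bucket', 's3_folder/simple_file.txt') would print the same lines as the examples above, assuming the bucket and key exist.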

Why .splitlines()?

There are several ways to represent a new line in text files: \n, \r and \r\n. .splitlines() recognizes all of these line endings, so we get a list of strings, one per line, without leftover newline characters at the end.
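The difference is easy to see by comparing .splitlines() with .split('\n') on the sample file content used earlier in this post. The variable names below are only for illustration.

```python
content_str = 'Hello world!\r\nHello earth!\r\nHello planet!'

# split('\n') leaves the carriage return (\r) at the end of each line
split_result = content_str.split('\n')
print(split_result)
# ['Hello world!\r', 'Hello earth!\r', 'Hello planet!']

# splitlines() removes the whole \r\n line ending
splitlines_result = content_str.splitlines()
print(splitlines_result)
# ['Hello world!', 'Hello earth!', 'Hello planet!']

# splitlines() also copes with a mix of line endings in the same text
mixed_result = 'one\ntwo\rthree\r\nfour'.splitlines()
print(mixed_result)
# ['one', 'two', 'three', 'four']
```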


Reading S3 Files via AWS Lambda Python Code

You can use the code above with AWS Lambda to retrieve an S3 file and put it in a string to be processed in the Lambda function.

Below are the boto3 S3 client and resource methods used in an AWS Lambda function.

AWS Lambda Code for reading file in S3 and putting it in a string using boto3 S3 Client

import boto3

def lambda_handler(event, context):

    # Initialize boto3 to use the S3 client.
    s3_client = boto3.client('s3')

    try:
        # Get the file inside the S3 Bucket
        s3_response = s3_client.get_object(
            Bucket='radishlogic-bucket',
            Key='s3_folder/simple_file.txt'
        )

        # Get the Body object in the S3 get_object() response
        s3_object_body = s3_response.get('Body')

        # Read the data in bytes format and convert it to string
        content_str = s3_object_body.read().decode()

        # Print the file contents as a string
        print(content_str)

    except s3_client.exceptions.NoSuchBucket as e:
        # S3 Bucket does not exist
        print('The S3 bucket does not exist.')
        print(e)

    except s3_client.exceptions.NoSuchKey as e:
        # Object does not exist in the S3 Bucket
        print('The S3 object does not exist in the S3 bucket.')
        print(e)

AWS Lambda Code for reading file in S3 and putting it in a string using boto3 S3 Resource

import boto3

def lambda_handler(event, context):

    # Initialize boto3 to use S3 resource
    s3_resource = boto3.resource('s3')

    try:
    
        # Get the object from the S3 Bucket
        s3_object = s3_resource.Object(
            bucket_name='radishlogic-bucket', 
            key='s3_folder/simple_file.txt'
        )
        
        # Get the response from get_object()
        s3_response = s3_object.get()
        
        # Get the Body object from the S3 get_object() response
        s3_object_body = s3_response.get('Body')
        
        # Read the data in bytes format and convert it to string
        content_str = s3_object_body.read().decode()
        
        # Print the file contents as a string
        print(content_str)
        
        return content_str

    except s3_resource.meta.client.exceptions.NoSuchBucket as e:
        # S3 Bucket does not exist
        print('NO SUCH BUCKET')
        print(e)

    except s3_resource.meta.client.exceptions.NoSuchKey as e:
        # Object does not exist in the S3 Bucket
        print('NO SUCH KEY')
        print(e)
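The handlers above hard-code the bucket name and key. If the Lambda function is triggered by an S3 event notification (an assumption, since this post does not cover triggers), the bucket and key can be read from the event instead. The sketch below uses a hypothetical helper named get_bucket_and_key and a hand-built sample event that contains only the fields it reads.

```python
from urllib.parse import unquote_plus


def get_bucket_and_key(event):
    """Return (bucket_name, key) from the first record of an S3 event."""
    record = event['Records'][0]
    bucket_name = record['s3']['bucket']['name']
    # Object keys arrive URL-encoded in the event (a space becomes '+').
    key = unquote_plus(record['s3']['object']['key'])
    return bucket_name, key


# Trimmed-down sample event, only the fields used above are included.
sample_event = {
    'Records': [
        {
            's3': {
                'bucket': {'name': 'radishlogic-bucket'},
                'object': {'key': 's3_folder/simple+file.txt'},
            }
        }
    ]
}

bucket_name, key = get_bucket_and_key(sample_event)
print(bucket_name, key)
```

Inside lambda_handler, the returned values could then be passed to get_object() as Bucket and Key in place of the hard-coded strings.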

We hope that this post helped you read S3 objects/files using Python and boto3.

Let us know your experience in the comments below.
