How to read a JSON file in S3 and store it in a Dictionary using boto3 and Python

If you want to get a JSON file from an S3 Bucket and load it into a Python Dictionary, you can use the example code below.

The example scripts below cover four scenarios.

  1. Basic JSON file from S3 to Python Dictionary
  2. With Try/Except block
  3. With datetime, date, and time conversions
  4. Running the code in a Lambda Function

AWS boto3 provides two ways to access S3 files: boto3.client('s3') and boto3.resource('s3'). For each of the scenarios above, code is provided for both methods.

Related: Writing a Dictionary to JSON file in S3 using boto3 and Python

Both methods produce the same result, so you can choose whichever one you prefer.
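Since both methods end up calling the same S3 get_object() operation, one way to avoid writing the code twice is a small helper that takes the client as a parameter. This is a minimal sketch, not part of the original scripts: load_json_from_s3() and FakeS3Client are hypothetical names, and the fake client stands in for boto3 so the example runs without AWS credentials. A resource exposes its underlying client as s3_resource.meta.client, which the Try/Except examples also use.

```python
import io
import json


def load_json_from_s3(s3_client, bucket, key):
    """Download s3://bucket/key and parse it into a Python object.

    s3_client can be boto3.client('s3') or boto3.resource('s3').meta.client;
    both expose the same get_object() call.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # response['Body'] is a stream; read() returns the whole file as bytes.
    return json.loads(response['Body'].read())


class FakeS3Client:
    """Hypothetical stand-in for an S3 client, used so this runs without AWS."""

    def __init__(self, objects):
        self.objects = objects  # maps (bucket, key) -> file content as bytes

    def get_object(self, Bucket, Key):
        # io.BytesIO mimics the read() method of the real response body.
        return {'Body': io.BytesIO(self.objects[(Bucket, Key)])}


fake = FakeS3Client({
    ('radishlogic-bucket', 's3_folder/details.json'): b'{"Name": "Daikon Retek"}',
})
print(load_json_from_s3(fake, 'radishlogic-bucket', 's3_folder/details.json'))
```

With real credentials you would pass boto3.client('s3') (or boto3.resource('s3').meta.client) instead of the fake client.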


Code to get a JSON file from S3 and load it into a Python Dictionary

boto3.client('s3')

import boto3
import json

# Initialize boto3 to use the S3 client.
s3_client = boto3.client('s3')

# Get the file inside the S3 Bucket
s3_response = s3_client.get_object(
    Bucket='radishlogic-bucket',
    Key='s3_folder/details.json'
)

# Get the Body object in the S3 get_object() response
s3_object_body = s3_response.get('Body')

# Read the data in bytes format
content = s3_object_body.read()

# Parse the JSON bytes into a Python dictionary
json_dict = json.loads(content)

# Print the dictionary and its type
print(json_dict)
print(type(json_dict))

boto3.resource('s3')

import boto3
import json

# Initialize boto3 to use S3 resource
s3_resource = boto3.resource('s3')

# Get the object from the S3 Bucket
s3_object = s3_resource.Object(
    bucket_name='radishlogic-bucket', 
    key='s3_folder/details.json'
)

# Get the response from get_object()
s3_response = s3_object.get()

# Get the Body object in the S3 get_object() response
s3_object_body = s3_response.get('Body')

# Read the data in bytes format
content = s3_object_body.read()

# Parse the JSON bytes into a Python dictionary
json_dict = json.loads(content)

# Print the dictionary and its type
print(json_dict)
print(type(json_dict))

The examples above get the file named details.json inside the s3_folder folder of the S3 Bucket named radishlogic-bucket.

Once the script gets the content of details.json, it converts it to a Python dictionary using the json.loads() function.

To get a file, or object, from an S3 Bucket, you need to call the get_object() method.

For boto3.client('s3'), the get_object() call is this part.

# Initialize boto3 to use the S3 client.
s3_client = boto3.client('s3')

# Get the file inside the S3 Bucket
s3_response = s3_client.get_object(
    Bucket='radishlogic-bucket',
    Key='s3_folder/details.json'
)

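For boto3.resource('s3'), you first create an Object from the bucket name and key, then call its get() method, which makes the same get_object() request. This is that part.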

# Initialize boto3 to use S3 resource
s3_resource = boto3.resource('s3')

# Get the object from the S3 Bucket
s3_object = s3_resource.Object(
    bucket_name='radishlogic-bucket', 
    key='s3_folder/details.json'
)

# Get the response from get_object()
s3_response = s3_object.get()

Once we have s3_response, which is a Python dictionary, we get the file’s contents from its 'Body' key. The value is a StreamingBody object rather than the data itself.

s3_object_body = s3_response.get('Body')

To read the content, we call the body's .read() method, which returns the whole file as bytes.

content = s3_object_body.read()
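Because the 'Body' value behaves like a read-only file object, json.load() (without the s) can read and parse it in one step. A minimal sketch, using io.BytesIO as a stand-in for the real StreamingBody so it runs without AWS; load_json_body() is a hypothetical name:

```python
import io
import json


def load_json_body(body):
    """Parse JSON straight from a file-like object that has a read() method.

    json.load() calls body.read() and passes the result to json.loads(),
    which accepts bytes as well as str.
    """
    return json.load(body)


# io.BytesIO stands in for the StreamingBody in s3_response['Body'].
body = io.BytesIO(b'{"Name": "Daikon Retek", "Subjects": ["Math", "Science"]}')
print(load_json_body(body))

# The body is a stream: once it has been read, a second read returns b''.
print(body.read())
```

Either way, the body is a stream, so read it once and keep the result.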

Once we have the content of the JSON file, we can convert it to a dictionary using the json.loads() function, which accepts bytes as well as strings.

import json

json_dict = json.loads(content)
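json.loads() returns whatever the top-level JSON value is, so a file that contains an array gives you a list rather than a dictionary. If the rest of your code assumes a dictionary, a small check can make that explicit. A sketch; load_json_dict() is a hypothetical helper:

```python
import json


def load_json_dict(content):
    """Parse JSON content (str, bytes or bytearray) and require a top-level object."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise TypeError(
            f'Expected a JSON object at the top level, got {type(data).__name__}'
        )
    return data


print(load_json_dict(b'{"Name": "Daikon Retek"}'))

try:
    load_json_dict(b'["Math", "Science", "History"]')
except TypeError as e:
    print(e)  # Expected a JSON object at the top level, got list
```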

Adding a Try/Except Block to Catch Errors

If you want to catch errors, such as the S3 Bucket or the S3 Object not existing or the JSON being malformed, you need to add a Try/Except block.

To get an S3 object/file inside a Try/Except block and then load it as a Python Dictionary, you can use the code below.

boto3.client('s3')

import boto3
import json

# Initialize boto3 to use the S3 client.
s3_client = boto3.client('s3')

try:
    # Get the file inside the S3 Bucket
    s3_response = s3_client.get_object(
        Bucket='radishlogic-bucket',
        Key='s3_folder/details.json'
    )

    # Get the Body object in the S3 get_object() response
    s3_object_body = s3_response.get('Body')

    # Read the data in bytes format
    content = s3_object_body.read()

    try:
        # Parse JSON content to Python Dictionary
        json_dict = json.loads(content)

        # Print the dictionary and its type
        print(json_dict)
        print(type(json_dict))

    except json.decoder.JSONDecodeError as e:
        # JSON is not properly formatted
        print('JSON file is not properly formatted')
        print(e)

except s3_client.exceptions.NoSuchBucket as e:
    # S3 Bucket does not exist
    print('The S3 bucket does not exist.')
    print(e)

except s3_client.exceptions.NoSuchKey as e:
    # Object does not exist in the S3 Bucket
    print('The S3 object does not exist in the S3 bucket.')
    print(e)

boto3.resource('s3')

import boto3
import json

# Initialize boto3 to use S3 resource
s3_resource = boto3.resource('s3')

try:
    # Get the object from the S3 Bucket
    s3_object = s3_resource.Object(
        bucket_name='radishlogic-bucket', 
        key='s3_folder/details.json'
    )

    # Get the response from get_object()
    s3_response = s3_object.get()

    # Get the Body object in the S3 get_object() response
    s3_object_body = s3_response.get('Body')

    # Read the data in bytes format
    content = s3_object_body.read()

    try:
        # Parse JSON content to Python Dictionary
        json_dict = json.loads(content)

        # Print the dictionary and its type
        print(json_dict)
        print(type(json_dict))

    except json.decoder.JSONDecodeError as e:
        # JSON is not properly formatted
        print('JSON file is not properly formatted')
        print(e)

except s3_resource.meta.client.exceptions.NoSuchBucket as e:
    # S3 Bucket does not exist
    print('NO SUCH BUCKET')
    print(e)

except s3_resource.meta.client.exceptions.NoSuchKey as e:
    # Object does not exist in the S3 Bucket
    print('NO SUCH KEY')
    print(e)

There are three exceptions that we are watching out for in the code.

json.decoder.JSONDecodeError

This exception will be raised if the JSON file is not formatted correctly.

The print(e) part also prints where in the JSON file the format is wrong, as a line number, column number and character position.
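The exception object also exposes that location as attributes, which is handy if you want to log it in your own format. json.decoder.JSONDecodeError is the same class as json.JSONDecodeError and is a subclass of ValueError. A small sketch with a deliberately malformed document (single quotes around a key are not valid JSON); describe_json_error() is a hypothetical helper:

```python
import json

# Hypothetical malformed content: the second key uses single quotes.
bad_content = '{\n  "Name": "Daikon Retek",\n  \'Subjects\': []\n}'


def describe_json_error(content):
    """Return (line, column, message) for malformed JSON, or None if it parses."""
    try:
        json.loads(content)
    except json.JSONDecodeError as e:
        # lineno and colno are 1-based; e.pos holds the 0-based character index.
        return e.lineno, e.colno, e.msg
    return None


print(describe_json_error(bad_content))
```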

s3_client.exceptions.NoSuchBucket

The NoSuchBucket exception will be raised if no S3 Bucket with the given name exists.

s3_client.exceptions.NoSuchKey

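The NoSuchKey exception will be raised if the key of the target S3 JSON file does not exist in the bucket. Note that S3 only returns NoSuchKey when the caller also has the s3:ListBucket permission on the bucket; without it, a missing key comes back as an AccessDenied error instead, which neither except clause above catches.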
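NoSuchBucket and NoSuchKey are both subclasses of botocore's ClientError, so another approach is to catch ClientError once and branch on the error code in e.response['Error']['Code']. That also covers codes the two except clauses miss, such as AccessDenied. A sketch of the lookup part, which runs without boto3; the message table and describe_s3_error() are hypothetical, and the commented lines show where it would plug in:

```python
# Hypothetical messages keyed by S3 error code.
S3_ERROR_MESSAGES = {
    'NoSuchBucket': 'The S3 bucket does not exist.',
    'NoSuchKey': 'The S3 object does not exist in the S3 bucket.',
    'AccessDenied': 'Access denied. This is also what S3 returns for a missing key '
                    'when the caller lacks s3:ListBucket.',
}


def describe_s3_error(error_code):
    """Return a readable message for an S3 error code."""
    return S3_ERROR_MESSAGES.get(error_code, f'Unexpected S3 error: {error_code}')


# Where it would plug in (requires boto3/botocore, so left as comments):
# from botocore.exceptions import ClientError
# try:
#     s3_client.get_object(Bucket='radishlogic-bucket', Key='s3_folder/details.json')
# except ClientError as e:
#     print(describe_s3_error(e.response['Error']['Code']))

print(describe_s3_error('NoSuchKey'))
print(describe_s3_error('SlowDown'))
```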


Converting Date and Time Strings to datetime, date and time

If you need to convert strings that contain a date or time to Python datetime, date or time objects for further processing, you can use the code below.

boto3.client('s3') method

import boto3
import json
from datetime import datetime, date, time

# Convert string to date, time, or datetime
def datetime_converter(value):
    if isinstance(value, str):
        try:
            return date.fromisoformat(value)
        except ValueError:
            try:
                return time.fromisoformat(value)
            except ValueError:
                try:    
                    return datetime.fromisoformat(value)
                except ValueError:
                    pass
    return value


# Define a custom decoder function to parse datetime, date, and time strings
def json_datetime_decoder(obj):

    for key, value in obj.items():
        if isinstance(value, str):
            obj[key] = datetime_converter(value)

        elif isinstance(value, dict):
            obj[key] = json_datetime_decoder(value)

        elif isinstance(value, list):
            temp_list = []
            for item in value:
                if isinstance(item, str):
                    temp_item = datetime_converter(item)
                    temp_list.append(temp_item)

                elif isinstance(item, dict):
                    temp_item = json_datetime_decoder(item)
                    temp_list.append(temp_item)

                else:
                    temp_list.append(item)
                    
            obj[key] = temp_list

    return obj

# Initialize boto3 to use the S3 client.
s3_client = boto3.client('s3')

# Get the file inside the S3 Bucket
s3_response = s3_client.get_object(
    Bucket='radishlogic-bucket',
    Key='s3_folder/details.json'
)

# Get the Body object in the S3 get_object() response
s3_object_body = s3_response.get('Body')

# Read the data in bytes format
content = s3_object_body.read()

# Convert the JSON string to a Python Dictionary 
# and convert date and time strings to datetime
json_dict = json.loads(content, object_hook=json_datetime_decoder)

# Print the dictionary and its type
print(json_dict)
print(type(json_dict))

boto3.resource('s3')

import boto3
import json
from datetime import datetime, date, time

# Convert string to date, time, or datetime
def datetime_converter(value):
    if isinstance(value, str):
        try:
            return date.fromisoformat(value)
        except ValueError:
            try:
                return time.fromisoformat(value)
            except ValueError:
                try:    
                    return datetime.fromisoformat(value)
                except ValueError:
                    pass
    return value


# Define a custom decoder function to parse datetime, date, and time strings
def json_datetime_decoder(obj):

    for key, value in obj.items():
        if isinstance(value, str):
            obj[key] = datetime_converter(value)

        elif isinstance(value, dict):
            obj[key] = json_datetime_decoder(value)

        elif isinstance(value, list):
            temp_list = []
            for item in value:
                if isinstance(item, str):
                    temp_item = datetime_converter(item)
                    temp_list.append(temp_item)

                elif isinstance(item, dict):
                    temp_item = json_datetime_decoder(item)
                    temp_list.append(temp_item)

                else:
                    temp_list.append(item)
                    
            obj[key] = temp_list

    return obj


# Initialize boto3 to use S3 resource
s3_resource = boto3.resource('s3')

# Get the object from the S3 Bucket
s3_object = s3_resource.Object(
    bucket_name='radishlogic-bucket', 
    key='s3_folder/details.json'
)

# Get the response from get_object()
s3_response = s3_object.get()

# Get the Body object in the S3 get_object() response
s3_object_body = s3_response.get('Body')

# Read the data in bytes format
content = s3_object_body.read()

# Convert the JSON string to a Python Dictionary 
# and convert date and time strings to datetime
json_dict = json.loads(content, object_hook=json_datetime_decoder)

# Print the dictionary and its type
print(json_dict)
print(type(json_dict))

Notice that the call to json.loads() has an additional argument named object_hook.

# Convert the JSON string to a Python Dictionary 
# and convert date and time strings to datetime
json_dict = json.loads(content, object_hook=json_datetime_decoder)

object_hook is an optional parameter that accepts a function (any callable). In our case, we pass the json_datetime_decoder function. json.loads() calls it with every JSON object it decodes, as a Python dictionary, starting with the innermost objects, and uses the function's return value in place of that dictionary. The value returned for the outermost object is what ends up in json_dict.
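To see the order in which object_hook is called, here is a small, self-contained experiment; recording_hook() is a hypothetical function that only records the keys of each object it receives:

```python
import json

calls = []


def recording_hook(obj):
    # json.loads() passes every JSON object it finishes decoding, as a dict.
    calls.append(sorted(obj))
    # Whatever the hook returns is used in place of that dict.
    return obj


result = json.loads(
    '{"name": "x", "outer": {"inner": {"deep": 1}}}',
    object_hook=recording_hook,
)
print(calls)  # [['deep'], ['inner'], ['name', 'outer']], innermost object first

# Returning something else replaces the object; here each object becomes its key count.
print(json.loads('{"a": {"b": 1, "c": 2}}', object_hook=len))  # 1
```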

The json_datetime_decoder() function goes through every value in the dictionary, including items in lists, and whenever it finds a string, it passes it to the datetime_converter() function.

from datetime import datetime, date, time

# Define a custom decoder function to parse datetime, date, and time strings
def json_datetime_decoder(obj):

    for key, value in obj.items():
        if isinstance(value, str):
            obj[key] = datetime_converter(value)

        elif isinstance(value, dict):
            obj[key] = json_datetime_decoder(value)

        elif isinstance(value, list):
            temp_list = []
            for item in value:
                if isinstance(item, str):
                    temp_item = datetime_converter(item)
                    temp_list.append(temp_item)

                elif isinstance(item, dict):
                    temp_item = json_datetime_decoder(item)
                    temp_list.append(temp_item)

                else:
                    temp_list.append(item)
                    
            obj[key] = temp_list

    return obj
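Because object_hook has already been applied to every nested object by the time the outer object reaches it, the branch in json_datetime_decoder() that recurses into dictionaries looks redundant. A more compact variant that relies on this, and that also walks lists nested inside lists, could look like the sketch below. It is an alternative, not the original code; try_parse_iso(), convert_strings() and compact_datetime_decoder() are hypothetical names.

```python
import json
from datetime import date, datetime, time


def try_parse_iso(value):
    """Try date, then time, then datetime, like datetime_converter()."""
    for parse in (date.fromisoformat, time.fromisoformat, datetime.fromisoformat):
        try:
            return parse(value)
        except ValueError:
            pass
    return value


def convert_strings(value):
    """Convert strings, recursing into lists at any depth."""
    if isinstance(value, str):
        return try_parse_iso(value)
    if isinstance(value, list):
        return [convert_strings(item) for item in value]
    # Nested dicts were already handled by object_hook; numbers,
    # booleans and None pass through unchanged.
    return value


def compact_datetime_decoder(obj):
    return {key: convert_strings(value) for key, value in obj.items()}


sample = ('{"DateNow": "2024-01-15", "Matrix": [["10:30:00"]], '
          '"Inner": {"When": "2024-01-15T10:30:00"}}')
print(json.loads(sample, object_hook=compact_datetime_decoder))
```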

The datetime_converter() function accepts a string and tries to convert it to a date, time, or datetime object, in that order. If none of them succeeds, the string is returned unchanged.

# Convert string to date, time, or datetime
def datetime_converter(value):
    if isinstance(value, str):
        try:
            return date.fromisoformat(value)
        except ValueError:
            try:
                return time.fromisoformat(value)
            except ValueError:
                try:    
                    return datetime.fromisoformat(value)
                except ValueError:
                    pass
    return value

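In short, any string value in your JSON that date.fromisoformat(), time.fromisoformat() or datetime.fromisoformat() can parse is converted to a date, time or datetime object, even when it sits in a dictionary within a list that is within a dictionary. Strings placed directly in a list that is itself inside a list are left as strings, because json_datetime_decoder() only looks one list level deep.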

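The string formats that will be converted are documented under date.fromisoformat(), time.fromisoformat() and datetime.fromisoformat() in the Python datetime module documentation. Python 3.11 expanded these functions to accept most ISO 8601 formats.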
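One thing to watch for: fromisoformat() is fairly permissive, so datetime_converter() can convert strings you did not intend. For example, time.fromisoformat('12') returns time(12, 0), so a JSON value such as "12" would come back as a time object. If that matters for your data, a stricter variant could check the string's shape first. The sketch below assumes your dates are written the way isoformat() or str() writes them; the regular expressions and strict_datetime_converter() are hypothetical, not part of the original code.

```python
import re
from datetime import date, datetime, time

# Assumed formats: only strings shaped like isoformat()/str() output are converted.
DATE_RE = re.compile(r'[0-9]{4}-[0-9]{2}-[0-9]{2}')
TIME_RE = re.compile(r'[0-9]{2}:[0-9]{2}(:[0-9]{2}(\.[0-9]{3}([0-9]{3})?)?)?')
DATETIME_RE = re.compile(
    r'[0-9]{4}-[0-9]{2}-[0-9]{2}[T ]'
    r'[0-9]{2}:[0-9]{2}(:[0-9]{2}(\.[0-9]{3}([0-9]{3})?)?)?'
    r'([+-][0-9]{2}:[0-9]{2})?'
)


def strict_datetime_converter(value):
    """Convert value only if it fully matches one of the patterns above."""
    if not isinstance(value, str):
        return value
    try:
        if DATETIME_RE.fullmatch(value):
            return datetime.fromisoformat(value)
        if DATE_RE.fullmatch(value):
            return date.fromisoformat(value)
        if TIME_RE.fullmatch(value):
            return time.fromisoformat(value)
    except ValueError:
        # Matches the shape but is not a real value, e.g. '2024-13-45'.
        pass
    return value


print(time.fromisoformat('12'))         # the permissive parser accepts a bare hour
print(strict_datetime_converter('12'))  # stays the string '12'
print(strict_datetime_converter('2024-01-15 10:30:00.123456'))
```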


Lambda Function Code for loading a JSON file from S3 into a Python Dictionary

You can also use the code above inside an AWS Lambda Function to get a JSON file from an S3 Bucket and load it into a Python Dictionary, as shown below.

boto3.client('s3') method

import boto3
import json

def lambda_handler(event, context):
    
    # Initialize boto3 to use the S3 client.
    s3_client = boto3.client('s3')
    
    try:
        # Get the file inside the S3 Bucket
        s3_response = s3_client.get_object(
            Bucket='radishlogic-bucket',
            Key='s3_folder/details.json'
        )
    
        # Get the Body object in the S3 get_object() response
        s3_object_body = s3_response.get('Body')
    
        # Read the data in bytes format
        content = s3_object_body.read()
    
        try:
            # Parse JSON content to Python Dictionary
            json_dict = json.loads(content)
    
            # Print the dictionary and its type
            print(json_dict)
            print(type(json_dict))
    
        except json.decoder.JSONDecodeError as e:
            # JSON is not properly formatted
            print('JSON file is not properly formatted')
            print(e)
    
    except s3_client.exceptions.NoSuchBucket as e:
        # S3 Bucket does not exist
        print('The S3 bucket does not exist.')
        print(e)
    
    except s3_client.exceptions.NoSuchKey as e:
        # Object does not exist in the S3 Bucket
        print('The S3 object does not exist in the S3 bucket.')
        print(e)

boto3.resource('s3') method

import boto3
import json

def lambda_handler(event, context):
    
    # Initialize boto3 to use S3 resource
    s3_resource = boto3.resource('s3')
    
    try:
        # Get the object from the S3 Bucket
        s3_object = s3_resource.Object(
            bucket_name='radishlogic-bucket', 
            key='s3_folder/details.json'
        )
    
        # Get the response from get_object()
        s3_response = s3_object.get()
    
        # Get the Body object in the S3 get_object() response
        s3_object_body = s3_response.get('Body')
    
        # Read the data in bytes format
        content = s3_object_body.read()
    
        try:
            # Parse JSON content to Python Dictionary
            json_dict = json.loads(content)
    
            # Print the dictionary and its type
            print(json_dict)
            print(type(json_dict))
    
        except json.decoder.JSONDecodeError as e:
            # JSON is not properly formatted
            print('JSON file is not properly formatted')
            print(e)
    
    except s3_resource.meta.client.exceptions.NoSuchBucket as e:
        # S3 Bucket does not exist
        print('NO SUCH BUCKET')
        print(e)
    
    except s3_resource.meta.client.exceptions.NoSuchKey as e:
        # Object does not exist in the S3 Bucket
        print('NO SUCH KEY')
        print(e)

Python Script to create a JSON file with datetime, date and time and upload it to S3

Here’s the Python script that I used to create my JSON file and upload it to S3.

I modified it for some test scenarios, for example to trigger the exceptions above and to add a deeply nested element with a date/time format.

import boto3
import json
from datetime import datetime

now = datetime.now()  # capture one timestamp so all three values agree

data_dict = {
    'Name': 'Daikon Retek',
    'DateTimeNow': now,
    'DateNow': now.date(),
    'TimeNow': now.time(),
    'Subjects': ['Math', 'Science', 'History']
}

# Convert Dictionary to JSON String.
# default=str turns the datetime, date and time values into strings.
data_string = json.dumps(data_dict, indent=2, default=str)


# Upload JSON String to an S3 Object
s3_resource = boto3.resource('s3')

s3_bucket = s3_resource.Bucket(name='radishlogic-bucket')

s3_bucket.put_object(
    Key='s3_folder/details.json',
    Body=data_string
)
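The script relies on default=str, which writes a datetime with a space between the date and the time, for example '2024-01-15 10:30:00'. That still parses with datetime.fromisoformat(), but if you would rather store the 'T' separator that isoformat() produces, and fail loudly on types you did not expect, a custom default function is one option. A sketch, where iso_default() is a hypothetical helper and a fixed timestamp replaces datetime.now() so the output is predictable:

```python
import json
from datetime import date, datetime, time


def iso_default(obj):
    """Hypothetical json.dumps() default: ISO 8601 strings for date/time values."""
    # datetime is a subclass of date, so this branch covers datetime as well.
    if isinstance(obj, (date, time)):
        return obj.isoformat()
    # Anything else is an error instead of being silently stringified.
    raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')


now = datetime(2024, 1, 15, 10, 30, 0, 123456)  # fixed instead of datetime.now()
data_dict = {
    'Name': 'Daikon Retek',
    'DateTimeNow': now,
    'DateNow': now.date(),
    'TimeNow': now.time(),
}

data_string = json.dumps(data_dict, indent=2, default=iso_default)
print(data_string)

# The stored strings parse back to the original values.
loaded = json.loads(data_string)
print(datetime.fromisoformat(loaded['DateTimeNow']) == now)
```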

That is all I have on loading a JSON file from Amazon S3 into a Python Dictionary. I hope this post was helpful to you.

Let me know your experience in the comments below.
