How to list all objects in an S3 Bucket using boto3 and Python

If you need to list all files/objects inside an AWS S3 Bucket, you can use the list_objects_v2 method in boto3.

Below are 3 code examples that show how to list all files in a target S3 Bucket.

You can use any of the 3 options since they all do the same thing.

Each example gets all of the files inside the S3 Bucket radishlogic-bucket using Python boto3, puts their keys inside a Python list, then prints each object key. The listing is recursive: every object is printed, whether or not it is inside a folder.

At the end, it will also print the number of items inside the S3 Bucket.

The Python scripts below will list all the S3 objects inside the bucket even if the number of files in the bucket exceeds 1,000.

Example 1: List all S3 object keys using boto3 resource

import boto3

# Initialize boto3 to use S3 resource
s3_resource = boto3.resource('s3')

# Get the S3 Bucket
s3_bucket = s3_resource.Bucket(name='radishlogic-bucket')

# Get the iterator from the S3 objects collection
s3_object_iterator = s3_bucket.objects.all()

# Initialize the resulting s3_object_key_list to an empty list
s3_object_key_list = []

# loop through all the objects inside the S3 bucket
for s3_object in s3_object_iterator:

    # Get the key of each S3 object
    s3_object_key = s3_object.key

    # Add the s3_object_key to the list of S3 object keys
    s3_object_key_list.append(s3_object_key)

# Print each S3 object key inside the list
for s3_object_key in s3_object_key_list:
    print(s3_object_key)

# Print the number of objects inside the S3 Bucket
print(len(s3_object_key_list))

Note: Even though it cannot be seen in the boto3 resource example, s3_bucket.objects.all() pages through an S3 list-objects call behind the scenes, so you do not have to handle continuation tokens yourself.
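The loop in Example 1 can also be condensed into a list comprehension. The sketch below is one possible variation, not the only way to do it; get_object_keys is a hypothetical helper name, and it assumes it is handed an object that behaves like a boto3 Bucket resource.

```python
def get_object_keys(s3_bucket):
    """Return the keys of every object in a boto3 Bucket resource."""
    # objects.all() yields one object summary per S3 object and fetches
    # further pages on demand, so no manual pagination is needed here.
    return [s3_object.key for s3_object in s3_bucket.objects.all()]


# Usage with a real bucket (requires boto3 and AWS credentials):
#   import boto3
#   keys = get_object_keys(boto3.resource('s3').Bucket('radishlogic-bucket'))
#   print(len(keys))
```

Passing the bucket in as an argument keeps the helper easy to reuse for other buckets.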

Example 2: List all S3 object keys using boto3 client paginator

import boto3

# Initialize boto3 to use s3 client
s3_client = boto3.client('s3')


# Get the paginator for list_objects_v2
s3_paginator = s3_client.get_paginator('list_objects_v2')

# Set the S3 Bucket to the paginator
s3_page_iterator = s3_paginator.paginate(
    Bucket='radishlogic-bucket'
)

# Initialize the resulting s3_object_key_list to an empty list
s3_object_key_list = []

# Get the S3 response for each page of the iterator
for s3_page_response in s3_page_iterator:

    # Get the list of S3 objects for each page response
    # (an empty bucket returns a page without the 'Contents' key)
    for s3_object in s3_page_response.get('Contents', []):

        # Get the key of each S3 object
        s3_object_key = s3_object['Key']

        # Add the s3_object_key to the list of S3 object keys
        s3_object_key_list.append(s3_object_key)


# Print each S3 object key inside the list
for s3_object_key in s3_object_key_list:
    print(s3_object_key)

# Print the number of objects inside the S3 Bucket
print(len(s3_object_key_list))
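If the bucket is very large, holding every key in a list may not be desirable. As a rough sketch, the paginator loop above could be wrapped in a generator so keys are produced one page at a time; iter_object_keys is a hypothetical name, and the function assumes it receives a boto3 S3 client.

```python
def iter_object_keys(s3_client, bucket_name):
    """Yield every object key in the bucket, fetching one page at a time."""
    s3_paginator = s3_client.get_paginator('list_objects_v2')

    for s3_page_response in s3_paginator.paginate(Bucket=bucket_name):
        # A page for an empty bucket has no 'Contents' key, so default to [].
        for s3_object in s3_page_response.get('Contents', []):
            yield s3_object['Key']


# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   for key in iter_object_keys(boto3.client('s3'), 'radishlogic-bucket'):
#       print(key)
```

Because the generator is lazy, the next request to S3 is only made once the caller has consumed the current page.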

Example 3: List all S3 object keys in S3 Bucket using boto3 client NextContinuationToken

import boto3

# Initialize boto3 to use s3 client
s3_client = boto3.client('s3')

# Initialize the resulting s3_object_key_list to an empty list
s3_object_key_list = []

# Arguments to be used for list_objects_v2
operation_parameters = {
    'Bucket': 'radishlogic-bucket'
}

# Indicator whether to stop the loop or not
done = False

# while loop implemented as a do-while loop
while not done:

    # Calling list_objects_v2 function using the unpacked operation_parameters
    s3_response = s3_client.list_objects_v2(**operation_parameters)

    # Get the list of s3 objects for every s3_response
    # (an empty bucket returns a response without the 'Contents' key)
    for s3_object in s3_response.get('Contents', []):

        # Get the S3 object key
        s3_object_key = s3_object['Key']

        # Add the s3_object_key to the list of S3 object keys
        s3_object_key_list.append(s3_object_key)

    # Get the next continuation token
    next_continuation_token = s3_response.get('NextContinuationToken')

    if next_continuation_token is None:
        # No token means this was the last page, so exit the loop
        done = True
    else:
        # Otherwise pass the token to the next list_objects_v2 call
        operation_parameters['ContinuationToken'] = next_continuation_token

# Print each S3 object key inside the list
for s3_object_key in s3_object_key_list:
    print(s3_object_key)

# Print the number of objects inside the S3 Bucket
print(len(s3_object_key_list))

Which method to use?

My go-to method is using the boto3 resource for S3 (Example 1) since it is much easier to use than the other 2.

If I am required to use the low-level interface (boto3 client), I would use the paginator method (Example 2).

I would avoid the NextContinuationToken method unless it offers a specific advantage.
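One situation where handling the token yourself might be useful is fetching a single page at a time, for example to save the token and continue the listing later. The sketch below is only an illustration of that idea; list_one_page is a hypothetical helper, and it assumes a boto3 S3 client is passed in.

```python
def list_one_page(s3_client, bucket_name, continuation_token=None):
    """Fetch a single page of keys plus the token for the next page.

    Returns (keys, next_token); next_token is None on the last page.
    """
    operation_parameters = {'Bucket': bucket_name}

    # Only send ContinuationToken when resuming from a previous page.
    if continuation_token is not None:
        operation_parameters['ContinuationToken'] = continuation_token

    s3_response = s3_client.list_objects_v2(**operation_parameters)

    # 'Contents' is absent when the page holds no objects.
    keys = [s3_object['Key'] for s3_object in s3_response.get('Contents', [])]

    return keys, s3_response.get('NextContinuationToken')
```

The caller decides when to request the next page by passing the returned token back in.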

Why are the examples pausing every 1000 objects?

If you try to print the S3 object keys as you retrieve them, the output will pause every 1,000 objects. This is because list_objects_v2 has a parameter named MaxKeys, which has a default value of 1,000, so a new request has to be sent to S3 each time a page of 1,000 keys has been consumed.

The MaxKeys parameter sets the maximum number of keys returned per call to list_objects_v2. A response can contain fewer than 1,000 object keys, but never more.

You can read more about MaxKeys in the boto3 documentation for list_objects_v2.
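To see the page boundaries for yourself, you could count how many keys arrive in each response. This is a minimal sketch under a few assumptions: page_sizes is a hypothetical helper, it expects a boto3 S3 client, and it uses the paginator's PaginationConfig PageSize option, which the paginator sends to S3 as MaxKeys.

```python
def page_sizes(s3_client, bucket_name, page_size=1000):
    """Return the number of keys received in each list_objects_v2 page."""
    s3_paginator = s3_client.get_paginator('list_objects_v2')

    s3_page_iterator = s3_paginator.paginate(
        Bucket=bucket_name,
        # PageSize is forwarded to S3 as MaxKeys on every request.
        PaginationConfig={'PageSize': page_size},
    )

    # Pages without a 'Contents' key count as zero objects.
    return [len(page.get('Contents', [])) for page in s3_page_iterator]
```

For a bucket holding 2,500 objects, the default page size would typically give three pages, with the last one smaller than the others.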

We hope this helps you list all the files/objects in an S3 bucket using Python boto3.

Let us know your experience in the comments below.
