How To Upload and Download Files in AWS S3 with Python and Boto3

By Adam McQuistan in Python  03/27/2020

Introduction

In this How To tutorial I demonstrate how to perform file storage management with AWS S3 using Python's boto3 AWS library.  Specifically, I provide examples of configuring boto3, creating S3 buckets, and uploading and downloading files to and from S3 buckets.

Creating a Boto3 User in AWS For Programmatic Access

As a first step I make a new user in AWS's management console that I'll use in conjunction with the boto3 library to access my AWS account programmatically. It's considered a best practice to create a separate, dedicated user for use with boto3 as it makes permissions easier to track and manage.

To start I enter IAM in the search bar of the services menu and select the menu item.

Following that I click the Add user button.

On the following screen I enter a username of boto3-demo, make sure only the Programmatic access item is selected, and click the next button.

On the next screen I attach a permission policy of AmazonS3FullAccess then click the next button.

Then I click next until the credentials screen is shown as seen below. On this screen I click the Download .csv button. I will need these credentials to configure Boto3 to allow me to access my AWS account programmatically.

Installing Boto3

Before writing any Python code I must install the AWS Python library named Boto3, which I will use to interact with the AWS S3 service.  To accomplish this I set up a Python 3 virtual environment, which I feel is a best practice for any new project regardless of size and intent.

$ mkdir aws_s3
$ cd aws_s3
$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ pip install boto3

Configuring Boto3 and Boto3 User Credentials

With the boto3-demo user created and the Boto3 package installed I can now set up the configuration to enable authenticated access to my AWS account.  There are a few different ways to handle this, and the one I like best is to store the access key id and secret access key values as environment variables, then use the Python os module from the standard library to feed them into the boto3 library for authentication.  There is a handy Python package called python-dotenv which allows you to put environment variables in a file named .env and load them into your Python source code, so I'll begin this section by installing it.
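One subtlety worth noting with this approach: os.getenv returns None when a variable is unset, and boto3 will then fail later with a less obvious authentication error. Below is a small sketch of a helper that fails fast instead; this is my own addition, not part of boto3 or python-dotenv, and the variable name is a stand-in for demonstration.

```python
import os

def require_env(name: str) -> str:
    """Return a required environment variable or raise a clear error
    rather than silently passing None on to boto3."""
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Stand-in value for demonstration only -- never hard-code real credentials.
os.environ['DEMO_ACCESS_KEY_ID'] = 'MYACCESSKEYID'
print(require_env('DEMO_ACCESS_KEY_ID'))  # MYACCESSKEYID
```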

(venv) $ pip install python-dotenv

Following this I make a .env file and place the two variables in it as shown below. Obviously, you'll want to use your own values, which you downloaded in the earlier step when creating the boto3 user in the AWS console.

AWS_ACCESS_KEY_ID=MYACCESSKEYID
AWS_ACCESS_KEY_SECRET=MYACCESSKEYSECRET

Next I make a Python module named file_manager.py and inside it I import the os and boto3 modules as well as the load_dotenv function from the python-dotenv package.  Following that I call the load_dotenv() function, which auto-discovers a .env file in the same directory and reads its variables into the environment, making them accessible via the os module.

Then I create a function named aws_session(...) that reads the credentials from the environment with the os.getenv(...) function and returns an authenticated Session object.

# file_manager.py

import os
import boto3

from dotenv import load_dotenv
load_dotenv(verbose=True)

def aws_session(region_name='us-east-1'):
    return boto3.session.Session(aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
                                aws_secret_access_key=os.getenv('AWS_ACCESS_KEY_SECRET'),
                                region_name=region_name)

I will then use this session object to interact with the AWS platform via a high-level abstraction Boto3 provides known as the AWS Resource. Used in conjunction with my aws_session() function, I can create an S3 resource like so.

session = aws_session()
s3_resource = session.resource('s3')

Creating an S3 Bucket Programmatically with Boto3

I can now move on to making a publicly readable bucket, which will serve as the top-level container for file objects within S3.  I will do this inside a function named make_bucket as shown below.

def make_bucket(name, acl):
    session = aws_session()
    s3_resource = session.resource('s3')
    return s3_resource.create_bucket(Bucket=name, ACL=acl)

s3_bucket = make_bucket('tci-s3-demo', 'public-read')

The key point to note here is that I've used the Resource class's create_bucket method, passing it a string name that conforms to AWS bucket naming rules along with an ACL parameter, a string representing an Access Control List policy, which in this case allows public reads.
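AWS enforces those naming rules before create_bucket will succeed. As a rough illustration only (a simplified checker of my own, not an official validator, and it does not cover every rule), bucket names must be 3-63 characters of lowercase letters, digits, dots, and hyphens, must start and end with a letter or digit, and must not be formatted like an IP address:

```python
import re

def is_valid_bucket_name(name: str) -> bool:
    """Hypothetical helper: a rough check of the core S3 bucket
    naming rules, not an exhaustive validator."""
    if not 3 <= len(name) <= 63:
        return False
    # Lowercase letters, digits, dots, hyphens; alphanumeric at both ends.
    if not re.fullmatch(r'[a-z0-9][a-z0-9.-]*[a-z0-9]', name):
        return False
    # Names formatted like IP addresses are disallowed.
    if re.fullmatch(r'(\d{1,3}\.){3}\d{1,3}', name):
        return False
    return True

print(is_valid_bucket_name('tci-s3-demo'))  # True
print(is_valid_bucket_name('Bad_Bucket'))   # False
```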

Uploading a File to S3 Using Boto3

At this point I can upload files to this newly created bucket using the Boto3 Bucket resource class. Below is a demo file named children.csv that I'll be working with.

name,age
Kallin,3
Cameron,0

In keeping with good reusability practice I'll again make a function to upload files, given a bucket name and file path, as shown below.

def upload_file_to_bucket(bucket_name, file_path):
    session = aws_session()
    s3_resource = session.resource('s3')
    file_dir, file_name = os.path.split(file_path)

    bucket = s3_resource.Bucket(bucket_name)
    bucket.upload_file(
      Filename=file_path,
      Key=file_name,
      ExtraArgs={'ACL': 'public-read'}
    )

    s3_url = f"https://{bucket_name}.s3.amazonaws.com/{file_name}"
    return s3_url

s3_url = upload_file_to_bucket('tci-s3-demo', 'children.csv')
print(s3_url) # https://tci-s3-demo.s3.amazonaws.com/children.csv

Here I use the Bucket resource class's upload_file(...) method to upload the children.csv file. The parameters to this method are a little confusing, so let me explain them. First there is the Filename parameter, which is actually the path to the file you wish to upload, then there is the Key parameter, which is a unique identifier for the S3 object and must conform to AWS object naming rules, similar to those for S3 buckets.

The upload_file_to_bucket(...) function uploads the given file to the specified bucket and returns the AWS S3 resource URL to the calling code.
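One caveat with building the URL by plain string formatting: object keys containing spaces or other special characters need percent-encoding to produce a resolvable URL. A small sketch of a helper (my own addition, using the same virtual-hosted-style URL format as above):

```python
from urllib.parse import quote

def public_s3_url(bucket_name: str, key: str) -> str:
    # Percent-encode the key (quote leaves '/' separators intact by
    # default) so names with spaces or special characters still form
    # a valid URL.
    return f"https://{bucket_name}.s3.amazonaws.com/{quote(key)}"

print(public_s3_url('tci-s3-demo', 'children.csv'))
# https://tci-s3-demo.s3.amazonaws.com/children.csv
print(public_s3_url('tci-s3-demo', 'my kids.csv'))
# https://tci-s3-demo.s3.amazonaws.com/my%20kids.csv
```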

Downloading a File from S3 using Boto3

The last thing I'll demonstrate is downloading the same S3 file object that was just uploaded. This is very similar to uploading, except you use the download_file method of the Bucket resource class.

def download_file_from_bucket(bucket_name, s3_key, dst_path):
    session = aws_session()
    s3_resource = session.resource('s3')
    bucket = s3_resource.Bucket(bucket_name)
    bucket.download_file(Key=s3_key, Filename=dst_path)

download_file_from_bucket('tci-s3-demo', 'children.csv', 'children_download.csv')
with open('children_download.csv') as fo:
    print(fo.read())

This outputs the following contents of the downloaded file.

name,age
Kallin,3
Cameron,0
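Since the downloaded object is a CSV file, the standard library's csv module can parse it into dictionaries. A minimal sketch, using an in-memory copy of the same data in place of the downloaded file:

```python
import csv
import io

# In-memory stand-in for the downloaded children_download.csv file.
data = "name,age\nKallin,3\nCameron,0\n"

# DictReader maps each row onto the header names.
rows = list(csv.DictReader(io.StringIO(data)))
for row in rows:
    print(row['name'], row['age'])
# Kallin 3
# Cameron 0
```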


Conclusion

In this How To article I have demonstrated how to set up and use the Python Boto3 library to transfer files to and from AWS S3 object storage.

For completeness here is the complete source code for the file_manager.py module that was used in this tutorial.

# file_manager.py

import os
import boto3

from dotenv import load_dotenv
load_dotenv(verbose=True)


def aws_session(region_name='us-east-1'):
    return boto3.session.Session(aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
                                aws_secret_access_key=os.getenv('AWS_ACCESS_KEY_SECRET'),
                                region_name=region_name)


def make_bucket(name, acl):
    session = aws_session()
    s3_resource = session.resource('s3')
    return s3_resource.create_bucket(Bucket=name, ACL=acl)


def upload_file_to_bucket(bucket_name, file_path):
    session = aws_session()
    s3_resource = session.resource('s3')
    file_dir, file_name = os.path.split(file_path)

    bucket = s3_resource.Bucket(bucket_name)
    bucket.upload_file(
      Filename=file_path,
      Key=file_name,
      ExtraArgs={'ACL': 'public-read'}
    )

    s3_url = f"https://{bucket_name}.s3.amazonaws.com/{file_name}"
    return s3_url


def download_file_from_bucket(bucket_name, s3_key, dst_path):
    session = aws_session()
    s3_resource = session.resource('s3')
    bucket = s3_resource.Bucket(bucket_name)
    bucket.download_file(Key=s3_key, Filename=dst_path)

 

As always, I thank you for reading and feel free to ask questions or critique in the comments section below.

 
