How To Upload and Download Files in AWS S3 with Python and Boto3

By Adam McQuistan in Python  03/27/2020

Introduction

In this How To tutorial I demonstrate how to perform file storage management with AWS S3 using Python's boto3 AWS library. Specifically, I provide examples of configuring boto3, creating S3 buckets, and uploading and downloading files to and from S3 buckets.

Creating a Boto3 User in AWS For Programmatic Access

As a first step I make a new user in AWS's management console that I'll use in conjunction with the boto3 library to access my AWS account programmatically. It's considered a best practice to create a separate, specific user for use with boto3 as it makes access easier to track and manage.

To start I enter IAM in the search bar of the services menu and select the menu item.

Following that I click the Add user button.

On the following screen I enter a username of boto3-demo, make sure only the Programmatic access item is selected, and click the Next button.

On the next screen I attach the AmazonS3FullAccess permission policy, then click the Next button.

Then I click Next until the credentials screen is shown, as seen below. On this screen I click the Download .csv button. I will need these credentials to configure Boto3 so it can access my AWS account programmatically.

Installing Boto3

Before writing any Python code I must install the AWS Python library named Boto3, which I will use to interact with the AWS S3 service. To accomplish this I set up a Python 3 virtual environment, as I feel that is a best practice for any new project regardless of size and intent.

$ mkdir aws_s3
$ cd aws_s3 
$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ pip install boto3
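
To verify the install you can print the library's version from within the virtual environment; the exact version you see will depend on when you install it.

(venv) $ python -c "import boto3; print(boto3.__version__)"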

Configuring Boto3 and Boto3 User Credentials

With the boto3-demo user created and the Boto3 package installed I can now set up the configuration to enable authenticated access to my AWS account.  There are a few different ways to handle this and the one I like best is to store the access key id and secret access key values as environment variables, then use the Python os module from the standard library to feed them into the boto3 library for authentication.  There is a handy Python package called python-dotenv which allows you to put environment variables in a file named .env and then load them into your Python source code, so I'll begin this section by installing it.

(venv) $ pip install python-dotenv

Following this I make a .env file and place the two variables in it as shown below. Obviously, you'll want to substitute your own values, which you downloaded in the earlier step when creating the boto3 user in the AWS console.

AWS_ACCESS_KEY_ID=MYACCESSKEYID
AWS_ACCESS_KEY_SECRET=MYACCESSKEYSECRET

Next I make a Python module named file_manager.py; inside it I import the os and boto3 modules as well as the load_dotenv function from the python-dotenv package.  Following that I call the load_dotenv() function, which will automatically find a .env file in the same directory and read the variables into the environment, making them accessible via the os module.

Then I create a function named aws_session(...) that reads the environment variables with the os.getenv(...) function and returns an authenticated Session object.

# file_manager.py

import os
import boto3

from dotenv import load_dotenv
load_dotenv(verbose=True)

def aws_session(region_name='us-east-1'):
    return boto3.session.Session(aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
                                aws_secret_access_key=os.getenv('AWS_ACCESS_KEY_SECRET'),
                                region_name=region_name)

I will then use this session object to interact with the AWS platform via a high-level abstraction Boto3 provides known as the AWS Resource. When used in conjunction with my aws_session() function I can create an S3 resource like so.

session = aws_session()
s3_resource = session.resource('s3')
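
As a quick sanity check that the credentials are being picked up, you can list the buckets the account can see using the resource's buckets collection. This is purely an optional verification step and not part of the file_manager.py module.

# optional sanity check: iterate the account's buckets to confirm authentication works
for bucket in s3_resource.buckets.all():
    print(bucket.name)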

Creating an S3 Bucket Programmatically with Boto3

I can now move on to making a publicly readable bucket, which will serve as the top-level container for file objects within S3.  I will do this inside a function named make_bucket, as shown below.

def make_bucket(name, acl):
    session = aws_session()
    s3_resource = session.resource('s3')
    return s3_resource.create_bucket(Bucket=name, ACL=acl)

s3_bucket = make_bucket('tci-s3-demo', 'public-read')

The key point to note here is that I've used the Resource class's create_bucket method to create the bucket, passing it a string name that conforms to AWS naming rules along with an ACL parameter, a string representing an Access Control List policy, which in this case allows public reading.
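
One thing to be aware of is that the call above works as written against the default us-east-1 region; for other regions S3 expects a LocationConstraint to be supplied. A minimal sketch of a region-aware variant is shown below, with us-west-2 used purely as an example region.

def make_bucket_in_region(name, acl, region='us-west-2'):
    # sketch: regions other than us-east-1 require an explicit LocationConstraint
    session = aws_session(region_name=region)
    s3_resource = session.resource('s3')
    return s3_resource.create_bucket(
        Bucket=name,
        ACL=acl,
        CreateBucketConfiguration={'LocationConstraint': region}
    )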

Uploading a File to S3 Using Boto3

At this point I can upload files to this newly created bucket using the Boto3 Bucket resource class. Below is a demo file named children.csv that I'll be working with.

name,age
Kallin,3
Cameron,0

In keeping with the good practice of reusability, I'll again make a function to upload files given a bucket name and file path, as shown below.

def upload_file_to_bucket(bucket_name, file_path):
    session = aws_session()
    s3_resource = session.resource('s3')
    file_dir, file_name = os.path.split(file_path)

    bucket = s3_resource.Bucket(bucket_name)
    bucket.upload_file(
      Filename=file_path,
      Key=file_name,
      ExtraArgs={'ACL': 'public-read'}
    )

    s3_url = f"https://{bucket_name}.s3.amazonaws.com/{file_name}"
    return s3_url

s3_url = upload_file_to_bucket('tci-s3-demo', 'children.csv')
print(s3_url) # https://tci-s3-demo.s3.amazonaws.com/children.csv

Here I use the Bucket resource class's upload_file(...) method to upload the children.csv file. The parameters to this method can be a little confusing, so let me explain them. First there is the Filename parameter, which is actually the path to the file you wish to upload; then there is the Key parameter, which is a unique identifier for the S3 object and must conform to AWS object naming rules, similar to S3 bucket names.

The upload_file_to_bucket(...) function uploads the given file to the specified bucket and returns the AWS S3 resource url to the calling code.
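
Since the Key is just a string, it can also include a prefix to organize objects into folder-like paths within the bucket. The sketch below is a small variation on upload_file_to_bucket(...), where the function name and the reports prefix are only illustrative examples.

def upload_file_with_prefix(bucket_name, file_path, prefix='reports'):
    # variation on upload_file_to_bucket: store the object under a key prefix
    session = aws_session()
    s3_resource = session.resource('s3')
    _, file_name = os.path.split(file_path)
    s3_key = f"{prefix}/{file_name}"

    bucket = s3_resource.Bucket(bucket_name)
    bucket.upload_file(Filename=file_path, Key=s3_key, ExtraArgs={'ACL': 'public-read'})
    return f"https://{bucket_name}.s3.amazonaws.com/{s3_key}"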

Uploading In-Memory Data to S3 using Boto3

While uploading a file that already exists on the filesystem is a very common use case when writing software that utilizes S3 object-based storage, there is no need to write a file to disk for the sole purpose of uploading it to S3. You can instead upload any byte-serialized data directly using the put(...) method on a Boto3 Object resource.

Below I show another reusable function that takes bytes data, a bucket name, and an S3 object key, then uploads the data and saves it to S3 as an object.

def upload_data_to_bucket(bytes_data, bucket_name, s3_key):
    session = aws_session()
    s3_resource = session.resource('s3')
    obj = s3_resource.Object(bucket_name, s3_key)
    obj.put(ACL='private', Body=bytes_data)

    s3_url = f"https://{bucket_name}.s3.amazonaws.com/{s3_key}"
    return s3_url


data = [
  'My name is Adam',
  'I live in Lincoln',
  'I have a beagle named Doc Holiday'
]
bytes_data = '\n'.join(data).encode('utf-8')
s3_url = upload_data_to_bucket(bytes_data, 'tci-s3-demo', 'about.txt')
print(s3_url) # 'https://tci-s3-demo.s3.amazonaws.com/about.txt'
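
A related option worth knowing about is the Bucket class's upload_fileobj(...) method, which accepts any file-like object opened in binary mode rather than raw bytes. Below is a brief sketch of uploading the same data that way with io.BytesIO; the function name and the about-fileobj.txt key are just illustrative.

import io

def upload_fileobj_to_bucket(fileobj, bucket_name, s3_key):
    # sketch: upload_fileobj streams the contents of a binary file-like object
    session = aws_session()
    s3_resource = session.resource('s3')
    bucket = s3_resource.Bucket(bucket_name)
    bucket.upload_fileobj(fileobj, s3_key)
    return f"https://{bucket_name}.s3.amazonaws.com/{s3_key}"

s3_url = upload_fileobj_to_bucket(io.BytesIO(bytes_data), 'tci-s3-demo', 'about-fileobj.txt')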

Downloading a File from S3 using Boto3

Next I'll demonstrate downloading the same children.csv S3 file object that was just uploaded. This is very similar to uploading except you use the download_file method of the Bucket resource class.

def download_file_from_bucket(bucket_name, s3_key, dst_path):
    session = aws_session()
    s3_resource = session.resource('s3')
    bucket = s3_resource.Bucket(bucket_name)
    bucket.download_file(Key=s3_key, Filename=dst_path)

download_file_from_bucket('tci-s3-demo', 'children.csv', 'children_download.csv')
with open('children_download.csv') as fo:
    print(fo.read())

This outputs the following contents from the downloaded file.

name,age
Kallin,3
Cameron,0
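
In real code you will often want to handle the case where a requested key does not exist; boto3 raises a botocore.exceptions.ClientError in that situation. The sketch below is one way download_file_from_bucket(...) could be adapted to guard against a missing object; the function name is just illustrative.

from botocore.exceptions import ClientError

def safe_download_file(bucket_name, s3_key, dst_path):
    # sketch: return True on success, False when the object is not found
    session = aws_session()
    s3_resource = session.resource('s3')
    bucket = s3_resource.Bucket(bucket_name)
    try:
        bucket.download_file(Key=s3_key, Filename=dst_path)
        return True
    except ClientError as err:
        # a missing key typically surfaces as a 404 / NoSuchKey error code
        if err.response['Error']['Code'] in ('404', 'NoSuchKey'):
            return False
        raise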

Downloading a File from S3 to Memory using Boto3

There will likely be times when you are downloading S3 object data only to immediately process and then throw away, without ever needing to save the data locally. Downloading data in this way still requires some sort of file-like object in binary mode but, luckily, the Python standard library provides the BytesIO class from the io module, which handles in-memory streams like this.

To download the S3 object data in this way you will want to use the download_fileobj(...) method of the S3 Object resource class, as demonstrated below by downloading the about.txt file that was uploaded from in-memory data previously.

def download_data_from_bucket(bucket_name, s3_key):
    session = aws_session()
    s3_resource = session.resource('s3')
    obj = s3_resource.Object(bucket_name, s3_key)
    io_stream = io.BytesIO()
    obj.download_fileobj(io_stream)

    io_stream.seek(0)
    data = io_stream.read().decode('utf-8')

    return data


about_data = download_data_from_bucket('tci-s3-demo', 'about.txt')
print(about_data)

This prints the following data.

My name is Adam
I live in Lincoln
I have a beagle named Doc Holiday
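
An alternative to download_fileobj(...) worth mentioning is the Object class's get(...) method, which returns a dictionary whose Body entry is a streaming, file-like object you can read from directly. A minimal sketch follows; the function name is just illustrative.

def read_object_body(bucket_name, s3_key):
    # sketch: get() returns a dict whose 'Body' is a streaming file-like object
    session = aws_session()
    s3_resource = session.resource('s3')
    obj = s3_resource.Object(bucket_name, s3_key)
    return obj.get()['Body'].read().decode('utf-8')

print(read_object_body('tci-s3-demo', 'about.txt'))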


Conclusion

In this How To article I have demonstrated how to set up and use the Python Boto3 library to manage files, transferring them to and from AWS S3 object storage.

For completeness, here is the full source code for the file_manager.py module that was used in this tutorial.

# file_manager.py

import os
import boto3
import io

from dotenv import load_dotenv
load_dotenv(verbose=True)


def aws_session(region_name='us-east-1'):
    return boto3.session.Session(aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
                                aws_secret_access_key=os.getenv('AWS_ACCESS_KEY_SECRET'),
                                region_name=region_name)


def make_bucket(name, acl):
    session = aws_session()
    s3_resource = session.resource('s3')
    return s3_resource.create_bucket(Bucket=name, ACL=acl)


def upload_file_to_bucket(bucket_name, file_path):
    session = aws_session()
    s3_resource = session.resource('s3')
    file_dir, file_name = os.path.split(file_path)

    bucket = s3_resource.Bucket(bucket_name)
    bucket.upload_file(
      Filename=file_path,
      Key=file_name,
      ExtraArgs={'ACL': 'public-read'}
    )

    s3_url = f"https://{bucket_name}.s3.amazonaws.com/{file_name}"
    return s3_url


def download_file_from_bucket(bucket_name, s3_key, dst_path):
    session = aws_session()
    s3_resource = session.resource('s3')
    bucket = s3_resource.Bucket(bucket_name)
    bucket.download_file(Key=s3_key, Filename=dst_path)
 

def upload_data_to_bucket(bytes_data, bucket_name, s3_key):
    session = aws_session()
    s3_resource = session.resource('s3')
    obj = s3_resource.Object(bucket_name, s3_key)
    obj.put(ACL='private', Body=bytes_data)

    s3_url = f"https://{bucket_name}.s3.amazonaws.com/{s3_key}"
    return s3_url


def download_data_from_bucket(bucket_name, s3_key):
    session = aws_session()
    s3_resource = session.resource('s3')
    obj = s3_resource.Object(bucket_name, s3_key)
    io_stream = io.BytesIO()
    obj.download_fileobj(io_stream)

    io_stream.seek(0)
    data = io_stream.read().decode('utf-8')

    return data

 

As always, I thank you for reading and feel free to ask questions or critique in the comments section below.
