Blog | The Coding Interface

Serverless Conversions From GZip to Parquet Format with Python AWS Lambda and S3 Uploads

In this article I demonstrate using a Python-based AWS Lambda SAM project with the AWS Data Wrangler Lambda Layer to convert GZipped JSON files into Parquet format upon an S3 upload event.

By Adam McQuistan on 04/05/2021

AWS Serverless Application Model AWS AWS-Lambda AWS S3 Python

How To Use Window Functions in SQL

Data Engineering

When it comes to quantitative analysis of data in database tables, standard SQL provides a set of aggregate functions like SUM(), MAX(), and MIN(). There are two main ways these functions get used in practice: (i) collapsing the table data down to the aggregate result set, or (ii) presenting the aggregate calculation per row while maintaining the granularity of the complete table. Window functions accomplish this second option and will be the focus of this article.
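
The two usages can be sketched briefly. This is a minimal illustration, not the article's own example: it assumes SQLite (3.25 or newer, which supports window functions) through Python's built-in sqlite3 module rather than PostgreSQL, and the sales table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (region, amount) rows invented for illustration.
rows = [("east", 100), ("east", 250), ("west", 75), ("west", 125), ("west", 300)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# (i) GROUP BY collapses the table to one row per region.
collapsed = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# (ii) A window function keeps every row and attaches the per-region total.
windowed = conn.execute(
    "SELECT region, amount, SUM(amount) OVER (PARTITION BY region) "
    "FROM sales ORDER BY region, amount"
).fetchall()

print(collapsed)  # one row per region
print(windowed)   # one row per original sale, each carrying its region total
conn.close()
```

The GROUP BY query returns two rows, while the OVER (PARTITION BY region) query returns all five rows with the regional sum repeated alongside each sale.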

By Adam McQuistan on 03/29/2021

Data Engineering PostgreSQL Databases

Intro to Machine Learning with Spammy Emails, Python, and SciKit Learn

Machine Learning

Machine Learning is capturing significant attention among technologists and innovators due to a desire to shift from descriptive analytics, focused on understanding what happened in the past, towards predicting what is likely to occur in the future and prescribing actions to take in response to that prediction. In this article I focus on the use case of classifying email messages as either spam or ham with supervised machine learning using Python and SciKit Learn.

By Adam McQuistan on 03/19/2021

SciKit Learn MachineLearning Python

Managing S3 Data Store Partitions with AWS Glue Crawlers and Glue Partitions API using Boto3 SDK

Data Engineering

In this article I dive into partitions for S3 data stores within the context of the AWS Glue Metadata Catalog, covering how they can be recorded using Glue Crawlers as well as the Glue API with the Boto3 SDK.

By Adam McQuistan on 03/16/2021

AWS Glue AWS AWS S3 Python

Example Driven High Level Overview of Spark with Python

Data Engineering

In this article I give a high-level, example-driven overview of writing data processing programs using the Python programming language bindings for Spark, commonly known as PySpark. I specifically cover the Spark SQL DataFrame API, which I've found to be the most useful way to write data analytics code with PySpark. The target audience for this article is Python developers, ideally those who have a cursory understanding of other popular PyData Stack libraries such as Pandas and NumPy.

By Adam McQuistan on 03/12/2021

PySpark Data Engineering Python

Terraform for EC2 and Elastic Block Storage: Provisioning, Attaching and Mounting EBS on Linux

DevOps

In this article I demonstrate how to provision Linux-based EC2 Virtual Private Servers along with Elastic Block Storage (EBS) Volumes using Terraform. Then I complete the use case by demonstrating how to mount the EBS volume devices on Linux with XFS filesystems.

By Adam McQuistan on 03/08/2021

AWS sysadmin Ubuntu Linux DevOps

Introduction to Redshift using Pagila Sample Dataset Including ETL from Postgres using AWS Glue

Data Engineering

In this article I give a practical introductory tutorial on using Amazon Redshift as an OLAP Data Warehouse solution for the popular Pagila Movie Rental dataset. I start with a basic overview of the architecture Redshift uses to serve as a scalable and robust enterprise cloud data warehouse. Then, armed with this basic knowledge of Redshift architecture, I move on to a practical example of designing a schema optimal for Redshift based on the Pagila sample dataset.

By Adam McQuistan on 03/05/2021

AWS Glue Redshift Data Engineering AWS AWS S3 psql PostgreSQL Databases DevOps Python

Building Data Lakes in AWS with S3, Lambda, Glue, and Athena from Weather Data

Data Engineering

In this article I cover creating a rudimentary Data Lake on AWS S3 filled with historical Weather Data consumed from a REST API. The S3 Data Lake is populated using traditional serverless technologies like AWS Lambda, DynamoDB, and EventBridge rules along with several modern AWS Glue features such as Crawlers, ETL PySpark Jobs, and Triggers.

By Adam McQuistan on 02/25/2021

PySpark AWS Glue Data Engineering AWS Serverless Application Model AWS AWS-Lambda DevOps Python

Exploring Online Analytical Processing Databases plus Extract, Transform, and Load in PostgreSQL

Data Engineering

In this article I give an introduction to Online Analytical Processing databases, comparing them against traditional Online Transaction Processing systems. Emphasis is put on designing and building Star Schemas and Reporting tables using Data Engineering processes like Extract, Transform, and Load, all within an Aurora PostgreSQL database.

By Adam McQuistan on 02/17/2021

Data Engineering AWS PostgreSQL Databases DevOps

Keeping Python AWS Serverless Apps DRY with Lambda Layers

Serverless

In this article I demonstrate how to utilize Lambda Layers to share reusable code among multiple Python AWS Lambda functions within AWS Serverless Application Model (SAM) applications.

By Adam McQuistan on 02/11/2021

AWS Serverless Application Model AWS AWS-Lambda BeautifulSoup requests DevOps

Favorites

OAuth 2.0 and Open ID Connect Cheat Sheet

How To Construct an OpenCV Mat Object from C++ Arrays and Vectors

Implementing a Serverless Flask REST API using AWS SAM

How To Use Window Functions in SQL

JavaFX with Gradle, Eclipse, Scene Builder and OpenJDK 11: Java Coded Components

Aurora PostgreSQL Slow Query Logging and CloudWatch Alarms via AWS CDK

Setting Up OpenCV for C++ using CMake and VS Code on Mac OS

Django Authentication Part 1: Sign Up, Login, Logout

How To Upload and Download Files in AWS S3 with Python and Boto3

Bridging Node.js and Python with PyNode to Predict Home Prices

Building a Text Analytics App in Python with Flask, Requests, BeautifulSoup, and TextBlob

Django Authentication Part 2: Object Permissions with Django Guardian
