Providing quality software engineering content in the form of tutorials, applications, services, and commentary suited for developers.
In this article I demonstrate using a Python based AWS Lambda SAM project with the AWS Data Wrangler Lambda Layer to perform data format translation from GZipped JSON files into Parquet upon an S3 upload event.
When it comes to quantitative analysis on data in database tables standard SQL provides a set of aggregate functions like SUM(), MAX(), and MIN(). There are two main ways these functions get used in practice: (i) collapsing the table data down to represent the aggregate calculation result set or, (ii) presenting the aggregate calculation per row maintaining the granularity of the complete table. Windowing functions are used to accomplish this second option and will the focus of this article.
Machine Learning is capturing significant attention among technologists and innovators due to a desire to shift from descriptive analytics focused on understanding what happened in the past towards predicting what is likely to occur in the future as well as prescribe actions to take in response to that prediction. In this article I focus on the use case of classifying email messages as either spam or ham with supervised machine learning using Python and SciKit Learn.
In this article I dive into partitions for S3 data stores within the context of the AWS Glue Metadata Catalog covering how they can be recorded using Glue Crawlers as well as the the Glue API with the Boto3 SDK.
In this article I give a high level, example driven, overview of writing data processing programs using the Python programming language bindings for Spark which is commonly known as PySpark. I specifically cover the Spark SQL DataFrame API which I've found to be the most useful way to write data analytics code with PySpark. The target audience for this article are Python developers, ideally who have a cursory understanding of other popular PyData Stack libraries such as Pandas and Numpy.
In this article I demonstrate how to provision Linux based EC2 Virtual Private Servers along with Elastic Block Storage (EBS) Volumes using Terraform. Then I complete the use case by demonstrating how to mount the EBS volume devices on Linux XFS filesystems.