Providing quality software engineering content in the form of tutorials, applications, services, and commentary suited for developers.
PyFlink is the Python API for Apache Flink, which lets you develop batch and stream data processing pipelines on modern distributed computing architectures.
In this How To article I demonstrate using Terraform to provision an AWS EMR cluster and to establish read-only S3 bucket access for consuming data from another AWS account.
In this article I review key characteristics and functionality of Apache Hive and show how you can use Amazon Elastic MapReduce (EMR) to provision an Apache Hive cluster for experimentation, big data processing, and analytics.
In this How To article I demonstrate how to use the AWS CLI to create an Amazon Elastic MapReduce (EMR) cluster, along with some common supplementary resources for experimentation and development on that cluster.
Here I present an end-to-end example of a serverless, event-driven architecture using Confluent Cloud for stream processing paired with AWS Lambda for event-responsive logic, built with the Serverless Application Model (SAM) framework. Together these components form a fictitious email alerting system for financial stock quotes.
In this How To article I demonstrate setting up a Docker Compose based implementation of the Community Components of the Confluent Platform, complete with the kafka-connect-datagen plugin for Kafka Connect, to generate test and/or development data useful for working with Kafka.