Providing quality software engineering content in the form of tutorials, applications, services, and commentary suited for developers.
In this article I demonstrate how to use Python to perform rudimentary topic modeling and identification with the help of the Gensim and Natural Language Toolkit (NLTK) libraries.
In this article I go over how to use the Apache Flink Table API in Python to consume data from and write data to a Confluent Community Platform Apache Kafka cluster running locally in Docker.
PyFlink is the Python API for Apache Flink, which allows you to develop batch and stream data processing pipelines on modern distributed computing architectures.
In this article I demonstrate how to use Terraform to provision an AWS EMR cluster and establish read-only S3 bucket access for consuming data from another AWS account.
In this article I review key characteristics and functionality of Apache Hive and how you can use Amazon Elastic MapReduce (EMR) to provision an Apache Hive cluster for experimentation, big data processing, and analytics.
In this how-to article I demonstrate how to use the AWS CLI to create an Amazon Elastic MapReduce (EMR) cluster along with some common supplementary resources for experimentation and development on an EMR cluster.