I have been working with Amazon Kinesis Data Firehose for over a year, and I'm pretty happy with the way it works. In a nutshell, it's a service that writes data to Amazon S3 with custom transformation and buffering rules. My current use case is simple -- write events into Amazon S3 for further processing with Apache Spark.

Unfortunately, the more events I have, the more small files land on S3, and the slower processing with Spark becomes.

That is where Fireblender comes in. The idea of the project is simple -- given a time range, join all data files in that range into bigger chunks to allow faster processing, and return a URL to the result.
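A rough sketch of the planning step in Python, not the actual Fireblender code. It assumes the objects sit under Firehose's default `YYYY/MM/dd/HH/` UTC key prefix and that their keys and sizes have already been listed. The names `hour_of_key` and `plan_chunks` and the `target_bytes` parameter are made up for illustration.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple


def hour_of_key(key: str) -> Optional[datetime]:
    """Parse a leading YYYY/MM/dd/HH/ prefix (assumed key layout), else None."""
    parts = key.split("/")
    if len(parts) < 5:
        return None
    try:
        year, month, day, hour = (int(p) for p in parts[:4])
        return datetime(year, month, day, hour, tzinfo=timezone.utc)
    except ValueError:
        return None


def plan_chunks(
    objects: Iterable[Tuple[str, int]],  # (key, size in bytes)
    start: datetime,
    end: datetime,
    target_bytes: int,
) -> List[List[str]]:
    """Group keys whose hour falls in [start, end) into chunks of about target_bytes."""
    selected = []
    for key, size in objects:
        ts = hour_of_key(key)
        if ts is not None and start <= ts < end:
            selected.append((key, size))
    # Zero-padded prefixes make lexicographic order chronological.
    selected.sort()

    chunks: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for key, size in selected:
        # Close the current chunk before it would exceed the target size.
        if current and current_size + size > target_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(key)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Each inner list would become one bigger output file. Keys that do not match the assumed prefix are skipped.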

I have started this project with a sample data generator called fireblender-datagen. It will be extended with additional data sets and generation strategies.

For the main part of Fireblender, I want to start simple: focus on binary file concatenation and run all the code from AWS Lambda. The first step is choosing the underlying technology -- I have limited my choice to Python, C#, and Go as programming languages. I'm going to test raw file processing performance first and then repeat the same operations with S3 integration.
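For the raw file processing test, the operation to measure could look roughly like this in Python. It is a minimal sketch for local files only, with no S3 involved. The function name `concat_files` and the 1 MiB buffer size are assumptions, not a settled design.

```python
import shutil
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def concat_files(
    sources: Iterable[PathLike],
    destination: PathLike,
    buffer_size: int = 1024 * 1024,  # assumed 1 MiB copy buffer
) -> int:
    """Append every source file to destination byte for byte; return bytes written."""
    with open(destination, "wb") as out:
        for source in sources:
            with open(source, "rb") as src:
                # Stream in fixed-size blocks so large files never sit fully in memory.
                shutil.copyfileobj(src, out, buffer_size)
        return out.tell()
```

Plain byte concatenation only makes sense for formats that tolerate it, such as newline-delimited records. Columnar formats like Parquet would need a real merge.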


Starting a blog is a big commitment, and that's why this is not a blog.

This page is a personal log of projects, ideas, and experiments that I'm working on. Some entries might be valuable, while others might be completely useless.

This page is built with Next.js, MDX, and Tailwind CSS (inspired by Josh Comeau). I'm far from being an expert in those technologies, and this is an opportunity to get better.

Topics on this page will include software engineering, data processing, 3D printing, woodworking, and other stuff that I find interesting.