Why?

For several reasons, personal and professional as well as technical.

  • To apply all the Data skills I acquired at Oktopus S.A. and then at Edda Luxembourg S.A. after our merger.

  • To demonstrate my Infra/DevOps skills, which I’ve never had the chance to apply in a professional project.

  • To serve as a demonstration and training material for all the tools presented.

As mentioned, the platform in this series of articles serves as a demonstration, with minimal configuration, to show that you can develop on your machine before deploying to a dev/test/quality/prod environment.

Another post, following this series, will showcase a secure data platform with centralized access rights management.

How?

Speaking of tools, I will present and use the following technologies:

Kubernetes
  • Kind, to set up a k8s dev/demo cluster

  • Helm, to deploy pre-packaged applications into this cluster

  • Helmfile, to store the configuration of Helm releases

Infra
  • PostgreSQL, a classic and performant database

  • MinIO, an on-premises, S3-compatible object store that replaces Amazon S3

  • Cert-manager, to automatically issue SSL/TLS certificates, and Trust-manager, to automatically distribute the trust chain

  • Traefik, a reverse-proxy with dynamic configuration

  • A container image registry, useful for some of the tools

Data

The choices made for this platform clearly favor

  • the use of containers and Kubernetes clusters, to facilitate deployment and scalability of applications

  • open-source or semi open-source tools, to ensure flexibility and scalability

  • a lakehouse architecture, to separate storage from compute resources, improving the performance and scalability of analytical workloads

Use cases

To demonstrate this platform, I’ll use two use cases. The first takes movie data from IMDb, using freely available CSV files. The second uses structured data recording the status of Chargy-brand EV charging stations in Luxembourg. I have been storing snapshots every 5 minutes for a year and a half. The Postgres data I’m providing covers only one month, as the whole dataset is too big to store in a publicly available Docker image.
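To give a rough sense of the volume behind the Chargy dataset, here is a minimal sketch that counts snapshots taken at a 5-minute interval. The 30-day month, the 18-month approximation of "a year and a half", and the helper name `snapshots_in` are assumptions for illustration only. The real row count also depends on how many stations each snapshot covers, which is not stated here.

```python
from datetime import timedelta

# Interval between two snapshots, as described in the text.
SNAPSHOT_INTERVAL = timedelta(minutes=5)


def snapshots_in(period: timedelta, interval: timedelta = SNAPSHOT_INTERVAL) -> int:
    """Return how many whole snapshot intervals fit in the given period."""
    return period // interval


# Assumption: a month is 30 days, and a year and a half is 18 such months.
one_month = snapshots_in(timedelta(days=30))
full_history = snapshots_in(timedelta(days=30 * 18))

print(one_month)     # 8640
print(full_history)  # 155520
```

Under these assumptions, the one-month extract holds about one eighteenth of the snapshots in the full history, which is why it fits more comfortably in a public image.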

Overview of my data platform

My data platform

Let’s get started!

Links to the series' posts