Interview: Our CTO Fritz on 'Big Data' and Scalable Architectures
Fritz Richter is co-founder and CTO at adsquare. He’s responsible for platform development and leads the technology and data science departments. A born-and-bred Berliner, he is considered a guru in the field of scalable backend architectures and is an expert when it comes to big data, which is exactly why we sat down with him to pick his brains. Here are Fritz’s insights on what he and his team do best:
Describe adsquare in 3 sentences
adsquare is Europe’s leading data provider for mobile programmatic advertising. Our platform supercharges data-driven targeting. With our solution, advertisers and agencies can leverage data to reach their desired audiences and meet campaign goals, while publishers and third-party providers can onboard and monetize their data.
What are the main technical challenges your platform is facing?
There are quite a few! First of all, programmatic advertising is a low-latency environment. We enrich the so-called bid stream with our data in real time. We need to be extremely fast here, so our servers have to respond within 5 milliseconds, always.
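To make the 5-millisecond requirement concrete, here is a minimal Python sketch of enriching a bid request under a hard deadline: if the data lookup does not finish in time, the request goes back unenriched rather than late. The field names (`location`, `segments`), the `lookup` callable and the fallback behaviour are hypothetical illustrations, not adsquare's actual implementation, and a thread pool simply stands in for whatever concurrency model the real Java/Scala services use.

```python
import concurrent.futures
import time

# Latency budget from the interview: 5 ms per response.
BUDGET_SECONDS = 0.005

# Shared worker pool so each request does not pay thread start-up cost.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def enrich(bid_request, lookup, budget=BUDGET_SECONDS):
    """Attach data segments to a bid request, or return it unenriched if the lookup is too slow.

    `bid_request` (a dict with a hypothetical "location" key) and `lookup` are
    illustrative assumptions, not a real adsquare API.
    """
    future = _executor.submit(lookup, bid_request["location"])
    try:
        segments = future.result(timeout=budget)
    except concurrent.futures.TimeoutError:
        # Best effort only: an already running thread cannot be interrupted,
        # but the caller is answered within the budget.
        future.cancel()
        return dict(bid_request, segments=[])
    return dict(bid_request, segments=segments)
```

Answering without data is one possible design choice here: in a real-time bidding flow, a late answer is usually no more useful than an empty one.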
Secondly, our partners operate on a global scale, so to meet the latency requirements we run our services in more than 6 data centres around the world. These are located in the USA, the UK, the Netherlands, Canada, France and Germany.
Last but not least, we are in every sense of the word a “Big Data company”. We process tens of thousands of requests every single second. We divide the physical world into multi-dimensional 50×50 metre squares, attach our local context data to them and enrich them with insights about consumers’ real-world behaviour. This results in billions of data points, which we access in memory. We process 200 GB of data every day, and this number is growing continuously.
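One simple way to picture the 50×50 metre grid is a function that maps a latitude/longitude pair to a (row, column) cell index. The sketch below assumes an equirectangular approximation (roughly 111,320 metres per degree of latitude, with longitude scaled by the cosine of the row's latitude). The interview does not describe adsquare's actual tiling scheme, which could just as well rely on a spatial indexing system such as geohash, S2 or H3.

```python
import math

CELL_SIZE_M = 50.0
# Approximate metres per degree of latitude (varies slightly with latitude).
METERS_PER_DEG_LAT = 111_320.0


def grid_cell(lat, lon, cell_size=CELL_SIZE_M):
    """Map a WGS84 coordinate to an approximately cell_size x cell_size metre cell (row, col).

    Assumption: a flat equirectangular approximation, which is adequate for small
    cells away from the poles but is not a precise geodesic grid.
    """
    row = math.floor(lat * METERS_PER_DEG_LAT / cell_size)
    # Use the latitude at the centre of the row so every point in one row
    # shares the same longitude scale and cell boundaries stay consistent.
    row_center_lat = (row + 0.5) * cell_size / METERS_PER_DEG_LAT
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(row_center_lat))
    col = math.floor(lon * meters_per_deg_lon / cell_size)
    return row, col
```

The resulting tuple works directly as a key for an in-memory lookup table, e.g. a dict such as `context = {grid_cell(52.5200, 13.4050): {...}}`, where the attached context data is whatever the platform stores per square.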
Let’s talk about your tech stack
We run a fault-tolerant and highly distributed software architecture. The core of our platform is developed in Java and Scala. Our data science team uses Python and Spark to analyze data and create new algorithms. We decided to encapsulate business logic in microservices, which communicate via JSON REST APIs.
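As an illustration of a microservice exposing a JSON REST API, here is a minimal sketch using only Python's standard library. The `/segments` endpoint, the `cell` query parameter and the `SEGMENTS_BY_CELL` data are hypothetical; adsquare's own services are Java/Scala Spring Boot applications, so this only shows the request/response shape, not their code. The routing logic is kept in a pure function so it can be tested without opening a socket.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory data, for illustration only.
SEGMENTS_BY_CELL = {"116930:18160": ["commuters", "coffee_lovers"]}


def route(path):
    """Pure routing logic: map a request path to (HTTP status, JSON-serializable payload)."""
    url = urlparse(path)
    if url.path != "/segments":
        return 404, {"error": "not found"}
    cell = parse_qs(url.query).get("cell", [""])[0]
    return 200, {"cell": cell, "segments": SEGMENTS_BY_CELL.get(cell, [])}


class SegmentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, payload = route(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def serve(port=8080):
    """Block and serve requests on localhost (not called automatically)."""
    HTTPServer(("127.0.0.1", port), SegmentHandler).serve_forever()
```

Running `serve()` and requesting `http://127.0.0.1:8080/segments?cell=116930:18160` would return the segments for that cell as a JSON object.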
As we have many different use cases for storing data, we follow a polyglot persistence approach. Our central data store is Cassandra, but we also use MongoDB, Postgres and MySQL for different types of data.
Because response times must be fast, we use distributed in-memory databases such as Couchbase, which are replicated across the world.
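The benefit of an in-memory layer is easiest to see in the read-through pattern: serve from memory when possible and fall back to the slower primary store only on a miss or after an entry expires. Below is a minimal single-process sketch; the `loader` function stands in for a query against a primary database such as Cassandra, and the TTL value is an arbitrary assumption. It does not model Couchbase itself or its cross-data-centre replication.

```python
import time


class ReadThroughCache:
    """In-process stand-in for an in-memory store placed in front of a slower primary database."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # hypothetical: e.g. a function querying the primary store
        self._ttl = ttl_seconds    # assumed expiry time, not taken from the interview
        self._clock = clock        # injectable clock makes expiry testable
        self._entries = {}         # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]          # fast path: served from memory
        value = self._loader(key)  # slow path: fall back to the primary store
        self._entries[key] = (value, now + self._ttl)
        return value
```

Only the first request for a key, and the first one after expiry, pays the cost of the slower store; every other request is a dictionary lookup.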
You mentioned big data – how do you deal with that?
We are a young company, so when we started we had a wide variety of tools and technologies to choose from. We are really proud that we were able to adopt so many cutting-edge technologies. Data is transmitted via Kafka, serialized with Avro, compressed with Snappy and stored as Parquet files in our HDFS cluster. Every single event is processed in our Spark Streaming cluster. We have gone far beyond single MapReduce jobs and have developed heaps of Spark and Storm applications.
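To give a feel for the kind of per-event aggregation a streaming job performs, here is a plain-Python sketch of a tumbling-window count of events per grid cell. It deliberately leaves out Kafka, Avro, Snappy, Parquet and Spark; the `(timestamp, cell)` event shape and the one-second window are assumptions for illustration. In Spark Streaming, a comparable computation could be expressed as a windowed aggregation distributed across the cluster rather than as a loop over a list.

```python
from collections import Counter, defaultdict


def micro_batch_counts(events, batch_interval_s=1.0):
    """Count events per cell in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_in_seconds, cell_id) pairs, a
    hypothetical event shape chosen for this sketch.
    Returns {window_start: Counter(cell_id -> count)} ordered by window start.
    """
    batches = defaultdict(Counter)
    for ts, cell in events:
        # Start of the window this event falls into.
        batch_start = int(ts // batch_interval_s) * batch_interval_s
        batches[batch_start][cell] += 1
    return dict(sorted(batches.items()))
```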
Which DevOps tools are you using?
We try to automate as much as possible. Servers are configured via Puppet, and microservices are deployed as standalone Spring Boot applications with an embedded Jetty servlet container. Services register themselves via service discovery (Consul).
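Self-registration with Consul goes through the local agent's HTTP API (`PUT /v1/agent/service/register`, by default on port 8500). The sketch below only builds that request with Python's standard library. The service name, address and port are hypothetical, and the health-check path is an assumption, since it depends on the Spring Boot version and configuration (for example `/health` in Spring Boot 1.x or `/actuator/health` in 2.x). In a Spring Boot service, this registration would more likely be handled by a library at startup than by hand-written code.

```python
import json
import urllib.request

# Assumption: a Consul agent running locally on its default HTTP port.
CONSUL_AGENT = "http://127.0.0.1:8500"


def registration_request(name, instance_id, address, port, health_path="/health"):
    """Build (but do not send) a PUT request registering one service instance with Consul."""
    payload = {
        "ID": instance_id,
        "Name": name,
        "Address": address,
        "Port": port,
        "Check": {
            # Consul polls this URL; the path depends on the service's configuration.
            "HTTP": f"http://{address}:{port}{health_path}",
            "Interval": "10s",
            "Timeout": "1s",
        },
    }
    return urllib.request.Request(
        f"{CONSUL_AGENT}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the result to `urllib.request.urlopen(...)` would perform the registration against a running agent; other services can then look the instance up by name instead of by a hard-coded address.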
Our continuous integration process is built on top of Git, TeamCity, Puppet and custom scripts. We have just integrated Datadog as our central monitoring system, which gives us a perfect overview of our servers and services, especially as we are now running more than 70 servers!
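Application metrics typically reach Datadog through DogStatsD, the StatsD-compatible service shipped with the Datadog Agent, which listens for plain-text UDP datagrams (by default on port 8125) of the form `metric:value|type|#tag:value`. The sketch below formats and sends such a datagram. The metric and tag names are hypothetical, and in practice one would normally use an official Datadog client library instead of hand-formatting lines.

```python
import socket


def dogstatsd_line(metric, value, metric_type="c", tags=None):
    """Format one DogStatsD datagram, e.g. 'bidstream.requests:1|c|#dc:fra'.

    metric_type: "c" counter, "g" gauge, "h" histogram, among others.
    Tags are sorted only to make the output deterministic.
    """
    line = f"{metric}:{value}|{metric_type}"
    if tags:
        line += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to a (assumed) local Datadog Agent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

Because UDP is fire-and-forget, a call such as `send_metric(dogstatsd_line("bidstream.requests", 1, tags={"dc": "fra"}))` does not wait for a reply from the agent.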
Copyright 2019 adsquare