cnu.name

Real-time log analytics using Probabilistic Data Structures in Redis

PyData Delhi 2017 • Delhi, India • October 25, 2017
View Slides →

There are two ways to solve any problem: accurately or approximately. Accurate data structures have their disadvantages: they use too much memory and do not scale to the real-time nature of the data. In this talk I explained how to take advantage of the newly released Redis 4.0, with its pluggable modules, to build a data pipeline that uses probabilistic data structures to get real-time insights.

Many different insights and metrics can be obtained from log event data. Processing that data in real time and getting accurate results is possible in theory. In practice, it is not so easy.

Not all results and metrics need to be accurate. There are places where trading some accuracy for lower memory usage and better scalability is worth it. That is where probabilistic data structures (PDS) come in. In this talk I explained different PDSs and how they work. I also talked about how to use Redis and its pluggable module system to use these data structures much more efficiently.
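
To make the accuracy-for-memory tradeoff concrete, here is a minimal sketch of one common PDS, a Bloom filter, which answers "have I seen this item before?" with no false negatives and a tunable false-positive rate. The abstract does not name the structures covered in the talk, so the choice of a Bloom filter is an assumption, as are the class name `BloomFilter`, its `capacity` and `error_rate` parameters, and the sample IP addresses. It is plain Python for illustration only, not the Redis module implementation.

```python
import hashlib
import math


class BloomFilter:
    """Approximate set membership (hypothetical illustration, not a Redis API)."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if all derived bits are set; may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Example: track which client IPs have appeared in a log stream (made-up addresses).
seen_ips = BloomFilter(capacity=10_000, error_rate=0.01)
for i in range(10_000):
    seen_ips.add(f"10.0.{i // 256}.{i % 256}")

print("10.0.0.1" in seen_ips)   # an address that was added
print(len(seen_ips.bits))       # size of the bit array in bytes
```

With these assumed parameters the filter needs roughly 12 KB for 10,000 entries, regardless of how long each entry is. The cost is that a lookup for an unseen item can occasionally return true.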

Problem: Parsing high-volume, high-velocity log event data.

Proposal Link.
