Why Your Company Needs Stream Processing
In recent years, a growing number of organizations have felt the need to respond quickly to the data flowing through their IT systems. Stream processing can meet this need.
From this article you will learn:
- the business application of stream processing.
- what stream processing is and how it works.
- what’s on offer from the systems available in the market.
Why Your Company Needs Stream Processing
Virtually every organization aspires to base its business decisions on reliable and current data. Decision makers face a dilemma: how to learn quickly that a significant event has occurred so that they can respond to it appropriately. Is it necessary to laboriously review various data sources? How about building a mechanism that notifies the parties involved when an event occurs? Or maybe the system itself will be able to identify an emergency and respond to it?
Implementing a stream processing system may be the answer to these questions. To give a good understanding of the practical possibilities such systems offer, some of their popular applications are presented below:
- Sales analysis in an online store – Let us assume that the web store exchanges messages about orders and prices with the CRM system, and about stock balances with the warehousing system (e.g., via HTTP and a REST API). Each of these message types creates a stream (order stream, price stream, balance stream). By analyzing these streams together in a correlated manner, the stream processing system can send alerts about significant events (e.g., an e-mail to the person responsible). Such events could be, for example, a sharp drop in sales over a given period or an increase in sales that causes stock to shrink too quickly.
- Failure prediction – correlating data from many different sensors in a device (e.g., a significant increase in temperature combined with voltage spikes indicates a high probability of failure).
- Identification of financial fraud – real-time analysis of transactions to identify potential fraud.
- Fleet management – the option to re-route a vehicle in the event of traffic jams or to change the delivery address (in real time, based on GPS location and data from other systems).
- Sports analytics – analyzing the dynamics of players during a soccer game and notifying the coach that a player is tired and needs to be substituted.
- Ad optimization – correlating user behavior on the website (clicks) with social media data, enabling real-time adjustment of the displayed ads.
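To make the online store scenario more concrete, here is a minimal Python sketch of how a drop in sales could be detected with a sliding time window. It is an illustration only, not how any particular product works: the `Order` and `SalesDropDetector` names, the 60-second window and the 50% threshold are all assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Order:
    timestamp: float  # seconds since an arbitrary reference point
    amount: float     # order value


class SalesDropDetector:
    """Compares sales in the latest time window with the window before it."""

    def __init__(self, window_seconds: float, drop_ratio: float):
        self.window_seconds = window_seconds
        self.drop_ratio = drop_ratio  # e.g. 0.5 = alert when sales more than halve
        self.orders = deque()         # orders from the last two windows only

    def on_order(self, order: Order) -> bool:
        """Process one order; return True if a sales drop should be reported."""
        self.orders.append(order)
        # Evict orders older than two windows; they are no longer needed.
        horizon = order.timestamp - 2 * self.window_seconds
        while self.orders and self.orders[0].timestamp <= horizon:
            self.orders.popleft()

        # Split the remaining orders into the previous and the current window.
        boundary = order.timestamp - self.window_seconds
        previous = sum(o.amount for o in self.orders if o.timestamp <= boundary)
        current = sum(o.amount for o in self.orders if o.timestamp > boundary)
        return previous > 0 and current < previous * self.drop_ratio


detector = SalesDropDetector(window_seconds=60, drop_ratio=0.5)
orders = [Order(0, 100), Order(10, 100), Order(30, 100), Order(100, 20)]
alerts = [detector.on_order(o) for o in orders]
print(alerts)  # [False, False, False, True]
```

Note that this sketch only evaluates the condition when a new order arrives. A real system would also need a timer so that a complete absence of orders is noticed as well.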
What Is Stream Processing?
To explain what stream processing is, let us start by defining a data stream. It is a set or sequence of data (messages), each describing the occurrence of an event. For example, if the event is the receipt of an order in an online store, then the sequence of messages about each order forms a data stream (also called a message stream).
Events in such a data stream may concern any area of the company’s operations. An event can be, for instance, an ATM withdrawal, a temperature measurement from a sensor in a device, a delivery of goods, a report of a car’s location (GPS), a device failure, etc. The obvious expectation in an organization is to correlate such streams, analyze them and, if necessary, act on the results.
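As a rough illustration of these ideas, the Python sketch below models an event as a small record and interleaves two streams by time so that they can be analyzed together. The `Event` fields and the `merge_streams` helper are hypothetical names chosen for this example, and the sketch assumes that each input stream is already ordered by timestamp.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    timestamp: float  # when the event occurred
    stream: str       # which stream it belongs to, e.g. "orders"
    payload: dict     # event-specific data


def merge_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily interleave several time-ordered streams into one, by timestamp."""
    return heapq.merge(*streams, key=lambda e: e.timestamp)


order_stream = [Event(1.0, "orders", {"id": 1}), Event(4.0, "orders", {"id": 2})]
gps_stream = [Event(2.0, "gps", {"lat": 52.2}), Event(3.0, "gps", {"lat": 52.3})]

merged = list(merge_streams(order_stream, gps_stream))
print([e.stream for e in merged])  # ['orders', 'gps', 'gps', 'orders']
```

Because `heapq.merge` is lazy, the same helper also works with generators that produce events one at a time rather than with complete lists.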
The traditional approach to analyzing and responding to data as it appears in an organization most often involves storing it in some resource (database, file system, etc.) and performing analyses or queries on a larger, established set of data.
However, sometimes this approach may not be sufficient (e.g., because of the response time). Stream processing involves immediately processing data “on the fly” when a certain event (or correlation of certain events) occurs.
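The following Python sketch shows what processing "on the fly" can look like for a correlation of events, using the idea of a high temperature reading occurring close in time to a voltage spike. Each reading is evaluated the moment it arrives, and only a tiny amount of state is kept, instead of storing everything and querying it later. The class name, the reading kinds and all thresholds are assumptions made for this example.

```python
from typing import Optional


class FailureCorrelator:
    """Flags a likely failure when a high temperature reading and a voltage
    spike are seen within `max_gap` seconds of each other."""

    def __init__(self, temp_limit: float, voltage_limit: float, max_gap: float):
        self.temp_limit = temp_limit
        self.voltage_limit = voltage_limit
        self.max_gap = max_gap
        self.last_hot: Optional[float] = None    # time of the last high temperature
        self.last_spike: Optional[float] = None  # time of the last voltage spike

    def on_reading(self, timestamp: float, kind: str, value: float) -> bool:
        """Handle one sensor reading as it arrives; return True to raise an alert."""
        if kind == "temperature" and value > self.temp_limit:
            self.last_hot = timestamp
        elif kind == "voltage" and value > self.voltage_limit:
            self.last_spike = timestamp
        else:
            return False  # an unremarkable reading cannot trigger an alert
        if self.last_hot is None or self.last_spike is None:
            return False  # only one of the two conditions has been seen so far
        return abs(self.last_hot - self.last_spike) <= self.max_gap


correlator = FailureCorrelator(temp_limit=90, voltage_limit=250, max_gap=5)
readings = [
    (0, "temperature", 70),   # normal temperature
    (1, "temperature", 95),   # too hot, but no voltage spike yet
    (3, "voltage", 260),      # spike 2 seconds after the hot reading
    (20, "voltage", 270),     # spike long after the hot reading
]
results = [correlator.on_reading(*r) for r in readings]
print(results)  # [False, False, True, False]
```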
There are many different terms on the web describing systems that implement this approach. They may be called SP (Stream Processing), CEP (Complex Event Processing), ESP (Event Stream Processing) or BAM (Business Activity Monitoring). Some of these terms refer to the same kind of system, while others differ in certain respects. There are also terms that describe how mature a solution of this type is, e.g., stateful or complex processing of streaming data. In this article, we will focus on the capabilities of mature solutions in this class.
What Do Vendors Offer?
There are many stream processing solutions on the market. On the one hand, there are commercial solutions, usually from renowned suppliers such as IBM, Cisco, Oracle or Microsoft. On the other hand, there are several interesting solutions available under open-source licenses that are often not inferior to their commercial counterparts. Examples include Apache Flink, Spark Streaming, Apache Samza, Apache Storm and WSO2 Analytics. I would like to present the capabilities of stream processing systems from the technical side, using the last of these products as an example.
WSO2 Analytics, as the name suggests, is a product of the integration solutions provider WSO2. A notable feature of this vendor is that it makes full versions of all its products available under an open-source license (only vendor support is paid for). Below, you will find the characteristics of selected WSO2 Analytics stream processing functionalities:
- Analysis of all kinds of correlations between different event streams using an SQL-based query language (Siddhi) with all features specific to stream processing (filters, time/quantity windows, stream merging, patterns, sequences, extensions).
- The option to persist received events (e.g., in a database) and use them in executed queries.
- High processing efficiency reaching (in the simplest cases) 900,000 messages per second, with an average latency of 0.9 milliseconds (on 2 machines with 8 vCPUs and 16 GB RAM).
- The option to define different receivers and publishers, such as HTTP, TCP, Kafka, Email, JMS, RabbitMQ, MQTT.
- The option to run the system in different modes in a high availability (HA) environment.
- Embedded specialized analytics, such as machine learning mechanisms for autonomous learning.
- Additional elements such as:
– dashboard – enabling the presentation of real-time data in the form of various charts and statistics,
– monitoring – enabling the monitoring of all processes included in the solution,
– business rules – enabling business users without technical knowledge to configure queries and rules that trigger particular actions.
- One coherent solution enabling secure access to all components (development, dashboard, monitoring, business rules) via a graphical interface accessed using a web browser.
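To give a feel for two of the stream processing features listed above, a filter and a quantity (length) window, here is a small sketch written in plain Python. It is not Siddhi syntax and does not use any WSO2 API; the function names and sample values are assumptions made purely for illustration.

```python
from collections import deque
from typing import Iterable, Iterator


def keep_above(values: Iterable[float], threshold: float) -> Iterator[float]:
    """Filter: pass on only the values greater than `threshold`."""
    return (v for v in values if v > threshold)


def length_window_avg(values: Iterable[float], size: int) -> Iterator[float]:
    """Quantity window: after each value, yield the average of the last `size` values."""
    window = deque(maxlen=size)  # the deque drops the oldest value automatically
    for v in values:
        window.append(v)
        yield sum(window) / len(window)


readings = [5, 20, 30, 8, 40]
# Chain the operators: first filter, then average over the last 2 values.
averages = list(length_window_avg(keep_above(readings, 10), size=2))
print(averages)  # [20.0, 25.0, 35.0]
```

Both functions are generators, so they consume one value at a time and can be chained in the same way that a query chains a filter and a window over a stream.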
If you see the potential in stream processing and want to design its implementation for your business, consult an expert who will help you walk the design path successfully. Consulting a specialist can be crucial, as even open-source solutions require skill and experience to launch and maintain in a production environment.