Lambda Architecture is a data processing design pattern designed for Big Data systems that need to process data in near real-time. This pattern works well for any Big Data solution, including the Internet of Things (IoT). Once device event telemetry is ingested from thousands (or even millions) of IoT devices, processing this data becomes a Big Data problem to solve. Lambda Architecture provides a technique for building a single system that can process this near real-time streaming data, while simultaneously providing the ability to store and batch process the data using more traditional data processing techniques.
Data Paths in a Lambda Architecture
At the core of the Lambda Architecture is the idea that when one or more data streams are ingested, the data will be split into two data paths. Each of these paths can receive any subset or even all the data from the input stream(s). These two paths are referred to as the Hot Path and the Cold Path.
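As a minimal sketch of this split, the snippet below sends each event to the Hot path, the Cold path, or both, based on two independent predicates. The `Event` type, the `split_stream` function, and the 80-degree threshold are hypothetical names and values chosen for illustration; they are not part of any particular framework.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Event:
    """Hypothetical telemetry event used only for this illustration."""
    device_id: str
    temperature: float


def split_stream(
    events: Iterable[Event],
    to_hot: Callable[[Event], bool],
    to_cold: Callable[[Event], bool],
) -> Tuple[List[Event], List[Event]]:
    """Return (hot, cold) event lists.

    Each predicate is evaluated independently, so an event can land on
    one path, both paths, or neither.
    """
    hot: List[Event] = []
    cold: List[Event] = []
    for event in events:
        if to_hot(event):
            hot.append(event)
        if to_cold(event):
            cold.append(event)
    return hot, cold


events = [Event("dev-1", 21.5), Event("dev-2", 88.0), Event("dev-3", 19.0)]

# Assumed policy: only unusually hot readings need immediate attention,
# while every reading is archived for later batch processing.
hot, cold = split_stream(
    events,
    to_hot=lambda e: e.temperature > 80.0,
    to_cold=lambda e: True,
)
```

Here the Cold path receives all of the data while the Hot path receives only a subset, which is one of several valid ways to divide the stream.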
Hot Path of the Lambda Architecture
The Hot Path in a Lambda Architecture is where data from the input stream(s) is processed in near real-time. This path is also called the Fast or Real-Time path. This data path is useful for all systems that need to process data and take some kind of action in real-time.
The Hot path is especially well suited to solutions that depend on real-time data processing, such as Internet of Things (IoT), or any other solution that needs to reach extremely high scalability.
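To illustrate the idea of acting on data as it arrives, here is a small sketch of a Hot path handler. It examines each reading as soon as it is seen and emits an alert without waiting for the rest of the data. The `hot_path` function, the device names, and the limit value are assumptions made for this example.

```python
from typing import Iterable, Iterator, Tuple


def hot_path(readings: Iterable[Tuple[str, float]], limit: float) -> Iterator[str]:
    """Yield an alert for each reading above the limit, one reading at a time.

    Because this is a generator, each reading is handled as it arrives
    rather than after the whole input has been collected.
    """
    for device_id, value in readings:
        if value > limit:
            yield f"ALERT {device_id}: {value} exceeds {limit}"


alerts = list(hot_path([("pump-1", 4.2), ("pump-2", 9.7)], limit=8.0))
```

In a real system the alert would more likely trigger a notification or a device command than produce a string, but the shape of the logic is the same.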
Cold Path of the Lambda Architecture
The Cold Path in a Lambda Architecture is where data from the input stream(s) is stored and then processed at some later time. This path is also called the Batch or Slow path. This data path is the more traditional path where data is ingested, then at some later time the data is processed.
The Cold path is great for all types of solutions. There is always a reason to archive data, and generate reports based on historical data. You may also need to take the data that’s been stored from a solution and feed it into a Machine Learning model to gain unforeseen insights at a later time.
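The store-now, process-later idea can be sketched as follows. Events are appended to a JSON Lines file as they arrive, and a separate batch step later reads the whole archive to build a report. The file layout, the `store` and `batch_average` functions, and the event fields are hypothetical; a production system would typically use a database or data lake instead of a local file.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict


def store(event: dict, archive: Path) -> None:
    """Append one event to the archive as a single JSON line."""
    with archive.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def batch_average(archive: Path) -> Dict[str, float]:
    """Read the whole archive and return the average value per device."""
    totals = defaultdict(lambda: [0.0, 0])  # device_id -> [sum, count]
    with archive.open(encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            bucket = totals[event["device_id"]]
            bucket[0] += event["value"]
            bucket[1] += 1
    return {device: total / count for device, (total, count) in totals.items()}


# Ingestion time: events are only stored, not analyzed.
archive = Path(tempfile.mkdtemp()) / "events.jsonl"
for e in [
    {"device_id": "a", "value": 10.0},
    {"device_id": "a", "value": 20.0},
    {"device_id": "b", "value": 5.0},
]:
    store(e, archive)

# Some later time: a batch job processes everything that was stored.
report = batch_average(archive)
```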
Hot + Cold Path
You can achieve a sort of “no compromises” solution by combining these two (Hot and Cold) data paths in a single system. The result is a solution that offers the benefits of both real-time data processing and decision making, combined with archival storage and reporting capabilities. Many modern software systems need to meet business requirements that ask for capabilities that are only possible by combining both Hot and Cold data paths in a single solution.
Older software architectures traditionally relied solely on a cold data path as the primary data ingestion pattern. These systems would ingest data in batches on a scheduled interval, such as nightly, weekly, or monthly. All reports and other types of data processing would then be generated on a scheduled interval as well. These systems always had delays in data processing, which in certain scenarios frustrated business users or customers.
Newer, more modern, software architectures look to process data and give insights to the users in real-time; or at least near real-time. The only way to implement this is to use a Hot data processing path. This enables the data to be processed in near real-time, and then reports, summaries, and actions can be delivered in near real-time as well. This eliminates the potential delays for business users and customers that may impact their overall job performance and use of the software.
When the Hot and Cold data paths are combined into a single software system, the benefits of near real-time and batch processing can be realized on the same data. Processing data in real-time can cost more than batch processing it at a later date. Additionally, not all data needs to be stored for batch processing, and not all data needs a near real-time decision or action.
Lambda Architecture Data Processing
When implementing a Lambda Architecture into any Internet of Things (IoT) or other Big Data system, the events / messages ingested will come into some kind of message Broker, and then be processed by a Stream Processor before the data is sent off to the Hot and Cold data paths. This design pattern does add some complexity to the overall solution architecture, but the benefits of both Hot and Cold data paths in a single solution are greatly advantageous.
As seen in the above diagram, the ingested data from devices or other sources is pulled into a Stream Processor that will determine what data to send to the Hot path, Cold path, or even Both paths. Once the data is sent to the Hot or Cold path, then there will be different applications or components that will be processing the data for that particular path.
There are several components that make up a Lambda Architecture. Here are the most common components that all software solutions implementing a Lambda Architecture will utilize:
- Broker – The broker is the ingestion point of data to the system. This is generally a message queue service of some kind that will receive any events / messages from the components that produce data, such as IoT Devices. The service used as a broker will need to be chosen based on the scalability requirements and data throughput necessary for this component of the solution.
- Stream Processor – This is a core component of the Lambda Architecture that routes data from the Broker to one or both of the output data paths. Generally, the stream processor will perform some kind of intermediate processing of the data before passing it on to the Hot or Cold data paths. The intermediate processing could involve aggregating summary data to pass on, or even integrating Machine Learning and other logic directly within the Stream Processor implementation.
- Real-Time (Hot) Path – This is not a component really, but rather a communication medium. Generally, data is sent directly from the Stream Processor to any Action component(s) in the solution. There are, however, times when an intermediate storage location is necessary before the data is handled by the Action component(s). For example, the data could be sent to another Message Queue or NoSQL Database first.
- Batch (Cold) Path – This is not a component really, but rather a communication medium. Generally, data is sent directly from the Stream Processor to any Storage component(s) in the solution.
- Action – These are any components that will be processing the data and possibly taking action on it in real-time. The logic for taking action could look at simple data properties, or even make calls to a Machine Learning model to make predictions that drive the action.
- Storage – These are any components that store the data for later batch processing. The storage mechanisms used could be storing the data as files on disk, or in some other database management system.
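The components above can be wired together in a small in-memory sketch. Here a `queue.Queue` from the Python standard library stands in for the Broker, a plain list stands in for Storage, and a callback stands in for the Action component. The `run_stream_processor` function, the event fields, and the alert limit are all assumptions made for illustration; a real deployment would use managed services for each role.

```python
import queue
from typing import Callable, List


def run_stream_processor(
    broker: "queue.Queue[dict]",
    action: Callable[[dict], None],
    storage: List[dict],
    alert_limit: float,
) -> int:
    """Drain the broker, routing each event to the Cold and/or Hot path.

    Returns the number of events that were sent down the Hot path.
    """
    routed_hot = 0
    while True:
        try:
            event = broker.get_nowait()
        except queue.Empty:
            return routed_hot
        # Cold path: in this sketch every event is archived.
        storage.append(event)
        # Hot path: only events above the limit need immediate action.
        if event["value"] > alert_limit:
            action(event)
            routed_hot += 1


# Broker: devices publish events here.
broker: "queue.Queue[dict]" = queue.Queue()
for value in (12.0, 95.0, 40.0):
    broker.put({"device": "sensor-1", "value": value})

storage: List[dict] = []        # Storage component (Cold path destination)
actions_taken: List[dict] = []  # Records what the Action component handled

hot_count = run_stream_processor(
    broker, action=actions_taken.append, storage=storage, alert_limit=90.0
)
```

Keeping the routing decision inside the stream processor means the Action and Storage components do not need to know about each other, which is what lets each path be scaled or replaced independently.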
Azure Services for Lambda Architecture
There are many different Microsoft Azure services that can be used for various components of a Lambda Architecture. Microsoft Azure actually offers multiple options to choose for each of the Lambda Architecture components. This multitude of options offers the flexibility to design the correct Lambda Architecture your solution requires.
For the Broker component that provides the ingestion point of events / messages into the Lambda Architecture, these are a few options for different levels of scalability and features within Microsoft Azure:
- Highly Scalable Ingestion Services
  - Azure Event Hubs
  - Azure IoT Hub
- Enterprise Messaging Services
  - Azure Service Bus (Queues / Topics)
For the Stream Processor component that does some intermediate processing of the data stream(s) and then directs data out to the different data paths, Microsoft Azure has a couple of options available, each with different features:
- Azure Stream Analytics
- Azure HDInsight Spark Streaming
For the Action component(s) that provides the compute for real-time processing of data on the Hot data path, Microsoft Azure has many options. Here are a few options available to choose from:
- Azure Functions
- Azure Logic Apps
- PaaS (Platform-as-a-Service)
  - Azure App Service WebJobs
- IaaS (Infrastructure-as-a-Service)
  - Custom code running on a Virtual Machine
For the Storage component(s) that provide storage and batch processing capabilities, Microsoft Azure has many options. Here are a few options available to choose from:
- Azure SQL Database
- Azure Cosmos DB
- Azure Data Lake
- Azure Service Bus
- Batch Processing
  - Microsoft Power BI
If you’re looking to use a Lambda Architecture for building an Internet of Things (IoT) solution, then the Broker of choice should certainly be Azure IoT Hub. Then from there, you have a few options to choose from for the other components of the architecture. If you stick with highly scalable and serverless services for a more modern architecture, then Azure Stream Analytics, Azure Functions, and Cosmos DB will likely be great choices. Here’s a diagram that shows some of these options with the IoT and Serverless options highlighted:
There are obviously MANY more services and applications that can be run in Microsoft Azure to meet the needs of the various components of a Lambda Architecture. The services and options listed above should give you some ideas of what options to research more for building your own solutions using the Lambda Architecture design pattern.