Data from sensors, such as temperature or pressure sensors, starts as analog signals and is then converted into digital streams. It is then aggregated and sent downstream for in-depth processing. Data Acquisition Systems (DAS) typically perform these conversion and aggregation functions on board the Edge Gateways. The DAS connects to the sensor network using industrial protocols such as Modbus and OPC UA, aggregates the data, converts it from analog into digital streams, and then forwards it to data processing systems. The Edge Gateway receives the data from the DAS and then, using a cloud connector, sends it to Tier 3 cloud systems for further processing.

Edge Gateway 

The Edge Gateway sits in close proximity to the sensors and actuators, and it is usually located on a customer site or on the equipment itself. For example, a wind turbine might contain hundreds of sensors and actuators that feed data into an Edge Gateway. The Edge Gateway then collects, digitizes, and forwards the data to the cloud. It may also run edge analytics, which can detect patterns and anomalies at the edge itself. Edge Gateways usually support several protocols, such as Modbus and OPC UA, to connect with the various types of industrial sensors, as can be seen in the following diagram:

Edge Gateway that connects to the device and does some basic processing before forwarding data to the cloud

The Edge Gateway can also pre-process data by applying functions for filtering and resampling. Sensors generate large volumes of data rapidly and, by pre-processing the data, the gateway can send only the relevant data to the cloud. Analog sensors, such as temperature, voltage, pressure, and vibration sensors, can generate huge, rapidly changing, continuous streams of data. As an example, a jet engine can have thousands of sensors, and these sensors can generate ~40 TB of data during aircraft takeoff or landing alone. Pre-processing data on the Edge is therefore beneficial in many cases.
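The filtering and resampling described above can be illustrated with a minimal Python sketch. It is only one possible approach, not a prescribed gateway implementation: the function names, the sample readings, the 0.5 threshold, and the 2-second window are all assumptions made for illustration.

```python
from statistics import mean

def deadband_filter(samples, threshold):
    """Keep a sample only if it differs from the last kept value by more than threshold."""
    kept = []
    last = None
    for ts, value in samples:
        if last is None or abs(value - last) > threshold:
            kept.append((ts, value))
            last = value
    return kept

def resample_mean(samples, window):
    """Downsample by averaging all values that fall into the same fixed-width time window."""
    buckets = {}
    for ts, value in samples:
        bucket_start = (ts // window) * window
        buckets.setdefault(bucket_start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]

# Hypothetical (timestamp in seconds, temperature) readings
readings = [(0, 20.0), (1, 20.1), (2, 20.05), (3, 22.0), (4, 22.1), (5, 25.0)]

filtered = deadband_filter(readings, threshold=0.5)   # drops small fluctuations
resampled = resample_mean(readings, window=2)         # one averaged value per 2 s
```

Here the deadband filter forwards three of the six readings, and the resampler reduces them to one value per window. Which of the two is appropriate depends on whether the cloud needs significant changes or a regular, lower-rate series.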

Edge Gateways can build on basic gateway functionality by adding capabilities such as security, analytics, malware protection, and data management services, which are collectively known as fog computing. Fog computing enables the analysis of data streams in near real time by running machine learning algorithms directly on the Edge Gateway. The Edge Gateway is a natural junction, with enough compute, storage, and networking resources to mimic cloud capabilities at the edge, and hence to support the local ingestion of data and the quick turnaround of results:

Fog computing versus edge computing versus cloud computing aspects


Edge Gateways are edge devices that are deployed on customer premises or in customer data centers. In the wind turbine example, if you have a wind farm with many hundreds of turbines and you want to process data on-premises, you might have instant data available at each turbine. You would then aggregate the information to create a wind-farm-wide view and pass the data on to the cloud for a company-wide view. Edge Gateway hardware footprints are very diverse, and devices come in various form factors to meet specific deployment needs. Most manufacturers support many different specifications, all the way from onboard gateways to specialized servers called Edge Nodes. These devices typically share common characteristics, such as ease of deployment, ruggedness, and support for remote management through edge managers.

Because IIoT data is voluminous, it can easily consume large amounts of network bandwidth, require huge storage space, and swamp resources in the cloud. It is therefore best to use intelligent Edge Gateways, which are capable of performing analytics as a way to get immediate insights. Alternatively, we can apply filtering techniques, such as time-based filtering or a Bloom filter. Adding this type of filtering reduces resource needs and hence can be much more efficient. By adding an intelligent Edge Gateway, or Edge Node, we can pre-process the data, run appropriate Edge Analytics to gain immediate insights, and then pass data to the cloud for further processing, such as correlations across multiple sites. For example, a power company may have multiple wind farms, and the processing for each individual farm can happen on site using the Edge Gateway or Nodes. In addition, these Nodes can send relevant data to the cloud for further processing, to gain insights and make comparisons between farms.
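The text does not say how a Bloom filter would be applied. One plausible reading, sketched below as an assumption, is that the gateway keeps a compact record of readings it has already forwarded and skips exact duplicates. The class, the reading format, and the sizing parameters are hypothetical.

```python
import hashlib

class BloomFilter:
    """Probabilistic set: no false negatives, a small chance of false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def forward_new(readings, seen):
    """Return only readings the filter has not (probably) seen before."""
    forwarded = []
    for reading in readings:
        if reading not in seen:
            seen.add(reading)
            forwarded.append(reading)
    return forwarded

seen = BloomFilter()
# Hypothetical "device:metric:value" strings; the second one is a duplicate.
forwarded = forward_new(["t17:temp:21.5", "t17:temp:21.5", "t17:temp:21.6"], seen)
```

The trade-off is that a false positive would cause a genuinely new reading to be dropped, so this technique only suits data where an occasional loss is acceptable.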

Edge Analytics is an up-and-coming field in which, instead of sending the data to the cloud for processing and gaining insights, data is processed near the source on the Edge Gateway or Node. For example, machine learning algorithms, such as anomaly detection, can be applied to sensor data to scan for anomalies that point to maintenance problems in the machines, which helps to prevent downtime as long as appropriate action is taken. In the next section, we will cover different ways to connect to the cloud and send data to it.
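As a rough illustration of anomaly detection at the edge, the following sketch flags values that deviate strongly from a trailing window of recent readings. It is a simple statistical stand-in rather than a trained machine learning model, and the window size, threshold, and vibration values are assumptions.

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(values, window=5, z_threshold=3.0):
    """Return indices of values more than z_threshold standard deviations
    away from the mean of the preceding `window` values."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies

# Hypothetical vibration amplitudes; index 6 is a spike.
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 1.02, 5.0, 1.0, 0.98]
anomalies = detect_anomalies(vibration)
```

Note that the spike itself enters the window afterwards and inflates the standard deviation for a few samples, so a production detector would likely exclude flagged values or use a more robust statistic.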

Cloud Connectivity 

Connectivity from the edge to the cloud is essential, and there are many ways to connect to the cloud securely. Typical infrastructure includes mutual authentication, in which the Edge Gateway and the cloud authenticate each other using certificates. Once the secure connection is established, communication typically happens over one of the asynchronous protocols, such as WebSockets, binary protocols using Event Hub, MQTT, STOMP, and HTTP/2. In addition to connectivity, an Edge Manager typically manages all the connected Edge devices and provides dashboards for operations.

MQTT Communication

MQTT stands for Message Queuing Telemetry Transport. It is a lightweight, bi-directional messaging protocol that works well in resource-constrained scenarios, such as unreliable or low-bandwidth networks, or high-latency clients. It provides a simple way to send telemetry information between devices, or from devices to the cloud. The protocol uses a publish/subscribe communication paradigm, is used for Machine-to-Machine (M2M) communication, and is widely adopted in the IoT.

MQTT was originally developed by IBM for M2M communication and is now widely adopted in many different applications, such as messaging services and IoT applications. MQTT is very lightweight: it requires a fixed header of as little as 2 bytes and supports a payload of up to 256 MB of data. The format of the data is application specific. MQTT defines three levels of Quality of Service (QoS). QoS defines how messages are delivered between the publisher and the subscriber by the message broker. Publishers and subscribers can choose the QoS level they would like, and the broker makes sure that level is adhered to. The first level, QoS 0, is at most once (fire and forget): the publisher sends the message to the broker and the broker sends it to the subscribers, but there is no guarantee that the subscribers receive it. The second level, QoS 1, is at least once: the broker retries delivery until it gets an acknowledgement from the subscriber, so the subscriber may receive duplicates. The third level, QoS 2, is exactly once, whereby the subscriber is guaranteed to get the message only once. MQTT is well suited to IoT applications due to its very low overhead, flexible payloads, and multiple levels of QoS.
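The 2-byte header and 256 MB figures follow from the layout of the MQTT fixed header: one control byte followed by a "remaining length" field of 1 to 4 bytes, each carrying 7 bits of length plus a continuation bit. The sketch below encodes that field; the function names are chosen for illustration, and the packet type and flag values shown are those of a QoS 1 PUBLISH.

```python
MAX_REMAINING_LENGTH = 268_435_455  # 2**28 - 1 bytes, roughly 256 MB

def encode_remaining_length(length):
    """Encode a length as MQTT's variable-length integer (7 data bits per byte)."""
    if not 0 <= length <= MAX_REMAINING_LENGTH:
        raise ValueError("remaining length out of range for MQTT")
    out = bytearray()
    while True:
        byte = length % 128
        length //= 128
        if length > 0:
            byte |= 0x80  # continuation bit: more length bytes follow
        out.append(byte)
        if length == 0:
            return bytes(out)

def fixed_header(packet_type, flags, remaining_length):
    """Build the fixed header: control byte plus encoded remaining length."""
    control = (packet_type << 4) | (flags & 0x0F)
    return bytes([control]) + encode_remaining_length(remaining_length)

PUBLISH = 3          # MQTT control packet type for PUBLISH
QOS1_FLAGS = 0b0010  # QoS level 1 in bits 2-1 of the flags nibble

small = fixed_header(PUBLISH, QOS1_FLAGS, 10)                    # 2-byte header
largest = fixed_header(PUBLISH, QOS1_FLAGS, MAX_REMAINING_LENGTH)  # 5-byte header
```

A message with up to 127 bytes after the fixed header therefore costs only 2 header bytes, while the largest possible packet needs 5. The remaining length also counts the topic name and other variable header fields, so the usable payload is slightly below the maximum.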

When considering MQTT for cloud connectivity, it is good to know some of its limitations as well. MQTT is primarily designed for use within an enterprise, behind the firewall, and mainly for communication between devices. Hence, using it to connect to the cloud adds security overheads, such as authentication headers and SSL/TLS encryption using client certificates. Even after adding these security measures, it is difficult to prevent unauthorized publishers from publishing to an MQTT topic (that is, anyone with the authentication credentials can publish messages to the topic, and we need additional mechanisms to enforce authorization). Another limitation is a lack of interoperability in the message structure, since it is open-ended and hence specific to a given application. It is also difficult to scale MQTT to many devices, and MQTT does not lend itself well to the transfer of large amounts of data, as in the bulk ingestion of sensor data.
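One common form of the additional authorization mechanism mentioned above is a per-client access control list (ACL) of topic filters. The sketch below is a hypothetical, broker-independent illustration: the client IDs, topics, and function names are assumptions, and the matcher is simplified (for example, it ignores the special handling of topics that begin with `$`).

```python
def topic_matches(topic_filter, topic):
    """Match a topic against an MQTT-style filter.
    '+' matches exactly one level; '#' matches all remaining levels."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return i == len(filter_levels) - 1  # '#' is only valid as the last level
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)

# Hypothetical ACL: which topic filters each authenticated client may publish to.
PUBLISH_ACL = {
    "turbine-17": ["farm/north/turbine-17/+"],
    "farm-gateway": ["farm/north/#"],
}

def may_publish(client_id, topic):
    """Allow a publish only if some ACL entry for this client matches the topic."""
    return any(topic_matches(f, topic) for f in PUBLISH_ACL.get(client_id, []))
```

With this in place, a turbine holding valid credentials can still only publish under its own topic subtree, while the farm gateway can publish anywhere under `farm/north`.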

WebSocket communication 

WebSocket cloud connectivity is another option to consider for IoT applications, and it solves some of the issues with MQTT. WebSocket is designed to be complementary to HTTP, yet it is closer to TCP and lean: for small messages, it uses only 2-6 bytes of header per frame. WebSocket leverages existing HTTP infrastructure, such as servers, proxies, and security headers for authentication and authorization. WebSocket is a bi-directional, full-duplex, low-level communication protocol that runs on top of TCP. A WebSocket connection starts off as a standard HTTP connection and is then upgraded to create a persistent TCP connection with the server using a single socket. The WebSocket API originated with the HTML5 standard. Most modern browsers support the WebSocket protocol natively, and web applications can use the JavaScript WebSocket API to communicate over it. WebSocket outshines HTTP for real-time, event-driven, low-latency applications and is a good fit for IoT use cases.
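The HTTP upgrade step can be made concrete with a short sketch of the opening handshake defined in RFC 6455. The client sends a random key in its upgrade request, and the server proves it understood the request by returning a hash of that key combined with a fixed GUID. The host and path below are hypothetical, and the key is the sample value used in the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def upgrade_request(host, path, key):
    """Build the HTTP/1.1 request a client sends to ask for a WebSocket upgrade."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    )

def websocket_accept(key):
    """Compute the Sec-WebSocket-Accept value the server must return."""
    digest = hashlib.sha1((key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

sample_key = "dGhlIHNhbXBsZSBub25jZQ=="  # sample key from RFC 6455
request = upgrade_request("gateway.example.com", "/telemetry", sample_key)
accept = websocket_accept(sample_key)
```

After the server replies with `101 Switching Protocols` and the matching accept value, both sides stop speaking HTTP and exchange WebSocket frames over the same TCP connection.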

Although WebSocket enables us to use the HTTP security infrastructure for IoT devices, it does not eliminate interoperability issues, since the messaging format is not defined by WebSocket and it is up to the application to adopt standards such as JSON. WebSocket also requires a full web client to run on the device, which may not be possible in all situations due to device hardware footprint limitations. WebSocket is also less reliable than a messaging protocol such as MQTT, and WebSocket servers are difficult to scale as the load increases. Scaling WebSockets using a load balancer is a complex task.

MQTT over WebSockets

Another possibility is to use MQTT over WebSocket. This scenario is frequently used in IoT web applications, for example to display sensor data in time-series graphs. This combination gives us the best of both worlds: it uses the HTTP infrastructure for ports, security, and so on, and it adds the publish/subscribe mechanism, which is lacking in WebSockets but is the basis of MQTT. MQTT also contributes its Quality of Service levels.

This combination lends itself well to a larger-footprint device or gateway. It still lacks messaging interoperability, but it can scale well to support many devices. Another possibility is to use WebSocket with other messaging infrastructures such as Kafka, which is a common pattern in large-scale deployments and can scale well to support a large volume of data ingestion and large numbers of subscribers.

Event/Message hub-based connectivity

The Event/Message hub publish-subscribe connectivity is another option for IoT applications, specifically if the volume of data ingestion is huge and the number of clients is also large. Event Hub uses the Kafka messaging infrastructure and can operate on top of many different connectors, such as WebSocket, HTTP, gRPC, MQTT, and so on. The preferred architecture is to use WebSocket with messages in JSON format to connect to the Kafka service, or to use gRPC (an HTTP/2 streaming protocol) with the protobuf message format. Event Hub is built to be secure, massively scalable, fault tolerant, and language-agnostic. As can be seen in the following example, the Event Hub acts as a cloud connector and queuing system, which provides durability for many producers (such as devices that produce time-series information) and forwards their data to many subscribers.

The following example shows that, with this model, we can stream data from the devices to the cloud and various types of applications can act on the data:

Event Hub, with many publishers from remote locations, connecting to the cloud to transmit data for subscribers to consume the data and act on it

The publish-subscribe model supports binary JSON over WebSocket and protobuf over gRPC streams for publishing, and binary (protobuf over gRPC) for subscribing. Devices communicate with the Event Hub service by publishing messages to topics. Applications, devices, and services can use Event Hub as a one-stop solution for their communication needs. You can use Event Hub to ingest streaming data from anywhere to the cloud for processing.

Event Hub uses gRPC for publishing and subscribing. A full-duplex streaming RPC framework, gRPC uses protocol buffers as its wire format. gRPC is implemented over HTTP/2 and benefits from header compression, multiplexing of streams over a single TCP connection, and flow control. Protocol buffers are a binary serialization format, suited for IIoT devices that publish high-velocity data over networks with low bandwidth.
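To see why a binary encoding helps on low-bandwidth links, the sketch below compares the size of one reading encoded as JSON with the same reading packed into fixed-width binary fields. Python's `struct` module is used here only as a stand-in; it is not the protobuf wire format, which uses field tags and variable-length integers. The field names and values are assumptions.

```python
import json
import struct

# Hypothetical sensor reading
reading = {"sensor_id": 42, "timestamp": 1700000000, "value": 21.5}

# Text encoding: field names and digits are sent as characters.
json_bytes = json.dumps(reading).encode("utf-8")

# Binary encoding: little-endian uint32, uint64, float64 with no padding.
RECORD_FORMAT = "<IQd"
binary_bytes = struct.pack(
    RECORD_FORMAT, reading["sensor_id"], reading["timestamp"], reading["value"]
)

# The receiver needs the same schema to decode the fields again.
sensor_id, timestamp, value = struct.unpack(RECORD_FORMAT, binary_bytes)
```

The binary record is 20 bytes, well under half the size of the JSON text, because the schema is agreed on in advance rather than repeated in every message. The cost is that both sides must share and version that schema, which is the role `.proto` definitions play for protobuf.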

The Event Hub also provides the following:

  • Streaming high volumes of data to the cloud for processing, from anywhere
  • Payload-agnostic publishing of any type of data for subscriber consumption
  • Handling message distribution for subscribers scaling from single to multiple instances
  • Handling of large-scale asynchronous message processing applications
  • Using OAuth provider for authentication and authorization