Data or Metadata? The IoT needs both

As the amount of machine-generated data scales, indexing and managing it via metadata will become critical.

David Knight
Mar 30, 2022

Think about the most successful, widely scaled networks that let us function in today’s world. No, I’m not talking about internet service providers; I mean the Really, Really Big Networks. The ones without which modern civilization would be very different. The telephone system. Intermodal containerized shipping. Air traffic control. They all share one vitally important element that made them scalable: a Control Layer that is not intrinsic to the electronic or physical streams that make up the network traffic, but lives alongside them. The phone networks established Signaling System 7 (SS7), which has run the world of voice calls for decades. For intermodal shipping, it’s container manifests. For aviation, it’s air traffic control (ATC). These layers truly do run the globe, and thus make our modern world possible.

What is a Control Layer?

In the analog era, telephone calls were “switched” using tones sent along the same wires that carried the calls. This was known as in-band signaling. While workable with the technology of the day, it was fraught with issues: the signals were easy to fake to get free international calls (known as “phreaking”), a call’s path could not be changed once it was in progress, there was no direct association with billing or tariffing, and more. When it came time to deregulate the traditionally government-run phone companies, laying the framework for the many corporations that provide the bulk of phone service today, the business and technological elements of call handling had to evolve. That led to SS7. One of its key tenets is moving control over calls, billing and routing into a control layer that is separate from the calls themselves. This shift to out-of-band signaling (as it’s known in telco parlance) enabled calls to be switched “on the fly” between carriers, along with inter-company billing and many other features.

Shipping containers arrived with out-of-band signaling as well

In the early 1960s, several pressures converged: a growing world population, an ever-increasing appetite for food and goods from international sources (largely fostered during World War II as troops went abroad), and the rise of computerized inventory and supply chain handling. Building vessels, trucks and train cars for specific types of cargo wasn’t going to scale with that demand, so some enterprising folks came up with the standardized shipping container we know today. That moved command and control over how and where goods are routed away from the ship captains and into a control layer based on paper, and eventually fully electronic, manifests. This led to tremendous flexibility and cost efficiencies. Without them, it would be hard to envision Walmart and then Amazon rising at the rates they have.

So went the airlines

Another advent seeded during WWII was the idea of global jet travel. The public had witnessed the start of globalization, and at the same time, pressurized aircraft (initially, high-altitude bombers) powered by jet engines became possible. With this radical increase in speed, range and passenger capacity, coupled with the shift in passenger air travel from water-based to land-based terminals, it rapidly became imperative that control over where and when aircraft moved “get out of the cockpit” and into a network, which became ATC. It’s not as efficient as it could be, but ATC has been fundamentally safe and reliable, and has scaled continually, for many decades. (It does have to evolve once again to accommodate the coming wave of autonomous passenger and cargo aircraft, such as the air taxis being developed and deployed now.)

Now it’s time for the IoT to have its Control Layer

The quantity of information already being produced by the IoT, or less abstractly, machine-generated data (MGD), is truly staggering and shows no signs of slowing. In fact, if you take imagery, especially consumer-generated video, out of the equation, MGD already exceeds what humans have produced since the beginning of time. We are fast approaching the same “needs to scale” tipping point that telephony, global logistics and aviation experienced. Some characteristics of MGD make it far more akin to these “physical world” networks than to what we’ve experienced with the rise of, say, social media. For starters, this is data that could get someone killed: road sensor data, if processed incorrectly, might cause all of the traffic lights in a city to turn red. Or open the floodgates of a dam. Or switch a train onto the wrong track. The same data can also have beneficial uses beyond its original purpose. In addition to changing traffic signals more efficiently, the same stream of MGD might be used in actuarial calculations to price insurance premiums, or give a quantitative analyst at a hedge fund an edge.

With IoT data, context matters

Given the seriousness of MGD use cases like those above, it’s critical that the user (be it a person, company, government agency or a fleet of robots) be given as much contextual information as possible. This can include provenance, such as the specific make and type of the sensors; literal context, i.e. where the sensors are situated (inside a moving vehicle, attached to power poles, etc.); and whether the source devices are regularly maintained and calibrated. All of this “bounding information” must be discovered, and ideally conveyed, outside of the data streams themselves. Tracking provenance and keeping the control layer separate also offers important security advantages, such as the ability to detect spoofed data. Yet most IoT devices and implementations today don’t provide such a control layer.
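To make out-of-band “bounding information” a little more concrete, here is a minimal Python sketch, assuming a hypothetical metadata registry keyed by sensor ID. The names `SensorMetadata`, `REGISTRY` and `assess_reading`, the 180-day calibration window and the sample sensor are all illustrative assumptions, not part of any real standard. What it shows is that a reading can be judged from metadata held alongside the stream, without parsing the payload, and that a reading claiming an unregistered source can be flagged as possible spoofing.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SensorMetadata:
    """Hypothetical out-of-band description of a single sensor."""
    sensor_id: str
    make: str
    model: str
    placement: str          # literal context, e.g. "power pole" or "moving vehicle"
    last_calibrated: date   # maintenance/calibration history


# Illustrative registry: it lives alongside the data streams, not inside them.
REGISTRY = {
    "pole-17": SensorMetadata(
        sensor_id="pole-17",
        make="ExampleCo",
        model="T-100",
        placement="power pole",
        last_calibrated=date(2022, 1, 10),
    ),
}


def assess_reading(reading: dict, today: date, max_age_days: int = 180) -> str:
    """Judge a reading using only the metadata registry (assumed policy)."""
    meta = REGISTRY.get(reading["sensor_id"])
    if meta is None:
        # No registered provenance for this source: possible spoofing.
        return "unknown source"
    if today - meta.last_calibrated > timedelta(days=max_age_days):
        # Source is known, but its calibration is older than the assumed window.
        return "stale calibration"
    return "trusted"


if __name__ == "__main__":
    print(assess_reading({"sensor_id": "pole-17", "value": 3.2}, date(2022, 3, 30)))  # trusted
    print(assess_reading({"sensor_id": "pole-99", "value": 3.2}, date(2022, 3, 30)))  # unknown source
```

A real deployment would also need to authenticate the metadata itself; this sketch only checks that a claimed source is registered and recently calibrated.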

Metadata is becoming the control layer for the IoT

De facto standards are emerging to establish the equivalent of SS7, container manifests and ATC for the IoT. Unlike those successful schemes, doing this for MGD involves many more variables than almost anything attempted before, and early examples of ‘out-of-band signaling’, aka metadata standards, for MGD so far exist only within vertical industry sectors. These need to give way to a universal format for describing sensors, data provenance, regulatory and compliance issues, and of course, monetization, in a control layer. In software terms, this control layer is defined and carried in metadata. To scale and become an enabler, metadata must layer atop MGD generated and used within and across all industries. Otherwise, sector-specific metadata would have to be ‘translated’ between the various formats, which almost never yields usable outcomes.

It’s important to note that having a wide-ranging metadata schema means the underlying data itself does not have to conform to any particular file or streaming format. This is critical for IoT data, since converting it from one format to another is frequently ‘lossy’, which can be dangerous when the data is being used to control machines or infrastructure systems. The metadata layer provides the fundamental information needed to discover and use the underlying data; if the data needs to be converted, that can be done at or close to the point of use, preserving the purity of the source data. This becomes extremely useful when the same source data is routed to many different applications, each using different portions of the raw data file or stream.
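The idea of converting at or near the point of use can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical catalog entry that records how a raw stream is encoded (little-endian 16-bit integers in tenths of a degree Celsius) plus a made-up license field standing in for compliance and monetization terms. The names `StreamMetadata`, `CATALOG`, `decode` and `to_fahrenheit` are illustrative only. The raw bytes are stored and shipped as produced; each consuming application reads the metadata and performs its own conversion.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamMetadata:
    """Hypothetical catalog entry describing a raw stream without altering it."""
    stream_id: str
    sample_format: str  # struct format for one sample, e.g. "<h" = little-endian int16
    scale: float        # multiply each raw integer by this to get `unit`
    unit: str
    license: str        # stand-in for compliance/monetization terms


# Illustrative catalog, discoverable separately from the data itself.
CATALOG = {
    "road-temp-5": StreamMetadata("road-temp-5", "<h", 0.1, "degC", "research-only"),
}


def decode(stream_id: str, raw: bytes) -> list:
    """Interpret raw bytes using catalog metadata; `raw` itself is never modified."""
    meta = CATALOG[stream_id]
    return [sample * meta.scale for (sample,) in struct.iter_unpack(meta.sample_format, raw)]


def to_fahrenheit(stream_id: str, raw: bytes) -> list:
    """One consumer's conversion, done at the point of use."""
    return [c * 9 / 5 + 32 for c in decode(stream_id, raw)]


if __name__ == "__main__":
    raw = struct.pack("<3h", 215, -40, 0)     # source data exactly as produced
    print(decode("road-temp-5", raw))         # a Celsius consumer
    print(to_fahrenheit("road-temp-5", raw))  # a Fahrenheit consumer
```

Because `scale` and `unit` live in the metadata rather than in a rewritten copy of the stream, a second application can read the same bytes without inheriting anyone else’s conversion.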

The introduction of a metadata layer will provide strong benefits in terms of scaling and fostering new applications, whilst improving safety, security and even liquidity for the data. As it blends with technologies such as 5G, edge computing and blockchain, metadata integration with the operations of most industries, and with consumer experiences, will increase dramatically. The IoT is touching virtually every company and government agency on the planet, so it’s time to start contemplating how all of this will impact you and your organization. Watch This Space.