AI & the IoT: The Missing Ingredient Is Metadata

As the world enters the Age of Artificial Intelligence, metadata describing the IoT will become mission-critical, even life-saving

David Knight
7 min read · Mar 3, 2022

Think about the most successful, widely scaled networks that let us function in today’s world. No, I’m not talking about internet service providers; I mean the Really, Really Big Networks. The ones without which modern civilization would be very different. The telephone system. Intermodal container shipping. Air traffic control. They all share one vitally important element that made them scalable: a Control Layer that is not intrinsic to the electronic or physical streams that make up the network traffic. As telephony migrated from analog to digital, Signalling System 7 (SS7) was developed and ran the world of voice calls for decades. For intermodal shipping, it’s container manifests. For aviation, it’s air traffic control (ATC). And they truly do run the globe. Join me for a brief tour of how those systems came to be, and how such ‘constructs’ will matter greatly in an AI-enabled, IoT-sourced-and-controlled future.

What is a Control Layer?

In the analog era, telephone calls were “switched” using tones sent along the same wires that carried the calls. This was known as in-band signalling. While usable with the technology of the day, it was fraught with issues: signals were easy to fake to get free international calls (known as “phreaking”), the path of a call could not be changed once it was in progress, there was no direct association with billing or tariffing, and more. When it came time to deregulate or privatize the historically government-run phone companies, and thus lay the framework for the plethora of corporations that provide the bulk of phone service today, the business and technological elements of call-handling had to evolve. That led to SS7. Among its key tenets is moving control over calls, billing and routing into a control layer that is separate from the calls themselves. This shift to out-of-band signalling (as it’s known) allows calls to be switched “on the fly” between carriers, and it enabled inter-company billing and many other features.

Shipping containers arrived with out-of-band signaling as well

In the early 1960s, with a growing world population, an ever-increasing appetite for food and goods from international sources (largely fostered during World War II as troops went abroad), and the rise of computerized inventory and supply chain handling, it became clear that building vessels, trucks and train cars for specific types of cargo wasn’t going to scale with demand. So some enterprising folks came up with the standardized shipping container that we know today. That moved command and control over how and where goods are routed out of the hands of ship captains and into a “control layer” based on paper, and eventually fully electronic, manifests. This led to tremendous flexibility and cost-efficiencies. Without these, it would be hard to envision Walmart and then Amazon rising at the rates they have.

So went the airlines

Another advent seeded during WWII was the idea of global jet travel. The public had witnessed the start of globalization, and at the same time, pressurized aircraft (initially, high-altitude bombers) powered by jet engines became possible. This radical increase in speed, range and passenger-carrying capacity, coupled with the shift in passenger air travel from water-based to land-based terminals, demanded that control of where and when aircraft moved get out of the cockpit and into a ‘network’, which became ATC. It’s not as efficient as it could be, but ATC has been fundamentally safe and reliable, and has scaled continually, for many decades (noting that it has to evolve once again to accommodate the upcoming plethora of pilotless aircraft for both cargo and human transport).

Now it’s time for the IoT to have its Control Layer

The quantity of information already being produced by the IoT, or less abstractly, machine-generated data (MGD), is truly staggering and shows no sign of lessening. In fact, if you take imagery, especially consumer-generated video, out of the equation, MGD already exceeds what humans have produced since the beginning of time. We are fast approaching the “needs to scale” tipping point that telephony, global logistics and aviation experienced. There are characteristics of MGD that make it a lot more akin to these “physical world” networks than to what we’ve been experiencing with the rise of, say, social media. For starters, much of this is data that could get someone killed, or cause economic damage, if not processed correctly. Road sensor data, if misinterpreted, might cause all of the lights in a city to turn green at once. Or weather sensors could open the floodgates of a dam. Or track sensors could switch a train onto the wrong path. Plus, the same data can have quite diverse uses. In addition to (beneficially) being used to control traffic signals, the same stream of MGD might be utilized in actuarial calculations to price insurance premiums, or provide the edge to a quantitative analyst in a hedge fund. And increasingly, these physical-world systems are going to be controlled by, you guessed it, AIs.

With AI and IoT data, context matters

Given the seriousness of MGD/IoT data use-cases like those above, it’s critical that an AI be provided with as much contextual information as possible. This can include provenance (identifying the source in specifics such as the make and type of sensors), literal context (where the sensors are situated, e.g. inside a moving vehicle or attached to power poles), and whether the source devices are maintained and calibrated. All of this bounding information is found, and ideally conveyed, outside of the data streams themselves. Tracking provenance and keeping the control layer separate also have key security advantages. Yet few IoT implementations today provide such a control layer.
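To make that concrete, here is a minimal Python sketch of context kept apart from the readings. Everything in it is an assumption made for illustration: the `SensorContext` fields, the stream IDs, the "ExampleCo" make and the one-year calibration window are hypothetical, not part of any existing standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SensorContext:
    """Out-of-band description of one data source (all fields hypothetical)."""
    make: str                         # provenance: manufacturer
    sensor_type: str                  # provenance: kind of sensor
    placement: str                    # literal context: where the sensor sits
    last_calibrated: Optional[date]   # None means calibration status is unknown


# The "control layer": context is keyed by stream ID and stored apart from readings.
CONTEXT_BY_STREAM = {
    "stream-001": SensorContext("ExampleCo", "temperature",
                                "attached to power pole", date(2022, 1, 15)),
    "stream-002": SensorContext("ExampleCo", "temperature",
                                "inside a moving vehicle", None),
}


def is_calibrated(stream_id: str, today: date, max_age_days: int = 365) -> bool:
    """Return True only if the stream has known context and a recent calibration."""
    ctx = CONTEXT_BY_STREAM.get(stream_id)
    if ctx is None or ctx.last_calibrated is None:
        return False
    return (today - ctx.last_calibrated).days <= max_age_days
```

A consumer, human or AI, could call `is_calibrated("stream-002", date.today())` before trusting a feed. The point of the design is that a stream with no known context fails the check by default.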

Why is introducing metadata so important?

As AIs increasingly “ask the questions” about data sourcing, the problem of minimizing false positives will come to the fore. The reason is that MGD essentially all looks the same, down to the level of filenames. Here is an example:

[{"lat":"36.14348400","lon":"-115.17086400","time":"2018-02-26 08:59:15","value":"89.00"},{"lat":"36.14402800","lon":"-115.16944800","time":"2018-02-26 08:59:30","value":"89.00"},{"lat":"36.14398000","lon":"-115.16694400","time":"2018-02-26 08:59:45","value":"89.00"}]

Are those readings from the drive shaft of a container ship, or a Tesla? Or from a satellite? This homogeneity of file constructs, coupled with almost no contextualization within the files themselves, could lead an AI-based control or analytics system to ‘accidentally’ make the wrong choices, and nobody would know until something potentially threatening to lives, infrastructure or financial outcomes took place. Imagine an AI-based traffic management system getting the wrong feed and, as a consequence, believing that it needs to turn all of the signals in the city green at once. Or sending a delivery drone into a building. Or giving a fleet of autonomous trucks bad weather information, causing them to reroute expensively.
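One way to picture the safeguard is a consumer that refuses any feed whose out-of-band description does not match what it expects. The Python sketch below is illustrative only: the `accept_feed` function, the `source_type` key and its values are hypothetical names, and the payload simply mirrors the shape of the sample readings.

```python
import json

# A payload shaped like the sample readings: nothing in it says what produced it.
PAYLOAD = ('[{"lat": "36.14348400", "lon": "-115.17086400", '
           '"time": "2018-02-26 08:59:15", "value": "89.00"}]')


def accept_feed(payload: str, metadata: dict, expected_source: str) -> list:
    """Parse readings only when out-of-band metadata names the expected source.

    `metadata` and its "source_type" key are hypothetical, not a real standard.
    """
    source = metadata.get("source_type")
    if source != expected_source:
        raise ValueError(
            f"refusing feed: expected {expected_source!r}, got {source!r}")
    return json.loads(payload)


# Same bytes, two different descriptions: only the metadata tells them apart.
road_meta = {"source_type": "road_sensor"}
ship_meta = {"source_type": "ship_drive_shaft"}

readings = accept_feed(PAYLOAD, road_meta, expected_source="road_sensor")

try:
    accept_feed(PAYLOAD, ship_meta, expected_source="road_sensor")
    rejected = False
except ValueError:
    rejected = True  # the mismatched feed never reaches the traffic system
```

A feed that arrives with no metadata at all is rejected as well, since a missing `source_type` can never equal the expected one.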

Metadata provides the enabling control layer for AI to operate the IoT

Early examples of out-of-band signalling for MGD exist within vertical industry sectors. These need to give way to a universal format for describing sensors, data provenance, regulatory and compliance issues, and of course, monetization, in a control layer. In software terms, this is defined and carried in metadata. De facto standards efforts are underway to establish the equivalent of SS7/container manifests/ATC for the IoT. However, unlike those prior successful schemas, achieving this for MGD involves many more variables than almost anything dealt with before. It’s a complex problem set that requires industry understanding, collaboration and acceptance. But in essence, introducing metadata for the IoT is akin to the product labels on food: it’s where you can find the ingredients, nutrition information and other key elements that you utilize in making informed buying decisions.
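In the spirit of the food label, a data “manifest” might be checked for completeness before anyone consumes the stream it describes. This is a rough Python sketch, not a proposed standard: the four section names follow the list in the paragraph above, while the keys and values inside each section are invented for illustration.

```python
# Section names follow the article's list; everything inside them is hypothetical.
REQUIRED_SECTIONS = ("sensor", "provenance", "compliance", "monetization")


def missing_sections(manifest: dict) -> list:
    """List the required sections that are absent or empty in a manifest."""
    return [name for name in REQUIRED_SECTIONS if not manifest.get(name)]


manifest = {
    "sensor": {"type": "road_sensor", "units": "km/h"},
    "provenance": {"owner": "Example City DOT", "collected_by": "roadside unit"},
    "compliance": {"contains_personal_data": False},
    # "monetization" is deliberately left out to show the check failing
}

print(missing_sections(manifest))  # prints ['monetization']
```

Like a label missing its ingredients list, an incomplete manifest is a reason to pause before buying, or acting on, the data.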

The control layer is almost here, just in time for AI+IoT

The approach of AI is accelerating, well beyond what almost anyone had predicted, as is the incredible increase in the quantities and types of IoT data being generated. Thus the time has come for the control layer to arrive. It will boost the interchange of data between organizations, enable the long-anticipated monetization of IoT data, and, as with the important ‘networks’ discussed above, can move the world. The next part of the control layer involves blockchain, which we’ll cover in a subsequent article. Watch This Space.