IoT – or why it is important to save as much data as possible
"Data are the new gold of small, medium and large businesses." With this statement, Florian Stadtfeld and I have opened several workshops on data, analytics and AI with our customers and partners over the last months. What almost sounds like a platitude can become "difficult" in the context of data protection and other regulations – primarily in the EU. We are often asked "Is that okay?" and the answer is often: "It depends."
It’s easier with IoT data – at least if purely technical data are in focus, without any connection to people. Regardless of this, one question comes up in every IoT workshop:
"Which data should be stored, and at which intervals? I don’t want to store insignificant data – that generates unnecessary costs."
I myself have a very clear answer to this question that I would like to share. But first, some context is important in order to classify the question and my answer. I support enterprise companies and smaller manufacturers in running their IoT projects: from the very first workshop to the completion of the project. The devices/sensors are of different kinds:
- Building control systems with many existing sensors
- Man-sized band saws for sawing stones and marble
- Stone mills to grind rocks
- Cutting time measurement for industrial blades
- Special sensors for flow rates, rotation, power measurement, ...
- "Cloud-born" sensors that can use WiFi connection natively
- …
The motivation for using IoT is usually clear: new business models and better services. Many other motivations could be mentioned, but they are often included in these two aspects. Here are some examples:
New business models
- Sale of machine capacity as a service: X $ per 60,000 meters of sheet-metal cutting (incl. wear parts)
- Turbine flying hours (instead of purchase)
- Combined heat and power unit for rent: price per kWh of output, incl. wear parts and maintenance
- …
Service
- Predictive Maintenance: The prediction of possible failures of a machine part and its timely repair
- Energy optimization for buildings, for example by automatically controlling climate systems based on the weather forecast
- Simplified maintenance through complete documentation of operating conditions (recognizing failure causes instead of only repairing things)
- …
IoT always involves the efficient storage of large amounts of data and data analytics – especially when this data has to be interpreted to derive further results.
An example: a long-term measurement of the vibration frequency of a large band saw's casing shows a correlation with an imminent break of the band: the frequency increases over time.
The action: the band is replaced directly after the current cutting process.
The advantage: the band does not tear during cutting, so the workpiece is not destroyed and a torn band does not have to be removed at great cost.
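A simple way to detect such a trend is a least-squares fit over a window of recent frequency readings: a persistently positive slope indicates the band is approaching failure. A minimal sketch – the readings, window and slope threshold below are illustrative assumptions, not values from a real saw:

```python
from statistics import mean

def frequency_slope(samples):
    """Least-squares slope of frequency readings over the sample index."""
    n = len(samples)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(samples)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

def band_needs_replacement(samples, slope_threshold=0.05):
    """Flag the band when its frequency rises steadily (illustrative threshold)."""
    return frequency_slope(samples) > slope_threshold

# Rising frequency over the last measurements -> replace after the current cut
readings = [120.0, 120.4, 121.1, 121.9, 123.0, 124.6]
print(band_needs_replacement(readings))  # True: frequency is trending upward
```

In practice the window size and threshold would be calibrated against the recorded failure history – which is only possible if that history was stored in the first place.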
Other data correlations are not as trivial as in the example above. Hundreds of factors/measured values over a long period of time can provide information about the behavior of machines and systems. These relationships can be so complex that they are too difficult for a human to understand. In such cases, AI systems are used that can be trained to predict the state of a system autonomously (e.g. they can warn that a system will fail within the next 20-48 hours with a probability of 86%).
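As a rough sketch of how such a prediction works – the sensor names, value ranges and synthetic data below are illustrative assumptions, not a real model – a classifier is trained on historical sensor snapshots labeled with whether a failure followed, and then outputs a failure probability for new readings:

```python
import random
from sklearn.linear_model import LogisticRegression

random.seed(42)

# Synthetic history: [vibration, temperature] snapshots, labeled 1 if the
# machine failed within the following 48 hours (made-up training data).
healthy = [[random.gauss(2.0, 0.3), random.gauss(60, 4)] for _ in range(200)]
failing = [[random.gauss(4.5, 0.5), random.gauss(78, 5)] for _ in range(200)]
X = healthy + failing
y = [0] * 200 + [1] * 200

model = LogisticRegression().fit(X, y)

# Failure probability for a machine with high vibration and temperature
p_fail = model.predict_proba([[4.8, 80.0]])[0][1]
print(f"failure probability: {p_fail:.0%}")
```

Real models use far more features and far longer histories – which is exactly why the raw data has to exist.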
Today, AI and machine learning are no longer just hype topics – they are standard (even if a trained neural network still has something of a black box about it). Machine learning carries "the challenge" of IoT projects in its name: a machine learning model needs to be trained, and that requires a large amount of classified data. Large in this case means data collected over a long period – the more data you have, the more precisely the model can work. Large also means many different sensors that play a role in the model, directly and indirectly. And this is not always obvious.
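To illustrate why long, dense histories matter, here is a minimal sketch of how classified training examples are typically built from a raw time series: slice it into windows of summary features and label each window by whether a failure occurred within some horizon afterwards. The window size and failure horizon are assumptions for illustration; every gap in the stored data directly removes training examples:

```python
from statistics import mean, pstdev

def make_training_windows(readings, failure_times, window=10, horizon=20):
    """Slice a sensor time series (one reading per interval) into
    (features, label) pairs: label 1 if a failure falls within `horizon`
    steps after the window. Illustrative sizes, not tuned values."""
    examples = []
    for start in range(len(readings) - window):
        segment = readings[start:start + window]
        features = (mean(segment), pstdev(segment), max(segment))
        end = start + window
        label = int(any(end <= t < end + horizon for t in failure_times))
        examples.append((features, label))
    return examples

# 100 readings with one failure at step 60
readings = [1.0 + 0.01 * i for i in range(100)]
examples = make_training_windows(readings, failure_times=[60])
print(len(examples), sum(label for _, label in examples))  # 90 windows, 20 labeled 1
```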
This reasoning leads directly to the correct answer to the question above:
It is important to store as much data as possible, at appropriately short intervals – even if there is currently no business case, no marketing approach, or if a certain sensor seems unimportant.
My reason for this claim: if we develop a business model based on AI in six months, we might need exactly this data. If we don't collect it today, it will be lost for the future.
The use of cloud technologies supports this approach: I use managed platform services from the Microsoft Azure cloud for IoT, AI, data storage and processing. Storing data in cloud structures in particular is very efficient and cost-effective.
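A minimal sketch of the device side of this approach – the sensor values, field names and device ID are assumptions for illustration: read every available sensor at a short fixed interval, timestamp each snapshot and serialize it as JSON. With the Azure IoT Hub device SDK, each such message would then be sent as a device-to-cloud message; the serialization itself is SDK-independent:

```python
import json
import time

def read_all_sensors():
    """Placeholder for the real sensor access layer (hypothetical values)."""
    return {"vibration_hz": 121.4, "temperature_c": 62.3, "power_kw": 11.8}

def build_telemetry_message(device_id, readings):
    """Serialize one timestamped snapshot of *all* sensors as JSON --
    including the ones without a current business case."""
    return json.dumps({
        "deviceId": device_id,
        "timestamp": time.time(),
        "readings": readings,
    })

message = build_telemetry_message("bandsaw-01", read_all_sensors())
payload = json.loads(message)
print(payload["deviceId"], sorted(payload["readings"]))
```

Cheap cloud object storage makes it viable to keep every one of these messages, so the raw history is still there when a model needs it.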