Great Data Comes with Great Responsibility
It is often said that human imagination leaps several generations ahead of the actual invention. In the case of the digital revolution, it seems that science fiction anticipated the real event by at least five decades.
In 1946, Murray Leinster, an American science-fiction writer, published a short story titled “A Logic Named Joe”. In the story, a “logic” is a computer that everyone owns. These logics are connected to a central computer, “the Tank”, which keeps all the information required for a civilization to function. One day, a logic named “Joe” acquires an intelligence that the other logics do not have. Joe is able to flip several switches in the Tank and learn from correlations found among the huge number of data points stored there. After several rounds of learning, Joe even manages to bypass the Tank’s information-filtering protocol and starts assisting human users beyond Joe’s own owner. Chaos then ensues, because Joe can disseminate hoax information through the Tank and even help a felon plan a perfect murder. Eventually, a logic technician saves civilization by tracking Joe through the network of logics and shutting it down.
Leinster could only have imagined the story based on his encounters with the nascent computers of a bygone era. Yet the plot is eerily similar to our so-called modern age. These days, nearly all of us carry smartphones (the logics) connected to the cloud (the Tank), and Data Scientists all over the world are developing Artificial Intelligence (AI) for various applications (the Joes of our time).
Thankfully, we have not yet seen chaos on the grand scale the story predicted, although we have seen several corrosive instances in which data and AI were used for nefarious purposes.
Now is therefore an apt time for us to stop and contemplate how we should move forward, especially given the rapid convergence of big data and deep learning. Expectedly, such convergence will open new opportunities, which we all welcome. The flip side, though, must also be embraced and managed well if we, as a civilization, are to harness the emerging field of AI in our quotidian life.
Let’s start with the data side. We are living in a unique time, in which many technological advances are converging at once. Arguably, the first such coming-together, the one that ushered in the digital revolution, is the combination of the rapidly increasing computing power of the Central Processing Unit (CPU), the rapid spread of smartphones, and the roll-out of mobile bandwidth to carry data packages over the air. Let’s look at each of these factors and see how they individually and synergistically contribute to the age of big data.
In 1965, Gordon Moore, one of the co-founders of Intel, was asked by Electronics magazine about the trends he saw in the semiconductor industry. At the time, Moore observed that the number of transistors that could be squeezed onto a single silicon chip doubled roughly every two years, and he predicted that the trend would continue for the next ten years. This prediction, nicknamed “Moore’s Law”, became self-fulfilling, as it has served as the miniaturization target that semiconductor players have been aiming for from 1965 until now.
To put the transistor-density numbers into perspective: the early Intel 4004, released in 1971, housed only 2,250 transistors. In 1989, the Intel i860 housed about 1,000,000 transistors; in 2010, the Intel Core i7 housed about 1,170,000,000; and by 2022, the Apple M1 Ultra housed 114,000,000,000 transistors. That is nearly eight orders of magnitude more transistors in a modern chip than in the original 4004. Or, if you prefer a business lens, the transistor count of a leading chip has been growing at a Compound Annual Growth Rate (CAGR) of roughly 40% per year for five decades.
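For curious readers, here is a quick back-of-the-envelope sketch in Python that reproduces the arithmetic above; the cagr helper is just an illustrative function written for this check, and the figures are the transistor counts quoted in the previous paragraph.

```python
from math import log10

def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1

# Transistor counts quoted above (Intel 4004 in 1971, Apple M1 Ultra in 2022).
start, end, years = 2_250, 114_000_000_000, 2022 - 1971

print(f"Growth factor:       {end / start:,.0f}x")            # ~50 million times
print(f"Orders of magnitude: {log10(end / start):.1f}")        # ~7.7
print(f"Implied CAGR:        {cagr(start, end, years):.1%}")   # ~40% per year
```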
Why does it matter? It matters a lot, because transistor density dictates the computing power of the CPU. What does this mean for our daily life? Well, that depends on your role. For the Business Analysts, this means more data you can crunch with Excel. For the Digital Artists, this means better rendering, done faster. For the Teachers, this means you can record more of your teaching materials in digital form and distribute them to your students everywhere. And for the Data Scientists, this means our everyday notebooks can handle mathematical analysis complex enough to interrogate the data obtained from smart devices.
Indeed, the second factor contributing to the explosion of big data is the proliferation of smartphones. I am writing portions of this text while on the road, on my smartphone: drafting in the mobile Microsoft Word app, searching for relevant materials with Google, capturing a picture that inspires the material with the built-in camera, and jotting down my scattered ideas in Evernote. In fact, I suspect most of us could also name three or four apps that our spouse uses on a daily basis, whether for catching up with friends, work updates, creating art, or other activities.
So much of our life now depends on these devices that I think the word “phone” in smartphone has become a misnomer: they have evolved into our personal digital assistants, of which the phone is just one function. And as they evolve, all of us who own a smartphone practically emit data continuously, sometimes even when we are not using the device.
Hence, if we multiply this ever-more-powerful device by smartphone ownership across the globe, it is a logical extension to expect the volume of data to grow exponentially over time. To review the numbers: in 2021, Statista estimated that the top 20 countries by smartphone ownership cumulatively amassed 2.9 billion smartphones. Indonesians alone owned 170 million smartphones, which translates to roughly a 60% penetration rate in 2021.
Now, considering that the smartphone was only introduced in 2007, when the first iPhone launched, this is a rather amazing achievement. Around 120 million smartphones were sold globally in the year Apple launched its first iPhone; by 2021, that number stood at 1.4 billion. In other words, the number of smartphones sold globally has grown at a CAGR of roughly 19% per year since then.
Certainly, this impressive growth did not happen by magic. I think it is worthwhile to contemplate the factors that may have contributed to it. For one, the increasing power of the CPU, as noted earlier, helps make each new generation of smartphones smarter.
In addition, the fact that the cheapest smartphones are affordable to most people helps drive the high penetration. At the time of this writing, the lowest-end smartphones sell for as little as Rp 750,000 on e-commerce platforms in Indonesia. Considering that the average minimum wage in Indonesia (excluding Jakarta) stands at around Rp 1,800,000 per month, it is no wonder that such a price point, less than half a month’s wage, helps push smartphone penetration to the 60% noted earlier.
Finally, the usefulness of smartphones cannot be separated from mobile internet access, which brings us to the third trend ushering in the age of big data. This final trend may not sound as sexy as Moore’s Law or the smartphone, but it is equally impactful.
The age of mobile broadband arguably started when BlackBerry was the only game in town in the late 1990s. The BlackBerry was revolutionary because it was the first device that let users access email on a mobile device in a friendly way. Smart marketing combined with a laser focus on user experience propelled RIM, the BlackBerry manufacturer, from a small company in Waterloo, Canada into a mobile behemoth practically overnight. Users of all types, from Wall Street executives to Main Street folks, clamored for the latest BlackBerry device. Realizing the growth potential of data demand, mobile network operators all over the world started investing in mobile broadband technology from the 1990s onward.
The arrival of smartphones in 2007 accelerated this trend even further, and mobile broadband speed has increased roughly tenfold with each new generation. In the 1990s, download and upload speeds were typically in the range of 100 Kbit/s on 2G infrastructure. From 2018 onward, some countries have rolled out 5G infrastructure that can support download speeds of up to 3,000 Mbit/s and upload speeds of up to 1,500 Mbit/s.
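To make that jump concrete, here is a small illustrative Python sketch that estimates how long a 1 GB download would take at the two speeds quoted above; the figures are peak rates used purely for illustration, since real-world throughput is lower and varies by network.

```python
# Back-of-the-envelope: time to download a 1 GB file at the speeds quoted above.
# Peak figures, decimal units (1 GB ~ 8,000 Mbit); real-world throughput is lower.

def seconds_to_download(gigabytes: float, speed_mbit_per_s: float) -> float:
    return gigabytes * 8_000 / speed_mbit_per_s

t_2g = seconds_to_download(1, 0.1)     # 2G at ~100 Kbit/s
t_5g = seconds_to_download(1, 3_000)   # 5G at up to ~3,000 Mbit/s

print(f"2G: ~{t_2g / 3600:.0f} hours")      # ~22 hours
print(f"5G: ~{t_5g:.1f} seconds")           # ~2.7 seconds
print(f"Speed-up: ~{t_2g / t_5g:,.0f}x")    # ~30,000x
```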
In brief, considering that (1) Moore’s Law for transistors will keep its pace for the next few years, (2) smartphone ownership keeps rising, married with ever more powerful devices in each new generation, and (3) mobile broadband speed keeps increasing steadily, it is no wonder that Statista shows the volume of data modern society produces doubling almost every two years: a Moore’s Law of Data. For example, in 2010 we produced globally around 2 Zettabytes of data per year (1 Zettabyte = 1 trillion Gigabytes), while in 2021 we produced around 79 Zettabytes per year, or roughly 6.5 trillion Gigabytes per month. At least now we can appreciate why we call this time of ours the age of Big Data.
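As a quick sanity check on this “Moore’s Law of Data”, a few lines of Python, using only the two Statista figures quoted above, recover the implied doubling time.

```python
from math import log

# Statista figures quoted above: ~2 ZB produced in 2010, ~79 ZB in 2021.
zb_2010, zb_2021 = 2.0, 79.0
years = 2021 - 2010

growth_factor = zb_2021 / zb_2010                   # ~40x in 11 years
doubling_time = years * log(2) / log(growth_factor)

print(f"Annual growth: {growth_factor ** (1 / years) - 1:.0%}")  # ~40% per year
print(f"Doubling time: {doubling_time:.1f} years")               # ~2.1 years
```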
Alas, as many of us are becoming painfully aware, this data deluge does not directly translate into a wiser society. In other words, knowledge is not the same as the data itself. Why is this so? Because knowledge is something we obtain by learning from the data. And most of us learn best through analogy, because it lets us create a mental connection between a new concept and something we already know from our existing repertoire. To put it another way, most of us learn by creating a map between new concepts and old knowledge. Now, with Zettabytes of data, the combinatorics of the maps that could be drawn is simply astronomical: even a mere thousand data points already allow nearly half a million pairwise connections, and the number explodes from there.
Thankfully, this is where the increasing CPU power helps us extract knowledge from all this Big Data, using the set of algorithms and mathematical techniques that we now collectively call Artificial Intelligence (AI).
Interestingly, interest in AI peaked before interest in Big Data, as can be seen from the Google Trends index for the two topics since 2004. Interest in AI then plummeted, until the rise of Big Data revived it around 2016.
Perhaps what is most interesting about this trend is that it tells us the general public is largely unaware that AI problems can be broadly divided into two major categories: (1) supervised learning, in which a human agent (usually a data scientist) guides the computer model with a set of training data and expected results; and (2) unsupervised learning, in which a computer program is expected to generate knowledge, insights, or suggestions from the data without any prior training from a human agent.
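For the technically curious, here is a minimal sketch of the distinction using scikit-learn on synthetic data; the dataset and the model choices (logistic regression versus k-means) are illustrative assumptions on my part, not a recipe taken from any particular project.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic data: 300 points in 2D, drawn from 3 clusters.
X, y = make_blobs(n_samples=300, centers=3, random_state=42)

# Supervised learning: we hand the model both the data X and the expected labels y.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Supervised prediction for a new point:", clf.predict([[0.0, 0.0]]))

# Unsupervised learning: the model sees only X and must find structure on its own.
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("Unsupervised cluster for the same point:", km.predict([[0.0, 0.0]]))
```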
Gartner, a consultancy, released a hype-cycle report in 2021 which roughly tells us that the sexy topics AI was touted as the solution for were just that: hype. Companies of all stripes and sectors had jumped on the promise that AI, especially of the unsupervised-learning type, could be their savior for solving all sorts of problems, from predicting the best recipe, to automated news writing, to AI programs that create other AI programs, to name but a few. And so, under the burden of unrealistic expectations, companies abandoned the AI hype in droves when they realized the technology could not be their silver bullet.
When more data became available and the age of Big Data began, Data Scientists realized that they could extract observed correlations between these data sets and train computers to recognize such connections in future data; in other words, supervised-learning AI.
Thankfully, we do not need whole Zettabytes of data to make progress with supervised learning. And the problems supervised learning typically solves are of a more mundane type: recommendation systems, dynamic pricing, translation; practically, everything that does not require creativity. As it turns out, because the problems supervised learning tackles are usually more focused and smaller in scope, progress there is more consistent and the results are more immediate. It is this nature of supervised-learning AI that has brought back interest in the topic.
Indeed, another survey from Gartner shows that the AI applications receiving significant investment boosts in 2021 were of the supervised-learning type. For example, an AI Computer Vision application typically involves data scientists feeding the model many pictures along with corresponding attributes describing the features present in the images. The computer is then asked to identify similar features when presented with a similar, but newly obtained, image. The Gartner survey indicated that in 2021, AI Computer Vision projects received an average investment boost of USD 679k.
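As an illustration of that train-on-labelled-images, predict-on-new-images workflow, here is a minimal sketch using scikit-learn’s built-in handwritten-digits dataset; it merely stands in for the kind of labelled image data a real computer-vision project would use and is not tied to any application in the Gartner survey.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Labelled images: 8x8 grayscale digits with their true labels (0-9).
digits = load_digits()
X_train, X_new, y_train, y_new = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

# Supervised training: the model learns features that map pixels to labels.
model = SVC(gamma=0.001).fit(X_train, y_train)

# "Newly obtained" images: the held-out set the model has never seen.
print("Predicted labels:", model.predict(X_new[:5]))
print("Actual labels:   ", y_new[:5])
print(f"Accuracy on new images: {model.score(X_new, y_new):.1%}")
```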
Alas, the field of AI, like it or not, is still highly technical. And much like its scientific brethren, simplified explanations of AI concepts easily become convoluted, since the mathematics behind them does not lend itself to easy explanation. This may be a contributing factor to why, even though history has shown that unsupervised AI is hard and risky, business publications on AI-related topics still pound away at the sexy stuff that AI can supposedly conjure.
This trend may keep the AI hype alive for the foreseeable future, and as the volume of data grows exponentially, the field of supervised-learning AI will produce ever more interesting insights from various data streams.
What is worrying about this trend, though, is that even as recently as 2022, not many studies had been conducted on the impact of AI-driven applications. Forbes, for example, cited that bias-removal studies rank last among the top five topics actively pursued in AI development. Ironically, the same article also reported that the hottest trend in AI these days is large language models, which aim to allow AI to generate content like this very article.
All of us have witnessed the toxicity that can be created when an agent, human or otherwise, creates content devoid of ethical boundaries. Hoax articles are perhaps the best instance of this corrosive trend in the recent course of the digital revolution.
A brief analysis of the correlation between the hoax trend in Indonesia vis-à-vis other selected trending news topics in the country revealed coinciding peaks in the search trends. Soberingly, these coincidences tell us that almost any news topic can be weaponized to serve the personal agenda of whoever has the required resources: financial, intelligence, political, or otherwise.
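For readers who would like to reproduce this kind of check, here is a rough sketch of how such a comparison might be run with pandas; the file name and column names are hypothetical placeholders for exported Google Trends data, not the actual data behind the analysis above.

```python
import pandas as pd

# Hypothetical export of weekly Google Trends interest, one column per search topic,
# e.g. columns: week, hoax, topic_a, topic_b  (placeholder names, not the real data).
trends = pd.read_csv("trends_export.csv", parse_dates=["week"], index_col="week")

# Pearson correlation of every topic's search interest against the "hoax" series.
correlations = trends.corrwith(trends["hoax"]).sort_values(ascending=False)
print(correlations)

# Flag weeks where both the hoax series and a given topic sit in their top decile,
# a crude way to spot the coinciding peaks mentioned above.
top = trends.rank(pct=True) > 0.9
coinciding_peaks = trends.index[top["hoax"] & top["topic_a"]]
print(coinciding_peaks)
```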
The implication of this correlation is worrying enough, but at least hoaxes are still made by humans, at least for now. Coming back to the Forbes article cited earlier, we can then appreciate the danger posed to our society when a machine lacking any knowledge of human ethics can create and post hoax articles automatically. Can you imagine the scale of divisiveness such a machine could create? If such technology ever advances that far, I sincerely hope the AI-driven companies have the sense of responsibility to inject knowledge of ethics into the machine, whether through model training or hard-written into its codebase. On the flip side, if such a machine ever comes to fruition without the proper ethical knowledge, then I sincerely hope somebody can switch off Joe before it wreaks havoc on our society.