Chief Scientist & CTO at MixMode, holds 10 patents, PhD from CalTech, and is a professor of engineering and mathematics at UCSB.
This article was written by Igor Mezic, Forbes Councils Member.
Forbes - May 23, 2022, 08:15am EDT
The release of ChatGPT made the general public aware of a subtopic in the development of artificial intelligence: generative models. Practitioners of AI point out that generative models aren't new and have been in development for a while, but the public release created massive interest and a surge of questions about AI, such as:
• What are these models?
• Why are they different?
• Why are they hailed as the next big thing?
• What kind of impact can they have in fields of national security importance, such as cybersecurity?
The History And Development Of AI In Three Waves
According to DARPA, AI is currently categorized into three waves. The first wave of AI is that of expert systems: rule-based decision machines built on programmer-designed logic. Some would say they're no different from any other software. First-wave AI has reasoning capability but no learning capability.
In network security, a rule that produces an alert when a large file is leaving the network has a threshold that determines the file size above which an alert is triggered. It has “reasoning,” meaning it reasons to alert when the file size is bigger than the threshold value. However, it doesn't have learning capability: the threshold is set by the user.
Second-wave AI (statistical machine-learning approaches) has learning capability. In the above example of large outgoing files, a second-wave system determines the threshold from the data flowing on the network. But there's no reasoning, contextual understanding or extrapolation ability. For example, the size of files sent during the day might be quite different from the size of files sent at night. Second-wave methodologies have no inherent understanding of daytime versus nighttime context.
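To make the contrast concrete, here is a minimal Python sketch of the first- and second-wave approaches to the large-file alert. The 100 MB user threshold, the mean-plus-three-standard-deviations rule and the sample file sizes are illustrative assumptions, not values from any real product.

```python
import statistics

# First wave: a hand-written rule. The threshold is fixed by the user.
USER_THRESHOLD_MB = 100.0  # hypothetical value chosen by an analyst


def first_wave_alert(file_size_mb, threshold_mb=USER_THRESHOLD_MB):
    """Alert when the file exceeds a fixed, user-set threshold."""
    return file_size_mb > threshold_mb


# Second wave: learn the threshold from observed traffic.
def learn_threshold(observed_sizes_mb, num_std=3.0):
    """Place the threshold at mean + num_std standard deviations (an assumed rule)."""
    mean = statistics.mean(observed_sizes_mb)
    std = statistics.pstdev(observed_sizes_mb)
    return mean + num_std * std


def second_wave_alert(file_size_mb, observed_sizes_mb):
    """Alert when the file exceeds the threshold learned from past traffic."""
    return file_size_mb > learn_threshold(observed_sizes_mb)


# Daytime and nighttime transfers pooled together (made-up numbers).
day_sizes = [40.0, 50.0, 60.0, 50.0]
night_sizes = [1.0, 2.0, 3.0, 2.0]
history = day_sizes + night_sizes

print(first_wave_alert(120.0))           # compared against the user-set 100 MB
print(second_wave_alert(30.0, history))  # compared against the learned threshold
```

Because day and night traffic are pooled, the learned threshold lands just under 100 MB, so a 30 MB transfer passes unnoticed even though it is many times larger than the typical nighttime file in this sample. That is the missing context described above.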
Enter the third wave of AI. Its key properties involve the ability to learn, reason and adapt to context and abstract information. It has the power to extrapolate and predict. In the example of large outbound file detection, a third-wave AI system would be able to learn the distribution of normal sizes of outgoing files for any day of the week and minute of the day. It would be able to reason over data to recognize unusual file sizes. It could then zero in on the entities that are exporting the files and check whether they behave oddly. If so, it would provide the user with its assessment: “Unusually large files are leaving the network; the entities involved are A and B.”
Third-wave AI would also explain its reasoning to the user and present evidence for it. For example, “My observations indicate that at this time of day, on this subnetwork, with this level of activity, it is unusual to have a file of 10MB leaving the network. Here are some graphs you can look at to verify my conclusion.”
It would also be able to adapt to changing contexts. For example, with the addition of a new subnetwork, it would inform the user of what it detected and adapt to the changed environment. In short, it would behave much as Iron Man’s J.A.R.V.I.S. does.
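The context-aware part of this description (a baseline per time of day, plus an explanation attached to each alert) can be sketched in a few lines of Python. This is only a per-hour statistical baseline under assumed names and made-up data, not a third-wave system; it illustrates the idea of learning a separate "normal" for each context and reporting the evidence behind an alert.

```python
import statistics
from collections import defaultdict


def learn_hourly_baseline(events):
    """Learn (mean, std) of file size per hour from (hour_of_day, size_mb) pairs."""
    by_hour = defaultdict(list)
    for hour, size_mb in events:
        by_hour[hour].append(size_mb)
    return {
        hour: (statistics.mean(sizes), statistics.pstdev(sizes))
        for hour, sizes in by_hour.items()
    }


def assess(baseline, hour, size_mb, entity, num_std=3.0):
    """Return an explanation if the size is unusual for this hour, else None."""
    if hour not in baseline:
        return None  # no context learned yet for this hour
    mean, std = baseline[hour]
    limit = mean + num_std * std  # assumed rule: mean + 3 standard deviations
    if size_mb <= limit:
        return None
    return (
        f"Unusually large file ({size_mb:.1f} MB) from {entity} at hour {hour}: "
        f"typical size is {mean:.1f} MB, learned limit is {limit:.1f} MB."
    )


# Hypothetical history: large files in the afternoon, small files at night.
events = [(14, 40.0), (14, 50.0), (14, 60.0), (2, 1.0), (2, 2.0), (2, 3.0)]
baseline = learn_hourly_baseline(events)

print(assess(baseline, 2, 10.0, "host-A"))   # 10 MB at 2 a.m. is flagged
print(assess(baseline, 14, 10.0, "host-A"))  # 10 MB at 2 p.m. is not (None)
```

The same 10 MB file is flagged at night and ignored in the afternoon, and the alert carries the numbers a user would need to verify it.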
Third-wave AI relies on generative models: It doesn't only connect the “inputs” of the system to its “outputs” in the way first- and second-wave technologies do. It's able to generate new outputs from the existing inputs. Third-wave AI recognizes types of activity on its own and then connects the dots for the human user. It perceives, reasons and communicates.
Generative AI Versus General Intelligence
ChatGPT operates based on a generative model. During the learning process, it absorbs the written material it finds on the Web. In response to a query, it produces text that typically isn't identical to what it “read.” The underlying model is “generative” precisely because of this property. However, it's quite simple: it only generates the next word in a sequence, based on word-occurrence probabilities it estimated from its input data. In this sense, ChatGPT isn't third-wave AI technology, and it's far from what we consider “general intelligence.”
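The next-word idea can be illustrated with a toy Python sketch. ChatGPT itself uses a large neural network rather than a lookup table; the bigram counter below, the tiny corpus and the function names are all assumptions made for illustration. It shows only the principle of producing new text by repeatedly sampling the next word from probabilities estimated from input data.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word follows each other word in the text."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, start, length, rng):
    """Generate up to `length` words, sampling each next word by observed frequency."""
    output = [start]
    for _ in range(length - 1):
        followers = counts.get(output[-1])
        if not followers:
            break  # the last word was never followed by anything in training
        candidates = list(followers.keys())
        weights = list(followers.values())
        output.append(rng.choices(candidates, weights=weights, k=1)[0])
    return " ".join(output)


corpus = "the file left the network and the alert fired and the analyst read the alert"
model = train_bigrams(corpus)
print(generate(model, "the", 8, random.Random(0)))
```

The generated sentence usually does not appear verbatim in the corpus, yet every adjacent word pair in it does, which is the sense in which such a model is "generative" without understanding anything.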
In contrast, in the field of dynamical systems, generative models are crucial and have been created for a variety of applications, but largely by humans.
An early example is Newton’s law of gravitation, which gave us generative models enabling space flight and much more. In the last 20 years, there's been a realization that generative models based on a subfield of dynamical systems can be constructed without human input and are powerful enough to usher in third-wave AI technologies that enable learning, adaptability, reasoning and prediction.
In our example of the cybersecurity problem, a generative model inherently understands the time of day and its relationship with the file size distribution. These relationships are encoded in a small number of coefficients that are easily translated to human language to enable communication with the user, in contrast with the billions of parameters encoding the “knowledge” of ChatGPT.
Where an individual coefficient in the state file of ChatGPT has little human-discernible meaning, in operator-based AI, there's a specific coefficient that delineates the variation of the file size during the daily cycle. That coefficient will change if the underlying generative model, while constantly receiving new data, observes a different—but benign—daily variation that might have occurred because of the addition of a subnetwork. The properties of these generative models parallel the essential aspect of human intelligence: generating hypotheses from past learning and revising them based on observed data.
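As a rough illustration of what "a specific coefficient for the daily cycle" could look like, the Python sketch below fits a mean level and a single 24-hour harmonic to hourly file sizes. The article does not specify the form of its generative model, so the harmonic fit, the function names and the synthetic data are all assumptions standing in for it.

```python
import math

PERIOD_HOURS = 24


def fit_daily_cycle(hourly_sizes):
    """Fit size(t) ~ mean + a*cos(w*t) + b*sin(w*t) to one sample per hour.

    With evenly spaced samples covering whole days, these projections
    coincide with the least-squares fit.
    """
    n = len(hourly_sizes)
    if n == 0 or n % PERIOD_HOURS != 0:
        raise ValueError("expected one sample per hour over whole days")
    w = 2 * math.pi / PERIOD_HOURS
    mean = sum(hourly_sizes) / n
    a = 2 / n * sum(y * math.cos(w * t) for t, y in enumerate(hourly_sizes))
    b = 2 / n * sum(y * math.sin(w * t) for t, y in enumerate(hourly_sizes))
    return {"mean_mb": mean, "daily_amplitude_mb": math.hypot(a, b)}


def synthetic_day(mean_mb, amplitude_mb, peak_hour=14):
    """Made-up traffic: file sizes peak mid-afternoon and dip at night."""
    w = 2 * math.pi / PERIOD_HOURS
    return [mean_mb + amplitude_mb * math.cos(w * (t - peak_hour))
            for t in range(PERIOD_HOURS)]


before = fit_daily_cycle(synthetic_day(25.0, 20.0))
# Hypothetical benign change, e.g. a new subnetwork adds daytime traffic.
after = fit_daily_cycle(synthetic_day(40.0, 30.0))

print(before)
print(after)
```

Here two numbers describe the daily pattern, and each has a plain-language reading: the average file size and how strongly it swings over the day. Refitting on new data moves the amplitude coefficient, which is the kind of interpretable update the paragraph above describes.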
Cybersecurity Professionals: Move Forward With Caution
Large language models like ChatGPT are interesting and useful. But they lack many aspects desired from a system that emulates at least some aspects of human intelligence. Deploying such systems in a field like network security seems to incur problems of computational cost, limited adaptability to dynamic changes and a lack of transparency in the model's internal decision-making. Surely, this will improve over time, with some help from third-wave AI.