Generative Adversarial Networks (GANs) are a new type of generative modelling technology that is gaining popularity in the world of AI. Although GANs were only introduced in 2014, they have captured the imagination of the IT world because of their ability to create convincing synthetic content. Hyper-realistic deepfake images, audio and video have taken the internet by storm. The term "fake" is thrown around a lot, but in reality GANs produce synthetic data that aims to be indistinguishable from the real thing.
This technology can be, and already is being, used by hackers to produce malware that can confuse and overwhelm online security systems. Flooding existing systems with seemingly harmless synthetic data can make malware extremely difficult to detect. As with all tech, there is another side to the story, and GANs are set to play an important role in the cyber-security industry. When applied to malware protection, GANs give IT security experts the tools to quickly create useful labelled data for training AI security systems. These smart systems will be able to classify, detect and predict malware outbreaks more reliably and quickly.
Supervised vs Unsupervised Machine Learning
To better understand where GANs fit into the world of cyber-security, it is important to understand the basics of AI and machine learning. Machine learning is most broadly classified into supervised and unsupervised categories. Unsupervised methods try to find patterns in unlabeled data that we know very little about. A lot of the data used in cyber-security is unlabeled, so unsupervised machine learning is ideal for clustering similar data, spotting irregularities and anticipating potential threats. The volume of malicious data keeps growing rapidly, so learning methods need to be versatile and agile in order to scale and keep up.
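One of the simplest unsupervised techniques for spotting irregularities is outlier detection: no labels are needed, only the distribution of the data itself. The sketch below uses the modified z-score (median and median absolute deviation) to flag unusual values. It assumes each event has already been reduced to a single numeric feature, for example a hypothetical "outbound kilobytes per connection" metric; the function name and the 3.5 cut-off are illustrative choices, not part of any specific product.

```python
from statistics import median
from typing import List, Sequence


def flag_anomalies(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return indices of values whose modified z-score exceeds `threshold`.

    Unsupervised: no labels are used, only the spread of the data itself.
    """
    med = median(values)
    # Median absolute deviation (MAD): a spread estimate that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate spread: treat anything that differs from the median as unusual.
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 rescales the score so it is comparable to an ordinary z-score
    # when the data are roughly normally distributed.
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]
```

For example, `flag_anomalies([10, 12, 11, 13, 12, 500])` flags only index 5. The median-based score is used instead of mean and standard deviation because a single extreme value cannot drag the median far.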
On the other hand, supervised machine learning is used when each example in the data carries a known label: a distinct property that we want to predict, such as "malicious" or "benign". The model learns from these labelled examples and can then predict that property for new data. When used for malware protection, this type of AI can be put to work determining whether a file or other software package is malicious. It can also predict and identify phishing attacks and increase security in complex IoT systems by classifying and modelling threats. Semi-supervised machine learning sits in between: it typically combines a small amount of labelled data with a larger pool of unlabeled data.
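As a minimal illustration of supervised classification, the sketch below trains a nearest-centroid classifier on labelled feature vectors and labels a new file by whichever class average it lies closest to. The two features (byte entropy and a count of suspicious API imports) and the toy values are hypothetical, chosen only to show the idea; a real detector would use many more features and a stronger model.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def train_centroids(samples: List[Vector], labels: List[int]) -> Dict[int, List[float]]:
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids: Dict[int, List[float]], x: Vector) -> int:
    """Return the label whose centroid is closest (Euclidean distance) to x."""
    # Note: features on very different scales should be normalised first.
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


# Hypothetical toy data: (entropy, suspicious API count); label 1 = malicious.
X = [(4.5, 1.0), (5.0, 0.0), (4.8, 2.0), (7.6, 9.0), (7.9, 12.0), (7.2, 8.0)]
y = [0, 0, 0, 1, 1, 1]
centroids = train_centroids(X, y)
```

With this toy data, `predict(centroids, (7.4, 10.0))` returns `1` (malicious) and `predict(centroids, (4.9, 1.0))` returns `0` (benign).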
Deep and Shallow Models
Deep learning is a class of machine learning that uses neural networks with multiple layers. Each layer transforms the output of the previous one into a new, typically more abstract, representation of the data, and deep models can be trained in either a supervised or an unsupervised way. Traditional shallow models rely on hand-crafted features that are designed by experts and extracted before training. Deep learning takes the opposite route: it starts with a mass of raw data and learns the relevant features itself, layer by layer, during training. This means that deep learning can be effective even if we do not have the powerful feature extraction techniques that a shallow model would need.
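To make the idea of stacked representations concrete, here is a minimal forward pass through a two-layer network in plain Python. The function names (`dense`, `relu`, `forward`) and the hand-picked weights are illustrative assumptions; a real model would learn its weights from data with backpropagation and would normally be built with a deep learning framework rather than by hand.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """One fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def sigmoid(v: Sequence[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def forward(x: Sequence[float], layers: Sequence[Tuple[Matrix, Sequence[float]]]) -> List[float]:
    """Pass raw features through each layer in turn.

    Every hidden layer produces a new representation of its input; the final
    layer maps the last representation to a score between 0 and 1.
    """
    h = list(x)
    for i, (weights, bias) in enumerate(layers):
        h = dense(h, weights, bias)
        h = sigmoid(h) if i == len(layers) - 1 else relu(h)
    return h


# Hand-picked weights for illustration: 2 inputs -> 2 hidden units -> 1 score.
layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
```

For the input `[2.0, 1.0]`, the hidden layer produces the new representation `[1.0, 1.5]`, and the output layer turns that into a single score of about 0.92.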
Deep learning models are commonly used in cyber-security because they remove the need to invest in developing complicated feature extraction engines. The downside to this approach is that it can result in a high rate of false positives. Another disadvantage is the long training time, which may be too slow to keep pace with incoming threats. To be effective, security systems need to combine both deep and shallow machine learning.
Classifying GANs
So, where do GANs fit in? Because most of the input data for this model is unlabeled, GANs are usually classed as a type of unsupervised learning. GANs take this data, most commonly binary files, images or video, and try to learn everything they can about its structure. It is important to note that GANs also use an element of supervised learning. One part of the system, the discriminator, is trained as a classifier that can pick out synthetic data from the original data. The other part, the generator, is trained to produce samples that the discriminator is likely to misclassify as real. Such samples, crafted specifically to fool a classifier, are often described as adversarial examples.
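In the standard formulation, this tug-of-war between the two parts is written as a two-player minimax game. Here $D(x)$ is the discriminator's estimated probability that a sample $x$ is real, $G(z)$ is the sample the generator produces from random noise $z$, $p_{\text{data}}$ is the distribution of the real data and $p_z$ is the noise distribution. The discriminator maximises $V$ by scoring real samples near 1 and generated ones near 0 (the supervised part), while the generator minimises it by producing samples the discriminator scores as real.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\left[\log\left(1 - D(G(z))\right)\right]
```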
GANs work by pitting a generator neural network against a discriminator neural network, but the adversarial idea itself is not tied to any particular model family. Shallow methods can be used, but a deep neural network is needed to really get the most out of a GAN. This is most evident when working with complex data such as executable files, images and video. Deep neural networks can represent the complex functions that both sides need: the generator to produce convincing synthetic data, and the discriminator to detect it.
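Because the adversarial idea works with shallow models too, a deliberately tiny example can show the mechanics. The sketch below performs one alternating GAN update on one-dimensional data, assuming a linear generator $G(z) = a z + b$ and a logistic-regression discriminator $D(x) = \sigma(w x + c)$; all names are hypothetical. The discriminator step minimises binary cross-entropy (real labelled 1, generated labelled 0), and the generator step uses the common "non-saturating" loss $-\log D(G(z))$. It is a teaching sketch only, not how a practical GAN for malware data would be built.

```python
import math
from dataclasses import dataclass
from typing import List


def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


@dataclass
class Params:
    a: float = 1.0  # generator scale: G(z) = a * z + b
    b: float = 0.0  # generator shift
    w: float = 0.0  # discriminator weight: D(x) = sigmoid(w * x + c)
    c: float = 0.0  # discriminator bias


def gan_step(p: Params, real: List[float], noise: List[float], lr: float = 0.1) -> Params:
    """One alternating update: discriminator first, then generator."""
    fake = [p.a * z + p.b for z in noise]

    # Discriminator: minimise -log D(real) - log(1 - D(fake)).
    # Gradient w.r.t. the logit is (D - 1) for real samples and D for fakes.
    d_real = [sigmoid(p.w * x + p.c) for x in real]
    d_fake = [sigmoid(p.w * x + p.c) for x in fake]
    grad_w = mean([(d - 1.0) * x for d, x in zip(d_real, real)]) + mean(
        [d * x for d, x in zip(d_fake, fake)]
    )
    grad_c = mean([d - 1.0 for d in d_real]) + mean(d_fake)
    w, c = p.w - lr * grad_w, p.c - lr * grad_c

    # Generator: minimise -log D(G(z)) against the updated discriminator.
    # d/dx of -log sigmoid(w * x + c) is (D - 1) * w; chain rule through x = a*z + b.
    d_fake = [sigmoid(w * x + c) for x in fake]
    grad_a = mean([(d - 1.0) * w * z for d, z in zip(d_fake, noise)])
    grad_b = mean([(d - 1.0) * w for d in d_fake])
    return Params(a=p.a - lr * grad_a, b=p.b - lr * grad_b, w=w, c=c)
```

Starting from the defaults with real samples `[3.0, 4.0, 5.0]` and noise `[-1.0, 0.0, 1.0]`, one step gives the discriminator a positive weight (real data lies to the right) and nudges the generator's shift `b` upwards, towards the real data. Repeating the step trains both sides against each other, although simple setups like this can oscillate rather than settle.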
GANs in Malware Detection
Any machine learning system is only as good at detecting malware as its training set. The dataset used for training therefore needs to be as representative as possible of all the types of malware the system is expected to detect. To stay effective, AI security training sets need to be constantly updated, and this is an area where GANs can help. Based on the distribution of the existing data, GANs can generate new samples that augment the original training set with valuable data.
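A common use of generated samples is to top up an under-represented class, for instance when a training set contains far fewer malware samples than benign ones. The sketch below assumes a hypothetical `balance_with_generator` helper and uses `toy_generator` as a stand-in for a trained GAN generator (a function that maps random noise to a feature vector); the feature layout is illustrative only.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label); 1 = malicious


def balance_with_generator(
    data: List[Sample],
    generate: Callable[[random.Random], List[float]],
    label: int,
    seed: int = 0,
) -> List[Sample]:
    """Top up class `label` with synthetic samples until it matches the largest class."""
    counts = Counter(y for _, y in data)
    shortfall = max(counts.values()) - counts.get(label, 0)
    rng = random.Random(seed)
    synthetic = [(generate(rng), label) for _ in range(max(0, shortfall))]
    return data + synthetic


def toy_generator(rng: random.Random) -> List[float]:
    """Placeholder for a trained GAN generator: maps random noise to a feature vector."""
    z = rng.gauss(0.0, 1.0)
    return [7.5 + 0.3 * z, 10.0 + 1.5 * z]
```

A sensible precaution is to keep track of which samples are synthetic and to evaluate the final model only on real, held-out data, so that any artefacts of the generator do not inflate the measured accuracy.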
New types of malware are constantly being created, and GANs are well positioned to help us learn the distribution and structure of data we have not yet seen in full. Using GANs in cyber-security can also help us learn how new malware is generated. A GAN can do this because it learns only from samples of the provided data: it never needs an explicit probability distribution model, yet once trained it can draw new samples from the distribution it has learned.
Security models can be further enhanced by feeding GAN-generated data back into the learning process. This makes the system more robust and raises its level of protection against adversarial attacks. This constant feedback loop lets security experts anticipate the actions of hackers and automated malicious attacks. GANs are one of the newest tools in the AI toolbox: they have increased the sophistication of cyber-attacks, but they also provide the technology needed for smart and proactive security systems.
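The feedback loop can be sketched with a deliberately simple detector. Below, a one-feature threshold classifier is retrained each round after adding generated malicious samples that slipped past it. `probe_below` is a hypothetical placeholder for a GAN generator tuned to evade the current detector, and the "suspicion score" feature is an assumption for illustration.

```python
from statistics import mean
from typing import Callable, List, Tuple

Dataset = List[Tuple[float, int]]  # (suspicion score, label); 1 = malicious


def fit_threshold(data: Dataset) -> float:
    """Place the decision threshold halfway between the two class means."""
    benign = [x for x, y in data if y == 0]
    malicious = [x for x, y in data if y == 1]
    return (mean(benign) + mean(malicious)) / 2


def feedback_loop(
    data: Dataset,
    generate_evasive: Callable[[float], List[float]],
    rounds: int,
) -> Tuple[Dataset, float]:
    """Retrain after adding generated malicious samples that evaded the detector."""
    threshold = fit_threshold(data)
    for _ in range(rounds):
        candidates = generate_evasive(threshold)
        missed = [x for x in candidates if x < threshold]  # scored as benign
        if not missed:
            break
        data = data + [(x, 1) for x in missed]  # label them malicious and retrain
        threshold = fit_threshold(data)
    return data, threshold


def probe_below(threshold: float) -> List[float]:
    """Placeholder for an evasive GAN generator: samples just under the threshold."""
    return [threshold - 0.5, threshold - 1.0]
```

With benign scores `1, 2, 3` and malicious scores `8, 9, 10`, the initial threshold is 5.5; one round of feedback adds the two evasive samples and lowers it to 4.65. In practice each round should also be checked against held-out benign data, because pulling the threshold down too far trades missed attacks for false positives.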