The objective of this study is to assess the efficacy of several machine learning (ML) algorithms in detecting cross-site scripting (XSS) vulnerabilities, a widespread and significant cybersecurity risk. Several studies have highlighted the lack of a rich dataset for model training. This research therefore uses a comprehensive dataset compiled from open sources, comprising 219,176 scripts evenly divided into malicious and non-malicious categories, to train and evaluate three approaches: support vector machines (SVM), artificial neural networks (ANN), and recurrent neural networks (RNN). Performance is measured using accuracy, F1-scores, and the confusion matrix. Of the three models, the ANN proved the most effective, achieving an accuracy of 99% and F1-scores above 0.98 in all categories and substantially outperforming the other models.
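For reference, all of the evaluation criteria named above can be derived from a binary confusion matrix. The Python sketch below is illustrative only: it assumes "malicious" is the positive class, reports an F1-score for each class plus their unweighted (macro) mean, and the names `BinaryConfusion`, `f1`, and `summarize`, as well as the counts in the example, are hypothetical rather than taken from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryConfusion:
    """2x2 confusion matrix; 'malicious' is assumed to be the positive class."""
    tp: int  # malicious scripts predicted malicious
    fp: int  # non-malicious scripts predicted malicious
    fn: int  # malicious scripts predicted non-malicious
    tn: int  # non-malicious scripts predicted non-malicious


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 if the class never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize(cm: BinaryConfusion) -> dict[str, float]:
    total = cm.tp + cm.fp + cm.fn + cm.tn
    f1_malicious = f1(cm.tp, cm.fp, cm.fn)
    # Treating 'non-malicious' as the positive class: its TP is tn,
    # and the roles of FP and FN swap.
    f1_benign = f1(cm.tn, cm.fn, cm.fp)
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "f1_malicious": f1_malicious,
        "f1_benign": f1_benign,
        "f1_macro": (f1_malicious + f1_benign) / 2,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from this study.
    cm = BinaryConfusion(tp=90, fp=5, fn=10, tn=95)
    for name, value in summarize(cm).items():
        print(f"{name}: {value:.4f}")
```

Reporting F1 for both classes matters here because, with a balanced dataset, accuracy alone can hide a model that is strong on one class and weak on the other.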
These results suggest that a hybrid approach combining the strengths of each model could further improve detection accuracy. Integrating the SVM with the RNN and ANN models could provide a dependable solution: the SVM first filters the data, reducing analysis time, which in turn makes the RNN or ANN more efficient at detecting XSS attacks. By pairing the SVM's accuracy on non-malicious instances with the more sophisticated pattern-recognition abilities of the RNN and ANN, this approach should yield a more robust detection system for XSS vulnerabilities.
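A minimal Python sketch of such a two-stage cascade follows. It assumes the SVM stage produces a score where lower values mean "more likely non-malicious" and the ANN/RNN stage returns a probability of maliciousness; both stages are passed in as plain functions. The keyword-based `toy_fast_stage` and `toy_deep_stage` are hypothetical placeholders standing in for trained models, not the models evaluated in this study, and the names `cascade_detect`, `benign_cutoff`, and `deep_threshold` are likewise illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Scorer = Callable[[str], float]  # maps a script to a maliciousness score


@dataclass
class CascadeResult:
    labels: list[bool]  # True = flagged as XSS
    escalated: int      # number of scripts that reached the second stage


def cascade_detect(scripts: Iterable[str], fast_stage: Scorer,
                   deep_stage: Scorer, benign_cutoff: float,
                   deep_threshold: float = 0.5) -> CascadeResult:
    labels: list[bool] = []
    escalated = 0
    for script in scripts:
        if fast_stage(script) < benign_cutoff:
            # Stage 1 (SVM role): confidently non-malicious, skip the costly model.
            labels.append(False)
            continue
        # Stage 2 (ANN/RNN role): only suspicious or ambiguous inputs get here.
        escalated += 1
        labels.append(deep_stage(script) >= deep_threshold)
    return CascadeResult(labels, escalated)


def toy_fast_stage(script: str) -> float:
    # Hypothetical placeholder for an SVM decision score: counts HTML/JS markers.
    markers = ("<", ">", "javascript:", "onerror", "onload")
    s = script.lower()
    return float(sum(s.count(m) for m in markers))


def toy_deep_stage(script: str) -> float:
    # Hypothetical placeholder for an ANN/RNN probability of maliciousness.
    s = script.lower()
    return 1.0 if ("<script" in s or "onerror=" in s or "javascript:" in s) else 0.0


if __name__ == "__main__":
    samples = ["hello world", "<b>bold</b>",
               "<script>alert(1)</script>", "<img src=x onerror=alert(1)>"]
    result = cascade_detect(samples, toy_fast_stage, toy_deep_stage, benign_cutoff=1.0)
    print(result.labels, "escalated:", result.escalated)
```

The key design parameter in this sketch is `benign_cutoff`: raising it sends fewer scripts to the second stage and saves more time, but any malicious script that the first stage scores below the cutoff is never re-examined, so the cutoff would need to be tuned against the SVM's false-negative rate.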