Anomaly detection in JSON structured data | Prikladnaya Diskretnaya Matematika - Applied Discrete Mathematics. 2022. № 56. DOI: 10.17223/20710410/56/5

In this paper, we address intrusion detection for modern web and mobile applications with a cloud-based server side by detecting malicious content in JSON data. JSON is currently one of the most popular formats for data serialization and exchange between the client and server parts of an application. We propose a method for building, from a given set of JSON objects, a JSON model capable of detecting structure and type anomalies. The model combines models for the basic data types inside JSON collection objects with a schema model that generalizes the structure of the objects in the collection. We ran experiments that modified object structures and inserted code injection attack vectors such as SQL injections, OS command injections, and JavaScript/HTML injections. The analysis showed a statistically significant association between the model's predictions and the presence of anomalies in data gathered from real web application traffic. Prediction quality was measured using the Matthews correlation coefficient (MCC). The MCC values computed on the data were close to one, which indicates that the model is highly effective at detecting anomalies in JSON objects.
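To make the idea of structure and type anomalies and the MCC score concrete, here is a minimal sketch in Python. It is not the authors' model: the function names (`type_name`, `build_schema`, `is_anomalous`, `evaluate`, `mcc`) and the sample objects are hypothetical, and the "schema" is assumed to be nothing more than the set of coarse type labels seen for each top-level key. A check this simple cannot catch an injection hidden inside a string field that stays a string, which is what the per-type models described in the abstract are meant to handle.

```python
import math


def type_name(value):
    """Map a parsed JSON value to a coarse type label."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def build_schema(objects):
    """Collect, per top-level key, the type labels seen in the training set."""
    schema = {}
    for obj in objects:
        for key, value in obj.items():
            schema.setdefault(key, set()).add(type_name(value))
    return schema


def is_anomalous(obj, schema):
    """Flag an object with an unknown key or an unseen type for a known key."""
    for key, value in obj.items():
        if key not in schema or type_name(value) not in schema[key]:
            return True
    return False


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def evaluate(labelled, schema):
    """Count TP, TN, FP, FN over (object, is_really_anomalous) pairs."""
    tp = tn = fp = fn = 0
    for obj, truth in labelled:
        pred = is_anomalous(obj, schema)
        if pred and truth:
            tp += 1
        elif not pred and not truth:
            tn += 1
        elif pred and not truth:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


# Hypothetical training and test data, for illustration only.
train = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
schema = build_schema(train)
test_set = [
    ({"id": 3, "name": "carol"}, False),
    ({"id": "1 OR 1=1", "name": "x"}, True),            # type anomaly
    ({"id": 4, "name": "dave", "cmd": "; ls"}, True),   # structure anomaly
    ({"id": 5, "name": "eve"}, False),
]
score = mcc(*evaluate(test_set, schema))
print(score)
```

MCC ranges from -1 to 1. A value near 1 means predictions and true labels agree on both the anomalous and the normal class, which is why it remains informative when anomalies are rare.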
  • Title Anomaly detection in JSON structured data
  • Publisher Tomsk State University
  • Issue Prikladnaya Diskretnaya Matematika - Applied Discrete Mathematics 56
  • Date:
  • DOI 10.17223/20710410/56/5
Keywords
web traffic security, anomaly detection, machine learning
Authors