Enhancing Machine Learning Based SQL Injection Detection Using Contextualized Word Embedding
Published in ACMSE, 2024
Recommended citation: https://dl.acm.org/doi/pdf/10.1145/3603287.3651187
SQL injection (SQLi) attacks continue to severely threaten application security, allowing attackers to exploit web input and manipulate an application’s database with malicious SQL code. This work explores the possibility of building effective SQLi detectors through machine learning. Specifically, we investigate the impact of contextualized and non-contextualized embedding methods for converting SQL queries into vector space. Our results demonstrate the superiority of the contextualized embedding method, which consistently achieves accuracy above 99% across various classification algorithms and reduces model training time by a factor of 31. In addition, the analysis of reliability diagrams indicates that contextualized embeddings yield better-calibrated models. These findings underscore the significance of contextualized word embeddings in enhancing the performance and reliability of SQLi detection models.
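For readers unfamiliar with reliability diagrams, the following is a minimal, generic sketch of how one is computed for a binary detector. It is not the paper's evaluation code. The function names (`reliability_bins`, `expected_calibration_error`), the equal-width binning, and the toy scores are all assumptions made for illustration. Each bin compares the mean predicted probability of "malicious" with the fraction of queries in that bin that actually are malicious. A well-calibrated model keeps the two close.

```python
from typing import List, Tuple


def reliability_bins(
    probs: List[float], labels: List[int], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width probability bins.

    Returns (mean predicted probability, observed positive rate, count)
    for every non-empty bin. These are the points of a reliability diagram.
    """
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, labels):
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        sums[idx] += p
        hits[idx] += y
        counts[idx] += 1
    return [
        (sums[i] / counts[i], hits[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]


def expected_calibration_error(
    probs: List[float], labels: List[int], n_bins: int = 5
) -> float:
    """Count-weighted average gap between predicted and observed rates."""
    total = len(probs)
    return sum(
        (count / total) * abs(mean_prob - observed)
        for mean_prob, observed, count in reliability_bins(probs, labels, n_bins)
    )


# Hypothetical detector scores (probability that a query is an injection)
# and ground-truth labels (1 = malicious, 0 = benign).
toy_probs = [0.1, 0.1, 0.9, 0.9]
toy_labels = [0, 0, 1, 1]
toy_bins = reliability_bins(toy_probs, toy_labels)
toy_ece = expected_calibration_error(toy_probs, toy_labels)
```

Plotting the first two values of each bin against each other gives the reliability diagram, and points near the diagonal indicate good calibration. The weighted gap returned by `expected_calibration_error` is one common scalar summary of the same information.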