TranSQL: A Transformer-based Model for Classifying SQL Queries
Chapter
Accepted version
Permanent link: https://hdl.handle.net/11250/3063329
Publication date: 2022
Original version: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), 2022, 788-793. DOI: 10.1109/ICMLA55696.2022.00131

Abstract
Domain-Specific Languages (DSLs) are becoming popular in various fields because they enable domain experts to focus on domain-specific concepts rather than software-specific ones. Domain experts often reuse their previously written scripts when writing new ones; to make this process straightforward, however, they need techniques that let them easily find existing relevant scripts. One fundamental component of such a technique is a model for identifying similar DSL scripts. The inherent nature of DSLs and the lack of data make building such a model challenging. Hence, in this work, we propose TRANSQL, a transformer-based model for classifying DSL scripts based on their similarities in a few-shot setting. We build TRANSQL using BERT and GPT-3, two performant language models. Our experiments focus on SQL, one of the most commonly used DSLs. The results reveal that the BERT-based TRANSQL does not perform well for DSLs, since BERT-based models require extensive data for the fine-tuning phase, whereas the GPT-based TRANSQL gives markedly better and more promising results.
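The few-shot classification idea described in the abstract can be illustrated with a minimal sketch. Note that this is not the paper's actual method: TRANSQL uses BERT or GPT-3 representations, while the toy `embed` function below substitutes a simple bag-of-tokens vector so the example stays self-contained; the labels and support queries are likewise hypothetical.

```python
import math
from collections import Counter

def embed(sql: str) -> Counter:
    # Toy stand-in for a transformer embedding: a bag of lowercased tokens.
    # TRANSQL would instead use BERT or GPT-3 representations of the query.
    return Counter(sql.lower().replace("(", " ").replace(")", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(query: str, support: dict[str, list[str]]) -> str:
    # Few-shot classification: compare the query against the handful of
    # labelled examples per class and return the nearest neighbour's label.
    best_label, best_score = "", -1.0
    for label, examples in support.items():
        for example in examples:
            score = cosine(embed(query), embed(example))
            if score > best_score:
                best_label, best_score = label, score
    return best_label

# Hypothetical support set: one labelled example per class.
support = {
    "aggregation": ["SELECT COUNT(*) FROM orders GROUP BY customer_id"],
    "join": ["SELECT a.id FROM a JOIN b ON a.id = b.a_id"],
}
print(classify("SELECT SUM(price) FROM sales GROUP BY region", support))
# → aggregation
```

A transformer-based embedding replaces the token-count vector in practice, which is what lets the model capture similarity between queries that share little surface vocabulary.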