
Enhancing code generation accuracy using fine-tuning and task-adaptive pretraining with domain-specific data augmentation


Thomas Lass Barna
Samson Isaac
Amina Bala Jaafaru
Hajara Idris
Ramat Imam Abba

Abstract

Recent advancements in deep learning, particularly through Transformer architectures, have significantly improved code generation tasks. However, current pre-trained language models still encounter limitations when applied to code generation. The Improved RoBERTaMarian model, built upon the Marian neural machine translation framework, addresses these limitations by fine-tuning on natural language descriptions to generate code. The model was trained and tested on the Django and CoNaLa datasets. On the CoNaLa dataset, it achieved a BLEU score of 36.834, an Exact Match Accuracy of 15.300%, a SacreBLEU score of 34.215, and a ROUGE score of 49.827, reflecting its ability to generate accurate and semantically aligned code. Similarly, when evaluated on the Django dataset, the Improved RoBERTaMarian model outperformed the BERTMarian, ELECTRAMarian, LUKEMarian, MarianCG, and RoBERTaMarian models with a BLEU score of 91.230, an Exact Match Accuracy of 83.676%, a SacreBLEU score of 75.984, and a ROUGE score of 95.210. These results indicate that the Improved RoBERTaMarian model excels in both syntactic and semantic code generation, making it a robust solution for applications requiring precise, contextually relevant code. Its high performance suggests significant potential for automated code synthesis and language-model-based code assistants in software engineering tasks.
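
The evaluation metrics reported above (SacreBLEU, Exact Match Accuracy, ROUGE) can in principle be computed with standard open-source scoring tools. The following is a minimal, illustrative sketch, not the authors' exact evaluation pipeline; it assumes the sacrebleu and rouge_score Python packages are installed, and the lists predictions and references are hypothetical placeholders for generated and ground-truth code snippets.

    # Illustrative scoring sketch, assuming the `sacrebleu` and `rouge_score`
    # packages; `predictions` and `references` are hypothetical example lists.
    import sacrebleu
    from rouge_score import rouge_scorer

    predictions = ["return sum(x for x in items)", "print('hello')"]
    references  = ["return sum(x for x in items)", "print('hi')"]

    # Corpus-level SacreBLEU: hypotheses against a single reference stream.
    bleu = sacrebleu.corpus_bleu(predictions, [references])

    # Exact Match Accuracy: fraction of predictions identical to the reference.
    exact_match = sum(p.strip() == r.strip()
                      for p, r in zip(predictions, references)) / len(references)

    # ROUGE-L F-measure, averaged over the corpus.
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)
    rouge_l = sum(scorer.score(r, p)["rougeL"].fmeasure
                  for p, r in zip(predictions, references)) / len(references)

    print(f"SacreBLEU: {bleu.score:.3f}")
    print(f"Exact Match: {100 * exact_match:.3f}%")
    print(f"ROUGE-L: {100 * rouge_l:.3f}")

In this sketch, corpus-level SacreBLEU follows its standardized tokenization, whereas the paper's separately reported BLEU score may use a different tokenization scheme, which is why the two values can differ for the same outputs.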

