From Machine Learning

Converting XQuery to SQL with Local LLMs: Do I Need Fine-Tuning or a Better Approach? [P]

I am trying to convert XQuery statements into SQL queries within an enterprise context, with the constraint that the solution must rely on locally run LLMs.

A key challenge is the limited availability of training data (pairs of XQueries and their corresponding SQL queries), especially with enough diversity to cover different patterns.

I initially experimented with a parsing-based approach.

The idea was to extract elements such as table names, columns, and conditions from the XQuery (using a Python script), map them to SQL components, and pass this structured representation to an LLM.

However, this approach depended heavily on regex-based parsing and broke down when the input queries varied in structure.
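For illustration, the structured representation described above might look roughly like the sketch below. The names `QueryParts`, `render_sql`, and `to_prompt_block` are hypothetical and not taken from my actual script, and the deterministic `render_sql` step is an assumption added only to show what a baseline without an LLM could produce from the same fields. It covers a single table with AND-joined conditions only.

```python
from dataclasses import dataclass, field


@dataclass
class QueryParts:
    """Hypothetical structured form of what the parser extracts from an XQuery."""
    table: str
    columns: list[str] = field(default_factory=list)
    conditions: list[str] = field(default_factory=list)


def render_sql(parts: QueryParts) -> str:
    """Render the extracted parts as a single-table SELECT statement."""
    select_list = ", ".join(parts.columns) if parts.columns else "*"
    sql = f"SELECT {select_list} FROM {parts.table}"
    if parts.conditions:
        sql += " WHERE " + " AND ".join(parts.conditions)
    return sql


def to_prompt_block(parts: QueryParts) -> str:
    """Serialize the parts as plain text that can be placed in an LLM prompt."""
    lines = [f"table: {parts.table}"]
    lines.append("columns: " + ", ".join(parts.columns))
    lines.append("conditions: " + "; ".join(parts.conditions))
    return "\n".join(lines)


# Made-up example values, not from the real dataset.
parts = QueryParts(
    table="orders",
    columns=["id", "total"],
    conditions=["status = 'open'", "total > 100"],
)
print(render_sql(parts))
print(to_prompt_block(parts))
```

The fragile part in practice is filling `QueryParts` from the XQuery, which this sketch does not attempt.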

I then tried a prompt-engineering approach, defining strict rules and templates for how SQL queries should be generated. While this worked to some extent for simpler inputs, the outputs became inconsistent and often incorrect for more complex or longer XQueries.
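To give an idea of the shape of such a prompt, here is a minimal sketch. The rule texts, the template wording, and the `build_prompt` helper are all hypothetical stand-ins, not the exact rules I used, and the sample XQuery is made up.

```python
# Hypothetical rules; the real rule set is longer and schema-specific.
RULES = (
    "Output exactly one SQL SELECT statement and nothing else.",
    "Every filter in the XQuery must appear in the WHERE clause.",
    "Every returned element must appear in the SELECT list.",
)

PROMPT_TEMPLATE = """Translate the XQuery below into SQL.

Rules:
{rules}

XQuery:
{xquery}

SQL:
"""


def build_prompt(xquery: str, rules=RULES) -> str:
    """Fill the fixed template with numbered rules and the input query."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return PROMPT_TEMPLATE.format(rules=numbered, xquery=xquery.strip())


sample_xquery = (
    'for $o in doc("orders.xml")//order '
    'where $o/status = "open" return $o/id'
)
prompt = build_prompt(sample_xquery)
print(prompt)
```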

At the moment, I am considering fine-tuning a local LLM using PEFT (QLoRA) with a Qwen2.5-Coder 7B model. However, the dataset available is quite small (~110–120 samples) and not very diverse.

The main issues observed so far:

- Sensitivity to variations in how XQueries are written.
- Missing conditions or columns in generated SQL for longer inputs.
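The second issue can at least be detected automatically. Below is a rough sketch of a check that flags expected columns missing from a generated SQL string and gives a crude predicate count. The helper names are hypothetical, the example values are made up, and the regex-based counting is an assumption that will miscount cases such as `BETWEEN ... AND` or keywords inside string literals, so it is a sanity check rather than a SQL parser.

```python
import re


def missing_identifiers(expected: list[str], sql: str) -> list[str]:
    """Return expected column names that never appear as a whole word in sql."""
    tokens = {t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql)}
    return [name for name in expected if name.lower() not in tokens]


def count_predicates(sql: str) -> int:
    """Crudely count WHERE predicates by splitting on every AND/OR keyword."""
    match = re.search(r"\bWHERE\b(.*)", sql, flags=re.IGNORECASE | re.DOTALL)
    if not match:
        return 0
    pieces = re.split(r"\b(?:AND|OR)\b", match.group(1), flags=re.IGNORECASE)
    return sum(1 for piece in pieces if piece.strip())


# Made-up example: the generated SQL drops one column and one condition.
expected_columns = ["id", "total", "customer_id"]
generated = "SELECT id, total FROM orders WHERE status = 'open'"

print(missing_identifiers(expected_columns, generated))  # ['customer_id']
print(count_predicates(generated))  # 1
```

Comparing the predicate count against the number of conditions extracted from the XQuery gives a cheap signal that something was dropped in a longer input.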

Given these constraints, I am trying to understand the most effective direction to take.

Would fine-tuning with such limited data be sufficient, or are there better approaches for handling this kind of structured query translation problem?

Happy to provide more details if needed.

submitted by /u/genius03noob