# Fine-tuned OpenAI-based SQL Generation Model
## Model Description

- Model Name: bhargava-2004-abhiram/gpt-oss-20b-text-to-sql
- Base Model: openai/gpt-oss-20b
- Fine-tuned By: Bhargava Abhiram
- Dataset Used: gretelai/synthetic_text_to_sql
- Task Type: Text-to-SQL Generation
This model converts natural language questions into SQL queries and adapts to domain-specific schemas. It is designed to help users, developers, and analysts generate accurate SQL statements without requiring direct knowledge of query syntax.
## Training Details
- Training Frameworks: TRL’s SFTTrainer and Unsloth
- Steps: 30
- Batch Size: 1 (with gradient accumulation)
- Loss Computation: SQL-only completion masking
- Dataset Size: 100,000+ NL-SQL pairs
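The "SQL-only completion masking" above means that loss is computed only on the SQL completion, not on the natural-language prompt. A minimal sketch of how such masking is typically done (prompt token labels are set to `-100`, PyTorch's default `ignore_index` for cross-entropy loss); the token IDs and helper name here are illustrative, not the author's actual training code:

```python
# Sketch of completion-only loss masking: labels for prompt tokens are
# set to -100 so the cross-entropy loss ignores them, and only the SQL
# completion tokens contribute to the training loss.
IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss default ignore_index

def mask_prompt_labels(input_ids, prompt_len):
    """Return a label sequence with the prompt portion masked out."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# Example: 4 prompt tokens (the NL question) + 3 completion tokens (the SQL)
input_ids = [101, 7592, 2088, 102, 555, 666, 777]
labels = mask_prompt_labels(input_ids, prompt_len=4)
print(labels)  # [-100, -100, -100, -100, 555, 666, 777]
```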
## Intended Uses
- Conversational querying of relational databases
- Natural language–driven SQL assistance tools
- AI copilots for analytics platforms
- Educational demonstrations for NLP-to-SQL transformation
## Limitations

- The model is domain-specific; performance degrades on unseen database schemas
- Generated SQL must be validated before execution in production systems
- May struggle with complex nested joins or ambiguous phrasing
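One lightweight way to validate generated SQL before execution, sketched here with Python's standard-library `sqlite3`: `EXPLAIN` compiles the statement against a throwaway in-memory copy of the schema, catching syntax errors and references to missing tables without running the query against production data. The schema DDL below is an illustrative assumption, not part of this model card's actual tooling:

```python
import sqlite3

def is_valid_sql(query: str, schema_ddl: str) -> bool:
    """Check a generated query against a schema without executing it.

    EXPLAIN compiles the statement (catching syntax errors and
    unknown tables/columns) but does not run it.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)  # build the schema in memory
        conn.execute("EXPLAIN " + query)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

# Illustrative retail schema (hypothetical, for demonstration only)
schema = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
"""
print(is_valid_sql("SELECT name FROM customers", schema))  # True
print(is_valid_sql("SELEC name FROM customers", schema))   # False
```

Note that this checks only compilability against the schema, not semantic correctness of the query.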
## Evaluation
| Metric | Score |
|---|---|
| Exact Match Accuracy | 87% |
| Execution Correctness | 92% |
The evaluation was conducted on a held-out test split with manual SQL validation and user-feedback scoring.
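For context, the two metrics are typically computed differently: exact match compares normalized query strings, while execution correctness compares the result sets produced by the gold and predicted queries on the same database (so semantically equivalent queries with different text still count). A minimal sketch under those assumptions; this is not the author's actual evaluation harness:

```python
import sqlite3

def exact_match(pred: str, gold: str) -> bool:
    """Case- and whitespace-insensitive string comparison of two queries."""
    norm = lambda q: " ".join(q.lower().split())
    return norm(pred) == norm(gold)

def execution_match(pred: str, gold: str, conn) -> bool:
    """True if both queries run and return the same rows (order-insensitive)."""
    try:
        pred_rows = sorted(conn.execute(pred).fetchall())
        gold_rows = sorted(conn.execute(gold).fetchall())
    except sqlite3.Error:
        return False
    return pred_rows == gold_rows

# Tiny illustrative database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])

print(exact_match("SELECT x FROM t", "select x  from t"))  # True
# Different text, same result set -> execution match but not exact match
print(execution_match("SELECT x FROM t",
                      "SELECT x FROM t ORDER BY x DESC", conn))  # True
```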
## Example Usage

Note that transformers has no built-in "text-to-sql-generation" pipeline task; the model can be loaded as an ordinary text-generation pipeline, with the domain context (schema) supplied directly in the prompt:

```python
from transformers import pipeline

# Load the fine-tuned model as a standard text-generation pipeline
sql_generator = pipeline(
    "text-generation",
    model="bhargava-2004-abhiram/gpt-oss-20b-text-to-sql",
)

# Supply the schema as domain context inside the prompt itself
schema = "Retail database with tables customers, orders, and order_items"
question = "List all customers who purchased more than 10 items in the last month."
prompt = f"Schema: {schema}\nQuestion: {question}\nSQL:"

result = sql_generator(prompt, max_new_tokens=128)
print(result[0]["generated_text"])
```