---
language:
  - en
  - code
license: apache-2.0
library_name: transformers.js
tags:
  - code
  - embeddings
  - onnx
  - transformers.js
  - semantic-search
  - code-search
pipeline_tag: feature-extraction
base_model: microsoft/unixcoder-base
---

# UniXcoder ONNX for Code Search

**Converted by [VibeAtlas](https://vibeatlas.dev)** - AI Context Optimization for Developers

This is [Microsoft's UniXcoder](https://huggingface.co/microsoft/unixcoder-base) converted to ONNX format for use with **Transformers.js** in browser and Node.js environments.

## Why UniXcoder?

UniXcoder understands code **semantically**, not just as text:

- Trained on 6 programming languages (Python, Java, JavaScript, PHP, Ruby, Go)
- Understands AST structure and data flow
- 20-30% better code search accuracy than generic embedding models

## Quick Start

### Transformers.js (Browser/Node.js)

```javascript
import { pipeline } from '@huggingface/transformers';

const embedder = await pipeline(
  'feature-extraction',
  'sailesh27/unixcoder-base-onnx'
);

const code = `function authenticate(user) {
  return user.isValid && user.hasPermission;
}`;

const embedding = await embedder(code, { pooling: 'mean', normalize: true });
console.log(embedding.dims); // [1, 768]
```

### Semantic Code Search

```javascript
import { pipeline, cos_sim } from '@huggingface/transformers';

const embedder = await pipeline('feature-extraction', 'sailesh27/unixcoder-base-onnx');

// Index your code
const codeSnippets = [
  'function login(user, pass) { ... }',
  'function formatDate(date) { ... }',
  'function validateEmail(email) { ... }'
];
const codeEmbeddings = await embedder(codeSnippets, { pooling: 'mean', normalize: true });

// Search with natural language
const query = 'user authentication';
const queryEmbedding = await embedder(query, { pooling: 'mean', normalize: true });

// Score each snippet against the query and rank by similarity
const similarities = codeEmbeddings.tolist().map((emb, i) => ({
  code: codeSnippets[i],
  score: cos_sim(queryEmbedding.tolist()[0], emb)
}));
similarities.sort((a, b) => b.score - a.score);
console.log(similarities[0]); // most similar snippet
```

## Technical Details

- **Architecture**: RoBERTa-based encoder
- **Hidden Size**: 768
- **Max Sequence Length**: 512 tokens
- **Output Dimensions**: 768
- **ONNX Opset**: 14

## About VibeAtlas

**VibeAtlas** is reliability infrastructure for AI coding:

- Reduce AI token costs by 40-60%
- Improve code search accuracy with semantic understanding
- Add governance guardrails to AI workflows

**Links**:

- [Website](https://vibeatlas.dev)
- [VS Code Extension](https://marketplace.visualstudio.com/items?itemName=vibeatlas.vibeatlas)
- [GitHub](https://github.com/vibeatlas)

## Citation

```bibtex
@misc{unixcoder-onnx-2025,
  title={UniXcoder ONNX: Code Embeddings for JavaScript},
  author={VibeAtlas Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/sailesh27/unixcoder-base-onnx}
}
```

### Original UniXcoder Paper

```bibtex
@inproceedings{guo2022unixcoder,
  title={UniXcoder: Unified Cross-Modal Pre-training for Code Representation},
  author={Guo, Daya and Lu, Shuai and Duan, Nan and Wang, Yanlin and Zhou, Ming and Yin, Jian},
  booktitle={ACL},
  year={2022}
}
```

## License

Apache 2.0 (same as original UniXcoder)