RAG System
Retrieval-Augmented Generation (RAG) lets your LLM answer questions using content retrieved from your uploaded documents, rather than relying on its training data alone.
How It Works
- Upload documents → Parse and chunk text
- Generate embeddings → Convert chunks to vectors
- Store in vector DB → Save for fast retrieval
- Query time → Find relevant chunks and include in context
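The query-time step can be pictured as a nearest-neighbour search over the stored vectors. The following is a minimal in-memory sketch, assuming cosine similarity as the ranking metric; `StoredChunk`, `cosineSimilarity`, and `topK` are hypothetical names used for illustration and are not part of the library's API. The actual vector DB may rank results differently.

```typescript
// Hypothetical shape for a stored chunk; not the library's actual schema.
interface StoredChunk {
  text: string
  vector: number[]
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // A zero vector has no direction, so treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Return the `limit` chunks most similar to the query vector, best first.
function topK(query: number[], chunks: StoredChunk[], limit: number): StoredChunk[] {
  return chunks
    .map(chunk => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map(entry => entry.chunk)
}
```

The texts of the returned chunks are what would then be added to the prompt context.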
Supported Formats
- PDF - Text extraction from PDF files
- DOCX - Microsoft Word documents
- TXT - Plain text files
Basic Usage
Enable RAG in Chat
typescript
const response = await fetch('/api/bento/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [
      { role: 'user', content: 'What does the report say about revenue?' }
    ],
    ragEnabled: true // Enable RAG
  })
})
Upload Documents
typescript
const formData = new FormData()
formData.append('file', file)
const response = await fetch('/api/bento/documents/upload', {
  method: 'POST',
  body: formData
})
const { documentId, chunks } = await response.json()
console.log(`Uploaded ${chunks} chunks`)
List Documents
typescript
const response = await fetch('/api/bento/documents/list')
const { documents } = await response.json()
documents.forEach(doc => {
  console.log(`${doc.title} - ${doc.chunks} chunks`)
})
Search Documents
typescript
const response = await fetch('/api/bento/documents/search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'revenue growth',
    limit: 5
  })
})
const { results } = await response.json()
Configuration
Chunk Settings
Control how documents are split:
typescript
const server = createBentoServer({
  openRouterKey: 'your-key',
  chunking: {
    size: 1000,   // Characters per chunk
    overlap: 200, // Overlap between chunks
    minSize: 100  // Minimum chunk size
  }
})
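To make the three settings concrete, here is a minimal sketch of a fixed-size character chunker with overlap. It is an illustration under stated assumptions, not the library's actual splitter: `chunkText` and `ChunkOptions` are hypothetical names, and the handling of `minSize` (merging a too-short tail into the previous chunk) is one plausible interpretation.

```typescript
// Hypothetical options type mirroring the `chunking` settings above.
interface ChunkOptions {
  size: number    // Maximum characters per chunk
  overlap: number // Characters shared between consecutive chunks
  minSize: number // A trailing chunk shorter than this is merged into the previous one (assumed behaviour)
}

function chunkText(text: string, options: ChunkOptions): string[] {
  const { size, overlap, minSize } = options
  if (size <= 0 || overlap < 0 || overlap >= size || minSize > size) {
    throw new Error('Expected size > 0, 0 <= overlap < size, and minSize <= size')
  }
  // Each chunk starts `step` characters after the previous one.
  const step = size - overlap
  const chunks: string[] = []
  for (let start = 0; start < text.length; start += step) {
    const piece = text.slice(start, start + size)
    if (piece.length < minSize && chunks.length > 0) {
      // Tail is too short to stand alone: extend the previous chunk to the end of the text.
      chunks[chunks.length - 1] = text.slice(start - step)
    } else {
      chunks.push(piece)
    }
    // Stop once this chunk has reached the end of the text.
    if (start + size >= text.length) break
  }
  return chunks
}
```

With `size: 1000` and `overlap: 200`, each new chunk starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next.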
Embedding Model
Currently uses OpenAI's text-embedding-ada-002 via OpenRouter.
Advanced Features
Document Metadata
Add metadata during upload:
typescript
const formData = new FormData()
formData.append('file', file)
formData.append('metadata', JSON.stringify({
  category: 'financial',
  year: 2024,
  department: 'sales'
}))
Filter by Metadata
typescript
// Coming soon: metadata filtering
const results = await vectorDB.searchDocuments(embedding, {
  limit: 5,
  filter: {
    category: 'financial',
    year: { $gte: 2023 }
  }
})
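Until built-in filtering is available, a similar effect could be approximated by filtering search results on the client. The sketch below assumes each result carries the metadata object supplied at upload, which this document does not confirm; `matchesFilter` and the types are hypothetical, and only exact equality and `$gte` are handled.

```typescript
// Hypothetical types; the real filter syntax may support more operators.
type MetadataValue = string | number | boolean
type Condition = MetadataValue | { $gte: number }
type MetadataFilter = Record<string, Condition>

// True when every key in the filter is satisfied by the metadata.
function matchesFilter(
  metadata: Record<string, MetadataValue>,
  filter: MetadataFilter
): boolean {
  return Object.entries(filter).every(([key, condition]) => {
    const value = metadata[key]
    if (typeof condition === 'object') {
      // Range condition: the value must be a number at or above the bound.
      return typeof value === 'number' && value >= condition.$gte
    }
    // Plain value: require strict equality.
    return value === condition
  })
}
```

A usage example, assuming a `results` array whose items expose a `metadata` field: `results.filter(r => matchesFilter(r.metadata, { category: 'financial', year: { $gte: 2023 } }))`.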
Custom Processing
Override default document processing:
typescript
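import { readFile } from 'node:fs/promises'
import { documentProcessor } from 'bento-core'

// Add a custom parser for .csv files
documentProcessor.addParser('.csv', async (filePath) => {
  // Custom CSV parsing logic goes here; this placeholder reads the raw file text
  // (assumes filePath is a string path on disk)
  const parsedContent = await readFile(filePath, 'utf8')
  return {
    title: 'CSV File',
    content: parsedContent,
    metadata: { type: 'csv' }
  }
})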
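As one possible shape for the custom CSV parsing logic, the sketch below turns each row into a self-describing line of `header: value` pairs, so that a row still makes sense after chunking. `csvToText` is a hypothetical helper, and it assumes a simple CSV with a header row and no quoted fields or embedded commas; a real CSV library would be a safer choice for arbitrary input.

```typescript
// Convert simple CSV text into one "header: value, header: value" line per data row.
// Assumption: first line is the header row; fields contain no quotes or commas.
function csvToText(csv: string): string {
  const lines = csv.split(/\r?\n/).filter(line => line.trim() !== '')
  if (lines.length === 0) return ''
  const headers = lines[0].split(',').map(header => header.trim())
  return lines
    .slice(1)
    .map(line => {
      const cells = line.split(',').map(cell => cell.trim())
      // Missing trailing cells are rendered as empty values.
      return headers.map((header, i) => `${header}: ${cells[i] ?? ''}`).join(', ')
    })
    .join('\n')
}
```

The returned string could be used as the `content` field of the parser's result.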
Best Practices
1. Document Preparation
- Clean text: Remove headers/footers
- Logical sections: Split by topics
- Meaningful titles: Help with retrieval
2. Chunk Size
- Smaller chunks (500-1000 characters): Better precision
- Larger chunks (2000-3000 characters): More context
- Overlap: 10-20% of the chunk size prevents context loss at chunk boundaries
3. Query Optimization
typescript
// Good: Specific query
"What was the Q4 2023 revenue?"
// Bad: Too vague
"Tell me about the company"
4. Context Window Management
typescript
// Limit RAG results to fit context
const MAX_CONTEXT_TOKENS = 2000
const MAX_RESULTS = 3 // Adjust based on chunk size
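One way to apply these limits is to add retrieved chunks in rank order until either limit is reached. The sketch below is an assumption-laden illustration: `estimateTokens` and `selectContext` are hypothetical helpers, and the 4-characters-per-token figure is only a rough heuristic for English text, not a real tokenizer.

```typescript
const MAX_CONTEXT_TOKENS = 2000
const MAX_RESULTS = 3

// Rough heuristic (assumption): about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Take chunks in rank order until the result count or token budget would be exceeded.
function selectContext(
  chunks: string[],
  maxTokens: number = MAX_CONTEXT_TOKENS,
  maxResults: number = MAX_RESULTS
): string[] {
  const selected: string[] = []
  let usedTokens = 0
  for (const chunk of chunks) {
    if (selected.length >= maxResults) break
    const cost = estimateTokens(chunk)
    // Stop at the first chunk that does not fit, preserving rank order.
    if (usedTokens + cost > maxTokens) break
    selected.push(chunk)
    usedTokens += cost
  }
  return selected
}
```

Stopping at the first chunk that does not fit keeps the highest-ranked results together; skipping it and trying lower-ranked chunks is an alternative design if filling the budget matters more than rank.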
Performance
Vector DB Stats
- Index time: ~100ms per chunk
- Search time: ~50ms for 10k documents
- Storage: ~1KB per chunk
Optimization Tips
- Pre-process documents: Clean before upload
- Batch uploads: Process multiple files together
- Cache embeddings: Reuse for similar queries
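The caching tip can be sketched as a small memoizing wrapper around whatever function produces embeddings. This is a hypothetical helper (`withCache`, `normalizeQuery`), not a library feature, and it only reuses results for queries that are identical after whitespace and case normalization; matching semantically similar queries would need a different approach.

```typescript
// Normalize so trivially different queries share a cache entry.
function normalizeQuery(query: string): string {
  return query.trim().toLowerCase().replace(/\s+/g, ' ')
}

// Wrap a compute function with an in-memory cache keyed by the normalized query.
// If `compute` returns a Promise, the Promise itself is cached, so concurrent
// callers with the same query share one request.
function withCache<T>(compute: (query: string) => T): (query: string) => T {
  const cache = new Map<string, T>()
  return (query: string): T => {
    const key = normalizeQuery(query)
    if (cache.has(key)) return cache.get(key) as T
    const value = compute(key)
    cache.set(key, value)
    return value
  }
}
```

The cache grows without bound in this sketch; a long-running server would likely want a size limit or expiry.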
Troubleshooting
Poor Results
- Check chunk size: Too small loses context
- Improve queries: Be specific
- Document quality: Ensure clean text
Slow Performance
- Reduce chunk overlap: Less processing
- Limit search results: Fewer to process
- Use SSD: Faster vector operations
Memory Issues
- Stream large files: Process in chunks
- Cleanup old documents: Remove unused
- Monitor vector DB size: Regular maintenance
Security
Access Control
With custom per-user isolation, each user can search only their own documents:
typescript
// User can only search their documents
const db = await vectorDB.getDB(userId)
const results = await db.searchDocuments(query)
Sensitive Data
- Don't upload: PII, credentials, secrets
- Sanitize: Remove sensitive info first
- Encrypt: Use disk encryption
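As an illustration of the sanitize step, the sketch below masks email addresses and strings that look like prefixed API keys before text is uploaded. `redact` is a hypothetical helper and the two patterns are illustrative assumptions only; they will miss many kinds of PII and secrets, so this is not a substitute for a proper review or a dedicated scanning tool.

```typescript
// Replace a few obviously sensitive patterns with placeholders.
// Illustrative only: these patterns are far from exhaustive.
function redact(text: string): string {
  return text
    // Email addresses
    .replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '[EMAIL]')
    // Tokens such as "sk-..." or "api_..." followed by 16 or more key characters
    .replace(/\b(?:sk|pk|api|key)[-_][A-Za-z0-9_-]{16,}\b/gi, '[SECRET]')
}
```

Running `redact` on extracted text before building the upload keeps the original file untouched while limiting what reaches the vector DB.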
Roadmap
- [ ] More file formats (Excel, Markdown)
- [ ] Metadata filtering
- [ ] Incremental updates
- [ ] Multi-language support
- [ ] Custom embedding models