RAG System

Retrieval-Augmented Generation (RAG) allows your LLM to answer questions based on your documents.

How It Works

Upload documents → Parse and chunk text
Generate embeddings → Convert chunks to vectors
Store in vector DB → Save for fast retrieval
Query time → Find relevant chunks and include in context

Supported Formats

PDF - Text extraction from PDF files
DOCX - Microsoft Word documents
TXT - Plain text files

Basic Usage

Enable RAG in Chat

typescript

const response = await fetch('/api/bento/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [
      { role: 'user', content: 'What does the report say about revenue?' }
    ],
    ragEnabled: true // Enable RAG
  })
})

Upload Documents

typescript

const formData = new FormData()
formData.append('file', file)

const response = await fetch('/api/bento/documents/upload', {
  method: 'POST',
  body: formData
})

const { documentId, chunks } = await response.json()
console.log(`Uploaded ${chunks} chunks`)

List Documents

typescript

const response = await fetch('/api/bento/documents/list')
const { documents } = await response.json()

documents.forEach(doc => {
  console.log(`${doc.title} - ${doc.chunks} chunks`)
})

Search Documents

typescript

const response = await fetch('/api/bento/documents/search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'revenue growth',
    limit: 5
  })
})

const { results } = await response.json()

Configuration

Chunk Settings

Control how documents are split:

typescript

const server = createBentoServer({
  openRouterKey: 'your-key',
  chunking: {
    size: 1000,      // Characters per chunk
    overlap: 200,    // Overlap between chunks
    minSize: 100     // Minimum chunk size
  }
})

Embedding Model

Currently uses OpenAI's text-embedding-ada-002 via OpenRouter.

Advanced Features

Document Metadata

Add metadata during upload:

typescript

const formData = new FormData()
formData.append('file', file)
formData.append('metadata', JSON.stringify({
  category: 'financial',
  year: 2024,
  department: 'sales'
}))

Filter by Metadata

typescript

// Coming soon: metadata filtering
const results = await vectorDB.searchDocuments(embedding, {
  limit: 5,
  filter: {
    category: 'financial',
    year: { $gte: 2023 }
  }
})

Custom Processing

Override default document processing:

typescript

import { documentProcessor } from 'bento-core'

// Add custom parser
documentProcessor.addParser('.csv', async (filePath) => {
  // Custom CSV parsing logic
  return {
    title: 'CSV File',
    content: parsedContent,
    metadata: { type: 'csv' }
  }
})

Best Practices

1. Document Preparation

Clean text: Remove headers/footers
Logical sections: Split by topics
Meaningful titles: Help with retrieval

2. Chunk Size

Smaller chunks (500-1000): Better precision
Larger chunks (2000-3000): More context
Overlap: 10-20% prevents context loss

3. Query Optimization

typescript

// Good: Specific query
"What was the Q4 2023 revenue?"

// Bad: Too vague
"Tell me about the company"

4. Context Window Management

typescript

// Limit RAG results to fit context
const MAX_CONTEXT_TOKENS = 2000
const MAX_RESULTS = 3 // Adjust based on chunk size

Performance

Vector DB Stats

Index time: ~100ms per chunk
Search time: ~50ms for 10k documents
Storage: ~1KB per chunk

Optimization Tips

Pre-process documents: Clean before upload
Batch uploads: Process multiple files together
Cache embeddings: Reuse for similar queries

Troubleshooting

Poor Results

Check chunk size: Too small loses context
Improve queries: Be specific
Document quality: Ensure clean text

Slow Performance

Reduce chunk overlap: Less processing
Limit search results: Fewer to process
Use SSD: Faster vector operations

Memory Issues

Stream large files: Process in chunks
Cleanup old documents: Remove unused
Monitor vector DB size: Regular maintenance

Security

Access Control

With custom isolation:

typescript

// User can only search their documents
const db = await vectorDB.getDB(userId)
const results = await db.searchDocuments(query)

Sensitive Data

Don't upload: PII, credentials, secrets
Sanitize: Remove sensitive info first
Encrypt: Use disk encryption

Roadmap

[ ] More file formats (Excel, Markdown)
[ ] Metadata filtering
[ ] Incremental updates
[ ] Multi-language support
[ ] Custom embeddings models

RAG System #

How It Works #

Supported Formats #

Basic Usage #

Enable RAG in Chat #

Upload Documents #

List Documents #

Search Documents #

Configuration #

Chunk Settings #

Embedding Model #

Advanced Features #

Document Metadata #

Filter by Metadata #

Custom Processing #

Best Practices #

1. Document Preparation #

2. Chunk Size #

3. Query Optimization #

4. Context Window Management #

Performance #

Vector DB Stats #

Optimization Tips #

Troubleshooting #

Poor Results #

Slow Performance #

Memory Issues #

Security #

Access Control #

Sensitive Data #

Roadmap #

Examples #

RAG System

How It Works

Supported Formats

Basic Usage

Enable RAG in Chat

Upload Documents

List Documents

Search Documents

Configuration

Chunk Settings

Embedding Model

Advanced Features

Document Metadata

Filter by Metadata

Custom Processing

Best Practices

1. Document Preparation

2. Chunk Size

3. Query Optimization

4. Context Window Management

Performance

Vector DB Stats

Optimization Tips

Troubleshooting

Poor Results

Slow Performance

Memory Issues

Security

Access Control

Sensitive Data

Roadmap

Examples