# Workers AI Patterns
## RAG (Retrieval-Augmented Generation)
```typescript
// 1. Embed the query
const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: query });

// 2. Search vectors
const results = await env.VECTORIZE.query(embedding.data[0], {
  topK: 5,
  returnMetadata: true
});

// 3. Build context from retrieved chunks
const context = results.matches.map(m => m.metadata?.text).join('\n\n');

// 4. Generate with context
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: `Answer based on:\n\n${context}` },
    { role: 'user', content: query }
  ]
});
```
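Steps 3–4 can be factored into a small pure helper so the context-assembly logic is testable on its own (the helper name is illustrative, not part of the Workers AI API):

```typescript
// Assemble RAG chat messages from retrieved chunk texts and the user query.
// Chunks with missing metadata text are dropped before joining.
function buildRagMessages(chunks: (string | undefined)[], query: string) {
  const context = chunks.filter((c): c is string => !!c).join('\n\n');
  return [
    { role: 'system', content: `Answer based on:\n\n${context}` },
    { role: 'user', content: query }
  ];
}
```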
## Streaming (SSE)
```typescript
// With stream: true the binding returns a ReadableStream whose bytes are
// already SSE-encoded (`data: {...}` frames ending in `data: [DONE]`),
// so it can be returned to the client as-is — re-serializing each chunk
// with JSON.stringify would double-encode the frames.
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages,
  stream: true
});
return new Response(stream, {
  headers: { 'Content-Type': 'text/event-stream' }
});
```
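On the consuming side, the `data:` frames can be split out of buffered text with a minimal parser. This is a sketch: it assumes text-generation chunks carry the token in a `response` field, and a real client should also handle frames split across network chunks:

```typescript
// Parse complete SSE frames out of a text buffer, returning the parsed
// JSON payloads and stopping at the [DONE] sentinel.
function parseSseFrames(buffer: string): { response?: string }[] {
  const events: { response?: string }[] = [];
  for (const line of buffer.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') break;
    events.push(JSON.parse(payload));
  }
  return events;
}
```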
## Error Handling & Retry
```typescript
async function runWithRetry(env, model, input, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await env.AI.run(model, input);
    } catch (error) {
      if (error.message?.includes('7505') && attempt < maxRetries - 1) {
        // Exponential backoff: 1s, 2s, 4s, ...
        await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 1000));
        continue;
      }
      throw error;
    }
  }
}
```
## Model Fallback
```typescript
try {
  return await env.AI.run('@cf/meta/llama-3.1-70b-instruct', { messages });
} catch {
  return await env.AI.run('@cf/meta/llama-3.1-8b-instruct', { messages });
}
```
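The same idea generalizes to an ordered list of models. This sketch takes the runner as a parameter so the chain logic is testable without the binding (the helper name is made up):

```typescript
type Runner = (model: string) => Promise<unknown>;

// Try each model in order; return the first successful result,
// or rethrow the last error if every model fails.
async function runWithFallback(models: string[], run: Runner) {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await run(model);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```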
## Prompt Patterns
```typescript
// System prompts
const PROMPTS = {
  json: 'Respond with valid JSON only.',
  concise: 'Keep responses brief.',
  cot: 'Think step by step before answering.'
};

// Few-shot: prime the model with an example exchange
const messages = [
  { role: 'system', content: 'Extract as JSON' },
  { role: 'user', content: 'John bought 3 apples for $5' },
  { role: 'assistant', content: '{"name":"John","item":"apples","qty":3}' },
  { role: 'user', content: actualInput }
];
```
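The `PROMPTS` snippets compose into a single system message; a small helper (illustrative only) keeps call sites from concatenating strings by hand:

```typescript
const PROMPTS = {
  json: 'Respond with valid JSON only.',
  concise: 'Keep responses brief.',
  cot: 'Think step by step before answering.'
};

// Join selected prompt snippets into one system message.
function systemPrompt(...keys: (keyof typeof PROMPTS)[]): string {
  return keys.map(k => PROMPTS[k]).join(' ');
}
```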
## Parallel Execution
```typescript
const [sentiment, summary, embedding] = await Promise.all([
  env.AI.run('@cf/mistral/mistral-7b-instruct-v0.1', { messages: sentimentPrompt }),
  env.AI.run('@cf/meta/llama-3.1-8b-instruct', { messages: summaryPrompt }),
  env.AI.run('@cf/baai/bge-base-en-v1.5', { text })
]);
```
## Cost Optimization
| Task | Model | Neurons |
|------|-------|---------|
| Classify | `@cf/mistral/mistral-7b-instruct-v0.1` | ~50 |
| Chat | `@cf/meta/llama-3.1-8b-instruct` | ~200 |
| Complex | `@cf/meta/llama-3.1-70b-instruct` | ~2000 |
| Embed | `@cf/baai/bge-base-en-v1.5` | ~10 |
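The table above can be encoded as a simple router so call sites pick the cheapest adequate model. The task names and mapping mirror the table; they are not an official API:

```typescript
// Map task categories to the cheapest model that handles them well.
const MODEL_FOR_TASK = {
  classify: '@cf/mistral/mistral-7b-instruct-v0.1',
  chat: '@cf/meta/llama-3.1-8b-instruct',
  complex: '@cf/meta/llama-3.1-70b-instruct',
  embed: '@cf/baai/bge-base-en-v1.5'
} as const;

function modelFor(task: keyof typeof MODEL_FOR_TASK): string {
  return MODEL_FOR_TASK[task];
}
```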
```typescript
// Batch embeddings
const response = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: textsArray // process multiple texts in one request
});
```
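Large text arrays should be split to stay under per-request batch limits before embedding. The exact cap varies by model, so the default of 100 below is an assumption, not a documented limit:

```typescript
// Split an array into fixed-size chunks for batched embedding calls.
function chunk<T>(items: T[], size = 100): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```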