RFC 0008: AI Agent Integration with AGUI/A2UI Protocol
Summary
Add AI Agent capabilities to AuroraView, enabling natural language control of the browser and WebView. Support multiple AI providers (OpenAI, Anthropic Claude, Google Gemini, Azure OpenAI, local models) with extensible Python bindings. Implement AGUI (Agent-User Interaction Protocol) and A2UI (Agent to UI) protocols for standardized AI-frontend communication.
Motivation
Current Limitations
- No AI Integration: AuroraView lacks native AI capabilities for intelligent browsing
- Manual Control Only: Users must manually navigate, search, and interact
- No Standard Protocol: No unified way for AI agents to communicate with UI
- Limited Extensibility: Users cannot easily add custom AI providers
Goals
- Multi-Provider Support: OpenAI, Claude, Gemini, Azure, local models (Ollama, LM Studio)
- AGUI/A2UI Compliance: Follow emerging AI-UI interaction standards
- Python Extensibility: Allow users to implement custom providers
- Seamless Integration: AI controls real WebView, not simulated content
- DCC Compatible: Works in Maya, Houdini, Blender, and standalone
Use Cases
- Natural Language Navigation: "Open GitHub and search for AuroraView"
- Content Summarization: "Summarize this page"
- Intelligent Actions: "Bookmark all tabs related to Python"
- Code Generation: "Create a login form with Tailwind CSS"
- DCC Workflow: "Find Maya scripting documentation"
- Auto-call Application APIs: AI automatically discovers and calls Python APIs bound to WebView
Design
1. Architecture Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ AI Agent Integration Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Provider Layer (Rust + Python) │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌──────────────┐ │ │
│ │ │ OpenAI │ │ Anthropic │ │ Gemini │ │ Azure OpenAI │ │ │
│ │ │ GPT-4/4o │ │ Claude 3 │ │ 2.0 Flash │ │ GPT-4 │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └──────────────┘ │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ Ollama │ │ LM Studio │ │ Custom Python Provider │ │ │
│ │ │ Local LLMs │ │ Local LLMs │ │ (User Extensible) │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Protocol Layer (AGUI + A2UI) │ │
│ │ │ │
│ │ ┌──────────────────────────┐ ┌──────────────────────────────┐ │ │
│ │ │ AGUI Events (16+) │ │ A2UI UI Components │ │ │
│ │ │ - TEXT_MESSAGE_START │ │ - Declarative UI specs │ │ │
│ │ │ - TEXT_MESSAGE_CONTENT │ │ - Component catalog │ │ │
│ │ │ - TEXT_MESSAGE_END │ │ - Safe rendering │ │ │
│ │ │ - TOOL_CALL_START │ │ - SSE + JSON-RPC │ │ │
│ │ │ - TOOL_CALL_END │ │ │ │ │
│ │ │ - STATE_SYNC │ │ │ │ │
│ │ │ - ... │ │ │ │ │
│ │ └──────────────────────────┘ └──────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Action Layer (Browser Control) │ │
│ │ │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │
│ │ │ navigate │ │ search │ │ bookmark │ │ execute │ │ │
│ │ │ (url) │ │ (query) │ │ (add) │ │ (js) │ │ │
│ │ └───────────┘ └───────────┘ └───────────┘ └───────────┘ │ │
│ │ │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │
│ │ │ tab_* │ │ summarize │ │ generate │ │ custom │ │ │
│ │ │(new/close)│ │ (page) │ │ (ui) │ │ (action) │ │ │
│ │ └───────────┘ └───────────┘ └───────────┘ └───────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Presentation Layer (Frontend) │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ Modern Chat UI (Gemini Browser Inspired) │ │ │
│ │ │ │ │ │
│ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │ │ │
│ │ │ │ Chat Panel │ │ History │ │ Discover/Actions │ │ │ │
│ │ │ │ - Messages │ │ - Sessions │ │ - Capabilities │ │ │ │
│ │ │ │ - Streaming│ │ - Search │ │ - Quick Actions │ │ │ │
│ │ │ │ - UI Cards │ │ │ │ │ │ │ │
│ │ │ └─────────────┘ └─────────────┘ └─────────────────────┘ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘2. Sidebar/Drawer Mode Architecture
AI Agent supports two presentation modes:
- Standalone Browser Mode - AI as a complete browser interface
- Sidebar/Drawer Mode - AI as a collapsible sidebar attached to any WebView
┌───────────────────────────────────────────────────────────────────────────┐
│ WebView + AI Sidebar │
├───────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────┐ ┌───────────────────┐ │
│ │ │ │ AI Agent Sidebar │ │
│ │ │ │ ┌─────────────┐ │ │
│ │ Main WebView Content │ │ │ Chat Messages│ │ │
│ │ │ │ │ │ │ │
│ │ (Your app, tool panels, DCC plugins) │ │ │ User: ... │ │ │
│ │ │ │ │ AI: ... │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ └─────────────┘ │ │
│ │ │ │ ┌─────────────┐ │ │
│ │ │ │ │ Input Box │ │ │
│ │ │ │ └─────────────┘ │ │
│ └─────────────────────────────────────────────┘ └───────────────────┘ │
│ ↑ │
│ Collapsible/Resizable │
└───────────────────────────────────────────────────────────────────────────┘Sidebar Features:
- Collapsible/expandable (like VSCode sidebar)
- Resizable width
- Draggable to left/right side
- Keyboard shortcut support (Ctrl+Shift+A to toggle)
- Shares context with main WebView
3. Auto-Discovery of WebView APIs
AI Agent automatically discovers and uses all API methods bound via bind_call():
┌─────────────────────────────────────────────────────────────────────────┐
│ API Auto-Discovery Flow │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ │
│ │ WebView │ │
│ │ │ bind_call("api.echo", echo_func) │
│ │ bind_call() ├──────────────────────────────────────────┐ │
│ │ bind_api() │ bind_api(my_api_object) │ │
│ └──────────────┘ │ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ API Registry (Python) │ │
│ │ │ │
│ │ _bound_functions = { │ │
│ │ "api.echo": <function echo>, │ │
│ │ "api.export_scene": <function export_scene>, │ │
│ │ "api.create_cube": <function create_cube>, │ │
│ │ "browser.navigate": <function navigate>, │ │
│ │ ... │ │
│ │ } │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ get_bound_methods() │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ AI Agent Tool Registration │ │
│ │ │ │
│ │ Auto-converted to AI tool definitions: │ │
│ │ { │ │
│ │ "name": "api.echo", │ │
│ │ "description": "Echo a message back", // from docstring │ │
│ │ "parameters": { // inferred from types │ │
│ │ "type": "object", │ │
│ │ "properties": {"message": {"type": "string"}} │ │
│ │ } │ │
│ │ } │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘4. Crate Structure
crates/
├── auroraview-ai-agent/ # NEW: AI Agent Integration
│ ├── Cargo.toml
│ └── src/
│ ├── lib.rs # Public API
│ ├── agent.rs # AIAgent struct
│ ├── session.rs # ChatSession management
│ ├── message.rs # Message types
│ │
│ ├── providers/ # AI Provider implementations
│ │ ├── mod.rs
│ │ ├── provider.rs # Provider trait
│ │ ├── openai.rs # OpenAI (GPT-4, GPT-4o)
│ │ ├── anthropic.rs # Anthropic (Claude 3)
│ │ ├── gemini.rs # Google (Gemini 2.0)
│ │ ├── azure.rs # Azure OpenAI
│ │ ├── ollama.rs # Ollama (local models)
│ │ └── custom.rs # Custom provider bridge
│ │
│ ├── protocol/ # AGUI/A2UI Protocol
│ │ ├── mod.rs
│ │ ├── agui.rs # AGUI events
│ │ ├── a2ui.rs # A2UI components
│ │ └── streaming.rs # SSE streaming
│ │
│ ├── actions/ # Browser Actions (Function Calling)
│ │ ├── mod.rs
│ │ ├── navigation.rs # navigate, go_back, go_forward
│ │ ├── tabs.rs # new_tab, close_tab, switch_tab
│ │ ├── content.rs # summarize, search, bookmark
│ │ ├── ui.rs # generate_ui, show_toast
│ │ └── registry.rs # Action registry
│ │
│ └── ui/ # UI Components
│ ├── mod.rs
│ ├── components.rs # Predefined UI components
│ └── renderer.rs # Component renderer5. Provider Trait (Rust)
// crates/auroraview-ai-agent/src/providers/provider.rs
use async_trait::async_trait;
use serde::{Deserialize, Serialize};
/// AI Provider trait - implement this for new providers
#[async_trait]
pub trait AIProvider: Send + Sync {
/// Provider name (e.g., "openai", "anthropic", "gemini")
fn name(&self) -> &str;
/// Send message and get response (non-streaming)
async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse, AIError>;
/// Send message and stream response
async fn complete_stream(
&self,
request: CompletionRequest,
on_event: Box<dyn Fn(StreamEvent) + Send>,
) -> Result<(), AIError>;
/// Check if provider supports function calling
fn supports_function_calling(&self) -> bool;
/// Check if provider supports vision (image input)
fn supports_vision(&self) -> bool;
/// Get available models
fn available_models(&self) -> Vec<ModelInfo>;
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CompletionRequest {
pub model: String,
pub messages: Vec<Message>,
pub tools: Option<Vec<Tool>>,
pub tool_choice: Option<ToolChoice>,
pub max_tokens: Option<u32>,
pub temperature: Option<f32>,
pub stream: bool,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Message {
pub role: MessageRole,
pub content: MessageContent,
pub name: Option<String>,
pub tool_calls: Option<Vec<ToolCall>>,
pub tool_call_id: Option<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum MessageRole {
System,
User,
Assistant,
Tool,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(untagged)]
pub enum MessageContent {
Text(String),
Parts(Vec<ContentPart>),
}
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "type")]
pub enum ContentPart {
#[serde(rename = "text")]
Text { text: String },
#[serde(rename = "image_url")]
ImageUrl { image_url: ImageUrl },
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Tool {
pub name: String,
pub description: String,
pub parameters: serde_json::Value, // JSON Schema
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ToolCall {
pub id: String,
pub name: String,
pub arguments: String, // JSON string
}
/// Stream events for real-time UI updates
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "type")]
pub enum StreamEvent {
#[serde(rename = "text_start")]
TextStart { id: String },
#[serde(rename = "text_delta")]
TextDelta { id: String, delta: String },
#[serde(rename = "text_end")]
TextEnd { id: String },
#[serde(rename = "tool_call_start")]
ToolCallStart { id: String, name: String },
#[serde(rename = "tool_call_delta")]
ToolCallDelta { id: String, delta: String },
#[serde(rename = "tool_call_end")]
ToolCallEnd { id: String },
#[serde(rename = "error")]
Error { message: String },
#[serde(rename = "done")]
Done,
}6. AGUI Protocol Events
// crates/auroraview-ai-agent/src/protocol/agui.rs
use serde::{Deserialize, Serialize};
/// AGUI Event Types (following AG-UI Protocol standard)
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "type")]
pub enum AGUIEvent {
// === Text Message Events ===
#[serde(rename = "TEXT_MESSAGE_START")]
TextMessageStart {
message_id: String,
role: String,
},
#[serde(rename = "TEXT_MESSAGE_CONTENT")]
TextMessageContent {
message_id: String,
delta: String,
},
#[serde(rename = "TEXT_MESSAGE_END")]
TextMessageEnd {
message_id: String,
},
// === Tool Call Events ===
#[serde(rename = "TOOL_CALL_START")]
ToolCallStart {
message_id: String,
tool_call_id: String,
tool_name: String,
},
#[serde(rename = "TOOL_CALL_ARGS")]
ToolCallArgs {
tool_call_id: String,
delta: String,
},
#[serde(rename = "TOOL_CALL_END")]
ToolCallEnd {
tool_call_id: String,
},
// === State Sync Events ===
#[serde(rename = "STATE_SNAPSHOT")]
StateSnapshot {
state: serde_json::Value,
},
#[serde(rename = "STATE_DELTA")]
StateDelta {
path: String,
operation: StateOperation,
value: serde_json::Value,
},
// === UI Generation Events (A2UI) ===
#[serde(rename = "UI_COMPONENT")]
UIComponent {
id: String,
component: UIComponentSpec,
},
// === Lifecycle Events ===
#[serde(rename = "RUN_START")]
RunStart {
run_id: String,
thread_id: Option<String>,
},
#[serde(rename = "RUN_END")]
RunEnd {
run_id: String,
},
#[serde(rename = "RUN_ERROR")]
RunError {
run_id: String,
error: ErrorInfo,
},
// === Human-in-the-Loop Events ===
#[serde(rename = "INTERRUPT_REQUEST")]
InterruptRequest {
id: String,
prompt: String,
options: Vec<InterruptOption>,
},
#[serde(rename = "INTERRUPT_RESPONSE")]
InterruptResponse {
id: String,
choice: String,
data: Option<serde_json::Value>,
},
}
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum StateOperation {
Set,
Delete,
Append,
Insert,
}
/// A2UI Component Specification
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct UIComponentSpec {
#[serde(rename = "type")]
pub component_type: String,
pub props: serde_json::Value,
pub children: Option<Vec<UIComponentSpec>>,
}7. Browser Actions (Function Calling)
// crates/auroraview-ai-agent/src/actions/mod.rs
use serde::{Deserialize, Serialize};
/// Action registry for function calling
pub struct ActionRegistry {
actions: HashMap<String, Box<dyn Action>>,
}
/// Action trait - implement for new browser actions
pub trait Action: Send + Sync {
fn name(&self) -> &str;
fn description(&self) -> &str;
fn parameters_schema(&self) -> serde_json::Value;
fn execute(&self, args: serde_json::Value, ctx: &ActionContext) -> Result<ActionResult, ActionError>;
}
/// Built-in browser actions
pub fn register_builtin_actions(registry: &mut ActionRegistry) {
// Navigation
registry.register(NavigateAction);
registry.register(GoBackAction);
registry.register(GoForwardAction);
registry.register(ReloadAction);
registry.register(SearchAction);
// Tabs
registry.register(NewTabAction);
registry.register(CloseTabAction);
registry.register(SwitchTabAction);
registry.register(ListTabsAction);
// Content
registry.register(SummarizePageAction);
registry.register(ExtractTextAction);
registry.register(TakeScreenshotAction);
// Bookmarks
registry.register(AddBookmarkAction);
registry.register(ListBookmarksAction);
registry.register(SearchBookmarksAction);
// UI
registry.register(ShowToastAction);
registry.register(GenerateUIAction);
registry.register(SetThemeAction);
// Extensions
registry.register(ToggleExtensionAction);
registry.register(ListExtensionsAction);
}
// Example action implementation
pub struct NavigateAction;
impl Action for NavigateAction {
fn name(&self) -> &str { "navigate" }
fn description(&self) -> &str {
"Navigate to a URL. Supports both direct URLs and search queries."
}
fn parameters_schema(&self) -> serde_json::Value {
json!({
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "URL to navigate to, or search query"
},
"new_tab": {
"type": "boolean",
"description": "Open in new tab (default: false)"
}
},
"required": ["url"]
})
}
fn execute(&self, args: serde_json::Value, ctx: &ActionContext) -> Result<ActionResult, ActionError> {
let url = args["url"].as_str().ok_or(ActionError::MissingParam("url"))?;
let new_tab = args["new_tab"].as_bool().unwrap_or(false);
// Determine if URL or search query
let final_url = if url.contains('.') && !url.contains(' ') {
if url.starts_with("http") { url.to_string() }
else { format!("https://{}", url) }
} else {
format!("https://www.google.com/search?q={}", urlencoding::encode(url))
};
if new_tab {
ctx.browser.new_tab(&final_url)?;
} else {
ctx.browser.navigate(&final_url)?;
}
Ok(ActionResult::success(json!({
"navigated_to": final_url,
"new_tab": new_tab
})))
}
}8. Python Bindings
# python/auroraview/ai/agent.py
from typing import Optional, Callable, Any, Dict, List, Union
from dataclasses import dataclass, field
from enum import Enum
from abc import ABC, abstractmethod
class AIProviderType(Enum):
OPENAI = "openai"
ANTHROPIC = "anthropic"
GEMINI = "gemini"
AZURE = "azure"
OLLAMA = "ollama"
CUSTOM = "custom"
@dataclass
class AIConfig:
"""AI Agent configuration."""
provider: AIProviderType = AIProviderType.OPENAI
model: str = "gpt-4o"
api_key: Optional[str] = None
base_url: Optional[str] = None
temperature: float = 0.7
max_tokens: int = 4096
system_prompt: Optional[str] = None
class AIProvider(ABC):
"""Base class for custom AI providers.
Users can extend this to add support for custom AI backends.
"""
@abstractmethod
def name(self) -> str:
"""Provider name."""
pass
@abstractmethod
async def complete(
self,
messages: List[Dict],
tools: Optional[List[Dict]] = None,
**kwargs
) -> Dict:
"""Send completion request."""
pass
@abstractmethod
async def complete_stream(
self,
messages: List[Dict],
tools: Optional[List[Dict]] = None,
on_event: Optional[Callable[[Dict], None]] = None,
**kwargs
):
"""Send streaming completion request."""
pass
class AIAgent:
"""AI Agent for natural language control of browser and WebView.
Features:
- Auto-discovers all API methods bound to WebView
- Supports sidebar/drawer mode attached to any WebView
- Multi-provider AI support
Example - Standalone Browser Mode:
from auroraview import Browser
from auroraview.ai import AIAgent, AIConfig
browser = Browser(title="AI Browser")
agent = AIAgent(browser=browser, config=AIConfig(provider="gemini"))
await agent.chat("Open GitHub and search for Python")
Example - Sidebar Mode (attached to existing WebView):
from auroraview import WebView
from auroraview.ai import AIAgent
# Create your application WebView
webview = WebView(title="My App")
# Bind some API methods
@webview.bind_call("api.export_scene")
def export_scene(format: str = "fbx"):
'''Export current scene'''
return {"status": "exported"}
# Create AI Agent as sidebar
agent = AIAgent.as_sidebar(webview, config=AIConfig(provider="gemini"))
# AI can now automatically use export_scene method
webview.show()
"""
def __init__(
self,
browser: "Browser" = None,
webview: "WebView" = None,
config: Optional["AIConfig"] = None,
provider: Optional[AIProvider] = None,
auto_discover_apis: bool = True,
):
self.browser = browser
self.webview = webview or (browser._webview if browser else None)
self.config = config or AIConfig()
self._provider = provider
self._session = ChatSession()
self._actions = ActionRegistry()
self._init_builtin_actions()
# Auto-discover APIs bound to WebView
if auto_discover_apis and self.webview:
self._discover_webview_apis()
@classmethod
def as_sidebar(
cls,
webview: "WebView",
config: Optional["AIConfig"] = None,
**kwargs
) -> "AIAgent":
"""Create an AI Agent as a sidebar attached to a WebView.
Args:
webview: The WebView instance to attach sidebar to
config: AI configuration
**kwargs: Additional configuration parameters
Returns:
AIAgent instance configured in sidebar mode
Example:
webview = WebView(title="My Tool")
agent = AIAgent.as_sidebar(webview, config=AIConfig(provider="gemini"))
"""
agent = cls(webview=webview, config=config, **kwargs)
agent._enable_sidebar_mode()
return agent
def _enable_sidebar_mode(self):
"""Enable sidebar mode, inject sidebar UI into WebView."""
# Inject sidebar CSS and JS
self.webview.eval_js(self._get_sidebar_injection_script())
# Bind sidebar control methods
@self.webview.bind_call("ai.toggle_sidebar")
def toggle_sidebar():
self.webview.eval_js("window.__auroraview_ai_sidebar.toggle()")
return {"toggled": True}
@self.webview.bind_call("ai.chat")
def sidebar_chat(message: str):
return self.chat_sync(message)
def _discover_webview_apis(self):
"""Auto-discover and register all API methods bound to WebView."""
if not hasattr(self.webview, 'get_bound_methods'):
return
bound_methods = self.webview.get_bound_methods()
for method_name in bound_methods:
# Skip internal methods
if method_name.startswith('_') or method_name.startswith('ai.'):
continue
# Get function reference to extract docstring and type hints
func = self.webview._bound_functions.get(method_name)
if func is None:
continue
# Get description from docstring
description = func.__doc__ or f"Call {method_name}"
# Infer parameter schema from type annotations (if available)
parameters = self._infer_parameters_schema(func)
# Register as AI tool
self._actions.register_from_webview(
name=method_name,
description=description,
parameters=parameters,
webview=self.webview,
)
def _infer_parameters_schema(self, func: Callable) -> Dict:
"""Infer JSON Schema parameter definition from function signature."""
import inspect
sig = inspect.signature(func)
properties = {}
required = []
for name, param in sig.parameters.items():
if name in ('self', 'cls'):
continue
prop = {"type": "string"} # Default type
# Try to infer type from annotation
if param.annotation != inspect.Parameter.empty:
python_type = param.annotation
if python_type == int:
prop["type"] = "integer"
elif python_type == float:
prop["type"] = "number"
elif python_type == bool:
prop["type"] = "boolean"
elif python_type == str:
prop["type"] = "string"
elif python_type == list:
prop["type"] = "array"
elif python_type == dict:
prop["type"] = "object"
properties[name] = prop
# Parameters without default values are required
if param.default == inspect.Parameter.empty:
required.append(name)
return {
"type": "object",
"properties": properties,
"required": required,
}
def _init_builtin_actions(self):
"""Register built-in browser actions."""
self._actions.register("navigate", self._action_navigate)
self._actions.register("new_tab", self._action_new_tab)
self._actions.register("close_tab", self._action_close_tab)
self._actions.register("list_tabs", self._action_list_tabs)
self._actions.register("summarize", self._action_summarize)
self._actions.register("search", self._action_search)
self._actions.register("bookmark", self._action_bookmark)
self._actions.register("set_theme", self._action_set_theme)
# ... more actions
async def chat(
self,
message: str,
stream: bool = True,
on_event: Optional[Callable[[Dict], None]] = None,
) -> str:
"""Send a message to the AI agent.
Args:
message: User message
stream: Whether to stream the response
on_event: Callback for AGUI events (streaming mode)
Returns:
AI response text
"""
# Add user message to session
self._session.add_message(role="user", content=message)
# Prepare tools (function calling)
tools = self._actions.get_tools_schema()
# Get response from provider
if stream:
response = await self._complete_stream(
messages=self._session.messages,
tools=tools,
on_event=on_event,
)
else:
response = await self._complete(
messages=self._session.messages,
tools=tools,
)
# Handle tool calls
while response.get("tool_calls"):
for tool_call in response["tool_calls"]:
result = await self._execute_action(
name=tool_call["name"],
args=tool_call["arguments"],
)
self._session.add_tool_result(
tool_call_id=tool_call["id"],
result=result,
)
# Get next response
response = await self._complete(
messages=self._session.messages,
tools=tools,
)
# Add assistant response to session
assistant_message = response.get("content", "")
self._session.add_message(role="assistant", content=assistant_message)
return assistant_message
def register_action(
self,
name: str,
handler: Callable,
description: str = "",
parameters: Optional[Dict] = None,
):
"""Register a custom action for the AI to call."""
self._actions.register(name, handler, description, parameters)
def action(self, name: str):
"""Decorator to register a custom action."""
def decorator(func):
self.register_action(name, func, func.__doc__ or "")
return func
return decorator9. UI Components (AGUI/A2UI)
// packages/auroraview-sdk/src/inject/plugins/ai-agent.ts
import { UIComponentSpec } from '../types';
/**
* AGUI UI Component Types
*/
export interface AGUIComponents {
// Text content
markdown: { content: string };
// Navigation feedback
navigation_card: { url: string; title?: string };
// Tab management
tab_list: { tabs: Array<{ id: string; title: string; url: string; active: boolean }> };
// Actions
extension_toggle: { id: string; name: string; enabled: boolean };
success_toast: { message: string };
error_toast: { message: string };
// Rich content
code_snippet: { html: string; title: string; description?: string };
media_player: { title: string; artist: string; type: 'audio' | 'video'; url: string };
rich_link: { type: string; title: string; subtitle: string; url: string };
file_viewer: { filename: string; code: string; language: string };
// Interactive
action_buttons: { buttons: Array<{ id: string; label: string; primary?: boolean }> };
form_input: { fields: Array<{ name: string; type: string; label: string }> };
}
/**
* Render AGUI UI Component
*/
export function renderAGUIComponent(spec: UIComponentSpec): HTMLElement {
switch (spec.type) {
case 'markdown':
return renderMarkdown(spec.props as AGUIComponents['markdown']);
case 'navigation_card':
return renderNavigationCard(spec.props as AGUIComponents['navigation_card']);
case 'tab_list':
return renderTabList(spec.props as AGUIComponents['tab_list']);
case 'code_snippet':
return renderCodeSnippet(spec.props as AGUIComponents['code_snippet']);
// ... more components
default:
return renderUnknown(spec);
}
}10. Modern Browser UI (Gemini-Inspired)
The browser UI will be updated to include:
- Side Panel - Collapsible AI assistant panel
- Chat Interface - Real-time streaming messages
- History View - Chat session history
- Discover View - Quick action cards
- AGUI Components - Rich UI cards for responses
Design principles (from UI/UX Pro Max):
- Clean, minimal interface
- Glass morphism effects
- Dark/Light mode support
- Smooth animations
- Accessible components
11. Implementation Plan
Phase 1: Core AI Agent Crate (Week 1-2) ✅ Done
- [x] Create
crates/auroraview-ai-agent/- Implemented:
agent.rs,error.rs,lib.rs,message.rs,session.rs actions/:mod.rs,registry.rsprotocol/:a2ui.rs,agui.rs,mod.rsproviders/:mod.rs,types.rs,wrapper.rsui/:mod.rsassets/:ai_browser.html
- Implemented:
- [x] Implement Provider trait and types (
providers/types.rs) - [x] Provider wrapper supporting multiple providers (
providers/wrapper.rs) - [ ] Implement OpenAI provider (Python layer done)
- [ ] Implement Anthropic provider (Python layer done)
- [ ] Implement Gemini provider (Python layer done)
- [ ] Implement Ollama provider (local)
- [ ] Add custom provider bridge
Phase 2: AGUI/A2UI Protocol (Week 2-3) ✅ Done
- [x] Implement AGUI event types (
protocol/agui.rs) - [x] Implement A2UI component specs (
protocol/a2ui.rs) - [x] Add SSE streaming support
- [ ] Implement state synchronization
- [ ] Add human-in-the-loop interrupts
Phase 3: Browser Actions (Week 3-4) ✅ Done
- [x] Implement action registry (
actions/registry.rs) - [x] Add navigation actions (
actions/mod.rs) - [x] Add tab management actions
- [x] Add content actions (summarize, extract)
- [x] Add bookmark actions
- [x] Add UI actions (toast, theme)
Phase 4: Modern UI (Week 4-5) ✅ Done
- [x] Create AI browser HTML (
assets/ai_browser.html)- 完整的 Gemini 风格浏览器界面
- 包含 Tab 管理、URL 栏、Chat Panel
- 支持 Light/Dark 主题
- [x] Implement chat interface
- Gallery:
AISidebar.tsx组件 - React hooks:
useAIAgent.ts - 完整的消息列表、输入框、发送按钮
- Gallery:
- [x] Implement AGUI component renderer
- AGUI 事件订阅:
agui:text_message_start/content/end - 流式响应渲染支持
- 错误处理和状态管理
- AGUI 事件订阅:
- [x] Add session management
- 会话清除功能 (
clearSession) - 消息历史管理
- 会话清除功能 (
- [x] Apply Gemini-inspired styling
- 渐变 Logo、圆角设计、动画效果
- 响应式布局
- [x] Model selector with multi-provider support
- [x] Tool discovery display
- [ ] Update browser_controller.html with AI panel (Optional - 独立组件)
Phase 5: Python Bindings (Week 5-6) ✅ Done
- [x] Implement Python AIAgent class (
python/auroraview/ai/agent.py) - [x] Implement Python AIProvider base class and config (
config.py) - [x] Implement AGUI protocol Python version (
protocol.py) - [x] Implement tool registry (
tools.py) - [x] Add action decorator API
- [x] Create convenience functions
Phase 6: Documentation & Gallery (Week 6-7) ✅ Done
- [x] Create Gallery demos
AISidebar.tsx- 完整的 AI 聊天侧边栏组件useAIAgent.ts- React hook 封装- 类型定义:
types/ai.ts
- [x] Write API documentation (包含在 RFC 中)
- Python API 使用示例
- Sidebar mode 配置
- Auto-discovery 机制说明
- [x] Add integration examples
- DCC 场景示例 (Maya API 绑定)
- Custom provider 示例
- AGUI 事件处理示例
- [ ] Write standalone tutorial guides (可选 - 未来改进)
API Examples
Basic Usage
from auroraview import Browser
from auroraview.ai import AIAgent, AIConfig, AIProviderType
# Create browser with AI agent
browser = Browser(title="AI Browser")
agent = AIAgent(
browser=browser,
config=AIConfig(
provider=AIProviderType.GEMINI,
model="gemini-2.0-flash-exp",
api_key=os.environ["GEMINI_API_KEY"],
)
)
# Natural language control
await agent.chat("Open GitHub")
await agent.chat("Search for Python web frameworks")
await agent.chat("Bookmark the top result")
await agent.chat("Summarize this page")Sidebar Mode (Recommended)
Attach AI Agent as a sidebar to any WebView application:
from auroraview import WebView
from auroraview.ai import AIAgent, AIConfig
# Create your application WebView
webview = WebView(title="My DCC Tool Panel")
# Bind your application APIs
@webview.bind_call("api.export_scene")
def export_scene(format: str = "fbx", path: str = None):
"""Export current scene to specified format.
Args:
format: Export format (fbx, obj, usd)
path: Export path (optional)
"""
import maya.cmds as cmds
export_path = path or f"/tmp/scene.{format}"
cmds.file(export_path, exportAll=True, type=format)
return {"exported": export_path, "format": format}
@webview.bind_call("api.create_primitive")
def create_primitive(shape: str = "cube", size: float = 1.0, name: str = None):
"""Create a primitive geometry in the scene.
Args:
shape: Geometry type (cube, sphere, cylinder, plane)
size: Size
name: Object name
"""
import maya.cmds as cmds
if shape == "cube":
result = cmds.polyCube(w=size, h=size, d=size, name=name or "cube")
elif shape == "sphere":
result = cmds.polySphere(r=size/2, name=name or "sphere")
elif shape == "cylinder":
result = cmds.polyCylinder(r=size/2, h=size, name=name or "cylinder")
else:
result = cmds.polyPlane(w=size, h=size, name=name or "plane")
return {"created": result[0], "shape": shape, "size": size}
@webview.bind_call("api.list_objects")
def list_objects(type_filter: str = None):
"""List objects in the scene."""
import maya.cmds as cmds
if type_filter:
objects = cmds.ls(type=type_filter)
else:
objects = cmds.ls(dag=True, visible=True)
return {"objects": objects, "count": len(objects)}
# Create AI Agent as sidebar (auto-discovers all bound APIs above)
agent = AIAgent.as_sidebar(
webview,
config=AIConfig(
provider="gemini",
model="gemini-2.0-flash-exp",
)
)
# Start application
webview.show()
# Now users can chat with AI in the sidebar:
# - "Create a cube named 'hero_cube' with size 2"
# - "List all mesh objects in the scene"
# - "Export the scene as FBX"
# AI will automatically call the corresponding api.* methods!Auto API Discovery Explained
AI Agent automatically converts WebView-bound methods to AI tools:
from auroraview import WebView
from auroraview.ai import AIAgent
webview = WebView()
# Method 1: Using bind_call decorator
@webview.bind_call("tools.calculate")
def calculate(expression: str) -> dict:
"""Calculate a mathematical expression.
Args:
expression: Mathematical expression, e.g., "2 + 2 * 3"
Returns:
Dictionary containing the calculation result
"""
result = eval(expression) # Example only, use safe evaluation in production
return {"expression": expression, "result": result}
# Method 2: Using bind_api to bind an entire class
class SceneAPI:
def get_selection(self) -> dict:
"""Get currently selected objects."""
return {"selected": ["cube1", "sphere1"]}
def set_attribute(self, object: str, attr: str, value: float) -> dict:
"""Set an object attribute value.
Args:
object: Object name
attr: Attribute name (e.g., translateX, rotateY)
value: New attribute value
"""
return {"object": object, "attr": attr, "value": value, "status": "set"}
webview.bind_api(SceneAPI(), namespace="scene")
# Create Agent - auto-discovers all bound methods
agent = AIAgent.as_sidebar(webview, auto_discover_apis=True)
# AI can now use these tools:
# - tools.calculate: "Calculate a mathematical expression"
# - scene.get_selection: "Get currently selected objects"
# - scene.set_attribute: "Set an object attribute value"
webview.show()Sidebar UI Configuration
from auroraview import WebView
from auroraview.ai import AIAgent, AIConfig, SidebarConfig
webview = WebView(title="My Application")
# Customize sidebar configuration
sidebar_config = SidebarConfig(
position="right", # "left" or "right"
width=380, # Initial width in pixels
min_width=280, # Minimum width
max_width=600, # Maximum width
collapsed=False, # Initially collapsed
resizable=True, # Can resize
keyboard_shortcut="Ctrl+Shift+A", # Toggle shortcut
theme="auto", # "light", "dark", or "auto"
)
agent = AIAgent.as_sidebar(
webview,
config=AIConfig(provider="gemini"),
sidebar_config=sidebar_config,
)
webview.show()Custom Provider
from auroraview.ai import AIAgent, AIProvider
class MyLLMProvider(AIProvider):
def name(self) -> str:
return "my-llm"
async def complete(self, messages, tools=None, **kwargs):
# Call your custom LLM API
response = await my_llm_api.chat(messages)
return {"content": response.text}
async def complete_stream(self, messages, tools=None, on_event=None, **kwargs):
async for chunk in my_llm_api.chat_stream(messages):
if on_event:
on_event({"type": "text_delta", "delta": chunk.text})
agent = AIAgent(browser=browser, provider=MyLLMProvider())Custom Actions (DCC Scenario)
In addition to auto-discovered APIs, you can manually register additional actions:
# In Maya
from auroraview.ai import AIAgent
agent = AIAgent(browser=browser)
@agent.action("export_scene")
def export_scene(format: str = "fbx", path: str = None):
"""Export the current Maya scene."""
import maya.cmds as cmds
export_path = path or f"/tmp/scene.{format}"
cmds.file(export_path, exportAll=True, type=format)
return {"exported": export_path}
@agent.action("create_cube")
def create_cube(size: float = 1.0, name: str = "cube"):
"""Create a cube in the Maya scene."""
import maya.cmds as cmds
cube = cmds.polyCube(w=size, h=size, d=size, name=name)
return {"created": cube[0]}
# Now AI can use these actions
await agent.chat("Create a cube named 'myBox' with size 2")
await agent.chat("Export the scene as FBX")AGUI Event Handling
def on_event(event: dict):
if event["type"] == "TEXT_MESSAGE_CONTENT":
print(event["delta"], end="", flush=True)
elif event["type"] == "TOOL_CALL_START":
print(f"\n[Calling: {event['tool_name']}]")
elif event["type"] == "UI_COMPONENT":
render_component(event["component"])
await agent.chat(
"Search for Python tutorials and summarize the top 3",
on_event=on_event
)