Multi-Agents
Collaboration Architecture

A distributed agent architecture for efficient collaboration. Fuses speech recognition, memory, tool use and execution loops — natively suited to automated operation, embodied intelligence, and multi-sensor coordination. From conversational AI to actionable AI.

Input

Multimodal

Cognition

LLM Orchestration

Execution

Soft + Hardware

Persistence

Closed-loop

§ 02 · CLOSED-LOOP ARCHITECTURE

Input · Cognition · Execution · Memory Feedback

Not layered stacks — a closed loop. Three-stage flow plus memory continuously feeding decisions back, forming a self-evolving Agent system.

横向滑动查看完整架构图

Input & Recognition

Voice in · PCM 16kHz

Text in · Text Stream

DashScope ASR · server_vad endpoint

Cognition & Decision

Prompt assembly · Platform + persona + memory + skills

Agents System

LLM → Tool → LLM → Response

Execution & Output

Streaming TTS · PCM 24kHz

Action execution · Hardware / workflow / system tools

Text reply · Text Response

Persistence & Memory

chat_state JSONB · PostgreSQL

Input & Recognition—Millisecond recognition · multi-language mix · long-form uninterrupted

Cognition & Decision—Think-tool-think loop · model plans proactively rather than reacting

Execution & Output—Three channels in parallel · intent → real-world action in one shot

Persistence & Memory—Structured session state on disk · each turn becomes next turn's experience

§ 03 · CAPABILITIES

Three capability leaps

More natural multimodal interaction

Voice, text, vision, sensor input fused seamlessly; output side generates TTS streams, text replies, and action commands in sync. One input fires three outputs — conversation pace approaches human.

Stronger tool use & orchestration

From single-call to flow orchestration — Agents chain multiple tools to complete complex tasks autonomously. The model decides which tool, when; developers only declare what each tool can do.

Software automation × embodied control

Operate software UIs (click, type, navigate), drive hardware (servos, lights, displays), and link multi-sensor input into spatial awareness. AI no longer just thinks in the cloud.

Multi-AgentsCollaboration Architecture