LANCUN
§ 01 · CORE TECH
China's First

EmoMonte
Affective Engine

From recognizing emotion · to expressing it

China's first end-to-end emotional voice agent — built on the “end-to-end simulation + Monte-Carlo pruning” paradigm. Machines move from recognizing emotion to expressing it, evolving from chat tools into warm, present companions.

50 emotions · all expressed
50 kinds
Emotion recognition
17 kinds
Emotion expression
300 ms
Dialog latency
70%
Token cost
§ 02 · TECH ROADMAP

Architecture comparison

Unlike traditional cascades or bolt-on ASR+LLM+TTS pipelines, LANCUN takes an end-to-end native-fusion path — simulation plus pruning.

Route A · Cascade

Emotion as a separate module
ASR → NLU → Policy → TTS
ASR CER 1.9-3.3% / SER 89-97% / NLG BLEU 0.86
Cross-domain success only 11-12%

Route B · ASR+LLM+TTS

Emotion bolted on
ASR → LLM → TTS
LLM is strong but needs ASR error correction (ASR-EC)
Recognition errors cascade downstream
Emotion module is fragmented

LANCUN · End-to-end simulation + pruning

Unified speech-policy-emotion optimization
End-to-end voice model
Simulation · Evaluation · Pruning · three in one
  • Unified speech-policy-emotion optimization
  • Fewer steps + higher success rate
  • Emotion and policy aligned
1 · Simulation
Monte-Carlo simulation
2 · Pruning
Neural network pruning
3 · Evaluation
Supervisory AI user ⇌ Bot
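
The three stages above can be pictured as a loop: run many simulated dialogues per candidate, score them, and drop the weak candidates. The following is a minimal Python sketch under loud assumptions. The strategy names, their success probabilities, the `simulate_dialog` stand-in for the supervisory AI user, and the `keep_ratio` parameter are all hypothetical. The sketch also prunes candidate strategies rather than neural-network weights, so it illustrates the general Monte-Carlo idea, not LANCUN's actual implementation.

```python
import random
from statistics import mean

# Hypothetical candidate strategies, each with a made-up "true" success probability.
# In a real system the outcome would come from a simulated user talking to the bot.
CANDIDATES = {
    "empathize_then_answer": 0.8,
    "answer_directly": 0.6,
    "ask_clarifying_question": 0.5,
    "ignore_emotion": 0.2,
}


def simulate_dialog(strategy: str, rng: random.Random) -> float:
    """Stand-in for one simulated user-bot dialogue: 1.0 on success, else 0.0."""
    return 1.0 if rng.random() < CANDIDATES[strategy] else 0.0


def evaluate(strategy: str, rollouts: int, rng: random.Random) -> float:
    """Monte-Carlo estimate of a strategy's success rate over repeated rollouts."""
    return mean(simulate_dialog(strategy, rng) for _ in range(rollouts))


def prune(scores: dict[str, float], keep_ratio: float) -> list[str]:
    """Keep the top-scoring fraction of candidates (always at least one)."""
    keep = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


def simulate_evaluate_prune(
    rollouts: int = 500, keep_ratio: float = 0.5, seed: int = 0
) -> list[str]:
    """One pass of the loop: simulate, score, then prune the weak candidates."""
    rng = random.Random(seed)
    scores = {name: evaluate(name, rollouts, rng) for name in CANDIDATES}
    return prune(scores, keep_ratio)


survivors = simulate_evaluate_prune()
print(survivors)
```

With four candidates and `keep_ratio=0.5`, two strategies survive each pass. Raising `rollouts` makes the estimates less noisy at the cost of more simulated dialogues.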
§ 03 · CORE CAPABILITIES

Six core capabilities

From recognition to expression · from perception to generation · from chat tool to companion

01

Emotion recognition

50 kinds

Combines voice, language, and paralinguistic cues to read the user's emotional state across the full spectrum, from basic emotions (joy / anger / sadness / happiness) to complex ones (anxiety, anticipation, hesitation, relief).

02

Emotion expression

17 kinds

The AI expresses 17 emotions on its own, not just by synthesizing different tones but by embedding emotion into semantics, rhythm, pauses, and stress, giving conversation warmth, range, and realism.

03

Dialog latency

300 ms

An end-to-end voice architecture, device-side pre-processing, and a low-latency cloud routing layer push overall dialog latency down to 300 ms, fast enough for a natural conversational rhythm.

04

Voiceprint analysis

Voiceprint quickly distinguishes who's speaking and their state, removing redundant recognition and emotion inference — and enabling differentiated strategies across multiple speakers and roles.

05

Full-duplex voice

Supports interruption, addition, and correction at any point in the dialogue — the model listens and speaks at the same time, leaving walkie-talkie turn-taking behind for human-like conversation.
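
To make the interruption behavior concrete, here is a toy Python sketch of barge-in handling. It is an illustration only: the `DuplexSession` class, its method names, and the log format are hypothetical and not part of any LANCUN API. A real full-duplex model would work on streaming audio rather than text events.

```python
from dataclasses import dataclass, field


@dataclass
class DuplexSession:
    """Toy full-duplex session: the bot keeps listening while it speaks."""

    bot_speaking: bool = False
    log: list[str] = field(default_factory=list)

    def bot_start(self, text: str) -> None:
        """The bot begins an utterance."""
        self.bot_speaking = True
        self.log.append(f"bot:start:{text}")

    def bot_finish(self) -> None:
        """The bot reaches the end of its utterance, unless it was already cut off."""
        if self.bot_speaking:
            self.bot_speaking = False
            self.log.append("bot:done")

    def user_speech(self, text: str) -> None:
        """User audio arrives; if the bot is mid-utterance, it yields the floor."""
        if self.bot_speaking:
            self.bot_speaking = False
            self.log.append("bot:interrupted")
        self.log.append(f"user:{text}")


session = DuplexSession()
session.bot_start("Your order ships on")
session.user_speech("wait, change the address")  # barge-in while the bot is talking
session.bot_finish()  # no effect: the utterance was already interrupted
print(session.log)
```

In a walkie-talkie design the user's correction would have to wait for the bot to finish. Here the user event is handled immediately and the bot's pending utterance is dropped.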

06

Paralinguistic cues

Recognizes sighs, laughter, hesitation, and breathing rhythm — non-verbal signals — and weaves these “human” details back into expression so the AI doesn't just answer but actually converses.

§ 04 · INDUSTRY HERITAGE

A decade of industry depth · shipped know-how

From cascade-era LSTM-CTC to end-to-end multimodal LLMs — the LANCUN team lived through and contributed to the full evolution of speech tech.

2015
LSTM-CTC ships
Speech recognition enters industrial use
2016
Baidu Deep CNN
Noise-robustness leaps forward
2018
DFCNN & LFR-DFSMN
Lightweight model deployment becomes practical
2019
SMLTA
Streaming decoding · dialogue enters real-time
2024
GPT-4o end-to-end multimodal
Machines move toward humanity · a new tech branch
2025
LANCUN EmoMonte
China's first emotional voice agent · end-to-end simulation + pruning · expressive emotions
17 yrs
Industrial projects still running
98%
AI-workstation delivery rate
300 ms
Dialog-latency floor
70%
Token call cost

Give your product · warm-blooded conversation