§ 01 · CORE TECH

China's First

EmoMonte
Affective Engine

From recognizing emotion · to expressing it

China's first end-to-end emotional voice agent — built on the “end-to-end simulation + Monte-Carlo pruning” paradigm. Machines move from recognizing emotion to expressing it, evolving from chat tools into warm, present companions.

50 emotions · all expressed

Emotion recognition

50 kinds

Emotion expression

17 kinds

Dialog latency

300 ms

Token cost

↓70%

§ 02 · TECH ROADMAP

Architecture comparison

Unlike traditional cascades or bolt-on ASR+LLM+TTS pipelines, LANCUN takes an end-to-end native-fusion path — simulation plus pruning.

Route A · Cascade

Emotion as a separate module

ASR → NLU → Policy → TTS

ASR CER 1.9-3.3%

SER 89-97% / NLG BLEU 0.86

Cross-domain success only 11-12%

Route B · ASR+LLM+TTS

Emotion bolted on

ASR → LLM → TTS

LLM is strong but needs ASR-EC

Recognition errors cascade downstream

Emotion module is fragmented
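
Why recognition errors hurt a multi-stage pipeline can be shown with simple arithmetic. The sketch below assumes each stage succeeds independently, so the end-to-end success rate is the product of the per-stage rates. The function name and the stage values are illustrative placeholders, not LANCUN measurements and not the figures quoted above.

```python
from math import prod

def cascade_success(stage_rates):
    """End-to-end success rate of a pipeline whose stages fail independently."""
    rates = list(stage_rates)
    for r in rates:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"rate out of range: {r}")
    return prod(rates)

# Hypothetical per-stage success rates for ASR -> LLM -> TTS (illustration only).
example = {"ASR": 0.95, "LLM": 0.90, "TTS": 0.98}
overall = cascade_success(example.values())
print(f"overall success: {overall:.3f}")
```

Under this independence assumption, every stage added to the chain can only lower the overall rate, which is the motivation for optimizing the stages jointly.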

LANCUN · End-to-end simulation + pruning

Unified speech-policy-emotion optimization

End-to-end voice model

Simulation · Evaluation · Pruning · Trinity

Unified speech-policy-emotion optimization
Fewer steps + higher success rate
Emotion and policy aligned

1 · Simulation

Monte-Carlo simulation

2 · Pruning

Neural network pruning

3 · Evaluation

Supervisory AI user ⇌ Bot
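
The page does not describe how the three steps are implemented, so the following is only a toy sketch of the loop shape: simulate dialogs at random, score them, and discard weak options. It swaps in a much simpler technique than the one named above. Instead of neural network pruning, it prunes candidate reply strategies by their Monte-Carlo score. The simulated user, the reward, the strategy names, and every probability are hypothetical.

```python
import random

def simulate_dialog(strategy, rng, max_turns=6):
    """Toy simulated user: each turn resolves with a strategy-specific probability.
    Returns (success, turns_used)."""
    for turn in range(1, max_turns + 1):
        if rng.random() < strategy["p_resolve"]:
            return True, turn
    return False, max_turns

def evaluate(strategy, rng, n_rollouts=500):
    """Monte-Carlo estimate of mean reward; rewards success and fewer turns."""
    total = 0.0
    for _ in range(n_rollouts):
        ok, turns = simulate_dialog(strategy, rng)
        total += (1.0 if ok else 0.0) - 0.05 * turns
    return total / n_rollouts

def prune(strategies, rng, keep=2):
    """Keep only the `keep` strategies with the highest estimated reward."""
    ranked = sorted(strategies, key=lambda s: evaluate(s, rng), reverse=True)
    return ranked[:keep]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
candidates = [  # hypothetical strategies and resolve probabilities
    {"name": "neutral", "p_resolve": 0.15},
    {"name": "empathetic", "p_resolve": 0.45},
    {"name": "cheerful", "p_resolve": 0.30},
    {"name": "terse", "p_resolve": 0.05},
]
survivors = prune(candidates, rng)
print([s["name"] for s in survivors])
```

The reward term that penalizes turn count is one way to express the "fewer steps + higher success rate" goal in a single number.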

§ 03 · CORE CAPABILITIES

Six core capabilities

From recognition to expression · from perception to generation · from chat tool to companion

Emotion recognition

50 kinds

Combines voice, language, and paralinguistic cues to read the user's emotional state — full spectrum from basic emotions (joy / anger / sadness / happiness) to complex ones (anxiety, anticipation, hesitation, relief).
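
One simple way to picture combining several cue types is weighted late fusion: each modality produces per-emotion scores, and a weighted average picks the final label. An end-to-end model would presumably learn this jointly rather than fuse separate scores, so this is a stand-in for illustration only. The function, the weights, and the scores are all assumptions.

```python
def fuse(scores_by_modality, weights):
    """Weighted late fusion: average per-emotion scores across modalities."""
    total_w = sum(weights[m] for m in scores_by_modality)
    fused = {}
    for modality, scores in scores_by_modality.items():
        w = weights[modality] / total_w  # normalize so the weights sum to 1
        for emotion, s in scores.items():
            fused[emotion] = fused.get(emotion, 0.0) + w * s
    return fused

# Hypothetical per-modality scores for three of the emotions named above.
scores = {
    "voice": {"anxiety": 0.6, "relief": 0.1, "joy": 0.3},
    "language": {"anxiety": 0.3, "relief": 0.2, "joy": 0.5},
    "paralinguistic": {"anxiety": 0.8, "relief": 0.1, "joy": 0.1},
}
weights = {"voice": 0.4, "language": 0.4, "paralinguistic": 0.2}

fused = fuse(scores, weights)
top = max(fused, key=fused.get)
print(top, round(fused[top], 2))
```

Here the words alone lean toward joy, while the voice and the paralinguistic cues pull the fused result toward anxiety.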

Emotion expression

17 kinds

The AI expresses 17 emotions on its own, not just by synthesizing different tones but by embedding emotion into semantics, rhythm, pauses, and stress, giving conversation warmth, range, and realism.

Dialog latency

300 ms

An end-to-end voice architecture, device-side pre-processing, and a low-latency cloud routing layer push overall dialog latency down to 300 ms, fast enough for a natural conversational rhythm.

Voiceprint analysis

Voiceprint analysis quickly identifies who is speaking and what state they are in, which avoids redundant recognition and emotion inference and enables differentiated strategies across multiple speakers and roles.
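
Speaker identification is commonly done by comparing a voice embedding against enrolled reference embeddings. The sketch below uses cosine similarity with a rejection threshold. It is a generic illustration, not LANCUN's method, and the three-dimensional vectors, the speaker ids, and the 0.8 threshold are made-up values.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def identify(embedding, enrolled, threshold=0.8):
    """Return the best-matching enrolled speaker id, or None if no match
    reaches the threshold."""
    best_id, best_score = None, threshold
    for speaker_id, ref in enrolled.items():
        score = cosine(embedding, ref)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id

# Hypothetical enrolled voiceprints (real embeddings have far more dimensions).
enrolled = {"parent": [0.9, 0.1, 0.2], "child": [0.1, 0.8, 0.5]}
print(identify([0.85, 0.15, 0.25], enrolled))  # close to the "parent" vector
print(identify([0.5, 0.5, -0.7], enrolled))    # matches nobody well
```

Once a speaker is matched, per-speaker settings can be looked up by id instead of being re-inferred on every utterance.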

Full-duplex voice

Supports interruption, addition, and correction at any point in the dialogue — the model listens and speaks at the same time, leaving walkie-talkie turn-taking behind for human-like conversation.
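
The interruption behavior can be pictured as a small turn manager that keeps listening while the bot speaks and yields the floor as soon as the user talks. This is a minimal sketch with hypothetical class and method names. A real full-duplex model handles this inside the audio stream rather than with explicit events.

```python
from enum import Enum, auto

class State(Enum):
    LISTENING = auto()
    SPEAKING = auto()

class DuplexTurnManager:
    """Tracks who holds the floor; user speech always pre-empts the bot."""

    def __init__(self):
        self.state = State.LISTENING
        self.log = []

    def on_bot_starts_speaking(self):
        self.state = State.SPEAKING
        self.log.append("bot:start")

    def on_user_speech(self, text):
        # Barge-in: the user spoke while the bot was talking, so stop playback.
        if self.state is State.SPEAKING:
            self.log.append("bot:interrupted")
            self.state = State.LISTENING
        self.log.append(f"user:{text}")

    def on_bot_finished(self):
        # Ignored if the bot was already interrupted.
        if self.state is State.SPEAKING:
            self.log.append("bot:done")
            self.state = State.LISTENING

manager = DuplexTurnManager()
manager.on_bot_starts_speaking()
manager.on_user_speech("wait, one correction")
print(manager.log)
```

A walkie-talkie design would instead drop or queue the user's speech until the bot finished its turn.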

Paralinguistic cues

Recognizes sighs, laughter, hesitation, and breathing rhythm — non-verbal signals — and weaves these “human” details back into expression so the AI doesn't just answer but actually converses.

§ 04 · INDUSTRY HERITAGE

A decade of industry depth · shipped know-how

From cascade-era LSTM-CTC to end-to-end multimodal LLMs — the LANCUN team lived through and contributed to the full evolution of speech tech.

2015

LSTM-CTC ships

Speech recognition enters industrial use

2016

Baidu Deep CNN

Noise-robustness leaps forward

2018

DFCNN & LFR-DFSMN

Lightweight model deployment becomes practical

2019

SMLTA

Streaming decoding · dialogue enters real-time

2024

GPT-4o end-to-end multimodal

Machines move toward humanity · a new tech branch

2025

LANCUN EmoMonte

China's first emotional voice agent · end-to-end simulation + pruning · expressive emotions

17 yrs

Industrial projects still running

98%

AI-workstation delivery rate

300 ms

Dialog-latency floor

70%↓

Token call cost

Give your product · warm-blooded conversation