EXTENSION

3D Spatial Audio

Add dynamic, immersive audio to your real-time experience
A woman meditating in a living room wearing a virtual reality headset, surrounded by holographic jellyfish and immersed in spatial audio effects.
Supported Platforms
Android
iOS
Windows
Customers building with
Agora and OpenAI
grepp, WYZE, kileon, kumu, Scaler, Parallel, JorJin, AnotherBall, Ellie, zigbang

Features


Natural listening experience 

Supports a high-quality audio range, audio playback, background blur, air attenuation, and more to simulate a natural listening experience.

Highest fidelity 3D audio

Supports 48kHz full-band sampling and allows listeners to pinpoint both the direction and distance of a voice coming from the speaker.

Low latency

Low latency, low power consumption, and efficient processing modes preserve the real-time experience.

Cross-platform support

Agora’s streaming 3D audio API supports Web, iOS, Android, Mac, Windows, Unity, React Native, and Electron.

Global scalability

Scale from 1:1 to millions of users on the network that annually powers hundreds of billions of minutes of real-time video to users in over 200 countries and regions.

Talk to a voice agent powered by the Conversational AI Engine

Try it now
One real-time view for the metrics that matter the most
Use a single dashboard to monitor every active session around the world. Track the metrics that are most important to you, from concurrent users and channels to network latency and so much more.

Your vision, unrestricted.

With Interactive Whiteboard, you can build a collaborative app fast—with custom branding and full of features. Our platform makes it easy to create a customized and engaging learning environment.
  • Flexible APIs support custom branding and extensive digital whiteboard features.
  • Easily integrate real-time voice and video calling, interactive streaming and signaling.
  • Save users’ bandwidth by preloading, sharing, and annotating files, and retain all the dynamic content.
And have peace of mind with HIPAA, GDPR, and CCPA compliance.

See OpenAI's Realtime API in action

Deliver a more natural audio experience

Make your product stand out with Agora’s 3D Spatial Audio API that boosts user engagement.

Deliver a more realistic audio experience

Replicate how we hear sound in the real world for a more natural experience that makes users feel like they are in the same room.


Integrate quickly and easily

Quickly make your user experience more immersive by activating Agora’s 3D Spatial Audio extension that works seamlessly with our video, voice, and streaming products.


Give users the best audio quality

Allow your audience to hear deeper nuances of music and spoken word with superior audio that elevates the quality of the user’s entire experience.

Recording options for:

Cloud recording
Store, retrieve and share recordings in the cloud.
Go to Docs
On-premise recording
Store on a local server for security and confidentiality.
Go to Docs
Webpage recording
Record the entire web browser screen experience.
Go to Docs

Agora Media Services

Recording icon
Recording
Record audio streams, video streams and web pages for archive, review, or distribution.
Live icon
Media Gateway
Directly push media streams into Agora voice and video channels using the RTMP/SRT protocol and enable advanced transcoding processing on media streams to facilitate distribution.
Cloud Transcoding
Beta
Obtain audio and video source streams from hosts in RTC channels and perform transcoding, audio mixing, and video compositing.
Download icon
Media Pull
Add engagement to your Agora sessions by pulling live or recorded video and audio content and ingesting it directly into your Agora channel.
Media Push
Expand your audience with hybrid engagement experiences by pushing audio and video streams from Agora channels to Content Delivery Networks (CDN).

Made for developers

Quickstart guide

View the quickstart guide to get up and running with Agora and OpenAI.

How the Conversational AI Engine works

Made for developers

Your Code

Agora SDK

Customize your experience from the start with our flexible SDK.
Your Code

Agora SDK

Build and integrate real-time video into your app with the most flexibility and customization using Agora's Video SDK.
NO CODE

App Builder

Agora’s App Builder is the fastest and easiest way to add real-time video to your product using our no-code visual designer.
Go to Docs
low code

Agora UI Kit

Add real-time video to your app with only a few lines of code using low-code UI Kit libraries.
Go to Docs
your code

Agora SDK

Customize your experience from the start with our flexible SDK.
Android
iOS
Windows
Go to Docs
low code

Agora UI Kit

Integrate real-time communication and streaming using only a few lines of code with low-code UIKit libraries.
Go to Docs

Documentation

This project presents a set of API examples to help you understand how to use Agora APIs.
View documentation on how to set up 3D Spatial Audio.
Android
iOS
Windows
Go to Docs

Activate Extension

Activate the 3D Spatial Audio extension in the Agora Console.

Go to Console
your code

Agora SDK

Build and integrate Live Streaming with the most flexibility and full customization using Agora's Video SDK.
Android
iOS
Windows
Go to Docs
NO code

App Builder

Agora’s App Builder is the fastest and easiest way to add real-time voice chat, video chat, and live streaming into your product.
Go to Docs
your code

Agora SDK

Build and integrate real-time visual collaboration features into your application with the most flexibility and full customization using Agora's Interactive Whiteboard SDK.
Android
iOS
Windows
Go to Docs
LOW code

Fastboard

Build real-time visual collaboration faster with a pre-built UI and the ability to include custom plug-ins.
Try it Now
Security, privacy and compliance
Agora is certified to the ISO/IEC 27001, 27017, 27018, 27701 and SOC 2 security standards and meets privacy regulations like GDPR, CCPA, COPPA, and HIPAA. Agora doesn’t collect or store any end-user data aside from Internet Protocol (IP) addresses and operational information necessary for providing our services.
ISO 27001:2022
ISO 27017:2015
ISO 27018:2019
ISO 27701:2019
HIPAA
GDPR
SOC 2 Type 1 & 2
CCPA
COPPA
HOW TO INTEGRATE?
Streamlined 3-step integration process:
01
Activate Agora Conversational AI Engine
Unlock real-time Speech-to-Text (STT) and Text-to-Speech (TTS) capabilities, enabling seamless conversational interactions. 
02
Integrate Agora Edge Chip on Hardware
Optimize microphone, speaker, and system efficiency to ensure ultra-low-latency and high-fidelity conversations.
03
Deploy AI Voice Agents
Enable interactive, multilingual, and user-customized conversations for a wide range of IoT applications.

Integrated chipset and module

By building our Conversational AI technology into RiseLink's high-performance IoT chip modules, the turnkey solution makes it easy to integrate voice AI into any connected toy.
“With Agora’s conversational AI technology and our optimized AI hardware, we’re enabling the next generation of toys to think, respond, and interact naturally. We are excited to usher in the future of robotics and toys, ones that can react to the environment around them and interact fluently with users.” 
Pengfei Zhang
CEO, RiseLink
Use cases

Provide an exceptional immersive sound experience

A livecast of a gaming session with three players.

Livecasting

Create a more personal environment, as if friends are sharing the same physical space.
A man is on a conference call next to several others, powered by 3D spatial audio which allows participants to hear him clearly.

Meetings / Conference calls

Make meetings more productive by allowing participants to focus on the main speaker rather than background noise.
A young child on a live video call on a laptop with his teacher and immersed in the lesson.

Education

Enrich the learning experience by making it more personal and memorable—as if the teacher is sitting next to the student.
A video of a live musical concert, providing an immersive experience and allowing listeners to enjoy the nuances in every note.

Music Streaming

Provide a fully immersive experience allowing listeners to enjoy the nuances in every note.
Robopoet's Fuzzoo, an AI companion robot, leverages Agora's ConvoAI Device Kit to deliver real-time emotional support and personalized interaction.
"Agora’s AI technology enables toys and robots to interact in a way that feels natural and engaging. With real-time voice processing, emotional AI, and advanced speech capabilities, Agora makes seamless human-machine interaction possible and ensures exceptional performance and reliability." 
Yuna Pan
Co-Founder and CTO

Fastboard

Easily build and integrate Agora’s Interactive Whiteboard with our newest Fastboard SDK that delivers all the same whiteboard features with a pre-built UI and the ability to include custom plug-ins.
Try it Now
Request more information
Connect with our experts to answer your questions, discuss requirements, and provide more detail on the ConvoAI Device Kit.

Frequently asked questions

How does Agora improve the experience in comparison with other solutions for voice interaction with AI?

Agora enables more natural voice conversations with AI, thanks to low-latency responses and real-time interruption handling. Agora’s built-in background noise suppression, echo cancelation, and selective attention locking allow AI to hear the user clearly in any environment. Agora’s global real-time network ensures connectivity and performance in any location.

What LLMs can be connected to Agora’s conversational AI platform?

Agora's Conversational AI Engine offers support for a wide range of large language models (LLMs), including:

  • OpenAI
  • OpenAI Realtime API
  • Azure OpenAI
  • Google Gemini
  • Google Vertex AI
  • Anthropic Claude
  • Dify
  • Custom LLM

Review our documentation on connecting LLMs here: https://docs.agora.io/en/conversational-ai/models/llm/overview

What automatic-speech-recognition (ASR) / speech-to-text (STT) models are supported?

Agora’s Conversational AI Engine currently supports the following ASR providers:

  • ARES (default)  
  • Microsoft Azure
  • Deepgram

Review our documentation on connecting ASR models here: https://docs.agora.io/en/conversational-ai/models/asr/overview

What text-to-speech (TTS) models are supported?

Agora’s Conversational AI Engine currently supports the following TTS providers:

  • Microsoft Azure
  • ElevenLabs
  • Cartesia (Beta)
  • OpenAI (Beta)
  • Hume AI (Beta)

Review our documentation on connecting TTS models here: https://docs.agora.io/en/conversational-ai/models/tts/overview

What avatar providers are supported?

Agora’s Conversational AI Engine currently supports the following AI avatar providers:

  • Akool (Beta)
  • HeyGen (Alpha)

Review our documentation on connecting avatar providers here: https://docs.agora.io/en/conversational-ai/models/avatar/overview

What additional technology is required to implement a voice AI agent?

To implement a voice AI agent, you need to connect an LLM and a text-to-speech service to Agora’s Conversational AI Engine. This enables full customization of the experience, with the LLM and voice of your choice.

What is a “chained” or “cascade” model in relation to conversational voice AI?

In the chained (or cascade) model, the user’s voice passes through three stages in sequence: automatic speech recognition (ASR) converts the speech to text, the LLM processes that text and generates a response, and text-to-speech (TTS) converts the response into the AI agent’s spoken reply.
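
As a rough illustration of that flow, here is a minimal Python sketch of a cascade with three pluggable stages. It is not Agora's API: the class name CascadePipeline and the fake_asr, fake_llm, and fake_tts stand-ins are hypothetical, and they only pass text through so the chaining itself is visible.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadePipeline:
    """Chains three stages: speech -> text -> reply text -> speech."""
    asr: Callable[[bytes], str]   # speech-to-text stage
    llm: Callable[[str], str]     # text-in, text-out stage
    tts: Callable[[str], bytes]   # text-to-speech stage

    def respond(self, user_audio: bytes) -> bytes:
        transcript = self.asr(user_audio)   # 1. ASR converts speech to text
        reply_text = self.llm(transcript)   # 2. LLM produces a text response
        return self.tts(reply_text)         # 3. TTS renders the response as audio


# Hypothetical stand-ins: they do no real recognition, generation, or synthesis.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadePipeline(asr=fake_asr, llm=fake_llm, tts=fake_tts)
agent_audio = pipeline.respond(b"hello")
```

Because each stage is just a callable, swapping one provider for another changes a single argument and leaves the rest of the chain untouched.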

Does Agora’s Conversational AI Engine enable the creation of an AI model or LLM?

No, Agora’s Conversational AI Engine requires an existing AI model or LLM. The Engine enables customized voice interaction with the LLM but is not capable of creating or training an LLM.

FAQs

What is Agora Voice Calling?

Agora Voice Calling is a real-time voice API that lets developers embed high-quality, ultra-low latency voice chat into any application. It supports one-to-one calls, group voice chat, and large-scale audio rooms across devices and platforms.

Which platforms does Agora Voice Calling support?

Agora Voice Calling supports Android, iOS, Web, Windows, Electron, Flutter, React Native, Unity, and Unreal Engine. This allows teams to build consistent voice experiences across mobile, web, desktop, and immersive environments.

How does Agora deliver HD audio quality with low latency?

Agora uses a 48 kHz sampling rate with full-bandwidth audio capture and intelligent routing over its global real-time network. This minimizes latency, jitter, and packet loss to deliver clear, stable voice calls—even on unstable networks.

Does Agora support AI-powered voice features?

Yes. Agora Voice Calling includes AI-powered features such as Noise Suppression, Real-Time Speech to Text, and seamless integration with large language models and text-to-speech engines to enable intelligent, voice-driven experiences.

Can I record voice calls and audio sessions?

Yes. Agora supports flexible voice recording in the cloud or on premises. Developers control audio formats, storage locations, and recording quality to support playback, analytics, moderation, or compliance needs.

What is 3D Spatial Audio and when should I use it?

3D Spatial Audio simulates real-world sound positioning, making conversations feel more immersive and natural. It’s commonly used in gaming, social audio rooms, virtual workspaces, and metaverse-style experiences.
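
To make "sound positioning" concrete, the Python sketch below derives left and right channel gains from a source's position relative to a listener. It is a deliberately simplified model (inverse-distance attenuation plus constant-power stereo panning in 2D), not a description of Agora's algorithm; the function name spatial_gains and the ref_distance parameter are assumptions made for illustration.

```python
import math


def spatial_gains(listener_xy, source_xy, ref_distance=1.0):
    """Return (left_gain, right_gain) for a listener facing the +y direction."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)

    # Inverse-distance attenuation; sources closer than ref_distance are not boosted.
    attenuation = ref_distance / max(distance, ref_distance)

    # Azimuth: 0 is straight ahead, +pi/2 is fully right, -pi/2 is fully left.
    azimuth = math.atan2(dx, dy)

    # Map azimuth to a pan value in [-1, 1]; sources behind the listener
    # are clamped to the nearest side in this simplified model.
    pan = max(-1.0, min(1.0, azimuth / (math.pi / 2)))

    # Constant-power panning keeps left^2 + right^2 equal to attenuation^2.
    angle = (pan + 1.0) * math.pi / 4
    return attenuation * math.cos(angle), attenuation * math.sin(angle)


ahead = spatial_gains((0.0, 0.0), (0.0, 2.0))       # two units straight ahead
right_side = spatial_gains((0.0, 0.0), (2.0, 0.0))  # two units to the right
```

In this model a source straight ahead reaches both ears equally, while a source to the right is heard almost entirely in the right channel; both get quieter as distance grows.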

How quickly can I launch a voice calling experience?

You can integrate Agora Voice Calling within hours using SDKs, documentation, and sample apps. For teams that want to move faster, Agora App Builder offers a no-code option to deploy voice chat without custom development.

What applications are best suited for Agora Voice Calling?

Agora Voice Calling is ideal for education platforms, multiplayer games, social apps, collaboration tools, live shopping, customer engagement, and IoT devices—any use case that requires reliable, real-time voice communication at global scale.

FAQs

What is Agora Video Calling?

Agora Video Calling is a real-time video API that lets developers embed high-quality, low-latency video calls into web, mobile, and native applications. It supports everything from 1:1 calls to large-scale video experiences with full customization.

Which platforms are supported by Agora’s Video Calling SDK?

Agora Video Calling supports Android, iOS, Web, Windows, Electron, Flutter, React Native, Unity, and Unreal Engine—making it easy to deliver consistent video experiences across devices and operating systems.

How does Agora ensure reliable video quality in poor network conditions?

Agora uses intelligent routing and adaptive video optimization to reduce jitter, lag, and packet loss. The platform dynamically adjusts video quality in real time to maintain smooth, uninterrupted calls—even on slow or unstable networks.

What collaboration features are available with Agora Video Calling?

Agora supports advanced collaboration features such as screen sharing, interactive whiteboards, multi-user video layouts, and real-time messaging. These features make it well suited for meetings, education, telehealth, and collaborative work apps.

Can I record video calls and meetings?

Yes. Agora provides flexible video call recording options, allowing you to record securely to the cloud or on local servers. Developers control video format, resolution, storage location, and access permissions to meet compliance and operational needs.

Does Agora support multi-camera or multi-audio setups?

Yes. Agora supports multi-track audio and video, making it possible to publish multiple camera feeds or microphone streams within a single session. This is ideal for live production workflows, virtual events, and advanced conferencing scenarios.

How fast can I launch a video calling experience?

You can ship a video calling app within hours using Agora SDKs, documentation, and sample apps. For even faster deployment, Agora App Builder provides a no-code option to launch video, voice, and live streaming features without custom development.

What use cases are best suited for Agora Video Calling?

Agora Video Calling is ideal for education, remote work, gaming, social apps, live shopping, and telehealth. Any application that requires scalable, real-time video communication with global reach and low latency can benefit from Agora’s platform.

FAQs

What is Agora Real-Time Chat?

Agora Real-Time Chat is a customizable chat SDK that lets developers add secure, scalable messaging to real-time video, voice, and live streaming applications. It supports one-to-one messaging, group chat, and large community channels.

Which platforms are supported by Agora’s Chat SDK?

Agora’s Chat SDK supports Android, iOS, Web, Windows, Flutter, React Native, and Unity, making it easy to deliver consistent messaging experiences across mobile, desktop, and cross-platform apps.

What messaging features does Agora Chat support?

Agora Chat supports rich media messaging including emojis, images, files, GPS locations, structured messages, and voice notes. Core messaging features also include offline messaging, message recall and deletion, read receipts, typing indicators, presence, and push notifications.

How does Agora ensure chat security and compliance?

Agora Chat uses TLS/SSL encryption for data in transit and encrypted file storage to protect user data. The platform also supports privacy compliance features such as user data deletion and secure message handling.

Does Agora Chat include moderation and community safety tools?

Yes. Agora Chat includes built-in content moderation to help filter profanity, offensive language, and inappropriate images or text. Developers can also integrate third-party moderation tools for additional control.

Can Agora Chat support multilingual users?

Yes. Agora Chat supports multilingual message translation with automatic, on-demand, or push-based translation options, enabling users to communicate in their preferred language.

How quickly can I launch a chat experience with Agora?

Developers can launch a chat experience within hours using Agora SDKs, documentation, and sample apps. For faster implementation, Agora UI Kit provides a low-code option to add messaging with minimal development effort.

What use cases are best suited for Agora Real-Time Chat?

Agora Real-Time Chat is ideal for education platforms, gaming communities, social apps, collaboration tools, live commerce, and telehealth—any application that requires reliable, secure, and engaging real-time messaging.

FAQs

What is Agora Real-Time Speech to Text?

Agora Real-Time Speech to Text is a cloud-based live transcription and subtitling service that converts real-time audio into accurate text for live audio and video applications. It enables captions, transcripts, and AI-powered workflows without impacting real-time performance.

How does Real-Time Speech to Text work in live audio and video sessions?

Agora’s cloud-based transcription processes audio streams in real time and converts speech into text with low latency. Transcripts can be delivered as live captions to participants, stored for later review, or exported for downstream processing.

Can I integrate Real-Time Speech to Text with large language models (LLMs)?

Yes. Real-time transcripts can be integrated with large language models to generate summaries, meeting notes, action items, feedback, or translations. Transcripts can also be exported as .vtt files for seamless LLM processing without affecting RTC performance.
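
As a sketch of the .vtt route, the Python snippet below strips an exported WebVTT transcript down to plain text that could be placed in an LLM prompt. The helper name vtt_to_text, the sample transcript, and the prompt wording are all hypothetical; the parser handles only the basic cue layout (optional cue id, timing line, text lines) and ignores WebVTT features such as styling tags.

```python
def vtt_to_text(vtt: str) -> str:
    """Collect cue text from a WebVTT document, dropping header, cue ids, and timings."""
    collected = []
    # WebVTT blocks (header, notes, cues) are separated by blank lines.
    blocks = vtt.replace("\r\n", "\n").strip().split("\n\n")
    for block in blocks:
        lines = block.split("\n")
        for i, line in enumerate(lines):
            if "-->" in line:  # the timing line marks where cue text begins
                collected.extend(t.strip() for t in lines[i + 1:] if t.strip())
                break
        # Blocks without a timing line (WEBVTT header, NOTE blocks) are skipped.
    return " ".join(collected)


# Hypothetical sample transcript for illustration only.
sample = """WEBVTT

1
00:00:00.000 --> 00:00:02.000
Welcome to the weekly sync.

2
00:00:02.500 --> 00:00:05.000
First item: the release date.
"""

transcript = vtt_to_text(sample)
prompt = f"Summarize this meeting transcript:\n{transcript}"
```

The resulting prompt string can then be sent to whichever LLM you use; that call is left out here because it depends on the provider.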

Does Agora support multiple speakers and overlapping speech?

Yes. Agora supports real-time speaker recognition and labeling for up to three simultaneous speakers. Each speaker can be transcribed separately, improving accuracy in conversations with interruptions or overlapping dialogue.

What languages are supported by Agora’s Real-Time Speech to Text?

Agora supports all major languages and regional dialects. Each channel can transcribe up to two languages simultaneously, making it ideal for multilingual meetings, events, and global applications.

Can I generate captions for recorded audio or video?

Yes. Agora supports transcription for cloud-recorded audio and video, enabling closed captions (CC) during playback and searchable transcripts for reviewing important discussion points.

How does Agora ensure transcription accuracy at scale?

Agora uses advanced AI techniques to reduce silence, lower Word Error Rate (WER), and maintain accuracy even with accents, overlapping speech, poor audio quality, or unstable networks. The solution scales from one-to-one sessions to millions of participants with consistent accuracy.

Is Real-Time Speech to Text secure and compliant?

Yes. Agora is ISO and SOC 2 certified and supports compliance with GDPR, CCPA, and HIPAA. Live captions and transcripts can be encrypted using the same security mechanisms as Agora’s real-time audio and video streams.

FAQs

What is Agora Recording?

Agora Recording is an extension that allows developers to record audio streams, video streams, interactive content, and web pages for archive, review, compliance, or redistribution. It supports cloud, on-premises, and webpage recording options.

What types of content can I record with Agora?

Agora Recording can capture audio, video, screen content, whiteboards, chat messages, and live streaming elements. You can record single streams or multiple streams separately, making it easy to edit, combine, or repurpose content later.

What’s the difference between single-stream and multi-stream recording?

Single-stream recording combines audio, video, and content into one synchronized file. Multi-stream recording captures each audio, video, or content stream separately, giving you greater flexibility for post-production, analysis, or moderation workflows.

Where are recordings stored?

Recordings can be stored in the cloud or on-premises, depending on your deployment needs. Agora supports third-party cloud storage providers such as Amazon S3, Microsoft Azure, Google Cloud, Alibaba Cloud, Tencent Cloud, and others.

Can Agora Recording support moderation and compliance requirements?

Yes. Agora Recording supports screenshots for moderation, customizable capture intervals, digital watermarks, and content moderation tools. These features help enforce community guidelines, protect intellectual property, and meet regulatory or organizational requirements.

How secure is Agora Recording?

Agora Recording is built with enterprise-grade security, including end-to-end encryption for calls, transmission, and storage. It supports globally distributed clusters, automatic backups, proxy services, and LAN deployment to meet strict data security and privacy needs.

How quickly can I integrate recording into my application?

Developers can integrate Agora Recording in as little as 30 minutes using RESTful APIs. The service is designed to be easy to embed, test, and deploy, with automatic uploading and backup to ensure recordings are not lost.

What use cases are best suited for Agora Recording?

Agora Recording is ideal for virtual events and webinars, large-scale live streaming, customer service quality assurance, education and online classes, and telehealth consultations—any scenario where capturing, reviewing, or distributing real-time interactions is essential.