azure-ai-voicelive-java
Azure AI VoiceLive SDK for Java. Real-time bidirectional voice conversations with AI assistants using WebSocket.
- risk: unknown
- source: community
- date added: 2026-02-27
Azure AI VoiceLive SDK for Java
Real-time, bidirectional voice conversations with AI assistants using WebSocket technology.
Installation
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-voicelive</artifactId>
    <version>1.0.0-beta.2</version>
</dependency>
Environment Variables
AZURE_VOICELIVE_ENDPOINT=https://<resource>.openai.azure.com/
AZURE_VOICELIVE_API_KEY=<your-api-key>
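Because both variables are read with `System.getenv`, a missing value only surfaces later as a `null` passed to the builder. A minimal sketch of a fail-fast check is shown below; `VoiceLiveEnv` and `require` are hypothetical names for illustration, not part of the SDK.

```java
import java.util.Map;

/** Hypothetical helper (not part of the SDK) that fails fast on missing settings. */
final class VoiceLiveEnv {
    private VoiceLiveEnv() { }

    /** Returns the trimmed value of the named variable, or throws if it is absent or blank. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value.trim();
    }
}
```

It could be called as `VoiceLiveEnv.require(System.getenv(), "AZURE_VOICELIVE_ENDPOINT")` before building the client.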
Authentication
API Key
import com.azure.ai.voicelive.VoiceLiveAsyncClient;
import com.azure.ai.voicelive.VoiceLiveClientBuilder;
import com.azure.core.credential.AzureKeyCredential;

VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
    .endpoint(System.getenv("AZURE_VOICELIVE_ENDPOINT"))
    .credential(new AzureKeyCredential(System.getenv("AZURE_VOICELIVE_API_KEY")))
    .buildAsyncClient();
DefaultAzureCredential (Recommended)
// Requires the com.azure:azure-identity dependency
import com.azure.identity.DefaultAzureCredentialBuilder;

VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
    .endpoint(System.getenv("AZURE_VOICELIVE_ENDPOINT"))
    .credential(new DefaultAzureCredentialBuilder().build())
    .buildAsyncClient();
Key Concepts
| Concept | Description |
|---|---|
| `VoiceLiveAsyncClient` | Main entry point for voice sessions |
| `VoiceLiveSessionAsyncClient` | Active WebSocket connection for streaming |
| `VoiceLiveSessionOptions` | Configuration for session behavior |
Audio Requirements
- Sample Rate: 24kHz (24000 Hz)
- Bit Depth: 16-bit PCM
- Channels: Mono (1 channel)
- Format: Signed PCM, little-endian
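These requirements fix the size of every audio buffer: one sample is 2 bytes, so one second of mono audio at 24 kHz is 48,000 bytes. The sketch below works through that arithmetic and the little-endian encoding in plain Java. `Pcm16Format`, `bytesForMillis`, and `toLittleEndian` are illustrative names, not SDK types.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative helper (not part of the SDK) for 24 kHz, 16-bit, mono PCM. */
final class Pcm16Format {
    static final int SAMPLE_RATE_HZ = 24_000;
    static final int BYTES_PER_SAMPLE = 2; // 16-bit samples
    static final int CHANNELS = 1;         // mono

    private Pcm16Format() { }

    /** Number of bytes needed to hold the given duration of audio. */
    static int bytesForMillis(int millis) {
        return (int) ((long) SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * millis / 1000);
    }

    /** Encodes signed 16-bit samples as little-endian bytes. */
    static byte[] toLittleEndian(short[] samples) {
        ByteBuffer buffer = ByteBuffer.allocate(samples.length * BYTES_PER_SAMPLE)
            .order(ByteOrder.LITTLE_ENDIAN);
        for (short sample : samples) {
            buffer.putShort(sample);
        }
        return buffer.array();
    }
}
```

For instance, `Pcm16Format.bytesForMillis(20)` gives 960 bytes for a 20 ms chunk.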
Core Workflow
1. Start Session
import reactor.core.publisher.Mono;

client.startSession("gpt-4o-realtime-preview")
    .flatMap(session -> {
        System.out.println("Session started");

        // Subscribe to events
        session.receiveEvents()
            .subscribe(
                event -> System.out.println("Event: " + event.getType()),
                error -> System.err.println("Error: " + error.getMessage())
            );

        return Mono.just(session);
    })
    .block();
2. Configure Session Options
import com.azure.ai.voicelive.models.*;
import com.azure.core.util.BinaryData;
import java.util.Arrays;

ServerVadTurnDetection turnDetection = new ServerVadTurnDetection()
    .setThreshold(0.5)            // Sensitivity (0.0-1.0)
    .setPrefixPaddingMs(300)      // Audio before speech
    .setSilenceDurationMs(500)    // Silence to end turn
    .setInterruptResponse(true)   // Allow interruptions
    .setAutoTruncate(true)
    .setCreateResponse(true);

AudioInputTranscriptionOptions transcription = new AudioInputTranscriptionOptions(
    AudioInputTranscriptionOptionsModel.WHISPER_1);

VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
    .setInstructions("You are a helpful AI voice assistant.")
    .setVoice(BinaryData.fromObject(new OpenAIVoice(OpenAIVoiceName.ALLOY)))
    .setModalities(Arrays.asList(InteractionModality.TEXT, InteractionModality.AUDIO))
    .setInputAudioFormat(InputAudioFormat.PCM16)
    .setOutputAudioFormat(OutputAudioFormat.PCM16)
    .setInputAudioSamplingRate(24000)
    .setInputAudioNoiseReduction(new AudioNoiseReduction(AudioNoiseReductionType.NEAR_FIELD))
    .setInputAudioEchoCancellation(new AudioEchoCancellation())
    .setInputAudioTranscription(transcription)
    .setTurnDetection(turnDetection);

// Send configuration
ClientEventSessionUpdate updateEvent = new ClientEventSessionUpdate(options);
session.sendEvent(updateEvent).subscribe();
3. Send Audio Input
byte[] audioData = readAudioChunk(); // Your PCM16 audio data
session.sendInputAudio(BinaryData.fromBytes(audioData)).subscribe();
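`readAudioChunk()` above is left to the application. One possible shape, assuming the audio arrives as a raw, headerless PCM16 `InputStream`, is sketched below. `AudioChunkReader` and its constructor parameters are hypothetical names, not SDK types.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical source of fixed-size PCM16 chunks; not part of the SDK. */
final class AudioChunkReader {
    private final InputStream source;
    private final int chunkBytes;

    AudioChunkReader(InputStream source, int chunkBytes) {
        if (chunkBytes <= 0 || chunkBytes % 2 != 0) {
            // A 16-bit sample is 2 bytes, so a chunk boundary must not split a sample.
            throw new IllegalArgumentException("chunkBytes must be a positive even number");
        }
        this.source = source;
        this.chunkBytes = chunkBytes;
    }

    /** Returns the next chunk, a shorter final chunk, or an empty array at end of stream. */
    byte[] readAudioChunk() {
        try {
            return source.readNBytes(chunkBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would typically loop until an empty array comes back, passing each non-empty chunk to `sendInputAudio`.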
4. Handle Events
session.receiveEvents().subscribe(event -> {
    ServerEventType eventType = event.getType();

    if (ServerEventType.SESSION_CREATED.equals(eventType)) {
        System.out.println("Session created");
    } else if (ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED.equals(eventType)) {
        System.out.println("User started speaking");
    } else if (ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STOPPED.equals(eventType)) {
        System.out.println("User stopped speaking");
    } else if (ServerEventType.RESPONSE_AUDIO_DELTA.equals(eventType)) {
        if (event instanceof SessionUpdateResponseAudioDelta) {
            SessionUpdateResponseAudioDelta audioEvent = (SessionUpdateResponseAudioDelta) event;
            playAudioChunk(audioEvent.getDelta());
        }
    } else if (ServerEventType.RESPONSE_DONE.equals(eventType)) {
        System.out.println("Response complete");
    } else if (ServerEventType.ERROR.equals(eventType)) {
        if (event instanceof SessionUpdateError) {
            SessionUpdateError errorEvent = (SessionUpdateError) event;
            System.err.println("Error: " + errorEvent.getError().getMessage());
        }
    }
});
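`playAudioChunk` in the handler above is application code. A minimal sketch using the JDK's `javax.sound.sampled` API might look like the following, assuming the delta is available as raw PCM16 bytes in the format listed under Audio Requirements. `Pcm16Player` is a hypothetical name, and the byte-array parameter is an assumption about what `getDelta()` provides.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

/** Hypothetical playback helper (not part of the SDK) for 24 kHz, 16-bit, mono PCM. */
final class Pcm16Player implements AutoCloseable {
    // 24 kHz sample rate, 16 bits, 1 channel, signed, little-endian
    static final AudioFormat FORMAT = new AudioFormat(24_000f, 16, 1, true, false);

    private SourceDataLine line;

    /** Writes one chunk to the default output device, opening the line on first use. */
    void playAudioChunk(byte[] pcm) throws LineUnavailableException {
        if (line == null) {
            line = AudioSystem.getSourceDataLine(FORMAT);
            line.open(FORMAT);
            line.start();
        }
        // SourceDataLine.write requires a whole number of frames; drop a trailing odd byte.
        int usable = pcm.length - (pcm.length % FORMAT.getFrameSize());
        line.write(pcm, 0, usable);
    }

    /** Lets queued audio finish playing, then releases the line. */
    @Override
    public void close() {
        if (line != null) {
            line.drain();
            line.close();
        }
    }
}
```

`write` blocks while the line's buffer is full, so in a reactive pipeline this call is better kept off the event-processing thread.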
Voice Configuration
OpenAI Voices
// Available: ALLOY, ASH, BALLAD, CORAL, ECHO, SAGE, SHIMMER, VERSE
VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
    .setVoice(BinaryData.fromObject(new OpenAIVoice(OpenAIVoiceName.ALLOY)));
Azure Voices
// Azure Standard Voice
options.setVoice(BinaryData.fromObject(new AzureStandardVoice("en-US-JennyNeural")));

// Azure Custom Voice
options.setVoice(BinaryData.fromObject(new AzureCustomVoice("myVoice", "endpointId")));

// Azure Personal Voice
options.setVoice(BinaryData.fromObject(
    new AzurePersonalVoice("speakerProfileId", PersonalVoiceModels.PHOENIX_LATEST_NEURAL)));
Function Calling
// parametersSchema is a JSON Schema object describing the function arguments, defined elsewhere
VoiceLiveFunctionDefinition weatherFunction = new VoiceLiveFunctionDefinition("get_weather")
    .setDescription("Get current weather for a location")
    .setParameters(BinaryData.fromObject(parametersSchema));

VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
    .setTools(Arrays.asList(weatherFunction))
    .setInstructions("You have access to weather information.");
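`parametersSchema` is not defined in the snippet above. A minimal sketch, assuming the function takes a single required `location` string and that a JSON-Schema-shaped `Map` is an acceptable input to `BinaryData.fromObject`, could build it from plain collections. The `WeatherSchema` class and the `location` field are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative JSON Schema for the get_weather arguments; the field names are assumptions. */
final class WeatherSchema {
    private WeatherSchema() { }

    static Map<String, Object> parametersSchema() {
        // Describes the single "location" argument
        Map<String, Object> location = new LinkedHashMap<>();
        location.put("type", "string");
        location.put("description", "City and state, e.g. Seattle, WA");

        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("location", location);

        // Top-level object schema with "location" marked as required
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("type", "object");
        schema.put("properties", properties);
        schema.put("required", List.of("location"));
        return schema;
    }
}
```

`LinkedHashMap` keeps the keys in insertion order, which makes the serialized schema easier to read in logs.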
Best Practices
- Use the async client: VoiceLive requires reactive patterns
- Configure turn detection for natural conversation flow
- Enable noise reduction for better speech recognition
- Handle interruptions gracefully with `setInterruptResponse(true)`
- Use Whisper transcription for input audio
- Close sessions properly when the conversation ends
Error Handling
import reactor.core.publisher.Flux;

session.receiveEvents()
    .doOnError(error -> System.err.println("Connection error: " + error.getMessage()))
    .onErrorResume(error -> {
        // Attempt reconnection or cleanup
        return Flux.empty();
    })
    .subscribe();
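The `onErrorResume` handler above leaves the reconnection policy open. One common choice, swapped in here as an illustration rather than anything the SDK prescribes, is capped exponential backoff between attempts. The sketch below only computes the delays; `ReconnectBackoff` and its parameters are hypothetical.

```java
import java.time.Duration;

/** Hypothetical delay calculator for reconnection attempts; not part of the SDK. */
final class ReconnectBackoff {
    private final long baseMillis;
    private final long maxMillis;

    ReconnectBackoff(Duration base, Duration max) {
        if (base.isNegative() || base.isZero() || max.compareTo(base) < 0) {
            throw new IllegalArgumentException("require 0 < base <= max");
        }
        this.baseMillis = base.toMillis();
        this.maxMillis = max.toMillis();
    }

    /** Delay before the given 1-based attempt: base * 2^(attempt - 1), capped at max. */
    Duration delayFor(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        long factor = 1L << Math.min(attempt - 1, 30);
        // Compare by division so the multiplication below can never overflow.
        if (baseMillis > maxMillis / factor) {
            return Duration.ofMillis(maxMillis);
        }
        return Duration.ofMillis(baseMillis * factor);
    }
}
```

With Reactor, a delay computed this way could be passed to `Mono.delay(...)` before starting a new session.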
When to Use
Use this skill when you need to carry out the workflow or actions described in the overview.