Speex Encoder
Speex is a patent-free audio compression format designed specifically for speech. VisioForge SDKs provide a flexible Speex encoder with multiple operating modes and configuration options for tuning speech compression to different use cases.
Cross-platform Speex output (X-engines: VideoCaptureCoreX, VideoEditCoreX, Media Blocks)
Encoder Modes
The Speex encoder supports four modes of operation. Auto picks a band for you, and the other three are each optimized for one sampling rate:
- Auto (0): Automatically selects the appropriate band mode based on the input sample rate
- Ultra Wide Band (1): Optimized for 32 kHz sampling rate
- Wide Band (2): Optimized for 16 kHz sampling rate
- Narrow Band (3): Optimized for 8 kHz sampling rate
Use the SpeexEncoderSettings class to configure the encoder mode and other parameters.
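If you know the input sample rate and prefer an explicit band over Auto, the numeric mode values above can be mirrored in a small lookup. This is a minimal sketch: the `SpeexBand` enum and `SpeexBandSelector` class are hypothetical stand-ins that only copy the documented numbers, and the SDK's own `SpeexEncoderMode` enum may use different member names.

```csharp
using System;

// Hypothetical enum mirroring the numeric mode values listed above.
// It is NOT the SDK's SpeexEncoderMode type.
public enum SpeexBand
{
    Auto = 0,
    UltraWideBand = 1,
    WideBand = 2,
    NarrowBand = 3
}

public static class SpeexBandSelector
{
    // Maps one of the three supported sample rates to its explicit band mode.
    public static SpeexBand FromSampleRate(int sampleRate) => sampleRate switch
    {
        8000 => SpeexBand.NarrowBand,
        16000 => SpeexBand.WideBand,
        32000 => SpeexBand.UltraWideBand,
        _ => throw new ArgumentOutOfRangeException(
            nameof(sampleRate), sampleRate,
            "Speex supports 8000, 16000 or 32000 Hz; resample the input first.")
    };
}
```

For example, `SpeexBandSelector.FromSampleRate(16000)` returns `SpeexBand.WideBand`, whose numeric value is 2.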
Supported Audio Parameters
Sample Rates
The encoder supports three standard sampling rates:
- 8,000 Hz (Narrow Band)
- 16,000 Hz (Wide Band)
- 32,000 Hz (Ultra Wide Band)
Channel Configuration
The encoder accepts:
- Mono (1 channel)
- Stereo (2 channels)
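Speex processes audio in 20 ms frames, so each supported sample rate implies a fixed frame size: 160, 320 or 640 samples per channel. The sketch below computes that size for interleaved input and rejects anything other than mono or stereo. `SpeexFrameMath` is a hypothetical helper for illustration, not part of the SDK.

```csharp
using System;

public static class SpeexFrameMath
{
    // Speex frame duration in milliseconds (the same for all band modes).
    public const int FrameMilliseconds = 20;

    // Returns the number of interleaved PCM samples in one 20 ms frame.
    public static int SamplesPerFrame(int sampleRate, int channels)
    {
        if (sampleRate != 8000 && sampleRate != 16000 && sampleRate != 32000)
            throw new ArgumentOutOfRangeException(nameof(sampleRate), sampleRate, "Use 8000, 16000 or 32000 Hz.");
        if (channels != 1 && channels != 2)
            throw new ArgumentOutOfRangeException(nameof(channels), channels, "Speex accepts mono or stereo input.");

        int perChannel = sampleRate * FrameMilliseconds / 1000; // 160, 320 or 640
        return perChannel * channels;
    }
}
```

For stereo input the count covers both channels, so a 32 kHz stereo frame holds 1280 interleaved samples.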
Rate Control Methods
The Speex encoder implements several rate control mechanisms that can be used independently or in combination:
Fixed Quality Mode
Uses the Quality parameter with VBR disabled. Each quality level selects a fixed Speex bitrate, so the output rate stays constant:
var settings = new SpeexEncoderSettings {
Quality = 8.0f, // Range: 0-10, default: 8
VBR = false // Disable VBR for pure quality-based encoding
};
Variable Bit Rate (VBR)
Dynamically adjusts bitrate based on content complexity:
var settings = new SpeexEncoderSettings {
VBR = true,
Quality = 8.0f // Acts as the target quality for VBR
};
Average Bit Rate (ABR)
Maintains a target average bitrate over time:
var settings = new SpeexEncoderSettings {
ABR = 15.0f, // Target bitrate in kbps
VBR = true // ABR requires VBR to be enabled
};
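Because ABR targets an average rate, storage and bandwidth are easy to budget: bytes ≈ kbps × 1000 × seconds / 8. The helper below is a rough sketch of that arithmetic; `SpeexBudget` is a hypothetical name, and the estimate ignores container overhead as well as the short-term variation ABR allows around its target.

```csharp
using System;

public static class SpeexBudget
{
    // Estimates encoded payload size for an average bitrate (kbps) and a duration.
    // Container overhead and ABR's short-term deviations are not included.
    public static long EstimatePayloadBytes(double averageKbps, TimeSpan duration)
    {
        if (averageKbps <= 0) throw new ArgumentOutOfRangeException(nameof(averageKbps));
        double bits = averageKbps * 1000.0 * duration.TotalSeconds;
        return (long)Math.Round(bits / 8.0);
    }
}
```

At the 15 kbps target used above, one minute of speech comes to about 112,500 bytes (roughly 110 KiB) and one hour to about 6.75 MB, before container overhead.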
Fixed Bitrate
Uses a constant bitrate throughout encoding:
var settings = new SpeexEncoderSettings {
Bitrate = 24.6f, // Fixed bitrate in kbps
VBR = false
};
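In narrowband (8 kHz) mode, Speex encodes at the following fixed bitrates, from 2.15 kbps to 24.6 kbps; wideband and ultra-wideband modes use higher bitrates: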
- 2.15 kbps
- 3.95 kbps
- 5.95 kbps
- 8.00 kbps
- 11.0 kbps
- 15.0 kbps
- 18.2 kbps
- 24.6 kbps
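The bitrates listed above are the narrowband Speex rates, and each one corresponds to a whole number of bits per 20 ms frame (8 kbps is 160 bits per frame, 24.6 kbps is 492). If an application accepts an arbitrary bitrate from the user, a helper like the sketch below can snap it to the nearest listed rate first. `SpeexNarrowbandRates` is hypothetical, and it works in bits per second, so divide by 1000 for the kbps form used in the examples above.

```csharp
using System;
using System.Linq;

public static class SpeexNarrowbandRates
{
    // Narrowband Speex bitrates from the list above, in bits per second.
    public static readonly int[] Bps = { 2150, 3950, 5950, 8000, 11000, 15000, 18200, 24600 };

    // Returns the listed rate closest to the requested one.
    // OrderBy is a stable sort over an ascending array, so ties resolve to the lower rate.
    public static int Nearest(int requestedBps) =>
        Bps.OrderBy(rate => Math.Abs(rate - requestedBps)).First();

    // Bits produced per 20 ms frame (50 frames per second).
    public static int BitsPerFrame(int bps) => bps / 50;
}
```

For example, a request for 16 kbps snaps to 15,000 bps, which is 300 bits per frame.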
Advanced Features
Voice Activity Detection (VAD)
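Detects whether each frame contains speech. When VAD is enabled with VBR disabled, non-speech frames are encoded with just enough data to reproduce the background noise: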
var settings = new SpeexEncoderSettings {
VAD = true, // Enable voice activity detection
DTX = true // Usually enabled with VAD for bandwidth efficiency
};
Discontinuous Transmission (DTX)
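Stops transmitting entirely while the background noise is stationary, reducing bandwidth during silence. DTX is designed as an addition to VAD or VBR operation: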
var settings = new SpeexEncoderSettings {
DTX = true // Enable discontinuous transmission
};
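To make the VAD and DTX idea concrete, the sketch below labels each 20 ms frame as active or silent using a plain energy threshold, which is the kind of decision that lets a DTX-enabled encoder skip frames. This is a deliberately simplified stand-in, not the detector libspeex uses; the `EnergyGate` name, the -45 dBFS default threshold and the 16-bit mono input format are all assumptions for illustration.

```csharp
using System;

public static class EnergyGate
{
    // Classifies each 20 ms frame of 16-bit mono PCM as active (true) or silent (false)
    // by comparing its RMS level against a threshold in dBFS.
    public static bool[] ClassifyFrames(short[] pcm, int sampleRate, double thresholdDbfs = -45.0)
    {
        int frameSize = sampleRate / 50;                          // samples per 20 ms frame
        int frameCount = pcm.Length / frameSize;                  // trailing partial frame is ignored
        double threshold = Math.Pow(10.0, thresholdDbfs / 20.0);  // dBFS -> linear (full scale = 1.0)
        var active = new bool[frameCount];

        for (int f = 0; f < frameCount; f++)
        {
            double sumSquares = 0;
            for (int i = f * frameSize; i < (f + 1) * frameSize; i++)
            {
                double s = pcm[i] / 32768.0;
                sumSquares += s * s;
            }
            double rms = Math.Sqrt(sumSquares / frameSize);
            active[f] = rms >= threshold;
        }
        return active;
    }
}
```

A fixed threshold like this is easily fooled by steady background noise, which is one reason real encoders use more elaborate detectors.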
Encoding Complexity
Controls the trade-off between encoding quality and CPU usage:
var settings = new SpeexEncoderSettings {
Complexity = 3 // Range: 1-10, default: 3
};
Complete Usage Example
Here's a comprehensive example showing how to configure and use the Speex encoder:
// Check if Speex encoder is available
if (!SpeexEncoderSettings.IsAvailable())
{
throw new InvalidOperationException("Speex encoder is not available on this system.");
}
// Create encoder settings
var encoderSettings = new SpeexEncoderSettings
{
// Basic configuration
Mode = SpeexEncoderMode.UltraWideBand,
SampleRate = 32000,
Channels = 1,
// Quality settings
Quality = 8.0f,
Complexity = 3,
// Rate control
VBR = true,
ABR = 15.0f,
// Voice optimization
VAD = true,
DTX = true,
// Frame configuration
NFrames = 1 // Number of 20 ms Speex frames per buffer
};
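Because several of these settings interact, it can help to check a combination before handing it to the encoder. The sketch below uses a hypothetical `SpeexConfig` record that mirrors the properties in the example (it is not the SDK's `SpeexEncoderSettings`) and applies the ranges and pairings described on this page: quality 0-10, complexity 1-10, a mode that matches the sample rate, and ABR together with VBR. The DTX rule follows the Speex manual's description of DTX as an addition to VAD/VBR.

```csharp
using System.Collections.Generic;

// Hypothetical plain record mirroring the fields used in the example above.
public record SpeexConfig(
    int Mode, int SampleRate, int Channels, float Quality, int Complexity,
    bool VBR, float ABR, bool VAD, bool DTX, int NFrames);

public static class SpeexConfigCheck
{
    // Returns human-readable problems; an empty list means no rule was violated.
    public static List<string> Validate(SpeexConfig c)
    {
        var problems = new List<string>();

        if (c.Quality < 0f || c.Quality > 10f) problems.Add("Quality must be in 0-10.");
        if (c.Complexity < 1 || c.Complexity > 10) problems.Add("Complexity must be in 1-10.");
        if (c.Channels != 1 && c.Channels != 2) problems.Add("Channels must be 1 or 2.");
        if (c.NFrames < 1) problems.Add("NFrames must be at least 1.");
        if (c.Mode < 0 || c.Mode > 3) problems.Add("Mode must be 0 (Auto) to 3 (Narrow Band).");

        // Expected sample rate for each explicit mode (1 = UWB, 2 = WB, 3 = NB); Auto has none.
        int? expectedRate = c.Mode switch { 1 => 32000, 2 => 16000, 3 => 8000, _ => (int?)null };
        if (expectedRate.HasValue && c.SampleRate != expectedRate.Value)
            problems.Add($"Mode {c.Mode} expects {expectedRate.Value} Hz, got {c.SampleRate} Hz.");
        if (c.SampleRate != 8000 && c.SampleRate != 16000 && c.SampleRate != 32000)
            problems.Add("SampleRate must be 8000, 16000 or 32000 Hz.");

        if (c.ABR > 0f && !c.VBR) problems.Add("ABR is set but VBR is disabled.");
        if (c.DTX && !c.VAD && !c.VBR) problems.Add("DTX is meant to be combined with VAD or VBR.");
        return problems;
    }
}
```

Applied to the values in the example above (Ultra Wide Band at 32000 Hz, VBR with ABR, VAD and DTX enabled), every rule passes.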
Add the Speex output to the VideoCaptureCoreX instance:
// Create a Video Capture SDK core instance
var core = new VideoCaptureCoreX();
// Add the Speex output
core.Outputs_Add(encoderSettings, true);
Set the output format for the Video Edit SDK core instance:
// Create a Video Edit SDK core instance
var core = new VideoEditCoreX();
// Set the output format
core.Output_Format = encoderSettings;
Create a Media Blocks Speex encoder instance:
// Create a Speex encoder instance
var speexEncoder = new SpeexEncoderBlock(encoderSettings);
Performance Considerations
When configuring the Speex encoder, consider these performance factors:
- Higher complexity values provide better quality but require more CPU resources
- VBR combined with VAD and DTX minimizes bandwidth for speech with frequent pauses
- NFrames sets how many 20 ms frames are grouped per buffer: larger values lower per-buffer overhead but add latency
- Ultra Wide Band mode provides the highest quality but requires more bandwidth
- Using ABR helps maintain consistent bandwidth usage while allowing quality variations
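The NFrames trade-off can be quantified. Each Speex frame covers 20 ms, so grouping N frames adds N × 20 ms of packetization delay while producing fewer, larger packets. The sketch below computes both figures for a given bitrate, assuming frames are bit-packed back to back and only the packet as a whole is padded to a byte boundary. `SpeexPacketing` is a hypothetical helper, and the results exclude the codec's own lookahead as well as any network or container overhead.

```csharp
using System;

public static class SpeexPacketing
{
    private const int FrameMs = 20;

    // Audio duration that must be buffered before a packet of nFrames can be emitted.
    public static int PacketizationDelayMs(int nFrames)
    {
        if (nFrames < 1) throw new ArgumentOutOfRangeException(nameof(nFrames));
        return nFrames * FrameMs;
    }

    // Approximate Speex payload bytes per packet at the given bitrate (rounded up to whole bytes).
    public static int PayloadBytes(int bitrateBps, int nFrames)
    {
        long bits = (long)bitrateBps * PacketizationDelayMs(nFrames) / 1000;
        return (int)((bits + 7) / 8);
    }
}
```

At 15 kbps, NFrames = 1 gives roughly 38-byte packets every 20 ms, while NFrames = 4 gives roughly 150-byte packets every 80 ms.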
This implementation of the Speex encoder is particularly well-suited for VoIP applications, podcast encoding, and other speech-focused audio applications where bandwidth efficiency and speech quality are primary concerns.