
Speex Encoder

Speex is a patent-free audio compression format designed specifically for speech. VisioForge provides a flexible Speex encoder with multiple operating modes and configuration options for tuning speech compression to different use cases.

Cross-platform Speex output (X-engines: VideoCaptureCoreX, VideoEditCoreX, Media Blocks)

Encoder Modes

The Speex encoder supports four modes of operation. Auto selects a band for you; the other three are each optimized for a specific sampling rate:

  • Auto (0): Automatically selects the most appropriate band mode based on input
  • Ultra Wide Band (1): Optimized for 32 kHz sampling rate
  • Wide Band (2): Optimized for 16 kHz sampling rate
  • Narrow Band (3): Optimized for 8 kHz sampling rate

Use the SpeexEncoderSettings class to configure the encoder mode and other parameters.
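The mapping between sampling rate and band mode listed above can be sketched as a small helper. This is only an illustration: `SpeexBand` and `SpeexBandPicker` are hypothetical names, not part of the SDK. The numeric values mirror the list above; of the SDK's own enum, this page only shows `SpeexEncoderMode.UltraWideBand`, so the other member names are not assumed here.

```csharp
using System;

// Hypothetical stand-in for the SDK's mode enum.
// The numeric values follow the mode list in this document.
public enum SpeexBand
{
    Auto = 0,
    UltraWideBand = 1,
    WideBand = 2,
    NarrowBand = 3
}

public static class SpeexBandPicker
{
    // Returns the band whose nominal sampling rate matches the input,
    // or Auto when the rate is not one of the three listed rates.
    public static SpeexBand ForSampleRate(int sampleRateHz)
    {
        switch (sampleRateHz)
        {
            case 32000: return SpeexBand.UltraWideBand;
            case 16000: return SpeexBand.WideBand;
            case 8000: return SpeexBand.NarrowBand;
            default: return SpeexBand.Auto;
        }
    }
}
```

Falling back to Auto for an unlisted rate is a design choice made for this sketch; an application may prefer to reject such input instead.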

Supported Audio Parameters

Sample Rates

The encoder supports three standard sampling rates:

  • 8,000 Hz (Narrow Band)
  • 16,000 Hz (Wide Band)
  • 32,000 Hz (Ultra Wide Band)

Channel Configuration

The encoder accepts both:

  • Mono (1 channel)
  • Stereo (2 channels)
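Before building the settings object, it can help to check the input format against the supported values listed above. The following is a minimal sketch; `SpeexInputCheck` is a hypothetical helper, not an SDK class, and how the encoder itself reacts to unsupported input is not covered here.

```csharp
using System;

public static class SpeexInputCheck
{
    // Sampling rates listed in this document as supported.
    private static readonly int[] SupportedSampleRates = { 8000, 16000, 32000 };

    // True when the rate is one of the three listed rates
    // and the channel count is mono (1) or stereo (2).
    public static bool IsSupported(int sampleRateHz, int channels)
    {
        bool rateOk = Array.IndexOf(SupportedSampleRates, sampleRateHz) >= 0;
        bool channelsOk = channels == 1 || channels == 2;
        return rateOk && channelsOk;
    }
}
```

Audio at another rate, such as 44,100 Hz, would fail this check and would need resampling before it reaches the encoder.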

Rate Control Methods

The Speex encoder implements several rate control mechanisms that can be used independently or in combination:

Fixed Quality Mode

Uses the Quality parameter to maintain consistent quality:

var settings = new SpeexEncoderSettings {
    Quality = 8.0f, // Range: 0-10, default: 8
    VBR = false     // Disable VBR for pure quality-based encoding
};

Variable Bit Rate (VBR)

Dynamically adjusts bitrate based on content complexity:

var settings = new SpeexEncoderSettings {
    VBR = true,
    Quality = 8.0f // Acts as the target quality for VBR
};

Average Bit Rate (ABR)

Maintains a target average bitrate over time:

var settings = new SpeexEncoderSettings {
    ABR = 15.0f, // Target bitrate in kbps
    VBR = true   // ABR requires VBR to be enabled
};

Fixed Bitrate

Uses a constant bitrate throughout encoding:

var settings = new SpeexEncoderSettings {
    Bitrate = 24.6f, // Fixed bitrate in kbps
    VBR = false
};

The supported bitrates range from 2.15 kbps to 24.6 kbps:

  • 2.15 kbps
  • 3.95 kbps
  • 5.95 kbps
  • 8.00 kbps
  • 11.0 kbps
  • 15.0 kbps
  • 18.2 kbps
  • 24.6 kbps
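If an application takes a free-form bitrate from the user, one option is to snap it to the nearest value in the list above before assigning it to `Bitrate`. The sketch below assumes that approach; `SpeexBitrates` is a hypothetical helper, and whether the encoder itself rounds or rejects intermediate values is not stated in this document.

```csharp
using System;

public static class SpeexBitrates
{
    // Bitrates (kbps) listed in this document as supported.
    public static readonly double[] SupportedKbps =
        { 2.15, 3.95, 5.95, 8.00, 11.0, 15.0, 18.2, 24.6 };

    // Returns the listed bitrate closest to the requested one.
    // On an exact tie, the lower bitrate wins because of the strict comparison.
    public static double Nearest(double requestedKbps)
    {
        double best = SupportedKbps[0];
        foreach (double candidate in SupportedKbps)
        {
            if (Math.Abs(candidate - requestedKbps) < Math.Abs(best - requestedKbps))
            {
                best = candidate;
            }
        }
        return best;
    }
}
```

For example, a request for 12 kbps resolves to 11.0 kbps, and anything above 24.6 kbps is clamped to 24.6 kbps.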

Advanced Features

Voice Activity Detection (VAD)

Detects presence of speech in the audio:

var settings = new SpeexEncoderSettings {
    VAD = true, // Enable voice activity detection
    DTX = true  // Usually enabled with VAD for bandwidth efficiency
};

Discontinuous Transmission (DTX)

Reduces bandwidth usage during silence periods:

var settings = new SpeexEncoderSettings {
    DTX = true // Enable discontinuous transmission
};

Encoding Complexity

Controls the trade-off between encoding quality and CPU usage:

var settings = new SpeexEncoderSettings {
    Complexity = 3 // Range: 1-10, default: 3
};

Complete Usage Example

Here's a comprehensive example showing how to configure and use the Speex encoder:

// Check if Speex encoder is available
if (!SpeexEncoderSettings.IsAvailable())
{
    throw new InvalidOperationException("Speex encoder is not available on this system.");
}

// Create encoder settings
var encoderSettings = new SpeexEncoderSettings
{
    // Basic configuration
    Mode = SpeexEncoderMode.UltraWideBand,
    SampleRate = 32000,
    Channels = 1,

    // Quality settings
    Quality = 8.0f,
    Complexity = 3,

    // Rate control
    VBR = true,
    ABR = 15.0f,

    // Voice optimization
    VAD = true,
    DTX = true,

    // Frame configuration
    NFrames = 1
};

Add the Speex output to the VideoCaptureCoreX instance:

// Create a Video Capture SDK core instance
var core = new VideoCaptureCoreX();

// Add the Speex output
core.Outputs_Add(encoderSettings, true);

Set the output format for the Video Edit SDK core instance:

// Create a Video Edit SDK core instance
var core = new VideoEditCoreX();

// Set the output format
core.Output_Format = encoderSettings;

Create a Media Blocks Speex encoder block instance:

// Create a Speex encoder instance
var speexEncoder = new SpeexEncoderBlock(encoderSettings);

Performance Considerations

When configuring the Speex encoder, consider these performance factors:

  1. Higher complexity values provide better quality but require more CPU resources
  2. VBR with VAD and DTX provides optimal bandwidth usage for speech content
  3. The NFrames parameter affects latency and processing efficiency
  4. Ultra Wide Band mode provides the highest quality but requires more bandwidth
  5. Using ABR helps maintain consistent bandwidth usage while allowing quality variations
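The latency and bandwidth points above can be made concrete with some arithmetic. The Speex codec works on 20 ms frames. The sketch below assumes that `NFrames` is the number of frames grouped into each output buffer, which this document does not spell out, and it ignores container and network overhead. `SpeexPacketMath` is a hypothetical helper, not an SDK class.

```csharp
using System;

public static class SpeexPacketMath
{
    // Speex encodes audio in 20 ms frames.
    public const double FrameDurationMs = 20.0;

    // Audio duration covered by one buffer, assuming NFrames frames per buffer.
    public static double PacketDurationMs(int nFrames)
    {
        return nFrames * FrameDurationMs;
    }

    // Approximate encoded payload per buffer at a fixed bitrate.
    // kbps multiplied by milliseconds gives bits; dividing by 8 gives bytes.
    public static double PayloadBytes(double bitrateKbps, int nFrames)
    {
        return bitrateKbps * PacketDurationMs(nFrames) / 8.0;
    }
}
```

Under these assumptions, `NFrames = 1` at 15 kbps yields about 37.5 bytes of payload per 20 ms buffer, while `NFrames = 2` doubles both the buffered duration (40 ms) and the payload size, trading latency for fewer, larger buffers.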

This implementation of the Speex encoder is particularly well-suited for VoIP applications, podcast encoding, and other speech-focused audio applications where bandwidth efficiency and speech quality are primary concerns.