Commit 14c8db4

Replace torchaudio.load with soundfile to fix FFmpeg/torchcodec issue
torchaudio 2.11+ hardcodes torchcodec, which requires FFmpeg DLLs that are often missing on Windows. Replaced all torchaudio.load() calls with soundfile.read() + librosa.resample() in both dia/model.py (load_audio) and engine.py (_prepare_cloning_inputs). Removed torchcodec from requirements.txt.
1 parent c1e50d5 commit 14c8db4

2 files changed: 16 additions, 3 deletions

dia/model.py

Lines changed: 16 additions & 2 deletions
@@ -411,9 +411,23 @@ def _generate_output(self, generated_codes: torch.Tensor) -> np.ndarray:
         return result
 
     def load_audio(self, audio_path: str) -> torch.Tensor:
-        audio, sr = torchaudio.load(audio_path, channels_first=True)  # C, T
+        # Use soundfile instead of torchaudio.load to avoid FFmpeg/torchcodec dependency
+        import soundfile as sf
+        import numpy as np
+        audio_np, sr = sf.read(audio_path, dtype='float32')
+        if audio_np.ndim == 1:
+            audio_np = audio_np[np.newaxis, :]  # [1, T] mono
+        else:
+            audio_np = audio_np.T  # [C, T] channels first
+        audio = torch.from_numpy(audio_np)
         if sr != DEFAULT_SAMPLE_RATE:
-            audio = torchaudio.functional.resample(audio, sr, DEFAULT_SAMPLE_RATE)
+            import librosa
+            resampled = []
+            for ch in range(audio.shape[0]):
+                resampled.append(torch.from_numpy(
+                    librosa.resample(audio[ch].numpy(), orig_sr=sr, target_sr=DEFAULT_SAMPLE_RATE)
+                ))
+            audio = torch.stack(resampled)
         audio = audio.to(self.device).unsqueeze(0)  # 1, C, T
         audio_data = self.dac_model.preprocess(audio, DEFAULT_SAMPLE_RATE)
         _, encoded_frame, _, _, _ = self.dac_model.encode(audio_data)  # 1, C, T
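
For readers who want to try the loading path outside the model class, the following is a minimal standalone sketch, not the code from this commit. The helper names `to_channels_first` and `load_audio_channels_first` are hypothetical. It assumes a librosa version whose `resample` accepts multichannel input through its `axis` argument, which would make the per-channel loop unnecessary; that is an assumption to check against the installed version. soundfile and librosa are imported lazily, mirroring the commit, so the shape helper can be used with NumPy alone.

```python
import numpy as np


def to_channels_first(audio_np: np.ndarray) -> np.ndarray:
    """Convert soundfile's layout, (T,) for mono or (T, C) otherwise, to (C, T)."""
    if audio_np.ndim == 1:
        return audio_np[np.newaxis, :]  # mono: add a channel axis -> (1, T)
    # Transpose (T, C) -> (C, T) and copy so the result is contiguous in memory.
    return np.ascontiguousarray(audio_np.T)


def load_audio_channels_first(path: str, target_sr: int) -> np.ndarray:
    """Hypothetical helper: read a file and return float32 audio shaped (C, T)."""
    import soundfile as sf  # lazy import, as in the commit

    audio_np, sr = sf.read(path, dtype="float32")
    audio_np = to_channels_first(audio_np)
    if sr != target_sr:
        import librosa  # only needed when the sample rates differ

        # Assumption: resample along the last (time) axis for all channels at once.
        audio_np = librosa.resample(
            audio_np, orig_sr=sr, target_sr=target_sr, axis=-1
        )
    return audio_np
```

The returned array can be passed to `torch.from_numpy` as in the diff above. The explicit copy in `to_channels_first` is a design choice for predictable memory layout, not something the commit requires.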

requirements.txt

Lines changed: 0 additions & 1 deletion
@@ -12,7 +12,6 @@ soundfile # Requires libsndfile system library (e.g., sudo apt-get install libsn
 huggingface_hub
 descript-audio-codec
 safetensors
-torchcodec
 openai-whisper
 
 # Configuration & Utilities
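
With torchcodec dropped from requirements.txt, the new loading path depends on soundfile and librosa being importable. A quick way to confirm that in a given environment might look like the sketch below. The function name `missing_modules` is hypothetical, and the check only uses the standard library to look up top-level modules without importing them.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Modules the rewritten load_audio relies on, per the diff above.
    missing = missing_modules(["soundfile", "librosa", "numpy", "torch"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All audio-loading dependencies are importable.")
```

Note that soundfile can be found by this check and still fail at import time if the libsndfile system library is absent, so this is only a first-pass check.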
