Dual-engine approach for detecting AI-generated music in compressed audio
The author explores a hybrid approach to detect AI-generated music, combining a CNN-based model and a source separation engine, to overcome the limitations of CNN-only models on compressed audio formats like MP3.
Why it matters
This hybrid approach offers a more robust solution for detecting AI-generated audio content, which is crucial as AI-generated media becomes more prevalent.
Key Points
- CNN-based detection on mel-spectrograms breaks when audio is compressed to MP3
- Combining a CNN model with a source separation engine (Demucs) achieves an 80%+ detection rate on AI-generated music
- The hybrid approach works regardless of audio codec (MP3, AAC, OGG) and saves compute by running the expensive source separation only when the CNN is uncertain
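The compute-saving gate described above could be sketched as a simple two-stage decision: trust the cheap CNN when its score is decisive, and invoke the costly separation engine only in the uncertain band. The thresholds and function names below are illustrative assumptions, not the author's actual values.

```python
def detect(cnn_score: float, separation_says_ai, low: float = 0.2, high: float = 0.8) -> str:
    """Two-stage gate: the cheap CNN score decides clear-cut cases;
    the expensive separation check runs only for borderline scores.

    cnn_score: CNN's probability that the audio is AI-generated.
    separation_says_ai: zero-argument callable running the costly
        separation-based check (e.g. Demucs remix comparison).
    low/high: hypothetical confidence thresholds.
    """
    if cnn_score >= high:
        return "ai"       # CNN is confident: skip separation entirely
    if cnn_score <= low:
        return "human"    # CNN is confident the other way
    # Borderline: pay for the separation-based second opinion
    return "ai" if separation_says_ai() else "human"
```

In this sketch the separation engine is passed as a callable so the gate itself stays cheap to test and the expensive model is only constructed or invoked when actually needed.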
Details
The author was working on detecting AI-generated music and hit the same issue as Deezer's team: CNN-based models trained on mel-spectrograms perform well on uncompressed WAV files but break down once the audio is compressed to MP3.

To address this, the author added a second engine based on source separation using the Demucs model. The idea is to separate the audio into 4 stems (vocals, drums, bass, other), remix them, and measure the difference between the original and reconstructed audio. For human-recorded music, the stems bleed into each other during recording, so the reconstruction produces noticeable differences. For AI-generated music, where each stem is synthesized independently, the reconstruction yields nearly identical results.

This hybrid approach achieved a human false positive rate of ~1.1% and an AI detection rate of 80%+, working across different audio codecs. The limitations include varying detection rates across different AI generators, non-deterministic behavior of Demucs in borderline cases, and the system only being tested on music (not speech or sound effects).
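The separate-remix-compare step can be illustrated with a small metric over waveforms. This is a minimal sketch assuming the stems arrive as NumPy arrays (in practice they would come from Demucs); the relative-RMS formulation and the name `reconstruction_score` are assumptions for illustration, not the author's exact measure.

```python
import numpy as np

def reconstruction_score(original: np.ndarray, stems: np.ndarray) -> float:
    """Remix separated stems and measure how closely they rebuild the original.

    original: mono waveform, shape (n_samples,).
    stems: separated sources, shape (n_stems, n_samples),
        e.g. vocals/drums/bass/other from a model like Demucs.
    Returns a relative RMS error: near 0 when the stems sum cleanly
    back to the original (as with independently synthesized AI stems),
    larger when recording bleed makes the separation lossy.
    """
    remix = stems.sum(axis=0)                       # reconstructed mix
    err = np.sqrt(np.mean((original - remix) ** 2)) # RMS residual
    scale = np.sqrt(np.mean(original ** 2)) + 1e-12 # normalize by signal level
    return float(err / scale)
```

A detector would then threshold this score: values near zero suggest AI-generated audio, larger values suggest a human recording with inter-stem bleed.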