Deep Active Speech Cancellation with Mamba-Masking Network

We present a novel deep learning network for Active Speech Cancellation (ASC), advancing beyond Active Noise Cancellation (ANC) methods by canceling both noise and speech signals. The proposed Mamba-Masking architecture introduces a masking mechanism that interacts directly with the encoded reference signal, enabling adaptive and precisely aligned anti-signal generation, even under the rapidly changing, high-frequency conditions commonly found in speech. Complementing this, a multi-band segmentation strategy further improves phase alignment across frequency bands. Additionally, we introduce an optimization-driven loss function that provides near-optimal supervisory signals for anti-signal generation. Experimental results demonstrate substantial performance gains, with improvements of up to 7.2 dB in ANC scenarios and 6.2 dB in ASC scenarios over existing methods.

Model

The first audio column, labeled Primary signal d(n), is the signal a listener would hear without any ANC algorithm applied. The next column, DeepASC, presents the canceling signal e(n) generated by our proposed model in response to the input signal from the first column. The remaining columns present the results of the ARN, DeepANC, and FxLMS methods for the same input signal.

The first audio column, labeled Primary signal d(n), is the signal a listener would hear without any ASC algorithm applied. The next column, DeepASC, presents the canceling signal e(n) generated by our proposed model in response to the input signal from the first column. The remaining columns present the results of the ARN, DeepANC, and FxLMS methods for the same input signal.
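
For readers who want to compare the columns numerically, here is a minimal sketch of one possible measure. It rests on two assumptions that this page does not confirm: that e(n) is the signal remaining at the listener after cancellation (the usual meaning of e(n) in ANC notation), and that a normalized mean squared error (NMSE) in dB between e(n) and d(n) is a suitable metric. The function name `nmse_db` and the toy signals are hypothetical and exist only for illustration.

```python
import math


def nmse_db(primary, residual):
    """Return 10*log10(sum(e^2) / sum(d^2)) in dB.

    Assumption: `residual` is what remains after cancellation, so a more
    negative value means stronger cancellation.
    """
    if len(primary) != len(residual):
        raise ValueError("signals must have the same length")
    primary_energy = sum(x * x for x in primary)
    residual_energy = sum(x * x for x in residual)
    if primary_energy == 0 or residual_energy == 0:
        raise ValueError("signal energy must be non-zero")
    return 10.0 * math.log10(residual_energy / primary_energy)


# Toy data (not taken from the page): a 200 Hz tone sampled at 16 kHz,
# and a residual that is the same tone scaled to one tenth of its amplitude.
d = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(1600)]
e = [0.1 * x for x in d]

print(round(nmse_db(d, e), 1))  # -20.0
```

Scaling the amplitude by 0.1 scales the energy by 0.01, which is why the toy example yields -20 dB. Real recordings would have to be time-aligned and of equal length before such a comparison is meaningful.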