From the paper "Data is Overrated: Perceptual Metrics Can Lead Learning in the Absence of Training Data". The following examples are taken from the test set (Perceived Music Quality Dataset).

Key

Reference: Reference audio (16 kHz, mono)

Griffin-Lim: Reference mel-spectrogram reconstructed with Griffin-Lim

Reconstruction: Autoencoded mel-spectrograms reconstructed with Griffin-Lim

U(x): Autoencoder trained on uniform noise

P(x): Autoencoder trained on the MusicCaps dataset

MSE: Autoencoder output (trained with Mean Squared Error)

MS-SSIM: Autoencoder output (trained with Multiscale Structural Similarity)

NLPD: Autoencoder output (trained with Normalized Laplacian Pyramid Distance)

See the paper for details on the mel-spectrogram, Griffin-Lim, and autoencoder parameters.
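As a rough illustration of two entries in the key, the sketch below draws a uniform-noise input of the kind U(x) refers to and computes the Mean Squared Error between a reference and a reconstruction. It is a minimal sketch, not the paper's pipeline: the function names `uniform_noise_spectrogram` and `mse`, the grid shape, and the [0, 1) value range are all assumptions made for illustration, and the actual mel-spectrogram and training parameters are given in the paper.

```python
import random


def uniform_noise_spectrogram(n_mels, n_frames, seed=None):
    """Sample an n_mels x n_frames grid of values drawn from U(0, 1).

    Hypothetical stand-in for a uniform-noise training input; the shape
    and value range are assumptions, not the paper's settings.
    """
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_frames)] for _ in range(n_mels)]


def mse(reference, reconstruction):
    """Mean squared error between two equally sized 2-D grids."""
    if len(reference) != len(reconstruction):
        raise ValueError("grids must have the same number of rows")
    total, count = 0.0, 0
    for ref_row, rec_row in zip(reference, reconstruction):
        if len(ref_row) != len(rec_row):
            raise ValueError("rows must have the same length")
        for ref_value, rec_value in zip(ref_row, rec_row):
            total += (ref_value - rec_value) ** 2
            count += 1
    if count == 0:
        raise ValueError("grids must not be empty")
    return total / count


# Illustrative sizes only (assumed, not taken from the paper).
noise = uniform_noise_spectrogram(n_mels=4, n_frames=6, seed=0)
print(mse(noise, noise))  # identical inputs give 0.0
```

MS-SSIM and NLPD would take the place of `mse` as the training loss in the other rows of the key. They are not sketched here because their definitions depend on parameters described in the paper.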

Acoustic

Reference

acoustic-Original.wav

acoustic.png

Griffin-Lim

acoustic-GL.wav

U(x) - MSE

acoustic-MSE-noise.wav

acoustic-MSE-noise.png

P(x) - MSE

acoustic-MSE-audio.wav

acoustic-MSE-audio.png

U(x) - MS-SSIM

acoustic-SSIM-noise.wav

acoustic-SSIM-noise.png

P(x) - MS-SSIM

acoustic-SSIM-audio.wav

acoustic-SSIM-audio.png

U(x) - NLPD

acoustic-NLPD-noise.wav

acoustic-NLPD-noise.png

P(x) - NLPD

acoustic-NLPD-audio.wav

acoustic-NLPD-audio.png

Blues

Reference

blues-Original.wav

blues.png

Griffin-Lim

blues-GL.wav

U(x) - MSE

blues-MSE-noise.wav

blues-MSE-noise.png

P(x) - MSE

blues-MSE-audio.wav

blues-MSE-audio.png

U(x) - MS-SSIM

blues-SSIM-noise.wav

blues-SSIM-noise.png

P(x) - MS-SSIM

blues-SSIM-audio.wav

blues-SSIM-audio.png

U(x) - NLPD

blues-NLPD-noise.wav

blues-NLPD-noise.png

P(x) - NLPD

blues-NLPD-audio.wav

blues-NLPD-audio.png

Classical

Reference

classical-Original.wav

classical.png

Griffin-Lim

classical-GL.wav

U(x) - MSE

classical-MSE-noise.wav

classical-MSE-noise.png

P(x) - MSE

classical-MSE-audio.wav

classical-MSE-audio.png

U(x) - MS-SSIM

classical-SSIM-noise.wav

classical-SSIM-noise.png

P(x) - MS-SSIM

classical-SSIM-audio.wav

classical-SSIM-audio.png

U(x) - NLPD

classical-NLPD-noise.wav

classical-NLPD-noise.png

P(x) - NLPD

classical-NLPD-audio.wav

classical-NLPD-audio.png

Country

Reference

country-Original.wav

country.png

Griffin-Lim

country-GL.wav

U(x) - MSE

country-MSE-noise.wav

country-MSE-noise.png

P(x) - MSE

country-MSE-audio.wav