From the paper Data is Overrated: Perceptual Metrics Can Lead Learning in the Absence of Training Data. The following are examples from the test set (Perceived Music Quality Dataset).
Reference
Griffin-Lim
Reconstruction
U(x)
P(x)
MSE
MS-SSIM
NLPD
Reference audio (16kHz, mono)
Reference mel-spectrogram reconstructed with Griffin-Lim
Autoencoded mel-spectrograms reconstructed with Griffin-Lim
Autoencoder trained on uniform noise
Autoencoder trained on MusicCaps dataset
Autoencoder output (trained w. Mean Squared Error)
Autoencoder output (trained w. Multiscale Structural Similarity)
Autoencoder output (trained w. Normalized Laplacian Pyramid Distance)
See paper for details on mel-spectrogram, Griffin-Lim and autoencoder parameters.
Reference
Griffin-Lim
U(x) - MSE
P(x) - MSE
U(x) - MS-SSIM
P(x) - MS-SSIM
U(x) - NLPD
P(x) - NLPD
Reference
Griffin-Lim
U(x) - MSE
P(x) - MSE
U(x) - MS-SSIM
P(x) - MS-SSIM
U(x) - NLPD
P(x) - NLPD
Reference
Griffin-Lim
U(x) - MSE
P(x) - MSE
U(x) - MS-SSIM
P(x) - MS-SSIM
U(x) - NLPD
P(x) - NLPD
Reference
Griffin-Lim