The document discusses several papers on deep learning techniques for speaker recognition and identification. It describes convolutional neural networks (CNNs) that take spectrograms as input to identify speakers and to cluster them without prior knowledge of their identities. It also covers bidirectional long short-term memory (BLSTM) recurrent neural networks for polyphonic sound event detection and spoofing detection. An end-to-end attention model that combines CNNs with temporal pooling is presented for text-dependent speaker verification. Embeddings extracted from deep neural networks (DNNs) are investigated as an alternative to i-vectors for text-independent speaker verification. Related research applying CNNs, DNNs, and BLSTM RNNs to speaker recognition tasks is also cited.
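As a rough illustration of how fixed-length speaker embeddings can be used for verification, the sketch below averages a few enrollment embeddings and scores a test embedding against that average with cosine similarity. Cosine scoring, the function names, the toy three-dimensional vectors, and the 0.7 threshold are all assumptions made for this example; the papers summarized above may use a different scoring back end.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)

def mean_embedding(embeddings):
    """Average several enrollment embeddings into one speaker model."""
    n = len(embeddings)
    return [sum(values) / n for values in zip(*embeddings)]

def verify(enrollment, test, threshold=0.7):
    """Accept the claimed identity if the score reaches the threshold.

    The default threshold is a hypothetical placeholder; in practice it
    would be tuned on held-out data.
    """
    score = cosine_similarity(mean_embedding(enrollment), test)
    return score >= threshold, score

# Toy 3-dimensional vectors standing in for real network outputs.
enrolled = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
same_speaker = [0.9, 0.1, 0.0]
other_speaker = [0.0, 0.0, 1.0]

print(verify(enrolled, same_speaker))
print(verify(enrolled, other_speaker))
```

Real systems produce embeddings with hundreds of dimensions, but the scoring step keeps the same shape: one enrollment model, one test vector, one threshold.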