This article proposes a framework for comparing the encoding of spoken language representations in the human brain and in artificial neural networks. The proposed technique is based on the same principle that underlies electroencephalography (EEG) and allows a direct comparison of responses to a phonetic property in the brain and in deep neural networks without applying any transformations to either signal. Results from eight trained networks, including a replication experiment, show substantial similarities in peak latency encoding between the human brain and intermediate convolutional layers. The proposed technique can be used to compare encoding between the human brain and intermediate convolutional layers for any acoustic property, and it can be extended to other neuroimaging techniques.
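The comparison principle can be illustrated with a minimal sketch: just as EEG averages many trial responses into a single event-related waveform, intermediate convolutional activations can be averaged over feature maps into a single time series whose peak latencies are compared directly to those of an averaged brain response. The array shapes, sampling rate, and peak-picking parameters below are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np
from scipy.signal import find_peaks

def averaged_response(activations: np.ndarray) -> np.ndarray:
    """Average activations over feature maps (axis 0), analogous to
    averaging EEG trials to obtain an event-related response.
    `activations` is assumed to have shape (channels, time)."""
    return activations.mean(axis=0)

def peak_latencies(signal: np.ndarray, sample_rate: float, prominence: float = 0.1) -> np.ndarray:
    """Return latencies (ms) of prominent peaks in an averaged signal."""
    peaks, _ = find_peaks(signal, prominence=prominence)
    return peaks / sample_rate * 1000.0

# Hypothetical comparison: placeholders stand in for real intermediate-layer
# activations and an averaged brain response to the same phonetic stimulus.
layer_acts = np.random.randn(64, 4096)   # (channels, time) from an intermediate layer
brain_resp = np.random.randn(4096)       # averaged brain response, same time base
net_latencies = peak_latencies(averaged_response(layer_acts), sample_rate=16000)
brain_latencies = peak_latencies(brain_resp, sample_rate=16000)
```

Because both signals are reduced to averaged time series in the same way, their peak latencies can be compared directly, without fitting a mapping model between the two representations.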
