| description abstract | Advances in deep learning have led to a proliferation of techniques for generating spoofed speech.
Such synthetic speech can be exploited for harmful purposes, such as impersonation or disseminating
false information. Researchers in the area investigate which features are useful for spoof
detection. This paper extensively investigates three problems in spoof detection in
speech: the imbalanced number of samples per class, which may negatively affect the performance of
any detection model; the effect of early and late feature fusion; and the analysis of the model under
unseen attacks. To address the imbalance, we propose two approaches: a Synthetic Minority
Over-sampling Technique (SMOTE)-based model and a Bootstrap-based model. We use the OpenSMILE
toolkit to extract different feature sets, and we investigate their individual results as well as their
early and late fusion. The experiments are evaluated using the ASVspoof 2019
datasets, which encompass synthetic, voice-conversion, and replayed speech samples. Additionally,
Support Vector Machine (SVM) and Deep Neural Network (DNN) classifiers have been adopted for
classification. The outcomes from the various test scenarios indicate that neither addressing the
imbalanced nature of the dataset nor any specific feature set or fusion outperformed the brute-force
version of the model: the best Equal Error Rate (EER) achieved by the Imbalance model is 6.67% for
Logical Access (LA) and 1.80% for Physical Access (PA). | en_US |