A New Descriptor Selection Scheme for SVM in Unbalanced Class Problem: A Case Study Using Skin Sensitisation Dataset
Public Domain
-
2007/07/01
-
Details
-
Description:A novel descriptor selection scheme for the Support Vector Machine (SVM) classification method has been proposed, and its utility demonstrated using a skin sensitisation dataset as an example. A backward elimination procedure, guided by the mean accuracy (the average of specificity and sensitivity) of a leave-one-out cross validation, was devised for the SVM. Subsets of descriptors were first selected using a sequential t-test filter or a Random Forest filter, before backward elimination was applied. Different kernels for SVM were compared using this descriptor selection scheme. The Radial Basis Function (RBF) kernel worked best when a sequential t-test filter was adopted. The highest mean accuracy, 84.9%, was obtained using SVM with 23 descriptors. The sensitivity and the specificity were as high as 93.1% and 76.6%, respectively. A linear kernel was found to be optimal when a Random Forest filter was used. The performance using 24 descriptors was comparable with that of an RBF kernel with a sequential t-test filter. As a comparison, Fisher's linear discriminant analysis (LDA) under the same descriptor selection scheme was carried out. SVM was shown to outperform the LDA. [Description provided by NIOSH]
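The backward elimination step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the t-test and Random Forest pre-filters are omitted, and a nearest-centroid classifier stands in for the SVM so that the example runs with the Python standard library alone. Any classifier with the same call shape could be passed in through the `classify` argument.

```python
from statistics import mean


def mean_accuracy(y_true, y_pred):
    """Average of sensitivity and specificity (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    sensitivity = tp / pos if pos else 0.0
    specificity = tn / neg if neg else 0.0
    return (sensitivity + specificity) / 2


def nearest_centroid(train_X, train_y, x):
    """Stand-in classifier (NOT an SVM): pick the class with the closest centroid."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_mean_accuracy(X, y, cols, classify=nearest_centroid):
    """Leave-one-out cross validation score using only the descriptors in `cols`."""
    sub = [[row[c] for c in cols] for row in X]
    preds = []
    for i in range(len(sub)):
        train_X = sub[:i] + sub[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(classify(train_X, train_y, sub[i]))
    return mean_accuracy(y, preds)


def backward_elimination(X, y, cols, classify=nearest_centroid):
    """Drop one descriptor at a time for as long as the LOO score does not fall."""
    current = list(cols)
    best = loo_mean_accuracy(X, y, current, classify)
    while len(current) > 1:
        # Score every candidate subset obtained by removing a single descriptor.
        scored = [
            (loo_mean_accuracy(X, y, [c for c in current if c != d], classify), d)
            for d in current
        ]
        score, drop = max(scored)
        if score < best:
            break  # every possible removal hurts, so stop here
        best = score
        current.remove(drop)
    return current, best


# Hypothetical toy data: descriptor 0 separates the classes, descriptor 1 is noise,
# and the classes are unbalanced (six negatives, two positives).
X = [[0.1, 5.0], [0.2, 1.0], [0.0, 9.0], [0.3, 2.0],
     [0.15, 7.0], [0.25, 3.0], [1.0, 4.0], [1.1, 8.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
selected, score = backward_elimination(X, y, [0, 1])
```

Scoring by mean accuracy rather than plain accuracy keeps the minority class from being ignored: a model that labels every compound negative would reach 75% plain accuracy on the toy data but only 50% mean accuracy.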
-
ISSN:1062-936X
-
Pages in Document:423-441
-
Volume:18
-
Issue:5
-
NIOSHTIC Number:nn:20032635
-
Citation:SAR QSAR Environ Res 2007 Jul; 18(5-6):423-441
-
Contact Point Address:S. Li, Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Morgantown, WV 26505
-
Email:swl4@cdc.gov
-
Federal Fiscal Year:2007
-
Peer Reviewed:True
-
Source Full Name:SAR and QSAR in Environmental Research
-
Main Document Checksum:urn:sha-512:92e8372aec277adbf965d0f7237dcd5907ca025b23a76ade165fac2d03857ae36946faf8dcca8e2ffc12178a6fd76c3a98a72ed12f41c4cda8465d4534d122f7