Indian Journal of Medical Informatics. 2009; 4(1): 5

Article

Feature Subset Selection using Nomogram in Type II Diabetes Databases

Sarojini Balakrishnan1, Ramaraj Narayanasamy2 and Nickolas Savarimuthu3

1PhD Research Scholar, Department of Computer Science, Mother Teresa Women's University, Kodaikanal, India. Professor, Department of MCA, KLN College of Information Technology, Madurai, India. e-mail: balakrishnan.sarojini@gmail.com
2Principal & Professor, Department of Computer Science & Engineering, GKM College of Engineering & Technology, Chennai, India. e-mail: ramaraj_tce@yahoo.co.in
3Assistant Professor, Department of Computer Applications, National Institute of Technology, Tiruchirappalli, India. e-mail: nickolas@nitt.edu

Abstract: Advances in data mining and machine learning have promoted computer-based approaches such as computer-aided diagnosis, expert systems and prognostic studies in medical applications. Medical data are processed and analyzed using data mining techniques to derive useful knowledge. These data are multidimensional and are represented by a large number of features. The irrelevant and redundant features among them may degrade the performance of data mining algorithms. Feature selection identifies the features that improve the predictive accuracy of classifiers. The proposed work focuses on identifying the significant features that influence the predictive accuracy of the Naïve Bayes Classifier using a visualization tool, the nomogram. The effect of each feature on the performance of the classifier is analyzed using the nomogram, and an optimal feature subset that enhances the predictive accuracy is derived. The proposed method, Nomogram-RFE, is evaluated on the Pima Indians Diabetes Dataset, and the performance of the classifier is assessed on five criteria: classification accuracy, sensitivity, specificity, the area under the receiver operating characteristic curve (AUC) and Brier score. The experimental results show that the derived optimal feature subset enhances the predictive power of the classifier and reduces false positive and false negative rates as measured by the sensitivity and specificity of the classifier. A low Brier score for the optimal feature subset indicates lower deviation between the predicted probability and the actual outcome.

Keywords: Feature selection; nomogram; Naïve Bayes Classifier; accuracy; sensitivity; specificity; AUC; Brier score

1. Introduction

The digital revolution has made it possible to inexpensively collect and store large amounts of patient data in databases containing rich medical information. The data are generally collected as a by-product of patient-care activity to benefit the individual patient; as a result, medical databases may contain data that are redundant, incomplete, imprecise or inconsistent. It is therefore difficult for physicians to spot patterns in these data that may help in making clinical decisions. These characteristics of medical datasets have frequently motivated the development of machine learning tools for accurate medical diagnosis and prediction [1]. Data mining provides algorithms and tools for identifying valid, novel, potentially useful, and ultimately understandable patterns from data [2]. The discovered patterns may represent valuable knowledge that could lead to medical discoveries, for example, the identification of combinations of features that lead to diagnosis of a disease. The usefulness of data mining in medicine has been demonstrated in the areas of diagnosis, prognosis and treatment [3]. Studies show that improved medical diagnosis and prognosis may be achieved through automatic analysis of patient data stored in medical records, i.e., by learning from past experience [4]. As the healthcare environment has become more reliant on computer technology, data mining techniques can help physicians make accurate diagnoses. Patient data collected for diagnosis and prognosis typically comprise clinical and laboratory parameters as well as results of particular investigations specific to a given medical problem. Data from medical sources are voluminous and heterogeneous in nature. The so-called "curse of dimensionality", pertinent to many learning algorithms, denotes the drastic rise in computational complexity and classification error for datasets with a large number of dimensions [5].
The inclusion of useless or irrelevant features causes the predictive performance of the classifier to decline [2]. Because medical data comprise a rather large number of features, special attention has been given to the problem of selecting only the few relevant features that have the greatest influence on the outcome of prediction. A small subset of features may carry enough information to construct reasonably accurate predictive models. Feature selection is one of the frequently used data preprocessing techniques in data mining for identifying the influential features for the problem at hand [6]. Feature selection techniques extract informative features from medical databases to facilitate medical decision making in an effective and efficient way and to provide new insight into the relations among symptoms, clinical data and findings, disease states, diagnosis and treatment. Integrating feature selection into machine learning systems helps in accurate medical diagnosis [7, 8]. There are various feature selection methods that rank the importance of input features and thereby enhance classifier performance by eliminating the irrelevant features. Most of those methods are concerned mainly with the ranking of input features rather than with insight into each feature. However, in a practical clinical situation, physicians may wish to see the actual effect of each feature on the results: an interpretation of how the prediction would change if a feature's value were to change. Jakulin et al. introduced a nomogram approach for visualizing SVMs that graphically exposes the internal structure of the model and visualizes the effect of each feature by means of the log odds ratio, as is done for logistic regression [9]. Naïve Bayes, as a classifier, learns from training data the conditional probability of each attribute given the class label. It uses Bayes' rule to compute the probability of the classes given the particular instance of the attributes.
Prediction of the class is done by identifying the class with the highest posterior probability. The popularity of Naïve Bayes as a standard classification method in machine learning stems partly from the fact that it is easy to program, intuitive, fast to train and able to deal easily with missing attributes. Research shows that Naïve Bayes performs well in spite of strong dependencies among attributes [10]. In this paper, we use a well-established visualization technique called the nomogram to visualize the Naïve Bayes Classifier. A nomogram gives insight into a model by visualizing the effect of each feature on the prediction. Nomograms are used to assess the probability of the observed outcome, where the effects of the attributes are independent given the class and are added up to form the final prediction. The Naïve Bayesian nomogram reveals the structure of the model and the relative influences of the feature values on the class probability. The visualization can be used for exploratory analysis of features and to determine the effects of individual features on class probabilities [11]. The proposed work focuses on how the Naïve Bayesian nomogram can be used for the feature selection process. We show that a small subset of features may carry enough information to construct accurate predictive models. The paper is organized as follows: Section 2 discusses related research work. Section 3 describes the dataset. Section 4 describes the problem, and Section 5 presents the experimental results and analysis. Section 6 presents a comparative study of the proposed approach with other feature selection methods that use the Naïve Bayes Classifier on the same dataset.

2. Related Work

Custom-made tools are necessitated by the enormous growth of medical information systems. The data explosion makes information extraction a very hard task and decision support a nightmare [4].
Many studies in the health informatics literature have investigated the effectiveness of clinical decision support systems and concluded that these systems are indeed helpful for effective medical diagnosis [12]. It has been shown that the successful implementation of machine learning methods can help the integration of computer-based systems in the healthcare environment, providing opportunities to facilitate and enhance the work of medical experts and ultimately to improve the efficiency and quality of medical care [13]. Machine learning offers various methods for the efficient construction of descriptive models from data. At its best, a modern modeling technique should offer both accuracy of the developed model and transparency, so that the decision maker knows why and how the model derives its decision. Machine learning methods may support both perspectives and are, as such, increasingly used in the developing fields of Intelligent Medical Data Analysis [14] and Medical Data Mining [15]. Moustakis and Charissis [16] surveyed the role of machine learning in medical decision making and provided an extensive literature review of machine learning applications in medicine that could be useful to practitioners interested in applying machine learning methods to improve the efficiency and quality of decision making systems in medical applications. Data mining techniques have been applied to a variety of medical domains to improve medical decision making [17]. Examples include predicting breast cancer survivability using data mining techniques [18], applying data mining to discover subtle factors affecting the success and failure of back surgery, which led to improvements in care [19], data mining classification techniques for medical diagnosis decision support in a clinical setting [20], and data mining techniques used to search for relationships in a large clinical database [21].
A comparison of different learning models used in Medical Data Mining and a practical guideline for selecting the best suited algorithm for a specific medical application are found in [4]. Feature selection has been an active and fruitful field of research and development for decades in statistical pattern recognition [22], machine learning [23, 24], data mining [25] and statistics [26]. It has been shown in both theory and practice to be effective in enhancing learning efficiency, increasing predictive accuracy, and reducing the complexity of learned results [27]-[29]. Let G be some subset of the feature set F and fG be the value vector of G. In general, the goal of feature selection can be formalized as selecting a minimum subset G such that P(C|G = fG) is equal or as close as possible to P(C|F = f), where P(C|G = fG) is the probability distribution of the different classes given the feature values in G and P(C|F = f) is the original distribution given the feature values in F [28]. Several feature selection techniques have been proposed in the literature, including some important surveys of feature selection algorithms such as Molina et al. [30] and Guyon and Elisseeff [31]. Many researchers have studied various important aspects of feature selection, such as measuring the goodness of a feature subset while determining an optimal one [32]. Feature selection methods can be broadly categorized into the wrapper model [33] and the filter model [23]. The wrapper model uses the predictive accuracy of a predetermined learning algorithm to determine the goodness of the selected subset. The main drawback of this method is its higher computational cost [33]. According to Liu and Motoda [32], the filter model "separates feature selection from classifier learning and selects feature subsets that are independent of any learning algorithm.
It relies on various measures of the general characteristics of training data such as distance, information dependency and consistency". Research has shown that the wrapper model outperforms the filter model when the predictive power on unseen data is compared [34]. According to the availability of class labels, there are feature selection methods for supervised learning [35] as well as for unsupervised learning [36]. Existing feature selection methods mainly exploit two approaches: individual feature evaluation and subset evaluation [29, 31]. Individual evaluation methods rank features according to their importance in differentiating instances of different classes and can only remove irrelevant features, as redundant features are likely to have similar rankings. Methods of subset evaluation search for a minimum subset of features that satisfies some goodness measure and can remove irrelevant features as well as redundant ones [37]. Many authors have reported improvement in the performance of the classifier when feature selection algorithms are used [38-40]. In many pattern recognition applications, identifying the most characteristic features of the observed data, i.e., feature selection [33], [41]-[44], is critical to minimizing the classification error. Medical data suffer from peculiar problems of redundancy, as some prescribed tests may contain common components. Further, X-ray, MRI, CT-scan and other images form an integral part of medical data, making the data voluminous. Some of the observations made by physicians are subjective. All these factors lead to irrelevant and redundant data, which affect the diagnosis, prognosis and treatment of the patient. A high degree of predictive accuracy is expected in the field of medicine. The prediction accuracy of any data mining technique depends on the quantity and quality of the data [45].
In a real-world environment, there are many possible reasons why inaccurate or inconsistent data occur in a database, e.g., equipment malfunction, the deletion of data instances (or records) due to inconsistency with other recorded data, data not being entered due to misunderstanding, data being considered unimportant at the time of entry, etc. As a result, medical databases may contain data that are redundant, incomplete, imprecise or inconsistent, which may affect the performance of data mining techniques [18]. It is a well-accepted fact that data preprocessing is important for effective data mining [7]. Feature selection is valuable in medical data mining because the disease can then be diagnosed in the course of patient care with a minimum number of features while still maintaining or even enhancing accuracy [46]. As discussed by Pechenizkiy [47], classification has been applied successfully in a number of medical applications such as localization of a primary tumor, prognosis of recurrence of breast cancer, diagnosis of thyroid diseases, and rheumatology [48]. A legitimate way of evaluating a feature subset is through the error rate or accuracy of the classifier being designed [40]. The classification error rate or accuracy is used as a performance indicator for a mining task: for a selected feature subset, one simply conducts a "before-and-after" experiment to compare the error rate or accuracy of the classifier learned on the full set of features with that of the classifier learned on the selected subset [7]. The Recursive Feature Elimination (RFE) approach is an iterative procedure for removing non-discriminative features in binary classification problems and has been shown by extensive experiments to be one of the most suitable feature selection methods [7]. The performance of a classifier can be visualized using a Receiver Operating Characteristic (ROC) curve [19], and classifiers can be compared using the area under the ROC curve, abbreviated AUC [21].
The Naïve Bayes Classifier finds its origins in the Bayesian theory of probability. Bayesian networks and classifiers started to appear in the research arena between 1980 and 1990 as applications for machine learning [49] (Pearl, 1988). As stated by Domingos and Pazzani (1997) [10], "in summary, the Bayesian Classifier has much broader applicability than previously thought. Since it also has advantages in terms of simplicity, learning speed, classification speed, storage space and incrementality, its use should perhaps be considered more often". Naïve Bayes has proven effective in many practical applications, including text classification, medical diagnosis, and systems performance management [10, 50, 51]. Visualization is the easiest and perhaps the most effective way to present the structure of a data set. Nomograms were invented by the French mathematician Maurice d'Ocagne in 1891 to graphically represent a class of mathematical functions. Nomograms are not an uncertain novelty, but a milestone in the history of visualization [52]. The use of nomograms to visualize a logistic regression model was first proposed by Lubsen and coauthors [53]. With an excellent implementation of logistic regression nomograms in the S-Plus and R statistical packages by Harrell [54], the idea has recently been picked up, and nomograms have been widely used to present probabilistic classification models in clinical medicine and oncology, for instance [55]. The Naïve Bayesian Classifier, too, can be visualized in the form of a nomogram [11]. Naïve Bayesian nomograms, as presented in this paper, are a visualization technique extended from logistic regression [56, 57]. Kononenko [56] realized that, besides offering good predictive performance, Naïve Bayes Classifiers can be used to explain how, and to what degree, attribute values influence the class probability when classifying an example.
He showed that the Naïve Bayes Classifier could be written as a sum of the information contributed by each attribute, where the contribution of a feature value fi is log2 P(c|fi) - log2 P(c). Compared with the present state-of-the-art visualization of the Naïve Bayesian Classifier by means of pie charts and bars, implemented through the Evidence Visualizer within the data mining suite MineSet [57], nomograms use a simpler graphical presentation, are easier to read, and may be used to predict outcome probabilities without a computer or calculator. Research on feature selection using an SVM nomogram is found in [58, 59].

3. Dataset

The data set used for the experiments was obtained from the UCI Machine Learning Repository [60]. The dataset was drawn from a large collection of data held by the National Institute of Diabetes and Digestive and Kidney Diseases. All patients in the dataset are Pima Indian women at least 21 years old and living near Phoenix, Arizona, USA. The target class '1' represents a positive test for diabetes and '0' represents a negative test for diabetes. The dataset includes 768 complete cases, of which 268 (34.9%) are in class '1' and 500 (65.1%) are in class '0'. Each case is described by eight clinical findings, namely: number of times pregnant, glucose tolerance test, diastolic blood pressure, triceps skin fold thickness, 2-hour serum insulin, body mass index, diabetes pedigree function and age. All features are of numeric data type.

4. Methodology

The objective of this research
work is to find an optimal feature subset that greatly influences the
predictive accuracy of the Naïve Bayes Classifier (NBC). In
this paper, we propose a method to visualize the Naïve Bayes
Classifier that clearly exposes the quantitative information on the
effect of feature values to class probabilities. The method can be used
both to reveal the structure of the NBC model, and to support the
prediction. The NBC model is visualized in the form of a nomogram. The
nomogram is used to visualize the probabilistic predictions of the
Naïve Bayes Classifier without losing any information.
Identifying the significant features for decision making will enhance
the medical diagnosis in an effective and efficient way. The
Naïve Bayesian nomogram is derived mathematically as in [11]. The individual contribution of each known attribute value in the nomogram is equal to log OR(ai), and the sum of the individual scores corresponds to F(C|X). For the purpose of feature selection, the features that have little effect on the prediction output (i.e., a short line in the nomogram), as indicated by low Log OR scores, are eliminated using the Recursive Feature Elimination approach, and the performance of the classifier is observed. The process is repeated until the optimal feature subset that provides the best predictive performance is obtained. The approach is validated in terms of the increase in the accuracy of the classifier and an improved area under the ROC (Receiver Operating Characteristic) curve (AUC).

5. Experiment Results and Analysis

The experiments were performed using the Orange data mining suite [61] on the Pima Indians Diabetes dataset. The performance of the Naïve Bayes Classifier is evaluated using 10-fold cross-validation. We employed the Recursive Feature Elimination method for feature selection. Figure 1 shows the nomogram visualization of the Naïve Bayes Classifier for class 1 ('yes') instances. In the nomogram, the span of an attribute axis identifies the important attributes: the more important the feature, the longer its line. Using the Log OR line, one can easily see how much each feature influences the probability of the target class 'yes'. When the Log OR score of a feature value is high, it has a greater positive effect on the probability. Moreover, features with longer lines on the nomogram have a wider range of Log OR scores and thus have stronger effects on the target prediction probability. Each attribute axis is aligned to the zero-point influence (prior probability), which allows for a straightforward comparison of contributions across different values and attributes. The zero-point influence line vertically splits the nomogram into a right (positive) and a left (negative) part.
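
As a reading aid, the Log OR scores described above can be obtained from the standard Naïve Bayes odds decomposition. The following is a minimal sketch for a binary class c with complement \bar{c} and an instance X = (a_1, ..., a_n). The notation is an assumption made for illustration and may differ from the symbols used in [11]; in particular, F(c|X) is taken here to denote the sum of the log odds ratios.

```latex
% Naive Bayes posterior under conditional independence of the attributes
P(c \mid X) = \frac{P(c)\,\prod_{i=1}^{n} P(a_i \mid c)}{P(X)}

% Posterior odds: the evidence term P(X) cancels
\frac{P(c \mid X)}{P(\bar{c} \mid X)}
  = \frac{P(c)}{P(\bar{c})}\,\prod_{i=1}^{n}\frac{P(a_i \mid c)}{P(a_i \mid \bar{c})}

% Bayes' rule applied to each factor gives the odds ratio of attribute value a_i
\mathrm{OR}(a_i) = \frac{P(a_i \mid c)}{P(a_i \mid \bar{c})}
  = \frac{P(c \mid a_i)\,/\,P(\bar{c} \mid a_i)}{P(c)\,/\,P(\bar{c})}

% Taking logarithms turns the product into a sum of per-attribute scores
\operatorname{logit} P(c \mid X) = \operatorname{logit} P(c) + F(c \mid X),
\qquad F(c \mid X) = \sum_{i=1}^{n} \log \mathrm{OR}(a_i)

% Back-transformation from the summed score to a probability
P(c \mid X) = \frac{1}{1 + \exp\!\big(-\operatorname{logit} P(c) - F(c \mid X)\big)}
```

Under this reading, a feature whose values all have log OR close to zero adds almost nothing to the sum, which is why a short axis in the nomogram marks a candidate for elimination.
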
The visualized class 'yes' is characterized by the attribute values on the right, whereas the class 'no' is characterized by the values on the left side of the nomogram. Accordingly, the values farthest from the center are the most influential class indicators. A feature selection method based on the nomogram therefore identifies the more important features according to the lengths of their lines. The Log OR line shows how much impact each feature has on the Log OR sum and on the probability of the target class 'yes'. Figure 1 presents the features sorted by their positive influence on the prediction output. It is observed from Figure 1 that the features glucose tolerance test, body mass index and age have higher Log OR scores than the other features. As can be seen from Table I, the performance of the classifier improves on the removal of features with low Log OR scores. This wrapper-based feature selection approach considers the accuracy of the classifier as the primary performance indicator, and the feature subset that gives the maximum predictive accuracy is chosen as the optimal feature subset. The experimental results show that the classifier attains its maximum performance on removal of the five least influential features. The optimal feature subset consists of the features {glucose tolerance test, body mass index, age}. The performance of the classifier is evaluated on five criteria: classification accuracy, sensitivity, specificity, the area under the receiver operating characteristic curve, and outcome probability estimation as measured by the Brier score. The accuracy of the classifier is the measure of how effectively the classifier identifies the true value of the class label. It is evident from Table I that the classification accuracy of the classifier increases on the removal of the least influential features. The best classification accuracy for the dataset is obtained after the removal of the five least influential features on the prediction output 'yes'.
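
The elimination procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Orange-based procedure used in the experiments: features are assumed to be already discretized, the class is binary (0/1), Laplace smoothing is applied, folds are formed by simple interleaving, and all function names (`log_odds_ratios`, `fit`, `predict`, `cv_accuracy`, `nomogram_rfe`) are hypothetical.

```python
import math
from collections import Counter


def log_odds_ratios(column, labels, alpha=1.0):
    """Map each value of one discrete feature to its smoothed log OR for class 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    values = set(column)
    pos = Counter(v for v, y in zip(column, labels) if y == 1)
    neg = Counter(v for v, y in zip(column, labels) if y == 0)
    scores = {}
    for v in values:
        # OR(v) = P(v | class 1) / P(v | class 0), Laplace-smoothed with alpha
        p_pos = (pos[v] + alpha) / (n_pos + alpha * len(values))
        p_neg = (neg[v] + alpha) / (n_neg + alpha * len(values))
        scores[v] = math.log(p_pos / p_neg)
    return scores


def fit(rows, labels, features):
    """Return the (smoothed) prior logit and one log OR table per feature."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    prior_logit = math.log((n_pos + 1) / (n_neg + 1))
    tables = {f: log_odds_ratios([r[f] for r in rows], labels) for f in features}
    return prior_logit, tables


def predict(model, row):
    """Predict class 1 when prior logit plus the sum of log ORs is positive."""
    prior_logit, tables = model
    # Values unseen in training contribute 0, i.e. no evidence either way.
    score = prior_logit + sum(t.get(row[f], 0.0) for f, t in tables.items())
    return 1 if score > 0 else 0


def cv_accuracy(rows, labels, features, folds=10):
    """Cross-validated accuracy with interleaved folds (row i goes to fold i % folds)."""
    correct = 0
    for k in range(folds):
        train = [i for i in range(len(rows)) if i % folds != k]
        test = [i for i in range(len(rows)) if i % folds == k]
        model = fit([rows[i] for i in train], [labels[i] for i in train], features)
        correct += sum(predict(model, rows[i]) == labels[i] for i in test)
    return correct / len(rows)


def nomogram_rfe(rows, labels, features, folds=10):
    """Repeatedly drop the feature with the shortest log OR span; keep the best subset."""
    features = list(features)
    best = (cv_accuracy(rows, labels, features, folds), list(features))
    while len(features) > 1:
        _, tables = fit(rows, labels, features)
        # Span of a feature = length of its axis in the nomogram.
        weakest = min(features, key=lambda f: max(tables[f].values()) - min(tables[f].values()))
        features.remove(weakest)
        acc = cv_accuracy(rows, labels, features, folds)
        if acc >= best[0]:  # on ties, prefer the smaller subset
            best = (acc, list(features))
    return best
```

Here `rows` is a list of dictionaries mapping feature names to discrete values, and `nomogram_rfe` returns the best cross-validated accuracy together with the corresponding feature list. Preferring the smaller subset on ties is a design choice of this sketch; the text above only states that the subset with maximum accuracy is chosen.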
The accuracy of the classifier for the complete set of features is 0.7590 and for the optimal feature
subset is 0.7790. The classification accuracy of the classifier thus improves by 2 percentage points. The performance of the classifier deteriorates after that. For medical applications, sensitivity (positive hit rate) and specificity (negative hit rate) are more frequently used than classification accuracy. Sensitivity measures the proportion of actual positives that are correctly identified as such (i.e., the percentage of sick people who are identified as having the disease), and specificity measures the proportion of actual negatives that are correctly identified (i.e., the percentage of healthy people who are identified as not having the disease). It is observed from Table II that the value of specificity deteriorates whereas sensitivity increases on the removal of each least influential feature. In theory, sensitivity and specificity are independent in the sense that it is possible to achieve 100% in both (for instance, a human sorting black and yellow balls most likely does). In practice, there is often a trade-off, as both cannot be achieved at once. This is because many of the characteristics used to determine whether a sample gives a positive or negative test are not as obvious as black or yellow colors. Sensitivity and specificity tend to be inversely related, meaning that as sensitivity increases, specificity decreases, and vice versa [20]. The trade-off between sensitivity and specificity, as well as the performance of the classifier, can be visualized and studied using the ROC curve. In the medical domain it is desirable that the classifier correctly classify positive instances, and it is especially valuable if the classifier reduces the false negatives. In other words, a healthy patient may be identified as sick, but a sick patient must not be identified as healthy, since that makes the medical diagnosis a failure.
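
The definitions of sensitivity and specificity given above, and the effect of the decision threshold on both, can be illustrated with a short sketch. It also includes the Brier score that is discussed later in this section, in its common binary form (mean squared difference between predicted probability and outcome). The function names are hypothetical, and the Brier convention is an assumption: some tools sum the squared error over both classes, which doubles the value for a binary problem, and which convention underlies the figures in Table I is not stated here.

```python
def predict_at(probs, threshold=0.5):
    """Label an instance positive when its predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def sensitivity(y_true, y_pred):
    """Proportion of actual positives predicted positive: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """Proportion of actual negatives predicted negative: TN / (TN + FP)."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp)


def brier_score(y_true, probs):
    """Mean squared difference between the predicted probability and the 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)
```

Lowering the threshold passed to `predict_at` can only keep or raise sensitivity and keep or lower specificity, which is the trade-off that the ROC curve traces out as the threshold varies.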
Hence it makes sense to sacrifice the precision of positive classifications in exchange for improving the precision of negative classifications. The diagonal line divides the ROC space into areas of good and bad classification. Points above the diagonal line indicate good classification results, while points below the line indicate results worse than random guessing. The default threshold (0.5) point marks the point on the ROC curve achieved by the classifier if it predicts the target class whenever its probability equals or exceeds 0.5. From Figures 2 and 3, it can be seen that the probability of the classifier is only 0.286 for the whole set of features, whereas for the optimal feature set it is 0.555. Table I shows that the AUC is higher for the optimal feature subset than for the whole feature set. The ROC curves before and after feature selection are shown in Figures 2 and 3 respectively. The Brier score, proposed by Brier (1950) [62], measures the accuracy of a set of probability assessments; it is the average squared deviation between the predicted probabilities for a set of events and their actual outcomes. A lower Brier score means lower deviation and therefore higher accuracy. Table I shows a lower Brier score of 0.3072 for the optimal feature subset.

6. Comparative Analysis

Much research has attempted to select the informative features of the Pima dataset. For comparison, we consider only the wrapper feature selection methods that use the Naïve Bayesian Classifier to evaluate the quality of the selected feature subset. John and Kohavi used the Naïve Bayes Classifier with three different search strategies: hill climbing, best-first search (backward) and best-first search (forward) [33]. Ranjit Abraham proposed CHI-WSS [63], which simplifies wrapper-based feature selection by reducing the feature dimensionality through the elimination of irrelevant and least relevant features using chi-squared statistics.
Other approaches found in the literature are the CFS (Correlation-based Feature Selection) method, Wrapper Subset feature selection and the consistency-based subset feature selection method. The ANOVA and Functional Networks Feature Selection method is a wrapper method based on functional networks and analysis of variance decomposition [64]. The feature selection method SVM Ranking with backward search [65] ranks the attributes by feature relevance weight using Support Vector Machine attribute evaluation and uses a backward elimination approach to find the optimal feature subset. The proposed method, Nomogram-RFE, is compared with these wrapper-based feature selection methods. All these feature selection methods use 10-fold cross-validation to evaluate the accuracy of the classifier. Table II shows the comparative analysis based on the accuracy of the classifier and the percentage of feature reduction. It is evident from the table that the listed feature selection methods improve the accuracy of the classifier after feature selection. The feature selection method CHI-WSS records the highest accuracy after feature selection. It improves the accuracy of the classifier by 3.5% and achieves a feature reduction of 50%. The proposed method, Nomogram-RFE, performs next to CHI-WSS. It improves the accuracy of the classifier by 2% and achieves a feature reduction of 62.5%. A reduction in the number of features means a reduction in processing time and a diagnosis based on fewer features. Further, the performance of these two methods may be compared based on AUC, the area under the ROC curve, as AUC is a statistically consistent and more discriminating measure than accuracy [66]. The AUC of CHI-WSS is 84.34 [47], whereas Nomogram-RFE produces an AUC of 84.71. Two other approaches suggested by Kemal report higher classification accuracy for the same dataset.
The new cascade learning system combining Generalized Discriminant Analysis and the Least Squares Support Vector Machine (GDA-LS-SVM) reports a classification accuracy of 79.16%. Comparing the performance before and after GDA-LS-SVM shows that the performance of the classifier improves by 0.95% after preprocessing [67]. The second method, combining ANFIS with PCA as the feature extraction method, reduces the number of features to four and attains a classification accuracy of 89.47% [68]. Both of these approaches use feature extraction methods to reduce the dimensionality of the dataset. Feature extraction transforms the original feature space into one of lower dimensionality. Although it reduces the dimensionality, the number of features that must be measured remains the same, whereas feature selection directly reduces the number of original features by selecting a subset of them that still retains sufficient information for classification. It is observed from this study that feature selection using the nomogram improves the performance of the Naïve Bayes Classifier by 2% with a feature reduction of 62.5%.

7. Conclusion

In the words of Colin Ware: "one of the greatest benefits of data visualization is the sheer quantity of information that can be rapidly interpreted if it is presented well" [69]. In this paper, we propose a method for the visualization of a Naïve Bayes Classifier that clearly exposes the quantitative information on the effect of attribute values on class probabilities. The main benefit of this approach is a simple and clear visualization of the complete model and the quantitative information it contains. The visualization of the effects of features on the predicted output helps in the feature selection process. The features that are less influential on the predicted output are removed, and an optimal feature subset that enhances the predictive accuracy of the classifier is obtained.
This nomogram visualization technique, together with the feature selection process, will be of use to medical practitioners in making a diagnosis with fewer tests (features). Using this approach, a feature reduction of 62.5% is achieved and the classification accuracy is improved by 2%. In the medical domain, a reduction in the number of features means a reduction in the number of clinical measurements to be made, and it helps physicians diagnose the disease with a smaller number of more discriminating features. The main focus of this research is to identify the significant features that characterize whether or not an individual has diabetes. As future work, the approach can be extended to other classifiers and their performance analyzed.

References:
1. Bratko I and Kononenko I. Learning diagnostic rules from incomplete and noisy data. In Phelps B, Ed. Artificial Intelligence methods and Statistical methods. Hampshire: Technical Press, 1987.
2. Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R. Eds. Advances in Knowledge Discovery and Data Mining. Cambridge, MA: MIT Press, 1996.
3. Dilly, Ruth. Data Mining. 2002. http://www.pcc.qub.ac.uk/tec/courses/datamining/stu_notes/dm_book_1.html
4. Lavrac N. Selected techniques for data mining in medicine. Artif Intell Med. 1999; 16: 3-23.
5. Bellman R. Adaptive Control Processes: A Guided Tour. Princeton University Press, 1961.
6. Liu H and Motoda H. Feature Extraction, Construction and Selection: A Data Mining Perspective. Boston: Kluwer Academic Publishers, 1998.
7. Liu H and Yu L. Feature Selection for Data Mining. Intelligent Data Analysis. An International Journal. 1997; 1: 131-156.
8. Kononenko I, Bratko I and Kukar M. Application of machine learning to medical diagnosis. In R.S.Michalski, I.Bratko, and M.Kubat (eds.): Machine Learning, Data Mining and Knowledge Discovery: Methods and Applications: John Wiley & Sons, 1997. 389-408.
9. Jakulin A, Mozina M, Demsar J, Bratko I and Zupan B. Nomograms for visualizing support vector machines. In Proc. of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. Association for Computing Machinery, New York, 2005: 108-117.
10. Domingos P and Pazzani M. On the optimality of the simple Bayesian Classifier under zero-one loss. Machine Learning, 29:103-130.
11. Martin Mozina, Janez Demsar, Michael Kattan, Blaz Zupan. Nomograms for Visualization of Naive Bayesian Classifier. In Proc of the European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD). Pisa, Italy, 2004: 337-348.
12. Cynthia EWK, Farquhar M, Slutsky JR. Clinicians' attitudes to clinical practice guidelines. Med J Aust. 2002; 177: 502-506.
13. Magoulas GD and Prentza A. Machine learning in medical applications, 1999: http://www.dcs.bbk.ac.uk/~gmagoulas/ACAI99_workshop.pdf
14. Lavrac N, Keravnou E, Zupan B. Eds. Intelligent data analysis in medicine and pharmacology. Boston: Kluwer Academic Publishers, 1997.
15. Zupan B, Lavrac N, Keravnou E. Data mining techniques and applications in medicine. Artif Intell Med. 1999; 16:1-2.
16. Moustakis V and Charissis G. Machine learning and medical decision making. In Proc of Workshop on Machine Learning in Medical Applications. Greece, 1999:1-19.
17. Kononenko I and Kukar M. Machine learning for medical diagnosis. In Proc of workshop on Computer-Aided Data Analysis in Medicine. Ljubljana: IJS Scientific Publishing, 1995.
18. Dursun Delen, Glenn Walker, Amit Kadam. Predicting breast cancer survivability: a comparison of three data mining methods. Artif Intell Med. 2005; 34:113-27.
19. Hedberg SR. The data gold rush. Byte. 1995: 83-88.
20. Herron P. Machine Learning for Medical Decision Support: Evaluating Diagnostic Performance of Machine Learning classification Algorithms. Data Mining: Spring, 2004.
21. Prather JC, Lobach DF, Goodwin LK, Hales JW, Hage ML, Edward Hammond W. Medical Data Mining: Knowledge Discovery in a Clinical Data Warehouse. In Proc of AMIA Annu Fall Symp. 1997:101-105.
22. Mitra P, Murthy CA, Pal SK. Unsupervised feature selection using feature similarity. In Proc IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002; 24: 301-312.
23. Liu H, Motoda H, Yu L. Feature selection with selective sampling. In Proc of the Nineteenth International Conference on Machine Learning. Sydney, Australia, 2002: 395-402.
24. Robnik Šikonja M and Kononenko I. Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning. 2003; 53:23-69.
25. Dash M, Choi K, Scheuermann P, Liu H. Feature selection for clustering - a filter solution. In Proc of the Second International Conference on Data Mining. 2002:115-122.
26. Miller A. Subset Selection in Regression: Chapman and Hall/CRC, 2nd edition, 2002.
27. Almuallim H and Dietterich TG. Learning boolean concepts in the presence of many irrelevant features. Artificial Intelligence. 1994; 69: 279-305.
28. Koller D and Sahami M. Toward optimal feature selection. In Proc. of the thirteenth International Conference on Machine Learning. 1996: 284-292.
29. Blum AL and Langley P. Selection of relevant features and examples in machine learning. Artificial Intelligence. 1997; 97: 245-271.
30. Molina LC, Belanche L and Nebot A. Attribute Selection Algorithms: A survey and experimental evaluation. In Proc IEEE's KDD. 2002; 306-313.
31. Guyon I and Elisseeff A. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research. 2003; 3: 1157-1182.
32. Liu H and Motoda H. Feature Selection for Knowledge Discovery and Data Mining. Boston: Kluwer Academic Publishers, 1998.
33. Kohavi R and John GH. Wrappers for feature subset selection. Artificial Intelligence. 1997; 97: 273-324.
34. John GH, Kohavi R, Pfleger K. Irrelevant features and the subset selection problem. In Proc of the Eleventh International Conference on Machine Learning. San Francisco: Morgan Kaufmann Publishers, 1994. 121-129.
35. Yu L and Liu H. Feature Selection for High-Dimensional Data: A Fast Correlation-based Filter Solution. In Proc ICML, 2003: 856-863.
36. Dash M, Choi K, Scheuermann P, Liu H. Feature selection for clustering - a filter solution. In Proc Second International Conference on Data Mining. 2002: 115-122.
37. Yu L and Liu H. Efficient feature selection via analysis of Relevance and Redundancy. Journal of Machine Learning Research. 2004; 5: 1205-1224.
38. Almuallim H and Dietterich TG. Efficient algorithms for identifying relevant features. In Proc. of the Ninth Canadian Conference on Artificial Intelligence, Vancouver, BC: Morgan Kaufmann publishers, 1992.
39. Aha DW and Bankert RL. A comparative evaluation of sequential feature selection algorithms. In D. Fisher & J.-H. Lenz Eds. Artificial Intelligence and Statistics V. New York: Springer-Verlag, 1996.
40. Siedlecki W and Sklansky J. On automatic feature selection. Int. J. Pattern Recog. Art. Intell. 1998; 2: 197-220.
41. Webb AR. Statistical Pattern Recognition. 2nd ed. Chichester: Wiley, 2002.
42. Jain AK, Duin RPW and Mao J. Statistical Pattern Recognition: A Review. In Proc. IEEE Trans. Pattern Analysis and Machine Intelligence. 2000; 22: 4-37.
43. Kwak N and Choi CH. Input Feature Selection by Mutual Information Based on Parzen Window. In Proc. IEEE Trans. Pattern Analysis and Machine Intelligence. 2002; 24: 1667-1671.
44. Langley P. Selection of Relevant Features in Machine Learning. In Proc Relevance: AAAI Fall Symp: AAAI Press, 1994. pp. 127-131.
45. Burke HB, Goodman PH, Rosen DB, Henson DE, Weinstein JN, Harrell FE Jr, Marks Jr, Winchester DP and Bostwick DG. Artificial neural networks improve the accuracy of cancer survival prediction. Cancer 1997; 79: 857-862.
46. Ranjit Abraham, Jay B.Simha and Iyengar SS. Medical data mining with a new algorithm for feature selection and Naïve Bayesian Classifier. In Proc IEEE International Conference on Information Technology, 2007: 44-49.
47. Pechenizkiy M, Tsymbal A and Puuronen S. PCA-based Feature Transformations for Classification: Issues in Medical Diagnostics. In: R. Long Eds., Proc of 17th IEEE Symposium on Computer-Based Medical Systems, Bethesda, MD, 2004: 535-540.
48. Richards G, Rayward-Smith VJ, Sonksen PH, Carey S and Weng C. Data mining for indicators of early mortality in a database of clinical records. Artif Intell Med 2001; 22: 215-231.
49. Pearl J. Probabilistic Reasoning in Intelligent Systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann, 1988. http://personales.unican.es/gutierjm/main/ai.html (Accessed August 2009).
50. Tom M. Mitchell. Machine Learning: McGraw-Hill, 1997.
51. Hellerstein J, Jayram Thathachar and Rish I. Recognizing end-user transactions in performance management. In Proc of AAAI-2000. Austin, Texas, 2000: 596-602.
52. Hankins T.L. Blood, dirt, and nomograms: A particular history of graphs. ISIS: Journal of the History of Science in Society, 1999; 90: 50-80.
53. Lubsen J, Pool J and Van der Does E. A practical device for the application of a diagnostic or prognostic function. Methods of Information in Medicine, 1978; 17: 127-129.
54. Harrell FE. Regression Modeling Strategies: With applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer, 2001.
55. Kattan MW, Eastham JA, Stapleton AM, Wheeler TM and Scardino PT. A preoperative nomogram for disease recurrence following radical prostatectomy for prostate cancer. J Natl Cancer Inst. 1998; 90: 766-71.
56. Kononenko I. Inductive and bayesian learning in medical diagnosis. Applied Artificial Intelligence. 1993; 7: 31-337.
57. Becker B, Kohavi R and Sommerfield D. Visualizing the simple Bayesian Classifier. In Fayyad U, Grinstein G and Wierse A Eds., Information visualization in data mining and knowledge discovery: Morgan Kaufmann Publishers, San Francisco, 2001. 237-249.
58. Baek Hwan Cho, Hwanjo Yu, Jongshill Lee, Young Joon Chee, In Young Kim and Sun I Kim. Nonlinear Support Vector Machine Visualization for Risk Factor Analysis Using Nomograms and Localized Radial Basis Function Kernels. In Proc IEEE Transactions on Information Technology in Biomedicine. 2008; 12: 247-256.
59. Cho BH. Application of irregular and unbalanced data to predict diabetic nephropathy using visualization and feature selection methods. Artif Intell Med. 2008; 42: 37-53.
60. Asuncion A and Newman DJ (2007). UCI Machine Learning Repository.
61. Demsar J and Zupan B. Orange: From experimental machine learning to interactive data mining. 2004. White Paper. Faculty of Computer and Information Science, University of Ljubljana, Slovenia. http://www.ailab.si/orange
62. Brier GW. Verification of forecasts expressed in terms of probability. Monthly Weather Review. 1950; 75: 1-3.

Paper received on 19/07/2009; accepted on 02/10/2009
Correspondence: Sarojini Balakrishnan
This Open Access article is available at: http://ijmi.org/index.php/ijmi/article/view/y09i1a5 © 2009 Author(s); licensee Indian Journal of Medical Informatics under Creative Commons Attribution-No Derivative Works 3.0 License.