Volume 6 Supplement 1
Diagnosis of alcoholism based on neural network analysis of phenotypic risk factors
© Falk; licensee BioMed Central Ltd 2005
Published: 30 December 2005
Alcoholism is a serious public health problem. It has both genetic and environmental causes. In an effort to gain understanding of the underlying genetic susceptibility to alcoholism, a long-term study has been undertaken. The Collaborative Study on the Genetics of Alcoholism (COGA) provides a rich source of genetic and phenotypic data. One ongoing problem is the difficulty of reliably diagnosing alcoholism, despite many known risk factors and measurements. We have applied a well known pattern-matching method, neural network analysis, to phenotypic data provided to participants in Genetic Analysis Workshop 14 by COGA. The aim is to train the network to recognize complex phenotypic patterns that are characteristic of those with alcoholism as well as those who are free of symptoms. Our results indicate that this approach may be helpful in the diagnosis of alcoholism.
Training and testing of input/output pairs of risk factors by means of a "feed-forward back-propagation" neural network resulted in reliability of about 94% in predicting the presence or absence of alcoholism based on 36 input phenotypic risk factors. Pruning the neural network to remove relatively uninformative factors resulted in a reduced network of 14 input factors that was still 95% reliable. Some of the factors selected by the pruning steps have been identified as traits that show either linkage or association to potential candidate regions.
The complex, multivariate picture formed by known risk factors for alcoholism can be incorporated into a neural network analysis that reliably predicts the presence or absence of alcoholism about 94–95% of the time. Several characteristics that were identified by a pruned neural network have previously been shown to be important in this disease based on more traditional linkage and association studies. Neural networks therefore provide one less traditional approach to both identifying alcoholic individuals and determining the most informative risk factors.
Alcoholism, like many other complex traits, offers a challenge to those trying to categorize individuals as either normal or affected. If we are to succeed in finding genes underlying susceptibility to the trait or traits, it is necessary to have reliable methods for assigning disease phenotypes. Many diagnostic methods for alcoholism have been proposed that use a combination of responses on questionnaires, physical measurements, and observational data. Two of the main methods in use today are known as DSM-III-R+Feighner and DSM-IV-R. Results from these diagnosis standards are available for most individuals in the Collaborative Study on the Genetics of Alcoholism (COGA) database.
Information that was provided to participants of Genetic Analysis Workshop 14 (GAW14) included both diagnoses, together with a large number of "phenotypic" variables for each individual. Taken together, these variables provide a complex, multivariate picture of some of the information used in determining a diagnosis of alcoholism. In order to test the reliability of the given set of risk factors to predict the affection status of individuals in the dataset, we decided to apply a well known pattern matching technique to the data. We developed a back-propagation, feed-forward neural network using most of the risk factor data available. The risk factors were coded to provide input used to train the network to predict whether each individual's pattern of input factors indicated the presence of alcoholism or whether the pattern suggested a normal phenotype. We used the second diagnostic method mentioned above, namely DSM-IV-R, coded as ALDX2 in the GAW dataset. Our results indicate that neural networks can be useful in helping to determine the disease classification of individuals with respect to alcoholism.
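To make the term "back-propagation, feed-forward neural network" concrete, the following is a minimal sketch of that kind of network in plain Python. It is not the authors' implementation: the single hidden layer, the sigmoid units, the learning rate, the number of epochs, and the toy input patterns are all assumptions chosen for illustration, and the class name `FeedForwardNet` is hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class FeedForwardNet:
    """One hidden layer of sigmoid units, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight row carries one extra trailing weight for the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        hb = hidden + [1.0]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return hidden, out

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        hidden, out = self.forward(x)
        xb = list(x) + [1.0]
        hb = hidden + [1.0]
        # Error signal for the loss 0.5 * (out - target)^2 at the output unit.
        delta_out = (out - target) * out * (1.0 - out)
        # Propagate the error back using the output weights before updating them.
        delta_hidden = [delta_out * self.w_out[j] * hidden[j] * (1.0 - hidden[j])
                        for j in range(len(hidden))]
        for j in range(len(hb)):
            self.w_out[j] -= lr * delta_out * hb[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(xb)):
                row[i] -= lr * delta_hidden[j] * xb[i]


def mean_squared_error(net, data):
    return sum((net.predict(x) - t) ** 2 for x, t in data) / len(data)


# Toy input/output pairs (invented, not COGA data): three coded risk
# factors per individual, target 1.0 = affected, 0.0 = unaffected.
toy_data = [
    ([0.0, 0.0, 0.1], 0.0),
    ([0.0, 1.0, 0.2], 0.0),
    ([1.0, 0.0, 0.7], 1.0),
    ([1.0, 1.0, 0.9], 1.0),
]

net = FeedForwardNet(n_in=3, n_hidden=2)
loss_before = mean_squared_error(net, toy_data)
for _ in range(2000):
    for x, t in toy_data:
        net.train_step(x, t, lr=0.5)
loss_after = mean_squared_error(net, toy_data)
print(loss_before, loss_after)
```

In a setting like the one described here, each input vector would hold the coded risk factors for one individual and the target would be the coded diagnosis; an output above 0.5 could then be read as a prediction of "affected", although that cutoff is also an assumption.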
Table: Coding of 36 risk factors for alcoholism
Risk factors shown include: Persistent desire to stop drinking; Ever binge drink; So much time drinking...; Narrowing of drinking repertoire; Gave up activities to drink; Blackouts (3 or more); Physical health problems; Drinks per day
Coding entries shown include: no or < 1 month; yes > 1 month; Scaled from 0 to 1; 3-column binary coding; < 20 years; > 40 years
After determining that training in the separate replicates was quite reliable (97–98.6%) and that validation was also quite good (85.7–90%), we attempted to determine which of the input parameters were most informative in obtaining a reliably trained neural network. To accomplish this we systematically pruned input factors and compared the results to those of the "full" neural network. Pruning was done by sequentially dropping one input factor at a time and noting the new number of "incorrect" predictions. Those input factors that had the most impact (i.e., increased the number of errors by more than 40% of the number of incorrect predictions for the entire set of input factors) were retained for the pruned network. The other input factors were dropped. Based on this pruning method, we selected 14 of the original 36 input factors and used them for training a streamlined neural network. We also examined the average differences between input values for the correctly and incorrectly classified individuals in the affected and unaffected classes to determine which parameters differed most significantly between the two groups.
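The pruning rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `count_errors` is a hypothetical callback standing in for "train and evaluate the network on this subset of input factors and return the number of incorrect predictions", and the factor names and error counts in the toy example are invented.

```python
def prune_inputs(factors, count_errors, threshold=0.40):
    """Keep the factors whose removal raises the error count by more than
    `threshold` times the error count of the full set of factors."""
    baseline = count_errors(list(factors))
    kept = []
    for factor in factors:
        reduced = [f for f in factors if f != factor]
        increase = count_errors(reduced) - baseline
        # With a baseline of zero errors, any increase at all keeps the factor.
        if increase > threshold * baseline:
            kept.append(factor)
    return kept


# Toy stand-in for network training and evaluation (invented numbers):
# the full set gives 10 errors, and each missing factor adds a fixed penalty.
PENALTY = {"factor_a": 6, "factor_b": 5, "factor_c": 1}


def toy_count_errors(subset):
    return 10 + sum(p for name, p in PENALTY.items() if name not in subset)


kept = prune_inputs(list(PENALTY), toy_count_errors)
print(kept)
```

In this toy case the cutoff is 40% of 10 errors, that is, 4 additional errors, so only the factors whose removal adds more than 4 errors are retained. Whether the network is retrained after each factor is dropped is left to the callback, since the text does not specify it.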
Diagnosis of alcoholism is clearly a complex task, and several methods of classification have been devised to help determine a reliable, robust diagnosis. We have taken the set of risk factors and phenotypic measurements provided to GAW participants and have trained a neural network to classify individuals as either affected or normal. The factors appear to allow for fairly accurate training, with at least 97% agreement between the provided diagnosis and the predicted diagnosis in the full neural network. Validation, while not as high, is still quite good (between 85 and 90%). It is even possible to define a fairly narrow set of factors that continue to do a good job of predicting. It is interesting to note that several of the factors remaining in the pruned set have been cited in previous analyses of the COGA data as being linked to or associated with genes of interest. For example, ecb21 has been shown to exhibit linkage disequilibrium with GABAA receptor genes on chromosome 4. "Maximum number of drinks in a 24-hour period", when used as a quantitative trait, has also shown evidence of linkage on chromosome 4 near the alcohol dehydrogenase gene cluster. The factor ttth1, also present in the reduced set of factors, has shown evidence of linkage on chromosome 7. Thus, the pruning may have identified several risk factors that are, in fact, likely to be linked to or associated with genes implicated in alcoholism.
Table: Comparison of maximum number of drinks in a 24-hour period between correctly and incorrectly classified individuals (headings: Neural network outcome; Max. no. of drinks; category shown: Unaffected, some symptoms)
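A comparison of this kind, in which the average value of one input factor is computed separately for correctly and incorrectly classified individuals in each diagnostic class, can be sketched as below. The record layout, the field names `affected`, `predicted`, and `max_drinks`, and all of the values are invented for illustration and are not taken from the COGA data.

```python
from collections import defaultdict


def mean_by_outcome(records, feature):
    """Average `feature` within each (diagnosis, network outcome) group."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for r in records:
        status = "affected" if r["affected"] else "unaffected"
        outcome = "correct" if r["predicted"] == r["affected"] else "incorrect"
        totals[(status, outcome)][0] += r[feature]
        totals[(status, outcome)][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}


# Invented example records: given diagnosis, network prediction, one input factor.
records = [
    {"affected": True, "predicted": True, "max_drinks": 20},
    {"affected": True, "predicted": True, "max_drinks": 30},
    {"affected": True, "predicted": False, "max_drinks": 8},
    {"affected": False, "predicted": False, "max_drinks": 4},
    {"affected": False, "predicted": False, "max_drinks": 6},
    {"affected": False, "predicted": True, "max_drinks": 15},
]

means = mean_by_outcome(records, "max_drinks")
for group in sorted(means):
    print(group, means[group])
```

A large gap between the "correct" and "incorrect" means within one diagnostic class would flag that factor as one on which the misclassified individuals differ from the rest of their class.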
Neural network analysis does not necessarily replace the more standard regression methods. Rather, it may provide new (alternative) insight into the importance of risk factors (as in the case of individuals designated unaffected but with some symptoms). It would be interesting to compare the set of significant covariates identified by a regression analysis with the pruned set of risk factors in the neural network analysis. It would be encouraging to find significant overlap. The fact that several of the pruned factors have previously been identified in linkage or association studies suggests that this might be the case.
The complex, multivariate picture formed by known risk factors for alcoholism can be incorporated into a neural network analysis that reliably predicts the presence or absence of alcoholism about 94–95% of the time. Results show that one of the important indicators of susceptibility to alcoholism is the maximum number of drinks consumed in 24 hours. This characteristic and others that were identified by the pruned neural network have been shown to be important in this disease based on more traditional linkage and association studies. Neural networks therefore provide one less traditional approach to both identifying alcoholic individuals and determining the most informative risk factors.
COGA: Collaborative Study on the Genetics of Alcoholism
GAW14: Genetic Analysis Workshop 14
This work is supported by a grant from the NIH (GM29177).
- Falk CT, Gilchrist JM, Pericak-Vance MA, Speer MC: Using neural networks as an aid in the determination of disease status: comparison of clinical diagnosis to neural-network predictions in a pedigree with autosomal dominant limb-girdle muscular dystrophy. Am J Hum Genet. 1998, 62: 941-949. 10.1086/301780.
- Porjesz B, Almasy L, Edenberg HJ, Wang K, Chorlian DB, Foroud T, Goate A, Rice JP, O'Connor SJ, Rohrbaugh J, Kuperman S, Bauer LO, Crowe RR, Schuckit MA, Hesselbrock V, Conneally PM, Tischfield JA, Li TK, Reich T, Begleiter H: Linkage disequilibrium between the beta frequency of the human EEG and a GABAA receptor gene locus. Proc Natl Acad Sci USA. 2002, 99: 3729-3733. 10.1073/pnas.052716399.
- Saccone NL, Kwon JM, Corbett J, Goate A, Rochberg N, Edenberg HJ, Foroud T, Li TK, Begleiter H, Reich T, Rice JP: A genome screen of maximum number of drinks as an alcoholism phenotype. Am J Med Genet. 2000, 96: 632-637. 10.1002/1096-8628(20001009)96:5<632::AID-AJMG8>3.0.CO;2-#.
- Jones KA, Porjesz B, Almasy L, Bierut L, Goate A, Wang JC, Dick DM, Hinrichs A, Kwon J, Rice JP, Rohrbaugh J, Stock H, Wu W, Bauer LO, Chorlian DB, Crowe RR, Edenberg HJ, Foroud T, Hesselbrock V, Kuperman S, Nurnberger J, O'Connor SJ, Schuckit MA, Stimus AT, Tischfield JA, Reich T, Begleiter H: Linkage and linkage disequilibrium of evoked EEG oscillations with CHRM2 receptor gene polymorphisms: implications for human brain dynamics and cognition. Int J Psychophysiol. 2004, 53: 75-90. 10.1016/j.ijpsycho.2004.02.004.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.