Major gene identification for FASN gene in Korean cattles by data mining

Selection of superior fatty acid synthase gene combinations in Hanwoo using data mining

  • Kim, Byung-Doo (Department of Liberal Arts in Engineering, Kyungil University) ;
  • Kim, Hyun-Ji (Department of Statistics, Yeungnam University) ;
  • Lee, Seong-Won (Department of Computer Engineering, Kyungwoon University) ;
  • Lee, Jea-Young (Department of Statistics, Yeungnam University)
  • Received : 2014.07.18
  • Accepted : 2014.10.21
  • Published : 2014.11.30


Economic traits of livestock are affected by both environmental and genetic factors. Moreover, they are affected not by a single gene but by interactions among genes. We used a linear regression model to adjust for environmental factors. To identify gene-gene interaction effects, we applied four data mining techniques (neural network, logistic regression, CART, and C5.0) to five SNPs (single nucleotide polymorphisms) of the FASN (fatty acid synthase) gene. We divided the data into training (60%) and testing (40%) sets and applied the models fitted on the training data to the testing data. Comparing prediction accuracy, C5.0 was identified as the best model. Superior genotypes were then selected using the C5.0 decision tree.

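The economic traits of livestock are known to be influenced by both environmental and genetic factors, and by the interaction of several genes rather than a single gene. In this paper, using data adjusted for environmental factors with a linear regression model, we use five single nucleotide polymorphisms of fatty acid synthase, which has been reported to affect the taste and meat quality of Hanwoo, to select superior gene combinations that affect the economic traits of Hanwoo and to identify superior genotypes. For this purpose, we used four data mining techniques: artificial neural network, logistic regression, C5.0, and CART. For a fair evaluation of the models, the full data set was divided into training data (60%) and validation data (40%), and the models fitted on the training data were applied to the validation data to compare accuracy. As a result, C5.0 was selected as the best model, and superior gene combinations were selected through the C5.0 decision tree.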
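
The workflow summarized in the abstract (adjust the trait for an environmental factor, split the data 60/40, then compare models by accuracy on the held-out part) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, only two hypothetical SNPs and one hypothetical covariate (`age`) are used instead of five SNPs, and a simple majority-vote lookup per genotype combination stands in for C5.0 and the other models.

```python
import random
from collections import Counter, defaultdict


def adjust_for_covariate(y, x):
    """Return residuals of a simple least-squares fit y ~ a + b*x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def split_60_40(rows, seed=0):
    """Shuffle a copy of rows and split it into 60% training, 40% testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.6 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_majority_by_genotype(train):
    """Stand-in classifier (NOT C5.0): most common label per genotype combination."""
    votes = defaultdict(Counter)
    for genotype, label in train:
        votes[genotype][label] += 1
    overall = Counter(label for _, label in train).most_common(1)[0][0]
    table = {g: c.most_common(1)[0][0] for g, c in votes.items()}
    # Unseen genotype combinations fall back to the overall majority label.
    return lambda genotype: table.get(genotype, overall)


def accuracy(predict, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(g) == label for g, label in rows) / len(rows)


# Synthetic data for illustration only (assumption: not from the paper).
rng = random.Random(1)
genotypes = [(rng.choice(["AA", "AG", "GG"]), rng.choice(["CC", "CT", "TT"]))
             for _ in range(200)]
age = [rng.uniform(24, 36) for _ in range(200)]  # hypothetical covariate
trait = [0.5 * a + (2.0 if g[0] == "GG" else 0.0) + rng.gauss(0, 1)
         for a, g in zip(age, genotypes)]

# Step 1: remove the environmental effect, keeping the residuals.
adjusted = adjust_for_covariate(trait, age)

# Step 2: turn the adjusted trait into a two-class label (above/below median).
median = sorted(adjusted)[len(adjusted) // 2]
rows = [(g, "high" if r > median else "low") for g, r in zip(genotypes, adjusted)]

# Step 3: fit on the 60% training part and score on the 40% testing part.
train, test = split_60_40(rows)
model = fit_majority_by_genotype(train)
test_accuracy = accuracy(model, test)
print(len(train), len(test), round(test_accuracy, 3))
```

In a fuller comparison, each candidate model would be fitted on the same training part and scored on the same testing part, so that the reported accuracies are directly comparable.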



