Genomics & Informatics
Korea Genome Organization
Volume 1, Issue 2 - Dec 2003
Data Mining for High Dimensional Data in Drug Discovery and Development
Lee, Kwan R. ; Park, Daniel C. ; Lin, Xiwu ; Eslava, Sergio ;
Genomics & Informatics, volume 1, issue 2, 2003, Pages 65~74
Data mining differs from traditional data analysis primarily in one important dimension: the scale of the data. This is why computer science principles, not only statistical ones, are needed to extract information from large data sets. In this paper we briefly review data mining, its characteristics, typical data mining algorithms, and potential and ongoing applications of data mining in the biopharmaceutical industry. The distinguishing characteristics of data mining lie in its understandability, scalability, its problem-driven nature, and its analysis of retrospective or observational data in contrast to experimentally designed data. At a high level one can identify three types of problems for which data mining is useful: description, prediction, and search. The data mining algorithms briefly reviewed include decision trees and rules, nonlinear classification methods, memory-based methods, model-based clustering, and graphical dependency models. Application areas covered are discovery compound libraries, clinical trial and disease management data, genomics and proteomics, structural databases for candidate drug compounds, and other applications of pharmaceutical relevance.
Interpretation of Association Networks among Protein Sequence Motifs
Kam, Hye J. ; Lee, Junehawk ; Lee, Doheon ; Lee, Kwang H. ;
Genomics & Informatics, volume 1, issue 2, 2003, Pages 75~79
Every protein can be characterized by either a distinct motif or a combination of motifs. Nevertheless, little is known about the relationships among more than two motifs. Some proteins share motifs for evolutionary or other biological benefits: sharing can save the energy, time, and resources needed to control and manage a variety of proteins. For some motifs this tendency is quite common, and they can act as 'hub' motifs in a network of motif associations. The hubs are structurally and functionally important in themselves and are also important in disease-related mutations. They are expected to be highly resistant to mutation so that their functions are conserved. However, in the rare case that a mutation does occur at a hub position, it can more easily cause fatal disease.
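The notion of a 'hub' motif can be illustrated with a minimal sketch. The abstract does not say how associations are derived, so the code below assumes two motifs are associated when they occur in the same protein and treats the motif with the most distinct partners as the hub. The function name, protein identifiers, and motif names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def motif_degrees(protein_motifs):
    """Return {motif: number of distinct motifs it co-occurs with}.

    Assumption: an association is simple co-occurrence within one protein.
    """
    neighbors = defaultdict(set)
    for motifs in protein_motifs.values():
        unique = sorted(set(motifs))
        for motif in unique:
            neighbors[motif]  # register the motif even if it has no partner
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {motif: len(partners) for motif, partners in neighbors.items()}


# Hypothetical toy data: protein -> motifs found in it.
proteins = {
    "P1": ["kinase", "SH2"],
    "P2": ["kinase", "SH3"],
    "P3": ["kinase", "PH"],
    "P4": ["SH2", "SH3"],
}

degrees = motif_degrees(proteins)
hub = max(degrees, key=degrees.get)
print(degrees, hub)
```

In this toy network the "kinase" motif has three partners and the others have at most two, so it is the hub under the stated assumption.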
Classification of Human Papillomavirus (HPV) Risk Type via Text Mining
Park, Seong-Bae ; Hwang, Sohyun ; Zhang, Byoung-Tak ;
Genomics & Informatics, volume 1, issue 2, 2003, Pages 80~86
Human papillomavirus (HPV) infection is known as the main factor for cervical cancer, which is a leading cause of cancer deaths in women worldwide. Because there are more than 100 types of HPV, it is critical to discriminate the HPVs related to cervical cancer from those not related to it. In this paper, the risk type of HPVs is classified using their textual explanation. The important issue in this problem is to distinguish false negatives from false positives. That is, we must find as many high-risk HPVs as possible, even if we may miss some low-risk HPVs. For this purpose, AdaCost, a cost-sensitive learner, is adopted to consider different costs between training examples. The experimental results on the HPV sequence database show that the consideration of costs gives higher performance. The improvement in F-score is higher than that in accuracy, which implies that the number of high-risk HPVs found is increased.
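Why a larger gain in F-score than in accuracy points to more high-risk types being found can be seen in a small sketch. The confusion counts below are hypothetical, not the paper's results, and the F-score here is the balanced F1 with high-risk as the positive class; AdaCost itself is not implemented.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f_score(tp, fp, fn):
    """Balanced F-score (F1) for the positive (high-risk) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts over the same 76 HPV types.
baseline = {"tp": 10, "fp": 2, "fn": 8, "tn": 56}
cost_sensitive = {"tp": 15, "fp": 5, "fn": 3, "tn": 53}

acc_gain = accuracy(**cost_sensitive) - accuracy(**baseline)
f_gain = (
    f_score(cost_sensitive["tp"], cost_sensitive["fp"], cost_sensitive["fn"])
    - f_score(baseline["tp"], baseline["fp"], baseline["fn"])
)
print(f"accuracy gain: {acc_gain:.3f}, F-score gain: {f_gain:.3f}")
```

Because true negatives dominate the counts, trading a few false positives for fewer false negatives barely moves accuracy but raises recall, and with it the F-score.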
Poor Correlation Between the New Statistical and the Old Empirical Algorithms for DNA Microarray Analysis
Kim, Ju Han ; Kuo, Winston P. ; Kong, Sek-Won ; Ohno-Machado, Lucila ; Kohane, Isaac S. ;
Genomics & Informatics, volume 1, issue 2, 2003, Pages 87~93
The DNA microarray is currently the most prominent tool for investigating large-scale gene expression data. Different algorithms for measuring gene expression levels from scanned images of microarray experiments may significantly affect the subsequent steps of functional genomic analysis. High-density microarrays and new statistical algorithms were recently introduced in Microarray Suite (MAS) version 5.0. Very high correlations (0.92 - 0.97) between the new algorithms and the old algorithms (MAS 4.0) across several species and conditions were reported. We found that the column-wise array correlations tended to be much higher than the row-wise gene correlations, which may be much more meaningful in subsequent higher-order data analyses, including clustering and pattern analyses. This paper presents a detailed comparison of the two sets of algorithms and describes the impact of introducing the new algorithms on further clustering analysis of microarray data, as well as possible pitfalls in mixing the old and the new algorithms.
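The difference between column-wise (per-array) and row-wise (per-gene) correlation can be sketched as follows. The two small matrices are hypothetical expression values for the same genes (rows) and arrays (columns), standing in for the old and new algorithms; they are not data from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values: rows are genes, columns are arrays.
old = [
    [100, 110, 105],
    [1000, 990, 1010],
    [5000, 5050, 4950],
]
new = [
    [108, 102, 111],
    [1005, 1012, 995],
    [4960, 4990, 5040],
]

n_genes, n_arrays = len(old), len(old[0])

# Column-wise: one correlation per array, computed across genes.
col_corrs = [
    pearson([old[g][a] for g in range(n_genes)],
            [new[g][a] for g in range(n_genes)])
    for a in range(n_arrays)
]

# Row-wise: one correlation per gene, computed across arrays.
row_corrs = [pearson(old[g], new[g]) for g in range(n_genes)]

print("array correlations:", [round(c, 4) for c in col_corrs])
print("gene correlations: ", [round(c, 4) for c in row_corrs])
```

In this toy case the array correlations are close to 1 because they are dominated by the large differences in expression level between genes, while the gene correlations are negative: the two sets of values disagree about how each gene changes across arrays, which is the pattern that matters for clustering.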
Rank-Based Nonlinear Normalization of Oligonucleotide Arrays
Park, Peter J. ; Kohane, Isaac S. ; Kim, Ju Han ;
Genomics & Informatics, volume 1, issue 2, 2003, Pages 94~100
Motivation: Many have observed a nonlinear relationship between the signal intensity and the transcript abundance in microarray data. The first step in analyzing the data is to normalize it properly, and this should include a correction for the nonlinearity. The commonly used linear normalization schemes do not address this problem. Results: Nonlinearity is present in both cDNA and oligonucleotide arrays, but we concentrate on the latter in this paper. Across a set of chips, we identify those genes whose within-chip ranks are relatively constant compared to other genes of similar intensity. For each gene, we compute the sum of the squares of the differences in its within-chip ranks between every pair of chips as our statistic, and we select a small fraction of the genes with the minimal changes in ranks at each intensity level. These genes are most likely to be non-differentially expressed and are subsequently used in the normalization procedure. This method is a generalization of the rank-invariant normalization (Li and Wong, 2001), using all available chips rather than two at a time to gather more information, while using the chip that is least likely to be affected by nonlinear effects as the reference chip. The assumption in our method is that there are at least a small number of non-differentially expressed genes across the intensity range. The normalized expression values can be substantially different from the unnormalized values and may result in altered downstream analysis.
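The rank-change statistic described in this abstract can be sketched in a few lines. This is only an illustration under simplifying assumptions: ties are broken by gene order, genes are not grouped by intensity level, and the normalization step itself is omitted. The function names and intensities are hypothetical.

```python
from itertools import combinations


def within_chip_ranks(values):
    """Rank each gene within one chip (0 = lowest intensity)."""
    order = sorted(range(len(values)), key=lambda gene: values[gene])
    ranks = [0] * len(values)
    for rank, gene in enumerate(order):
        ranks[gene] = rank
    return ranks


def rank_change_statistic(chips):
    """Per gene: sum over all chip pairs of the squared rank difference."""
    ranks = [within_chip_ranks(chip) for chip in chips]
    n_genes = len(chips[0])
    return [
        sum((ra[g] - rb[g]) ** 2 for ra, rb in combinations(ranks, 2))
        for g in range(n_genes)
    ]


# Hypothetical intensities: three chips, five genes each.
chips = [
    [10, 20, 30, 40, 50],
    [12, 35, 28, 45, 60],
    [11, 22, 50, 41, 55],
]

stats = rank_change_statistic(chips)
print(stats)  # [0, 2, 6, 2, 0]
```

Genes 0 and 4 keep the same rank on every chip, so their statistic is zero and they would be the first candidates for the non-differentially expressed set. The method in the abstract makes this selection separately at each intensity level.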
Construction of Deletion Map of 16q by LOH Analysis from HCC Patients and Physical Map on 16q 23.3 - 24.1 Region
Chung, Jiyeol ; Choi, Nae Yun ; Shim, Myoung Sup ; Choi, Dong Wook ; Kang, Hyen Sam ; Kim, Chang Min ; Kim, Ung Jin ; Park, Sun Hwa ; Kim, Hyeon ; Lee, Byeong Jae ;
Genomics & Informatics, volume 1, issue 2, 2003, Pages 101~107
Loss of heterozygosity (LOH) has been used to detect deleted regions of a specific chromosome in cancer cells. LOH on chromosome 16q has been reported to occur frequently in progressed hepatocellular carcinoma (HCC). Liver tissues from 37 Korean HCC patients were analyzed for LOH by using 25 polymorphic microsatellite markers distributed along 16q. Of the 37 HCC patients studied, 21 (56.8%) showed LOH in various regions of 16q with at least one polymorphic marker. During the analysis of these 21 LOH cases, 6 patients showed interstitial LOHs in which the boundary of the LOH region was defined. With two rounds of LOH analysis, five commonly occurring interstitial LOH regions were identified: 16q21-22.1, 16q22.2 - 22.3, 16q22.3, 16q23.2, and 16q23.3 - 24.1. Among the five LOH regions, the 16q23.3 - 24.1 region has been reported to be related to chromosome instability. A complete physical map, which covers the 3.2 Mb region of 16q23.3 - 24.1 (D16S402 and D16S486), was constructed to identify novel candidate tumor suppressor genes. We provide the minimal tiling path map consisting of 28 BAC clones. There was one gap between NT_10422.11 and NT_019609.9 of the human genome sequence contig (NCBI sequence build 33, April 29, 2003). This gap can be filled by sequencing the R-1425M20 clone, which bridges these sequence contigs.
Combined Genome Mapping of RFLP-AFLP-SSR in Pepper
Lee, Je Min ; Kim, Byung-Dong ;
Genomics & Informatics, volume 1, issue 2, 2003, Pages 108~112
We have constructed a molecular linkage map of pepper (Capsicum spp.) in an interspecific population of 107 plants with 320 RFLP, 136 AFLP, and 46 SSR markers. The resulting linkage map consists of 15 linkage groups covering 1,720 cM with an average map distance of 3.7 cM between framework markers. Most RFLP markers were pepper-derived clones and these markers were evenly distributed all over the genome. Genes for defense and biosynthesis of carotenoids and capsaicinoids were mapped on this linkage map. By using 30 primer combinations, AFLP markers were generated in the population. For development of SSR markers in Capsicum, microsatellites were isolated from two small-insert genomic libraries and the GenBank database. This combined map provides a starting point for high-resolution QTL analysis, gene isolation, and molecular breeding.
Heterologous Regulation of BCG hsp65 Promoter by M. leprae 18 kDa Transcription Repression Responsive Element
Kim, Hyun Bae ; You, Ji Chang ;
Genomics & Informatics, volume 1, issue 2, 2003, Pages 113~118
Among a number of antigens characterized in M. leprae, the etiological agent of leprosy, the 18 kDa antigen is unique to M. leprae. We have previously determined a sequence-specific element in the 18 kDa gene of M. leprae that confers transcriptional repression. In this report, we examined whether the element could be applied to genes other than the 18 kDa gene of M. leprae. To identify the roles of the regulatory sequence in a heterologous promoter, we constructed the pB3 vector series, which contains the BCG hsp65 promoter and the M. leprae 18 kDa transcription repression responsive element in tandem, using the LacZ gene as a reporter gene. Cloning of hsp65 promoters of M. bovis BCG or M. smegmatis in front of the LacZ gene resulted in normal β-galactosidase activity, as expected. However, when the sequence element was placed between the promoter and the LacZ gene, β-galactosidase activity was reduced 10-fold. We also examined the pB3(-) vector, which harbors the transcription repression responsive element in a reversed orientation; its β-galactosidase activity was found to be similar to that of the pB3(+) vector. Thus, these results further confirm that the M. leprae 18 kDa transcription repression responsive element could regulate the heterologous BCG hsp65 promoter and that the element could act as an operator for transcription in mycobacteria.