• Title/Summary/Keyword: microarray expression data


Comparison of clustering methods of microarray gene expression data (마이크로어레이 유전자 발현 자료에 대한 군집 방법 비교)

  • Lim, Jin-Soo;Lim, Dong-Hoon
    • Journal of the Korean Data and Information Science Society / v.23 no.1 / pp.39-51 / 2012
  • Cluster analysis has proven to be a useful tool for investigating the association structure among genes and samples in a microarray data set. We applied several cluster validation measures to evaluate the performance of clustering algorithms for analyzing microarray gene expression data, including hierarchical clustering, K-means, PAM, SOM and model-based clustering. The available validation measures fall into three general categories: internal, stability and biological. The performance of the clustering algorithms is evaluated using simulated data and the SRBCT microarray data. On the simulated data, nearly every method performed well, recovering the same number of clusters as the number of classes in the original data. For the SRBCT data, the best choice for the number of clusters is less clear than for the simulated data. Under the Silhouette width (an internal measure), PAM, SOM and model-based clustering gave results similar to those on the simulated data, as did PAM and model-based clustering under the biological measures, while model-based clustering achieved the best value of the stability measure.
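  The Silhouette width mentioned above is an internal validation measure: for each item it compares the mean distance to its own cluster, a(i), with the smallest mean distance to any other cluster, b(i), as s(i) = (b(i) − a(i)) / max(a(i), b(i)). The sketch below is a minimal pure-Python illustration, assuming Euclidean distance between expression profiles; the function names are hypothetical and not taken from the paper, which may have used a different implementation.

```python
import math


def silhouette_widths(points, labels):
    """Return s(i) for every point; requires at least two distinct clusters.

    points: list of equal-length numeric tuples (expression profiles).
    labels: list of cluster labels, one per point.
    Points that are alone in their cluster get s(i) = 0 by convention.
    """
    clusters = sorted(set(labels))
    widths = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)
            continue
        # a(i): mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / len(own)
        # b(i): smallest mean distance to the members of another cluster.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def average_silhouette(points, labels):
    """Mean Silhouette width; higher values indicate better-separated clusters."""
    widths = silhouette_widths(points, labels)
    return sum(widths) / len(widths)


if __name__ == "__main__":
    profiles = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(average_silhouette(profiles, [0, 0, 1, 1]))  # well-separated partition
    print(average_silhouette(profiles, [0, 1, 0, 1]))  # poor partition
```

  Comparing the average Silhouette width across candidate partitions (for example, across different numbers of clusters) is one way to pick the number of clusters; the stability and biological measures discussed in the abstract are not covered by this sketch.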

Performance of the Agilent Microarray Platform for One-color Analysis of Gene Expression

  • Song Sunny;Lucas Anne;D'Andrade Petula;Visitacion Marc;Tangvoranuntakul Pam;Fulmer-Smentek Stephanie
    • Proceedings of the Korean Society for Bioinformatics Conference / 2006.02a / pp.78-78 / 2006
  • Gene expression analysis can be performed on one-color (intensity-based) or two-color (ratio-based) microarray platforms, depending on the specific applications and needs of the researcher. The traditional two-color approach is well founded from a historical and scientific standpoint, and the one-color approach, when paired with high-quality microarrays and a robust workflow, offers additional flexibility in experimental design. Two of the major requirements of any microarray platform are system reproducibility, which enables high-confidence experiments and accurate comparison across multiple samples, and high sensitivity, which enables detection of significant gene expression changes, including small fold changes, across multiple gene sets. The data included in this study illustrate that the Agilent One-color Gene Expression Platform fulfills each of these requirements. As a result, researchers can take advantage of the enhanced performance and sensitivity of Agilent's 60-mer oligonucleotide microarrays and use the first commercial microarray platform compatible with both one- and two-color detection.
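  The difference between the two designs can be seen in how a fold change is obtained. In a two-color experiment both samples share one array and each spot directly yields a log ratio; in a one-color experiment each sample is hybridized to its own array and the ratio is formed afterwards from two intensities. The sketch below assumes intensities that are already background-corrected and normalized, and the 1.5-fold cutoff is a hypothetical example threshold, not a value from the study.

```python
import math


def two_color_log_ratio(cy5, cy3):
    """Ratio-based: both samples on one array, so each spot gives log2(Cy5/Cy3)."""
    return math.log2(cy5 / cy3)


def one_color_log_ratio(intensity_a, intensity_b):
    """Intensity-based: one sample per array; compare the two intensities afterwards."""
    return math.log2(intensity_a) - math.log2(intensity_b)


def is_changed(log_ratio, min_fold=1.5):
    """Flag a gene whose fold change is at least min_fold in either direction."""
    return abs(log_ratio) >= math.log2(min_fold)


if __name__ == "__main__":
    print(two_color_log_ratio(800.0, 400.0))  # 2-fold up on a shared array
    print(one_color_log_ratio(800.0, 400.0))  # same comparison from two arrays
    print(is_changed(0.3))                    # about 1.23-fold: below the cutoff
```

  Because the one-color comparison combines measurements from different arrays, its usefulness depends on the array-to-array reproducibility that the abstract emphasizes.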


PathTalk: Interpretation of Microarray Gene-Expression Clusters in Association with Biological Pathways

  • Chung, Tae-Su;Chung, Hee-Joon;Kim, Ju-Han
    • Genomics & Informatics / v.5 no.3 / pp.124-128 / 2007
  • Microarray technology enables us to measure the expression of tens of thousands of genes simultaneously under various experimental conditions. Clustering analysis, which rests on the assumption that co-expressed genes may be co-regulated, is one of the most successful methods for analyzing microarray data. It is important to extract meaningful clusters from a long, unordered list of clusters and to evaluate their functional homogeneity and heterogeneity. Many quality measures for clustering results have been proposed under different conditions. In the present study, we treat biological pathways as a collection of biological knowledge and use them as a reference for measuring the quality and functional homogeneity of clustering results. PathTalk visualizes and evaluates functional relationships between gene clusters and biological pathways.
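  One common way to quantify the association between a gene cluster and a pathway is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) on their overlap. The abstract does not specify PathTalk's scoring, so the sketch below is an assumed, generic illustration rather than PathTalk's method; the gene names and function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def pathway_enrichment(cluster, pathway, universe):
    """Return (overlap, p-value) for a cluster against one pathway.

    universe: all genes measured on the array; cluster and pathway
    are restricted to it so the test counts are consistent.
    """
    universe = set(universe)
    cluster = set(cluster) & universe
    pathway = set(pathway) & universe
    overlap = len(cluster & pathway)
    p = hypergeom_sf(overlap, len(universe), len(pathway), len(cluster))
    return overlap, p


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(20)]
    pathway = genes[:5]
    print(pathway_enrichment(genes[:5], pathway, genes))    # perfect overlap
    print(pathway_enrichment(genes[10:15], pathway, genes))  # no overlap
```

  Scoring every cluster against every pathway this way yields a cluster-by-pathway matrix of p-values, which could then be visualized; multiple-testing correction would be needed when many pairs are tested.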

Ensemble Gene Selection Method Based on Multiple Tree Models

  • Mingzhu Lou
    • Journal of Information Processing Systems / v.19 no.5 / pp.652-662 / 2023
  • Identifying highly discriminating genes is a critical step in tumor recognition tasks based on microarray gene expression profile data and machine learning. Gene selection based on tree models has been the subject of several studies. However, these methods rely on a single tree model, which is often not robust on ultra-high-dimensional microarray datasets, resulting in the loss of useful information and unsatisfactory classification accuracy. Motivated by these limitations of single-tree-based gene selection, this study investigated ensemble gene selection methods based on multiple tree models to improve the classification performance of tumor identification. Specifically, we selected the three most representative tree models: ID3, random forest, and gradient boosting decision tree. Each tree model selects the top-n genes from the microarray dataset based on its intrinsic mechanism. Three ensemble gene selection methods were then investigated, namely multiple-tree model intersection, multiple-tree model union, and multiple-tree model cross-union. Experimental results on five public benchmark microarray gene expression datasets showed that the multiple-tree model union is significantly superior in classification accuracy to gene selection based on a single tree model and to other competitive gene selection methods.
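  The intersection and union strategies described above can be sketched as set operations over each model's top-n genes. In the sketch below the importance scores are hypothetical placeholders; in practice they might come from, for example, the `feature_importances_` attribute of scikit-learn's tree ensembles. The cross-union variant is not defined in the abstract, so it is deliberately left out.

```python
def top_n(scores, n):
    """Return the n gene names with the highest importance scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def ensemble_select(score_tables, n, mode="union"):
    """Combine the top-n gene sets chosen by several tree models.

    score_tables: one dict per model, mapping gene name -> importance.
    mode: "intersection" keeps genes every model selected;
          "union" keeps genes selected by at least one model.
    """
    selections = [top_n(scores, n) for scores in score_tables]
    if mode == "intersection":
        return set.intersection(*selections)
    if mode == "union":
        return set.union(*selections)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Hypothetical importance scores from three tree models.
    id3 = {"g1": 0.9, "g2": 0.5, "g3": 0.1, "g4": 0.0}
    rf = {"g1": 0.4, "g3": 0.3, "g2": 0.2, "g4": 0.1}
    gbdt = {"g4": 0.8, "g1": 0.6, "g2": 0.1, "g3": 0.0}
    print(ensemble_select([id3, rf, gbdt], 2, "intersection"))
    print(sorted(ensemble_select([id3, rf, gbdt], 2, "union")))
```

  The intersection keeps only genes on which all models agree, so it tends to be small; the union retains every gene any model found useful, which is consistent with the abstract's point that single-tree selection can lose useful information.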

Cancer-Subtype Classification Based on Gene Expression Data (유전자 발현 데이터를 이용한 암의 유형 분류 기법)

  • Cho Ji-Hoon;Lee Dongkwon;Lee Min-Young;Lee In-Beum
    • Journal of Institute of Control, Robotics and Systems / v.10 no.12 / pp.1172-1180 / 2004
  • Recently, gene expression data produced by high-throughput technologies have become widely available, and studies related to them (so-called bioinformatics) have come to occupy an important position in biological and medical research. The microarray is a revolutionary technology that enables us to monitor several thousand genes simultaneously and thus gain insight into phenomena in the human body (e.g., the mechanism of cancer progression) at the molecular level. To obtain useful information from such gene expression measurements, it is essential to analyze the data with appropriate techniques. However, the high dimensionality of the data can cause problems such as the curse of dimensionality and singular matrices in computation, which makes it difficult to apply conventional data analysis methods. Therefore, developing methods that can handle such data effectively has become a challenging issue in computational biology. This research focuses on gene selection and classification for cancer-subtype discrimination based on gene expression (microarray) data.
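  Because there are far more genes than samples, a common first step is to rank genes by a univariate score and keep only the top few before training a classifier. The sketch below uses a signal-to-noise score, (μ₀ − μ₁) / (σ₀ + σ₁), as one standard example of such a filter; it is an assumed illustration, not necessarily the selection method used in this paper, and the 0/1 class labels and gene names are hypothetical.

```python
import math
from statistics import mean, stdev


def signal_to_noise(a, b):
    """(mean(a) - mean(b)) / (sd(a) + sd(b)); each class needs >= 2 samples."""
    diff = mean(a) - mean(b)
    denom = stdev(a) + stdev(b)
    if denom == 0:
        # Constant within both classes: perfectly separated if the means differ.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / denom


def rank_genes(expr, labels, top_k):
    """Rank genes by |signal-to-noise| between class 0 and class 1.

    expr: dict mapping gene -> list of expression values, one per sample.
    labels: list of 0/1 class labels aligned with the samples.
    Returns (top_k gene names, dict of signed scores).
    """
    scores = {}
    for gene, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = signal_to_noise(a, b)
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return ranked[:top_k], scores


if __name__ == "__main__":
    expr = {
        "up": [1.0, 1.2, 0.9, 5.0, 5.2, 4.9],    # higher in class 1
        "flat": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],  # no clear class difference
    }
    top, scores = rank_genes(expr, [0, 0, 0, 1, 1, 1], top_k=1)
    print(top, scores)
```

  Keeping a small number of top-ranked genes reduces the dimension to well below the number of samples, which avoids the singular covariance matrices that the abstract mentions for methods such as linear discriminant analysis.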

Statistical Analysis about Ability to Mouse Embryonic Stem Cell Differentiation using cDNA Microarray

  • Choi, Hang-Suk;Kim, Sung-Ju;Lee, Young-Jin;Cha, Kyung-Joon;Kim, Chul-Geun
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.951-958 / 2005
  • As a foundational study for applied stem cell research, it is necessary to profile large-scale gene expression with cDNA microarrays in order to understand the molecular principles of cell function. In this paper, we investigated gene expression using K-means clustering and path analysis, focusing on genes related to pluripotency and differentiation during early mouse embryonic development and embryonic stem cell differentiation. Through this study we observed several biological phenomena, and we found that this process can suggest functional relationships for genes of unknown function.
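  K-means clustering, as used above, alternates two steps: assign each gene's expression profile to its nearest cluster center, then move each center to the mean of its assigned profiles, until the assignments stop changing. The sketch below is a minimal pure-Python version of this standard (Lloyd-style) procedure, assuming Euclidean distance; the initial centers are passed in explicitly (random genes are a common choice), and the function name is hypothetical.

```python
import math


def kmeans(points, init_centers, max_iter=100):
    """Lloyd-style K-means on expression profiles.

    points: list of equal-length numeric tuples (one profile per gene).
    init_centers: list of k starting centers.
    Returns (labels, centers) where labels[i] is the cluster of points[i].
    """
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    profiles = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centers = kmeans(profiles, init_centers=[(0, 0), (10, 10)])
    print(labels)
    print(centers)
```

  K-means results depend on the starting centers, so in practice it is usually run from several initializations and the partition with the lowest within-cluster sum of squares is kept.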


A Report on the Inter-Gene Correlations in cDNA Microarray Data Sets (cDNA 마이크로어레이에서 유전자간 상관 관계에 대한 보고)

  • Kim, Byung-Soo;Jang, Jee-Sun;Kim, Sang-Cheol;Lim, Jo-Han
    • The Korean Journal of Applied Statistics / v.22 no.3 / pp.617-626 / 2009
  • A series of recent papers reported that inter-gene correlations in Affymetrix microarray data sets are strong and long-ranged, and that the assumption of independence or weak dependence among gene expression signals, which was often adopted without justification, conflicts with actual data. Qiu et al. (2005) showed that applying the nonparametric empirical Bayes method, in which test statistics are pooled across genes for statistical inference, resulted in a large variance in the number of differentially expressed genes, and they attributed this effect to strong and long-ranged inter-gene correlations. Klebanov and Yakovlev (2007) demonstrated that inter-gene correlations are a rich source of information rather than a nuisance in statistical analysis, and, by transforming the original gene expression sequence, they constructed a sequence of independent random variables that they called the δ-sequence. In this report, using two cDNA microarray data sets generated in Korea, we note that strong and long-ranged inter-gene correlations are also present in cDNA microarray data and that the δ-sequence of independent variables can likewise be derived from cDNA microarray data. This suggests that inter-gene correlations should be considered in future analyses of cDNA microarray data sets.
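  A first check of the independence assumption discussed above is to compute the Pearson correlation between every pair of genes across arrays and look at how many pairs are strongly correlated. The sketch below does this in pure Python; the 0.5 threshold and gene names are hypothetical, and the δ-sequence construction of Klebanov and Yakovlev is not reproduced here because the abstract does not give its details.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_gene_correlations(expr):
    """expr maps gene -> expression values over the same arrays."""
    return {(g, h): pearson(expr[g], expr[h]) for g, h in combinations(sorted(expr), 2)}


def strong_pair_fraction(corrs, threshold=0.5):
    """Fraction of gene pairs whose |r| is at least the threshold."""
    return sum(abs(r) >= threshold for r in corrs.values()) / len(corrs)


if __name__ == "__main__":
    expr = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]}
    corrs = pairwise_gene_correlations(expr)
    print(corrs)
    print(strong_pair_fraction(corrs))
```

  With only a few arrays, sample correlations spread widely even for independent genes, so the observed distribution is best compared against a null, for example one obtained by permuting each gene's values independently across arrays.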

Identification of Differentially Expressed Genes in the Dicer 1 Knock-down Mouse Embryos using Microarray

  • Lee, Jae-Dal;Cui, Xiang-Shun
    • Reproductive and Developmental Biology / v.32 no.4 / pp.229-235 / 2008
  • Silencing of Dicer1 by siRNA did not inhibit development up to the blastocyst stage, but decreased the expression of selected transcription factors, including Oct-4, Sox2 and Nanog, suggesting that Dicer1 gene expression is associated with differentiation processes at the blastocyst stage (Cui et al., 2007). To gain insight into genes that may be linked with the microRNA system, we compared gene expression profiles of Gapdh and Dicer1 siRNA-microinjected blastocysts using Applied Biosystems microarray technology. Our data showed that 397 and 737 of 16,354 genes were up- and down-regulated, respectively, following siRNA microinjection (p<0.05), including 24 up- and 28 down-regulated transcription factors. Identifying genes that are preferentially expressed in Dicer1 knock-down embryos provides insight into the complex gene regulatory networks that drive differentiation processes in blastocyst-stage embryos.
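  Calling a gene up- or down-regulated at p<0.05, as in the abstract, requires a per-gene test comparing control (Gapdh siRNA) and treated (Dicer1 siRNA) replicates. The abstract does not name the test used, so the sketch below substitutes an exact two-sided permutation test on the difference in means as an assumed, illustrative choice; the replicate values and function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(control, treated):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(control) + list(treated)
    n_treated = len(treated)
    observed = abs(mean(treated) - mean(control))
    hits = total = 0
    # Enumerate every way of relabelling n_treated of the pooled samples as "treated".
    for idx in combinations(range(len(pooled)), n_treated):
        chosen = set(idx)
        t = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(t) - mean(c)) >= observed - 1e-12:
            hits += 1
    return hits / total


def call_direction(control, treated, alpha=0.05):
    """Label a gene "up", "down", or "unchanged" in the treated group."""
    if permutation_p_value(control, treated) >= alpha:
        return "unchanged"
    return "up" if mean(treated) > mean(control) else "down"


if __name__ == "__main__":
    control = [1.0, 1.1, 0.9, 1.05]  # hypothetical log-expression, 4 replicates
    treated = [3.0, 3.2, 2.9, 3.1]
    print(permutation_p_value(control, treated))  # 2 of 70 relabellings
    print(call_direction(control, treated))
```

  An exact permutation test can only reach p-values as small as its number of relabellings allows: with 4 versus 4 replicates the minimum two-sided p-value is 2/70 (about 0.029), while with 3 versus 3 it is 2/20 = 0.1, so p<0.05 would be unreachable.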

Expression Profiles of Streptomyces Doxorubicin Biosynthetic Gene Cluster Using DNA Microarray System (DNA Microarray 시스템을 이용한 방선균 독소루비신 생합성 유전자군의 발현패턴 분석)

  • Kang Seung-Hoon;Kim Myung-Gun;Park Hyun-Joo;Kim Eung-Soo
    • KSBB Journal / v.20 no.3 / pp.220-227 / 2005
  • Doxorubicin is an anthracycline-family polyketide compound with very potent anticancer activity, typically produced by Streptomyces peucetius. To identify biosynthetic genes potentially critical for doxorubicin overproduction, a doxorubicin-specific DNA microarray chip was fabricated and applied to reveal the growth-phase-dependent expression profiles of biosynthetic genes in two doxorubicin-overproducing strains along with the wild-type strain. The two doxorubicin-overproducing S. peucetius strains were generated by over-expressing dnrI (a doxorubicin-specific positive regulatory gene) and doxA (a gene involved in the conversion of daunorubicin to doxorubicin), using a Streptomyces high-expression vector containing the strong ermE promoter. Each overproducing strain was quantitatively compared with the wild-type doxorubicin producer in terms of both growth-phase-dependent doxorubicin productivity and doxorubicin biosynthetic gene expression profiles. The doxorubicin-specific DNA microarray data revealed that early and steady expression of the doxorubicin-specific regulatory gene (dnrI), the doxorubicin resistance genes (drrA, drrB, drrC), and the doxorubicin deoxysugar biosynthetic gene (dnmL) is critical for doxorubicin overproduction in S. peucetius. These results suggest that the relationship between growth-phase-dependent doxorubicin productivity and doxorubicin biosynthetic gene expression profiles could guide the rational design of molecular genetic strain-improvement strategies.

Gene Expression Data Analysis Using Seed Clustering (시드 클러스터링 방법에 의한 유전자 발현 데이터 분석)

  • Shin Myoung
    • Journal of the Institute of Electronics Engineers of Korea CI / v.42 no.1 / pp.1-7 / 2005
  • Cluster analysis of microarray data has often been used to find biologically relevant groups of genes based on their expression levels. Since many functionally related genes tend to be co-expressed, identifying groups of genes with similar expression profiles allows the functions of unknown genes to be inferred from those of known genes in the same group. In this paper, we present a novel clustering approach, called seed clustering, and investigate its applicability to microarray data analysis. In the seed clustering method, seed genes are first extracted by computational analysis of their expression profiles, and clusters are then generated by taking the seed genes as prototype vectors for the target clusters. Because it rests on strong mathematical foundations, the seed clustering method produces stable and consistent results in a systematic way. Our empirical results also indicate that the automatically extracted seed genes are good representatives of the potential clusters hidden in the data and that the method's performance compares favorably with current approaches.
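  The two-stage structure described above (extract seed genes, then use them as cluster prototypes) can be sketched as follows. The abstract does not say how seeds are extracted, so the farthest-first selection in `pick_seeds` is a hypothetical stand-in, not the paper's method; only the second stage, assigning each gene to its nearest seed prototype, follows the description directly. Euclidean distance and the gene names are assumptions.

```python
import math


def pick_seeds(profiles, k):
    """HYPOTHETICAL seed extraction: farthest-first traversal.

    Starts from the first gene, then repeatedly adds the gene whose
    nearest already-chosen seed is farthest away. Deterministic, so
    repeated runs give the same seeds.
    """
    names = list(profiles)
    seeds = [names[0]]
    while len(seeds) < k:
        farthest = max(
            (g for g in names if g not in seeds),
            key=lambda g: min(math.dist(profiles[g], profiles[s]) for s in seeds),
        )
        seeds.append(farthest)
    return seeds


def seed_clusters(profiles, seeds):
    """Assign every gene to the cluster of its nearest seed prototype."""
    clusters = {s: [] for s in seeds}
    for gene, x in profiles.items():
        nearest = min(seeds, key=lambda s: math.dist(x, profiles[s]))
        clusters[nearest].append(gene)
    return clusters


if __name__ == "__main__":
    profiles = {
        "g1": (0, 0), "g2": (0, 1), "g3": (10, 10),
        "g4": (10, 11), "g5": (0, 0.5),
    }
    seeds = pick_seeds(profiles, 2)
    print(seeds)
    print(seed_clusters(profiles, seeds))
```

  Because no random initialization is involved, this sketch returns the same clusters on every run, which loosely mirrors the stability property claimed for seed clustering; the quality of the result still depends entirely on how representative the chosen seeds are.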