• Title/Summary/Keyword: microarray expression data

Statistical Methods for Gene Expression Data

  • Kim, Choongrak
    • Communications for Statistical Applications and Methods / v.11 no.1 / pp.59-77 / 2004
  • Since the introduction of the DNA microarray, a revolutionary high-throughput biological technology, many papers have been published on the analysis of gene expression data from microarrays. In this paper we review most of the papers relevant to cDNA microarray data, classify them from the standpoint of statistical methodology, and present some statistical methods that deserve consideration and further study.

A Method for Microarray Data Analysis based on Bayesian Networks using an Efficient Structural learning Algorithm and Data Dimensionality Reduction (효율적 구조 학습 알고리즘과 데이타 차원축소를 통한 베이지안망 기반의 마이크로어레이 데이타 분석법)

  • 황규백;장정호;장병탁
    • Journal of KIISE:Software and Applications / v.29 no.11 / pp.775-784 / 2002
  • Microarray data, obtained with DNA chip technologies, measure the expression levels of thousands of genes in cells or tissues. They are used for gene function prediction and for cancer diagnosis based on gene expression patterns. Among the diverse methods for analyzing such data, the Bayesian network represents the relationships among data attributes as a graph structure. This property lets us discover various relations among genes and tissue characteristics (e.g., the cancer type) through microarray data analysis. However, most current microarray data sets are so sparse that general analysis methods, including Bayesian networks, are difficult to apply directly. In this paper, we combine an efficient structural learning algorithm with data dimensionality reduction to analyze microarray data using Bayesian networks. The proposed method was applied to a real microarray data set, the NCI60 data set, and its usefulness was evaluated by how accurately the learned Bayesian networks represent known biological facts.
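
The abstract does not say how the dimensionality reduction is performed. One common preprocessing route for Bayesian network analysis of expression data works in two steps. First, discretize each value into under-expressed, baseline, or over-expressed states. Second, keep only the genes most informative about the tissue class. The sketch below shows that route under stated assumptions: the thresholds (±0.5 on a log-ratio scale) and the function names `discretize`, `mutual_information`, and `select_genes` are hypothetical and are not taken from the paper. The structure-learning step itself is not shown.

```python
import numpy as np

# Hypothetical thresholds on log-ratio expression values (an assumption, not from the paper).
LOW, HIGH = -0.5, 0.5


def discretize(expr):
    """Map values to 0 (under-expressed), 1 (baseline) or 2 (over-expressed)."""
    return np.digitize(expr, [LOW, HIGH])


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))
            if p_xy > 0:
                p_x, p_y = np.mean(x == xv), np.mean(y == yv)
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi


def select_genes(expr, labels, k):
    """Return indices of the k genes (columns) most informative about the class labels.

    expr is a samples x genes matrix; the reduced matrix expr[:, idx] would then be
    passed to a Bayesian network structure learner (not shown here).
    """
    states = discretize(expr)
    scores = np.array([mutual_information(states[:, j], labels)
                       for j in range(states.shape[1])])
    return np.argsort(scores)[::-1][:k]


# Toy usage: gene 0 separates the two classes, gene 1 is flat.
expr = np.array([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]])
labels = np.array([1, 1, 0, 0])
print(select_genes(expr, labels, k=1))
```

Mutual information is used here only because it ranks genes by class relevance without assuming linearity. Variance filtering or clustering-based reduction would be equally plausible readings of the abstract.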

Microarray Data Sharing System (마이크로어레이 데이터 공유 시스템)

  • Yoon, Jee-Hee;Hong, Dong-Wan;Lee, Jong-Keun
    • The Journal of the Korea Contents Association / v.9 no.8 / pp.18-31 / 2009
  • The improved reliability and reproducibility of microarray data have recently increased the demand for data sharing and reuse among laboratories. However, in-house and publicly available microarray experimental data are hard to access and reuse because they come in heterogeneous formats that depend on the experimental method and microarray platform. In this paper, we propose a microarray data sharing method that can easily retrieve and integrate microarray data across different experimental platforms, data formats, normalization methods, and analysis methods. Our system is based on web-service technology. Biologists at each site can search a UDDI (Universal Description, Discovery, and Integration) registry and download microarray data in a common data structure that follows the standard format recommended by the MGED (Microarray Gene Expression Data) Society. The common data structure defined in this paper consists of the IDF (Investigation Description Format), ADF (Array Design Format), SDRF (Sample and Data Relationship Format), and EDF (Expression Data Format). These components serve as templates for integrating microarray data with various structures and can be stored in standard formats such as MAGE-ML, MAGE-TAB, and XML Schema. In addition, our system provides an automatic microarray data submitter and a file manager for handling local microarray data efficiently.
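
To make the SDRF component concrete, here is a minimal sketch of reading a tab-delimited MAGE-TAB SDRF table and linking each sample to its data file. It is not the authors' web-service system. The column names "Source Name" and "Array Data File" follow MAGE-TAB conventions as we understand them. The sample rows and the helper names `read_sdrf` and `samples_to_files` are hypothetical.

```python
import csv
import io


def read_sdrf(text):
    """Parse a tab-delimited MAGE-TAB SDRF table into a list of row dictionaries.

    Simplification: SDRF allows repeated column names (e.g. several "Protocol REF"
    columns); a dict keeps only the last one, so this sketch suits simple files only.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = []
    for fields in reader:
        if not any(f.strip() for f in fields):
            continue  # skip blank lines
        rows.append(dict(zip(header, fields)))
    return rows


def samples_to_files(rows, sample_col="Source Name", file_col="Array Data File"):
    """Map each sample to its raw data file, skipping rows without a file."""
    return {row[sample_col]: row[file_col] for row in rows if row.get(file_col)}


# Hypothetical two-sample SDRF fragment.
sdrf = (
    "Source Name\tCharacteristics[Organism]\tArray Data File\n"
    "S1\tHomo sapiens\ts1.CEL\n"
    "S2\tHomo sapiens\ts2.CEL\n"
)
rows = read_sdrf(sdrf)
print(samples_to_files(rows))
```

A production reader would keep duplicate columns in order, for example as a list of (header, value) pairs, because SDRF encodes the processing chain by column position.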

Quality Control Usage in High-Density Microarrays Reveals Differential Gene Expression Profiles in Ovarian Cancer

  • Villegas-Ruiz, Vanessa;Moreno, Jose;Jacome-Lopez, Karina;Zentella-Dehesa, Alejandro;Juarez-Mendez, Sergio
    • Asian Pacific Journal of Cancer Prevention / v.17 no.5 / pp.2519-2525 / 2016
  • There are many reports of microarray chips being used to assess altered gene expression in different diseases. In fact, over 1.5 million such assays have been performed over the last twenty years, influencing clinical and translational research. The most commonly used DNA microarray platform is the Affymetrix GeneChip system, with its quality control software and GeneChip Probe Arrays. These chips include several quality controls to confirm the success of each assay, but the actual impact of these controls on gene expression profiles had not been analyzed until bioinformatics tools for this purpose appeared. Here we performed a data mining analysis focused on ovarian cancer, healthy ovarian tissue, and ovarian cell lines to confirm quality control results and the associated variation in gene expression profiles. The microarray data used in our research were downloaded from ArrayExpress and Gene Expression Omnibus (GEO) and analyzed with Expression Console Software using the RMA, MAS5, and PLIER algorithms. Gene expression profiles were obtained using Partek Genomics Suite v6.6, and the data were visualized using principal component analysis, heat maps, and Venn diagrams. Microarray quality control analysis showed that roughly 40% of the microarray files were false negatives, reflecting over- and under-estimation of expressed genes. We confirmed these results with a second analysis of independent samples: about 70% of the significantly expressed genes were correlated between the two analyses. These results demonstrate the importance of appropriate microarray processing for obtaining a reliable gene expression profile.
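
RMA, one of the three algorithms named above, combines background correction, quantile normalization across arrays, and median-polish summarization of probe sets on a log2 scale. As a hedged illustration of only the normalization step, the NumPy sketch below quantile-normalizes a probes x arrays matrix. It is not Expression Console's implementation. The function name `quantile_normalize` and the toy intensities are hypothetical, and ties are broken arbitrarily by sort order.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize a probes x arrays matrix.

    Every column is given the same empirical distribution: the mean, across arrays,
    of the sorted column values. Ties are broken by sort order in this sketch.
    """
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # rank of each value within its column
    reference = np.sort(x, axis=0).mean(axis=1)        # mean of the k-th smallest values
    return reference[ranks]


intensities = np.array([[5, 4, 3],
                        [2, 1, 4],
                        [3, 4, 6],
                        [4, 2, 8]])
normalized = quantile_normalize(intensities)
# Every column now contains the values 2, 3, 14/3 and 17/3, in its original rank order.
print(normalized)
```

Forcing identical distributions removes array-wide intensity differences. This is one reason RMA and MAS5 can yield noticeably different expression profiles from the same CEL files.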

Standard-based Integration of Heterogeneous Large-scale DNA Microarray Data for Improving Reusability

  • Jung, Yong;Seo, Hwa-Jeong;Park, Yu-Rang;Kim, Ji-Hun;Bien, Sang Jay;Kim, Ju-Han
    • Genomics & Informatics / v.9 no.1 / pp.19-27 / 2011
  • Gene Expression Omnibus (GEO) holds the largest collection of gene-expression microarray data, and that collection has grown exponentially. Microarray data in GEO have been generated in many different formats and often lack standardized annotation and documentation, so it is hard to tell whether, and how, a dataset has been preprocessed. Standard-based integration of heterogeneous data formats and metadata is necessary for comprehensive data query, analysis, and mining. We attempted to integrate the heterogeneous microarray data in GEO based on the Minimum Information About a Microarray Experiment (MIAME) standard. We unified the data fields of GEO data tables and mapped the attributes of GEO metadata onto MIAME elements. We also distinguished non-preprocessed raw datasets from processed ones using a two-step classification method. Most of the procedures were developed as semi-automated algorithms incorporating some text mining techniques. We localized 2,967 Platforms, 4,867 Series, and 103,590 Samples covering 279 organisms, integrated them into a standard-based relational schema, and developed a comprehensive query interface for extracting them. Our tool, GEOQuest, is available at http://www.snubi.org/software/GEOQuest/.

Biological Pathway Extension Using Microarray Gene Expression Data

  • Chung, Tae-Su;Kim, Ji-Hun;Kim, Kee-Won;Kim, Ju-Han
    • Genomics & Informatics / v.6 no.4 / pp.202-209 / 2008
  • Biological pathways are collections of knowledge about particular biological processes. Although knowledge about a pathway is valuable for further analysis, pathways cover only a tiny portion of existing genes. In this paper, we propose a model that extends each individual pathway using microarray expression data together with existing knowledge about the pathway. We use the Rosetta compendium dataset to extend pathways of Saccharomyces cerevisiae obtained from the KEGG (Kyoto Encyclopedia of Genes and Genomes) database. Before applying our model, we verify the underlying assumption that microarray data reflect the interaction knowledge encoded in pathways, and we evaluate our scoring system by introducing a performance function. In the last step, we validate the proposed candidates with another type of biological information. The pathway extension model uses both the pathway's intrinsic structure and microarray expression data, and it provides suitable candidate genes for extending each individual biological pathway.
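
The abstract does not give the scoring function, and the authors' model also exploits the pathway's intrinsic structure. The sketch below shows only the simplest co-expression idea behind pathway extension: rank each gene outside the pathway by its mean absolute Pearson correlation with the pathway's member genes across conditions. This scoring rule, the name `rank_candidates`, and the toy matrix are hypothetical.

```python
import numpy as np


def rank_candidates(expr, pathway_idx, top=5):
    """Rank non-pathway genes by mean |Pearson r| with the pathway's member genes.

    expr: genes x conditions matrix. Genes with constant expression give NaN
    correlations and should be filtered out beforehand.
    """
    expr = np.asarray(expr, dtype=float)
    corr = np.corrcoef(expr)                # genes x genes correlation matrix
    members = np.asarray(pathway_idx)
    member_set = set(pathway_idx)
    scores = [(g, float(np.mean(np.abs(corr[g, members]))))
              for g in range(expr.shape[0]) if g not in member_set]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top]


expr = np.array([
    [1.0, 2.0, 3.0, 4.0],   # gene 0: pathway member
    [2.0, 4.0, 6.0, 8.0],   # gene 1: pathway member
    [1.0, 2.0, 3.0, 4.5],   # gene 2: co-expressed with the pathway
    [4.0, 1.0, 3.0, 2.0],   # gene 3: weakly related
])
print(rank_candidates(expr, pathway_idx=[0, 1], top=2))  # gene 2 ranks first
```

A structure-aware model could, for instance, weight members by their position in the KEGG graph rather than averaging them equally. That refinement is left out here.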

Poor Correlation Between the New Statistical and the Old Empirical Algorithms for DNA Microarray Analysis

  • Kim, Ju Han;Kuo, Winston P.;Kong, Sek-Won;Ohno-Machado, Lucila;Kohane, Isaac S.
    • Genomics & Informatics / v.1 no.2 / pp.87-93 / 2003
  • The DNA microarray is currently the most prominent tool for investigating large-scale gene expression. Different algorithms for measuring gene expression levels from scanned microarray images may significantly affect the subsequent steps of functional genomic analysis. Affymetrix® recently introduced high-density microarrays and new statistical algorithms in Microarray Suite (MAS) version 5.0®. Very high correlations (0.92-0.97) between the new algorithms and the old ones (MAS 4.0) were reported across several species and conditions. We found that the column-wise array correlations tended to be much higher than the row-wise gene correlations, which may be far more relevant to subsequent higher-order analyses such as clustering and pattern analysis. In this paper, we not only present a detailed comparison of the two sets of algorithms but also describe the impact of the new algorithms on subsequent clustering analysis of microarray data and possible pitfalls in mixing the old and new algorithms.
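
The gap between column-wise and row-wise correlations can be illustrated with simulated data, entirely synthetic and not the paper's. Genes differ widely in baseline expression, so two algorithms agree closely when whole arrays are compared. Each gene varies only slightly across arrays, so algorithm-specific noise dominates when single gene profiles are compared. The noise levels, the sizes, and the helper names `columnwise_correlations` and `rowwise_correlations` are assumptions chosen for illustration.

```python
import numpy as np


def columnwise_correlations(a, b):
    """Correlation of array j under algorithm A with array j under algorithm B."""
    return np.array([np.corrcoef(a[:, j], b[:, j])[0, 1] for j in range(a.shape[1])])


def rowwise_correlations(a, b):
    """Correlation of gene i's profile across arrays under algorithm A vs. B."""
    return np.array([np.corrcoef(a[i, :], b[i, :])[0, 1] for i in range(a.shape[0])])


# Simulated genes x arrays data: large between-gene differences in baseline level,
# small within-gene variation across arrays.
rng = np.random.default_rng(0)
n_genes, n_arrays = 1000, 10
baseline = rng.normal(8.0, 2.0, size=(n_genes, 1))
signal = baseline + rng.normal(0.0, 0.2, size=(n_genes, n_arrays))
alg_a = signal + rng.normal(0.0, 0.3, size=(n_genes, n_arrays))  # algorithm A's estimate
alg_b = signal + rng.normal(0.0, 0.3, size=(n_genes, n_arrays))  # algorithm B's estimate

print("median array-wise r:", np.median(columnwise_correlations(alg_a, alg_b)))
print("median gene-wise r: ", np.median(rowwise_correlations(alg_a, alg_b)))
```

With these settings the array-wise median should come out close to 0.98, while the gene-wise median is far lower (roughly 0.3). Clustering genes depends on exactly those gene-wise profiles.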

Gene Expression study of human chromosomal aneuploid

  • Lee Su-Man
    • Proceedings of the Korean Society for Bioinformatics Conference / 2006.02a / pp.98-107 / 2006
  • Chromosomal copy number changes (aneuploidies) are common in human populations. An extra chromosome can affect gene expression at the whole-genome level. Using gene expression microarray analysis, we sought to identify aberrant gene expression caused by aneuploidy in Klinefelter syndrome (+X) and Down syndrome (+21). We analyzed the inactivation status of X-linked genes in Klinefelter syndrome (KS) using an X-linked cDNA microarray and cSNP analysis. We analyzed the expression of 190 X-linked genes by cDNA microarray in lymphocytes from five KS patients and five females (XX), with normal males (XY) as controls. The cDNA microarray experiments and cSNP analysis showed that the differentially expressed genes were similar between the KS and XX cases. To analyze differential gene expression in Down syndrome (DS), amniotic fluid (AF) cells were collected from 12 pregnancies at 16-18 weeks of gestation, from DS (n=6) and normal (n=6) subjects. We also analyzed AF cells with a DNA microarray system and compared the chip data with two-dimensional protein gel analysis of amniotic fluid. Our data may provide a basis for more systematic identification of biological markers of fetal DS, leading to an improved understanding of the pathogenesis of fetal DS.

Network-based Microarray Data Analysis Tool

  • Park, Hee-Chang;Ryu, Ki-Hyun
    • Journal of the Korean Data and Information Science Society / v.17 no.1 / pp.53-62 / 2006
  • DNA microarrays are a new technology for investigating the expression levels of thousands of genes simultaneously. Because DNA microarray data structures are diverse and complex, the data are generally stored in databases so that they can be accessed and managed effectively. However, analyzing and managing the data is difficult when they are spread across several database management systems or stored as files. Existing analysis tools for DNA microarray data suffer from complicated instructions and from dependence on data types and operating systems. In this paper, we design and implement a network-based analysis tool for extracting useful information from DNA microarray data. With this tool, users can analyze DNA microarray data effectively without special knowledge of, or training in, data types and analytical methods.

Screening and Clustering for Time-course Yeast Microarray Gene Expression Data using Gaussian Process Regression (효모 마이크로어레이 유전자 발현데이터에 대한 가우시안 과정 회귀를 이용한 유전자 선별 및 군집화)

  • Kim, Jaehee;Kim, Taehoun
    • The Korean Journal of Applied Statistics / v.26 no.3 / pp.389-399 / 2013
  • This article introduces Gaussian process regression and shows its application to time-course microarray gene expression data. Gene screening for yeast cell cycle microarray expression data is performed using a log marginal likelihood ratio from Gaussian process regression with a squared exponential covariance kernel. A Gaussian process regression is fitted to each gene, and the fits are shown for the nine top-ranking genes. The screened data are then clustered with Gaussian model-based clustering, and silhouette values are computed to assess cluster validity.
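
The screening criterion can be sketched in NumPy. For a zero-mean GP with kernel matrix K (squared exponential plus noise), the log marginal likelihood is -1/2 yᵀK⁻¹y - 1/2 log|K| - (n/2) log 2π. A gene is scored by how much this improves over a noise-only model. This is a sketch under assumptions: the abstract does not define the exact ratio, the hyperparameters here are fixed rather than optimized, and the sampling times, example profiles, and function names are hypothetical.

```python
import numpy as np


def se_kernel(t1, t2, signal_var, length_scale):
    """Squared exponential covariance: s^2 * exp(-(t - t')^2 / (2 l^2))."""
    d = t1[:, None] - t2[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


def log_marginal_likelihood(t, y, signal_var, length_scale, noise_var):
    """log p(y | t) for a zero-mean GP with SE kernel plus Gaussian noise."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    K = noise_var * np.eye(n)
    if signal_var > 0:
        K = K + se_kernel(t, t, signal_var, length_scale)
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # alpha = K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                   # equals -0.5 * log|K|
            - 0.5 * n * np.log(2 * np.pi))


def screening_score(t, y, signal_var=1.0, length_scale=20.0, noise_var=0.1):
    """Log marginal likelihood ratio: SE-kernel GP vs. a noise-only model."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    with_signal = log_marginal_likelihood(t, y, signal_var, length_scale, noise_var)
    noise_only = log_marginal_likelihood(t, y, 0.0, length_scale, noise_var)
    return with_signal - noise_only


t = np.arange(0, 120, 10)                       # hypothetical sampling times (minutes)
periodic = 1.5 * np.sin(2 * np.pi * t / 60)     # cycling gene
flat = 0.05 * (-1.0) ** np.arange(len(t))       # essentially unchanged gene
print(screening_score(t, periodic), screening_score(t, flat))
```

Genes would then be ranked by this score, and the top-ranked ones passed on to model-based clustering. In practice the hyperparameters would be fitted per gene by maximizing the marginal likelihood.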