• Title/Summary/Keyword: Multiple regression model

Application of discrete Weibull regression model with multiple imputation

  • Yoo, Hanna
    • Communications for Statistical Applications and Methods / v.26 no.3 / pp.325-336 / 2019
  • In this article we extend the discrete Weibull regression model to the setting with missing data. Discrete Weibull regression models can be adapted to various types of dispersion data; however, they are not widely used. Recently, Yoo (Journal of the Korean Data and Information Science Society, 30, 11-22, 2019) applied the discrete Weibull regression model using single imputation. We extend that study by using multiple imputation under several settings and compare the results. The purpose of this study is to show the merit of multiple imputation for discrete count data with missing values. We analyzed the seventh Korean National Health and Nutrition Examination Survey (KNHANES VII) from 2016 to assess the factors influencing the variable "1-month hospital stay," and we compared the results of the discrete Weibull regression model with those of the Poisson, negative binomial, and zero-inflated Poisson regression models, which are widely used in count data analysis. The results showed that the discrete Weibull regression model with multiple imputation provided the best fit. We also performed simulation studies to show the accuracy of discrete Weibull regression with multiple imputation under both under- and over-dispersed distributions, as well as varying missing rates and sample sizes. Sensitivity analysis showed the influence of misspecification and the robustness of the discrete Weibull model. Using imputation with discrete Weibull regression to analyze discrete data will increase explanatory power and is widely applicable to various types of dispersion data within a unified model.
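
For reference, the type I (Nakagawa-Osaki) discrete Weibull distribution has pmf P(Y = y) = q^(y^β) - q^((y+1)^β) for y = 0, 1, 2, ..., with 0 < q < 1 and β > 0; the shape β is what lets a single family cover both under- and over-dispersed counts. The sketch below is a minimal illustration, not the article's implementation: it assumes the common regression link log(-log q) = xᵀθ with a shared β, and it assumes Rubin's rules for pooling estimates across the m imputed data sets, since the abstract does not state the pooling procedure. All function names are hypothetical.

```python
import math
from statistics import fmean, variance
from typing import Sequence, Tuple


def dw_pmf(y: int, q: float, beta: float) -> float:
    """P(Y = y) = q**(y**beta) - q**((y + 1)**beta), y = 0, 1, 2, ... (type I discrete Weibull)."""
    if y < 0:
        return 0.0
    return q ** (y ** beta) - q ** ((y + 1) ** beta)


def q_from_covariates(x: Sequence[float], theta: Sequence[float]) -> float:
    """Assumed link log(-log q) = x'theta, i.e. q = exp(-exp(x'theta))."""
    eta = sum(xi * ti for xi, ti in zip(x, theta))
    return math.exp(-math.exp(eta))


def dw_loglik(ys: Sequence[int], xs: Sequence[Sequence[float]],
              theta: Sequence[float], beta: float) -> float:
    """Log-likelihood of a discrete Weibull regression with a common shape beta."""
    return sum(math.log(dw_pmf(y, q_from_covariates(x, theta), beta))
               for y, x in zip(ys, xs))


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float]:
    """Rubin's rules: pooled estimate and total variance across m imputed data sets."""
    m = len(estimates)
    q_bar = fmean(estimates)      # average of the m point estimates
    w = fmean(variances)          # within-imputation variance
    b = variance(estimates)       # between-imputation variance (n - 1 denominator)
    return q_bar, w + (1.0 + 1.0 / m) * b
```

In use, each imputed data set would be fitted by maximizing `dw_loglik` over (θ, β) with a numerical optimizer, and each coefficient's m estimates and squared standard errors passed to `pool_rubin`. With β = 1 the pmf reduces to the geometric distribution, q^y (1 - q), which is a quick sanity check.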

A Study on Stochastic Estimation of Monthly Runoff by Multiple Regression Analysis (다중회귀분석에 의한 하천 월 유출량의 추계학적 추정에 관한 연구)

  • 김태철;정하우
    • Magazine of the Korean Society of Agricultural Engineers / v.22 no.3 / pp.75-87 / 1980
  • Most hydrologic phenomena are complex, organic products of multiple causes such as climatic and hydro-geological factors. A significant correlation with runoff in a river basin can be expected in advance, and the effect of each causal and associated factor (independent variables: present-month rainfall, previous-month runoff, evapotranspiration, relative humidity, etc.) on present-month runoff (dependent variable) may be determined by multiple regression analysis. The functions relating the independent and dependent variables should be refitted repeatedly until a satisfactory, optimal combination of independent variables is obtained. The reliability of the estimated function should be tested against statistical criteria such as analysis of variance, the coefficient of determination, and significance tests of the regression coefficients before the first estimated multiple regression model in historical sequence is adopted. Even so, some error remains between observed and estimated runoff. The error arises because the model is an inadequate description of the system and because the record represents only a sample from the population of monthly discharge observations, so estimates of the model parameters are subject to sampling error. Since this error, a deviation from the multiple regression plane, cannot be explained by the first estimated multiple regression equation, it can be treated as a random error governed by chance. This variance left unexplained by the multiple regression equation can be handled by a stochastic approach: the random error is simulated by multiplying a random normal variate by the standard error of estimate. Finally, a hybrid model for estimating monthly runoff in nonhistorical sequence is obtained by combining the deterministic component (the multiple regression equation) with the stochastic component (the random errors).
Monthly runoff at Naju station in the Yong-San river basin is estimated with the multiple regression model and the hybrid model. Observed and estimated runoff are compared, and the multiple regression model is compared with existing estimation methods such as the Gajiyama formula, the tank model, and the Thomas-Fiering model. The results are as follows. (1) The optimal function for estimating monthly runoff in historical sequence is a multiple linear regression equation fitted over all months: $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10$. About 85% of the total variance of monthly runoff is explained by this equation, and its coefficient of determination ($R^2$) is 0.843. This means that monthly runoff in historical sequence can be estimated with high significance from a short observation record using this equation. (2) The optimal function for estimating monthly runoff in nonhistorical sequence is a hybrid model that combines the overall-month multiple linear regression equation with a stochastic component: $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10 + S_y \cdot t$, where $S_y$ is the standard error of estimate and $t$ is a random normal variate. The remaining 15% of the variance of monthly runoff is accounted for by adding the stochastic process, and somewhat more reliable statistical characteristics of monthly runoff in nonhistorical sequence are obtained. Because of the random component, the estimated monthly runoff in nonhistorical sequence contains extreme values (maxima and minima) that do not appear in the observed runoff. (3) The "frequency best fit coefficient" ($R^2_f$) of the multiple linear regression equation is 0.847, the same value as that of the Gajiyama formula. This implies that the multiple linear regression equation and the Gajiyama formula are theoretically reasonable functions.
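
The hybrid model adds a random term to the fitted regression. A minimal simulation sketch, assuming P_n is present-month rainfall and E_n is evapotranspiration (our reading of the variable list above), that t is drawn independently each month from the standard normal, and that S_y is supplied by the user because its value is not reported in the abstract. The function and argument names are hypothetical; only the four coefficients come from the study.

```python
import random
from typing import List, Optional, Sequence

# Overall-month regression coefficients reported in the abstract.
B_P, B_Q, B_E, B_0 = 0.788, 0.130, -0.273, -0.10


def simulate_runoff(rain: Sequence[float], evap: Sequence[float], q0: float,
                    s_y: float = 0.0,
                    rng: Optional[random.Random] = None) -> List[float]:
    """Q_n = 0.788 P_n + 0.130 Q_{n-1} - 0.273 E_n - 0.10 + S_y * t.

    s_y = 0 gives the deterministic regression alone; s_y > 0 adds
    S_y * t with t ~ N(0, 1) each month (the hybrid model).
    q0 is the runoff of the month preceding the first input month.
    """
    rng = rng or random.Random()
    q_prev = q0
    out: List[float] = []
    for p_n, e_n in zip(rain, evap):
        q_n = B_P * p_n + B_Q * q_prev + B_E * e_n + B_0
        if s_y > 0:
            q_n += s_y * rng.gauss(0.0, 1.0)   # stochastic component S_y * t
        out.append(q_n)
        q_prev = q_n   # simulated value becomes next month's Q_{n-1}
    return out
```

Feeding each simulated value back as Q_{n-1} is what produces a synthetic (nonhistorical) sequence; to reproduce fitted values for a historical record, one would instead plug in the observed previous-month runoff. The sketch does not clip negative runoff.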

Comparison of Genetic Parameter Estimates of Total Sperm Cells of Boars between Random Regression and Multiple Trait Animal Models

  • Oh, S.-H.;See, M.T.
    • Asian-Australasian Journal of Animal Sciences / v.21 no.7 / pp.923-927 / 2008
  • The objective of this study was to compare random regression model and multiple trait animal model estimates of the (co)variance of total sperm cells over the active lifetime of AI boars. Data were provided by Smithfield Premium Genetics (Rose Hill, NC). The total numbers of records and animals for the random regression model were 19,629 and 1,736, respectively. Data for the multiple trait animal model analyses were edited to include only records produced at 9, 12, 15, 18, 21, 24, and 27 months of age. For the multiple trait method, estimates of genetic and residual variance for total sperm cells were heterogeneous among age classes. Heritability estimates from the multiple trait method and from random regression were similar except for total sperm cells at 24 months of age. The multiple trait method also gave higher heritability estimates for total sperm cells at every age than random regression. Random regression analysis provided more detail about changes in variance components with age. Random regression methods are the most appropriate for analyzing semen traits, as these are longitudinal data measured over the lifetime of boars.
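
Random regression models usually express how a variance component changes with age by regressing on orthogonal polynomials of standardized age; the variance at age t is then φ(t)ᵀKφ(t), where K is the covariance matrix of the random regression coefficients and φ(t) the basis evaluated at t. The abstract does not state the basis or its order, so the quadratic Legendre basis, the standardization over 9 to 27 months, the constant residual variance, and any K supplied to these hypothetical functions are illustrative assumptions only.

```python
from typing import List, Optional, Sequence

AGE_MIN, AGE_MAX = 9.0, 27.0   # months; span of the multiple-trait age classes


def legendre_basis(age: float) -> List[float]:
    """Legendre polynomials P0, P1, P2 evaluated at age rescaled to [-1, 1]."""
    x = 2.0 * (age - AGE_MIN) / (AGE_MAX - AGE_MIN) - 1.0
    return [1.0, x, 0.5 * (3.0 * x * x - 1.0)]


def variance_at_age(age: float, k: Sequence[Sequence[float]]) -> float:
    """phi(t)' K phi(t): variance at one age implied by the coefficient covariance K."""
    phi = legendre_basis(age)
    n = len(phi)
    return sum(phi[i] * k[i][j] * phi[j] for i in range(n) for j in range(n))


def heritability_at_age(age: float, k_genetic: Sequence[Sequence[float]],
                        residual_var: float,
                        k_pe: Optional[Sequence[Sequence[float]]] = None) -> float:
    """h2(t) = genetic / (genetic + permanent environment + residual) variance."""
    g = variance_at_age(age, k_genetic)
    pe = variance_at_age(age, k_pe) if k_pe is not None else 0.0
    return g / (g + pe + residual_var)
```

Evaluating `heritability_at_age` on a grid of ages gives the continuous heritability curve that random regression provides, in contrast to the seven point estimates of the multiple trait model.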

Water Demand Forecasting by Characteristics of City Using Principal Component and Cluster Analyses

  • Choi, Tae-Ho;Kwon, O-Eun;Koo, Ja-Yong
    • Environmental Engineering Research / v.15 no.3 / pp.135-140 / 2010
  • Because each city has different urban characteristics, the existing water demand prediction, which uses average liters per capita per day, cannot produce accurate predictions, as it fails to consider several variables. This study therefore considered social and industrial factors of 164 local cities, in addition to population and other directly influential factors, and used principal component and cluster analyses to develop a more efficient water demand prediction model that reflects the characteristics of each locality. After clustering, multiple regression models were developed: the $R^2$ value of the inclusive multiple regression model was 0.59, whereas those of the models for Clusters A and B were 0.62 and 0.74, respectively. Thus, the cluster-specific multiple regression models were considered more reasonable and valid than the inclusive multiple regression model. In summary, the water demand prediction model that uses principal component and cluster analyses to classify localities achieves a higher coefficient of determination than the inclusive multiple regression model, which does not consider localities.
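
The comparison above amounts to fitting one inclusive regression and one regression per cluster, then comparing R² = 1 - SS_res/SS_tot on the fitting data. A dependency-free sketch of that bookkeeping, assuming cluster labels have already been assigned by the principal component and cluster analyses; the function names, the normal-equation solver, and the data layout are hypothetical and are not the authors' code.

```python
from typing import Dict, List, Sequence


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients (intercept first) from the normal equations."""
    rows = [[1.0, *x] for x in X]                 # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):                          # Gauss-Jordan, partial pivoting
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def r_squared(X: Sequence[Sequence[float]], y: Sequence[float],
              beta: Sequence[float]) -> float:
    """Coefficient of determination of the fitted model on (X, y)."""
    preds = [beta[0] + sum(b * xi for b, xi in zip(beta[1:], x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def pooled_vs_clustered(X: Sequence[Sequence[float]], y: Sequence[float],
                        labels: Sequence[str]) -> Dict[str, float]:
    """R^2 of the inclusive model ("pooled") and of a separate model per cluster."""
    result = {"pooled": r_squared(X, y, ols(X, y))}
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        Xc, yc = [X[i] for i in idx], [y[i] for i in idx]
        result[c] = r_squared(Xc, yc, ols(Xc, yc))
    return result
```

Note that in-sample R² from subgroup fits tends to be at least as favorable as a pooled fit when the clusters follow different relationships, so a held-out check would strengthen the comparison.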

Prediction of Pitting Corrosion Characteristics of AL-6XN Steel with Sensitization and Environmental Variables Using Multiple Linear Regression Method (다중선형회귀법을 활용한 예민화와 환경변수에 따른 AL-6XN강의 공식특성 예측)

  • Jung, Kwang-Hu;Kim, Seong-Jong
    • Corrosion Science and Technology / v.19 no.6 / pp.302-309 / 2020
  • This study aimed to predict the pitting corrosion characteristics of AL-6XN super-austenitic steel using multiple linear regression. The model variables are degree of sensitization, temperature, and pH. Experiments were designed, and cyclic polarization curve tests were conducted accordingly. The data obtained from these tests were used as training data for the multiple linear regression model, and the significance of each factor for the responses (critical pitting potential and repassivation potential) was analyzed. The model was validated using experimental conditions that were not included in the training data. The degree of sensitization had a greater effect than the other variables. Multiple linear regression performed poorly in predicting repassivation potential; on the other hand, it showed considerable predictive performance for critical pitting potential. The coefficient of determination ($R^2$) was 0.7745. The feasibility of predicting pitting potential with multiple linear regression was thus confirmed.
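
The abstract does not say whether the reported R² was computed on the training data or on the held-out validation conditions. If computed on held-out data, one common definition is 1 - SS_res/SS_tot with the mean taken over the validation responses; unlike training R², it can fall below zero. A minimal sketch of that definition (the function name is hypothetical):

```python
from typing import Sequence


def holdout_r2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot on validation data.

    SS_tot uses the mean of the validation observations, so a model that
    predicts worse than that mean scores below zero.
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

A poor held-out score for one response (such as repassivation potential) alongside a reasonable score for another is exactly the pattern this metric can reveal.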

Bayesian Estimation for the Multiple Regression with Censored Data: Multivariate Normal Error Terms

  • Yoon, Yong-Hwa
    • Journal of the Korean Data and Information Science Society / v.9 no.2 / pp.165-172 / 1998
  • This paper considers a linear regression model with censored data in which each error term follows a multivariate normal distribution. We use a diffuse prior distribution for the parameters of the linear regression model. With censored data, we derive the full conditional densities for the parameters of the multiple regression model in order to obtain the marginal posterior densities of the relevant parameters through the Gibbs sampler, which was proposed by Geman and Geman (1984) and applied from a statistical viewpoint by Gelfand and Smith (1990).
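
The data-augmentation idea behind such a sampler fits in a few lines. This is a simplified stand-in, not the paper's derivation: it assumes independent normal errors (the paper treats multivariate normal error terms), a single predictor with an intercept, right-censoring at known points, and the diffuse prior p(β, σ²) ∝ 1/σ². Each sweep imputes the censored responses from truncated normals, draws β from its normal full conditional N(β̂, σ²(XᵀX)⁻¹), and draws σ² from an inverse-gamma. Function names are hypothetical.

```python
import math
import random
from statistics import NormalDist
from typing import List, Sequence, Tuple

STD_NORMAL = NormalDist()   # standard normal, used for CDF inversion


def draw_truncated_normal(mu: float, sigma: float, lower: float,
                          rng: random.Random) -> float:
    """Draw from N(mu, sigma^2) restricted to [lower, inf) by inverting the CDF."""
    lo = STD_NORMAL.cdf((lower - mu) / sigma)
    u = lo + (1.0 - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # inv_cdf needs 0 < u < 1 (crude far in the tail)
    return mu + sigma * STD_NORMAL.inv_cdf(u)


def gibbs_censored_regression(x: Sequence[float], y: Sequence[float],
                              censored: Sequence[bool], n_iter: int = 2000,
                              burn_in: int = 500, seed: int = 0
                              ) -> List[Tuple[float, float, float]]:
    """Gibbs sampler for y = b0 + b1*x + e, e ~ N(0, s2), with right-censored y.

    If censored[i] is True, y[i] is the censoring point and the latent
    response is known only to be >= y[i]. Returns post-burn-in (b0, b1, s2).
    """
    rng = random.Random(seed)
    n = len(x)
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]   # (X'X)^{-1}
    b0, b1, s2 = 0.0, 0.0, 1.0
    z = list(y)                                            # latent complete data
    draws = []
    for it in range(n_iter):
        # 1) Impute censored responses from their truncated-normal full conditionals.
        s = math.sqrt(s2)
        for i in range(n):
            if censored[i]:
                z[i] = draw_truncated_normal(b0 + b1 * x[i], s, y[i], rng)
        # 2) Draw (b0, b1) | z, s2 ~ N(beta_hat, s2 (X'X)^{-1}) via a 2x2 Cholesky factor.
        sy, sxy = sum(z), sum(xi * zi for xi, zi in zip(x, z))
        m0 = inv[0][0] * sy + inv[0][1] * sxy
        m1 = inv[1][0] * sy + inv[1][1] * sxy
        l11 = math.sqrt(s2 * inv[0][0])
        l21 = s2 * inv[1][0] / l11
        l22 = math.sqrt(s2 * inv[1][1] - l21 * l21)
        e0, e1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b0, b1 = m0 + l11 * e0, m1 + l21 * e0 + l22 * e1
        # 3) Draw s2 | z, b ~ Inverse-Gamma(n/2, SS/2) as 1 / Gamma(shape n/2, scale 2/SS).
        ss = sum((zi - b0 - b1 * xi) ** 2 for xi, zi in zip(x, z))
        s2 = 1.0 / rng.gammavariate(n / 2.0, 2.0 / ss)
        if it >= burn_in:
            draws.append((b0, b1, s2))
    return draws
```

Averaging the returned draws gives posterior means; their empirical quantiles give credible intervals for the marginal posteriors.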

Effect of Soil Factors on Vegetation Values of Salt Marsh Plant Communities: Multiple Regression Model

  • Ihm, Byung-Sun;Lee, Jeom-Sook;Kim, Jong-Wook;Kim, Joon-Ho
    • Journal of Ecology and Environment / v.29 no.4 / pp.361-364 / 2006
  • The objective of this study was to characterize and apply a multiple regression model relating soil variables to the vegetation values of plant species in salt marshes. For each salt marsh community, vegetation and soil variables were investigated on the western and southern coasts of South Korea. As independent variables, soil osmotic potential and soil $Cl^-$ content had positive and negative influences on vegetation values. The multiple regression model showed that the vegetation values of 14 coastal plant communities were determined by soil pH, soil osmotic potential, and sand content. The multiple regression equation may be used, together with existing ordination plots, to explain the distribution and abundance of plant communities.

Development and Evaluation of Simple Regression Model and Multiple Regression Model for TOC Concentration Estimation in Stream Flow (하천수내 TOC 농도 추정을 위한 단순회귀모형과 다중회귀모형의 개발과 평가)

  • Jung, Jaewoon;Cho, Sohyun;Choi, Jinhee;Kim, Kapsoon;Jung, Soojung;Lim, Byungjin
    • Journal of Korean Society on Water Environment / v.29 no.5 / pp.625-629 / 2013
  • The objective of this study is to develop and evaluate simple and multiple regression models for estimating Total Organic Carbon (TOC) concentration in stream flow. We used water quality data from the downstream Yeongsan River basin, with 2012 data for model development and 2011 data for evaluation, and conducted correlation analysis between TOC and other water quality parameters. TOC concentrations were positively correlated with Chemical Oxygen Demand (COD), Biochemical Oxygen Demand (BOD), Total Nitrogen (TN), Water Temperature (WT), and Electric Conductivity (EC). From these results, regression models for TOC estimation were developed, including the simple regression models $TOC = 0.5809 \times BOD + 3.1557$ and $TOC = 0.4365 \times COD + 1.3731$. In the evaluation, the multiple regression model estimated TOC better than the simple regression models.
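
The two reported simple regression equations can be applied directly. A minimal sketch that evaluates them and scores predictions against measured TOC with root-mean-square error; RMSE is an assumed choice, since the abstract does not name the criterion used in the 2011 evaluation, and the multiple regression equation itself is not given in the abstract, so it is not reproduced. Function names are hypothetical; inputs are in the units of the original data.

```python
import math
from typing import Sequence


def toc_from_bod(bod: float) -> float:
    """Reported simple regression: TOC = 0.5809 * BOD + 3.1557."""
    return 0.5809 * bod + 3.1557


def toc_from_cod(cod: float) -> float:
    """Reported simple regression: TOC = 0.4365 * COD + 1.3731."""
    return 0.4365 * cod + 1.3731


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between measured and estimated TOC."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
```

For example, a BOD of 2.0 gives an estimated TOC of 0.5809 × 2.0 + 3.1557 = 4.3175.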

The Development of the DEA-AR Model using Multiple Regression Analysis and Efficiency Evaluation of Regional Corporation in Korea (다중회귀분석을 이용한 DEA-AR 모형 개발 및 국내 지방공사의 효율성 평가)

  • Sim, Gwang-Sic;Kim, Jae-Yun
    • Journal of the Korean Operations Research and Management Science Society / v.37 no.1 / pp.29-43 / 2012
  • We design a DEA-AR model based on multiple regression analysis, with a new method of restricting the weights. The model applies when there are multiple input variables and a single output variable, and the weights of the input variables are derived from the regression coefficients and the coefficient of determination. To verify the effectiveness of the new model, we evaluate the efficiency of regional corporations in Korea. Statistical analysis showed no difference between the efficiency values of the AHP-based DEA-AR model and those of our DEA-AR model. Our model could therefore replace AHP-based DEA-AR models in a wide range of future research.

An Approach to Applying Multiple Linear Regression Models by Interlacing Data in Classifying Similar Software

  • Lim, Hyun-il
    • Journal of Information Processing Systems / v.18 no.2 / pp.268-281 / 2022
  • The development of information technology is bringing many changes to everyday life, and machine learning can be used to solve a wide range of real-world problems. Analysis and utilization of data are essential processes in applying machine learning to real-world problems. As a method of processing data in machine learning, we propose applying multiple linear regression models to interlaced data for the task of classifying similar software. Linear regression is widely used in estimation problems to model the relationship between input and output data. In our approach, multiple linear regression models are generated by training on interlaced feature data, and a combination of these models is then used as the prediction model for classifying similar software. Experiments were performed to compare the proposed approach with conventional linear regression, and the results show that the proposed method classifies similar software more accurately than the conventional model. We anticipate that the proposed approach can be applied to various kinds of classification problems to improve on the accuracy of conventional linear regression.