• Title/Abstract/Keywords: data sampling

Search results: 5,007 items (processing time: 0.033 s)

Heterogeneous Ensemble of Classifiers from Under-Sampled and Over-Sampled Data for Imbalanced Data

  • Kang, Dae-Ki;Han, Min-gyu
    • International journal of advanced smart convergence
    • /
    • Vol. 8, No. 1
    • /
    • pp.75-81
    • /
    • 2019
  • Data imbalance is a common problem that seriously degrades the machine learning process. Sampling is one of the effective methods for addressing it. Over-sampling increases the number of instances, so on imbalanced data it is applied to the minority class. Under-sampling reduces the number of instances and is usually performed on the majority class. We apply under-sampling and over-sampling to imbalanced data to generate sampled data sets. From these sampled data sets and the original data set, we construct a heterogeneous ensemble of classifiers, applying five different algorithms to the ensemble. Experimental results on an intrusion detection dataset, used as an imbalanced dataset, show that our approach is effective.
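
The two sampling steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes plain random under-sampling and random over-sampling with replacement, and the names `random_under_sample`, `random_over_sample`, and `majority_vote` are hypothetical. The abstract does not name the five algorithms or the combination rule, so plurality voting is also an assumption.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Group feature rows by their class label."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return groups


def random_under_sample(X, y, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    pairs = [(f, label) for label, rows in groups.items()
             for f in rng.sample(rows, target)]
    rng.shuffle(pairs)
    return [f for f, _ in pairs], [label for _, label in pairs]


def random_over_sample(X, y, seed=0):
    """Randomly duplicate rows until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    pairs = []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        pairs.extend((f, label) for f in rows + extra)
    rng.shuffle(pairs)
    return [f for f, _ in pairs], [label for _, label in pairs]


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by plurality (assumed rule)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy imbalanced data: 10 majority rows (label 0) and 2 minority rows (label 1).
X = [[i] for i in range(12)]
y = [0] * 10 + [1] * 2
X_under, y_under = random_under_sample(X, y)
X_over, y_over = random_over_sample(X, y)
```

Each classifier in the ensemble would then be trained on one of the original, under-sampled, or over-sampled sets, with `majority_vote` combining their predictions.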

An improved linear sampled-data output regulator

  • 정선태
    • Institute of Control, Robotics and Systems (ICROS): Conference Proceedings
    • /
    • ICROS 1997 Korea Automatic Control Conference Proceedings; KEPCO Seoul Training Center; 17-18 Oct. 1997
    • /
    • pp.1726-1729
    • /
    • 1997
  • In general, the solvability of the linear robust output regulation problem is not preserved under time-sampling. Consequently, a digital regulator obtained by time-sampling an analog output regulator designed from the continuous-time linear system model is no more than a first-order approximation with respect to the sampling time. However, an improved sampled-data regulator with respect to the sampling time can be designed by exploiting the intrinsic structure of the system. In this paper, we study the system structures for which such an improved sampled-data regulator can be designed.

Fast Volume Visualization Techniques for Ultrasound Data

  • Kwon Koo-Joo;Shin Byeong-Seok
    • Korean Society of Medical and Biological Engineering: Journal of Biomedical Engineering Research
    • /
    • Vol. 27, No. 1
    • /
    • pp.6-13
    • /
    • 2006
  • Ultrasound visualization is a typical diagnostic method for examining organs, soft tissues, and fetuses. Ultrasound data are difficult to visualize because their quality may be degraded by artifacts and speckle noise and because they are gathered with non-linear sampling. Rendering is also slow because additional data structures or procedures cannot be used in the rendering stage. In this paper, we use several visualization methods for fast rendering of ultrasound data. The first method, adaptive ray sampling, reduces the number of samples by adjusting the sampling interval in empty space. Second, we use an early ray termination scheme with a sufficiently wide sampling interval and a low opacity threshold during color compositing. Lastly, we use bilinear interpolation instead of trilinear interpolation for sampling in transparent regions. We conclude that our method reduces rendering time without loss of image quality compared with conventional methods.
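
The first two techniques in this abstract can be illustrated on a single ray. The sketch below is an assumption-laden simplification, not the paper's renderer: it uses standard front-to-back alpha compositing, treats a sample with zero opacity as empty space, and jumps ahead by a fixed stride there. The function name `composite_ray` and the parameters `opacity_threshold` and `empty_step` are hypothetical, and a real renderer would need a safeguard against stepping over thin structures.

```python
def composite_ray(samples, opacity_threshold=0.95, empty_step=4):
    """Composite (color, alpha) samples front to back along one ray.

    Returns (color, accumulated_alpha, samples_read).
    - Adaptive sampling (simplified): stride `empty_step` while alpha is zero.
    - Early ray termination: stop once accumulated opacity reaches the threshold.
    """
    color, alpha_acc, reads = 0.0, 0.0, 0
    i = 0
    while i < len(samples):
        c, a = samples[i]
        reads += 1
        if a == 0.0:
            i += empty_step  # empty space: take a wide step
            continue
        # Front-to-back "over" operator.
        color += (1.0 - alpha_acc) * a * c
        alpha_acc += (1.0 - alpha_acc) * a
        if alpha_acc >= opacity_threshold:
            break  # remaining samples can no longer change the pixel noticeably
        i += 1
    return color, alpha_acc, reads


# Toy ray: 8 empty samples followed by 10 semi-opaque samples.
ray = [(0.0, 0.0)] * 8 + [(1.0, 0.6)] * 10
pixel_color, pixel_alpha, samples_read = composite_ray(ray)
```

On this toy ray only 6 of the 18 samples are read: two strides through the empty region and four compositing steps before the opacity threshold is reached.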

Output regulation of nonlinear sampled-data systems

  • 정선태
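    • Institute of Control, Robotics and Systems (ICROS): Conference Proceedings
    • /
    • ICROS 1996 Korea Automatic Control Conference Proceedings (Domestic Volume); Pohang University of Science and Technology, Pohang; 24-26 Oct. 1996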
    • /
    • pp.391-394
    • /
    • 1996
  • The effects of time-sampling on the nonlinear output regulation problem are investigated. Output regulatedness is preserved under time-sampling, as in linear systems; however, output regulatability is not robust with respect to time-sampling, so one needs to seek an approximate nonlinear sampled-data output regulator.

Scheduling algorithm of data sampling times in the real-time distributed control systems

  • Hong, Seung-Ho
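    • Institute of Control, Robotics and Systems (ICROS): Conference Proceedings
    • /
    • ICROS 1992 Korea Automatic Control Conference Proceedings (International Volume); KOEX, Seoul; 19-21 Oct. 1992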
    • /
    • pp.112-117
    • /
    • 1992
  • Real-time Distributed Control Systems (RDCS) consist of several distributed control processes that share a network medium to exchange their data. The performance of feedback control loops in an RDCS is subject to the network-induced delays from sensor to controller and from controller to actuator. These delays depend directly on the data sampling times of the control components that share the network medium. In this study, a scheduling algorithm for determining data sampling times is developed using the window concept, in which the sampled data from the control components dynamically share a limited number of windows.

SPATIAL AND TEMPORAL INFLUENCES ON SOIL MOISTURE ESTIMATION

  • Kim, Gwang-seob
    • Water Engineering Research
    • /
    • Vol. 3, No. 1
    • /
    • pp.31-44
    • /
    • 2002
  • The effects of the diurnal cycle, intermittent visits of the observation satellite, sensor installation, partial coverage of remote sensing, and heterogeneity of soil properties and precipitation on soil moisture estimation error were analyzed to present a global sampling strategy for soil moisture. Three models were used to construct sufficient two-dimensional soil moisture data under different scenarios: the theoretical soil moisture model, the WGR model proposed by Waymire et al. (1984) to generate rainfall, and the Turning Band Method to generate two-dimensional fields of soil porosity, active soil depth, and loss coefficient. The sampling error is dominated by the sampling interval and the design scheme. The effect of heterogeneity of soil properties and rainfall on sampling error is smaller than that of temporal and spatial gaps. Selecting a small sampling interval can dramatically reduce the sampling error generated by other factors such as heterogeneity of rainfall, soil properties, topography, and climatic conditions. If the annual mean coverage is about 90%, the effect of partial coverage on sampling error can be disregarded. The water retention capacity of a field strongly affects the sampling error: the smaller the water retention capacity (small soil porosity and thin active soil depth), the greater the sampling error. These results indicate that the sampling error is very sensitive to water retention capacity. Block random installation of soil moisture gauges yields more accurate data than random installation. The Walnut Gulch soil moisture data show that the diurnal variation of soil moisture causes a sampling error of between 1% and 4% in daily estimation.

A Comparison of Systematic Sampling Designs for Forest Inventory

  • Yim, Jong Su;Kleinn, Christoph;Kim, Sung Ho;Jeong, Jin-Hyun;Shin, Man Yong
    • Journal of Korean Society of Forest Science
    • /
    • Vol. 98, No. 2
    • /
    • pp.133-141
    • /
    • 2009
  • This study was conducted to support the selection of a statistically efficient sampling design for forest resource assessments in South Korea. To this end, different systematic sampling designs were simulated and compared on an artificial forest population built from field sample data and satellite data in Yang-Pyeong County, Korea. Using the k-NN technique, two thematic maps (growing stock and forest cover type per pixel unit) were generated across the test area; field data (n=191) and Landsat ETM+ were used as source data. Four sampling designs (systematic sampling, systematic sampling for post-stratification, systematic cluster sampling, and stratified systematic sampling) were employed as candidates for the optimum sampling design. Monte Carlo simulation (k=1,000) was used to compute the error variance, and sampling error and relative efficiency were then compared. When the objective of the inventory was to obtain estimates for the entire population, systematic cluster sampling was superior to the other sampling designs. When the objective was to obtain estimates for each sub-population, post-stratification gave better estimates. To perform this procedure successfully, clear definitions of the strata of interest per field observation unit are required for efficient stratification.
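
The simulation idea in this abstract, repeatedly drawing a systematic sample and measuring how far its estimate falls from the known population value, can be sketched in a few lines. This is a one-dimensional toy under stated assumptions, not the study's spatial designs: the population is a plain list, the estimator is the sample mean, and the names `systematic_sample` and `monte_carlo_error_variance` are hypothetical.

```python
import random
import statistics


def systematic_sample(population, interval, start):
    """Take every `interval`-th unit, beginning at index `start` (0 <= start < interval)."""
    return population[start::interval]


def monte_carlo_error_variance(population, interval, runs=1000, seed=0):
    """Estimate the error variance of the sample mean over random starting points."""
    rng = random.Random(seed)
    true_mean = statistics.mean(population)
    squared_errors = []
    for _ in range(runs):
        start = rng.randrange(interval)  # random start defines one systematic sample
        estimate = statistics.mean(systematic_sample(population, interval, start))
        squared_errors.append((estimate - true_mean) ** 2)
    return statistics.mean(squared_errors)


# Toy population with a linear trend; sample every 10th unit.
population = list(range(100))
one_sample = systematic_sample(population, interval=10, start=3)
error_variance = monte_carlo_error_variance(population, interval=10)
```

Relative efficiency between two designs could then be expressed as the ratio of their error variances computed this way on the same population.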

Comparison of two sampling intervals and three sampling intervals VSI charts for monitoring both means and variances

  • Chang, Duk-Joon
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 26, No. 4
    • /
    • pp.997-1006
    • /
    • 2015
  • In industrial quality control, engineers using a VSI control procedure should consider both the time required to signal and the switching behavior when the production process has changed. To date, many researchers have studied fixed sampling interval (FSI) charts and variable sampling interval (VSI) charts in terms of the average number of samples to signal (ANSS) and the average time to signal (ATS). However, ANSS and ATS provide no information about switching between the different sampling intervals of VSI schemes. In this study, the performance of VSI charts with two sampling intervals and with three sampling intervals is evaluated and compared. The numerical results show that the ANSS and ATS values of the two charts are similar regardless of the amount of shift. However, in terms of switching behavior, including ANSW, the three-interval VSI chart is less efficient than the two-interval VSI chart.
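
A variable sampling interval rule and a switch count can be sketched as below. This is an illustration under assumptions, not the paper's charts: it uses a single standardized statistic with symmetric limits, the thresholds and waiting times are made-up values, and the names `next_sampling_interval` and `count_switches` are hypothetical. The abstract does not expand ANSW; the sketch takes it to be a count of interval switches.

```python
def next_sampling_interval(z, limits, intervals, control_limit=3.0):
    """Return the waiting time before the next sample, or None if the chart signals.

    limits:    ascending warning thresholds that split the in-control region
    intervals: one waiting time per region, longest first (len(limits) + 1 values)
    """
    magnitude = abs(z)
    if magnitude > control_limit:
        return None  # out-of-control signal
    for threshold, interval in zip(limits, intervals):
        if magnitude <= threshold:
            return interval
    return intervals[-1]  # closest to the control limit: sample again soonest


def count_switches(stats, limits, intervals, control_limit=3.0):
    """Count changes of sampling interval along a run, stopping at the first signal."""
    switches, previous = 0, None
    for z in stats:
        current = next_sampling_interval(z, limits, intervals, control_limit)
        if current is None:
            break
        if previous is not None and current != previous:
            switches += 1
        previous = current
    return switches


# Hypothetical two-interval and three-interval schemes.
two = dict(limits=[1.0], intervals=[1.9, 0.1])
three = dict(limits=[0.7, 1.5], intervals=[1.9, 1.0, 0.1])
run = [0.2, 1.2, 0.3, 2.0, 3.4]
switches_two = count_switches(run, **two)
switches_three = count_switches(run, **three)
```

Averaging `count_switches` over many simulated runs would give a switching measure to compare alongside ANSS and ATS.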

Introduction to the Strategic Sampling Approaches to Construct Optimal Conceptual Model of a Contaminated Site

  • 박현지;김한석;윤성택;조호영;권만재
    • Korean Society of Soil and Groundwater Environment: Journal of Soil and Groundwater Environment
    • /
    • Vol. 25, No. 2_spc
    • /
    • pp.28-54
    • /
    • 2020
  • Although a systematic sampling approach is crucial in both the general and detailed investigation phases for producing the best conceptual site model of a contaminated site, the concept is not yet established in South Korea. The U.S. Environmental Protection Agency (EPA) issued the 'Strategic Sampling Approaches Technical guide' in 2018 to help environmental professionals choose which sampling approaches may be needed and most effective for given site conditions. The EPA guide broadly defines strategic sampling as the application of focused data collection across targeted areas of the conceptual site model (CSM) to provide the appropriate amount and type of information needed for decision-making. These strategic sampling approaches can prevent essential data from being missed, minimize project uncertainty, and secure the data necessary for important site decisions. Furthermore, they provide collaborative data sets throughout the life cycle phases of a project, which can yield stronger evidence for site decisions. The strategic sampling approaches are divided according to site conditions. The technical guide groups them into eight conditions: high-resolution site characterization in unconsolidated environments, high-resolution site characterization in fractured sedimentary rock environments, incremental sampling, contaminant source definition, passive groundwater sampling, passive sampling for surface water and sediment, groundwater to surface water interaction, and vapor intrusion. This commentary paper introduces the specific sampling methods that apply under each site condition when the strategic sampling approaches are used.

A New Statistical Sampling Method for Reducing Computing Time of Machine Learning Algorithms

  • 전성해
    • Journal of Korean Institute of Intelligent Systems
    • /
    • Vol. 21, No. 2
    • /
    • pp.171-177
    • /
    • 2011
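  • In machine learning, model accuracy and computing time are both important. In general, the computing time needed to build a model grows in proportion to the size of the data used in the analysis. A sampling strategy that reduces the data size is therefore needed to shorten computing time. However, when the training data become smaller, the accuracy of the resulting model also drops. To address this problem, this paper proposes a new statistical sampling method that maintains model performance similar to that obtained from the full data without analyzing the full data. We present criteria for selecting the best statistical sampling technique according to the structure of the given data. Using statistical sampling techniques based on cluster, stratified, and systematic sampling, we show how to shorten computing time while preserving accuracy as much as possible. To evaluate the proposed method, we compared accuracy and computing time between the full data and the sampled data using objective machine learning data sets.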
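
Of the three techniques this abstract names, stratified sampling is sketched below as one way to shrink a training set while keeping its class proportions. This is a generic illustration, not the paper's proposed method or selection criteria: proportional allocation is assumed, and the function name `stratified_sample` and its parameters are hypothetical.

```python
import random
from collections import Counter


def stratified_sample(rows, key, fraction, seed=0):
    """Draw about `fraction` of the rows from each stratum (proportional allocation).

    key: function mapping a row to its stratum, for example its class label.
    Every stratum contributes at least one row so that none disappears.
    """
    rng = random.Random(seed)
    strata = {}
    for row in rows:
        strata.setdefault(key(row), []).append(row)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))  # sampling without replacement
    return sample


# Toy data: 80 rows of class "a" and 20 rows of class "b"; keep 10%.
rows = [("a", i) for i in range(80)] + [("b", i) for i in range(20)]
subset = stratified_sample(rows, key=lambda row: row[0], fraction=0.1)
subset_counts = Counter(label for label, _ in subset)
```

The 10-row subset keeps the 80:20 class ratio of the full data, which is the property that lets a model trained on it approximate one trained on everything.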