• Title/Summary/Keyword: Pseudo data

Regression analysis of interval censored competing risk data using a pseudo-value approach

  • Kim, Sooyeon;Kim, Yang-Jin
    • Communications for Statistical Applications and Methods, v.23 no.6, pp.555-562, 2016
  • Interval censored data often occur in observational studies where subjects are followed periodically. Instead of an exact failure time, only two inspection times that bracket it are available. There are several methods to analyze interval censored failure time data (Sun, 2006). However, in the presence of competing risks, few methods have been suggested to estimate covariate effects on interval censored competing risk data. A sub-distribution hazard model is a commonly used regression model because it has a one-to-one correspondence with a cumulative incidence function. Alternatively, Klein and Andersen (2005) proposed a pseudo-value approach that directly uses the cumulative incidence function. In this paper, we extend the pseudo-value approach to interval censored data to estimate regression coefficients. The pseudo-values generated from the estimated cumulative incidence function then become response variables in a generalized estimating equation. Simulation studies show that the suggested method performs well in several situations, and an HIV-AIDS cohort study is analyzed as a real data example.
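The pseudo-value idea in the abstract above can be illustrated with a minimal Python sketch. It uses the usual jackknife form, pseudo-value_i = n * F(t) - (n - 1) * F_without_i(t), applied to a cumulative incidence estimate F. As a loudly stated simplification, the estimator here is a plain empirical proportion that assumes fully observed failure times (no interval censoring), which is not the estimator of the paper; the function names `empirical_cif` and `pseudo_values` are hypothetical.

```python
from typing import Callable, List, Sequence


def empirical_cif(times: Sequence[float], causes: Sequence[int],
                  t: float, cause: int) -> float:
    """Empirical cumulative incidence at time t for one cause.

    Assumption: every failure time is observed exactly (no censoring).
    A real analysis would plug in an estimator suited to the censoring scheme.
    """
    hits = sum(1 for ti, ci in zip(times, causes) if ti <= t and ci == cause)
    return hits / len(times)


def pseudo_values(times: Sequence[float], causes: Sequence[int],
                  t: float, cause: int,
                  estimator: Callable[..., float] = empirical_cif) -> List[float]:
    """Jackknife pseudo-values: n * full estimate - (n - 1) * leave-one-out estimate."""
    times, causes = list(times), list(causes)
    n = len(times)
    full = estimator(times, causes, t, cause)
    values = []
    for i in range(n):
        # Re-estimate with subject i removed.
        loo = estimator(times[:i] + times[i + 1:],
                        causes[:i] + causes[i + 1:], t, cause)
        values.append(n * full - (n - 1) * loo)
    return values


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    print(pseudo_values([1.0, 2.0, 3.0, 4.0], [1, 2, 1, 1], t=2.5, cause=1))
```

With the uncensored empirical estimator, each pseudo-value reduces to the indicator that the subject failed from the given cause by time t. The construction becomes useful when censoring hides that indicator; the resulting values can then serve as responses in a regression such as a generalized estimating equation.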

Named entity recognition using transfer learning and small human- and meta-pseudo-labeled datasets

  • Kyoungman Bae;Joon-Ho Lim
    • ETRI Journal, v.46 no.1, pp.59-70, 2024
  • We introduce a high-performance named entity recognition (NER) model for written and spoken language. To overcome challenges related to labeled data scarcity and domain shifts, we use transfer learning to leverage our previously developed KorBERT as the base model. We also adopt a meta-pseudo-label method using a teacher/student framework with labeled and unlabeled data. Our model presents two modifications. First, the student model is updated with an average loss from both human- and pseudo-labeled data. Second, the influence of noisy pseudo-labeled data is mitigated by considering feedback scores and updating the teacher model only when the score is below a threshold (0.0005). We achieve the target NER performance in the spoken language domain and improve performance in the written language domain by proposing a straightforward rollback method that reverts to the best model based on scarce human-labeled data. Further improvement is achieved by adjusting the label vector weights in the named entity dictionary.
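The two modifications and the rollback step described above can be sketched as simple decision rules in Python. This is only an illustration under assumptions: the function names are hypothetical, the teacher gate is assumed to compare a scalar feedback score directly against the quoted threshold, and the rollback criterion is assumed to be a score on human-labeled data where higher is better. The abstract does not specify these details.

```python
# Illustrative sketch only; names and comparison directions are assumptions.
FEEDBACK_THRESHOLD = 0.0005  # threshold value quoted in the abstract


def student_loss(human_loss: float, pseudo_loss: float) -> float:
    """Average the losses from human-labeled and pseudo-labeled batches."""
    return 0.5 * (human_loss + pseudo_loss)


def should_update_teacher(feedback_score: float,
                          threshold: float = FEEDBACK_THRESHOLD) -> bool:
    """Gate the teacher update: update only when the score is below the threshold."""
    return feedback_score < threshold


def rollback_choice(best_score: float, current_score: float) -> str:
    """Keep the current model if it at least matches the best score so far
    on human-labeled data; otherwise revert to the best checkpoint."""
    return "keep" if current_score >= best_score else "rollback"
```

In a training loop these rules would wrap the actual optimizer steps: compute both batch losses, step the student on their average, and step the teacher only when the gate returns `True`.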

Predictive Optimization Adjusted With Pseudo Data From A Missing Data Imputation Technique (결측 데이터 보정법에 의한 의사 데이터로 조정된 예측 최적화 방법)

  • Kim, Jeong-Woo
    • Journal of the Korea Academia-Industrial cooperation Society, v.20 no.2, pp.200-209, 2019
  • When forecasting future values, a model estimated by minimizing training errors can yield test errors higher than the training errors. This is the over-fitting problem, caused by an increase in model complexity when the model is fitted only to a given dataset. Some regularization and resampling methods have been introduced to reduce test errors by alleviating this problem, but they are designed for use with only the given dataset. In this paper, we propose a new optimization approach that reduces test errors by transforming a test error minimization problem into a training error minimization problem. This transformation requires data in addition to the given dataset, termed pseudo data. To obtain suitable pseudo data, we used three types of missing data imputation techniques. As an optimization tool, we chose the least squares method and combined it with an extra pseudo data instance. We present numerical results supporting the proposed approach, which yielded lower test errors than the ordinary least squares method.

KNOTOIDS, PSEUDO KNOTOIDS, BRAIDOIDS AND PSEUDO BRAIDOIDS ON THE TORUS

  • Diamantis, Ioannis
    • Communications of the Korean Mathematical Society, v.37 no.4, pp.1221-1248, 2022
  • In this paper we study the theory of knotoids and braidoids and the theory of pseudo knotoids and pseudo braidoids on the torus T. In particular, we introduce the notion of mixed knotoids in $S^2$, which generalizes the notion of mixed links in $S^3$, and we present an isotopy theorem for mixed knotoids. We then generalize the Kauffman bracket polynomial, $\langle\,;\,\rangle$, for mixed knotoids and we present a state sum formula for $\langle\,;\,\rangle$. We also introduce the notion of mixed pseudo knotoids, that is, multi-knotoids on two components with some missing crossing information. More precisely, we present an isotopy theorem for mixed pseudo knotoids and we extend the Kauffman bracket polynomial to mixed pseudo knotoids. Finally, we introduce the theories of mixed braidoids and mixed pseudo braidoids as counterpart theories of mixed knotoids and mixed pseudo knotoids, respectively. Using the L-moves, which we also introduce here for mixed braidoid equivalence, we formulate and prove the analogue of the Alexander and the Markov theorems for mixed knotoids. We also formulate and prove the analogue of the Alexander theorem for mixed pseudo knotoids.

Performance of speech recognition unit considering morphological pronunciation variation (형태소 발음변이를 고려한 음성인식 단위의 성능)

  • Bang, Jeong-Uk;Kim, Sang-Hun;Kwon, Oh-Wook
    • Phonetics and Speech Sciences, v.10 no.4, pp.111-119, 2018
  • This paper proposes a method to improve speech recognition performance by extracting various pronunciations of pseudo-morpheme units from an eojeol-unit corpus and generating new recognition units that reflect pronunciation variations. In the proposed method, we first align the pronunciations of the eojeol units and the pseudo-morpheme units, and then expand the pronunciation dictionary by extracting new pronunciations of the pseudo-morpheme units from the pronunciations of the eojeol units. We then propose a new, pronunciation-dependent recognition unit by tagging the pseudo-morpheme units with the obtained phoneme symbols. The proposed units and their extended pronunciations are incorporated into the lexicon and language model of the speech recognizer. Performance is evaluated using a Korean speech recognizer with a trigram language model trained on a 100-million pseudo-morpheme corpus and an acoustic model trained on 445 hours of multi-genre broadcast speech data. The proposed method reduces the word error rate by a relative 13.8% on the news-genre evaluation data and by 4.5% on the total evaluation data.

Finding Pseudo Periods over Data Streams based on Multiple Hash Functions (다중 해시함수 기반 데이터 스트림에서의 아이템 의사 주기 탐사 기법)

  • Lee, Hak-Joo;Kim, Jae-Wan;Lee, Won-Suk
    • Journal of Information Technology Services, v.16 no.1, pp.73-82, 2017
  • Recently, in-memory data stream processing has been actively applied to tasks such as query processing, OLAP, and data mining (e.g., frequent item sets, association rules, and clustering). However, finding regular periodic patterns of events in an infinite data stream has received less attention. Most research on finding periods uses autocorrelation functions to detect changes in periodic patterns rather than the period itself, and it usually targets time-series databases rather than data streams. Literally, a period is the length of time after which some phenomenon recurs. In real applications, however, a data set evolves with tiny differences as time elapses. Such a period is called a pseudo-period. This paper proposes a new algorithm called FPMH (Finding Periods using Multiple Hash functions) to find a set of such pseudo-periods over a data stream based on multiple hash functions. According to the type of pseudo-period, the paper categorizes FPMH into three variants: FPMH-E, FPMH-PC, and FPMH-PP. To maximize performance in the data stream environment and to keep the most recent periodic patterns in memory, a decay mechanism is applied to the FPMH algorithms. The FPMH algorithm minimizes memory usage and processing time while maintaining acceptable accuracy.

Driving altitude generation method with pseudo-3D building model for unmanned aerial vehicles

  • Hyeon Joong Wi;In Sung Jang;Ahyun Lee
    • ETRI Journal, v.45 no.2, pp.240-253, 2023
  • Spatial information is geometrical information combined with the properties of an object. In city areas where unmanned aerial vehicle (UAV) usage demand is high, it is necessary to determine the appropriate driving altitude considering the height of buildings for safe driving. In this study, we propose a data-provision method that generates the driving altitude of UAVs with a pseudo-3D building model. The pseudo-3D building model is developed using high-precision spatial information provided by the National Geographic Information Institute. The proposed method generates the driving altitude of the UAV in terms of tile information, including the UAV's starting and arrival points and a straight line between the two points, and provides the data to users. To evaluate the efficacy of the proposed method, UAV driving altitude information was generated using data of 763 551 pseudo-3D buildings in Seoul. Subsequently, the generated driving altitude data of the UAV was verified in AirSim. In addition, the execution time of the proposed method and the calculated driving altitude were analyzed.
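One way to picture the tile-based altitude computation described above is the following minimal Python sketch. It is an assumption-laden illustration, not the method of the paper: tiles are modeled as integer grid cells, each tile stores the height of its tallest building in a hypothetical `tile_heights` dictionary, the straight line is approximated by sampling points along it, and the safety margin `margin_m` is an invented parameter.

```python
from typing import Dict, List, Tuple

Tile = Tuple[int, int]


def tiles_on_line(start: Tile, end: Tile, samples_per_tile: int = 4) -> List[Tile]:
    """Approximate the tiles crossed by the straight line from start to end
    by sampling points along the segment and rounding to the nearest tile."""
    (x0, y0), (x1, y1) = start, end
    steps = max(abs(x1 - x0), abs(y1 - y0)) * samples_per_tile
    steps = max(steps, 1)  # handle start == end
    seen: List[Tile] = []
    for k in range(steps + 1):
        f = k / steps
        tile = (round(x0 + f * (x1 - x0)), round(y0 + f * (y1 - y0)))
        if tile not in seen:
            seen.append(tile)
    return seen


def driving_altitude(tile_heights: Dict[Tile, float], start: Tile, end: Tile,
                     margin_m: float = 10.0) -> float:
    """Altitude = tallest building on the traversed tiles + safety margin.
    Tiles without buildings are treated as height 0."""
    tallest = max(tile_heights.get(t, 0.0) for t in tiles_on_line(start, end))
    return tallest + margin_m


if __name__ == "__main__":
    # Toy heights in metres, invented for illustration.
    heights = {(0, 0): 20.0, (1, 1): 55.0, (2, 2): 30.0, (5, 5): 200.0}
    print(driving_altitude(heights, (0, 0), (2, 2)))
```

Sampling can miss tiles that the line only clips at a corner; an exact grid traversal would be a more careful choice for real flight planning.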

HYPERBOLIC STRUCTURE OF POINTWISE INVERSE PSEUDO-ORBIT TRACING PROPERTY FOR $C^1$ DIFFEOMORPHISMS

  • Manseob Lee
    • Communications of the Korean Mathematical Society, v.38 no.1, pp.243-256, 2023
  • We deal with a type of inverse pseudo-orbit tracing property with respect to the class of continuous methods, as suggested and studied by Pilyugin [54]. In this paper, we consider a continuous method induced by a diffeomorphism of a compact smooth manifold, and using this concept, we prove the following: (i) If a diffeomorphism f of a compact smooth manifold M has the robustly pointwise inverse pseudo-orbit tracing property, then f is structurally stable. (ii) For a $C^1$ generic diffeomorphism f of a compact smooth manifold M, if f has the pointwise inverse pseudo-orbit tracing property, then f is structurally stable. (iii) If a diffeomorphism f has the robustly pointwise inverse pseudo-orbit tracing property around a transitive set Λ, then Λ is hyperbolic for f. Finally, (iv) $C^1$ generically, if a diffeomorphism f has the pointwise inverse pseudo-orbit tracing property around a locally maximal transitive set Λ, then Λ is hyperbolic for f. In addition, we investigate the case of volume-preserving diffeomorphisms.

Multiple imputation for competing risks survival data via pseudo-observations

  • Han, Seungbong;Andrei, Adin-Cristian;Tsui, Kam-Wah
    • Communications for Statistical Applications and Methods, v.25 no.4, pp.385-396, 2018
  • Competing risks are commonly encountered in biomedical research. Regression models for competing risks data can be developed based on data routinely collected in hospitals or general practices. However, these data sets usually contain missing covariate values. To overcome this problem, multiple imputation is often used to fit regression models under a missing at random (MAR) assumption. Here, we introduce a multivariate imputation by chained equations algorithm to deal with competing risks survival data. Using pseudo-observations, we make use of the available outcome information by accommodating the competing risk structure. Lastly, we illustrate the practical advantages of our approach using simulations and two data examples, one from coronary artery disease data and one from hepatocellular carcinoma data.