• Title/Abstract/Keywords: Pseudo data

Search results: 788 items (processing time 0.027 seconds)

Regression analysis of interval censored competing risk data using a pseudo-value approach

  • Kim, Sooyeon;Kim, Yang-Jin
    • Communications for Statistical Applications and Methods / Vol. 23, No. 6 / pp. 555-562 / 2016
  • Interval-censored data often occur in observational studies where subjects are followed periodically. Instead of an exact failure time, only the two inspection times that bracket it are available. There are several methods to analyze interval-censored failure time data (Sun, 2006). However, in the presence of competing risks, few methods have been suggested to estimate covariate effects on interval-censored competing risk data. A sub-distribution hazard model is a commonly used regression model because it has a one-to-one correspondence with the cumulative incidence function. Alternatively, Klein and Andersen (2005) proposed a pseudo-value approach that directly uses the cumulative incidence function. In this paper, we consider an extension of the pseudo-value approach to interval-censored data to estimate regression coefficients. The pseudo-values generated from the estimated cumulative incidence function then become response variables in a generalized estimating equation. Simulation studies show that the suggested method performs well in several situations, and an HIV-AIDS cohort study is analyzed as a real data example.
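The pseudo-value idea in this abstract can be illustrated with a minimal sketch. It makes a simplifying assumption: the data are right-censored rather than interval-censored. It uses the Aalen-Johansen estimator of the cumulative incidence function and forms jackknife pseudo-values, n times the full-sample estimate minus (n - 1) times the leave-one-out estimate. The function names `cumulative_incidence` and `pseudo_values` are hypothetical, and this is not the authors' implementation. The interval-censored extension and the generalized estimating equation step are not shown.

```python
import numpy as np

def cumulative_incidence(time, cause, t, target=1):
    """Aalen-Johansen estimate of P(T <= t, cause == target).

    Assumption: right-censored data, with cause == 0 meaning censored.
    """
    time = np.asarray(time, dtype=float)
    cause = np.asarray(cause)
    surv = 1.0  # overall survival just before the current event time
    cif = 0.0
    for s in np.unique(time[(cause != 0) & (time <= t)]):
        at_risk = np.sum(time >= s)
        d_target = np.sum((time == s) & (cause == target))
        d_all = np.sum((time == s) & (cause != 0))
        cif += surv * d_target / at_risk
        surv *= 1.0 - d_all / at_risk
    return cif

def pseudo_values(time, cause, t, target=1):
    """Jackknife pseudo-values: n * F_hat(t) - (n - 1) * F_hat_without_i(t)."""
    time = np.asarray(time, dtype=float)
    cause = np.asarray(cause)
    n = len(time)
    full = cumulative_incidence(time, cause, t, target)
    leave_one_out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop subject i
        leave_one_out[i] = cumulative_incidence(time[keep], cause[keep], t, target)
    return n * full - (n - 1) * leave_one_out
```

Without censoring, each pseudo-value reduces to the indicator that the subject had the target event by time t, which gives a convenient sanity check. In a full analysis these values would serve as responses in a regression model fitted at one or more time points.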

Named entity recognition using transfer learning and small human- and meta-pseudo-labeled datasets

  • Kyoungman Bae;Joon-Ho Lim
    • ETRI Journal / Vol. 46, No. 1 / pp. 59-70 / 2024
  • We introduce a high-performance named entity recognition (NER) model for written and spoken language. To overcome challenges related to labeled data scarcity and domain shifts, we use transfer learning to leverage our previously developed KorBERT as the base model. We also adopt a meta-pseudo-label method using a teacher/student framework with labeled and unlabeled data. Our model presents two modifications. First, the student model is updated with an average loss from both human- and pseudo-labeled data. Second, the influence of noisy pseudo-labeled data is mitigated by considering feedback scores and updating the teacher model only when below a threshold (0.0005). We achieve the target NER performance in the spoken language domain and improve that in the written language domain by proposing a straightforward rollback method that reverts to the best model based on scarce human-labeled data. Further improvement is achieved by adjusting the label vector weights in the named entity dictionary.
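The two modifications and the rollback step described in this abstract can be sketched in framework-free form. This is one possible reading under stated assumptions, not the authors' code: it assumes the "average loss" is an unweighted mean, and that the teacher is updated only when the feedback score itself is below the threshold. The names `student_loss`, `teacher_should_update`, and `BestCheckpoint` are hypothetical.

```python
from dataclasses import dataclass

FEEDBACK_THRESHOLD = 0.0005  # threshold value quoted in the abstract

def student_loss(human_loss: float, pseudo_loss: float) -> float:
    """Assumed form: unweighted mean of human-labeled and pseudo-labeled losses."""
    return 0.5 * (human_loss + pseudo_loss)

def teacher_should_update(feedback: float,
                          threshold: float = FEEDBACK_THRESHOLD) -> bool:
    """Assumed reading: update the teacher only if the feedback score is below the threshold."""
    return feedback < threshold

@dataclass
class BestCheckpoint:
    """Remembers the best model state seen so far on human-labeled data."""
    score: float = float("-inf")
    state: object = None

    def step(self, score: float, state: object) -> object:
        """Keep `state` if it improves the score; otherwise roll back to the best one."""
        if score > self.score:
            self.score, self.state = score, state
            return state
        return self.state
```

In a training loop, `step` would be called after each evaluation on the human-labeled set, and training would continue from whatever state it returns.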

Predictive Optimization Adjusted With Pseudo Data From A Missing Data Imputation Technique

  • 김정우
    • 한국산학기술학회논문지 / Vol. 20, No. 2 / pp. 200-209 / 2019
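  • When predicting future values, a model estimated by minimizing the training error can often incur a large test error. This is an overfitting problem caused by model complexity, which arises because the estimated model focuses only on the given data set. Some regularization and resampling methods have been introduced to mitigate this problem and reduce the test error, but these methods are also designed to be confined to the given data set. In this paper, we propose a new optimization method that reduces the test error by converting the test error minimization problem into a training error minimization problem. To perform this conversion, we add new data, called pseudo data, to the given data set. Three types of missing data imputation techniques are used to create appropriate pseudo data. Linear regression, autoregressive, and ridge regression models are used as prediction models, and the pseudo data method is applied to each of them. We also present case studies that apply the pseudo-data-adjusted optimization method to environmental and financial data. The results show that the proposed method reduces the test error compared with the original prediction models.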
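The general idea of augmenting a training set with imputed pseudo data can be sketched as follows. The abstract does not name its three imputation types, so mean imputation is used here purely as a stand-in. The helper names `fit_ridge` and `augment_with_pseudo_data` are hypothetical, and this is not the paper's algorithm.

```python
import numpy as np

def fit_ridge(X, y, lam=0.0):
    """Closed-form ridge coefficients; lam=0 gives ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def augment_with_pseudo_data(X, y, X_new):
    """Treat the responses at X_new as missing and fill them by mean imputation.

    Assumption: mean imputation stands in for the imputation methods
    used in the paper, which the abstract does not specify.
    """
    y_pseudo = np.full(len(X_new), np.mean(y))
    X_aug = np.vstack([X, X_new])
    y_aug = np.concatenate([np.asarray(y, dtype=float), y_pseudo])
    return X_aug, y_aug
```

Fitting on the augmented set pulls the coefficients toward the imputed responses, so the pseudo rows act somewhat like a data-driven regularizer.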

KNOTOIDS, PSEUDO KNOTOIDS, BRAIDOIDS AND PSEUDO BRAIDOIDS ON THE TORUS

  • Diamantis, Ioannis
    • 대한수학회논문집 / Vol. 37, No. 4 / pp. 1221-1248 / 2022
  • In this paper we study the theory of knotoids and braidoids and the theory of pseudo knotoids and pseudo braidoids on the torus T. In particular, we introduce the notion of mixed knotoids in S², that generalizes the notion of mixed links in S³, and we present an isotopy theorem for mixed knotoids. We then generalize the Kauffman bracket polynomial, ⟨ ; ⟩, for mixed knotoids and we present a state sum formula for ⟨ ; ⟩. We also introduce the notion of mixed pseudo knotoids, that is, multi-knotoids on two components with some missing crossing information. More precisely, we present an isotopy theorem for mixed pseudo knotoids and we extend the Kauffman bracket polynomial for pseudo mixed knotoids. Finally, we introduce the theories of mixed braidoids and mixed pseudo braidoids as counterpart theories of mixed knotoids and mixed pseudo knotoids, respectively. With the use of the L-moves, that we also introduce here for mixed braidoid equivalence, we formulate and prove the analogue of the Alexander and the Markov theorems for mixed knotoids. We also formulate and prove the analogue of the Alexander theorem for mixed pseudo knotoids.

Performance of speech recognition unit considering morphological pronunciation variation

  • 방정욱;김상훈;권오욱
    • 말소리와 음성과학 / Vol. 10, No. 4 / pp. 111-119 / 2018
  • This paper proposes a method to improve speech recognition performance by extracting various pronunciations of the pseudo-morpheme unit from an eojeol-unit corpus and generating a new recognition unit that accounts for pronunciation variations. In the proposed method, we first align the pronunciations of the eojeol units and the pseudo-morpheme units, and then expand the pronunciation dictionary by extracting new pronunciations of the pseudo-morpheme units from the pronunciations of the eojeol units. Then, we propose a new pronunciation-dependent recognition unit by tagging the obtained phoneme symbols according to the pseudo-morpheme units. The proposed units and their extended pronunciations are incorporated into the lexicon and language model of the speech recognizer. Performance is evaluated using a Korean speech recognizer with a trigram language model trained on a 100-million pseudo-morpheme corpus and an acoustic model trained on 445 hours of multi-genre broadcast speech data. The proposed method is shown to achieve a relative word error rate reduction of 13.8% on the news-genre evaluation data and 4.5% on the total evaluation data.

Finding Pseudo Periods over Data Streams based on Multiple Hash Functions

  • 이학주;김재완;이원석
    • 한국IT서비스학회지 / Vol. 16, No. 1 / pp. 73-82 / 2017
  • Recently, in-memory data stream processing has been actively applied to various subjects such as query processing, OLAP, and data mining (e.g., frequent item sets, association rules, and clustering). However, finding regular periodic patterns of events in an infinite data stream has received less attention. Most research on finding periods uses autocorrelation functions to find certain changes in periodic patterns, not the period itself, and usually finds periodic patterns in time-series databases, not in data streams. Literally, a period is the length of time after which some phenomenon recurs. In real applications, however, a data set evolves with tiny differences as time elapses. This kind of period is called a pseudo-period. This paper proposes a new scheme, the FPMH (Finding Periods using Multiple Hash functions) algorithm, to find such a set of pseudo-periods over a data stream based on multiple hash functions. According to the type of pseudo-period, this paper categorizes FPMH into three variants: FPMH-E, FPMH-PC, and FPMH-PP. To maximize the performance of the algorithm in the data stream environment and to keep the most recent periodic patterns in memory, we apply a decay mechanism to the FPMH algorithms. The FPMH algorithm minimizes memory usage and processing time with acceptable accuracy.
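To make the combination of multiple hash functions and decay concrete, here is a minimal illustrative sketch. It is not the FPMH algorithm or any of its variants, which the abstract does not specify. The class `PseudoPeriodSketch`, its parameters, and the voting rule are all assumptions chosen for illustration. Each item is hashed into one cell per row, each cell keeps a decayed histogram of inter-arrival gaps, and the estimate is the gap with the most decayed weight within a small tolerance.

```python
import hashlib
from collections import defaultdict

class PseudoPeriodSketch:
    """Illustrative only: decayed inter-arrival histograms in hashed cells."""

    def __init__(self, rows=3, width=64, decay=0.9, tolerance=1):
        self.rows, self.width = rows, width
        self.decay, self.tolerance = decay, tolerance
        self.last_seen = [[None] * width for _ in range(rows)]
        self.gaps = [[defaultdict(float) for _ in range(width)]
                     for _ in range(rows)]

    def _cell(self, row, item):
        # One salted hash per row plays the role of an independent hash function.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def observe(self, item, timestamp):
        for r in range(self.rows):
            c = self._cell(r, item)
            hist = self.gaps[r][c]
            for gap in hist:              # age older evidence
                hist[gap] *= self.decay
            prev = self.last_seen[r][c]
            if prev is not None:
                hist[timestamp - prev] += 1.0
            self.last_seen[r][c] = timestamp

    def estimate(self, item):
        votes = defaultdict(float)
        for r in range(self.rows):
            for gap, weight in self.gaps[r][self._cell(r, item)].items():
                votes[gap] += weight
        if not votes:
            return None
        # Gaps within `tolerance` of each other count as the same pseudo-period.
        def support(g):
            return sum(w for h, w in votes.items() if abs(h - g) <= self.tolerance)
        return max(votes, key=support)
```

For an item arriving at times 0, 10, 20, 31, 40, 50, the gaps 9, 10, and 11 fall within the tolerance of 10, so the estimate is 10. Hash collisions between items can distort the histograms, and this sketch does not bound the histogram size.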

Driving altitude generation method with pseudo-3D building model for unmanned aerial vehicles

  • Hyeon Joong Wi;In Sung Jang;Ahyun Lee
    • ETRI Journal / Vol. 45, No. 2 / pp. 240-253 / 2023
  • Spatial information is geometrical information combined with the properties of an object. In city areas where unmanned aerial vehicle (UAV) usage demand is high, it is necessary to determine the appropriate driving altitude considering the height of buildings for safe driving. In this study, we propose a data-provision method that generates the driving altitude of UAVs with a pseudo-3D building model. The pseudo-3D building model is developed using high-precision spatial information provided by the National Geographic Information Institute. The proposed method generates the driving altitude of the UAV in terms of tile information, including the UAV's starting and arrival points and a straight line between the two points, and provides the data to users. To evaluate the efficacy of the proposed method, UAV driving altitude information was generated using data of 763 551 pseudo-3D buildings in Seoul. Subsequently, the generated driving altitude data of the UAV was verified in AirSim. In addition, the execution time of the proposed method and the calculated driving altitude were analyzed.
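The tile-based altitude lookup described in this abstract can be illustrated with a minimal sketch. It assumes a square grid of tiles, each storing the height of its tallest building, and a fixed safety margin above the tallest building under the straight-line path. The function names, the sampling step, and the margin value are hypothetical and are not taken from the paper.

```python
import math

def tiles_on_segment(start, end, tile_size, step=None):
    """Tiles visited when sampling the straight line from start to end.

    Assumption: dense sampling (default step = tile_size / 4) approximates
    the set of crossed tiles; tiles only clipped at a corner may be missed.
    """
    step = step or tile_size / 4.0
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    n = max(1, math.ceil(length / step))
    tiles = []
    for k in range(n + 1):
        t = k / n
        tile = (math.floor((x0 + t * (x1 - x0)) / tile_size),
                math.floor((y0 + t * (y1 - y0)) / tile_size))
        if not tiles or tiles[-1] != tile:
            tiles.append(tile)
    return tiles

def driving_altitude(start, end, tile_heights, tile_size, margin=10.0):
    """Tallest building height along the path plus a safety margin (same units)."""
    heights = [tile_heights.get(tile, 0.0)
               for tile in tiles_on_segment(start, end, tile_size)]
    return max(heights) + margin
```

With 100 m tiles, a path from (10, 50) to (310, 50) crosses tiles (0, 0) through (3, 0). If the tallest building among them is 120 m, the sketch returns 130 m.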

HYPERBOLIC STRUCTURE OF POINTWISE INVERSE PSEUDO-ORBIT TRACING PROPERTY FOR C¹ DIFFEOMORPHISMS

  • Manseob Lee
    • 대한수학회논문집 / Vol. 38, No. 1 / pp. 243-256 / 2023
  • We deal with a type of inverse pseudo-orbit tracing property with respect to the class of continuous methods, as suggested and studied by Pilyugin [54]. In this paper, we consider a continuous method induced through the diffeomorphism of a compact smooth manifold, and using this concept, we prove the following: (i) If a diffeomorphism f of a compact smooth manifold M has the robustly pointwise inverse pseudo-orbit tracing property, then f is structurally stable. (ii) For a C¹ generic diffeomorphism f of a compact smooth manifold M, if f has the pointwise inverse pseudo-orbit tracing property, then f is structurally stable. (iii) If a diffeomorphism f has the robustly pointwise inverse pseudo-orbit tracing property around a transitive set Λ, then Λ is hyperbolic for f. Finally, (iv) C¹ generically, if a diffeomorphism f has the pointwise inverse pseudo-orbit tracing property around a locally maximal transitive set Λ, then Λ is hyperbolic for f. In addition, we investigate cases of volume-preserving diffeomorphisms.

Multiple imputation for competing risks survival data via pseudo-observations

  • Han, Seungbong;Andrei, Adin-Cristian;Tsui, Kam-Wah
    • Communications for Statistical Applications and Methods / Vol. 25, No. 4 / pp. 385-396 / 2018
  • Competing risks are commonly encountered in biomedical research. Regression models for competing risks data can be developed based on data routinely collected in hospitals or general practices. However, these data sets usually contain missing covariate values. To overcome this problem, multiple imputation is often used to fit regression models under a missing at random (MAR) assumption. Here, we introduce a multivariate imputation by chained equations algorithm to deal with competing risks survival data. Using pseudo-observations, we make use of the available outcome information by accommodating the competing risk structure. Lastly, we illustrate the practical advantages of our approach using simulations and two data examples: coronary artery disease data and hepatocellular carcinoma data.