• Title/Abstract/Keywords: news text

375 search results (processing time: 0.052 seconds)

Social Media Fake News in India

  • Al-Zaman, Md. Sayeed
    • Asian Journal for Public Opinion Research
    • /
    • Vol. 9, No. 1
    • /
    • pp.25-47
    • /
    • 2021
  • This study analyzes 419 fake news items published in India, a fake-news-prone country, to identify the major themes, content types, and sources of social media fake news. The results show that fake news shared on social media has six major themes (health, religion, politics, crime, entertainment, and miscellaneous), eight types of content (text, photo, audio, video, text & photo, text & video, photo & video, and text & photo & video), and two main sources (online sources and the mainstream media). Health-related fake news is common only during a health crisis, whereas fake news related to religion and politics seems more prevalent and emerges from online media. Text & photo and text & video together account for three-fourths of all fake news, and most of these items come from online media, which is also the main source of fake news on social media overall. Mainstream media, on the other hand, mostly produces political fake news. This study presents some novel findings that may help researchers understand, and policymakers control, fake news on social media, and it invites more academic investigation of religious and political fake news in India. Two important limitations of this study concern the data source and the data collection period, which may have affected the results.

Urdu News Classification using Application of Machine Learning Algorithms on News Headline

  • Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 21, No. 2
    • /
    • pp.229-237
    • /
    • 2021
  • Our modern 'information-hungry' age demands delivery of information at unprecedented speed. Timely delivery of noteworthy information about recent events can help people from different walks of life in a number of ways. As the world has become a global village, the volume and speed of the news flow demand the involvement of machines to help humans handle the enormous amount of data. News is presented to the public in the form of video, audio, images, and text. News text available on the internet is a source of knowledge for billions of internet users. Urdu is spoken and understood by millions of people from the Indian subcontinent. The availability of online Urdu news enables these readers to improve their understanding of the world and to make decisions. This paper uses available online Urdu news data to train machines to categorize news automatically. Various machine learning algorithms were trained on news headlines, and the results demonstrate that the Bernoulli Naïve Bayes (Bernoulli NB) and Multinomial Naïve Bayes (Multinomial NB) algorithms outperformed the other algorithms on all performance measures. The highest accuracy achieved on the dataset was 94.278% with the Multinomial NB classifier, followed by the Bernoulli NB classifier with an accuracy of 94.274%, when Urdu stop words were removed from the dataset. The results suggest that the short text of news headlines can be used as input for text categorization.
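The headline classification idea described in this abstract can be illustrated with a minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing and stop-word removal. This is not the paper's code or data: the toy English headlines, the labels, the `STOP_WORDS` set (a stand-in for an Urdu stop-word list), and all function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for an Urdu stop-word list.
STOP_WORDS = {"the", "a", "in", "of"}

def tokenize(headline):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in headline.lower().split() if w not in STOP_WORDS]

def train(samples):
    """samples: list of (headline, label) pairs. Returns the counts the model needs."""
    class_docs = Counter()             # number of headlines per class
    word_counts = defaultdict(Counter) # word frequencies per class
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab

def predict(model, headline):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(headline):
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training headlines, for illustration only.
TRAIN = [
    ("team wins the cricket final", "sports"),
    ("captain scores a century in cricket match", "sports"),
    ("parliament passes budget bill", "politics"),
    ("minister announces election date", "politics"),
]
MODEL = train(TRAIN)
print(predict(MODEL, "cricket captain wins"))
```

A Bernoulli variant would model the presence or absence of each vocabulary word rather than its count, which can suit very short texts such as headlines.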

Fake News Detection for Korean News Using Text Mining and Machine Learning Techniques

  • 윤태욱;안현철
    • Journal of Information Technology Applications and Management
    • /
    • Vol. 25, No. 1
    • /
    • pp.19-32
    • /
    • 2018
  • Fake news is defined as news articles that are intentionally and verifiably false and could mislead readers. The spread of fake news may provoke anxiety, chaos, fear, or irrational decisions among the public. Detecting fake news and preventing its spread has therefore become a very important issue in our society. However, because of the huge amount of fake news produced every day, it is almost impossible for humans to identify it. In this context, researchers have tried over the past years to develop automated fake news detection methods using artificial intelligence techniques. Unfortunately, no prior study has proposed an automated fake news detection method for Korean news. In this study, we aim to detect Korean fake news using text mining and machine learning techniques. Our proposed method consists of two steps. In the first step, the news content to be analyzed is converted into quantified values using various text mining techniques (topic modeling, TF-IDF, and so on). In the second step, classifiers are trained on the values produced in step 1. Machine learning techniques such as multiple discriminant analysis, case-based reasoning, artificial neural networks, and support vector machines can be applied as the classifiers. To validate the effectiveness of the proposed method, we collected 200 Korean news articles from Seoul National University's FactCheck (http://factcheck.snu.ac.kr), which provides detailed analysis reports from about 20 media outlets and links to source documents for each case. Using this dataset, we will identify which text features are important and which classifiers are effective in detecting Korean fake news.
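The two-step structure described in this abstract (quantify the text, then train a classifier on the resulting values) can be sketched as follows. This is only an illustration under stated assumptions: step 1 uses a smoothed TF-IDF weighting, and step 2 uses a one-nearest-neighbour lookup by cosine similarity as a simple stand-in for case-based reasoning. The toy tokens, the labels, and all function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Step 1a: learn inverse document frequencies from tokenized training documents."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Smoothed IDF so that terms found in every document keep a nonzero weight.
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}

def vectorize(tokens, idf):
    """Step 1b: turn one tokenized document into a sparse TF-IDF vector."""
    tf = Counter(t for t in tokens if t in idf)
    total = sum(tf.values())
    return {term: (count / total) * idf[term] for term, count in tf.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def nearest_label(query, train_vecs, labels):
    """Step 2 (stand-in classifier): label of the most similar training document."""
    scores = [cosine(query, vec) for vec in train_vecs]
    return labels[scores.index(max(scores))]

# Made-up tokenized documents and labels, for illustration only.
TRAIN_DOCS = [
    ["miracle", "cure", "shocking", "secret"],
    ["government", "report", "budget", "figures"],
    ["shocking", "secret", "celebrity", "hoax"],
    ["official", "statistics", "report", "released"],
]
LABELS = ["fake", "real", "fake", "real"]

IDF = fit_idf(TRAIN_DOCS)
TRAIN_VECS = [vectorize(doc, IDF) for doc in TRAIN_DOCS]

def classify(tokens):
    return nearest_label(vectorize(tokens, IDF), TRAIN_VECS, LABELS)

print(classify(["shocking", "miracle", "hoax"]))
```

Any of the classifiers named in the abstract could replace `nearest_label`, since they would all consume the same vectors produced in step 1.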

Automatic Name Line Detection for Person Indexing Based on Overlay Text

  • Lee, Sanghee;Ahn, Jungil;Jo, Kanghyun
    • Journal of Multimedia Information System
    • /
    • Vol. 2, No. 1
    • /
    • pp.163-170
    • /
    • 2015
  • Many overlay texts are artificially superimposed on broadcast videos by humans. These texts provide additional information about the audiovisual content. In particular, the overlay text in news videos contains a concise and direct description of the content, which makes it the most reliable clue for constructing a news video indexing system. To automate person indexing of interview videos in TV news programs, this paper proposes a method that detects only the name text line among all the overlay text in a frame. Experimental results on Korean television news videos show that the proposed framework efficiently detects the overlaid name text line.

Comparison of Text Beginning Frame Detection Methods in News Video Sequences

  • 이상희;안정일;조강현
    • Journal of Broadcast Engineering
    • /
    • Vol. 21, No. 3
    • /
    • pp.307-318
    • /
    • 2016
  • Overlay texts are artificially superimposed on broadcast videos by human producers. These texts provide additional information about the audiovisual content. In particular, the overlay texts in news videos contain a concise and direct description of the content, which makes them the most reliable clue for constructing a news video indexing system. To build such an indexing system for TV news programs, it is important to detect and recognize the texts. This paper proposes identifying the frame in which an overlay text begins, to help the detection and recognition of overlay text in news videos. Since not all frames in a video sequence contain overlay text, extracting overlay text from every frame is unnecessary and time-consuming. Focusing only on the frames that contain overlay text can therefore improve the accuracy of overlay text detection. Comparative experiments on methods for identifying the text beginning frame were carried out on Korean television news videos, and an appropriate processing method is proposed.

Building a Korean Text Summarization Dataset Using News Articles of Social Media

  • 이경호;박요한;이공주
    • KIPS Transactions on Software and Data Engineering
    • /
    • Vol. 9, No. 8
    • /
    • pp.251-258
    • /
    • 2020
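  • Training data for document summarization consists of documents and their summaries. Because the summaries in existing summarization datasets were written manually, it was difficult to obtain data in large quantities. For this reason, online news articles, which are easy to collect and of high document quality, have been widely used in summarization research. In this study, we propose constructing a Korean document summarization dataset by using the description, title, and subtitle that news outlets post on social media as the summary of the article body. We were able to build a dataset of about 425,000 news articles and their summaries. To demonstrate the usefulness of the dataset, we implemented an extractive summarization system. We compared the performance of a supervised learning model trained on the constructed data with that of an unsupervised learning model. In the experiments, the model trained on the proposed data achieved higher ROUGE scores than the unsupervised learning algorithm.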
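The ROUGE scores mentioned in this abstract measure n-gram overlap between a system summary and a reference summary. A minimal sketch of ROUGE-N on whitespace tokens is given below; the function name `rouge_n` and the example sentences are assumptions for illustration, and published evaluations typically rely on a standard toolkit with its own tokenization and stemming rather than a hand-written function like this one.

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Compute ROUGE-N precision, recall, and F1 on whitespace tokens."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.split())
    ref = ngrams(reference.split())
    # Clipped overlap: each n-gram counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Made-up example: the candidate covers half of the reference unigrams.
print(rouge_n("the cat sat", "the cat sat on the mat"))
```

ROUGE is recall-oriented by design, so a short candidate that copies reference words exactly can still score low on recall, as in the example.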

Text-Mining Analyses of News Articles on Schizophrenia

  • 남희정;류승형
    • Korean Journal of Schizophrenia Research
    • /
    • Vol. 23, No. 2
    • /
    • pp.58-64
    • /
    • 2020
  • Objectives: In this study, we conducted an exploratory analysis of current media trends on schizophrenia using text-mining methods. Methods: First, web-crawling techniques were used to extract text data from 575 news articles published in 10 major newspapers between 2018 and 2019, which were selected by searching for "schizophrenia" in Naver News. We built a document-term matrix (DTM) and/or term-document matrix (TDM) through pre-processing techniques. Using the DTM and TDM, we conducted frequency analysis, co-occurrence network analysis, and topic model analysis. Results: Frequency analysis showed that keywords such as "police," "mental illness," "admission," "patient," "crime," "apartment," "lethal weapon," "treatment," "Jinju," and "residents" were frequently mentioned in news articles on schizophrenia. Within the article text, many of these keywords were highly correlated with the term "schizophrenia" and were also interconnected with each other in the co-occurrence network. The latent Dirichlet allocation model presented 10 topics comprising a combination of keywords: "police-Jinju," "hospital-admission," "research-finding," "care-center," "schizophrenia-symptom," "society-issue," "family-mind," "woman-school," and "disabled-facilities." Conclusion: The results of the present study highlight that in recent years the media has been reporting violence in patients with schizophrenia, thereby raising the important issue of hospitalization and community management of patients with schizophrenia.
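The document-term matrix and co-occurrence counts described in the Methods can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the three tokenized documents are made up (they only reuse a few keywords quoted in the abstract), and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def document_term_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[j] in document i."""
    vocab = sorted({term for doc in docs for term in doc})
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix

def cooccurrence(docs):
    """Count, for each pair of terms, the number of documents containing both."""
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs

# Made-up tokenized articles, for illustration only.
DOCS = [
    ["police", "crime", "patient"],
    ["patient", "treatment"],
    ["police", "patient"],
]

VOCAB, DTM = document_term_matrix(DOCS)
TDM = [list(column) for column in zip(*DTM)]  # the TDM is the transpose of the DTM
TERM_FREQ = dict(zip(VOCAB, (sum(row) for row in TDM)))
PAIRS = cooccurrence(DOCS)
print(VOCAB, DTM, PAIRS.most_common(1))
```

Frequency analysis reads off the row sums of the TDM, while the pair counts can serve as edge weights in a co-occurrence network.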

Statistical Analysis Between Size and Balance of Text Corpus by Evaluation of the effect of Interview Sentence in Language Modeling

  • 정의정;이영직
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • Proceedings of the Acoustical Society of Korea 2002 Summer Conference, Vol. 21, No. 1
    • /
    • pp.87-90
    • /
    • 2002
  • This paper statistically analyzes the relationship between the size and the balance of a text corpus by evaluating the effect of interview sentences on the language model of a Korean broadcast news transcription system. The ultimate purpose of our Korean broadcast news transcription system is to recognize not interview speech but the speech of anchors and reporters in broadcast news shows. However, interview sentences make up approximately 15% of the text corpus gathered for constructing the language model. The characteristics of interview sentences differ from those of anchor and reporter speech in various respects, and they therefore disturb anchor- and reporter-oriented language modeling. In this paper, we evaluate the effect of interview sentences on the language model of a Korean broadcast news transcription system and statistically analyze the relationship between the size and the balance of the text corpus by repeating the same experimental procedure while varying the size of the corpus.


An Analysis of the Contents and Make-up of the Page in a News Story of the Internet Newspaper -focusing on Naver, Daum, Nate, Yahoo-

  • 박광순
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • Vol. 15, No. 3
    • /
    • pp.1345-1354
    • /
    • 2014
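  • This study compared and analyzed the layout of news article body pages in Internet newspapers and the types of content in the space surrounding the article body. The analysis showed that the format of Naver's news article pages was more complex than that of the news article pages of Daum, Nate, and Yahoo. Naver also carried more advertisements, advertisement types, entertainment content, and varied types of content than the other three portals. In particular, its share of celebrity-related news articles was higher than that of the other portal sites. The portal site that carried the most news articles on its news article pages was Daum, and the portal site that carried the fewest advertisements was Yahoo. Overall, however, the format and content of the news article pages of these three portal sites were very similar. In conclusion, in terms of readers' advertisement avoidance and the diversity of news articles, the convenience of news use can be judged to be higher for the news services of portal sites than for the news services of news organizations' own websites.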

Joint Hierarchical Semantic Clipping and Sentence Extraction for Document Summarization

  • Yan, Wanying;Guo, Junjun
    • Journal of Information Processing Systems
    • /
    • Vol. 16, No. 4
    • /
    • pp.820-831
    • /
    • 2020
  • Extractive document summarization aims to select a few sentences from a given document while preserving its main information, but current extractive methods do not address the problem of repeated information across sentences, especially in news document summarization. In view of both the importance and the redundancy of news text information, we propose a neural extractive summarization approach with joint sentence semantic clipping and selection, which can effectively reduce sentence repetition in news text summaries. Specifically, a hierarchical selective encoding network is constructed for both sentence-level and document-level representations, and data containing important information is extracted from the news text; a sentence extractor strategy is then adopted for joint scoring and clipping of redundant information. In this way, our model strikes a balance between extracting important information and filtering redundant information. Experimental results on both the CNN/Daily Mail dataset and a Court Public Opinion News dataset that we built show the effectiveness of the proposed approach in terms of ROUGE metrics, especially for redundant information filtering.
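The trade-off between sentence importance and redundancy that this abstract describes can be illustrated with a simple greedy selection, in the spirit of maximal marginal relevance. This is explicitly not the paper's neural model: the importance scores are given as inputs rather than learned, redundancy is approximated by Jaccard word overlap, and the function names, the `penalty` parameter, and the example sentences are all assumptions.

```python
def jaccard(a, b):
    """Word-overlap similarity between two token lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def select_sentences(sentences, scores, k=2, penalty=0.7):
    """Greedily pick k sentences, discounting each candidate by its
    maximum similarity to the sentences already chosen."""
    chosen = []
    candidates = list(range(len(sentences)))
    while candidates and len(chosen) < k:
        def gain(i):
            redundancy = max(
                (jaccard(sentences[i], sentences[j]) for j in chosen),
                default=0.0,
            )
            return scores[i] - penalty * redundancy
        best = max(candidates, key=gain)
        chosen.append(best)
        candidates.remove(best)
    return sorted(chosen)  # restore document order

# Made-up tokenized sentences and importance scores, for illustration only.
SENTENCES = [
    ["court", "rules", "on", "data", "case"],
    ["court", "rules", "on", "the", "data", "case"],
    ["public", "reaction", "was", "mixed"],
]
SCORES = [0.9, 0.85, 0.5]

print(select_sentences(SENTENCES, SCORES))
```

With `penalty=0.0` the selection falls back to picking the highest-scoring sentences, which in this example would return the two near-duplicate sentences.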