• Title/Abstract/Keywords: data similarity

Search results: 2,044 items (processing time: 0.028 seconds)

Similarity Measure Design on High Dimensional Data

  • Nipon, Theera-Umpon;Lee, Sanghyuk
    • 한국융합학회논문지 / Vol. 4, No. 1 / pp.43-48 / 2013
  • This paper designs a similarity measure for high-dimensional data. The measure is obtained by analysing neighbor information with respect to the data sets. Because big data has multiple characteristics compared with a simple data set, the result could be applied to big data, and the analysis of high-dimensional data can serve as a preliminary study for big data. The high-dimensional analysis is also compared with the conventional similarity measure: the traditional similarity measure is illustrated on overlapped data, and an application to non-overlapped data is carried out. The usefulness of the measure is established by mathematical proof and verified by computing the similarity for an artificial data example.

A Method of Reducing the Processing Cost of Similarity Queries in Databases

  • 김선경;박지수;손진곤
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 11, No. 4 / pp.157-162 / 2022
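  • Today, most data is stored in databases (DBs). In such a DB environment, users ask the DB to find the data they want. Among DB queries, a similarity query is one whose user-specified conditions include a similarity. However, processing a similarity query is costly: it is hard to use an index that narrows the range of records to process, so the similarity must be computed over every record in the table each time. To address this problem, this paper defines a lightweight similarity function. Compared with the similarity function, a lightweight similarity function filters data less accurately but costs less to evaluate. Using this property, the paper presents a method for reducing the processing cost of similarity queries. It proposes the Chebyshev distance as a lightweight similarity function for the Euclidean distance function, and compares the processing cost of queries that use the existing similarity function with that of queries that use the lightweight similarity function. Experiments confirm that applying the Chebyshev distance as the lightweight similarity function for Euclidean similarity reduces the processing cost of similarity queries.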
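
The filtering idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a range query ("all records within Euclidean distance `radius` of `query`") and relies on the fact that the Chebyshev distance never exceeds the Euclidean distance, so a record that fails the cheap Chebyshev test cannot pass the exact one. The function names and the sample records are hypothetical.

```python
import math


def chebyshev(a, b):
    """Cheap distance: largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Exact (more expensive) distance used by the original query."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(records, query, radius):
    """Return records within `radius` of `query` (Euclidean), plus the
    number of times the exact distance had to be evaluated."""
    hits = []
    exact_evals = 0
    for rec in records:
        # Chebyshev <= Euclidean, so this test only discards true non-matches.
        if chebyshev(rec, query) > radius:
            continue
        exact_evals += 1
        if euclidean(rec, query) <= radius:
            hits.append(rec)
    return hits, exact_evals


# Hypothetical sample data for illustration only.
records = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.5), (0.9, 0.9), (5.0, 5.0)]
hits, exact_evals = range_query(records, (0.0, 0.0), 1.3)
print(hits, exact_evals)
```

In this sample, two of the five records are rejected by the Chebyshev test alone, and the point (1.0, 1.0) shows the lower filtering accuracy the abstract mentions: it passes the cheap test but is rejected by the exact one.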

Transactions Clustering based on Item Similarity

  • 이상욱;김재련
    • 한국지능정보시스템학회:학술대회논문집 / 한국지능정보시스템학회 2002 Fall Regular Conference / pp.250-257 / 2002
  • Clustering is a data mining method that consists of discovering interesting data distributions in very large databases. In traditional data clustering, the similarity of a cluster of objects is measured by the pairwise similarity of the objects in that cluster. In view of the nature of clustering transactions, we devise in this paper a novel measurement called item similarity and use it to perform clustering. With this item similarity measurement, we develop an efficient clustering algorithm for target marketing in each group.

Clustering method for similar user with Mixed Data in SNS

  • Song, Hyoung-Min;Lee, Sang-Joon;Kwak, Ho-Young
    • 한국컴퓨터정보학회논문지 / Vol. 20, No. 11 / pp.25-30 / 2015
  • The enormous increase in data that has accompanied the development of information technology makes it hard for internet users to find information tailored to their needs. In this changing environment, information filtering methods, which provide sorted-out information to users, are becoming important. Data on the internet exists in various types. However, the similarity calculation algorithms frequently used in existing collaborative filtering methods tend to suit numeric data. For categorical data, moreover, they yield extreme similarity values, as in Boolean algebra. In this paper, we compute the similarity of SNS users' information, which consists of mixed data, using Gower's similarity coefficient, and we suggest a method that is softer than an all-or-nothing expression such as 0 or 1 for categorical data. The clustering method using this algorithm can be utilized in SNS or in various recommendation systems.
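
As a rough illustration of the coefficient named in this abstract, the sketch below computes the standard, equally weighted form of Gower's similarity for two mixed-type records. It is an assumption-laden example rather than the paper's method: categorical attributes are scored with the usual 0/1 match, so the "softer" categorical scoring the authors propose is not reproduced here. The attribute names, ranges, and user records are hypothetical.

```python
def gower_similarity(a, b, numeric_ranges):
    """Standard Gower similarity with equal attribute weights.

    a, b            -- dicts with the same keys (one record each)
    numeric_ranges  -- dict mapping each numeric key to its range (max - min)
                       over the whole data set; other keys are categorical
    """
    scores = []
    for key in a:
        if key in numeric_ranges:
            rng = numeric_ranges[key]
            # Numeric attribute: 1 minus the range-normalised absolute difference.
            scores.append(1.0 if rng == 0 else 1.0 - abs(a[key] - b[key]) / rng)
        else:
            # Categorical attribute: plain match / mismatch (0 or 1).
            scores.append(1.0 if a[key] == b[key] else 0.0)
    return sum(scores) / len(scores)


# Hypothetical SNS user records for illustration only.
user_a = {"age": 25, "posts": 120, "city": "Seoul", "gender": "F"}
user_b = {"age": 35, "posts": 100, "city": "Seoul", "gender": "M"}
ranges = {"age": 40, "posts": 200}

score = gower_similarity(user_a, user_b, ranges)
print(round(score, 4))
```

The per-attribute scores here are 0.75, 0.9, 1 and 0, so the two categorical attributes contribute only the extreme values that the abstract sets out to soften.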

Information Quantification Application to Management with Fuzzy Entropy and Similarity Measure

  • Wang, Hong-Mei;Lee, Sang-Hyuk
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 10, No. 4 / pp.275-280 / 2010
  • The efficiency of fuzzy entropy and a similarity measure in data management was discussed and verified by applying them to a reliable data selection problem and to a numerical data similarity evaluation. To calculate certainty or uncertainty, a fuzzy entropy and a similarity measure are designed and proved. The designed fuzzy entropy and similarity are regarded as a dissimilarity measure and a similarity measure, respectively, and the relation between the two measures is explained through graphical illustration. The obtained measures are useful for applications in decision theory and in mutual information analysis problems. Extensions of the data quantification results based on the proposed measures are applicable to decision making and fuzzy game theory.

Similarity measurement based on Min-Hash for Preserving Privacy

  • Cha, Hyun-Jong;Yang, Ho-Kyung;Song, You-Jin
    • International Journal of Advanced Culture Technology / Vol. 10, No. 2 / pp.240-245 / 2022
  • Because of the importance of information, encryption algorithms are heavily used. Encrypted raw data is secure, but problems arise when the decryption key is exposed. In particular, large-scale Internet sites such as Facebook and Amazon suffer serious damage when user data is exposed. Recently, research into a new fourth-generation encryption technology that can protect user-related data without the use of a key required for encryption has been attracting attention, as has data clustering technology that uses encryption. In this paper, we try to reduce key exposure by using homomorphic encryption, and we aim to maintain privacy through similarity measurement. Holistic similarity measurements, like the similarity measurement methods studied in the past, become time-consuming and expensive as the size and scope of the data increase. Min-Hash has therefore been studied as a way to efficiently estimate the similarity between two sets by comparing their signatures. Min-Hash is widely used for anti-plagiarism, graph and image analysis, and genetic analysis. Accordingly, this paper addresses privacy using homomorphic encryption and presents a model for efficient similarity measurement using Min-Hash.
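
The Min-Hash estimate mentioned in this abstract can be sketched as below. This is a generic textbook-style illustration and not the paper's model: it leaves out homomorphic encryption entirely, and the seeded SHA-256 hashing, the signature length of 128, and the sample sets are all assumptions made for the example.

```python
import hashlib


def _hash(seed, item):
    """One member of a family of hash functions, selected by `seed`."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=128):
    """Signature = minimum hash value of the (non-empty) set under each function."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity, for comparison."""
    return len(a & b) / len(a | b)


# Hypothetical sets for illustration only.
set_a = set(range(0, 100))
set_b = set(range(50, 150))

sig_a = minhash_signature(set_a)
sig_b = minhash_signature(set_b)
print(jaccard(set_a, set_b), estimate_jaccard(sig_a, sig_b))
```

Once the signatures exist, comparing two sets costs one pass over 128 integers regardless of how large the sets are, which is the source of the efficiency gain the abstract describes. The estimate is approximate, and its error shrinks as the signature gets longer.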

Information Management by Data Quantification with Fuzzy Entropy and Similarity Measure

  • Siang, Chua Hong;Lee, Sanghyuk
    • 한국융합학회논문지 / Vol. 4, No. 2 / pp.35-41 / 2013
  • Data management with fuzzy entropy and a similarity measure was discussed and verified by applying them to a reliable data selection problem. To calculate the certainty or uncertainty of data, a fuzzy entropy and a similarity measure are designed and proved. The proposed fuzzy entropy and similarity are regarded as a dissimilarity measure and a similarity measure, respectively, and the relation between the two measures is explained through graphical illustration. The obtained measures are useful for applications in decision theory and in mutual information analysis problems. Extensions of the data quantification results based on the proposed measures are applicable to decision making and fuzzy game theory.

Reliable Data Selection using Similarity Measure

  • 류수록;이상혁
    • 한국지능시스템학회논문지 / Vol. 18, No. 2 / pp.200-205 / 2008
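  • For data analysis, the entropy of a fuzzy set was introduced as a measure of the uncertainty of data, and a similarity measure representing the similarity between data was constructed. The similarity measure between fuzzy membership functions was constructed using a distance measure, and the proposed similarity measure was verified by proof. To confirm its usefulness, the proposed similarity measure was applied to a reliable data selection example. The results were compared with earlier results obtained through fuzzy entropy and statistical knowledge.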

Relation between Certainty and Uncertainty with Fuzzy Entropy and Similarity Measure

  • Lee, Sanghyuk;Zhai, Yujia
    • 한국융합학회논문지 / Vol. 5, No. 4 / pp.155-161 / 2014
  • We survey the relation between the fuzzy entropy measure and the similarity measure. Each measure represents a feature of the data, uncertainty or certainty, between the data groups being compared. With the help of their one-to-one correspondence, the distance measure and the similarity measure are expressed through their complementary characteristics. We construct a similarity measure from a distance measure and prove its usefulness. Furthermore, the analysis of the similarity measure from the fuzzy entropy measure is also discussed.
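
The complementary relation between distance and similarity described in this abstract can be illustrated with a small sketch. It assumes one common choice, a normalised Hamming distance between discretised membership functions and the similarity s = 1 - d; the abstract does not state which distance the authors use, so this is only an example of the general construction. The membership values are hypothetical.

```python
def hamming_distance(mu_a, mu_b):
    """Normalised Hamming distance between two membership vectors in [0, 1]."""
    return sum(abs(x - y) for x, y in zip(mu_a, mu_b)) / len(mu_a)


def similarity(mu_a, mu_b):
    """Similarity as the complement of the normalised distance (assumed form)."""
    return 1.0 - hamming_distance(mu_a, mu_b)


# Hypothetical membership values sampled at four points of the universe.
mu_a = [0.2, 0.5, 0.9, 1.0]
mu_b = [0.1, 0.5, 0.6, 0.8]

print(round(hamming_distance(mu_a, mu_b), 4), round(similarity(mu_a, mu_b), 4))
```

With this construction, identical fuzzy sets have similarity 1 and the similarity falls linearly as the membership functions move apart, which is the sense in which the two measures are complementary.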

Evaluation of Positioning Effectiveness Based on the Preference and Similarity Data Derived from Consumers' Choice from Different Choice Sets

  • 원지성
    • 경영과학 / Vol. 28, No. 1 / pp.61-74 / 2011
  • Not only preference data but also similarity data can be used for developing effective marketing strategies. Hahn et al.[10] propose a methodology for representing the competitors of a brand (the focal brand) in a single map, called the Preference-Similarity Map, according to their preference relative to, and their similarity with, the focal brand. They also propose a way to derive the relative preference and similarity values from a survey that collects choice data from differing choice sets. This study identifies the limitations of the preference and similarity measures proposed by Hahn et al.[10] and shows how these measures can be revised. It also proposes how to implement the revised measures and analyze brands' positioning strategies. Based on the results of previous studies on the effect of inter-brand similarity on brand evaluations, this study assumes that, when evaluating the effectiveness of a brand's positioning in the market, it is important to analyze how much the brand is preferred to its close competitors. This study applies the proposed measures to the data used in Hahn et al.[10] and also shows how the proposed measures are related to the parameters of the choice model proposed by Batsell and Polking[1].