• Title/Summary/Keyword: machine learning

Classification of Malicious Web Pages by Using SVM (SVM을 활용한 악성 웹 페이지 분류)

  • Hwang, Young-Sup;Moon, Jae-Chan;Cho, Seong-Je
    • Journal of the Korea Society of Computer and Information / v.17 no.3 / pp.77-83 / 2012
  • As web pages provide various services, the distribution of malware via web pages is also increasing. Malware can leak personal information, cause systems to malfunction, and turn systems into zombies. To prevent this damage, malicious web pages should be blocked. Because the malicious code embedded in web pages is obfuscated or transformed, it is difficult to detect with the signature-based approaches used by current anti-virus software. To overcome this problem, we analyzed web pages and extracted features that distinguish malicious pages from benign ones. We then propose a classification method using SVM, which is widely used in machine learning. Experimental results show that the proposed method performs better than other methods. The proposed method can classify malicious web pages correctly and help block the distribution of malicious code.

Malware Classification System to Support Decision Making of App Installation on Android OS (안드로이드 OS에서 앱 설치 의사결정 지원을 위한 악성 앱 분류 시스템)

  • Ryu, Hong Ryeol;Jang, Yun;Kwon, Taekyoung
    • Journal of KIISE / v.42 no.12 / pp.1611-1622 / 2015
  • Although Android systems provide a permission-based access control mechanism and require the user to decide whether to install an app based on its permission list, many users tend to ignore this step. An improved method is therefore needed so that users can intuitively make informed decisions when installing a new app. In this paper, with regard to the permission-based access control system, we present a novel approach based on a machine-learning technique to support user decision-making on the fly. We apply the K-NN (K-Nearest Neighbors) classification algorithm, with the necessary weighting modifications, to malicious app classification, using 152 Android permissions as features. Our experiment shows a superior classification result (93.5% accuracy) compared to previous work. We expect that our method can help users make informed decisions at the installation step.
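
The weighted K-NN idea in the abstract above can be sketched roughly as follows. This is only an illustration, not the authors' method: the Hamming distance, the inverse-distance vote weighting, and the toy 4-permission vectors are all assumptions (the paper uses 152 permissions, and the abstract does not specify its weighting).

```python
from collections import defaultdict

def hamming(a, b):
    """Count positions where two equal-length 0/1 permission vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def weighted_knn(train, query, k=3):
    """Classify `query` by a distance-weighted vote of its k nearest neighbours.

    `train` is a list of (permission_vector, label) pairs.
    Each neighbour votes with weight 1 / (1 + distance); this weighting
    is an assumption made for the sketch.
    """
    neighbours = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = defaultdict(float)
    for vector, label in neighbours:
        votes[label] += 1.0 / (1.0 + hamming(vector, query))
    return max(votes, key=votes.get)

# Hypothetical toy data: each vector marks which of 4 permissions an app requests.
TRAIN = [
    ((1, 1, 0, 0), "malicious"),
    ((1, 1, 1, 0), "malicious"),
    ((0, 0, 1, 1), "benign"),
    ((0, 0, 0, 1), "benign"),
    ((0, 1, 0, 1), "benign"),
]

print(weighted_knn(TRAIN, (1, 1, 0, 1)))
```

Weighting votes by distance lets a very close neighbour outweigh several distant ones, which is one common way to modify plain majority-vote K-NN.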

Learning Multiple Instance Support Vector Machine through Positive Data Distribution (긍정 데이터 분포를 반영한 다중 인스턴스 지지 벡터 기계 학습)

  • Hwang, Joong-Won;Park, Seong-Bae;Lee, Sang-Jo
    • Journal of KIISE / v.42 no.2 / pp.227-234 / 2015
  • This paper proposes a modified MI-SVM algorithm by considering data distribution. The previous MI-SVM algorithm seeks the margin by considering the "most positive" instance in a positive bag. Positive instances included in positive bags are located in a similar area in a feature space. In order to reflect this characteristic of positive instances, the proposed method selects the "most positive" instance by calculating the distance between each instance in the bag and a pivot point that is the intersection point of all positive instances. This paper suggests two ways to select the "most positive" pivot point in the training data. First, the algorithm seeks the "most positive" pivot point along the current predicted parameter, and then selects the nearest instance in the bag as a representative from the pivot point. Second, the algorithm finds the "most positive" pivot point by using a Diverse Density framework. Our experiments on 12 benchmark multi-instance data sets show that the proposed method results in higher performance than the previous MI-SVM algorithm.
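
The representative-selection step described above, choosing from each positive bag the instance nearest to a pivot point, can be sketched as below. The pivot here is simply the centroid of all instances in the positive bags, which is an assumed stand-in: the paper derives its pivot from the current SVM parameters or from a Diverse Density framework, neither of which is reproduced here. The bags are hypothetical toy data.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))

def representative(bag, pivot):
    """Return the instance in `bag` closest (Euclidean) to `pivot`."""
    return min(bag, key=lambda inst: math.dist(inst, pivot))

# Hypothetical positive bags: each holds one instance near (1, 1) plus an outlier.
positive_bags = [
    [(1.0, 1.0), (5.0, 5.0)],
    [(1.2, 0.8), (-3.0, 4.0)],
    [(0.9, 1.1), (6.0, -2.0)],
]

# Assumed pivot: centroid of every instance in the positive bags.
pivot = centroid([inst for bag in positive_bags for inst in bag])
representatives = [representative(bag, pivot) for bag in positive_bags]
print(representatives)
```

In an MI-SVM training loop, these representatives would stand in for their bags as positive examples when the margin is recomputed.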

An Emerging Technology Trend Identifier Based on the Citation and the Change of Academic and Industrial Popularity (학계와 산업계의 정보 대중성 변동과 인용 정보에 기반한 최신 기술 동향 식별 시스템)

  • Kim, Seonho;Lee, Junkyu;Rasheed, Waqas;Yeo, Woondong
    • Journal of Korea Technology Innovation Society / v.14 no.spc / pp.1171-1186 / 2011
  • Identifying emerging technology trends is crucial for decision makers in nations and organizations who must use limited resources, such as time and money, efficiently. Many researchers have proposed emerging trend detection systems based on a popularity analysis of documents, but these still need improvement. In this paper, an emerging trend detection classifier is proposed that uses both academic and industrial data (SCOPUS and PATSTAT). Unlike most previous research, our emerging technology trend classifier uses supervised, semi-automatic machine learning techniques to improve the precision of the results. In addition, the citation information in the SCOPUS data is analyzed to identify the early signals of emerging technology trends.

The practical use with online database program of cosmetics' raw materials. (화장품원료 온라인 데이터베이스 구축과 활용)

  • Jeon Sang-hoon;Kim Ju-Duck
    • Journal of the Society of Cosmetic Scientists of Korea / v.29 no.2 s.43 / pp.233-250 / 2003
  • We often use the KCID (Korean Cosmetic Ingredient Dictionary) and the ICID (International Cosmetic Ingredient Dictionary) in cosmetics research and in cosmetics export and import. So far, however, there has been no database of cosmetics raw materials, so a great deal of time is spent finding the raw material data that is needed. This study constructs a cosmetics raw material database and develops a program to retrieve data from it. We used a Linux machine as the equipment for this study, with the Apache web server, the MySQL database server, and PHP as the tools. The database registers 11,817 kinds of raw material data from the ICID, 866 kinds from the KCID, and 28,008 kinds with registered trade names. In addition, the database was composed in the association form. The online database can ultimately reduce task time once it serves its purpose. The product of this study can serve as a good basis for reconfiguring the data. In the future, it can become a useful database in relation to other databases.

Automatic Classification of Advertising Restaurant Blogs Using Machine Learning Techniques (기계학습기법을 이용한 광고 외식 블로그의 자동분류)

  • Chang, Jae-Young;Lee, Byung-Jun;Cho, Se-Jin;Han, Da-Hye;Lee, Kyu-Hong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.16 no.2 / pp.55-62 / 2016
  • Recently, the number of users who choose a restaurant based on information provided by blogs has increased significantly. However, the information in most blogs is unreliable, since domestic restaurant blogs are dominated by advertising posts written by 'power bloggers'. Thus, to ensure the reliability of blogs, it is necessary to filter out advertising blogs, which are sometimes false or exaggerated. In this paper, we propose a method of identifying advertising blogs using an automatic classification technique. In the proposed technique, we first manually collected advertising restaurant blogs and then analyzed the features commonly found in those blogs. Using the extracted features, we applied automatic classification algorithms to determine whether a given blog is an advertising one. Additionally, through comparative experiments, we selected the features and the algorithm that give the best classification performance.

Genetic Algorithm for Node Pruning of Neural Networks (신경망의 노드 가지치기를 위한 유전 알고리즘)

  • Heo, Gi-Su;Oh, Il-Seok
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.2 / pp.65-74 / 2009
  • There are two approaches to optimizing a neural network structure: the pruning scheme and the constructive scheme. In this paper we use the pruning scheme to optimize the neural network structure, and a genetic algorithm to find the optimal node pruning. In conventional research, the input and hidden layers were optimized separately. In contrast, we attempted to optimize the two layers simultaneously by encoding both layers in one chromosome. The offspring networks inherit their weights from the parent. For learning, we used the existing error back-propagation algorithm. In our experiments with various databases from the UCI Machine Learning Repository, we obtained the best performance when the network size was reduced by about $8 \sim 25\%$. A t-test on the cross-validation results showed that the proposed method performed better than other pruning and construction methods.
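
Encoding both layers in one chromosome, as described above, might look like the following sketch. It assumes a simple bit-mask encoding (1 = keep the node, 0 = prune it) and one-point crossover; the abstract does not state the paper's actual encoding or genetic operators, and all names below are hypothetical.

```python
def decode(chromosome, n_input, n_hidden):
    """Split one bit list into an input-node mask and a hidden-node mask."""
    if len(chromosome) != n_input + n_hidden:
        raise ValueError("chromosome length must equal n_input + n_hidden")
    return chromosome[:n_input], chromosome[n_input:]

def prune_weights(weights, input_mask, hidden_mask):
    """Keep only the weights between surviving nodes.

    `weights[h][i]` is the weight from input node i to hidden node h.
    Surviving weights are copied unchanged, so an offspring network
    inherits them from its parent.
    """
    return [
        [w for w, keep_input in zip(row, input_mask) if keep_input]
        for row, keep_hidden in zip(weights, hidden_mask)
        if keep_hidden
    ]

def one_point_crossover(parent_a, parent_b, cut):
    """Combine the head of one parent with the tail of the other (assumed operator)."""
    return parent_a[:cut] + parent_b[cut:]

# Hypothetical parent network: 3 input nodes, 2 hidden nodes.
parent_weights = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
]
child = one_point_crossover([1, 0, 1, 1, 1], [1, 1, 1, 0, 1], cut=3)
input_mask, hidden_mask = decode(child, n_input=3, n_hidden=2)
print(prune_weights(parent_weights, input_mask, hidden_mask))
```

Because a single chromosome covers both layers, one crossover or mutation can change input-node and hidden-node selections at the same time, which is the simultaneous optimization the abstract describes.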

Self-diagnostic system for smartphone addiction using multiclass SVM (다중 클래스 SVM을 이용한 스마트폰 중독 자가진단 시스템)

  • Pi, Su Young
    • Journal of the Korean Data and Information Science Society / v.24 no.1 / pp.13-22 / 2013
  • Smartphone addiction has become more serious than internet addiction, since people can download and run numerous applications on smartphones even without an internet connection. However, smartphone addiction is not sufficiently addressed in current studies. The S-scale method developed by the Korea National Information Society Agency involves so many questions that respondents are likely to avoid the diagnosis itself. Moreover, since the S-scale is determined by the total score of the answered items without taking demographic variables into account, it is difficult to get an accurate result. Therefore, in this paper, we extracted from all the data the important factors that affect smartphone addiction, including demographic variables. We then classified the selected items with a neural network. A comparative analysis of the backpropagation learning algorithm and a multiclass support vector machine shows that the learning rate is slightly higher for the multiclass SVM. Since the multiclass SVM suggested in this paper is highly adaptable to rapid changes in data, we expect that it will lead to a more accurate self-diagnosis of smartphone addiction.

Efficient Mechanism for QFN Solder Defect Detection (QFN 납땜 불량 검출을 위한 효율적인 검사 기법)

  • Kim, Ho-Joong;Cho, Tai-Hoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.05a / pp.367-370 / 2016
  • QFN (Quad Flat No-leads package) is a type of SMD (Surface Mount Device). Since a QFN has no leads, many solder defects occur. In this paper, we therefore propose an efficient mechanism for QFN solder defect detection. For this, we employ a Convolutional Neural Network (CNN), a machine learning algorithm. Multi-layer color images of QFN solder are used to train the CNN. Because these are 3-channel color images, they are difficult to apply to the CNN directly. To solve this problem, we used the 1-channel grayscale images (Red, Blue, Green) separated from each 3-channel color image. Using this CNN, we were able to detect QFN solder defects. Further research is needed to detect other QFN defects.
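
The channel-separation step mentioned above can be illustrated with a small sketch. It assumes an image stored as rows of (R, G, B) tuples and uses plain Python lists; the paper's actual image format and preprocessing pipeline are not given in the abstract.

```python
def split_channels(image):
    """Split an H x W image of (R, G, B) tuples into three H x W single-channel planes."""
    red = [[pixel[0] for pixel in row] for row in image]
    green = [[pixel[1] for pixel in row] for row in image]
    blue = [[pixel[2] for pixel in row] for row in image]
    return red, green, blue

# Hypothetical 2 x 2 color image.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (10, 20, 30)],
]
red, green, blue = split_channels(image)
print(red)
print(green)
print(blue)
```

Each returned plane can then be treated as a separate 1-channel grayscale input for a network that expects single-channel images.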

Similar Patent Search Service System using Latent Dirichlet Allocation (잠재 의미 분석을 적용한 유사 특허 검색 서비스 시스템)

  • Lim, HyunKeun;Kim, Jaeyoon;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.8 / pp.1049-1054 / 2018
  • Keyword searching was used in the past to find similar patents, and automated classification by machine learning has been used more recently. Keyword searching analyzes data that has been formalized through data refinement. While its accuracy is high for short text, it cannot analyze the meaning contained in the sentences of long text made up of many words, such as a document. At the semantic analysis level, automatic classification is used to classify sentences composed of several words through unstructured data analysis. There have been attempts to find similar documents by combining the two methods. However, the combined algorithm has a problem because the two analysis methods differ, which makes it difficult to use unstructured data and structured data at the same time. In this paper, we study a method that extracts the keywords implied in a document and uses LDA (Latent Dirichlet Allocation) to classify documents efficiently without human intervention and to find similar patents.