Computer Science & Electrical

Assessing Sentence Similarity using Lexical and Semantic Analysis for Text Summarization using Neural Network

Pages: 5, Volume: 4, Issue: 1, May 2018
Received: 13 May 2018, Published: 19 May 2018

Authors

# Author Name
1 Abujar Onkon
2 Md. Shahidul Islam
3 Abu Abed Md. Shohaeb

Abstract

This paper presents a sentence similarity measure that combines lexical and semantic similarity. The degree of similarity is defined and implemented in the proposed method. Few resources are available for the Bengali language, and further development of Bengali language resources is essential; the Bengali WordNet is not as stable as the WordNets available for English. A key challenge of Natural Language Processing is identifying the meaning of a text, and text summarization is one of its most challenging applications. An expert text summarizer needs a proper analysis of the input text. Identifying the degree of relationship among the input sentences helps reduce the inclusion of unimportant sentences in the summary, so the objective of this research is to identify similar sentences. The best summary cannot always be identified by optimal functions alone; a better summary can be found by measuring sentence similarities. Current sentence similarity methods only measure the similarity between words and sentences, and they capture only the syntactic information of each sentence. Two major problems in identifying similarity between sentences were not addressed by previously proposed strategies: capturing the overall meaning of the sentence and accounting, approximately, for word order. The main objective of this paper is to measure sentence similarity in a way that helps summarize text in any language, with particular attention to English and Bengali. The experiment applies the proposed method to measuring English and Bengali sentence similarity, and the results indicate strong performance of the proposed algorithms. Text summarization follows two different methods, extractive and abstractive, and sentence similarity can play a vital role in both. With a proper measurement of sentence similarity, centroid sentences can be extracted and treated as the main or leading sentences.
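
The idea of scoring sentence pairs and then picking a centroid sentence can be illustrated with a minimal sketch. It is not the method proposed in the paper: it assumes a purely lexical measure (Jaccard overlap of word sets) and ignores the semantic and word-order components, which would need a resource such as WordNet or word embeddings. The function names tokenize, lexical_similarity, and centroid_sentence are hypothetical and chosen only for this illustration.

```python
from typing import List


def tokenize(sentence: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in sentence.lower().split())
    return [w for w in words if w]


def lexical_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two word sets (an assumed stand-in measure)."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def centroid_sentence(sentences: List[str]) -> str:
    """Return the sentence with the highest total similarity to the others."""
    def total_similarity(i: int) -> float:
        return sum(
            lexical_similarity(sentences[i], other)
            for j, other in enumerate(sentences)
            if j != i
        )

    best_index = max(range(len(sentences)), key=total_similarity)
    return sentences[best_index]


if __name__ == "__main__":
    docs = [
        "The cat sat on the mat.",
        "The cat sat on the rug.",
        "A dog sat on the mat.",
        "Stock prices fell.",
    ]
    print(centroid_sentence(docs))
```

In this sketch the centroid is simply the sentence that shares the most vocabulary with the rest of the input, so an unrelated sentence scores near zero and is unlikely to be selected for an extractive summary.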

Keywords

  • Sentence Similarity
  • Text Summarization
  • Bengali Summarization
  • Sentence clustering
  • Deep Learning