Computer Science & Electrical

Modified Content-Based Filtering Method Using K-Nearest Neighbors and Percentile Concept

Volume: 100, Issue: 1, May
Published Date: 05 May 2022
Publisher Name: IJRP
Views: 624, Downloads: 415, Pages: 20-33
DOI: 10.47119/IJRP1001001520223119

Authors

# Author Name
1 Dan Michael A. Cortez
2 Nathan John J. Cordero
3 Jermaine C. Canlas
4 Khatalyn E. Mata
5 Richard C. Regala
6 Mark Christopher R. Blanco
7 Antolin J. Alipio

Abstract

In the age of information, vast amounts of data are within everyone's grasp. The sheer volume of available information can lead to information overload. Recommender systems exist to recommend information that is relevant and appropriate based on a user's preferences. Content-based filtering (CBF) is a recommender system approach that focuses solely on user preference and item content. CBF works by recommending items that satisfy a user's interests based on the items that user previously liked. CBF suffers from the problem of overspecialization, also called the serendipity problem. Overspecialization occurs when the recommended items are very similar to the user's previously liked items, so the system cannot produce unexpected recommendations. The researchers used a pure content-based approach to eliminate the overspecialization problem. Their first method uses the K-Nearest Neighbors (KNN) algorithm to find the nearest neighbors of the top recommended items; the premise is to recommend items that are similar to similar items. Their second method applies the percentile concept to the cosine similarity matrix of all items. This method prevents overspecialization by recommending items in the lower percentiles, since overspecialization occurs in the higher percentiles. The results of this study show that both the first and second methods are effective in preventing overspecialization because they recommended unexpected yet relevant items.
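
As an illustration only, the two methods described in the abstract might be sketched in Python with NumPy as follows. The function names, the parameters `top_n` and `k`, and the percentile bounds `low` and `high` are assumptions made for this sketch and are not taken from the paper. The sketch also reads neighbors directly off the cosine similarity matrix instead of using a dedicated KNN library, which may differ from the researchers' implementation.

```python
import numpy as np


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of a feature matrix."""
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero items
    unit = features / norms
    return unit @ unit.T


def top_similar(sim, item, n, exclude=()):
    """Indices of the n items most similar to `item`, skipping excluded ones."""
    banned = set(exclude) | {item}
    order = np.argsort(-sim[item], kind="stable")
    return [int(j) for j in order if int(j) not in banned][:n]


def neighbors_of_top_items(sim, liked, top_n, k):
    """Method 1 (assumed form): recommend the nearest neighbors of the
    items that plain content-based filtering would rank highest."""
    top = top_similar(sim, liked, top_n)
    picks = []
    for t in top:
        # Skip the liked item, the top items themselves, and earlier picks.
        picks.extend(top_similar(sim, t, k, exclude=[liked, *top, *picks]))
    return picks


def percentile_band(sim, liked, low, high):
    """Method 2 (assumed form): keep items whose similarity to `liked`
    lies between the `low` and `high` percentiles of all similarities,
    so the most similar (highest-percentile) items are left out."""
    others = np.array([j for j in range(sim.shape[0]) if j != liked])
    scores = sim[liked, others]
    lo, hi = np.percentile(scores, [low, high])
    keep = others[(scores >= lo) & (scores <= hi)]
    ranked = keep[np.argsort(-sim[liked, keep], kind="stable")]
    return [int(j) for j in ranked]


if __name__ == "__main__":
    # Toy data (hypothetical): six items placed at increasing angles from item 0,
    # so a larger index means lower similarity to item 0.
    angles = np.deg2rad([0, 10, 20, 40, 60, 90])
    features = np.column_stack([np.cos(angles), np.sin(angles)])
    sim = cosine_similarity_matrix(features)

    print(top_similar(sim, 0, 2))                  # plain CBF ranking
    print(neighbors_of_top_items(sim, 0, 2, 1))    # method 1 sketch
    print(percentile_band(sim, 0, 20, 80))         # method 2 sketch
```

In this toy setup, plain content-based ranking returns the items closest to the liked item, while both sketched methods return items that are further away yet still related, which is the behavior the abstract describes as unexpected yet relevant. The 20th to 80th percentile band is an arbitrary choice for the demonstration.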

Keywords

  • content-based filtering
  • cosine similarity matrix
  • k-nearest neighbors
  • overspecialization
  • percentile method