The dataset used in this project is Version 1: User and Item Data of the Steam Video Game and Bundle Data. It contains user-item data from the Steam video game platform.
The datasets are cited below:
Self-attentive sequential recommendation. Wang-Cheng Kang, Julian McAuley. ICDM, 2018.
Item recommendation on monotonic behavior chains. Mengting Wan, Julian McAuley. RecSys, 2018.
Generating and personalizing bundle recommendations on Steam. Apurva Pathak, Kshitiz Gupta, Julian McAuley. SIGIR, 2017.
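import ast
from pandas import DataFrame
from collections import defaultdict
df = []
with open("australian_users_items.json", encoding="utf8") as f:
    # iterate over the file directly; also calling f.readline() inside
    # the loop would consume an extra line and skip every other record
    for line in f:
        # parse each line as a Python literal before appending to df
        df.append(ast.literal_eval(line))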
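The loader above parses each line with ast.literal_eval. A minimal sketch of why that can be preferable to json.loads, assuming each line of the file is a Python dict literal with single-quoted strings. The sample line below is made up for illustration and is not taken from the dataset.

```python
import ast
import json

# Hypothetical line written as a Python dict literal (not real data)
sample_line = "{'user_id': 'u1', 'items': [{'item_id': '10', 'item_name': 'Game A'}]}"

# json.loads rejects single-quoted strings, so it fails on this line
try:
    json.loads(sample_line)
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False

# ast.literal_eval evaluates Python literals only, without running arbitrary code
record = ast.literal_eval(sample_line)
print(parsed_as_json, record['user_id'], len(record['items']))
```

If the file were strict JSON instead, json.loads would be the more conventional choice.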
# collect every user_id into a single-column DataFrame
all_users = [d['user_id'] for d in df]
user_df = DataFrame(all_users, columns=['user_ids'])
user_df
Observation
# total user elements vs. number of unique user elements in this dataset
len(df),len(user_df['user_ids'].unique())
Observation
# collect the item_id of every item in every user's item list
all_items = []
for d in df:
    for item in d['items']:
        all_items.append(item['item_id'])
items_df = DataFrame(all_items, columns=['item_ids'])
items_df
# number of unique item elements in this dataset
len(items_df['item_ids'].unique())
# data example
dict(list(df[0].items())[0:4])
Each record in df holds a user's user_id, items_count (number of games), steam_id, user_url, and items (the details of each item). For example, the user with user_id 'js41637' is associated with 888 items in total.
Because of memory constraints, the items field is excluded from the output above. The next code block shows a portion of it.
# the first 5 items, with all of their fields,
# for the same record shown above
df[0]['items'][:5]
# usersPerItem: item_id -> set of users who own the item
# itemsPerUser: user_id -> set of items the user owns
# itemNames: item_id -> item_name
usersPerItem = defaultdict(set)
itemsPerUser = defaultdict(set)
itemNames = {}
for d in df:
    user = d['user_id']
    for entry in d['items']:
        item = entry['item_id']
        usersPerItem[item].add(user)
        itemsPerUser[user].add(item)
        itemNames[item] = entry['item_name']
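# Jaccard similarity: size of the intersection over size of the union
def Jaccard(s1, s2):
    numer = len(s1.intersection(s2))
    denom = len(s1.union(s2))
    # avoid division by zero when both sets are empty
    return numer / denom if denom else 0.0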
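As a quick check of the arithmetic, here is a small worked example of the Jaccard measure on two toy sets. The function name jaccard and the user IDs are hypothetical and chosen only for illustration.

```python
def jaccard(s1, s2):
    """Return |s1 & s2| / |s1 | s2|, or 0.0 when both sets are empty."""
    union = s1 | s2
    if not union:
        return 0.0
    return len(s1 & s2) / len(union)

# Hypothetical sets of users who own two different games
owners_a = {'u1', 'u2', 'u3'}
owners_b = {'u2', 'u3', 'u4', 'u5'}

# 2 shared owners (u2, u3) out of 5 distinct owners overall
sim = jaccard(owners_a, owners_b)
print(sim)  # 0.4
```

The score ranges from 0 (no shared owners) to 1 (identical owner sets).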
# find the items most similar to a given item
# ID is the "item_id" of the query item
# n is the number of similar items to return
def mostSimilar(ID, n):
    similarities = []
    users = usersPerItem[ID]
    for i in usersPerItem:
        if i == ID:
            continue
        sim = Jaccard(users, usersPerItem[i])
        similarities.append((sim, i))
    # sort from most similar to least similar
    similarities.sort(reverse=True)
    return similarities[:n]
# the first item_id from the very first user's item list
query = df[0]['items'][0]['item_id']
query
# getting the item_name of the item_id of 10
itemNames[query]
# gives the Jaccard similarity measure
# and 10 items that are most similar to the input "item_id"
# outputs in most similar to least similar order
mostSimilar(query,10)
# the code above gives the 10 item_ids most similar to "Counter-Strike"
# below shows what these item names are for these 10 item_id
[itemNames[x[1]] for x in mostSimilar(query,10)]
# the item_id at index 100 (the 101st item) in the first record's item list
query1 = df[0]['items'][100]['item_id']
query1
# the item_name of the item_id
itemNames[query1]
mostSimilar(query1,5)
[itemNames[x[1]] for x in mostSimilar(query1,5)]
# the item_id at index 15 in the item list of the record at index 10
query2 = df[10]['items'][15]['item_id']
query2
# the item_name of the item_id
itemNames[query2]
mostSimilar(query2,8)
[itemNames[x[1]] for x in mostSimilar(query2,8)]
A similarity-based recommender system recommends items similar to one the user has already experienced. Here, it recommends games by measuring the similarity between an input game and every other game in the dataset, where two games count as similar when largely the same users own them. This project uses the Jaccard similarity measure, but alternatives such as cosine similarity can also be used.
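To illustrate the cosine alternative, here is a minimal sketch that ranks items by the cosine similarity of their owner sets. For binary ownership data, the cosine of two items reduces to the number of shared owners divided by the square root of the product of the two set sizes. The names cosine, most_similar_cosine, and toy_users_per_item, as well as the toy data, are assumptions made for this example and are not part of the project code.

```python
import math

def cosine(s1, s2):
    """Cosine similarity of two sets treated as binary vectors."""
    if not s1 or not s2:
        return 0.0
    return len(s1 & s2) / math.sqrt(len(s1) * len(s2))

def most_similar_cosine(item_id, n, users_per_item):
    """Return the n (similarity, item_id) pairs most similar to item_id."""
    users = users_per_item[item_id]
    sims = [
        (cosine(users, other_users), other)
        for other, other_users in users_per_item.items()
        if other != item_id
    ]
    sims.sort(reverse=True)  # most similar first
    return sims[:n]

# Hypothetical item -> owners mapping (not taken from the dataset)
toy_users_per_item = {
    'A': {'u1', 'u2', 'u3'},
    'B': {'u2', 'u3', 'u4'},
    'C': {'u4', 'u5'},
    'D': {'u1', 'u2', 'u3', 'u4'},
}

top = most_similar_cosine('A', 2, toy_users_per_item)
print([item for _, item in top])  # ['D', 'B']
```

Compared with Jaccard, cosine penalizes a size mismatch between the two owner sets less heavily, so the two measures can rank popular items differently.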