Similarity-based Game Recommender System¶

Author: Zhanglin Liu

Date: 12/06/2020

Background¶

The dataset used in this project is Version 1: User and Item Data from the Steam Video Game and Bundle Data collection. It contains user-item data from the Steam video game platform.

These datasets are cited as follows:

Self-attentive sequential recommendation. Wang-Cheng Kang, Julian McAuley. ICDM, 2018.

Item recommendation on monotonic behavior chains. Mengting Wan, Julian McAuley. RecSys, 2018.

Generating and personalizing bundle recommendations on Steam. Apurva Pathak, Kshitiz Gupta, Julian McAuley. SIGIR, 2017.

Data Exploration¶

Loading Dataset¶

import ast
from pandas import DataFrame
from collections import defaultdict

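df = []

# Each line of the file is a Python dict literal rather than strict JSON,
# so it is parsed with ast.literal_eval (which only evaluates literals).
# Caution: the counts shown later in this notebook (e.g. 44155 records)
# come from a loop that also called f.readline() inside the for loop,
# which consumes a second line per iteration and loads only every other
# record. This loop loads every record, so a rerun gives different counts.
with open("australian_users_items.json", encoding="utf8") as f:
    for line in f:
        df.append(ast.literal_eval(line))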

all_users = []
for d in df:
    user = d['user_id']
    all_users.append(user)

user_df = DataFrame(all_users, columns = ['user_ids'])
user_df

Observation

There are 44155 user_id elements in this dataset.
These elements are of string type.

# total user elements vs. number of unique user elements in this dataset
len(df),len(user_df['user_ids'].unique())

(44155, 44012)

Observation

44155 - 44012 = 143 of the user_id elements are repeats of a user_id that already appears elsewhere in the dataset.
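One way to see which ids account for the repeated entries is to count occurrences. The following is a small sketch on a hypothetical toy list (toy_users stands in for all_users); it is not part of the original analysis.

```python
from collections import Counter

# Hypothetical toy list standing in for all_users.
toy_users = ['a', 'b', 'a', 'c', 'b', 'a']

counts = Counter(toy_users)

# ids that occur more than once, with their number of occurrences
repeated_ids = {user: c for user, c in counts.items() if c > 1}

# total entries minus unique ids: the same arithmetic as 44155 - 44012
extra_rows = len(toy_users) - len(counts)

print(repeated_ids)
print(extra_rows)
```

Note that the set-based dictionaries built in the data preparation step are not affected by repeated users, because adding the same (user, item) pair to a set twice has no effect.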

all_items = []
for d in df:
    # collect the item_id of every item in each user's item list
    for item in d['items']:
        all_items.append(item['item_id'])

items_df = DataFrame(all_items, columns = ['item_ids'])
items_df

# number of unique item elements in this dataset
len(items_df['item_ids'].unique())

10397

# data example
dict(list(df[0].items())[0:4])

{'user_id': 'js41637',
 'items_count': 888,
 'steam_id': '76561198035864385',
 'user_url': 'http://steamcommunity.com/id/js41637'}

For each user_id, df holds the items_count (number of games), steam_id, user_url, and items (the details of each item). For example, the first user, with user_id 'js41637', is associated with 888 items in total.

Because of hardware memory constraints, I excluded the items element from the code block above. The next code block shows a portion of it.

# the first 5 items, with all their fields,
# for the user with user_id 'js41637'
df[0]['items'][0:5]

[{'item_id': '10',
  'item_name': 'Counter-Strike',
  'playtime_forever': 0,
  'playtime_2weeks': 0},
 {'item_id': '80',
  'item_name': 'Counter-Strike: Condition Zero',
  'playtime_forever': 0,
  'playtime_2weeks': 0},
 {'item_id': '100',
  'item_name': 'Counter-Strike: Condition Zero Deleted Scenes',
  'playtime_forever': 0,
  'playtime_2weeks': 0},
 {'item_id': '300',
  'item_name': 'Day of Defeat: Source',
  'playtime_forever': 220,
  'playtime_2weeks': 0},
 {'item_id': '30',
  'item_name': 'Day of Defeat',
  'playtime_forever': 0,
  'playtime_2weeks': 0}]
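The nested layout shown above (one record per user, each holding a list of item dicts) can be flattened into one row per user-item pair when a tabular view is more convenient. This is a sketch on hypothetical miniature records; the user ids and the second user's playtime are made up for illustration.

```python
# Hypothetical miniature of the nested structure shown above.
toy_records = [
    {'user_id': 'u1', 'items_count': 2, 'items': [
        {'item_id': '10', 'item_name': 'Counter-Strike',
         'playtime_forever': 0},
        {'item_id': '300', 'item_name': 'Day of Defeat: Source',
         'playtime_forever': 220},
    ]},
    {'user_id': 'u2', 'items_count': 1, 'items': [
        {'item_id': '10', 'item_name': 'Counter-Strike',
         'playtime_forever': 35},
    ]},
]

# One (user_id, item_id, playtime_forever) tuple per owned item.
rows = [
    (rec['user_id'], item['item_id'], item['playtime_forever'])
    for rec in toy_records
    for item in rec['items']
]

# Example filter: keep only items the user has actually played.
played = [row for row in rows if row[2] > 0]

print(rows)
print(played)
```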

Data Preparation¶

usersPerItem = defaultdict(set)
itemsPerUser = defaultdict(set)

itemNames = {}

for d in df:
    user = d['user_id']
    for item_info in d['items']:
        item = item_info['item_id']
        usersPerItem[item].add(user)    # users who own this item
        itemsPerUser[user].add(item)    # items owned by this user
        itemNames[item] = item_info['item_name']

Jaccard Similarity Measure¶

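def Jaccard(s1, s2):
    # size of the intersection divided by size of the union:
    # 1.0 for identical sets, 0.0 for sets with nothing in common
    numer = len(s1.intersection(s2))
    denom = len(s1.union(s2))
    return numer / denom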
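As a worked example of the measure, the sketch below applies it to two small, hypothetical sets of owners. The lowercase jaccard function mirrors the definition above; the guard for two empty sets is an addition for safety and is an assumption, not part of the original function.

```python
def jaccard(s1, s2):
    """Jaccard similarity of two sets: |s1 & s2| / |s1 | s2|."""
    union = s1 | s2
    if not union:
        # Both sets empty: defined as 0.0 here to avoid dividing by zero.
        return 0.0
    return len(s1 & s2) / len(union)

# Hypothetical owners of two games.
owners_a = {'u1', 'u2', 'u3', 'u4'}
owners_b = {'u3', 'u4', 'u5'}

# 2 shared owners (u3, u4) out of 5 distinct owners overall.
score = jaccard(owners_a, owners_b)
print(score)
```

In this notebook every item in usersPerItem has at least one owner, so the empty-set case does not arise in practice.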

# find the items most similar to a given item
# ID is the item_id of the query item
# n is the number of similar items to return
def mostSimilar(ID, n):
    similarities = []
    users = usersPerItem[ID]
    for i in usersPerItem:
        if i == ID:
            continue  # skip the query item itself
        sim = Jaccard(users, usersPerItem[i])
        similarities.append((sim, i))
    # highest similarity first; ties are ordered by item_id string
    similarities.sort(reverse = True)
    return similarities[:n]
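The ranking step can be tried end to end on a toy ownership table. The sketch below is self-contained and uses hypothetical data and snake_case names (toy_pairs, users_per_item, most_similar). It also swaps the full sort for heapq.nlargest, which returns the same top n entries without ordering the whole list; that substitution is a design choice of this sketch, not the notebook's method.

```python
import heapq
from collections import defaultdict

# Hypothetical (user, item) ownership pairs.
toy_pairs = [
    ('u1', 'A'), ('u2', 'A'), ('u3', 'A'),
    ('u1', 'B'), ('u2', 'B'),
    ('u3', 'C'), ('u4', 'C'),
    ('u4', 'D'),
]

users_per_item = defaultdict(set)
for user, item in toy_pairs:
    users_per_item[item].add(user)

def jaccard(s1, s2):
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0

def most_similar(item_id, n):
    """Return the n (similarity, item_id) pairs closest to item_id."""
    users = users_per_item[item_id]
    scored = (
        (jaccard(users, others), other)
        for other, others in users_per_item.items()
        if other != item_id
    )
    # nlargest yields the top n in descending order.
    return heapq.nlargest(n, scored)

top = most_similar('A', 2)
print(top)
```

Here item B shares two of three owners with A, item C shares one of four, and item D shares none, so B and C are returned in that order.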

Getting Recommendation¶

Recommendation #1¶

# the first item_id from the very first user's item list
query = df[0]['items'][0]['item_id']
query

'10'

# getting the item_name of the item_id of 10
itemNames[query]

'Counter-Strike'

# returns the Jaccard similarity and item_id
# of the 10 items most similar to the query item_id,
# ordered from most similar to least similar
mostSimilar(query,10)

[(0.9064674580433892, '80'),
 (0.9064674580433892, '100'),
 (0.33818210410441896, '240'),
 (0.3342730567861458, '30'),
 (0.3333333333333333, '40'),
 (0.33265430841311877, '60'),
 (0.28612670408981555, '20'),
 (0.2855763039278815, '50'),
 (0.2852502583788572, '70'),
 (0.28520556814503073, '130')]

# the code above gives the 10 item_ids most similar to "Counter-Strike"
# below are the item names for these 10 item_ids
[itemNames[x[1]] for x in mostSimilar(query,10)]

['Counter-Strike: Condition Zero',
 'Counter-Strike: Condition Zero Deleted Scenes',
 'Counter-Strike: Source',
 'Day of Defeat',
 'Deathmatch Classic',
 'Ricochet',
 'Team Fortress Classic',
 'Half-Life: Opposing Force',
 'Half-Life',
 'Half-Life: Blue Shift']

Recommendation #2¶

# the item_id at index 100 (the 101st item) in the very first user's item list
query1 = df[0]['items'][100]['item_id']
query1

'22380'

# the item_name of the item_id 
itemNames[query1]

'Fallout: New Vegas'

mostSimilar(query1,5)

[(0.37277462489310426, '72850'),
 (0.3415632246623926, '22370'),
 (0.3224101479915433, '8870'),
 (0.31585220500595945, '377160'),
 (0.31223010487353486, '49520')]

[itemNames[x[1]] for x in mostSimilar(query1,5)]

['The Elder Scrolls V: Skyrim',
 'Fallout 3 - Game of the Year Edition',
 'BioShock Infinite',
 'Fallout 4',
 'Borderlands 2']

Recommendation #3¶

# the item_id at index 15 (the 16th item) in the item list of the user at index 10 (the 11th user)
query2 = df[10]['items'][15]['item_id']
query2

'12900'

# the item_name of the item_id 
itemNames[query2]

'Audiosurf'

mostSimilar(query2,8)

[(0.19985264321237797, '40800'),
 (0.19606612261979495, '107100'),
 (0.189873417721519, '3830'),
 (0.18828828828828828, '57300'),
 (0.18545454545454546, '22000'),
 (0.17699115044247787, '50620'),
 (0.17512420156139105, '48000'),
 (0.1749508989273304, '17410')]

[itemNames[x[1]] for x in mostSimilar(query2,8)]

['Super Meat Boy',
 'Bastion',
 'Psychonauts',
 'Amnesia: The Dark Descent',
 'World of Goo',
 'Darksiders',
 'LIMBO',
 "Mirror's Edge"]

Conclusion¶

A similarity-based recommender system recommends items similar to one the user already has experience with. In this case, it recommends games based on the similarity between the input game and the other games in the dataset, where two games count as similar when largely the same users own both. This project uses the Jaccard similarity measure, but alternatives such as cosine similarity can also be used.
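As a rough illustration of the cosine alternative, the sketch below treats each game as a binary vector over users (1 if the user owns it, 0 otherwise). Under that assumption the cosine similarity reduces to the number of shared owners divided by the square root of the product of the two set sizes. The function name cosine_sets and the owner sets are hypothetical.

```python
import math

def cosine_sets(s1, s2):
    """Cosine similarity of two sets viewed as binary vectors."""
    if not s1 or not s2:
        return 0.0  # an empty set has a zero vector; define similarity as 0
    return len(s1 & s2) / math.sqrt(len(s1) * len(s2))

def jaccard(s1, s2):
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0

# Hypothetical owners of two games.
owners_a = {'u1', 'u2', 'u3', 'u4'}
owners_b = {'u3', 'u4', 'u5', 'u6'}

cos_score = cosine_sets(owners_a, owners_b)  # 2 / sqrt(4 * 4)
jac_score = jaccard(owners_a, owners_b)      # 2 / 6

print(cos_score, jac_score)
```

Both measures grow with the overlap between the two owner sets; they differ only in how the overlap is normalised, so the rankings they produce are often similar but not guaranteed to be identical.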

Output of user_df (first and last rows):

	user_ids
0	js41637
1	Riot-Punch
2	MinxIsBetterThanPotatoes
3	themanwich
4	Wackky
...	...
44150	76561198319916652
44151	76561198320136420
44152	76561198323066619
44153	XxLaughingJackClown77xX
44154	edward_tremethick

Output of items_df (first and last rows):

	item_ids
0	10
1	80
2	100
3	300
4	30
...	...
2588408	497810
2588409	497811
2588410	497812
2588411	497813
2588412	417860