Dataset Utilities

class recsyslearn.dataset.segmentations.ActivitySegmentation

Bases: Segmentation

Segmentation of users based on their number of interactions.

classmethod segment(dataset: DataFrame, proportions=None, min_interaction: int = 0) → DataFrame

Segmentation of users based on their interactions with different items.

Parameters:

dataset (pd.DataFrame) – The complete dataset.
proportions (list, default [0.8, 0.2]) – The share of interactions wanted for each group. The values must sum to 1.
min_interaction (int, default 0) – The minimum number of interactions allowed per user. Users below this threshold are removed.

Raises:

SegmentationNotSupportedException – If len(proportions) not in (1, 2, 3).
WrongProportionsException – If sum(proportions) is not 1, i.e. the groups do not cover all the users.

Returns:

DataFrame with users and the group each one belongs to.

Return type:

pd.DataFrame
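
As an illustration only, the following library-independent sketch shows one way proportion-based activity segmentation can work. It assumes the dataset is a list of (user, item) pairs, that users are ranked from most to least active, and that a user stays in group 1 while the cumulative share of interactions remains within the first proportion. The helper name segment_by_activity, the 1-based group labels and the tie-breaking rule are hypothetical and are not taken from recsyslearn's implementation.

```python
from collections import Counter


def segment_by_activity(interactions, proportions=(0.8, 0.2), min_interaction=0):
    """Hypothetical helper: map each user to a 1-based group index."""
    counts = Counter(user for user, _ in interactions)
    # Drop users below the activity threshold.
    counts = {u: c for u, c in counts.items() if c >= min_interaction}
    total = sum(counts.values())

    # Cumulative upper bound of each group, e.g. [0.8, 1.0].
    bounds, running = [], 0.0
    for p in proportions:
        running += p
        bounds.append(running)

    groups, seen, g = {}, 0, 0
    # Most active users first; ties are broken by user id (an assumption).
    for user, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        seen += c
        # Move to the next group once the cumulative share exceeds the bound.
        while g < len(bounds) - 1 and seen / total > bounds[g] + 1e-9:
            g += 1
        groups[user] = g + 1
    return groups


# User "a" has 6 interactions, "b" has 2, "c" and "d" have 1 each.
interactions = (
    [("a", i) for i in range(6)]
    + [("b", i) for i in range(2)]
    + [("c", 0), ("d", 1)]
)
groups = segment_by_activity(interactions)
print(groups)  # {'a': 1, 'b': 1, 'c': 2, 'd': 2}
```

In this toy data, users a and b together account for 80% of the interactions, so they form the first group and the remaining users form the second.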

class recsyslearn.dataset.segmentations.DiscreteFeatureSegmentation

Bases: Segmentation

Segmentation of entities (users or items) according to one of their features (e.g., gender for users or genre for items).

classmethod segment(feature: DataFrame, fill_na: int = -1) → DataFrame

Segmentation of users/items based on one of their features. Before groups are assigned, missing values (NaN) are replaced with fill_na, which is -1 by default. Make sure that this value is not already one of the feature values.

Parameters:

feature (pd.DataFrame) – The feature DataFrame in the form [id, feature], storing the categorical feature to be used for grouping.
fill_na (int, default -1) – The value used to fill missing values.

Raises:

InvalidValueException – If the fill_na value is already present in the features dataframe.

Returns:

DataFrame with the entities (users or items) and the group each one belongs to.

Return type:

pd.DataFrame
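
The fill_na rule described above can be sketched without the library. This is a minimal illustration, assuming the feature table is a list of (id, value) pairs in which None marks a missing value (pandas would use NaN instead). The helper name segment_by_feature is hypothetical, and ValueError stands in for InvalidValueException.

```python
def segment_by_feature(feature_rows, fill_na=-1):
    """Hypothetical helper: map each id to the group given by its feature value."""
    present = {value for _, value in feature_rows if value is not None}
    if fill_na in present:
        # Stand-in for the InvalidValueException described above.
        raise ValueError(f"fill_na={fill_na!r} is already a feature value")
    # Missing values all land in one dedicated group labelled fill_na.
    return {
        entity: (fill_na if value is None else value)
        for entity, value in feature_rows
    }


rows = [("u1", 0), ("u2", 1), ("u3", None)]
feature_groups = segment_by_feature(rows)
print(feature_groups)  # {'u1': 0, 'u2': 1, 'u3': -1}
```

If -1 is a legitimate feature value in your data, pass a different fill_na so that the missing-value group cannot collide with a real one.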

class recsyslearn.dataset.segmentations.InteractionSegmentation

Bases: Segmentation

Segmentation of items based on their cumulative number of interactions.

classmethod segment(dataset: DataFrame, proportions=None, min_interaction: int = 0, group='item') → DataFrame

Segmentation of items based on their cumulative interactions with different users.

Parameters:

dataset (pd.DataFrame) – The complete dataset.
proportions (list, default [0.8, 0.2]) – The share of interactions wanted for each group. The values must sum to 1.
min_interaction (int, default 0) – The minimum number of interactions allowed for items. Items below this threshold are removed.
group (str, default 'item') – The entity type to segment by number of interactions, either 'user' or 'item'.

Raises:

SegmentationNotSupportedException – If len(proportions) not in (1, 2, 3).
WrongProportionsException – If sum(proportions) is not 1, i.e. the groups do not cover all the items/users.
InvalidGroupException – If group is not equal to ‘user’ or ‘item’.

Returns:

DataFrame with items (or users, depending on group) and the group each one belongs to.

Return type:

pd.DataFrame
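
The three documented error conditions can be pictured as an up-front argument check. The sketch below is an assumption about how such a check might look, not recsyslearn's code: validate_segment_args is a hypothetical helper, and the exception classes are local stand-ins that only borrow the documented names. It compares the sum with math.isclose because float sums such as 0.7 + 0.2 + 0.1 may not equal 1 exactly. Whether the library compares exactly or with a tolerance is not stated here.

```python
import math


# Local stand-ins named after the documented exceptions (not imported from recsyslearn).
class SegmentationNotSupportedException(Exception):
    pass


class WrongProportionsException(Exception):
    pass


class InvalidGroupException(Exception):
    pass


def validate_segment_args(proportions=None, group="item"):
    """Hypothetical helper: check arguments and return the proportions as a list."""
    if proportions is None:
        proportions = [0.8, 0.2]  # documented default
    if len(proportions) not in (1, 2, 3):
        raise SegmentationNotSupportedException("use 1, 2 or 3 groups")
    # Tolerant comparison is a choice made for this sketch.
    if not math.isclose(sum(proportions), 1.0):
        raise WrongProportionsException("proportions must sum to 1")
    if group not in ("user", "item"):
        raise InvalidGroupException("group must be 'user' or 'item'")
    return list(proportions)


checked = validate_segment_args([0.7, 0.2, 0.1], group="user")
print(checked)  # [0.7, 0.2, 0.1]
```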

class recsyslearn.dataset.segmentations.PopularityPercentage

Bases: Segmentation

Calculate item or user popularity based on the percentage of interactions they account for.

classmethod segment(dataset: DataFrame, group: str = 'item') → DataFrame

Calculate item or user popularity based on the percentage of interactions they account for.

Parameters:

dataset (pd.DataFrame) – The complete dataset.
group (str, default 'item') – Whether to calculate the popularity of users or items.

Returns:

DataFrame with items/users and their corresponding popularity.

Return type:

pd.DataFrame
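
A rough, library-independent sketch of the popularity computation follows. It assumes the dataset is a list of (user, item) pairs and expresses popularity as a fraction between 0 and 1. Whether recsyslearn reports a fraction or a 0 to 100 percentage is not stated here. The helper name popularity_share is hypothetical, and rejecting unknown group values with ValueError is a choice made for this sketch.

```python
from collections import Counter


def popularity_share(interactions, group="item"):
    """Hypothetical helper: share of all interactions held by each item or user."""
    if group not in ("user", "item"):
        raise ValueError("group must be 'user' or 'item'")
    position = 0 if group == "user" else 1  # column to count over
    counts = Counter(row[position] for row in interactions)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


pairs = [("u1", "i1"), ("u2", "i1"), ("u1", "i2"), ("u3", "i1")]
item_popularity = popularity_share(pairs)
user_popularity = popularity_share(pairs, group="user")
print(item_popularity)  # {'i1': 0.75, 'i2': 0.25}
print(user_popularity)  # {'u1': 0.5, 'u2': 0.25, 'u3': 0.25}
```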

recsyslearn.dataset.utils.find_relevant_items(target_df: DataFrame) → DataFrame

Find relevant items for every user in the dataset.

Parameters:

target_df (pd.DataFrame) – Target interaction DataFrame, i.e., the items to be recommended. Columns: [‘user’, ‘item’].

Raises:

ColumnsNotExistException – If target_df does not contain the columns (‘user’, ‘item’).

Returns:

The DataFrame containing all the relevant items per user in the form (‘user’, ‘pos_items’).

Return type:

pd.DataFrame
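
To make the (‘user’, ‘pos_items’) output shape concrete, here is a minimal sketch that collects each user's target items into a list. It assumes rows are plain dictionaries rather than a DataFrame. The helper name relevant_items_per_user is hypothetical, and KeyError stands in for ColumnsNotExistException.

```python
def relevant_items_per_user(target_rows):
    """Hypothetical helper: group target items by user."""
    pos_items = {}
    for row in target_rows:
        if "user" not in row or "item" not in row:
            # Stand-in for the ColumnsNotExistException described above.
            raise KeyError("each row needs 'user' and 'item'")
        pos_items.setdefault(row["user"], []).append(row["item"])
    # One output row per user, in order of first appearance.
    return [{"user": user, "pos_items": items} for user, items in pos_items.items()]


target = [
    {"user": 1, "item": 10},
    {"user": 2, "item": 20},
    {"user": 1, "item": 30},
]
relevant = relevant_items_per_user(target)
print(relevant)
# [{'user': 1, 'pos_items': [10, 30]}, {'user': 2, 'pos_items': [20]}]
```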