Dataset Utilities

class recsyslearn.dataset.segmentations.ActivitySegmentation

Bases: Segmentation

Segmentation of users based on their number of interactions.

classmethod segment(dataset: DataFrame, proportions=None, min_interaction: int = 0) DataFrame

Segmentation of users based on their interactions with different items.

Parameters:
  • dataset (pd.DataFrame) – The complete dataset.

  • proportions (list, default [0.8, 0.2]) – The proportion of interactions desired for each group.

  • min_interaction (int, default 0) – The minimum number of interactions required per user. Users below this threshold are removed.

Returns:

DataFrame with the users and the group each belongs to.

Return type:

pd.DataFrame
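
The min_interaction threshold described above can be read as a pre-filter applied before any grouping. The following is a minimal, library-independent sketch of that reading, not the recsyslearn implementation. It assumes each (user, item) row counts as one interaction, and the helper name filter_by_min_interaction is hypothetical.

```python
from collections import Counter


def filter_by_min_interaction(interactions, min_interaction=0):
    """Drop every row whose user has fewer than min_interaction rows.

    Hypothetical helper for illustration; each (user, item) tuple is
    assumed to be a single interaction.
    """
    counts = Counter(user for user, _ in interactions)
    return [(user, item) for user, item in interactions
            if counts[user] >= min_interaction]


# Toy data: u1 has three interactions, u2 has one.
toy_interactions = [("u1", "a"), ("u1", "b"), ("u1", "c"), ("u2", "a")]
kept = filter_by_min_interaction(toy_interactions, min_interaction=2)
```

With the default min_interaction of 0, no user falls below the threshold, so the input is returned unchanged.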

class recsyslearn.dataset.segmentations.DiscreteFeatureSegmentation

Bases: Segmentation

Segmentation of entities (users or items) according to one of their features (e.g., gender for users or genre for items).

classmethod segment(feature: DataFrame, fill_na: int = -1) DataFrame

Segmentation of users/items based on one of their features. Before groups are assigned, missing values (NaNs) are filled with fill_na, which is -1 by default. Make sure this value is not already one of the feature values.

Parameters:
  • feature (pd.DataFrame) – The feature dataframe in form of [id, feature] storing the categorical feature to be used for grouping.

  • fill_na (int, default -1) – The value used to fill missing values.

Raises:

InvalidValueException – If the fill_na value is already present in the features dataframe.

Returns:

DataFrame with the entities (users or items) and the group each belongs to.

Return type:

pd.DataFrame
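
The fill_na rule above can be illustrated with a small, library-independent sketch. It is not the recsyslearn implementation: missing values are represented by None instead of NaN, ValueError stands in for InvalidValueException, and the helper name segment_by_feature is hypothetical.

```python
def segment_by_feature(rows, fill_na=-1):
    """Map each id to its feature value, using fill_na for missing values.

    Hypothetical helper; rows are (id, feature) pairs and None marks a
    missing feature.
    """
    present = {value for _, value in rows if value is not None}
    # A placeholder that collides with a real value would merge two groups.
    if fill_na in present:
        raise ValueError(f"fill_na={fill_na!r} is already a feature value")
    return {entity: fill_na if value is None else value
            for entity, value in rows}


# Toy data: u2 has no recorded feature.
feature_rows = [("u1", "F"), ("u2", None), ("u3", "M")]
feature_groups = segment_by_feature(feature_rows)
```

The collision check is the reason the documentation asks that fill_na not already appear among the feature values: otherwise entities with a missing feature would be indistinguishable from entities that genuinely carry that value.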

class recsyslearn.dataset.segmentations.InteractionSegmentation

Bases: Segmentation

Segmentation of items based on their cumulative number of interactions.

classmethod segment(dataset: DataFrame, proportions=None, min_interaction: int = 0, group='item') DataFrame

Segmentation of items based on their cumulative interactions with different users.

Parameters:
  • dataset (pd.DataFrame) – The complete dataset.

  • proportions (list, default [0.8, 0.2]) – The proportion of interactions desired for each group.

  • min_interaction (int, default 0) – The minimum number of interactions required per item. Items below this threshold are removed.

  • group (str, default 'item') – The group to be segmented based on its number of interactions.

Returns:

DataFrame with the items and the group each belongs to.

Return type:

pd.DataFrame
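
One plausible reading of the proportions parameter is a split along the cumulative share of interactions, with the most-interacted entities first. The sketch below illustrates that reading only. The ordering, the tie-breaking, the boundary handling, and the 0-based group indices are all assumptions, and segment_by_cumulative_share is a hypothetical name, not a recsyslearn function.

```python
from collections import Counter
from itertools import accumulate


def segment_by_cumulative_share(interactions, proportions=(0.8, 0.2), group="item"):
    """Map each user or item to a group index by cumulative interaction share."""
    column = {"user": 0, "item": 1}[group]
    counts = Counter(row[column] for row in interactions)
    total = sum(counts.values())
    # Upper bound of each group's cumulative share, e.g. 0.8 and 1.0.
    bounds = list(accumulate(proportions))
    # Most-interacted entities first; ties are broken by id to stay deterministic.
    ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

    groups, running, current = {}, 0, 0
    for entity, count in ordered:
        running += count
        # Move to the next group once the running share exceeds the current bound.
        while current < len(bounds) - 1 and running / total > bounds[current]:
            current += 1
        groups[entity] = current
    return groups


# Toy data: item "a" has 5 interactions, "b" and "c" have 2 each, "d" has 1.
toy = (
    [(f"u{n}", "a") for n in range(5)]
    + [(f"u{n}", "b") for n in range(2)]
    + [(f"u{n}", "c") for n in range(2)]
    + [("u0", "d")]
)
item_groups = segment_by_cumulative_share(toy)
```

On this toy data, items "a" and "b" together cover 70% of the interactions and land in group 0, while "c" and "d" fall into group 1.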

class recsyslearn.dataset.segmentations.PopularityPercentage

Bases: Segmentation

Calculate item or user popularity based on the percentage of interactions they have.

classmethod segment(dataset: DataFrame, group: str = 'item') DataFrame

Calculate item or user popularity based on the percentage of interactions they have.

Parameters:
  • dataset (pd.DataFrame) – The complete dataset.

  • group (str, default 'item') – Whether to calculate the popularity of users or items.

Returns:

DataFrame with items/users and their corresponding popularity.

Return type:

pd.DataFrame
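
Popularity as a share of interactions can be sketched without the library as follows. This is an illustration under assumptions: the result is expressed as a fraction between 0 and 1 (the library may scale it differently), and popularity_share is a hypothetical name.

```python
from collections import Counter


def popularity_share(interactions, group="item"):
    """Return each entity's fraction of all interactions.

    Hypothetical helper; interactions are (user, item) tuples and group
    selects which of the two columns is counted.
    """
    column = {"user": 0, "item": 1}[group]
    counts = Counter(row[column] for row in interactions)
    total = sum(counts.values())
    return {entity: count / total for entity, count in counts.items()}


# Toy data: item "a" receives 3 of the 4 interactions.
toy_rows = [("u1", "a"), ("u2", "a"), ("u1", "b"), ("u3", "a")]
item_popularity = popularity_share(toy_rows)
user_popularity = popularity_share(toy_rows, group="user")
```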

recsyslearn.dataset.utils.find_relevant_items(target_df: DataFrame) DataFrame

Find relevant items for every user in the dataset.

Parameters:

target_df (pd.DataFrame) – Target interaction dataframe, i.e., the items to be recommended. Columns: [‘user’, ‘item’].

Raises:

ColumnsNotExistException – If target_df does not contain columns (‘user’, ‘item’).

Returns:

The DataFrame containing all the relevant items per user in the form (‘user’, ‘pos_items’).

Return type:

pd.DataFrame
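
The grouping that find_relevant_items describes can be pictured with a small, library-independent sketch. It is not the library's code: rows are plain dictionaries instead of DataFrame rows, KeyError stands in for ColumnsNotExistException, and the name collect_relevant_items is hypothetical.

```python
from collections import defaultdict


def collect_relevant_items(rows):
    """Group target items by user into records with 'user' and 'pos_items'."""
    relevant = defaultdict(list)
    for row in rows:
        # Every row must provide both expected columns.
        missing = {"user", "item"} - set(row)
        if missing:
            raise KeyError(f"missing columns: {sorted(missing)}")
        relevant[row["user"]].append(row["item"])
    # Users appear in the order in which they were first seen.
    return [{"user": user, "pos_items": items}
            for user, items in relevant.items()]


# Toy target interactions: u1 has two relevant items, u2 has one.
target_rows = [
    {"user": "u1", "item": "a"},
    {"user": "u2", "item": "c"},
    {"user": "u1", "item": "b"},
]
relevant_items = collect_relevant_items(target_rows)
```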