Dataset Utilities
- class recsyslearn.dataset.segmentations.ActivitySegmentation
Bases: Segmentation
Segmentation of users based on their number of interactions.
- classmethod segment(dataset: DataFrame, proportions=None, min_interaction: int = 0) DataFrame
Segmentation of users based on their interactions with different items.
- Parameters:
dataset (pd.DataFrame) – The complete dataset.
proportions (list, default [0.8, 0.2]) – The proportion of interactions wanted for every group.
min_interaction (int, default 0) – The minimum number of interactions required per user. Users below this threshold are removed.
- Raises:
SegmentationNotSupportedException – If len(proportions) not in (1, 2, 3).
WrongProportionsException – If sum(proportions) is not 1, which means it doesn’t cover all the items/users.
- Returns:
DataFrame mapping each user to the group it belongs to.
- Return type:
pd.DataFrame
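The grouping described above can be illustrated with a minimal pandas sketch. It is not the recsyslearn implementation: the function name `segment_by_activity`, the `'group'` output column, the string labels `'1'`, `'2'`, ... and the rule itself (sort users by activity, then cut on the cumulative share of interactions) are assumptions made for illustration, and input validation is left out.

```python
import pandas as pd


def segment_by_activity(dataset, proportions=None, min_interaction=0):
    """Illustrative sketch only, not the recsyslearn implementation."""
    proportions = [0.8, 0.2] if proportions is None else list(proportions)

    # Count interactions per user and drop users below the threshold.
    counts = dataset.groupby("user").size()
    counts = counts[counts >= min_interaction].sort_values(ascending=False)

    # Share of all interactions covered up to and including each user.
    cum_share = counts.cumsum() / counts.sum()

    # Upper boundary of each group on the cumulative-share axis.
    bounds = pd.Series(proportions).cumsum().tolist()

    def group_of(share):
        for idx, bound in enumerate(bounds, start=1):
            if share <= bound + 1e-9:  # tolerance for floating-point drift
                return str(idx)
        return str(len(bounds))

    return pd.DataFrame(
        {"user": counts.index, "group": cum_share.map(group_of).to_numpy()}
    )


# Toy data: user "a" has 5 interactions, "b" has 3, "c" has 2.
interactions = pd.DataFrame({
    "user": ["a"] * 5 + ["b"] * 3 + ["c"] * 2,
    "item": list(range(10)),
})
segments = segment_by_activity(interactions)
```

With the default proportions, users "a" and "b" together account for 80% of the interactions and land in the first group, while "c" falls in the second.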
- class recsyslearn.dataset.segmentations.DiscreteFeatureSegmentation
Bases: Segmentation
Segmentation of entities (users or items) according to one of their features (e.g., gender for users or genre for items).
- classmethod segment(feature: DataFrame, fill_na: int = -1) DataFrame
Segmentation of users/items based on one of their features. Before groups are assigned, NaN values are filled with fill_na (-1 by default). Make sure this value is not already one of the feature values.
- Parameters:
feature (pd.DataFrame) – The feature dataframe in the form [id, feature], storing the categorical feature to be used for grouping.
fill_na (int, default -1) – The value used to fill missing values.
- Raises:
InvalidValueException – If the fill_na value is already present in the features dataframe.
- Returns:
DataFrame mapping each user/item to the group it belongs to.
- Return type:
pd.DataFrame
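The fill-then-group behaviour described above can be sketched as follows. This is a hedged illustration rather than the library's code: `segment_by_feature`, the `'group'` output column and the use of `ValueError` in place of `InvalidValueException` are assumptions, and the feature is numerically coded here for simplicity.

```python
import pandas as pd


def segment_by_feature(feature, fill_na=-1):
    """Illustrative sketch only, not the recsyslearn implementation."""
    id_col, feature_col = feature.columns[:2]

    # The fill value must not collide with a real feature value.
    if (feature[feature_col] == fill_na).any():
        raise ValueError(f"fill_na={fill_na} is already a feature value")

    out = feature[[id_col, feature_col]].copy()
    # Missing features get their own group, identified by fill_na.
    out[feature_col] = out[feature_col].fillna(fill_na)
    return out.rename(columns={feature_col: "group"})


# Toy data: item 11 has no genre assigned.
genres = pd.DataFrame({
    "item": [10, 11, 12],
    "genre_id": [2.0, float("nan"), 5.0],
})
item_groups = segment_by_feature(genres)
```

Items without a feature value end up together in the group labelled by the fill value, which is why that value must not already occur in the data.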
- class recsyslearn.dataset.segmentations.InteractionSegmentation
Bases: Segmentation
Segmentation of items based on their cumulative number of interactions.
- classmethod segment(dataset: DataFrame, proportions=None, min_interaction: int = 0, group='item') DataFrame
Segmentation of items based on their cumulative interactions with different users.
- Parameters:
dataset (pd.DataFrame) – The complete dataset.
proportions (list, default [0.8, 0.2]) – The proportion of interactions wanted for every group.
min_interaction (int, default 0) – The minimum number of interactions required per item. Items below this threshold are removed.
group (str, default 'item') – The entity type ('user' or 'item') to segment by number of interactions.
- Raises:
SegmentationNotSupportedException – If len(proportions) not in (1, 2, 3).
WrongProportionsException – If sum(proportions) is not 1, which means it doesn’t cover all the items/users.
InvalidGroupException – If group is not equal to ‘user’ or ‘item’.
- Returns:
DataFrame mapping each item (or user, depending on group) to the group it belongs to.
- Return type:
pd.DataFrame
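As a rough sketch of the argument checks and the counting step that precede the grouping, under stated assumptions: `count_interactions` is a hypothetical helper, plain `ValueError` stands in for the three library exceptions listed above, and the `'interactions'` column name is invented for the example.

```python
import math

import pandas as pd


def count_interactions(dataset, proportions=None, min_interaction=0, group="item"):
    """Illustrative sketch only, not the recsyslearn implementation."""
    proportions = [0.8, 0.2] if proportions is None else list(proportions)

    # Checks mirroring the documented "Raises" conditions.
    if len(proportions) not in (1, 2, 3):
        raise ValueError("proportions must contain 1, 2 or 3 values")
    if not math.isclose(sum(proportions), 1.0):
        raise ValueError("proportions must sum to 1")
    if group not in ("user", "item"):
        raise ValueError("group must be 'user' or 'item'")

    # Count interactions per user or per item, then apply the threshold.
    counts = dataset.groupby(group).size().rename("interactions").reset_index()
    return counts[counts["interactions"] >= min_interaction].reset_index(drop=True)


# Toy data: item "x" has 3 interactions, item "y" has 1.
interactions = pd.DataFrame({
    "user": [1, 1, 2, 3],
    "item": ["x", "y", "x", "x"],
})
item_counts = count_interactions(interactions, min_interaction=2)
user_counts = count_interactions(interactions, group="user")
```

Switching `group` only changes which column is counted; the threshold and the proportion checks apply in the same way to users and items.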
- class recsyslearn.dataset.segmentations.PopularityPercentage
Bases: Segmentation
Calculate item or user popularity based on the percentage of interactions they have.
- classmethod segment(dataset: DataFrame, group: str = 'item') DataFrame
Calculate item or user popularity based on the percentage of interactions they have.
- Parameters:
dataset (pd.DataFrame) – The complete dataset.
group (str, default 'item') – Whether to calculate the popularity of users or items.
- Returns:
DataFrame with items/users and their corresponding popularity.
- Return type:
pd.DataFrame
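One plausible reading of "percentage of interactions" is each entity's share of all interactions in the dataset. The sketch below assumes that reading and expresses popularity as a fraction between 0 and 1; the function name `popularity_share` and the `'popularity'` column are hypothetical, and the library may normalise differently.

```python
import pandas as pd


def popularity_share(dataset, group="item"):
    """Illustrative sketch only, not the recsyslearn implementation."""
    # Interactions per entity, divided by the total number of interactions.
    counts = dataset.groupby(group).size()
    return (counts / counts.sum()).rename("popularity").reset_index()


# Toy data: item "x" has 3 of the 4 interactions.
interactions = pd.DataFrame({
    "user": [1, 1, 2, 3],
    "item": ["x", "y", "x", "x"],
})
item_popularity = popularity_share(interactions)
user_popularity = popularity_share(interactions, group="user")
```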
- recsyslearn.dataset.utils.find_relevant_items(target_df: DataFrame) DataFrame
Find relevant items for every user in the dataset.
- Parameters:
target_df (pd.DataFrame) – Target interaction DataFrame, i.e., the items to be recommended. Columns: [‘user’, ‘item’].
- Raises:
ColumnsNotExistException – If target_df does not contain columns (‘user’, ‘item’).
- Returns:
The DataFrame containing all the relevant items per user in the form (‘user’, ‘pos_items’).
- Return type:
pd.DataFrame
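The per-user aggregation can be approximated with a pandas group-by. This is a sketch under assumptions: `relevant_items_per_user` is a hypothetical stand-in, `KeyError` replaces `ColumnsNotExistException`, and storing `pos_items` as a Python list is a guess about the container type.

```python
import pandas as pd


def relevant_items_per_user(target_df):
    """Illustrative sketch only, not the recsyslearn implementation."""
    missing = {"user", "item"} - set(target_df.columns)
    if missing:
        raise KeyError(f"missing columns: {sorted(missing)}")

    # Collect every target item of a user into a single list.
    return (
        target_df.groupby("user")["item"]
        .agg(list)
        .rename("pos_items")
        .reset_index()
    )


# Toy data: user 1 has two relevant items, user 2 has one.
targets = pd.DataFrame({"user": [1, 1, 2], "item": ["x", "y", "x"]})
relevant = relevant_items_per_user(targets)
```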