Module Reference
churn_model.py
Churn prediction model pipeline for E-Commerce customer data. Includes data loading, preprocessing, modeling, evaluation, and MLflow logging.
Author: DROSSIG - AHPHAM
Year: 2025
- notebook.balance_data(X: DataFrame, y: Series, random_state: int = 42)
Balance the dataset with SMOTETomek, which combines SMOTE oversampling with Tomek-link undersampling.
- Args:
X: Feature DataFrame.
y: Target Series.
random_state: Random seed (default 42).
- Returns:
tuple: Resampled (X, y)
- notebook.evaluate_model(model, x_train, y_train, x_test, y_test)
Evaluate classifier performance.
- Args:
model: Trained classifier.
x_train: Scaled training features.
y_train: Training labels.
x_test: Scaled test features.
y_test: Test labels.
- Returns:
dict: Metrics
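The reference does not say which metrics the returned dict contains. The sketch below assumes a common set (accuracy, precision, recall, F1, ROC AUC) computed with scikit-learn. The helper `evaluate_model_sketch`, the metric keys, and the `LogisticRegression` stand-in are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split


def evaluate_model_sketch(model, x_train, y_train, x_test, y_test):
    """Hypothetical stand-in: return a dict of common binary-classification metrics."""
    y_pred = model.predict(x_test)
    y_score = model.predict_proba(x_test)[:, 1]  # probability of the positive class
    return {
        "train_accuracy": accuracy_score(y_train, model.predict(x_train)),
        "test_accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, y_score),
    }


# Synthetic data and a simple classifier, for illustration only.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model = LogisticRegression(max_iter=1000).fit(x_train, y_train)

metrics = evaluate_model_sketch(model, x_train, y_train, x_test, y_test)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting train accuracy next to test accuracy makes overfitting easy to spot.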
- notebook.load_data(filename: str, sheet_name: str = 'E Comm') -> DataFrame
Load E-Commerce customer data from an Excel file.
- Args:
filename: Path to the Excel file.
sheet_name: Name of the sheet containing data.
- Returns:
pd.DataFrame: Raw data.
- notebook.mlflow_load_predict(run_id: str, x_test_scaled)
Load a model from MLflow and return predictions.
- Args:
run_id: MLflow run ID.
x_test_scaled: Scaled test features.
- Returns:
np.ndarray: Model predictions
- notebook.mlflow_log_model_run(model, x_train_scaled, x_test_scaled, y_train, y_test, experiment_name='MLflow Project Management')
Log a model training run to MLflow.
- Args:
model: Trained model.
x_train_scaled: Scaled training data.
x_test_scaled: Scaled test data.
y_train: Training labels.
y_test: Test labels.
experiment_name: MLflow experiment name.
- notebook.plot_confusion_roc(model, x_test, y_test)
Plot confusion matrix and ROC curve.
- Args:
model: Trained classifier.
x_test: Test features.
y_test: Test labels.
- notebook.plot_correlation(df: DataFrame, target: str = 'Churn') -> None
Plot correlation between features and the target.
- Args:
df: Preprocessed dataframe.
target: The target column.
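As a rough illustration, the correlation of each feature with the target can be taken from `DataFrame.corr` and drawn as a bar chart. The helper `plot_correlation_sketch`, the use of Pearson correlation, and the toy columns are assumptions. Unlike the documented function, which returns `None`, this sketch returns the correlation Series so it can be inspected.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_correlation_sketch(df, target="Churn"):
    """Hypothetical stand-in: bar chart of each feature's correlation with the target."""
    corr = df.corr(numeric_only=True)[target].drop(target).sort_values()
    fig, ax = plt.subplots(figsize=(6, 3))
    corr.plot.barh(ax=ax)
    ax.set_xlabel(f"Pearson correlation with {target}")
    fig.tight_layout()
    return corr


# Toy preprocessed frame; column names are illustrative only.
df = pd.DataFrame(
    {
        "Tenure": [1, 2, 10, 12, 3, 15],
        "Complain": [1, 1, 0, 0, 1, 0],
        "Churn": [1, 1, 0, 0, 1, 0],
    }
)
corr = plot_correlation_sketch(df)
print(corr)
```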
- notebook.preprocess_data(df: DataFrame) -> DataFrame
Preprocess the raw data:
- Merges similar categories.
- Imputes missing values.
- Converts categorical columns.
- Drops unused columns.
- Encodes categorical features as integers.
- Args:
df: Raw data.
- Returns:
pd.DataFrame: Preprocessed data ready for modeling.
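The five steps listed above can be sketched on a toy frame as follows. The column names, the category mapping, the median imputation, and the helper `preprocess_sketch` are assumptions chosen for illustration. The reference does not state which columns or strategies the module uses.

```python
import pandas as pd


def preprocess_sketch(df):
    """Hypothetical stand-in for the five preprocessing steps."""
    df = df.copy()
    cat_cols = ["PreferredLoginDevice"]  # assumed categorical columns

    # 1. Merge similar categories (assumed mapping).
    df["PreferredLoginDevice"] = df["PreferredLoginDevice"].replace(
        {"Phone": "Mobile Phone"}
    )
    # 2. Impute missing numeric values (median is an assumption).
    df["Tenure"] = df["Tenure"].fillna(df["Tenure"].median())
    # 3. Convert text columns to the pandas categorical dtype.
    for col in cat_cols:
        df[col] = df[col].astype("category")
    # 4. Drop columns that carry no predictive signal (assumed: the ID column).
    df = df.drop(columns=["CustomerID"])
    # 5. Encode each category as its integer code.
    for col in cat_cols:
        df[col] = df[col].cat.codes
    return df


raw = pd.DataFrame(
    {
        "CustomerID": [1, 2, 3, 4],
        "PreferredLoginDevice": ["Phone", "Mobile Phone", "Computer", "Phone"],
        "Tenure": [2.0, None, 10.0, 4.0],
        "Churn": [1, 0, 0, 1],
    }
)
clean = preprocess_sketch(raw)
print(clean)
```

Category codes follow the sorted category order, so here "Computer" becomes 0 and "Mobile Phone" becomes 1.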
- notebook.scale_data(x_train, x_test)
Scale features using MinMaxScaler.
- Args:
x_train: Training features.
x_test: Test features.
- Returns:
tuple: (x_train_scaled, x_test_scaled)
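A minimal sketch of min-max scaling with scikit-learn, assuming the scaler is fitted on the training features only and then applied to the test features (the usual way to avoid leakage, though the reference does not confirm it). The helper name `scale_data_sketch` is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def scale_data_sketch(x_train, x_test):
    """Hypothetical stand-in: fit MinMaxScaler on train, apply to both splits."""
    scaler = MinMaxScaler()
    x_train_scaled = scaler.fit_transform(x_train)  # learns per-column min and max
    x_test_scaled = scaler.transform(x_test)  # reuses the training min and max
    return x_train_scaled, x_test_scaled


x_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
x_test = np.array([[2.5, 40.0]])

x_train_scaled, x_test_scaled = scale_data_sketch(x_train, x_test)
print(x_train_scaled)
print(x_test_scaled)
```

Because the test row's second value (40) exceeds the training maximum (30), it scales to 1.5. Test values outside [0, 1] are expected with this approach.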
- notebook.show_data_summary(df: DataFrame) -> None
Print duplicates, nulls, and unique value counts.
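A summary like this might be built from three pandas calls. The helper `show_data_summary_sketch` and the exact output layout are assumptions.

```python
import pandas as pd


def show_data_summary_sketch(df):
    """Hypothetical stand-in: print duplicate, null, and unique-value counts."""
    print("Duplicate rows:", df.duplicated().sum())
    print("Nulls per column:")
    print(df.isnull().sum())
    print("Unique values per column:")
    print(df.nunique())


# Toy frame with one duplicated row and one missing value.
df = pd.DataFrame({"a": [1, 1, 2, None], "b": ["x", "x", "y", "y"]})
show_data_summary_sketch(df)
```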
- notebook.train_xgboost_classifier(x_train, y_train)
Train an XGBoost classifier.
- Args:
x_train: Scaled training features.
y_train: Training labels.
- Returns:
XGBClassifier: Trained classifier.