Module Reference

churn_model.py

Churn prediction model pipeline for E-Commerce customer data. Includes data loading, preprocessing, modeling, evaluation, and MLflow logging.

Author: DROSSIG - AHPHAM

Year: 2025

notebook.balance_data(X: DataFrame, y: Series, random_state: int = 42)

Balance the dataset with SMOTETomek, which combines SMOTE oversampling of the minority class with Tomek-link removal (undersampling).

Args:

X: Feature DataFrame.
y: Target Series.
random_state: Random seed for reproducible resampling (default: 42).

Returns:

tuple: Resampled (X, y)

notebook.evaluate_model(model, x_train, y_train, x_test, y_test)

Evaluate classifier performance.

Args:

model: Trained classifier.
x_train: Scaled training features.
y_train: Training labels.
x_test: Scaled test features.
y_test: Test labels.

Returns:

dict: Metrics
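The reference does not list which metrics the returned dict contains. The sketch below shows one plausible shape using scikit-learn metrics. The key names, the choice of metrics, and the `LogisticRegression` stand-in model are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split


def evaluate_model_sketch(model, x_train, y_train, x_test, y_test) -> dict:
    """Return common binary-classification metrics (hypothetical key names)."""
    y_pred = model.predict(x_test)
    y_prob = model.predict_proba(x_test)[:, 1]  # probability of the positive class
    return {
        "train_accuracy": accuracy_score(y_train, model.predict(x_train)),
        "test_accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, y_prob),
    }


# Synthetic data and a simple stand-in classifier, for illustration only.
X, y = make_classification(n_samples=200, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(x_train, y_train)

metrics = evaluate_model_sketch(model, x_train, y_train, x_test, y_test)
print(metrics)
```

Reporting train and test accuracy side by side is a cheap way to spot overfitting.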

notebook.load_data(filename: str, sheet_name: str = 'E Comm') -> DataFrame

Load E-Commerce customer data from an Excel file.

Args:

filename: Path to the Excel file.
sheet_name: Name of the sheet containing data.

Returns:

pd.DataFrame: Raw data.

notebook.mlflow_load_predict(run_id: str, x_test_scaled)

Load a model from MLflow and return predictions.

Args:

run_id: MLflow run ID.
x_test_scaled: Scaled test features.

Returns:

np.ndarray: Model predictions

notebook.mlflow_log_model_run(model, x_train_scaled, x_test_scaled, y_train, y_test, experiment_name='MLflow Project Management')

Log a model training run to MLflow.

Args:

model: Trained model.
x_train_scaled: Scaled training data.
x_test_scaled: Scaled test data.
y_train: Training labels.
y_test: Test labels.
experiment_name: MLflow experiment name.

notebook.plot_confusion_roc(model, x_test, y_test)

Plot confusion matrix and ROC curve.

Args:

model: Trained classifier.
x_test: Test features.
y_test: Test labels.
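The sketch below computes the quantities such plots are drawn from: the confusion matrix, and the false/true positive rates of the ROC curve. It deliberately stops short of plotting. The synthetic data and the `LogisticRegression` stand-in are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, confusion_matrix, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic data and a simple stand-in classifier, for illustration only.
X, y = make_classification(n_samples=200, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(x_train, y_train)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(x_test))

# The ROC curve needs scores, not hard labels: use the positive-class probability.
fpr, tpr, _thresholds = roc_curve(y_test, model.predict_proba(x_test)[:, 1])
roc_auc = auc(fpr, tpr)

print(cm)
print(f"AUC = {roc_auc:.3f}")
```

scikit-learn's `ConfusionMatrixDisplay` and `RocCurveDisplay` can render these arrays with matplotlib; whether the module uses them is not stated in this reference.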

notebook.plot_correlation(df: DataFrame, target: str = 'Churn') -> None

Plot correlation between features and the target.

Args:

df: Preprocessed dataframe.
target: The target column.
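A minimal sketch of the values a feature-to-target correlation plot would display, computed with pandas. The helper name `target_correlation`, the column names `Tenure` and `Complain`, and the synthetic data are assumptions for this example. Plotting is omitted.

```python
import numpy as np
import pandas as pd


def target_correlation(df: pd.DataFrame, target: str = "Churn") -> pd.Series:
    """Pearson correlation of every other column with the target, sorted ascending."""
    return df.corr()[target].drop(target).sort_values()


# Synthetic, fully numeric frame standing in for the preprocessed data.
rng = np.random.default_rng(0)
n = 200
tenure = rng.integers(0, 60, size=n)
complain = rng.integers(0, 2, size=n)
noise = rng.normal(size=n)
churn = ((-0.05 * tenure + 1.5 * complain + noise) > 0).astype(int)
df = pd.DataFrame({"Tenure": tenure, "Complain": complain, "Churn": churn})

corr = target_correlation(df)
print(corr)
```

The resulting Series can be passed to a bar plot, for example with `corr.plot(kind="barh")` when matplotlib is installed.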

notebook.preprocess_data(df: DataFrame) -> DataFrame

Preprocess the raw data:

- Merges similar categories.
- Imputes missing values.
- Converts categorical columns.
- Drops unused columns.
- Encodes categorical features as integers.

Args:

df: Raw data.

Returns:

pd.DataFrame: Preprocessed data ready for modeling.
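The preprocessing steps listed for this function can be sketched in pandas as follows. The column names, the category mapping, and the median imputation are assumptions chosen for illustration. The reference does not specify which columns or strategies the module actually uses.

```python
import numpy as np
import pandas as pd


def preprocess_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the five documented steps to a toy frame (hypothetical columns)."""
    out = df.copy()

    # 1. Merge similar categories (assumed mapping).
    out["PreferredLoginDevice"] = out["PreferredLoginDevice"].replace(
        {"Phone": "Mobile Phone"}
    )

    # 2. Impute missing values (median is an assumed strategy).
    out["Tenure"] = out["Tenure"].fillna(out["Tenure"].median())

    # 3. Convert text columns to the categorical dtype.
    out["PreferredLoginDevice"] = out["PreferredLoginDevice"].astype("category")

    # 4. Drop columns that carry no predictive signal (assumed: the ID column).
    out = out.drop(columns=["CustomerID"])

    # 5. Encode categories as integer codes.
    out["PreferredLoginDevice"] = out["PreferredLoginDevice"].cat.codes

    return out


raw = pd.DataFrame(
    {
        "CustomerID": [1, 2, 3, 4],
        "PreferredLoginDevice": ["Phone", "Mobile Phone", "Computer", "Phone"],
        "Tenure": [4.0, np.nan, 10.0, 2.0],
        "Churn": [1, 0, 0, 1],
    }
)

clean = preprocess_sketch(raw)
print(clean)
```

Working on a copy keeps the raw frame intact, so the raw and preprocessed data can be compared afterwards.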

notebook.scale_data(x_train, x_test)

Scale features using MinMaxScaler.

Args:

x_train: Training features.
x_test: Test features.

Returns:

tuple: (x_train_scaled, x_test_scaled)
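A minimal sketch of min-max scaling with scikit-learn's `MinMaxScaler`. It assumes the common practice of fitting the scaler on the training split only and reusing it on the test split. The reference does not confirm that the module does this, and `scale_data_sketch` is a hypothetical name.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def scale_data_sketch(x_train, x_test):
    """Fit MinMaxScaler on the training data, then transform both splits."""
    scaler = MinMaxScaler()
    x_train_scaled = scaler.fit_transform(x_train)  # learns per-column min and max
    x_test_scaled = scaler.transform(x_test)        # reuses the training min and max
    return x_train_scaled, x_test_scaled


x_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
x_test = np.array([[2.5, 40.0]])

x_train_scaled, x_test_scaled = scale_data_sketch(x_train, x_test)
print(x_train_scaled)
print(x_test_scaled)
```

Because the scaler only sees the training range, test values outside that range map outside [0, 1]. Here the second test feature (40.0) scales to 1.5.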

notebook.show_data_summary(df: DataFrame) -> None

Print duplicates, nulls, and unique value counts.
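A short pandas sketch of the three summaries named in the description. The helper name `show_data_summary_sketch` and the toy frame are assumptions for illustration. The module's output format may differ.

```python
import pandas as pd


def show_data_summary_sketch(df: pd.DataFrame) -> None:
    """Print duplicate-row count, nulls per column, and unique values per column."""
    print("Duplicate rows:", df.duplicated().sum())
    print("Nulls per column:")
    print(df.isnull().sum())
    print("Unique values per column:")
    print(df.nunique())


df = pd.DataFrame({"a": [1, 1, 2, None], "b": ["x", "x", "y", "y"]})
show_data_summary_sketch(df)
```

Note that `nunique()` ignores missing values by default, so column `a` reports two unique values here.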

notebook.train_xgboost_classifier(x_train, y_train)

Train an XGBoost classifier.

Args:

x_train: Scaled training features.
y_train: Training labels.

Returns:

XGBClassifier: Trained classifier.