Module Reference

churn_model.py

Churn prediction model pipeline for E-Commerce customer data. Includes data loading, preprocessing, modeling, evaluation, and MLflow logging.

Author: DROSSIG - AHPHAM
Year: 2025

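notebook.balance_data(X: DataFrame, y: Series, random_state: int = 42)

Balance the dataset using SMOTETomek, which combines SMOTE oversampling of the minority class with Tomek-link undersampling.

Args:
    X: Feature DataFrame.
    y: Target Series.
    random_state: Random seed for reproducible resampling.
Returns:
    tuple: Resampled (X, y).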

notebook.evaluate_model(model, x_train, y_train, x_test, y_test)

Evaluate classifier performance on the training and test sets.

Args:
    model: Trained classifier.
    x_train: Scaled training features.
    y_train: Training labels.
    x_test: Scaled test features.
    y_test: Test labels.
Returns:
    dict: Evaluation metrics.
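The reference does not list which metrics `evaluate_model` returns. The sketch below assumes a typical binary-classification set computed with scikit-learn; the metric choice and the dictionary keys are hypothetical, and the model is assumed to expose `predict_proba` (as `XGBClassifier` does).

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate_model(model, x_train, y_train, x_test, y_test) -> dict:
    # Hypothetical metric set and key names; the real function may differ.
    y_pred = model.predict(x_test)
    # Probability of the positive (churn) class, needed for ROC AUC.
    y_proba = model.predict_proba(x_test)[:, 1]
    return {
        # Comparing train and test accuracy helps spot overfitting.
        "train_accuracy": accuracy_score(y_train, model.predict(x_train)),
        "test_accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_proba),
    }
```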

notebook.load_data(filename: str, sheet_name: str = 'E Comm') → DataFrame

Load E-Commerce customer data from an Excel file.

Args:
    filename: Path to the Excel file.
    sheet_name: Name of the sheet containing data.
Returns:
    pd.DataFrame: Raw data.

notebook.mlflow_load_predict(run_id: str, x_test_scaled)

Load a model from MLflow and return predictions.

Args:
    run_id: MLflow run ID.
    x_test_scaled: Scaled test features.
Returns:
    np.ndarray: Model predictions.

notebook.mlflow_log_model_run(model, x_train_scaled, x_test_scaled, y_train, y_test, experiment_name='MLflow Project Management')

Log a model training run to MLflow.

Args:
    model: Trained model.
    x_train_scaled: Scaled training data.
    x_test_scaled: Scaled test data.
    y_train: Training labels.
    y_test: Test labels.
    experiment_name: MLflow experiment name.

notebook.plot_confusion_roc(model, x_test, y_test)

Plot the confusion matrix and ROC curve of the model on the test set.

Args:
    model: Trained classifier.
    x_test: Test features.
    y_test: Test labels.

notebook.plot_correlation(df: DataFrame, target: str = 'Churn') → None

Plot the correlation between each feature and the target column.

Args:
    df: Preprocessed DataFrame.
    target: Name of the target column.
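A minimal sketch, assuming Pearson correlation computed with `DataFrame.corrwith` and drawn as a horizontal bar chart; the plot type is an assumption. It expects every column to be numeric, as after `preprocess_data`.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_correlation(df: pd.DataFrame, target: str = "Churn") -> None:
    # Pearson correlation of each non-target column with the target column,
    # sorted so the strongest negative and positive features sit at the ends.
    corr = df.drop(columns=[target]).corrwith(df[target]).sort_values()
    # One bar per feature; figure height grows with the number of features.
    fig, ax = plt.subplots(figsize=(6, 0.3 * len(corr) + 1))
    corr.plot.barh(ax=ax)
    ax.set_xlabel(f"Correlation with {target}")
    fig.tight_layout()
```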

notebook.preprocess_data(df: DataFrame) → DataFrame

Preprocess the raw data:
- Merges similar categories.
- Imputes missing values.
- Converts categorical columns.
- Drops unused columns.
- Encodes categorical features as integers.

Args:
    df: Raw data.
Returns:
    pd.DataFrame: Preprocessed data ready for modeling.
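The steps above could look roughly like this in pandas. Everything specific here is hypothetical: the `PaymentMode` column and its label mapping, the `CustomerID` column treated as unused, and median imputation as the strategy for missing values.

```python
import pandas as pd

# Hypothetical mapping of near-duplicate labels onto one canonical label.
CATEGORY_MERGES = {"PaymentMode": {"CC": "Credit Card", "COD": "Cash on Delivery"}}
# Hypothetical identifier column that carries no predictive signal.
UNUSED_COLUMNS = ["CustomerID"]


def preprocess_data(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()  # leave the caller's raw data untouched
    # 1. Merge similar categories.
    for col, mapping in CATEGORY_MERGES.items():
        df[col] = df[col].replace(mapping)
    # 2. Impute missing numeric values with each column's median (assumed strategy).
    num_cols = [c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])]
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())
    # 3. Convert the remaining (text) columns to the pandas category dtype.
    for col in df.columns:
        if col not in num_cols:
            df[col] = df[col].astype("category")
    # 4. Drop columns not used for modeling.
    df = df.drop(columns=UNUSED_COLUMNS)
    # 5. Encode each categorical column as integer codes (missing values become -1).
    for col in df.select_dtypes(include="category").columns:
        df[col] = df[col].cat.codes
    return df
```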

notebook.scale_data(x_train, x_test)

Scale features using MinMaxScaler.

Args:
    x_train: Training features.
    x_test: Test features.
Returns:
    tuple: (x_train_scaled, x_test_scaled)
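A sketch of the usual pattern, assuming `scale_data` fits a single `MinMaxScaler` on the training split and reuses it for the test split. Fitting on the training data only keeps test-set statistics out of the preprocessing.

```python
from sklearn.preprocessing import MinMaxScaler


def scale_data(x_train, x_test):
    scaler = MinMaxScaler()
    # Learn per-feature min and max from the training split only.
    x_train_scaled = scaler.fit_transform(x_train)
    # Apply the training min/max to the test split; values outside the
    # training range map outside [0, 1].
    x_test_scaled = scaler.transform(x_test)
    return x_train_scaled, x_test_scaled
```

For example, with a training column spanning 0 to 10, a test value of 20.0 scales to 2.0 rather than being clipped to 1.0.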

notebook.show_data_summary(df: DataFrame) → None

Print duplicate, null, and unique value counts.
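A minimal pandas sketch of such a summary; the printed labels and layout are assumptions.

```python
import pandas as pd


def show_data_summary(df: pd.DataFrame) -> None:
    # Number of rows that exactly repeat an earlier row.
    print("Duplicate rows:", df.duplicated().sum())
    # Missing values per column.
    print("Null values per column:")
    print(df.isnull().sum())
    # Distinct non-null values per column.
    print("Unique values per column:")
    print(df.nunique())
```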

notebook.train_xgboost_classifier(x_train, y_train)

Train an XGBoost classifier.

Args:
    x_train: Scaled training features.
    y_train: Training labels.
Returns:
    XGBClassifier: Trained classifier.


Churn Prediction ML
