Virtual Labs

Telecom Customer Churn Prediction using Decision Tree & Random Forest

Follow these steps to complete the Customer Churn Prediction simulation using Decision Tree and Random Forest:

Step 1: Dataset Exploration

Review the Feature Dictionary to understand the attributes (Tenure, Contract, Monthly Charges) used to predict customer behavior.
Explore the Raw Dataset table to identify patterns in the target variable (Churn).
Click "Next Step: Preprocessing" at the bottom of the navigation panel to continue.

Step 2: Data Preprocessing

Click "Run Full Preprocessing" on the left panel to begin the automated data transformation pipeline.
The simulation will sequentially execute the following steps:
- Handle Missing Values: Removes or imputes missing data points to ensure dataset integrity.
- Label Encoding: Converts categorical variables (like "Contract") into numerical representations.
- Feature Standardization: Scales numerical features to a uniform range for optimal model performance.
Observe the Data Transformation Pipeline animation to visualize these changes in real-time.
Click "Next Step: Data Splitting" once the process is 100% complete.

Step 3: Dataset Splitting

Use the Split Ratio Slider to divide your data into Training and Testing sets.
A common ratio is 80:20 (80% for training the model and 20% for evaluating it).
Observe the Live Calculation results to see how the records are distributed between the two sets.
Click "Next Step: DT Parameters" to begin model configuration.

Step 4: Decision Tree Configuration

Configure the Maximum Depth and other Splitting Criteria like Gini or Entropy.
Adjust the Tree Depth and Pruning controls to manually tune model complexity and reduce overfitting risks.
Observe the Tree Structure Preview to see how your depth selection affects the potential branching of the tree.
Click "Next Step: DT Training" to proceed to the training phase.

Step 5: Decision Tree Training

Click "Start Training" to execute the recursive partitioning algorithm on your training data.
Visualize the resulting Decision Tree Structure and see how the model makes specific decisions at each node.
Review the Feature Importance scores to identify which factors (like "Contract" or "Tenure") impact churn the most.
Click "Next Step: RF Parameters" to compare this with ensemble methods.

Step 6: Random Forest Configuration

Set the Number of Trees (n_estimators) and the Maximum Depth per individual tree.
Select a Feature Strategy (like SQRT) to ensure feature diversity and improve the stability of the ensemble.
Observe the Random Forest Structure diagram to understand how bootstrap samples are assigned to different trees.
Click "Next Step: RF Training" to begin the training process.

Step 7: Random Forest Training

Click "Start Training" to begin the ensemble voting process for the forest.
Observe the Tree Voting Simulation as multiple trees independently predict outcomes for a specific test customer.
Check the Prediction Confidence and the probability breakdown between "Stay" and "Churn" classes.
Click "Next Step: Final Evaluation" to view the summary results.

Step 8: Model Comparison & Evaluation

Review the Accuracy Comparison and analyze the Confusion Matrix Legend (TP, TN, FP, FN) to identify the top performer.
Compare the Performance Metrics (Precision and Recall) to evaluate the prediction quality of both models.
Click "Finished" to complete the laboratory session and review the final conclusion.