Machine Learning Report

Dual Model Architecture for Real Estate Predictions & Recommendations

MAPE ±20% | Dual Models | XGBoost 3.0.0
Project Overview
  • Problem Type: Regression
  • ML Approach: Supervised
  • Business Objective: ±20% MAPE
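
The ±20% MAPE objective can be made concrete with a small sketch. The function names and the sample prices below are hypothetical and only illustrate how mean absolute percentage error is computed and compared against a target; they are not taken from the project code.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the true price.
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def meets_target(actual, predicted, target_pct=20.0):
    """True when the MAPE is at or below the target percentage."""
    return mape(actual, predicted) <= target_pct


# Hypothetical prices in PKR, for illustration only.
actual = [10_000_000, 20_000_000, 40_000_000]
predicted = [12_000_000, 18_000_000, 40_000_000]
print(round(mape(actual, predicted), 2), meets_target(actual, predicted))
```

Because every error is divided by the true price, MAPE weighs a 1M PKR miss on a cheap listing more heavily than the same miss on an expensive one, which is one reason it suits a dataset with both sale and rental prices.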
System Architecture

Data → Preprocessing → Model → Prediction (API)

Model Details
  • Version: 3.0.0
  • Predictor Model: XGBoost
  • Training Date: April 5, 2025

About Dataset

Data Sources
  • Primary: Kaggle
  • Dataset: Link
  • 168K+ Property Listings
Data Statistics
  • Total Samples: 168,446
  • Features: 20
  • Data Validity: 2018-2019

Data Analysis & EDA

Key Findings
  • Bimodal price distribution
  • 72% of listings are for sale and 28% are rentals
  • Purpose (sale or rent) correlates strongly with the target feature
Data Cleaning
  • Outliers Removed: 3.78%
  • Invalid Data Removed: 24.5%
  • Duplicate Data Removed: 42%

Data Preparation & Preprocessing

Raw Data → Duplicate Data Removal → Outlier Detection + Removal Using Z-Method → Invalid Data Removal → Cleaned Data for Model Building
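
The cleaning flow above can be sketched in plain Python. This is a minimal illustration rather than the project's implementation: it assumes the Z-Method means a z-score cutoff on price, that the threshold is 3, and that a listing is invalid when its price or area is not positive. The report states none of these details.

```python
from statistics import mean, pstdev


def drop_duplicates(rows):
    """Keep the first occurrence of each identical listing."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_outliers(rows, column, threshold=3.0):
    """Drop rows whose absolute z-score on `column` exceeds `threshold`."""
    values = [row[column] for row in rows]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(rows)
    return [r for r in rows if abs(r[column] - mu) / sigma <= threshold]


def drop_invalid(rows):
    """Assumed validity rule: price and area must both be positive."""
    return [r for r in rows if r["price"] > 0 and r["area"] > 0]


def clean(rows, threshold=3.0):
    """Apply the three steps in the order given in the report."""
    rows = drop_duplicates(rows)
    rows = drop_outliers(rows, "price", threshold)
    return drop_invalid(rows)
```

Removing duplicates first matters for the z-score step: repeated listings would otherwise pull the mean and standard deviation toward whichever properties were posted most often.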

Key Transformations
  • Feature Engineering
  • Ordinal Encoding + Target Encoding for Categorical Features
  • Standard Scaling for Numerical Features
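
As an illustration of the three transformations listed above, the sketch below implements each one in plain Python. The helper names and the category ordering are hypothetical; a real pipeline would more likely use library encoders and scalers.

```python
from collections import defaultdict
from statistics import mean, pstdev


def ordinal_encode(values, order):
    """Map each category to its position in an explicit, ordered list."""
    index = {category: i for i, category in enumerate(order)}
    return [index[v] for v in values]


def target_encode(categories, targets):
    """Replace each category with the mean target observed for it."""
    groups = defaultdict(list)
    for category, y in zip(categories, targets):
        groups[category].append(y)
    means = {category: mean(ys) for category, ys in groups.items()}
    return [means[c] for c in categories], means


def standard_scale(values):
    """Shift to zero mean and scale to unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Ordinal encoding fits features with a natural order, such as area categories, while target encoding fits high-cardinality features such as location. Target-encoding means should be computed on training data only, otherwise the target leaks into the validation features.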

Model Selection

Sale Prediction Model
Model               MAE (PKR)   Score
XGBoost             4.5M        0.86
Random Forest       4.6M        0.85
Gradient Boosting   4.7M        0.84
Linear Regression   5.3M        0.78
Rent Prediction Model
Model               MAE (PKR)   Score
XGBoost             22.0K       0.82
Random Forest       24.7K       0.81
Gradient Boosting   27.5K       0.80
Linear Regression   38.0K       0.72
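
A comparison like the one above can be reproduced by scoring each candidate on the same held-out data. The report does not say which score accompanies MAE, so the sketch below assumes it is the coefficient of determination (R²); the function names are hypothetical.

```python
def mae(actual, predicted):
    """Mean absolute error, in the same unit as the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 minus residual over total variance."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rank_models(actual, predictions_by_model):
    """Return (name, MAE, R2) tuples sorted by ascending MAE."""
    scored = [
        (name, mae(actual, preds), r2_score(actual, preds))
        for name, preds in predictions_by_model.items()
    ]
    return sorted(scored, key=lambda row: row[1])
```

Ranking on MAE and reporting R² alongside it gives both an error in PKR and a scale-free measure, which helps when sale and rent prices differ by orders of magnitude.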

Training Pipeline

Dual Prediction Models

Sale Prediction
XGBoost Regressor MAPE 23%
  • n_estimators: 126
  • max_depth: 11
  • learning_rate: 0.123
Rent Prediction
XGBoost Regressor MAPE 21%
  • n_estimators: 137
  • max_depth: 10
  • learning_rate: 0.047
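
One way to wire up two models is to key the reported hyperparameters by listing purpose and build the matching regressor on demand. This is a sketch under assumptions: the "sale" key and the helper names are hypothetical (the example request in this report only shows "rent"), and `xgboost` must be installed for `build_model` to run.

```python
# Hyperparameters as reported for the two tuned regressors.
MODEL_PARAMS = {
    "sale": {"n_estimators": 126, "max_depth": 11, "learning_rate": 0.123},
    "rent": {"n_estimators": 137, "max_depth": 10, "learning_rate": 0.047},
}


def select_params(purpose):
    """Look up the hyperparameters for a listing purpose (case-insensitive)."""
    key = purpose.strip().lower()
    if key not in MODEL_PARAMS:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return MODEL_PARAMS[key]


def build_model(purpose):
    """Create an untrained XGBoost regressor for the given purpose."""
    # Imported lazily so the parameter lookup works without xgboost installed.
    from xgboost import XGBRegressor

    return XGBRegressor(**select_params(purpose))
```

Keeping separate models avoids forcing a single regressor to fit the bimodal price distribution, where sale prices are in the millions of PKR and rents in the tens of thousands.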

Model Evaluation

Sale Prediction Model
  • MAE: 4.5M PKR
  • MAPE: 23%
  • Score: 0.86
Success: Achieved a MAPE of 23%, close to the ±20% target
Rent Prediction Model
  • MAE: 22.0K PKR
  • MAPE: 21%
  • Score: 0.82
Success: Achieved a MAPE of 21%, close to the ±20% target
Sale Recommender Model
  • Avg Similarity: 92.7%
  • Coverage: 13.1%
  • Novelty: 52.6%
  • Success Rate: 100%
Rent Recommender Model
  • Avg Similarity: 81.9%
  • Coverage: 32.6%
  • Novelty: 46.9%
  • Success Rate: 100%
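
The report does not define the recommender metrics. As a hedged illustration, the sketch below uses two common definitions: average similarity as the mean cosine similarity between the query listing and its recommendations, and coverage as the share of the catalog that appears in at least one recommendation list. Whether the project computes them this way is an assumption, and novelty is left out because its definition varies too widely to guess.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_similarity(query_vec, recommended_vecs):
    """Mean cosine similarity between a query and its recommendations."""
    sims = [cosine_similarity(query_vec, v) for v in recommended_vecs]
    return sum(sims) / len(sims)


def catalog_coverage(recommendation_lists, catalog_size):
    """Fraction of catalog items recommended at least once."""
    recommended = {item for recs in recommendation_lists for item in recs}
    return len(recommended) / catalog_size
```

Under these definitions, high similarity with low coverage would mean the recommender keeps returning a small set of very close matches.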

Hyperparameter Tuning Using Optuna

Sale Prediction Model
  • Optimization Method: Bayesian
  • Iterations: 100
  • Best Parameters: n_estimators: 126, max_depth: 11, learning_rate: 0.123
  • Best Accuracy: 87%
Rent Prediction Model
  • Optimization Method: Bayesian
  • Iterations: 100
  • Best Parameters: n_estimators: 137, max_depth: 10, learning_rate: 0.047
  • Best Accuracy: 83%
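
A tuning run of this shape could look roughly like the sketch below. The search ranges, the `evaluate` callback, and the choice to minimize a validation error are all assumptions, since the report gives only the best parameters and the iteration count. Optuna's TPE sampler is used here as the Bayesian-style optimizer.

```python
def suggest_params(trial):
    """Search space for the three tuned parameters (ranges are illustrative)."""
    return {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
    }


def run_study(evaluate, n_trials=100, seed=0):
    """Minimize `evaluate(params)`, e.g. a validation MAPE, over `n_trials`."""
    import optuna  # imported lazily; requires the optuna package

    def objective(trial):
        return evaluate(suggest_params(trial))

    sampler = optuna.samplers.TPESampler(seed=seed)
    study = optuna.create_study(direction="minimize", sampler=sampler)
    study.optimize(objective, n_trials=n_trials)
    return study.best_params, study.best_value
```

Here `evaluate` is a hypothetical function that trains a regressor with the given parameters and returns its validation error. Running it once per model, with 100 trials each, would match the iteration counts listed above.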

Deployment

API Endpoints
The trained models are deployed as RESTful API endpoints that accept and return JSON payloads.

Example Request

POST <api-url>/predict
Content-Type: application/json

{
    "purpose": "rent",
    "property_type": "House",
    "location": "DHA Defence Islamabad",
    "city": "Islamabad",
    "province_name": "Islamabad Capital",
    "baths": [3],
    "bedrooms": [4],
    "area": [8],
    "Area Category": "5-10 Marla"
}
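
A client for this endpoint can be sketched with the Python standard library alone. The payload mirrors the example request above; the base URL is left as a parameter because the report does not give one, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_predict_request(base_url, payload):
    """Build a JSON POST request for the /predict endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, payload, timeout=10):
    """Send the request and decode the JSON response."""
    req = build_predict_request(base_url, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = {
    "purpose": "rent",
    "property_type": "House",
    "location": "DHA Defence Islamabad",
    "city": "Islamabad",
    "province_name": "Islamabad Capital",
    "baths": [3],
    "bedrooms": [4],
    "area": [8],
    "Area Category": "5-10 Marla",
}

# Usage, with a placeholder base URL that must be replaced by the real one:
# result = predict("http://localhost:8000", payload)
```

Separating request construction from sending makes the payload easy to inspect or unit-test without a running server.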

Success Response


{
    "predicted_price": 47243,
    "recommendations": [
        {
            "Area Category": "1-5 Kanal",
            "area": 80.0,
            "baths": 3,
            "bedrooms": 3,
            "city": "Islamabad",
            "location": "F-7",
            "price": 200000,
            "property_type": "House",
            "province_name": "Islamabad Capital"
        },
        {
            "Area Category": "1-5 Kanal",
            "area": 72.0,
            "baths": 4,
            "bedrooms": 4,
            "city": "Islamabad",
            "location": "F-8",
            "price": 170000,
            "property_type": "House",
            "province_name": "Islamabad Capital"
        },
        {
            "Area Category": "1-5 Kanal",
            "area": 72.0,
            "baths": 4,
            "bedrooms": 4,
            "city": "Islamabad",
            "location": "E-11",
            "price": 50000,
            "property_type": "Upper Portion",
            "province_name": "Islamabad Capital"
        }
    ]
}
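
On the client side, a response of this shape can be post-processed once decoded. The sketch below orders the recommendations by how close their price is to the predicted price. The ordering rule and the function name are hypothetical, and the sample keeps only the `location` and `price` fields from the response above.

```python
import json


def closest_recommendations(response, top_n=3):
    """Order recommendations by price distance from the prediction."""
    predicted = response["predicted_price"]
    recs = response.get("recommendations", [])
    return sorted(recs, key=lambda r: abs(r["price"] - predicted))[:top_n]


# Trimmed copy of the example response, keeping only location and price.
raw = """
{
    "predicted_price": 47243,
    "recommendations": [
        {"location": "F-7", "price": 200000},
        {"location": "F-8", "price": 170000},
        {"location": "E-11", "price": 50000}
    ]
}
"""

response = json.loads(raw)
for rec in closest_recommendations(response):
    print(rec["location"], rec["price"])
```

Using `.get("recommendations", [])` keeps the helper safe if the service returns a prediction without any recommendations.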