About Dataset
Data Sources
- Primary: Kaggle
- Dataset: Link
- 168K+ Property Listings
Data Statistics
- Total Samples: 168,446
- Features: 20
- Data Validity: 2018-2019
Data Analysis & EDA
Key Findings
- Bimodal Price Distribution
- 72% For Sale Properties & 28% Rental Properties
- Strong Correlation of Purpose with Target Feature
Data Cleaning
- Outliers Removed: 3.78%
- Invalid Data Removed: 24.5%
- Duplicate Data Removed: 42%
Data Preparation & Preprocessing
Raw Data → Duplicate Data Removal → Outlier Detection + Removal Using Z-Score Method → Invalid Data Removal → Cleaned Data for Model Building
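The cleaning steps above could be sketched roughly as follows, using only the Python standard library. This is an illustration under assumptions, not the project's actual code: the column names `price` and `area`, the z-score threshold of 3, and the validity rule (positive price and area) are all hypothetical.

```python
from statistics import mean, pstdev

def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def drop_outliers(rows, column, threshold=3.0):
    """Drop rows whose absolute z-score on `column` exceeds `threshold`."""
    values = [row[column] for row in rows]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(rows)
    return [row for row in rows if abs((row[column] - mu) / sigma) <= threshold]

def drop_invalid(rows):
    """Hypothetical validity rule: price and area must both be positive."""
    return [row for row in rows if row["price"] > 0 and row["area"] > 0]

def clean(rows, threshold=3.0):
    """Apply the steps in the order listed: duplicates, outliers, invalid rows."""
    rows = drop_duplicates(rows)
    rows = drop_outliers(rows, "price", threshold)
    return drop_invalid(rows)
```

Because a single extreme value inflates the standard deviation, a z-score filter works best on reasonably large samples; on very small ones no point can exceed the threshold.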
Key Transformations
- Feature Engineering
- Ordinal Encoding + Target Encoding for Categorical Features
- Standard Scaling for Numerical Features
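For readers unfamiliar with these transformations, here is a minimal standard-library sketch of each one. Which columns receive which encoding is not stated above, so the examples in the usage note are assumptions; in practice a library such as scikit-learn would typically be used instead.

```python
from collections import defaultdict
from statistics import mean, pstdev

def fit_ordinal(categories_in_order):
    """Map each category to its rank in a caller-supplied order."""
    return {cat: rank for rank, cat in enumerate(categories_in_order)}

def fit_target_encoding(categories, targets):
    """Map each category to the mean target value observed for it."""
    grouped = defaultdict(list)
    for cat, y in zip(categories, targets):
        grouped[cat].append(y)
    return {cat: mean(ys) for cat, ys in grouped.items()}

def standard_scale(values):
    """Rescale values to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]
```

Ordinal encoding suits categories with a natural order (for example, an area category), while target encoding suits high-cardinality fields such as location. Target encodings should be fitted on training data only, so that validation prices do not leak into the features.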
Model Selection
Sale Prediction Model
| Model | MAE | R² |
|---|---|---|
| XGBoost | 4.5M | 0.86 |
| Random Forest | 4.6M | 0.85 |
| Gradient Boosting | 4.7M | 0.84 |
| Linear Regression | 5.3M | 0.78 |
Rent Prediction Model
| Model | MAE | R² |
|---|---|---|
| XGBoost | 22.0K | 0.82 |
| Random Forest | 24.7K | 0.81 |
| Gradient Boosting | 27.5K | 0.80 |
| Linear Regression | 38.0K | 0.72 |
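For reference, the two metrics in these tables can be computed as in the sketch below. It uses the standard definitions of MAE and R² and plain Python lists; it is not taken from the project's code.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

MAE is expressed in the target's own unit, which is why the sale and rent tables differ by orders of magnitude, whereas R² is unitless and comparable across both.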
Training Pipeline
Dual Prediction Models
Sale Prediction
XGBoost Regressor
MAPE 23%
- n_estimators: 126
- max_depth: 11
- learning_rate: 0.123
Rent Prediction
XGBoost Regressor
MAPE 21%
- n_estimators: 137
- max_depth: 10
- learning_rate: 0.047
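One way to organize the two models is a lookup keyed by listing purpose, sketched below. The hyperparameter values are the ones listed above; the keys `"sale"` and `"rent"` and the helper `params_for` are illustrative assumptions.

```python
# Tuned hyperparameters for each XGBoost regressor, keyed by listing purpose.
XGB_PARAMS = {
    "sale": {"n_estimators": 126, "max_depth": 11, "learning_rate": 0.123},
    "rent": {"n_estimators": 137, "max_depth": 10, "learning_rate": 0.047},
}

def params_for(purpose):
    """Return the hyperparameters for the model that matches `purpose`."""
    key = purpose.strip().lower()
    if key not in XGB_PARAMS:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return XGB_PARAMS[key]
```

With the `xgboost` package installed, the selected dictionary could be passed on as `xgboost.XGBRegressor(**params_for("rent"))`.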
Model Evaluation
Sale Prediction Model
- MAE: 4.5M PKR
- MAPE: 23%
- R²: 0.86
Success: Achieved a MAPE of approximately 20%
Rent Prediction Model
- MAE: 22.0K PKR
- MAPE: 21%
- R²: 0.82
Success: Achieved a MAPE of approximately 20%
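MAPE, reported for both prediction models, expresses the error relative to each true price, which makes the sale and rent models comparable despite their very different price scales. A minimal sketch of the standard definition:

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes no true value is zero, since each error is divided by the true value.
    """
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100 * total / len(y_true)
```

For example, predicting 80 for a true value of 100 and 250 for a true value of 200 gives errors of 20% and 25%, so the MAPE is 22.5%.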
Sale Recommender Model
- Avg Similarity: 92.7%
- Coverage: 13.1%
- Novelty: 52.6%
Success Rate: 100%
Rent Recommender Model
- Avg Similarity: 81.9%
- Coverage: 32.6%
- Novelty: 46.9%
Success Rate: 100%
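The exact definitions behind these recommender metrics are not given here. As a rough illustration only, the sketch below uses two common conventions: average cosine similarity between the query listing and its recommendations, and catalog coverage as the share of listings that are ever recommended. Both choices are assumptions; novelty is left out because its definition varies too widely to guess.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def average_similarity(query, recommended):
    """Mean cosine similarity between a query vector and its recommendations."""
    return sum(cosine_similarity(query, r) for r in recommended) / len(recommended)

def catalog_coverage(recommendation_lists, catalog_size):
    """Share of catalog items that appear in at least one recommendation list."""
    recommended_ids = {item for recs in recommendation_lists for item in recs}
    return len(recommended_ids) / catalog_size
```

Under these definitions, high similarity combined with low coverage would suggest that the recommender returns close matches drawn from a small part of the catalog.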
Hyperparameter Tuning Using Optuna
Sale Prediction Model
- Optimization Method: Bayesian
- Iterations: 100
- Best Parameters: n_estimators: 126, max_depth: 11, learning_rate: 0.123
- Best Accuracy: 87%
Rent Prediction Model
- Optimization Method: Bayesian
- Iterations: 100
- Best Parameters: n_estimators: 137, max_depth: 10, learning_rate: 0.047
- Best Accuracy: 83%
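A tuning run of this kind could look roughly like the sketch below. The search ranges are illustrative assumptions (chosen only so that they contain the best parameters reported above), `score_params` is a hypothetical caller-supplied function that trains a model and returns a validation error, and minimizing an error is itself an assumption about the objective. Optuna's default sampler for single-objective studies is TPE, a Bayesian-style method.

```python
def suggest_xgb_params(trial):
    """Sample one candidate configuration from an assumed search space."""
    return {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 3, 15),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
    }

def tune(score_params, n_trials=100):
    """Run an Optuna study and return the best parameters found.

    `score_params` takes a parameter dict and returns a validation error.
    """
    import optuna  # imported lazily so the search space works without Optuna

    study = optuna.create_study(direction="minimize")  # default sampler: TPE
    study.optimize(
        lambda trial: score_params(suggest_xgb_params(trial)),
        n_trials=n_trials,
    )
    return study.best_params
```

Sampling the learning rate on a log scale spreads trials evenly across orders of magnitude, which tends to matter more for this parameter than for the integer-valued ones.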
Deployment
API Endpoints
Production-ready ML models deployed as RESTful API endpoints with JSON payload support.
Example Request
POST <api-url>/predict
Content-Type: application/json
{
  "purpose": "rent",
  "property_type": "House",
  "location": "DHA Defence Islamabad",
  "city": "Islamabad",
  "province_name": "Islamabad Capital",
  "baths": [3],
  "bedrooms": [4],
  "area": [8],
  "Area Category": "5-10 Marla"
}
Success Response
{
"predicted_price": 47243,
"recommendations": [
{
"Area Category": "1-5 Kanal",
"area": 80.0,
"baths": 3,
"bedrooms": 3,
"city": "Islamabad",
"location": "F-7",
"price": 200000,
"property_type": "House",
"province_name": "Islamabad
},
{
"Area Category": "1-5 Kanal",
"area": 72.0,
"baths": 4,
"bedrooms": 4,
"city": "Islamabad",
"location": "F-8",
"price": 170000,
"property_type": "House",
"province_name": "Islamabad Capital"
},
{
"Area Category": "1-5 Kanal",
"area": 72.0,
"baths": 4,
"bedrooms": 4,
"city": "Islamabad",
"location": "E-11",
"price": 50000,
"property_type": "Upper Portion",
"province_name": "Islamabad Capital"
}
]
}
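A client call to this endpoint might be sketched as follows with Python's standard library. The base URL is a placeholder standing in for `<api-url>`, and the helper names `build_request` and `predict` are hypothetical; the payload and the response keys mirror the examples above.

```python
import json
from urllib import request

API_URL = "http://localhost:8000"  # placeholder; substitute the real <api-url>

def build_request(payload, api_url=API_URL):
    """Build a JSON POST request for the /predict endpoint."""
    return request.Request(
        f"{api_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(payload, api_url=API_URL):
    """Send the request and return the decoded JSON response."""
    with request.urlopen(build_request(payload, api_url)) as response:
        return json.loads(response.read().decode("utf-8"))

payload = {
    "purpose": "rent",
    "property_type": "House",
    "location": "DHA Defence Islamabad",
    "city": "Islamabad",
    "province_name": "Islamabad Capital",
    "baths": [3],
    "bedrooms": [4],
    "area": [8],
    "Area Category": "5-10 Marla",
}
```

Calling `predict(payload)` against a running deployment should return a dictionary with `predicted_price` and `recommendations` keys, as in the success response above.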