Car Price Prediction — Subhankar Mandal

Project Overview

This project benchmarks four regression models (Linear Regression, Random Forest, Gradient Boosting, and XGBoost) for used car price prediction. It is deployed as an interactive Streamlit app where users enter a car's details and get an estimated market price instantly.

Best R² (XGBoost): 0.92

Avg RMSE: ₹54K

Models Compared: 4

Streamlit App: Live

Feature Engineering

Raw used car data needed significant cleaning and feature engineering before modeling:

Age: current year − manufacturing year
Brand encoding: Target encoding (brand average price), which avoids the 40+ sparse columns that one-hot encoding would create
Mileage (kilometres driven) transformation: Log transform (log1p) to handle the right-skewed distribution
Fuel type & transmission: Ordinal encoding
Engine + Power: Extracted numeric values from "1498 CC" and "102 bhp" strings

Python

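import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Car age relative to the reference year 2026
df['car_age']   = 2026 - df['year']
# log1p compresses the right-skewed kilometres-driven distribution
df['log_km']    = np.log1p(df['kms_driven'])
# Pull the numeric part out of strings such as "1498 CC" and "102 bhp"
df['engine_cc'] = df['engine'].str.extract(r'(\d+)', expand=False).astype(float)
df['power_bhp'] = df['max_power'].str.extract(r'(\d+\.?\d*)', expand=False).astype(float)

# Target encoding for brand (mean selling price per brand).
# Note: these means use every row; computing them on the training
# split only would avoid leaking test-set prices into the feature.
brand_avg = df.groupby('brand')['selling_price'].mean()
df['brand_enc'] = df['brand'].map(brand_avg)

# Integer codes for the categorical columns (assigned in alphabetical order)
df['fuel_enc']  = LabelEncoder().fit_transform(df['fuel'])
df['trans_enc'] = LabelEncoder().fit_transform(df['transmission'])

features = ['car_age','log_km','engine_cc','power_bhp',
            'brand_enc','fuel_enc','trans_enc','seats']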
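
Target encoding is the step most prone to leakage, because the brand averages are built from the very prices the model is asked to predict. The sketch below shows one way to fit the encoding on the training rows only and reuse it for held-out rows. It is an illustration, not the project's code: the helper name `target_encode_brand`, the global-mean fallback for unseen brands, and the toy data are all assumptions.

```python
import pandas as pd

def target_encode_brand(train, test, col="brand", target="selling_price"):
    """Encode `col` with per-category target means learned from `train` only.

    Hypothetical helper: brands that never appear in `train` fall back to
    the overall training mean (an assumed, common choice).
    """
    brand_avg = train.groupby(col)[target].mean()
    global_avg = train[target].mean()
    train_enc = train[col].map(brand_avg)
    test_enc = test[col].map(brand_avg).fillna(global_avg)
    return train_enc, test_enc

# Toy data with made-up prices, only to show the mechanics
train = pd.DataFrame({
    "brand": ["Maruti", "Maruti", "Honda", "Honda"],
    "selling_price": [300000, 500000, 600000, 800000],
})
test = pd.DataFrame({"brand": ["Maruti", "Tesla"]})

train_enc, test_enc = target_encode_brand(train, test)
print(train_enc.tolist())  # per-brand means from the training rows
print(test_enc.tolist())   # "Tesla" is unseen, so it gets the global mean
```

With this layout the test rows never contribute to the averages, so the reported test score is not inflated by the encoding itself.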

Model Comparison

Python

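import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from xgboost import XGBRegressor
from sklearn.metrics import r2_score, mean_squared_error

# Hold out a test set (the 80/20 split and the seed are assumptions; the original split is not shown)
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df['selling_price'], test_size=0.2, random_state=42)

models = {
    'Linear Regression':    LinearRegression(),
    'Random Forest':        RandomForestRegressor(n_estimators=200, max_depth=15),
    'Gradient Boosting':    GradientBoostingRegressor(n_estimators=200, learning_rate=0.05),
    'XGBoost':              XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=6),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    # RMSE as the square root of MSE; `squared=False` was deprecated in scikit-learn 1.4 and later removed
    rmse = np.sqrt(mean_squared_error(y_test, preds))
    print(f"{name}: R²={r2_score(y_test, preds):.4f}  RMSE={rmse:,.0f}")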
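
To make the two reported metrics concrete, here is a small sketch that computes RMSE and R² by hand with NumPy. The function names `rmse` and `r2` and the sample values are assumptions for illustration; the definitions are the standard ones that `mean_squared_error` and `r2_score` implement.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: typical prediction error in the target's units."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """R²: 1 minus the ratio of residual variation to total variation."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # variation around the mean
    return float(1.0 - ss_res / ss_tot)

# Made-up values: every prediction is off by exactly 10
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]

print(f"RMSE={rmse(y_true, y_pred):.1f}  R²={r2(y_true, y_pred):.3f}")
```

RMSE is expressed in the same unit as the price, which makes it easy to interpret, while R² is unitless and measures how much of the price variation the model explains.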

Results

Linear Regression R²: 0.71

Random Forest R²: 0.88

Gradient Boosting R²: 0.90

XGBoost R²: 0.92 ✅

XGBoost, Random Forest, Scikit-learn, Pandas, Streamlit, Python
