Rudra Prasad Bhuyan

AI & Data Scientist who loves Finance & Business.

Machine Learning Engineer with hands-on experience in data analysis, machine learning, and end-to-end project execution.

Focused on practical, systems-thinking solutions, not just models. I solve real problems using data.

Skills & Expertise
Python, SQL
Pandas, NumPy
Plotly, Matplotlib, Seaborn
Power BI, Looker Studio
Scikit-learn, XGBoost, CatBoost
GitHub, MLflow, CI/CD
Pydantic, FastAPI
PySpark, Polars
AWS
PostgreSQL, Snowflake, BigQuery, Databricks
PyTorch, TensorFlow, Keras
LLMs, ChatGPT, Groq, Llama, Claude, HuggingFace
LangChain, LangGraph, LangSmith
Probability, Statistical Modeling
Hypothesis Testing, A/B Testing
Git, Jupyter Notebook, VS Code
Time Series Forecasting, PCA
Hyperparameter Tuning
Storytelling, Stakeholder Communication
Problem Solving, Analytical Thinking
Work Experience

SBC Labs

Jr. Data Scientist

Nov 2025 – Feb 2026

Built ETL pipelines and interactive dashboards for analyzing socioeconomic and healthcare datasets.

Developed Python and SQL ETL pipelines for 15 multi-level HCES datasets with 400+ features to streamline large-scale batch processing.

Analyzed 500+ socioeconomic and healthcare variables to identify trends in finance, healthcare, consumption, and savings behavior.

Automated data validation and duplicate-handling workflows using SQL procedures, reducing data inconsistencies by 20%.

Built reusable data preprocessing scripts in Python, reducing manual model-preparation effort and saving ~1 hour daily.

Designed scalable system architecture, product requirements documents (PRDs), and 30+ analytical reports to standardize workflows and business planning.

Built interactive KPI dashboards across 50+ parameters and collaborated with stakeholders to deliver actionable insights and summaries.

Tata Power WODL

Quality Analyst

June 2025 – July 2025

Monitored smart meter data workflows and collaborated with field operations teams.

Observed and analyzed smart meter data workflows for 60,000+ consumers, along with centralized database operations and issue-tracking processes.

Assisted in field-to-database survey operations with a 20-member team, supporting digital meter updates and complaint-handling workflows.

Gained exposure to image-processing and grid monitoring systems used to identify human errors, detect irregularities, and support operational planning.

Open Source

show-file-tree

A small, fast CLI tool to display styled file/folder trees with rich options, colors, icons, and metadata.

find-my-joint

A utility to find potential join keys (matching columns) across multiple pandas DataFrames.
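
The idea of finding candidate join keys can be sketched roughly as below. This is a minimal illustration, not the find-my-joint code or its API: the function name `candidate_join_keys`, the `min_overlap` threshold, and the sample tables are all hypothetical, and plain dictionaries stand in for pandas DataFrames so the sketch has no dependencies.

```python
from itertools import combinations


def candidate_join_keys(tables, min_overlap=0.5):
    """Return (table_a, col_a, table_b, col_b, overlap) tuples, best first.

    `tables` maps a table name to {column name: list of values}.
    Overlap is the share of the smaller column's distinct values that
    also appear in the other column (an assumed scoring rule).
    """
    results = []
    for (name_a, cols_a), (name_b, cols_b) in combinations(tables.items(), 2):
        for col_a, values_a in cols_a.items():
            set_a = set(values_a)
            for col_b, values_b in cols_b.items():
                set_b = set(values_b)
                if not set_a or not set_b:
                    continue  # empty columns cannot be join keys
                overlap = len(set_a & set_b) / min(len(set_a), len(set_b))
                if overlap >= min_overlap:
                    results.append((name_a, col_a, name_b, col_b, overlap))
    return sorted(results, key=lambda r: r[-1], reverse=True)


# Hypothetical sample data: the key has a different name in each table.
tables = {
    "customers": {
        "customer_id": [1, 2, 3, 4],
        "city": ["Pune", "Delhi", "Pune", "Agra"],
    },
    "orders": {
        "order_id": [10, 11, 12],
        "cust_id": [1, 2, 2],
        "amount": [250.0, 90.5, 40.0],
    },
}

matches = candidate_join_keys(tables)
for table_a, col_a, table_b, col_b, overlap in matches:
    print(f"{table_a}.{col_a} <-> {table_b}.{col_b} (overlap {overlap:.0%})")
```

Scoring by value overlap rather than by column name lets the sketch pair `customer_id` with `cust_id` even though the names differ.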

Featured Projects

Vehicle Insurance Risk Prediction

End-to-end MLOps pipeline predicting vehicle insurance risk in real time.

Python • Flask • AWS • Docker

Vehicle Insurance
  • Built end-to-end MLOps pipeline with CI/CD via GitHub Actions & Docker
  • Integrated MongoDB for data ingestion and AWS S3 for model storage
  • Flask web app provides real-time insurance risk predictions
  • Automated model training, evaluation, and deployment workflows

SQL Modern Data Warehouse

Medallion architecture ETL pipeline unifying ERP & CRM data for analytics.

PostgreSQL • SQL • ETL • Star Schema • Power BI

Data Warehouse
  • Built Medallion architecture (Bronze → Silver → Gold) for data transformation
  • Developed ETL pipelines integrating ERP & CRM systems into PostgreSQL
  • Designed Star Schema for optimized analytical queries
  • Created Power BI reports for sales performance and customer insights
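
The Bronze → Silver → Gold flow above can be illustrated with a small sketch. It rests on several assumptions: SQLite (from the Python standard library) stands in for PostgreSQL so the example runs anywhere, and the table names, columns, and sample rows are hypothetical rather than taken from the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Bronze: raw rows as loaded, untyped and possibly dirty
CREATE TABLE bronze_sales (order_id TEXT, customer TEXT, amount TEXT);
INSERT INTO bronze_sales VALUES
    ('1', ' alice ', '100.50'),
    ('1', ' alice ', '100.50'),
    ('2', 'BOB',     '20'),
    ('3', 'alice',   NULL);

-- Silver: trimmed, typed, de-duplicated, rows without an amount dropped
CREATE TABLE silver_sales AS
SELECT DISTINCT
    CAST(order_id AS INTEGER) AS order_id,
    LOWER(TRIM(customer))     AS customer,
    CAST(amount AS REAL)      AS amount
FROM bronze_sales
WHERE amount IS NOT NULL;

-- Gold: business-level aggregate ready for reporting
CREATE TABLE gold_customer_sales AS
SELECT customer, COUNT(*) AS orders, SUM(amount) AS total_amount
FROM silver_sales
GROUP BY customer;
""")

gold = conn.execute(
    "SELECT customer, orders, total_amount "
    "FROM gold_customer_sales ORDER BY customer"
).fetchall()
print(gold)
```

Each layer only reads from the one before it, so a cleaning rule can change in Silver without touching the raw Bronze data.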

Yelp Big Data Analysis

High-performance Polars pipeline processing millions of Yelp records without memory issues.

Python • Polars • JSON • Parquet

Yelp Big Data
  • Built high-performance pipeline using Polars — 5–10× faster than Pandas
  • Converted raw JSON to Parquet format for optimized columnar storage
  • Performed large-scale analysis on business reviews, check-ins & tips
  • Identified top business categories and sentiment patterns across cities

Breast Cancer Prediction App

Real-time tumor classification app built with Logistic Regression and Streamlit.

Python • Scikit-learn • Streamlit

Breast Cancer App
  • Built Logistic Regression model achieving high accuracy on diagnostic features
  • Interactive Streamlit web app with real-time probability output
  • Features include radar chart visualization for feature comparison
  • Trained on the Wisconsin Breast Cancer Dataset (569 samples)

Transportation & Logistics Dashboard

Power BI dashboard analyzing logistics efficiency, delays, and supplier performance.

Power BI • KPI Development

Transport Dashboard
  • Designed KPI dashboard tracking on-time delivery, delays & costs
  • Analyzed supplier vs customer performance across 3 shipping modes
  • Identified top delay contributors and operational inefficiencies
  • Submitted for ZoomCharts Power BI challenge — recognized as top entry

Smart Transaction Ledger

AI-powered financial transaction cleaner with fraud detection and real-time SQL queries.

Python • FastAPI • AI • SQL

Smart Ledger
  • Automated data cleaning with validation rules and duplicate detection
  • Fraud detection & real-time monitoring with alert system
  • AI chatbot assistant for natural language SQL queries
  • Interactive analytics dashboard with FastAPI backend
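
Rule-based cleaning with duplicate detection, as listed above, might look roughly like this. It is a sketch under stated assumptions, not the project's implementation: the function name `clean_transactions`, the three rules (positive finite amount, ISO date, no repeated date/account/amount), and the sample rows are all hypothetical.

```python
import math
from datetime import date


def clean_transactions(rows):
    """Split rows into (valid, rejected) using simple, assumed rules.

    A row is rejected when its amount or date cannot be parsed, when the
    amount is not a positive finite number, or when the same
    (date, account, amount) combination has already been seen.
    """
    valid, rejected, seen = [], [], set()
    for row in rows:
        try:
            amount = float(row["amount"])
            date.fromisoformat(row["date"])  # expects YYYY-MM-DD
        except (KeyError, TypeError, ValueError):
            rejected.append((row, "malformed"))
            continue
        if not math.isfinite(amount) or amount <= 0:
            rejected.append((row, "non-positive amount"))
            continue
        key = (row["date"], row.get("account"), amount)
        if key in seen:
            rejected.append((row, "duplicate"))
            continue
        seen.add(key)
        valid.append({**row, "amount": amount})
    return valid, rejected


# Hypothetical sample rows covering each rule.
rows = [
    {"date": "2025-01-05", "account": "A1", "amount": "120.00"},
    {"date": "2025-01-05", "account": "A1", "amount": "120"},
    {"date": "2025-01-06", "account": "A2", "amount": "-5"},
    {"date": "06/01/2025", "account": "A3", "amount": "40"},
]

valid, rejected = clean_transactions(rows)
for row, reason in rejected:
    print(f"rejected ({reason}): {row}")
```

Keeping the rejection reason next to each row makes it straightforward to feed the rejected rows into an alert or review queue later.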