Academic Projects
Computer Vision: Pneumonia Detection from X-ray images
The dataset comprises 26,684 X-ray images labelled Normal, Lung Opacity, or No Lung Opacity/Not Normal. The objective is to design and develop an object detection model that identifies patients suffering from pneumonia and predicts bounding boxes around the affected lung regions.
ML libraries/Architectures: TensorFlow, CNN, R-CNN, YOLOv8, Transfer learning (VGG16, ResNet50)
Develop NLP-based classifier to predict the blog topic
Blogs, posts, and articles are published constantly by many authors across different genres. The Blog Authorship Corpus comprises 681,288 posts from 19,320 bloggers. Given a text, the task is to build a multi-label classifier that predicts the topic the blogger intends to write about, without any prior knowledge of the author.
ML libraries/Architectures: NLTK, Gensim (Word2Vec), Scikit-learn (CountVectorizer, TfidfVectorizer), TensorFlow, Keras, MultinomialNB, RandomForestClassifier, Embedding, LSTM
Design a semi-rule based chatbot using NLP
Great Learning has an academic support department that receives numerous support requests every day throughout the year. Based on a sample corpus (and using linguistic skills), designed and developed an interactive ML/DL semi-rule based chatbot that interacts with the user, understands the problem, and either displays the resolution procedure for a generic request or redirects the request.
ML libraries/Architectures: NLTK, Scikit-learn (CountVectorizer), TensorFlow (Tokenizer), Keras, Neural Network
Implement face recognition model
The Aligned Face Dataset from Pinterest contains 10,770 images of 100 people. The objective is to implement a face recognition model for a company that wants to recognize celebrity faces in images. Generated an embedding vector for each face in the dataset and built a classifier to map each image to the correct person.
ML libraries/Architectures: TensorFlow, Scikit-learn, Convolutional Neural Network
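The embed-then-classify step described in this project can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: the three-dimensional vectors are hypothetical toy values standing in for CNN face embeddings, the names `cosine_similarity` and `predict_identity` are introduced here for illustration, and a nearest-neighbour rule stands in for whichever classifier was actually trained.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def predict_identity(query, gallery):
    """Return the label of the gallery embedding most similar to `query`.

    `gallery` is a list of (label, embedding) pairs.
    """
    best_label, best_score = None, -1.0
    for label, embedding in gallery:
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Hypothetical toy embeddings; a real model would produce vectors
# with far more dimensions, one per face image.
gallery = [
    ("person_a", [0.9, 0.1, 0.0]),
    ("person_a", [0.8, 0.2, 0.1]),
    ("person_b", [0.1, 0.9, 0.2]),
    ("person_b", [0.0, 0.8, 0.3]),
]

label, score = predict_identity([0.85, 0.15, 0.05], gallery)
print(label, round(score, 3))
```

With real embeddings, the same gallery of labelled vectors could instead be used to fit a Scikit-learn classifier; the nearest-neighbour rule is used here only because it needs no training step.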
Image classifier to detect plant and flower species
A university is researching the characteristics of plants and plant seedlings at various stages of growth, as well as the characteristics of flowers. Its dataset comprises 4,750 images from 12 plant species and 1,360 images from 17 flower categories (OxFlower dataset). The objective is to create a classifier that can determine the species of a plant or flower from an image. The task was accomplished by developing multiple models (CNN, transfer learning, hyper-parameter tuning, image augmentation) and then selecting the model that best classified the plant and flower species.
ML libraries/Architectures: TensorFlow, Scikit-learn, Neural Network, Convolutional Neural Network, Transfer learning (VGG16), ImageDataGenerator
Digit classifier on the SVHN (Street View House Numbers) dataset
SVHN is a real-world image dataset comprising over 600,000 digit images for developing machine learning and object recognition algorithms. The visual appearance of the text varies widely because of the range of fonts, colours, styles, orientations, and character arrangements. The recognition problem is further complicated by environmental factors such as lighting, shadows, specularity, and occlusions, as well as by image acquisition factors such as resolution, motion, and focus blurs. In this project, the objective is to build a digit classifier that predicts the multi-digit numbers in photographs captured at street level, using a subset of over 100,000 images.
ML libraries/Architectures: TensorFlow, Scikit-learn, Keras, Neural network
Develop smartphone recommendation system
This project replicates a real-world use case of an e-commerce company that recommends mobile phones to its users, covering both the most popular handsets overall and handsets personalized to each user. It employs popularity-based recommendation and collaborative filtering to recommend a handset based on the individual consumer’s behavior and choices.
ML libraries: Surprise (SVD, KNNWithMeans, GridSearchCV)
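The two recommendation techniques named in this project can be illustrated with a small pure-Python sketch. It is an assumption-laden stand-in rather than the project's implementation (which used the Surprise library): the ratings are hypothetical toy data, and `popularity_ranking`, `user_similarity`, and `recommend_for_user` are names introduced here. The collaborative part is a simple user-based scheme that predicts a rating as a similarity-weighted average of other users' ratings.

```python
import math
from collections import defaultdict

# Hypothetical toy data: user -> {phone: rating on a 1-5 scale}.
ratings = {
    "u1": {"phone_a": 5, "phone_b": 4, "phone_c": 1},
    "u2": {"phone_a": 4, "phone_b": 5, "phone_d": 2},
    "u3": {"phone_a": 5, "phone_c": 2, "phone_d": 1},
    "u4": {"phone_b": 4, "phone_c": 1},
}

def popularity_ranking(ratings, min_ratings=2):
    """Rank items by mean rating, ignoring items with too few ratings."""
    totals, counts = defaultdict(float), defaultdict(int)
    for user_ratings in ratings.values():
        for item, rating in user_ratings.items():
            totals[item] += rating
            counts[item] += 1
    means = {i: totals[i] / counts[i] for i in totals if counts[i] >= min_ratings}
    return sorted(means, key=means.get, reverse=True)

def user_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def recommend_for_user(user, ratings, top_n=1):
    """Recommend unseen items by similarity-weighted average rating."""
    seen = ratings[user]
    scores, weights = defaultdict(float), defaultdict(float)
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = user_similarity(seen, other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item not in seen:
                scores[item] += sim * rating
                weights[item] += sim
    predicted = {i: scores[i] / weights[i] for i in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

print(popularity_ranking(ratings))
print(recommend_for_user("u4", ratings))
```

The popularity ranking is identical for every user, which makes it a reasonable fallback for new users with no rating history; the collaborative step personalizes the result once a user has rated a few handsets.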
Build classifier to predict the yield in a semiconductor manufacturing company
A semiconductor manufacturing company has sensor data comprising 1,567 data points with 591 features. Identifying the yield of the manufacturing process from this many features is time-consuming, and mistakes may incur significant losses. The goal of the project was to determine the key factors contributing to yield excursions downstream in the process, enabling an increase in process throughput, a shorter time to learn, and lower per-unit production costs. The project was accomplished by employing supervised learning, ensemble modeling, and unsupervised learning techniques to build and train a prediction model that identifies the Pass/Fail yield of a particular process entity.
ML Techniques: KNeighborsClassifier, LogisticRegression, GaussianNB, SVC, DecisionTreeClassifier, RandomForestClassifier, PCA
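One way the dimensionality reduction and classification steps listed for this project could be combined is sketched below with Scikit-learn. This is an assumed arrangement, not the project's actual pipeline: the data is synthetic (generated with `make_classification`), and the class imbalance, the 95% explained-variance threshold, and the choice of `LogisticRegression` are illustrative choices made here.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for high-dimensional sensor data with a rare "fail" class.
X, y = make_classification(
    n_samples=400, n_features=50, n_informative=10,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale, keep enough principal components to explain 95% of the variance,
# then fit a classifier that reweights the minority class.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

n_components = model.named_steps["pca"].n_components_
accuracy = model.score(X_test, y_test)
print(f"components kept: {n_components}, test accuracy: {accuracy:.3f}")
```

Wrapping the steps in a single pipeline means the scaler and PCA are fitted on the training split only, which avoids leaking test-set statistics into the model. Any of the other listed classifiers could be swapped in as the final step.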
Develop an ML model to address customer churn issue
This project is based on a telecommunication company that is facing a customer churn issue. The project aims to understand the patterns in the data and predict, from multiple variables, which customers are going to churn, to help the company retain its existing customers. The project was accomplished by building a machine learning model that identifies customers with a high probability of churning.
ML Techniques: XGBoost
Classification of orthopedic patients based on biomechanical features
A medical research university is conducting in-depth research on patients with certain conditions. The objective is to classify orthopedic patients into one of three categories: Normal (100 patients), Disk Hernia (60 patients), or Spondylolisthesis (150 patients). The patient's condition is analysed based on six biomechanical features: pelvic incidence, pelvic tilt, lumbar lordosis angle, sacral slope, pelvic radius, and grade of spondylolisthesis.
ML Techniques: KNeighborsClassifier, LogisticRegression
Recommend basketball team using statistical analysis
A company wants to invest in proposals to manage some of the best teams in the league, and therefore wants to understand which teams to approach for a winning deal. In this project, past tournament information was analysed to make informed decisions. Statistical and visualisation techniques such as histograms, bar charts, pie charts, heatmaps, pair-wise scatter plots, joint plots, and box plots were used to perform exploratory data analysis and derive performance metrics to support decision making.
ML libraries: Pandas, NumPy, Seaborn, Matplotlib, SciPy