Stock Price Prediction with K-Nearest Neighbors (KNN) and Random Forest

This project focuses on price action and aims to predict whether a stock's price will move up or down on the next day, using KNN, random forest, and financial indicators to maximize the accuracy of the model.
The final accuracy of the model is 0.6893. Performance is likely limited because the model relies only on classification models and price momentum indicators. Time series methods, such as Long Short-Term Memory (LSTM) networks and ARIMA, could be applied to develop a more precise stock price prediction model.

Model Accuracy
KNN 0.668
Random Forest 0.689
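
As a rough illustration of the setup described above, the sketch below labels each day by whether the next close is higher, derives a simple rate-of-change momentum feature, and classifies with a small KNN. It is written in plain Python rather than the R used in the project, and the prices, the choice of indicator, the value of k, and all function names are assumptions for illustration, not the project's actual code.

```python
import math

def next_day_labels(closes):
    """1 if the next day's close is higher than today's, else 0."""
    return [1 if closes[i + 1] > closes[i] else 0 for i in range(len(closes) - 1)]

def rate_of_change(closes, period):
    """Momentum indicator: relative change over `period` days (None until enough history)."""
    return [
        None if i < period else (closes[i] - closes[i - period]) / closes[i - period]
        for i in range(len(closes))
    ]

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

# Hypothetical closing prices, for illustration only.
closes = [100.0, 101.0, 103.0, 102.0, 104.0, 107.0, 106.0, 105.0, 108.0, 110.0]
labels = next_day_labels(closes)
roc1 = rate_of_change(closes, 1)
roc3 = rate_of_change(closes, 3)

# Keep only days with full indicator history and a known next-day label.
rows = [([roc1[i], roc3[i]], labels[i]) for i in range(3, len(closes) - 1)]
X = [features for features, _ in rows]
y = [label for _, label in rows]

# Chronological split: the last two days are held out for testing.
split = len(X) - 2
predictions = [knn_predict(X[:split], y[:split], x) for x in X[split:]]
accuracy = sum(p == t for p, t in zip(predictions, y[split:])) / len(predictions)
print(f"hold-out accuracy: {accuracy:.3f}")
```

The split is chronological so that the classifier is never trained on days that come after the ones it is evaluated on.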


Language R
Toolkits/Library tidyverse, caret, data.table, readr, rpart, randomForest, TTR, yardstick

Movie Rating Prediction with Regularized Multivariate Regression Model

Rating prediction is important for item-specific recommendations in a movie recommendation system. Several factors can affect a movie's rating, including movie effects, user effects, the movie's release year, and its genre. The project aims to predict movie ratings on a given movie data set (MovieLens). The model is built on multivariate regression methods, and a regularization approach is used to improve accuracy. The evaluation metric is Root Mean Squared Error (RMSE): the lower the RMSE, the closer the predicted ratings are to the actual ratings. The final proposed model is the following:

Model RMSE
Regularized Movie + User + Year + Genre effect model 0.865
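
To make the regularized effects idea concrete, here is a minimal sketch that fits only movie and user effects with a penalty term and scores predictions with RMSE. It is plain Python rather than the R used in the project; the ratings, the penalty value `lam`, and the function names are assumptions for illustration, and year and genre effects would be added in the same way on the remaining residuals.

```python
import math
from collections import defaultdict

def fit_regularized_effects(ratings, lam):
    """Fit rating ~= mu + b_movie + b_user; `lam` shrinks small-sample effects toward 0."""
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Movie effect: penalized mean residual, sum(r - mu) / (n_movie + lam).
    sums, counts = defaultdict(float), defaultdict(int)
    for movie, _, r in ratings:
        sums[movie] += r - mu
        counts[movie] += 1
    b_movie = {m: sums[m] / (counts[m] + lam) for m in sums}

    # User effect: same formula on what is left after removing the movie effect.
    sums, counts = defaultdict(float), defaultdict(int)
    for movie, user, r in ratings:
        sums[user] += r - mu - b_movie[movie]
        counts[user] += 1
    b_user = {u: sums[u] / (counts[u] + lam) for u in sums}
    return mu, b_movie, b_user

def predict(model, movie, user):
    mu, b_movie, b_user = model
    # Movies or users not seen in training fall back to an effect of 0.
    return mu + b_movie.get(movie, 0.0) + b_user.get(user, 0.0)

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical (movie, user, rating) triples, for illustration only.
train = [("m1", "u1", 5.0), ("m1", "u2", 4.0), ("m2", "u1", 3.0),
         ("m2", "u2", 2.0), ("m3", "u1", 1.0)]
held_out = [("m1", "u2", 4.5), ("m3", "u2", 1.5)]

model = fit_regularized_effects(train, lam=1.0)  # placeholder penalty value
preds = [predict(model, m, u) for m, u, _ in held_out]
print(f"RMSE: {rmse([r for _, _, r in held_out], preds):.3f}")
```

The penalty matters most for movies or users with few ratings: with `lam=1.0`, the single rating of `m3` moves its effect only half as far from zero as the unpenalized average would. In practice the penalty would usually be chosen by cross-validation.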


Language R
Toolkits/Library tidyverse, caret, data.table, stringr, lubridate

Classifying European Money Denominations with CNN (Resnet18 and DenseNet 121)

This project aims to classify European money denominations. The neural network is built with PyTorch, and a Convolutional Neural Network (CNN) is used for the classification. Both Resnet18 and DenseNet 121 are used for performance comparison. Both models train well and reach an accuracy above 0.95.

Model Accuracy
CNN (Resnet18) 0.986
CNN (DenseNet 121) 0.957
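
For readers new to CNNs, the sketch below shows the sliding-window operation that a convolutional layer applies, followed by a ReLU activation. It is a plain-Python illustration only and is not the project's PyTorch code: the image patch and the hand-picked kernel are hypothetical, whereas networks such as Resnet18 and DenseNet 121 learn their kernel weights during training and stack many such layers.

```python
def conv2d(image, kernel):
    """Stride-1, no-padding cross-correlation of a 2D image with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Element-wise max(0, x), the usual activation after a convolution."""
    return [[max(0.0, v) for v in row] for row in feature_map]

# Hypothetical 4x4 grayscale patch with a dark-to-bright vertical edge.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

# Hand-picked vertical-edge kernel; a trained CNN would learn these weights.
kernel = [
    [-1.0, 1.0],
    [-1.0, 1.0],
]

feature_map = relu(conv2d(image, kernel))
for row in feature_map:
    print(row)
```

The feature map responds only in the column where the window straddles the edge, which is the kind of local pattern detection that deeper layers combine into features for telling denominations apart.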


Language Python
Toolkits/Library PyTorch, pandas, numpy, matplotlib, random, time

Predicting Top Complaint Type in New York City with KNN, Logistic Regression and SVM Model

This project predicts which complaint type is more likely to occur in a residential building based on a number of selected features, including number of floors, location, year built, etc. The question is whether a heat/hot water complaint or a heating complaint is more likely to happen. This classification problem is solved using KNN, SVM, and logistic regression models.
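
As an illustration of one of the three classifiers, the sketch below trains a logistic regression with batch gradient descent on two building features. It is a dependency-free stand-in for what sklearn's `LogisticRegression` would do in the project; the feature values, their scaling, the label encoding, and the hyperparameters are all assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 when the predicted probability of class 1 is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical buildings: [floors / 10, building age in years / 100].
X = [[0.2, 0.9], [0.3, 0.8], [0.6, 0.95], [0.5, 0.2], [0.4, 0.1], [0.1, 0.3]]
# Hypothetical encoding: 1 = heat/hot water complaint, 0 = heating complaint.
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
new_building = [0.3, 0.85]  # hypothetical 3-floor, 85-year-old building
print("predicted class:", predict(w, b, new_building))
```

Scaling the features to a similar range, as done here, keeps gradient descent stable and also matters for KNN and SVM, which are both sensitive to feature magnitudes.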


Language Python
Toolkits/Library pandas, numpy, matplotlib, sklearn, seaborn

Get In Touch

Should you need more information, feel free to reach me via email. Thank you.