NYP AI Summer Camp 2021

NYP AI Summer Camp 2021 is the second AI Summer Camp hosted by NYP AI, held from 20th to 24th September, 2pm to 4:30pm daily. This time, we dabbled in practical and interesting AI, including AI Stock Prediction, Face Detection and Tweet Impersonation...

GitHub - NYP-AI/Learning-Materials: A Public Repository containing all of NYP AI’s past event materials

Day 1: Computer Vision

Diving into Day 1, we covered the OpenCV library, including how to read images and capture live frames from our computer's camera...

Using the VideoCapture object from cv2

We then moved on to Object Detection using OpenCV's built-in CascadeClassifier, coding out the entire pipeline: from converting images to grayscale to outputting the bounding boxes.

Bounding Box for our Object Detection Algorithm

Finally, we used Streamlit to host our model online. By storing our code on GitHub and deploying it through Streamlit Sharing, we were able to put our website on the World Wide Web...

Website hosted on Streamlit's platform

Day 2: Natural Language Processing

Starting off Day 2, we covered two preprocessing techniques for text: Tokenization and Word Stemming.

Tokenization in Python
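As a rough illustration, the sketch below tokenizes a sentence and then stems each token. Using NLTK and its PorterStemmer is an assumption on our part; wordpunct_tokenize is chosen because it is regex-based and needs no extra data download, and tokenize_and_stem is a hypothetical helper.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

stemmer = PorterStemmer()


def tokenize_and_stem(text: str) -> list:
    """Split text into tokens, drop punctuation, and reduce each word to its stem."""
    # Tokenization: split on runs of word characters / punctuation
    tokens = wordpunct_tokenize(text)
    # Stemming: strip suffixes so "running" and "runs" map to the same root
    return [stemmer.stem(tok) for tok in tokens if tok.isalpha()]


print(wordpunct_tokenize("Hello, world!"))
print(tokenize_and_stem("cats running!"))
```

Stemming is a crude, rule-based process, so stems are not always dictionary words; that trade-off is what makes it fast.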

We then explored the capabilities of NLP models, using Hugging Face's Transformers library for tasks including Text Summarization, Text Generation, Sentiment Analysis and Masked Language Modeling.

Hugging Face's Transformers library
Masked Language Modeling 

We also covered Named Entity Recognition (NER) using spaCy.

NER 

Diving deeper, we moved on to Word Embeddings, which most state-of-the-art NLP models rely on.

Word embeddings explained

We used TensorFlow's Embedding Projector (https://projector.tensorflow.org/) to visualize embeddings in 3D space.

Embedding Projector
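The projector can also visualize your own vectors. A sketch, assuming its "Load" option, which takes a tab-separated file with one vector per line plus an optional metadata file with one label per line in the same order; the file names and the export_for_projector helper are hypothetical.

```python
from pathlib import Path


def export_for_projector(words, vectors, out_dir="."):
    """Write vectors.tsv and metadata.tsv for upload to the Embedding Projector."""
    out = Path(out_dir)
    vectors_path = out / "vectors.tsv"
    metadata_path = out / "metadata.tsv"
    with open(vectors_path, "w", encoding="utf-8") as vf:
        with open(metadata_path, "w", encoding="utf-8") as mf:
            for word, vec in zip(words, vectors):
                # One embedding per line, components separated by tabs
                vf.write("\t".join(str(float(x)) for x in vec) + "\n")
                # The label sits on the same line number in the metadata file
                mf.write(word + "\n")
    return vectors_path, metadata_path
```

Because the metadata file has a single column, it needs no header row.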

We then downloaded pretrained embedding layers from TensorFlow Hub to obtain our very own word embeddings.

Downloading a Pretrained Word Embedding layer from TensorFlow Hub

Finally, to measure how similar two words are, we computed the cosine similarity between their word embeddings.

Obtaining the Cosine Similarity between two words (embeddings)
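Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it ranges from -1 (pointing in opposite directions) to 1 (pointing in the same direction). A minimal NumPy sketch, assuming each word embedding is a 1-D array of the same length:

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """cos(theta) = (a . b) / (|a| * |b|) for two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because the lengths are divided out, only the direction of the embeddings matters, which is why it is preferred over raw dot products for comparing words.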

Day 3: Time Series

Stock Prediction with Deep Learning... We used Yahoo Finance to obtain the past 5 years of financial data for AAPL stock, with the aim of predicting its closing price.

Here, we introduced preprocessing techniques for Time Series, most notably the Sliding Window method.

Sliding Window method visualized

The Sliding Window method turns a single price series into input/output pairs, allowing us to create datasets for Time Series Prediction.

For Loop to create our Dataset

In our case, we used a window size of 60, meaning that we used the past 60 days' worth of closing prices to predict the next day's closing price.

Visualizing our x (input) and y (output)
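A minimal sketch of such a loop in NumPy, assuming series is a 1-D array of closing prices (for example, the Close column downloaded from Yahoo Finance). The extra trailing dimension is an assumption that the model expects input shaped (samples, timesteps, features), as recurrent Keras layers do; make_windows is a hypothetical name.

```python
import numpy as np


def make_windows(series, window_size=60):
    """Slide a window over the series: X = previous window_size values, y = next value."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for i in range(window_size, len(series)):
        X.append(series[i - window_size:i])  # the past window_size closing prices
        y.append(series[i])                  # the closing price right after them
    # Shape (samples, timesteps, 1) so each timestep has one feature
    X = np.array(X).reshape(-1, window_size, 1)
    return X, np.array(y)
```

A series of length N yields N - window_size samples, so 5 years of daily prices still leaves well over a thousand training windows.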

Moving on to Deep Learning, we covered concepts like Layer Types and Activation Functions.

Level of our Deep Learning Layers
Deep Learning Activation Functions
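For reference, here are a few common activation functions written out in NumPy. The exact set covered at the camp isn't listed here, so treat this selection as illustrative.

```python
import numpy as np


def relu(x):
    """Rectified Linear Unit: passes positives through, zeroes out negatives."""
    return np.maximum(0, x)


def sigmoid(x):
    """Squashes any real number into the range (0, 1)."""
    return 1 / (1 + np.exp(-np.asarray(x, dtype=float)))


def tanh(x):
    """Squashes any real number into the range (-1, 1), centred on 0."""
    return np.tanh(x)


def softmax(x):
    """Turns a vector of scores into probabilities that sum to 1."""
    x = np.asarray(x, dtype=float)
    # Subtracting the max keeps exp() from overflowing without changing the result
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```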

We then dissected the TensorFlow Deep Learning Pipeline: the 4-step process of Defining, Compiling, Training and Evaluating a model.

Deep Learning Pipeline
Compilation of our Model
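A sketch of those four steps in Keras. The architecture (a single 32-unit LSTM followed by a one-unit Dense output), the optimizer and the loss are assumptions for illustration, not necessarily the camp's model; the inputs are assumed to be sliding-window arrays shaped (samples, 60, 1) with one target price per sample.

```python
import numpy as np
import tensorflow as tf


def build_model(window_size: int = 60) -> tf.keras.Model:
    # 1. Define: stack the layers
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(window_size, 1)),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(1),  # one output: the next closing price
    ])
    # 2. Compile: choose the optimizer and the loss to minimise
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model


def train_and_evaluate(X_train, y_train, X_val, y_val, epochs=5):
    model = build_model(X_train.shape[1])
    # 3. Train: fit the weights on the training windows
    model.fit(X_train, y_train, epochs=epochs, batch_size=32, verbose=0)
    # 4. Evaluate: measure the loss on data the model has not seen
    val_loss = model.evaluate(X_val, y_val, verbose=0)
    return model, val_loss
```

In practice prices are usually scaled (for example to the 0 to 1 range) before training, since raw dollar values make the MSE loss hard to optimise.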

Finally, we made predictions on our validation set and compared them to the actual prices. (Funnily enough, this was during the Evergrande crisis, so you could see a steep drop in global stock prices...) 🤪

Using Matplotlib to visualize our predictions (Total of 120 days)
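A minimal plotting sketch with Matplotlib, assuming actual and predicted are equal-length sequences of closing prices; the labels, title and figure size are our own choices.

```python
import matplotlib.pyplot as plt


def plot_predictions(actual, predicted, title="AAPL closing price"):
    """Overlay actual and predicted prices on one chart and return the figure."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(actual, label="Actual")
    ax.plot(predicted, label="Predicted")
    ax.set_xlabel("Day")
    ax.set_ylabel("Closing price (USD)")
    ax.set_title(title)
    ax.legend()
    return fig, ax
```

Calling plt.show() after plot_predictions displays the chart in a script; in a notebook it appears automatically.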

Day 4

Here, we explored the malicious use cases of Artificial Intelligence. Our task for the day: learn to impersonate someone's tweets.

We used snscrape to scrape the tweets. By specifying a username, we could extract a given number of that person's tweets and write them out to a .txt file.

Code for retrieving tweets

For Text Generation, we utilized Markov Chains.

Markov Chains as Stochastic Model
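To show the idea, here is a from-scratch, word-level Markov chain in plain Python: each word maps to the list of words that followed it in the training text, and generation repeatedly picks a random follower of the current word. This is an assumption about the approach; the camp's notebook may have used a library instead, and build_chain and generate are hypothetical names.

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict:
    """Map each word to every word observed directly after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)  # duplicates kept, so frequent pairs are likelier
    return dict(chain)


def generate(chain: dict, start: str, max_words: int = 20, rng=None) -> str:
    """Walk the chain from start, choosing each next word at random."""
    rng = rng or random.Random()
    word = start
    output = [word]
    for _ in range(max_words - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)
```

The next word depends only on the current one, which is the Markov property; that is also why the output sounds like the user while often drifting off-topic.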

For our malicious use case, we replaced all URLs in the tweets with our own malicious URL, then fitted our model on these preprocessed tweets.

Regex Function for replacing all URLs with our modified URL
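A minimal sketch of such a function, assuming a simple pattern that treats anything starting with http:// or https:// up to the next whitespace as a URL. REPLACEMENT_URL is a hypothetical placeholder (example.com is a domain reserved for documentation).

```python
import re

# Hypothetical stand-in for the "malicious" link used in the demo
REPLACEMENT_URL = "https://example.com/phish"

# Simplified URL pattern: scheme followed by any run of non-whitespace characters
URL_PATTERN = re.compile(r"https?://\S+")


def replace_urls(tweet: str, replacement: str = REPLACEMENT_URL) -> str:
    """Swap every URL in the tweet for the replacement link."""
    # A function replacement avoids re interpreting backslashes in the new URL
    return URL_PATTERN.sub(lambda _match: replacement, tweet)


print(replace_urls("Read this https://t.co/abc123 now"))
```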

By fitting our model on these tweets, we capture the linguistic style of that user, hence "impersonating" their tweets...

A generated tweet containing our malicious URL...

Day 5: Recommender System

Plunging into Recommender Systems, we covered two types: the Simple recommender system and the Content-Based recommender system.

Simple Recommender System using Mathematical Formula
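The formula itself isn't reproduced here. A common choice for a simple recommender is IMDB's weighted rating, WR = (v / (v + m)) * R + (m / (v + m)) * C, where R is a movie's average rating, v its number of votes, m the minimum votes required to be listed and C the mean rating across all movies. Assuming that formula, a sketch (the vote_average and vote_count keys are hypothetical field names):

```python
def weighted_rating(R: float, v: int, m: int, C: float) -> float:
    """Blend a movie's own rating R with the global mean C, weighted by vote count v."""
    return (v / (v + m)) * R + (m / (v + m)) * C


def rank_movies(movies: list, m: int, C: float = None) -> list:
    """Keep movies with at least m votes and sort them by weighted rating."""
    if C is None:
        C = sum(mv["vote_average"] for mv in movies) / len(movies)
    qualified = [mv for mv in movies if mv["vote_count"] >= m]
    return sorted(
        qualified,
        key=lambda mv: weighted_rating(mv["vote_average"], mv["vote_count"], m, C),
        reverse=True,
    )
```

The weighting pulls movies with few votes towards the global mean, so a film rated 10 by a single viewer cannot top the chart.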

For data ingestion, we used the Kaggle API to download our dataset: the MovieLens Dataset.

Overview of the MovieLens Dataset

For our Content-Based recommender system, we covered Text Preprocessing techniques including Bag-of-Words and TF-IDF. We then briefly touched on Matrix Factorization, before moving on to Cosine Similarity.

Bag of words with Cosine Similarity
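A sketch of this step with scikit-learn, assuming each movie is represented by a short text description. TfidfVectorizer produces TF-IDF weights; swapping in CountVectorizer would give plain Bag-of-Words counts instead. similarity_matrix is a hypothetical helper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def similarity_matrix(descriptions):
    """Return an N x N matrix of cosine similarities between N text descriptions."""
    # Each description becomes a sparse TF-IDF vector; English stop words are ignored
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(descriptions)
    # Entry [i, j] is the cosine similarity between movie i and movie j
    return cosine_similarity(tfidf)
```

TF-IDF down-weights words that appear in many descriptions, so rare, distinctive words drive the similarity more than common ones.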

Finally, we built our Content-Based recommender system on top of Cosine Similarity and used it to provide recommendations.

Getting the recommendations for "Guardians of the Galaxy"
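Given a similarity matrix, recommending boils down to sorting one row. A minimal NumPy sketch, assuming titles is a list whose order matches the rows of the matrix; recommend is a hypothetical helper, not necessarily how the camp implemented it.

```python
import numpy as np


def recommend(title, titles, sim, top_n=10):
    """Return the top_n titles most similar to title, excluding the title itself."""
    idx = titles.index(title)
    scores = np.asarray(sim[idx], dtype=float)
    # Sort by descending similarity; a stable sort keeps ties in their original order
    order = np.argsort(-scores, kind="stable")
    return [titles[i] for i in order if i != idx][:top_n]
```

For example, recommend("Guardians of the Galaxy", titles, sim) would return the ten most similar movies, assuming that title appears in titles.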

Afterword

We'd like to thank all our participants for making NYP AI Summer Camp 2021 a huge success! Not forgetting our Planning Team: Lim Jing Kai, Loy Jun Cheng, Tony Yu, Dylan Kok, Nuzul Firdaly & Alex Chien. Without them, the Camp wouldn't have been as exciting as we'd envisioned...

We'll continually come up with more exciting & updated AI content, so do keep a lookout for our future events 👀

Until then~