xavier assignment 1
Natural Language Processing
Q1 Review the Python script in Q1 Folder – NLTK_Text_Analysis.py
Use the text below to apply the same process
Text = """Backgammon is one of the oldest known board games. Its history can be traced back nearly 5,000 years to archeological discoveries in the Middle East. It is a two player game where each player has fifteen checkers which move between twenty-four points according to the roll of two dice."""
a. Text Analysis Operations using NLTK
b. Tokenization
c. Stopwords removal
d. Lexicon Normalization such as Stemming and Lemmatization
e. POS Tagging
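As a rough illustration of steps b and c, the sketch below tokenizes the Backgammon text and removes stop words using only the Python standard library. It is not the content of NLTK_Text_Analysis.py. The regular expressions and the small `STOP_WORDS` set are assumptions made for this example; NLTK's own tokenizers and its English stop word list would normally take their place, and stemming, lemmatization, and POS tagging are left out because they need NLTK and its data files.

```python
import re
from collections import Counter

text = ("Backgammon is one of the oldest known board games. "
        "Its history can be traced back nearly 5,000 years to archeological "
        "discoveries in the Middle East. It is a two player game where each "
        "player has fifteen checkers which move between twenty-four points "
        "according to the roll of two dice.")

# Assumption: a tiny hand-picked stop word set, used only for illustration.
STOP_WORDS = {"is", "one", "of", "the", "its", "can", "be", "to", "in",
              "it", "a", "where", "each", "has", "which", "between"}

# Sentence tokenization: split after ., ! or ? when followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", text)

# Word tokenization: runs of letters/digits, keeping inner "-", "'" and ","
# so that "twenty-four" and "5,000" stay single tokens.
words = re.findall(r"[A-Za-z0-9]+(?:[-',][A-Za-z0-9]+)*", text)

# Stop word removal: compare case-insensitively against the stop word set.
filtered = [w for w in words if w.lower() not in STOP_WORDS]

# Frequency distribution of the remaining words.
freq = Counter(w.lower() for w in filtered)

print(len(sentences), "sentences,", len(words), "word tokens")
print(filtered)
print(freq.most_common(3))
```

Comparing `words` with `filtered` shows what stop word removal discards before the normalization and tagging steps are applied.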
Q2 Analyze the customer reviews in the file Restaurant_Reviews.tsv
a. Explain each step for the following text clean-up commands
review = dataset['Review'][0]
review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][0])
review = review.lower()
review = review.split()
ps = PorterStemmer()
review = [ps.stem(word) for word in review if not word in set(stopwords.words('english'))]
review = ' '.join(review)
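To see what the clean-up commands above do to a single string, the following sketch traces them with intermediate variables. The sample review and the tiny `STOP_WORDS` set are assumptions for illustration only; they stand in for `dataset['Review'][0]` and NLTK's English stop word list. The stemming step is left out so the example runs without NLTK.

```python
import re

# Assumption: a made-up sample review standing in for dataset['Review'][0].
raw_review = "Wow... Loved this place."

# Assumption: a tiny stop word set; NLTK's English list is much longer.
STOP_WORDS = {"this", "the", "a", "is"}

# 1. Replace every character that is not a letter with a space.
letters_only = re.sub('[^a-zA-Z]', ' ', raw_review)

# 2. Lower-case the text so "Loved" and "loved" are treated alike.
lowered = letters_only.lower()

# 3. Split on whitespace into a list of words (extra spaces are ignored).
tokens = lowered.split()

# 4. Drop stop words (the original command also stems each kept word here).
kept = [word for word in tokens if word not in STOP_WORDS]

# 5. Join the remaining words back into one space-separated string.
cleaned = ' '.join(kept)

print(repr(letters_only))
print(tokens)
print(cleaned)
```

In the original commands, `ps.stem(word)` would additionally reduce each kept word to its stem before the words are joined.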
b. What is the classification question?
c. The example uses the Naïve Bayes classifier to classify the sentiments. Calculate the confusion matrix and the accuracy, where
TP = number of true positives,
TN = number of true negatives,
FP = number of false positives,
FN = number of false negatives:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
d. Apply the logistic regression classifier to the problem and recalculate part c, i.e. TP, TN, FP, FN, and Accuracy
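The counts and the accuracy formula in parts c and d can be checked by hand with a short sketch like the one below. The label lists are made up for illustration and are not results from Restaurant_Reviews.tsv; the function names are likewise hypothetical. The same two functions can be applied to the predictions of either classifier.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for binary labels."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Assumption: hypothetical true labels and predictions (1 = positive review).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("TP, TN, FP, FN =", tp, tn, fp, fn)
print("Accuracy =", accuracy(tp, tn, fp, fn))
```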
Q3 NLTK Corpus on Movie Reviews
Q3a Use the following reference to perform sentiment analysis on the Movie Review corpus with "Q3 Movie Reviews.py"
https://www.nltk.org/book/ch06.html
Q3b – Explain how the Bag of Words model helps in sentiment analysis
http://blog.chapagain.com.np/python-nltk-sentiment…
Summarize the entire code in NLTKMovieReview.py file as a part of the solution
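For Q3b, the idea behind a Bag of Words model can be sketched in a few lines: each review is reduced to a record of which vocabulary words it contains, ignoring word order. The sketch below is an assumption-laden illustration, not the code in NLTKMovieReview.py. The toy reviews and the function names `build_vocabulary` and `bag_of_words` are hypothetical; the `contains(word)` feature naming follows the style of the NLTK book chapter referenced above.

```python
from collections import Counter

# Assumption: three made-up labelled reviews instead of the NLTK corpus.
reviews = [
    ("a great and moving film", "pos"),
    ("a dull and boring film", "neg"),
    ("great acting great story", "pos"),
]


def build_vocabulary(docs, size):
    """Return the `size` most frequent words across all documents."""
    counts = Counter(word for text, _ in docs for word in text.split())
    return [word for word, _ in counts.most_common(size)]


def bag_of_words(text, vocabulary):
    """Map a text to {'contains(word)': True/False} for each vocabulary word."""
    present = set(text.split())
    return {f"contains({word})": (word in present) for word in vocabulary}


vocabulary = build_vocabulary(reviews, 4)
features = bag_of_words("a great film", vocabulary)
print(vocabulary)
print(features)
```

A classifier trained on such feature dictionaries learns which words tend to appear in positive or negative reviews, which is how the model supports sentiment analysis.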
Q4 Twitter Analysis sentiment140
Perform a Twitter sentiment analysis –
– who interact by retweeting and responding?
– Twitter employs a message size restriction of 280 characters or less,
– which forces users to stay focused on the message they wish to disseminate.
– Twitter data is well suited to the Machine Learning (ML) task of sentiment analysis.
– Sentiment analysis falls under Natural Language Processing (NLP).
– The sentiment140 dataset is made up of about 1.6 million random tweets
– with corresponding binary labels: 0 for negative sentiment and 1 for positive sentiment.
https://towardsdatascience.com/the-real-world-as-s…
Q5 Analyze Clothing Reviews
https://www.kaggle.com/nicapotato/womens-ecommerce…
A women’s clothing e-commerce dataset revolving around the reviews written by customers. This dataset includes 23,486 rows and 10 feature variables. Each row corresponds to a customer review and includes the variables:
Class Name: categorical name of the product class
Perform
a. Text extraction & creating a corpus
b. Text Pre-processing
c. Create the DTM & TDM from the corpus
d. Exploratory text analysis
e. Feature extraction by removing sparsity
f. Build the classification models and compare Logistic Regression to Random Forest
https://medium.com/analytics-vidhya/customer-revie…
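Steps c and e above can be illustrated with a small standard-library sketch: build a document-term matrix (DTM), transpose it into a term-document matrix (TDM), and drop sparse terms. The four reviews are made up, the function names are hypothetical, and the `min_docs` threshold is an assumption chosen for this example; a real solution would more likely use a text-mining library on the Kaggle dataset.

```python
from collections import Counter

# Assumption: a made-up corpus of four short reviews.
corpus = [
    "love this dress fits perfectly",
    "dress runs small love the color",
    "poor quality fabric",
    "love the fabric and the fit",
]


def document_term_matrix(docs):
    """Rows are documents, columns are vocabulary terms, cells are counts."""
    vocabulary = sorted({word for doc in docs for word in doc.split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


def transpose(matrix):
    """Turn a DTM into a TDM (rows become terms, columns become documents)."""
    return [list(column) for column in zip(*matrix)]


def remove_sparse_terms(vocabulary, dtm, min_docs):
    """Keep only terms that occur in at least `min_docs` documents."""
    keep = [j for j in range(len(vocabulary))
            if sum(1 for row in dtm if row[j] > 0) >= min_docs]
    return [vocabulary[j] for j in keep], [[row[j] for j in keep] for row in dtm]


vocabulary, dtm = document_term_matrix(corpus)
tdm = transpose(dtm)
kept_terms, reduced_dtm = remove_sparse_terms(vocabulary, dtm, min_docs=2)
print(len(vocabulary), "terms before,", len(kept_terms), "terms after")
print(kept_terms)
print(reduced_dtm)
```

Note that a stop word such as "the" survives the sparsity filter here, which is one reason the pre-processing in step b usually comes first.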
HW11.docx
Q2 Restaurant Reviews.zip
Q1 NLP Basics.zip