Stock Market Prediction

 International Journal of Scientific Research in Computer Science, Engineering and Information Technology ISSN : 2456-3307 (www.ijsrcseit.com) 

doi : https://doi.org/10.32628/CSEIT217475 

Stock Market Prediction Using Twitter Sentiment Analysis 

Padmanayana, Varsha, Bhavya K  

Department of Computer Science, Srinivas Institute of Technology, Mangalore, Karnataka, India 

  

Article Info

Volume 7, Issue 4, Page Number: 265-270

Publication Issue: July-August 2021

Article History

Accepted: 15 July 2021, Published: 22 July 2021

ABSTRACT

Stock market prediction is an important topic in financial engineering, especially since new techniques and approaches in this area are constantly gaining value. In this project, we investigate the impact of sentiment expressed in tweets on stock price prediction. Twitter is a social media platform that gives each individual a free place to express their thoughts publicly. Specifically, we fetch live tweets about a particular company using the Twitter API. Stop words and special characters are removed from the dataset. The filtered data is used for sentiment analysis with a Naïve Bayes classifier, which classifies the tweets as positive, negative or neutral. To predict the stock price, the stock dataset is fetched from the Yahoo Finance API. The stock data and the tweet data are given as input to the machine learning model. XGBoost is used as the model to predict the stock market price. The predicted value is compared with the actual stock market value. The effectiveness of the proposed project is demonstrated through experiments on several companies, such as Apple, Amazon and Microsoft, using live Twitter data and daily stock data. The goal of the project is to use historical stock data in conjunction with sentiment analysis of news headlines and Twitter posts to predict the future price of a stock of interest. The headlines were obtained by scraping the website FinViz, while tweets were collected using Tweepy. Both were analyzed using the Vader Sentiment Analyzer.

Keywords : Sentiment Analysis, Stock market prediction, Machine Learning, Twitter 

I. INTRODUCTION  

Stock market prediction is an important topic in financial engineering, especially since new techniques and approaches in this area are constantly gaining value. Predicting the stock market price is a major challenge for many researchers today, because it is difficult to predict a value that matches the actual stock market price. Stock market prediction is the process of estimating the future value of a particular company's stock, thus giving investors an idea of the gain or loss from investing in that stock. Social media, namely Twitter, plays an important role in predicting the stock price. Twitter is a social media platform where millions of tweets are sent daily. Newspaper headlines also provide information related to the stock market, which can also be used for prediction. Tweets related to different companies can be obtained through the Twitter API, although many of them may not be useful for prediction. Live Twitter data can be extracted from the Twitter API and analysed using a classifier. Stock data can be fetched using the Yahoo Finance API to analyse the value. Various machine learning algorithms are used to train the model to predict the stock price. XGBoost and Naïve Bayes are the main algorithms used in this work to obtain an accurate predicted value.

Copyright: © the author(s), publisher and licensee Technoscience Academy. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Padmanayana et al Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol, July-August-2021, 7 (4) : 265-270

II. METHODS AND MATERIAL  

2.1. Literature Review 

Over the past two decades, many important changes have taken place in the environment of stock markets. The development of powerful communication and trading facilities has enlarged the scope of selection for investors as well as for users. Sentiment analysis is an information extraction task that aims to obtain the writer's feelings, expressed as positive, negative or neutral comments.

Agarwal et al. [3] examine various machine learning techniques for assigning a positive or negative sentiment to a tweet. The techniques include Naïve Bayes and support vector machines, among others. A Naïve Bayes classifier is used to analyze sentiment in the tweet dataset, and support vector machine techniques are used for predicting market movement.

Fazel Zarandi M.H, Rezaee B, Turksen I.B and Neshat E [6] developed a type-2 fuzzy rule-based expert system for stock price analysis. The proposed type-2 fuzzy model uses technical and fundamental indexes as the input variables. The type-1 method was used for inference and to increase the robustness and flexibility of the system and to minimize error.

Volume 7, Issue 4, July-August-2021 | http://ijsrcseit.com

2.2 Algorithms 

In this project we used two main algorithms: the Naïve Bayes classifier and XGBoost. The Naïve Bayes classifier is used for sentiment analysis. It assigns one of three classes (positive, negative or neutral) to each news headline and tweet. Naïve Bayes is one of the simplest and most effective classification algorithms, and it helps in building fast machine learning models that can make quick predictions. It is a probabilistic classifier, which means it predicts on the basis of the probability that an object belongs to a class. In sentiment analysis we determine whether a text expresses a negative or a positive feeling. Written reviews are good datasets for sentiment analysis because they often come with a score that can be used to train an algorithm. The Naïve Bayes classification algorithm tends to be a baseline solution for sentiment analysis tasks. The basic idea of the Naïve Bayes technique is to find the probabilities of the classes assigned to texts by using the joint probabilities of words and classes. To avoid numerical underflow, log probabilities can be used.
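The idea of scoring classes with log probabilities can be illustrated with a minimal sketch. This is not the classifier used in the project: the tiny training set, the function names `train_naive_bayes` and `classify`, and the use of Laplace (add-one) smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Train a multinomial Naive Bayes model.

    docs is a list of (tokens, label) pairs. Returns log priors,
    per-class log likelihoods and the vocabulary.
    """
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)

    log_priors = {c: math.log(n / len(docs)) for c, n in label_counts.items()}
    log_likelihoods = {}
    for c in label_counts:
        # Laplace smoothing (an assumption): add one to every word count.
        denom = sum(word_counts[c].values()) + len(vocab)
        log_likelihoods[c] = {
            w: math.log((word_counts[c][w] + 1) / denom) for w in vocab
        }
    return log_priors, log_likelihoods, vocab


def classify(tokens, model):
    """Return the class with the highest summed log probability."""
    log_priors, log_likelihoods, vocab = model
    scores = {}
    for c in log_priors:
        # Summing logs replaces multiplying many small probabilities,
        # which is what avoids underflow. Unknown words are skipped.
        scores[c] = log_priors[c] + sum(
            log_likelihoods[c][w] for w in tokens if w in vocab
        )
    return max(scores, key=scores.get)


# Hypothetical toy training data, for illustration only.
TRAIN = [
    ("stock surges on strong earnings".split(), "positive"),
    ("great quarter and record profit".split(), "positive"),
    ("shares plunge after weak earnings".split(), "negative"),
    ("terrible loss and falling revenue".split(), "negative"),
    ("company holds annual meeting today".split(), "neutral"),
]
MODEL = train_naive_bayes(TRAIN)
print(classify("record profit".split(), MODEL))
```

With this toy data, a text made of words seen only in positive examples is assigned the positive class; a real system would need a far larger labelled corpus.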

The XGBoost algorithm is used for stock price prediction. After the sentiment analysis process, we combine its output with the stock data and process both with this recent and widely used algorithm to predict the stock price. XGBoost is one of the most powerful machine learning algorithms available today. XGBoost stands for eXtreme Gradient Boosting, an implementation of gradient-boosted trees; it is a large machine learning algorithm with many parts. Boosting is an ensemble method. Each tree in the boosting sequence focuses on the errors made by the previous tree. In boosting, different models are trained one after another: the first model is trained, then the second, then the third, and the models are then combined to give a better result. XGBoost-based models routinely win Kaggle competitions. It is easy to use and computationally efficient. XGBoost can automatically handle missing values. Regularized boosting (which helps prevent overfitting), parallel processing and tree pruning are some of the features of the XGBoost algorithm.
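The sequential nature of boosting described above can be sketched in plain Python. This is a deliberately simplified stand-in, not XGBoost itself: it fits one-split regression trees (stumps) to the residuals of the previous rounds under squared loss, and it omits regularization, pruning and parallelism. All names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree on a single feature."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # every x is identical, so no split is possible
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1:]


def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Train stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, preds)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    """Combine the base value with the scaled output of every stump."""
    base, stumps, learning_rate = model
    return base + sum(
        learning_rate * (left if x <= threshold else right)
        for threshold, left, right in stumps
    )


def mean_squared_error(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy data, for illustration only.
XS = [1, 2, 3, 4, 5, 6, 7, 8]
YS = [1.0, 1.2, 0.9, 3.1, 3.0, 5.2, 5.0, 5.1]
print(mean_squared_error(boost(XS, YS, n_rounds=1), XS, YS))
print(mean_squared_error(boost(XS, YS, n_rounds=50), XS, YS))
```

On the training data, the error after many rounds is lower than after a single round, which is the sense in which the combined models give a better result than any one of them.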

III. RESULTS AND DISCUSSION  

1. Proposed System 

Figure 1. High level structure 

Figure 1 illustrates the high-level structure of the stock market prediction system using Twitter sentiment analysis. The user is given the option to get the predicted stock price of a company in the stock market. The user needs to input the name of the company whose stock price is to be predicted. The user can also view the active stocks in the market and the weekly analysis of the stock market.


In this project we use two main text datasets. We fetch data from Twitter, and to improve accuracy we also collect data from newspapers and Yahoo Finance. So, we take newspaper headlines related to the stock of the company as well as live Twitter data. From this data we remove all special characters, including emojis, hashtags (#) and @ mentions. These are not necessary for sentiment analysis, so only the plain sentences are considered. When sentiment analysis is performed, the tweets are classified into three classes: positive, negative and neutral. In stock market terms, there is Bullish and Bearish market behaviour: Bullish means the market is going up and Bearish means the market is going down. Neutral sentences are those that are neither positive nor negative; such sentences occur only rarely. The Naïve Bayes classifier takes all this data and performs sentiment analysis on it. It reads the lexical file line by line together with the Twitter and newspaper headline data, classifies each item into one of the three classes, and returns the output as a Python dictionary.

2. Procedure for Workflow of The Project 

Step 1: Load the main class, which is responsible for training and prediction.

Step 2: Process the live Twitter data fetched through the API.

Step 3: Pre-process the fetched data to remove special characters and stop words and to perform tokenization.

Step 4: Perform sentiment analysis on the new tweet data using the Naïve Bayes classifier.

Step 5: Feed the stock data of the respective company (open, close, adj close) along with the sentiment analysis data to the XGBoost algorithm.

Step 6: Display the stock price of the respective company within a time window of 30 minutes. Thus, the predicted value is obtained.

Figure 2. Flowchart for workflow of the project 

3. Procedure for User Interface 

Step 1: Sign up or log in to open the home page.

Step 2: Input the name of the company whose stock price should be predicted.

Step 3: View the stock price by clicking the check button.

Step 4: Optionally view the active stocks and the weekly analysis of stock data.

Step 5: Log out.

Figure 3. Flowchart for user interface


4. Processing Steps 

There are some standard methods involved in this  technique. Those are as follows: 

4.1 Data Collection 

Data from Twitter, news headlines and Yahoo Finance is collected for analysis.

4.2 Data Preprocessing 

The extracted stock data is incomplete because of public holidays and weekends, when the stock market does not function. As a result, there are missing stock values. These empty values can be approximated in a simple way. Suppose the stock value on a given day is x and the next available value is y, with some values missing in between. The first missing value is estimated as (x+y)/2, and the same method is then repeated to fill the remaining missing values.
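The gap-filling rule can be sketched as follows, assuming the missing days are represented as `None` in a list of closing prices and that the first and last entries are known. The function name `fill_gaps` and the sample values are hypothetical.

```python
def fill_gaps(prices):
    """Fill None entries with the midpoint of the neighbouring values.

    Each missing value becomes the average of the value just before it
    (already filled) and the next known value, applied left to right.
    Assumes the first and last entries are not missing.
    """
    if not prices or prices[0] is None or prices[-1] is None:
        raise ValueError("first and last prices must be known")

    filled = list(prices)
    for i, value in enumerate(filled):
        if value is None:
            previous_value = filled[i - 1]
            # Next known value to the right of the current gap.
            next_value = next(v for v in filled[i + 1:] if v is not None)
            filled[i] = (previous_value + next_value) / 2
    return filled


# Two missing days (for example a weekend) between 100.0 and 108.0.
print(fill_gaps([100.0, None, None, 108.0]))
# -> [100.0, 104.0, 106.0, 108.0]
```

Here the first missing day is (100 + 108) / 2 = 104, and the second is (104 + 108) / 2 = 106.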

Extracted tweets contain many stop words and unnecessary data such as special characters, URLs and pictures. These tweets are pre-processed to obtain the emotion of the public. For pre-processing of the data, we employ three steps of filtering:

Tokenization: Each tweet is split into individual words called tokens. The text is broken at whitespace characters.

Removal of stop words: Words like “a”, “an”, “the”, “he”, “she”, “by”, “on”, etc. are not required for sentiment analysis. These are called stop words, and they are removed before the sentiment analysis process.

Regex Matching: URLs and special characters such as “!”, “#” and “@” are all removed and replaced by whitespace.
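The three filtering steps can be sketched with Python's standard `re` module. The stop-word list, the regular expressions and the function name `preprocess` are assumptions for illustration; in this sketch the regex step runs before tokenization so that punctuation attached to a word does not survive as part of a token, and digits are dropped along with other non-letter characters.

```python
import re

# Hypothetical, deliberately short stop-word list.
STOP_WORDS = {"a", "an", "the", "he", "she", "by", "on", "is", "to", "of", "and"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")


def preprocess(tweet):
    """Return the cleaned tokens of a tweet."""
    text = tweet.lower()
    # Regex matching: replace URLs, then any non-letter character
    # (including "!", "#", "@", "$", digits and emojis) with whitespace.
    text = URL_PATTERN.sub(" ", text)
    text = NON_LETTER_PATTERN.sub(" ", text)
    # Tokenization: split on whitespace.
    tokens = text.split()
    # Stop-word removal.
    return [token for token in tokens if token not in STOP_WORDS]


print(preprocess("The $AAPL stock is on fire!!! #bullish @trader https://t.co/xyz"))
# -> ['aapl', 'stock', 'fire', 'bullish', 'trader']
```

Note that this sketch strips the "#" and "@" symbols but keeps the words that follow them; whether hashtag and mention text should be kept is a design choice the source does not specify.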

4.3 Classification 

We use a bag of words containing sentiment information (positive, negative, neutral) along with sentiment scores. After this, we adopt negation detection measures to differentiate between “good” and “not good”. In this work we perform sentiment analysis on the Twitter dataset and categorize the tweets as expressing positive, negative or neutral behaviour. If the entire text has a positive, joyful attitude, or if something is mentioned with positive connotations, it is considered a positive statement. If the entire text has a negative, sad attitude, or if something is mentioned with negative connotations, it is considered a negative statement. If the text expresses no personal opinion and merely transmits information, it is considered a neutral statement.

After the feature extraction, we perform sentiment analysis using the Naïve Bayes classifier.
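One common way to let a bag-of-words model distinguish “good” from “not good” is to fold a negation word into the token that follows it. The source does not say which negation detection measure was used, so the sketch below is only one possible approach; the negation list, the `NOT_` prefix and the function name `mark_negation` are assumptions.

```python
# Hypothetical list of negation words.
NEGATIONS = {"not", "no", "never"}


def mark_negation(tokens):
    """Prefix the token that follows a negation word with 'NOT_'."""
    marked = []
    negate = False
    for token in tokens:
        if token in NEGATIONS or token.endswith("n't"):
            # Drop the negation word itself and flag the next token.
            negate = True
            continue
        marked.append("NOT_" + token if negate else token)
        negate = False
    return marked


print(mark_negation(["the", "results", "are", "not", "good"]))
# -> ['the', 'results', 'are', 'NOT_good']
```

With this marking, “good” and “NOT_good” become separate features, so a classifier can learn opposite sentiment weights for them. For it to work, negation words must not be removed as stop words before this step.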

4.4 Stock market prediction 

The sentiment analysis data and the stock market data are combined and given as input to the training model. The stock market values are fetched using Yahoo Finance. The XGBoost model evaluates both sets of data and predicts the stock market value.
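How the two data sources might be combined into training rows can be sketched as below. The source does not describe the exact feature layout, so everything here is an assumption: the field names, the daily sentiment score (positive minus negative count, divided by the total), and the choice of the next trading day's close as the prediction target.

```python
def build_feature_rows(stock_rows, daily_sentiment):
    """Pair each day's stock and sentiment features with the next close.

    stock_rows: list of dicts with 'date', 'open', 'close', 'adj_close',
                ordered by date.
    daily_sentiment: dict mapping a date to counts of 'positive',
                     'negative' and 'neutral' texts for that day.
    Returns a list of (features, target) pairs.
    """
    rows = []
    for today, tomorrow in zip(stock_rows, stock_rows[1:]):
        counts = daily_sentiment.get(today["date"], {})
        total = sum(counts.values())
        # Assumed score in [-1, 1]; 0.0 when no texts were collected.
        score = 0.0
        if total:
            score = (counts.get("positive", 0) - counts.get("negative", 0)) / total
        features = [today["open"], today["close"], today["adj_close"], score]
        rows.append((features, tomorrow["close"]))
    return rows


# Hypothetical sample values, for illustration only.
STOCK = [
    {"date": "2021-07-01", "open": 10.0, "close": 11.0, "adj_close": 11.0},
    {"date": "2021-07-02", "open": 11.0, "close": 12.5, "adj_close": 12.5},
    {"date": "2021-07-05", "open": 12.5, "close": 12.0, "adj_close": 12.0},
]
SENTIMENT = {"2021-07-01": {"positive": 3, "negative": 1, "neutral": 1}}

for features, target in build_feature_rows(STOCK, SENTIMENT):
    print(features, "->", target)
```

The resulting feature lists and targets could then be passed to a gradient-boosting model for training.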

IV. CONCLUSION  

In this paper we investigated how sentiment analysis of Twitter data is correlated with the prediction of the stock market price for all the companies considered. The results obtained after the prediction process indicate that the predicted values closely match the actual stock prices. The accuracy obtained is 89.8%. Thus, social media such as Twitter can be used as a source to predict the stock market price with high accuracy. Furthermore, the machine learning model XGBoost provides more accurate values compared with other models. Thus, using sentiment analysis of Twitter data and stock data from the Yahoo Finance API, we predict the stock market price, which is helpful for anticipating future stock prices.

In the future, we plan to further improve the work in the following areas. First, our analysis is limited to 16 companies. An expansion to a broader set of companies or to all Twitter data might yield more insights into the data, leading to more effective application in stock price prediction. Second, we use the optional sentiment labels provided by Twitter users as the ground truth data for model training. As measured, this data has only 89.8% accuracy; better training data is expected to improve the quality of the sentiment analyser. Finally, the current project examines correlation at daily granularity because the stock data are only available at the daily level. It will be interesting to study correlations at a finer granularity, such as hourly.

V. REFERENCES  

  

[1]. R. Ahuja, H. Rastogi, A. Choudhuri and B.  Garg, “Stock market forecast using sentiment  analysis”, 2nd International Conference on  Computing for Sustainable Global  Development, pp. 1008-1010, 2015. 

[2]. A. Mittal and A. Goel, “Stock Prediction Using Twitter Sentiment Analysis,” Tomx.Inf.Elte.Hu, (June), 2012.

[3]. Agarwal, Apoorv, et al. “Sentiment analysis of  twitter data.” Proceedings of the Workshop on  Languages in Social Media. Association for  Computational Linguistics, 2011. 

[4]. W. Antweiler and M. Frank. Do US stock  markets typically overreact to corporate news  stories? Working Paper, (1998):1–22, 2006. 

[5]. Jabaseeli, A. Nisha, and E. Kirubakaran. “A  Survey on Sentiment Analysis of (Product)  Reviews.” International Journal of Computer  Applications 47.11, 2012. 

[6]. Fazel Zarandi M.H, Rezaee B, Turksen I.B and  Neshat E. “A Type-2 Fuzzy Model for Stock  Market Analysis.”, 2007. 

[7]. “Sentiment Analysis on Social Media and Online Review,” International Journal of Computer Applications (0975-8887), Volume 121, No. 20, July 2015.


[8]. L. A. Gallagher and M. P. Taylor, “Permanent  and temporary components of stock prices:  evidence from assessing macroeconomic  shocks,” Southern Economic Journal, vol. 69,  pp. 345-362, 2002. 

[9]. S. Urolagin, “Text mining of tweet for  sentiment classification and association with  stock prices,” Proceedings of 2017 International  Conference on Computer and Applications, pp.  384-388, 2017. 

[10]. V. S. Pagolu, K. N. Reddy, G. Panda and B. Majhi, “Sentiment analysis of Twitter data for predicting stock market movements,” Proceedings of 2016 International Conference on Signal Processing, Communication, Power and Embedded System, pp. 1345-1350, 2016.

[11]. B. Qian and K. Rasheed, “Stock market  prediction with multiple classifiers,” Applied  Intelligence, vol. 26, pp. 25-33, 2007. 

  

  

Cite this article as : 

Padmanayana, Varsha, Bhavya K, "Stock Market  Prediction Using Twitter Sentiment Analysis",  International Journal of Scientific Research in  Computer Science, Engineering and Information  Technology (IJSRCSEIT), ISSN : 2456-3307, Volume  7 Issue 4, pp. 265-270, July-August 2021. Available at  doi : https://doi.org/10.32628/CSEIT217475 

Journal URL : https://ijsrcseit.com/CSEIT217475270 
