By Group "Function Words"

Codes and Blogs By Liu Qing Yuan, Wang Shu Yu, Xiong Zhen Zhe, and Zhou Zi Qi. This is a First Blog post. If you are looking for more progress, please move to our Second Blog [1].

[1] We are still working on it!

Abstract

In case you don't remember who we are or what we're up to, we threw together this little recap to help you out.

We are group "Funtion Words", passionating about cryptocurrency and data analysis. Our goal was to leverage Python programming to scrape and analyze Twitter content from key opinion leaders (KOLs) in the crypto space. By extracting and evaluating their tweets, we aimed to provide data-driven insights to help investors make informed decisions on whether to buy or sell meme coins. Our project combines web scraping, natural language processing, and financial analysis to navigate the volatile world of meme cryptocurrencies.

You can click Here to access our PowerPoint presented in class.

As part of our NLP project, we initially wanted to scrape the sentiment of Dogecoin tweets on Twitter. However, we quickly realized that Twitter's API is too costly, especially for large data scraping. This forced us to look for a less expensive alternative, and this prompted us to explore Reddit's API.

Our Approach

We were interested in knowing what sort of sentiment people are using in social media comments and posts about Dogecoin and whether it correlates with the price movements of Dogecoin. We first thought of using Twitter to get social media sentiment since it's a very popular platform for talking about financial topics. However, Twitter's API is too costly, so we figured we'd do something different and do Reddit instead, which has a free API and lots of discussion data.

The code we use is as follows:

# List to store the fetched results
data = []

# Iterate through each selected post
for idx, submission in enumerate(selected_submissions):
    print(f"Fetching submission {idx + 1}/{len(selected_submissions)}: {submission.title}") # Print the title of the post being fetched

    # Handle the post's comments to avoid loading too many redundant comments
    submission.comments.replace_more(limit=0) # Replace "MoreComments" with empty, to avoid fetching excessive comments
    comments = submission.comments.list() # Get all comments (excluding redundant parts)

    # Get the text of the comments, limited to 'comment_limit' number of comments
    comment_texts = [comment.body for comment in comments[:comment_limit]]

    # Get the post creation date and format it as a string (Year-Month-Day Hour:Minute:Second)
    created_date = datetime.utcfromtimestamp(submission.created_utc).strftime('%Y-%m-%d %H:%M:%S')

    # Add the current post and its comment data to the result list
    data.append({
        'headline': submission.title, # The title of the post
        'comments': comment_texts, # The content of the comments
        'date': created_date # The date the post was created
    })

    # Sleep for 'sleep_time' seconds after fetching each post to avoid making requests too quickly
    time.sleep(sleep_time)

return data

We started by extracting data from the "CryptoCurrency" subreddit, as well as posts and comments mentioning Dogecoin. We extracted Dogecoin posts using Reddit's search API and examined the titles and comments for sentiment. Sentiment analysis was done by implementing the TextBlob library, which provides a sentiment score for each post and comment between -1 (Negative) and 1 (Positive).

Dataset

  1. The dataset of reddit comments:
Headline Comment Date
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. [deleted] 2024/12/12 14:18
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. Surviving hackers is bullish 2024/12/12 14:18
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. As long as degens can still buy and sell DOGE they don't care what it actually does or doesn't do lol 2024/12/12 14:18
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. "69%? Nice" 2024/12/12 14:18
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. Doge investors: pump the news 2024/12/12 14:18
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. Already been patched on most networks. price isn't changing 2024/12/12 14:18
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. Why didn't hacker just click and drag the price to $1,000? 2024/12/12 14:18
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. So buy the dip? 2024/12/12 14:18
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. Wait, isn't DOGE's source code basically a fork of Litecoin? Does that mean LTC has this kind of flaw too? 2024/12/12 14:18
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. Who would imagine that a meme coin without proper development team and security updates is vulnerable to attacks... keep on buying, it will pump tomorrow for sure! 2024/12/12 14:18
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. DOGE hodlers have no clue what any of this means so they'll just buy more and keep dick riding Elmo 2024/12/12 14:18
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. So, Moon Soon? 2024/12/12 14:18
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. All this does is improve DOGE in the long run and works as a non-security short? 2024/12/12 14:18
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. Blockchair's node count is very inaccurate. https://what-is-dogecoin.com/nodes/ shows 14563 nodes. There is no evidence of any flaw. Some rando shorted hard, made a tweet and everyone is taking it as facts. If there was a node crash it would have blown up during the crash supposedly on Dec 4. I for one run an older version node, never taken offline, never crashed. 2024/12/12 14:18
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. Seems like FUD. Based on other discussion threads there was no significant change in the number of nodes. Anyone got proof of this besides this tweet? 2024/12/12 14:18
Hacker exploits DOGECOIN flaw, crashing 69% of nodes and exposing a vulnerability that could have taken down the entire network. Glad DOGE people don't interest themselves with anything informative! 2024/12/12 14:18
  1. The dataset of price:
Date Price Open High Low Vol. Change %
02/21/2025 0.246771 0.254737 0.261063 0.246746 1.31B -3.13%
02/20/2025 0.254740 0.254866 0.257420 0.250098 1.06B -0.05%
02/19/2025 0.254876 0.251209 0.255351 0.248996 913.84M 1.42%
02/18/2025 0.251300 0.258189 0.259660 0.242390 1.55B -2.67%
02/17/2025 0.258182 0.265716 0.268627 0.254015 1.26B -2.84%
02/16/2025 0.265728 0.271776 0.274075 0.263873 842.34M -2.23%
02/15/2025 0.271794 0.271835 0.282934 0.268399 1.50B -0.01%

Challenges Encountered

  1. Issue to Access Influential Comments: The first issue that we faced was that Reddit has numerous common users whose comments might not be as influential in their effect on the market as those of professionals or influential figures on sites such as Twitter. Thus, some of the comments that we fetched were not very insightful or informative regarding sentiment analysis for Dogecoin price changes. This restriction became increasingly obvious as we examined the data.

  2. API for Reddit has a restriction in that you can only do 300 requests at once. Because we wanted much more data, we had to execute many requests to get sufficient information. This limitation slowed down our data collection, but we had to do it in order to ensure we had enough information for some good analysis.

  3. Observing a Weak Relationship Between Sentiment and Price: Once we executed the sentiment analysis using TextBlob, we merged the sentiment scores with the historical price data of Dogecoin.

overtimetrend

However, we found that the sentiment scores were not strongly correlated with Dogecoin's price movements. The simple polarity-based sentiment analysis did not seem to pick up on how the discussion topics in the subreddit influenced the trends in the market. This result brought us to the conclusion that our current sentiment analysis approach is too simplistic.

Looking for a Better Model: DeepSeek for Emotional Understanding

Because rudimentary sentiment analysis has its limitations, we sought to explore more advanced tools. DeepSeek is one of them: it's a free API offering rich emotional analysis that we can use to analyze comments and rate their emotional tone of text into feelings like happiness, sadness, anger, and so on. For example, we can simply start a request of" rating this comment about how surprised they are from 0-10". We believed this would help us capture the full spectrum of sentiment on Reddit postings and ascertain more precisely how emotion might affect the price of Dogecoin. So, we're thinking of analyzing daily comments from a range of different emotional angles, with a score from 0 to 10 to indicate how strong each emotion is in DeepSeek. With that data, we might be able to apply a regression model to see if there is any relationship between those emotional environments and how Dogecoin's price moves.

We're not really looking to devise a profitable trading strategy, but we think this is a fascinating research project into how social media sentiment relates to cryptocurrency prices.

Conclusion

Despite these difficulties, we have gained a great deal of insight into the nuances of sentiment analysis, especially regarding social media data. Our next steps involve leveraging more advanced tools like DeepSeek in order to further refine our sentiment analysis and investigate potential correlations with price movement. Though the current goal is research-oriented and exploration in nature, we are excited to continue to investigate how sentiment from platforms like Reddit may influence market trends, especially in the volatile world of cryptocurrency. We're aiming to dive deeper and help everyone better understand how social media moods can impact market movements.


Published

Category

Reflective Report

Tags

Contact