Using Natural Language Processing to Explore Mental Health Insights From UK Tweets During the COVID-19 Pandemic: Infodemiology Study

Background There is a need to consider the value of soft intelligence, leveraged using accessible natural language processing (NLP) tools, as a source of analyzed evidence to support public health research outputs and decision-making.
Objective The aim of this study was to explore the value of soft intelligence analyzed using NLP. As a case study, we selected and used a commercially available NLP platform to identify, collect, and interrogate a large collection of UK tweets relating to mental health during the COVID-19 pandemic.
Methods A search strategy comprising a list of terms related to mental health, COVID-19, and lockdown restrictions was developed to prospectively collate relevant tweets via Twitter’s advanced search application programming interface over a 24-week period. We deployed a readily and commercially available NLP platform to explore tweet frequency and sentiment across the United Kingdom and identify key topics of discussion. A series of keyword filters was used to clean the initial data retrieved and was also set up to track specific mental health problems. All collated tweets were anonymized.
Results We identified and analyzed 286,902 tweets posted from UK user accounts from July 23, 2020 to January 6, 2021. The average sentiment score was 50%, suggesting overall neutral sentiment across all tweets over the study period. Major fluctuations in volume (between 12,622 and 51,340) and sentiment (between 49% and 52%) appeared to coincide with key changes to local and/or national social distancing measures. Tweets around mental health were polarizing, discussed with both positive and negative sentiment. Key topics of consistent discussion over the study period included the impact of the pandemic on people’s mental health (both positively and negatively), fear and anxiety over lockdowns, and anger and mistrust toward the government.
Conclusions Using an NLP platform, we were able to rapidly mine and analyze emerging health-related insights from UK tweets into how the pandemic may be impacting people’s mental health and well-being. This type of real-time analyzed evidence could act as a useful intelligence source that agencies, local leaders, and health care decision makers can potentially draw from, particularly during a health crisis.


Introduction
COVID-19 was identified as a new type of coronavirus in early January 2020 [1]. Since then, the disease has rapidly spread to and affected almost all parts of the world. In the United Kingdom, the first outbreak was reported on January 31, 2020, with a national lockdown following on March 26, 2020. Shortly before this, COVID-19 was declared a global pandemic by the World Health Organization (WHO) on March 11, 2020 [2,3].
The COVID-19 pandemic continues to have a profound effect on mental health [4]. In a key position paper published in June 2020, the authors explored the current and future potential psychological, social, and neuroscientific effects of COVID-19 and set out a series of priorities and longer-term strategies for mental health research [4]. One of the immediate research priorities presented in the paper was "surveillance." In particular, the authors suggested that finding useful ways to monitor and analyze data on the mental health effects of the COVID-19 pandemic across the whole population, as well as vulnerable subgroups, was essential [4].
With over 300 million active monthly users, Twitter is one of the most popular social media platforms available. Twitter is a free microblogging service that enables its users to post, read, and respond to each other's "tweets" (ie, short messages limited to 280 characters). Social media data are being increasingly used as a data source to inform health-related research, with the potential for offering a more efficient means of data collection over traditional, time-consuming, and costly survey-based methods [5]. In particular, Twitter has been used to monitor, track trends, and disseminate health information during past viral pandemics [6-9]. Further, previous studies have successfully leveraged Twitter data for the assessment of public sentiments, attitudes, and opinions concerning health-related issues [10,11].
Channels of soft intelligence like Twitter, leveraged using novel artificial intelligence (AI) techniques (including natural language processing [NLP]), offer an opportunity for real-time analysis of public attitudes, sentiments, and key topics of discussion [12]. As noted above, previous case studies have shown that applying NLP can aid health researchers in gaining insights from large, unstructured data sets, such as Twitter. However, the true value of this type of work, including the data set itself, analysis methods, and how it might be integrated into more formal public health research outputs, is still uncertain. For example, much of the previous work has focused on the use of internally developed, bespoke tools or packages, which tend to require a certain level of technical expertise around machine learning (ML) in order to operate effectively. However, as methods continue to mature, we are seeing a growing number of "off-the-shelf" solutions become available, which appear to be more accessible and require less technical understanding of the underlying ML concepts.
The aim of this study was to further explore the value of soft intelligence as a meaningful source of evidence, which, when analyzed using an accessible NLP platform, can support public health research activity. In this article, we report the findings from a case study that examined a large collection of tweets relating to mental health posted from the United Kingdom during the COVID-19 pandemic.

Data Collection
An advanced AI-based text analytics platform using NLP was used to initially analyze the tweets. The analytics platform, "Wordnerds," is described by its developers as a "text analysis and insights platform using machine learning techniques" [13]. In particular, this off-the-shelf platform supports analysis of metadata, topic, and sentiment to understand the context of a tweet and to group tweets together into topic clusters that contain tweets relating to each other or discussing similar issues. This facilitates a more accurate and sophisticated insight into the mental health conversation on Twitter compared with methodologies that rely solely on a quantitative count of single words, phrases, or hashtags [14].
We developed a search strategy comprising a list of terms related to COVID-19, the lockdown, and mental health to search (or "scrape") for relevant tweets (see Textbox 1). Search terms were identified through discussion within the research team and scanning recent literature around mental health. Once the strategy had been agreed upon, it was reviewed by a topic expert and information specialist. We then began prospectively searching for and scraping relevant tweets using Twitter's advanced search application programming interface [15]. A geolocation filter was applied to the search strategy to limit the collection of tweets to those posted in the United Kingdom only.
In this article, we report the findings from our analysis of relevant tweets in the United Kingdom collected over a 24-week period, from July 23, 2020 to January 6, 2021.

Preparing and Cleaning the Data
All collated tweets were anonymized. Before analyzing the data, the retrieved results were run through a final keyword filter. This filter comprised a series of terms and keywords associated with mental health problems to help ensure a more relevant, cleaner, and less noisy data set for analysis. For example, general terms such as "isolation" and "well-being" were filtered out. Further, terms associated with eating disorders were added alongside the original terms.
The final keyword filter applied is summarized in Textbox 2.

Data Analysis
We used Wordnerds to interrogate the tweets. The developers state that their platform uses a range of different technologies in order to deliver its various analyses, including contextual word embeddings and collocation methods [13]. To date, we have found no other published studies coordinated by an academic research group that have used this specific tool.
Using the platform, we were able to track and determine the weekly frequency of tweets relevant to our initial search strategy from the United Kingdom between July 23, 2020 and January 6, 2021. We also tracked the frequency of subsets of these tweets that incorporated terms for specific mental health problems, as listed in Textbox 3.
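The weekly tracking described above can be illustrated with a minimal sketch. It assumes that study weeks are consecutive 7-day bins starting on July 23, 2020, which is consistent with the week labels used elsewhere in this article (eg, w/c September 17, 2020 as week 9) but is not a documented detail of the platform. The function names and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date

STUDY_START = date(2020, 7, 23)  # first day of week 1 (from the text)
STUDY_END = date(2021, 1, 6)     # last day of week 24 (from the text)


def study_week(posted_on: date) -> int:
    """Return the 1-based study week for a tweet's posting date.

    Assumption: weeks are consecutive 7-day bins counted from STUDY_START.
    """
    if not STUDY_START <= posted_on <= STUDY_END:
        raise ValueError(f"{posted_on} is outside the study period")
    return (posted_on - STUDY_START).days // 7 + 1


def weekly_volume(dates):
    """Count tweets per study week from an iterable of posting dates."""
    return Counter(study_week(d) for d in dates)


# Hypothetical posting dates, for illustration only.
sample = [date(2020, 7, 23), date(2020, 7, 29), date(2020, 9, 17), date(2021, 1, 6)]
print(weekly_volume(sample))
```

Under this binning, the 24-week study period covers exactly 168 days, and the split between the 2 topic analysis periods falls between October 14 and October 15, 2020.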
The NLP platform was then used to explore the sentiment (ie, positive, neutral, or negative) of the whole corpus of tweets. Sentiment was determined using contextual word embedding techniques, including classification of grammar to understand how words interact [16].
Textbox 3. Keyword filters used to identify specific mental health problems.
Anxiety: "anxious," "anxiety"
Depression: "depression," "depressed," "depressing"
Stress: "stress," "stressful," "stressed"
Loneliness: "loneliness," "lonely," "alone"
Following sentiment analysis, the platform's topic analysis feature was used to identify and cluster key emerging topics of discussion, both with positive and negative underlying sentiment. For this analysis, the platform automatically clustered key topics of positive and negative discussion using topic collocation methods. This is a probabilistic method of identifying interesting sentence fragments and words that occur frequently together within a data set. The results of the platform's topic analysis were examined, and its findings were summarized by 2 of the authors (KL and RG). These summaries were checked by 2 further authors (CM and GCW).
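To make the idea of collocation more concrete, the following is a minimal sketch of one common probabilistic approach: scoring adjacent word pairs by pointwise mutual information (PMI). This is a generic illustration only. The platform's actual collocation method is not publicly documented, and the tokenizer, the min_count threshold, and the toy tweets below are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_pmi(texts, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI compares how often two words appear together with how often
    they would be expected to co-occur if they were independent.
    """
    unigrams, bigrams = Counter(), Counter()
    for text in texts:
        tokens = tokenize(text)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unstable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented example tweets, not drawn from the study data.
toy_tweets = [
    "second lockdown is bad for my mental health",
    "my mental health improved working from home",
    "worried about a second lockdown",
    "working from home again",
]
for pair, score in bigram_pmi(toy_tweets):
    print(pair, round(score, 2))
```

In this toy corpus, pairs such as "mental health" and "second lockdown" pass the frequency threshold, whereas pairs seen only once are discarded. A production system would typically add stop word handling and longer n-grams.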
Due to the high volume of tweets collected, the topic analysis was split between 2 equal time periods. The first covered summer 2020 to autumn 2020, when lockdown restrictions were relaxed. The second covered the autumn to winter period in 2020, when regional and then further national lockdown restrictions were introduced.

Ethical Considerations
Institutional review board approval was not sought as this study used only publicly available data. All posts were de-identified, and there was no direct interaction with Twitter users.

Tweet Volume
We captured and collated 286,902 tweets posted by users in the United Kingdom from July 23, 2020 to January 6, 2021. The volume of tweets by week, together with key events taking place during this study period, is visualized in Figure 1. Further notable events or issues that occurred throughout the study period are summarized in Table 1 (weeks 1 to 12) and Table 2 (weeks 13 to 24).
As shown in Figure 1, the highest volume of tweets occurred week commencing (w/c) October 29, 2020, with 51,340 tweets. The lowest volume was observed w/c December 3, 2020, with 12,622 tweets. The data show a fairly consistent baseline trend over the study period. Spikes in the volume of tweets occurred in September, October, and December, typically during periods leading up to (or during) a major change in social distancing and lockdown measures across the United Kingdom.
The first peak was observed w/c September 17, 2020, the week after the introduction of the "rule of 6," whereby a mix of 6 people from any household could meet indoors or outdoors. A similar peak was observed w/c October 8, 2020. This was the week leading up to the government's introduction of a new tiered system, whereby regions across the United Kingdom were allocated to 1 of 3 tiers (and later a fourth tier) based on prevalence of COVID-19. Higher tiers corresponded with tighter restrictions, including closing nonessential businesses and limits placed on social gatherings.
The largest peak was observed w/c October 29, 2020, the week before the second national lockdown began. The final peak occurred w/c December 31, 2020, the week leading up to the start of a third national lockdown. For all 3 national lockdowns, all nonessential businesses were closed, and UK residents were restricted from meeting anyone outside of their "support bubble" (ie, their household or, for people living alone, themselves plus one other household).
Figure 1. Volume of tweets from July 23, 2020 to January 6, 2021. Key events during the study period included the (1) rule of 6 (up to 6 people from any number of households could meet indoors or outdoors), (2) tier system (regions across England were assigned a tier from 1 to 3 based on epidemiological indicators, and these tiers dictated the restrictions in that area, such as which businesses could open and how many individuals could meet in a group), and (3) national lockdown (nonessential businesses were closed and people were prohibited from meeting outside of their support bubble).
Table 3 summarizes the volume of tweets that utilized at least one of the terms related to anxiety, depression, stress, or loneliness. In total, 113,312 (39.50%) of the 286,902 tweets scraped through the initial search strategy related to anxiety, depression, stress, or loneliness. Figure 2 presents the volume of tweets for each keyword filter over the study period.
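As an illustration of how tweets might be assigned to the 4 keyword filters, the sketch below matches the Textbox 3 terms against tweet text. The terms are taken from Textbox 3, but the matching rule (case-insensitive, whole-word matching) is an assumption, as the platform's exact matching behavior is not described. The function names are hypothetical.

```python
import re

# Terms as listed in Textbox 3.
FILTERS = {
    "Anxiety": ["anxious", "anxiety"],
    "Depression": ["depression", "depressed", "depressing"],
    "Stress": ["stress", "stressful", "stressed"],
    "Loneliness": ["loneliness", "lonely", "alone"],
}

# Assumption: case-insensitive, whole-word matching.
PATTERNS = {
    name: re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    for name, terms in FILTERS.items()
}


def matched_filters(tweet):
    """Return the set of filter names whose terms appear in the tweet."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(tweet)}


def share_matching_any(tweets):
    """Percentage of tweets that match at least one filter."""
    if not tweets:
        return 0.0
    hits = sum(1 for tweet in tweets if matched_filters(tweet))
    return 100 * hits / len(tweets)


# The reported share can be checked directly from the published counts.
print(round(100 * 113_312 / 286_902, 2))
```

Because a single tweet can match more than one filter, per-filter counts are not expected to sum to the number of tweets matching at least one filter.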

Mental Health Problems
Across all of the mental health problems that were focused on here, the highest volume of tweets was observed w/c October 29, 2020, and the lowest volume of tweets occurred w/c December 3, 2020. The "Anxiety" filter returned the highest total number of tweets, and the "Loneliness" filter returned the lowest (see Table 3). The trend in tweet frequency for each filter mirrored the trend reported for the overall data set over the study period.
During the first half of the analysis period (w/c July 23, 2020 to w/c October 22, 2020), tweet volume between the 4 mental health problem filters varied. In particular, tweets related to the "Anxiety" filter were consistently posted most often, and tweets relating to the "Loneliness" filter were consistently posted least. There was a spike in volume across all 4 filters w/c October 29, 2020, the week before the second national lockdown began. Following the spike, the volume of tweets across all of the filters was broadly similar for the remainder of the analysis period.
Table 3. Keyword filters and resultant volume of tweets for specific mental health problems.

Sentiment Analysis
Of the 286,902 tweets, 34,347 (11.97%) were identified as having positive sentiment, 217,728 (75.89%) as having neutral sentiment, and 34,827 (12.14%) as having negative sentiment, with an overall sentiment score of 50% assigned. A score below 50% suggests negative sentiment, and a score greater than 50% suggests positive sentiment. Textbox 4 presents a selection of example tweets that the NLP platform classified as either positive or negative.
Here, the overall score of 50% indicates neutral sentiment across all tweets over the study period. Figure 3 visualizes the weekly change in sentiment over the study period from July 23, 2020 to January 6, 2021. Sentiment remained neutral or positive throughout most of the study period. The highest assigned sentiment score, 52%, occurred in weeks 1, 4 to 6, 16, and 18 to 20. The lowest assigned sentiment score, 49%, occurred in week 9 (w/c September 17, 2020).
Overall, the data show a relatively consistent trend in sentiment over the study period. When sentiment fluctuation did occur, it was similar to the trend observed with tweet frequency and coincided with major changes to lockdown or social distancing rules.
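The class shares reported in this section can be reproduced from the published counts, as sketched below. The sketch also includes a purely hypothetical net sentiment index on a 0 to 100 scale, in which 50 represents an even balance of positive and negative tweets. This index is an assumption introduced for illustration. The platform's actual scoring formula is not publicly documented and may differ.

```python
# Counts as reported in this section.
COUNTS = {"positive": 34_347, "neutral": 217_728, "negative": 34_827}


def class_shares(counts):
    """Percentage of tweets in each sentiment class, to 2 decimal places."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


def net_sentiment_index(counts):
    """HYPOTHETICAL 0-100 index, not the platform's formula.

    50 means positive and negative tweets are balanced; values above 50
    mean more positive than negative tweets, and values below 50 the reverse.
    """
    total = sum(counts.values())
    return 50 + 50 * (counts["positive"] - counts["negative"]) / total


print(class_shares(COUNTS))
print(round(net_sentiment_index(COUNTS)))
```

With almost equal numbers of positive and negative tweets, any index of this balanced form would sit close to its midpoint, which is in keeping with the neutral overall score reported above.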

Results of the Topic Analysis for Weeks 1 to 12
This section presents the results of a topic analysis based on 115,700 scraped tweets posted from July 23, 2020 to October 14, 2020.

Summary of Clustered Topics With Positive Sentiment
"Mental health" emerged as a key topic of discussion underpinned with positive sentiment. The importance of mental health throughout the "coronavirus pandemic," as a critical health issue, was shared widely by people on Twitter. During this period, people openly discussed their mental health and how they had been coping. People also shared praise for specific local and national mental health services, as well as key public figures (eg, Marcus Rashford).
There was considerable discussion based around "World Mental Health Day" and "World Suicide Prevention Day." People were sharing helpful strategies (eg, videos, charities, help lines, exercise regimes, healthy eating advice) others could use to protect and maintain their mental health. There were also calls from people to be particularly vigilant and make sure they are checking in with any "vulnerable people" in their life.
Positive discussion was observed around "working from home." Various users were sharing helpful resources to support working from home effectively, including strategies that people had found useful during the previous national lockdown. Some people reported that mandatory working from home had helped them to achieve a better work-life balance and reduced their anxiety.
Specific examples of mental health tweets underpinned with positive sentiment from weeks 1 to 12 are presented in Textbox 5.

Summary of Clustered Topics With Negative Sentiment
"Second lockdown" emerged as a key topic of discussion underpinned with negative sentiment. People were sharing their fears, concerns, and anxieties over the prospect of a second national lockdown and the impact this would have on theirs (and other's) mental health. People recalled and spoke openly about how their mental health had suffered during the previous lockdown, referencing specific problems such as "anxiety," "depression," and posttraumatic stress disorder. Some people shared that they had been diagnosed with depression for the first time due to the previous national lockdown.
Many people were angry that not enough had been done by the government to protect "vulnerable people" during the previous national lockdown. Suicide was also discussed. People were claiming that suicide rates had increased during lockdown, particularly among younger people. There was widespread sharing of warnings from key educational figures that the pandemic would have long-lasting negative effects on "children." Further, some argued that people were using "mental health" as an excuse to avoid further lockdown restrictions. Many people were concerned that a second lockdown would be much worse for people's mental health than the first (the lockdown coinciding with winter and students returning to university were both seen as contributing negative factors). Those who refused to wear masks were a further source of anxiety for some people, with a high proportion of tweets calling on others to "wear a mask." People also discussed COVID-19 tests during this period. In particular, some people shared how stressful and anxiety-inducing taking the test, and also waiting for the results, can be.
Tweets contained within some of the other topic clusters generated by the platform, such as "care homes" and "covid deaths," did not appear to be related to mental health.
Specific examples of mental health tweets underpinned with negative sentiment from weeks 1 to 12 are presented in Textbox 6.
Textbox 6. Sample of clustered tweets from weeks 1 to 12 with negative sentiment.

Results of the Topic Analysis for Weeks 13 to 24
This section presents the results of a topic analysis of 171,202 scraped tweets posted from October 15, 2020 to January 6, 2021. Tables 4 and 5 present the top 10 most discussed topics that occurred throughout the study period.

Summary of Clustered Topics With Positive Sentiment
During this period, a tweet suggesting suicide rates had risen by 200% since lockdown was shared widely. The tweet contained contact details for a registered UK charity, Samaritans, urging people to reach out for support if needed. This viral tweet resulted in our analysis platform recognizing "suicide lockdown" as a key topic of discussion. The information being reported by this tweet was not accurate [17,18].
As with the previous 12 weeks, "mental health" remained a key topic of discussion underpinned with positive sentiment during this period. People discussed how the lockdown had, in some ways, had a positive impact on their mental health. Various people and organizations continued to share practical tips on how to support one's mental health, particularly around strategies to reduce "stress and anxiety." Many people continued to express concern for the mental health of perceived "vulnerable people" during lockdown, including children, young people, disabled people, those with learning difficulties, and those with any pre-existing mental health problems. Some people were calling on the government to provide further support for these groups as lockdown restrictions tightened. Many people were encouraging those who were struggling to stay connected with others and reach out to "ones you love" and "friends and family."
"Sleep" emerged as a key topic of discussion with positive sentiment. Many people discussed how important getting a good "night's sleep" was for their mental health, particularly during these "difficult times." People shared relaxation techniques they had used, which had helped them to fall asleep, and which may help others too.
Specific examples of mental health tweets underpinned with positive sentiment from weeks 13 to 24 are presented in Textbox 7.

Textbox 7. Sample of clustered tweets from weeks 13 to 24 with positive sentiment.
My mental health is sooo much better now it's a real lockdown again. However I know this isn't a universal experience. For those who find lockdown harder for whatever reason, I'm a) sending virtual hugs if wanted but b) reminding you it's STRONG to reach out to a helpline!
As we start lockdown 3, a reminder that your situation does not have to be the worst for it to suck and for you to get help! Reach out to your loved ones & professional mental health support if you need it. Stay safe.
Such a tough time at the moment for everyone, another lockdown especially in winter can be devastating for mental health. My dm's are always open for anyone who needs a chat - be kind and check on your loved ones.
Thanks to local organisations, community groups and faith institutions that have provided vital services, human support and companionship, in person and online, to many vulnerable people during #Covid-19. We will keep working with you all to build stronger and united communities.
If anyone is struggling with lockdown (or wants to reduce stress/anxiety). I would really recommend trying gratitude journaling. I've shared some tips below - I hope they'll be helpful
#COVID19 wellbeing tip: make sure you get a good night's sleep! A good rest is so important for your mental and physical health, managing stress and much more. If you're struggling with sleep, try these tips and check out our Sleep self-help guide.

Summary of Clustered Topics With Negative Sentiment
"Mental health" continued to be widely discussed during this final 12-week period. Many people reflected on the negative impact that lockdowns had had on their mental health. There were concerns from some that any progress they had made with their mental health would be lost with another lockdown. There was continued anger toward the UK government about a perceived lack of support for those struggling with their mental health.
There was a lot of discussion around the looming national lockdown announced for January 4, 2021. Many people expressed concern about how long this lockdown would last and their hope that this would be the final lockdown. Some shared that they would be defying restrictions in order to prioritize their mental health. Others continued to argue that people were using their mental health as an excuse for not following the rules. There was continued worry about how the lockdown would affect perceived "vulnerable people." During this period, people shared their thoughts about the vaccine rollout. In particular, people were concerned about the length of time between jabs and the number of canceled vaccination appointments being reported by the media.
"Sleep" continued to be a key topic of discussion with many people sharing how they had not been sleeping well. Some people shared how they had been increasing their alcohol intake in an effort to help them sleep.
Specific examples of mental health tweets underpinned with negative sentiment from weeks 13 to 24 are presented in Textbox 8.

Principal Findings
In this study, we identified and analyzed 286,902 geolocated tweets posted from users in the United Kingdom from July 23, 2020 to January 6, 2021 using a commercially available NLP platform. The findings showed that there was a fairly consistent trend in the volume of tweets over the study period, with spikes typically occurring during (or leading up to) a major change in social distancing measures in the United Kingdom. The NLP platform calculated an overall sentiment score of 50% indicating neutral sentiment across all tweets over the study period. Similar to volume, major fluctuations in sentiment appeared to coincide with major changes to lockdown rules.
Key topics of discussion that emerged consistently throughout the study period included (1) the impact that the pandemic and resulting lockdowns had been having on people's mental health, both positive and negative; (2) fear and anxiety around the prospect of prolonged and subsequent lockdowns and how this might (or continue to) affect people's mental health; and (3) anger and mistrust toward the government concerning a perceived lack of support for people struggling with their mental health.
Later in (and less consistently discussed throughout) the study period, other topics linked with mental health emerged, including sleep difficulties, increased alcohol intake, and anxieties concerning testing and the vaccine rollout.
Before the study, we anticipated that topics of discussion relating to mental health would be mostly underpinned with negative sentiment. It was therefore surprising that the findings of the topic analysis revealed higher levels of positive sentiment across posts associated with mental health. Consistently over the study period, people took to Twitter to share practical tips, strategies, and resources that could be used to support one's mental health, and the platform was effective in clustering these types of posts with positive sentiment.
The viral spread of misinformation and "fake news" has represented a critical issue generating mass confusion, fear, and insecurity surrounding COVID-19 [19]. The WHO has repeatedly used the term "infodemic" to describe the sheer overabundance of misinformation being shared throughout the pandemic [20]. Our findings provide further example and insight into the rapid spread of health-related misinformation, particularly via channels of soft intelligence like social media. Specifically, in the case of this study, a copy-and-paste tweet campaign falsely claiming that suicide rates had increased by 200% since the first lockdown was shared widely by users. This particular cluster of tweets ranked as the topmost discussion topic during weeks 13 to 24 over the study period.
Overall, the results of this study demonstrate that using NLP to mine and analyze sources of soft intelligence (like Twitter) can yield useful health-related insights, which agencies, local leaders, and health care decision makers can potentially draw from. These findings contribute to a growing body of literature examining the value of this type of analyzed evidence and how it might support, link to, and (where appropriate) replace more traditional survey-based methods and data [15,21-24].

Limitations
Several limitations can be attributed to this study. First, there are still considerable limitations concerning the reliability, accuracy, and transparency of the technologies in play. As an example, on examining the results of the NLP platform's topic analysis, some of the tweets collated were not relevant to mental health (despite being identified as such). For example, tweets contained within clustered topics like "care homes" and "covid deaths" were expressing anger at the government, rather than negatively discussing mental health problems.
Some of the tweets included by the platform in its analysis were posted by businesses or charitable organizations, rather than members of the public. Such tweets, which often advertised local or national mental health services or shared self-improvement strategies, were typically classified by the platform as having positive sentiment. This created a large amount of background noise and skewed the overall sentiment toward positive. Further, tweets were not deduplicated by the platform, nor was there any formal analysis accounting for potential bot traffic. These factors will also have impacted the results.
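As a minimal sketch of how the deduplication limitation noted above might be addressed in future work, near-identical copy-and-paste tweets could be collapsed by comparing a normalized form of each tweet. The normalization rules below (lowercasing and removal of retweet prefixes, links, and user mentions) are assumptions chosen for illustration. They were not applied in this study, and the example tweets are invented.

```python
import re


def normalize(tweet):
    """Reduce a tweet to a comparison key (assumed rules, for illustration)."""
    text = tweet.lower().strip()
    text = re.sub(r"^rt\s+@\w+:?\s*", "", text)  # drop a leading retweet prefix
    text = re.sub(r"https?://\S+", "", text)     # drop links
    text = re.sub(r"@\w+", "", text)             # drop user mentions
    return re.sub(r"\s+", " ", text).strip()     # collapse whitespace


def deduplicate(tweets):
    """Keep the first tweet seen for each normalized key, preserving order."""
    seen = set()
    unique = []
    for tweet in tweets:
        key = normalize(tweet)
        if key and key not in seen:
            seen.add(key)
            unique.append(tweet)
    return unique


# Invented example tweets, not drawn from the study data.
examples = [
    "Reach out if you need support https://t.co/abc",
    "reach out if you need SUPPORT",
    "RT @someone: Reach out if you need support",
    "Different message",
]
print(deduplicate(examples))
```

Exact matching on a normalized key will miss lightly edited copies. Fuzzy matching would catch more of these, at the cost of occasionally merging distinct tweets.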
In addition, despite the popularity of Twitter as a social networking tool, its users are not an accurate representation of the overall demographic of a population. Therefore, if we are to consider using Twitter (and similar resources) as a potential intelligence source, we must be mindful of bias concerning the key demographic information among its users (such as age, gender, and socioeconomic status).
There were also a number of limitations specific to the NLP platform that we selected. We had originally planned to run the topic analysis using the platform across all of the tweets as a single corpus. However, the platform was not able to process and analyze such a large volume of tweets in one go. Therefore, we had to split and run the topic analysis over 2 time periods (weeks 1 to 12 and weeks 13 to 24). Further, it was not possible to retrospectively search for and collect historic tweets, thus restricting possible options for analysis. Finally, although the broader methodologies that power the platform are touched on by its developers, the finer technical detail is not shared publicly for commercial reasons.

Conclusions
In this work, we analyzed a large collection of UK tweets relating to mental health during the COVID-19 pandemic to further explore the value of soft intelligence leveraged using NLP. Using a specialist, off-the-shelf, NLP platform, we collated a large corpus of tweets over a 24-week period and carried out various analyses to explore the volume, sentiment, and key trends and topics of discussion.
Our findings provide further evidence that this type of research is potentially a highly useful and efficient means to gain a rapid understanding of the key messages, concerns, and issues people are facing at scale. In the case of this reported study, we were able to draw insights into how the pandemic may be impacting people's mental health and well-being by examining both the topic and sentiment specific to the UK population. This type of real-time analysis and intelligence may be particularly useful in helping shape rapid and reactive public health engagement and communication strategies during a health crisis like COVID-19.