Published on in Vol 4, No 1 (2018): Jan-Jun

Preprints (earlier versions) of this paper are available at, first published .
Use of Social Media in the Assessment of Relative Effectiveness: Explorative Review With Examples From Oncology

Original Paper

1National Health Care Institute, Diemen, Netherlands

2Department of Pharmacoepidemiology and Clinical Pharmacology, Utrecht University, Utrecht, Netherlands

3Department of Health Sciences, VU University Amsterdam, Amsterdam, Netherlands

Corresponding Author:

Rachel RJ Kalf, MSc

National Health Care Institute

Eekholt 4

Diemen, 1112 XH


Phone: 31 20797 ext 8188

Fax: 31 207978500


Background: One element of health technology assessment is assessing the clinical effectiveness of drugs, generally called relative effectiveness assessment. Because little real-world evidence is available directly after market access, randomized controlled trials are used to obtain information for relative effectiveness assessment. However, there is growing interest in using real-world data for relative effectiveness assessment. Social media may provide a source of real-world data.

Objective: We assessed the extent to which social media-generated health data have provided insights for relative effectiveness assessment.

Methods: An explorative literature review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines to identify examples in oncology where health data were collected using social media. Scientific and grey literature published between January 2010 and June 2016 was identified by four reviewers, who independently screened studies for eligibility and extracted data. A descriptive qualitative analysis was performed.

Results: Of 1032 articles identified, eight were included: four articles identified adverse events in response to cancer treatment, three articles disseminated quality of life surveys, and one study assessed the occurrence of disease-specific symptoms. The articles highlighted several strengths of social media-generated health data, such as efficient collection of patient experiences and recruitment of patients with rare diseases. Conversely, limitations included difficulty validating authenticity and the presence of information and selection bias.

Conclusions: Social media may provide a potential source of real-world data for relative effectiveness assessment, particularly on aspects such as adverse events, symptom occurrence, quality of life, and adherence behavior. This potential has not yet been fully realized and the degree of usefulness for relative effectiveness assessment should be further explored.

JMIR Cancer 2018;4(1):e11



Within the context of rising health care costs, limited budgets, and the onslaught of innovative yet expensive medications, health technology assessment (HTA) is becoming increasingly important for decision-makers, regulators, pharmaceutical companies, and patients. HTA is defined as “the systematic evaluation of the properties and effects of a health technology” [1]. Health technologies are defined as “interventions developed to prevent, diagnose or treat medical conditions, promote health, provide rehabilitation, or organize health care delivery” [2]. An important element of HTA is relative effectiveness, ie, the extent to which an intervention, provided under routine clinical conditions, does more good than harm in comparison with one or more alternatives [1]. Traditionally, a relative effectiveness assessment (REA) conducted directly after market authorization of a new drug relies on health outcomes (eg, mortality) extrapolated from randomized controlled trials (RCTs), which are often considered the gold standard for this type of analysis. However, the tightly controlled conditions and highly selective patient groups within RCTs may yield findings that are not generalizable to routine clinical settings, where patients are more heterogeneous. In routine practice, pregnant women, children, elderly people, and patients with comorbidities may eventually receive the new drugs examined in RCTs, even though these populations are generally excluded from such trials. Therefore, researchers may resort to real-world data (RWD) as a supplementary source of evidence to assess relative effectiveness. RWD can be defined as “an umbrella term for data regarding the effects of health interventions that are not collected in the context of conventional randomized controlled trials” [1]. Patient registries and electronic health records are established sources of RWD; social media may be another.

Social media are often used by patients to search for information on their health conditions, share their experiences, and find social support [3,4]. For example, many patients use Twitter to stay up to date with the latest health care developments and to increase their knowledge of their disease, whereas Facebook is more often used for social support and exchanging experiences [3]. Social media users who have a chronic condition are more likely to use the internet for such purposes than healthy social media users [5]. By assessing the content that patients view, generate, and exchange through social media, a considerable amount of information on patient perspectives and experiences can be gathered. Although social media have been used for different aspects of research, such as patient recruitment [6-8], dissemination of interventions [9,10], and education [11], little is known about their contribution to REA.

In 2008, a study showed that blogs could be used to collect patient experiences regarding diabetes and diabetes management, providing information for HTA that enhanced the evidence available in published literature [12]. More recently, several pharmaceutical companies have begun to use social media to gain insight into patient perspectives on adverse events (AEs) [13,14] and into patients' treatment-switching behaviors [15]. Similarly, the Association of the British Pharmaceutical Industry (ABPI) has published guidelines on best practices for monitoring and managing AEs reported through such sources [16]. Moreover, the Food and Drug Administration (FDA) is increasingly focusing on health data from social media, collaborating with PatientsLikeMe, a platform where patients share their health data online, to gain insight into patient perspectives on AEs [17,18]. Given these initiatives, health data reported by patients on social media may be able to contribute to the REA of new therapies.

The aim of this article is to assess the extent to which health data generated from social media have provided insights for REA. We conducted an explorative review to identify examples in oncology where health data were collected using social media. Oncology was chosen because of the considerable number of innovative drugs being developed at a rapid pace in this area. For example, the European Medicines Agency reported in 2015 that one-third of the medicines with a new active substance recommended for market access were for cancer treatment [19]. As mentioned earlier, REAs of drugs are traditionally based on health outcomes such as overall survival and progression-free survival. However, given the often marginal differences in overall survival and progression-free survival between oncological drugs, information on AEs, adherence, and quality of life is becoming increasingly important in REA [20]. Because these aspects can be difficult to capture in RCTs, other data sources such as social media may be useful. For the purposes of this explorative review, social media were defined as “a group of Internet-based applications that allow the creation and exchange of user-generated content” [21].

An explorative review was performed based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines [22]. To identify scientific literature, a search for peer-reviewed published articles was carried out in MEDLINE through the PubMed interface for the period between 1 January 2010 and 28 June 2016. The following search query was used: (Facebook[tiab] OR Twitter[tiab] OR blog[tiab] OR blogging[mesh] OR “social media”[tiab] OR ehealth[tiab] OR e-health[tiab] OR “online community”[tiab] OR “online communities”[tiab] OR “online patient”[tiab] OR “health data”[tiab] OR (online[tiab] AND research[tiab] AND platform*[tiab]) OR (personal*[tiab] AND health[tiab] AND record*[tiab]) OR (online[tiab] AND patient[tiab] AND communit*[tiab]) OR (online[tiab] AND data[tiab] AND shar*[tiab])) AND (oncolog*[tiab] OR cancer[tiab] OR carcinoma[tiab] OR metast*[tiab] OR neoplasms[mesh] OR melanoma[tiab] OR tumor[tiab] OR tumour[tiab]). The reference lists of articles included on the basis of title and abstract were hand-searched to identify additional literature. To extend the literature search, the top four health informatics journals according to the SCImago Journal and Country Rank [23] were also searched, namely GigaScience, BMC Medical Research Methodology, Open Bioinformatics Journal, and Journal of Medical Internet Research. The websites of these journals were hand-searched by assessing theme issues and by using the following keywords: “oncology, cancer, carcinoma, metastasis, neoplasm, tumor, tumour, blog, blogging, social media, e-health, online or health data”.
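The search query above combines two OR-groups with AND: a record must match at least one social media term and at least one cancer term. The minimal Python sketch below rebuilds the query string from the two term lists to make that structure explicit. `or_block` and `build_query` are hypothetical helpers written for illustration; the sketch does not apply the publication-date limits and does not submit anything to PubMed.

```python
# Term lists copied from the search query reported above.
SOCIAL_MEDIA_TERMS = [
    "Facebook[tiab]", "Twitter[tiab]", "blog[tiab]", "blogging[mesh]",
    '"social media"[tiab]', "ehealth[tiab]", "e-health[tiab]",
    '"online community"[tiab]', '"online communities"[tiab]',
    '"online patient"[tiab]', '"health data"[tiab]',
    "(online[tiab] AND research[tiab] AND platform*[tiab])",
    "(personal*[tiab] AND health[tiab] AND record*[tiab])",
    "(online[tiab] AND patient[tiab] AND communit*[tiab])",
    "(online[tiab] AND data[tiab] AND shar*[tiab])",
]
CANCER_TERMS = [
    "oncolog*[tiab]", "cancer[tiab]", "carcinoma[tiab]", "metast*[tiab]",
    "neoplasms[mesh]", "melanoma[tiab]", "tumor[tiab]", "tumour[tiab]",
]


def or_block(terms):
    """Join alternative terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*concept_groups):
    """Require every concept group to match by joining the OR-blocks with AND."""
    return " AND ".join(or_block(group) for group in concept_groups)


query = build_query(SOCIAL_MEDIA_TERMS, CANCER_TERMS)
print(query)
```

Keeping the term lists as data, rather than as one hand-edited string, makes it easier to document and rerun exactly the same query.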

A Google search was conducted in July and August 2016 to identify grey literature, such as relevant websites, by combining the following keywords: “social media”, “online patient”, “online research platform”, “relative effectiveness”, “health research”, “effectiveness research”, “pharmacovigilance”, “adherence”, and “to measure quality of life”. Before each search, the history of the browser was cleared to ensure findings would not be influenced by previous search queries. Due to the vast number of websites retrieved through the Google search, only websites that collect health data online, focus on patient-reported outcomes, or provide online information on drugs and conditions were deemed relevant for further analysis. The selection of relevant websites was also based on consensus between the authors RK and RtH. These websites were hand-searched to identify grey literature by browsing through the website in search of relevant reports or documents and by using the following keywords: “social media”, “internet”, “Facebook”, “Twitter”, “pharmacovigilance” or “health research”. These keywords were different from those used for the Google search due to the character of the platform (ie, a Google search is inherently different from searching a website). The following websites were included: PatientsLikeMe, Microsoft HealthVault, Dossio, CureTogether, WhatNext, MyGly, Drug Information Association, WEB-RADR, National Patient-Centered Clinical Research Network, College ter Beoordeling van Geneesmiddelen, Handle My Health, European Alliance for Personalized Medicine, Lareb, WHO Monitoring Centre for Pharmacovigilance Uppsala, PEW Research Center, Social Media Research Foundation, Treato, MediGuard,, and iVitality.

The review was conducted by four reviewers (RK, AM, RtH, and KM), who independently screened the retrieved literature for eligibility. Titles and abstracts of scientific literature were assessed by RK, AM, and KM, while grey literature was assessed by RK and RtH. Literature was considered eligible for inclusion when it 1) was published between 1 January 2010 and 28 June 2016, 2) was available in English, 3) provided examples in which social media were used to collect health data, 4) focused on cancer or cancer treatment, and 5) was either a peer-reviewed original research article or a report available in the public domain. Literature that did not meet all inclusion criteria was excluded. Relevant full articles and reports were retrieved and reviewed for inclusion.

Two reviewers (RK and AM) independently extracted data from all included articles and reports using a predefined data abstraction form. Information on study characteristics (eg, study design, study period, type of social media used), and the strengths, limitations and acceptability of using social media to generate health data were extracted. Disagreements in data extracted were resolved by consensus amongst RK and AM.

A descriptive qualitative analysis of the extracted data was carried out, since the topics, methods and outcomes of included literature were notably diverse.

A total of 2351 citations were identified from scientific literature (n=879), a hand search of reference lists from scientific literature (n=56), grey literature (n=97), and a hand search of health informatics journals (n=1319). Of these, 2290 citations were excluded based on title or abstract, and a further 26 duplicates were removed. Of the remaining 35 full scientific publications and documents assessed, 27 were excluded: 15 did not provide an example of health data collection, 9 were not oncology-specific, and 3 provided insufficient information on the collection of health data. Data were abstracted from the remaining 8 scientific publications (Figure 1).
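The screening flow can be checked step by step. The short Python sketch below only re-adds the counts reported in the preceding paragraph; it introduces no data beyond those numbers, and the dictionary labels are paraphrases chosen for illustration.

```python
# Record counts reported in the Results section.
identified_by_source = {
    "scientific literature (MEDLINE/PubMed)": 879,
    "hand search of reference lists": 56,
    "grey literature": 97,
    "hand search of health informatics journals": 1319,
}
excluded_on_title_or_abstract = 2290
duplicates = 26
excluded_at_full_text = {
    "no example of health data collection": 15,
    "not oncology-specific": 9,
    "insufficient information on health data collection": 3,
}

# Each stage subtracts the exclusions reported for it.
identified = sum(identified_by_source.values())
full_text_assessed = identified - excluded_on_title_or_abstract - duplicates
included = full_text_assessed - sum(excluded_at_full_text.values())

print(identified, full_text_assessed, included)  # 2351 35 8
```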

Table 1 provides an overview of the eight included scientific publications, which covered different types of cancer and medications. All eight articles focused on testing the feasibility and added value of generating health data from social media, such as data on AEs, quality of life, adherence, symptom occurrence, and patient experiences.

Table 2 shows that the publications differed substantially in study design, study period, number of posts analyzed, and number of respondents included in the analysis. Four papers assessed forum topics and discussions, two posted a survey on the Facebook page of a patient community or support group, one assessed Twitter conversations, and one used an online patient platform to disseminate a survey. Four of the eight studies collected health data on AEs [24,25,28,30]: three presented the AEs identified on the included forums [24,28,30], while the fourth compared AEs mentioned online with AEs reported to the FDA [25]. Another three studies collected health data on quality of life (QoL) [26,27,31], each using different QoL instruments, such as the Concerns About Recurrence Scale [31] and the Short Form-36 Health Survey [26]. Finally, one study focused on identifying symptom occurrence and co-occurrence [29]. In addition to the main outcome measures, van der Heijden et al, McCarrier et al, and Zaid et al [26,27,31] collected data on sociodemographic factors and disease-specific characteristics. Furthermore, Beusterien et al collected health data on physical functioning and emotional impacts [24], and Mao et al collected information on adherence by mapping decisions to continue or stop treatment [28].
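Symptom co-occurrence, as studied by Marshall et al [29], can be illustrated with a small sketch that counts how often each pair of symptoms is mentioned in the same post. The posts and symptom labels below are invented, and the Jaccard index used here as a similarity measure is an assumption for illustration; it is not necessarily the similarity index used in [29].

```python
from collections import Counter
from itertools import combinations

# Invented example: the set of preselected symptoms detected in each forum post.
posts = [
    {"fatigue", "insomnia", "hot flush"},
    {"fatigue", "joint pain"},
    {"fatigue", "insomnia"},
    {"joint pain"},
]

# Number of posts mentioning each symptom.
occurrence = Counter(symptom for post in posts for symptom in post)

# Number of posts mentioning each unordered pair of symptoms (pairs stored sorted).
co_occurrence = Counter(
    pair for post in posts for pair in combinations(sorted(post), 2)
)


def jaccard(a, b):
    """Posts mentioning both symptoms divided by posts mentioning either one."""
    both = co_occurrence[tuple(sorted((a, b)))]
    either = occurrence[a] + occurrence[b] - both
    return both / either if either else 0.0


print(round(jaccard("fatigue", "insomnia"), 3))  # 0.667
```

Working with post-level sets means a symptom repeated within one post is counted once, which is one simple way to keep long posts from dominating the counts.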

The four publications that used forums to collect health data varied substantially in how they explained their choice of forum (Table 3). For example, Beusterien et al searched for forums with two search engines on two different computers, repeating the search every other day for two weeks, and applied selection criteria to include two forums (ie, site active >5 years, >12,000 posts on the forum, >20 individuals currently browsing, and >10 new posts per day) [24]. In contrast, Marshall et al selected one forum without stating selection criteria [29]. The other four publications, which used Twitter, Facebook, or an online patient platform, chose their platform for access to a large volume of health data [25] or to a patient community [26,27,31].
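The forum selection criteria reported by Beusterien et al [24] amount to a conjunction of four thresholds. A minimal sketch, assuming hypothetical field names and invented forum statistics, shows how such criteria can be written down as an explicit, repeatable rule:

```python
from dataclasses import dataclass


@dataclass
class ForumStats:
    """Hypothetical summary statistics for a candidate forum."""
    name: str
    years_active: float
    total_posts: int
    users_browsing_now: int
    new_posts_per_day: float


def meets_selection_criteria(forum):
    # All four thresholds reported by Beusterien et al [24] must be exceeded.
    return (
        forum.years_active > 5
        and forum.total_posts > 12_000
        and forum.users_browsing_now > 20
        and forum.new_posts_per_day > 10
    )


# Invented candidates for illustration only.
candidates = [
    ForumStats("forum_a", 8, 40_000, 35, 25),
    ForumStats("forum_b", 3, 50_000, 60, 40),  # fails: active for only 3 years
    ForumStats("forum_c", 6, 9_000, 25, 12),   # fails: too few posts
]

selected = [f.name for f in candidates if meets_selection_criteria(f)]
print(selected)  # ['forum_a']
```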

Regarding automated collection of health data from social media, two publications reported using a web crawler [28,29], and one used the Twitter application programming interface [25]. Two publications reported collecting all forum posts related to their search terms without specifying the collection method [24,30], and three used the social media platform to distribute a survey [26,27,31]. Freifeld et al, Mao et al, and Marshall et al used automated techniques to analyze the collected health data [25,28,29]. Freifeld et al used a tree-based dictionary-matching algorithm to identify specific text in the collected posts and a semi-automated Natural Language Processing (NLP) classifier to identify AEs [25]. Mao et al also used NLP to identify AEs [28], and Marshall et al used NLP within a data mining algorithm to identify symptoms [29]. The remaining five publications used content analysis [24,27], descriptive or quantitative analysis (eg, chi-squared test) [26,31], or manual labelling of forum posts [30].
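Freifeld et al [25] describe their dictionary matching only as tree-based. One common tree structure for this task is a token-level trie (prefix tree) of dictionary phrases, scanned with longest-match lookup. The sketch below shows that generic technique under this assumption, with a tiny invented lexicon mapping lay phrases to AE concepts; it should not be read as the algorithm used in [25].

```python
import re

# Marker key for "a dictionary phrase ends here"; tokens contain only letters
# and apostrophes, so they can never collide with it.
CONCEPT = "$concept"


def build_trie(dictionary):
    """Store each (possibly multi-word) phrase token by token."""
    root = {}
    for phrase, concept in dictionary.items():
        node = root
        for token in phrase.lower().split():
            node = node.setdefault(token, {})
        node[CONCEPT] = concept
    return root


def find_concepts(text, trie):
    """Scan left to right, keeping the longest dictionary phrase starting at each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    found, i = [], 0
    while i < len(tokens):
        node, j, longest = trie, i, None
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if CONCEPT in node:
                longest = (node[CONCEPT], j)
        if longest:
            found.append(longest[0])
            i = longest[1]  # resume after the matched phrase
        else:
            i += 1
    return found


# Invented lexicon for illustration only.
lexicon = {
    "joint pain": "arthralgia",
    "achy joints": "arthralgia",
    "hair loss": "alopecia",
    "pain": "pain",
}
trie = build_trie(lexicon)
post = "Three months on letrozole: joint pain every morning and some hair loss."
print(find_concepts(post, trie))  # ['arthralgia', 'alopecia']
```

Longest-match scanning is why "joint pain" maps to arthralgia instead of also producing a separate match for "pain".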

Figure 1. Flowchart of the literature review process.
Table 1. Overview of included scientific publications.
Study | Aim | Cancer type | Drug
Beusterien et al 2013 [24] | To better understand patient experience with colorectal cancer chemotherapies in the real-world setting | Colorectal cancer | Chemotherapeutic agents
Freifeld et al 2014 [25] | To evaluate the level of concordance between Twitter posts mentioning AE(a)-like reactions and spontaneous reports received by a regulatory agency | N/A(b) | Methotrexate(c)
van der Heijden et al 2016 [26] | To investigate whether we could use crowdsourcing via Facebook and online surveys for medical research purposes on pigmented villonodular synovitis | Pigmented villonodular synovitis | N/A
McCarrier et al 2016 [27] | To explore the feasibility of using social media-based patient networks to gather qualitative data on patient-reported outcome concepts relevant to chronic lymphocytic leukaemia | Chronic lymphocytic leukaemia | N/A
Mao et al 2013 [28] | To understand frequency and content of AEs and associated adherence behaviors discussed by breast cancer patients related to using aromatase inhibitors | Breast cancer | Aromatase inhibitors
Marshall et al 2015 [29] | To identify and examine symptom patterns generated by data extracted from a breast cancer forum, and compare these findings to an analysis of symptoms reported by breast cancer survivors enrolled in a research study and who responded to a symptom checklist | Breast cancer | N/A
Pages et al 2014 [30] | To describe the characteristics of AEs reported by patients exposed to oral antineoplastic agents in an online discussion, and compare these with those reported by health professionals as recorded in the French pharmacovigilance database | Cancer | Oral antineoplastic agents
Zaid et al 2014 [31] | To determine the feasibility of using social media to perform cross-sectional epidemiologic and quality of life research on patients with rare gynaecologic tumours | Neuroendocrine carcinoma of the cervix | N/A

(a) AE: adverse events.

(b) N/A: not applicable.

(c) This study assessed adverse events reported in social media for a total of 23 drugs and 4 vaccines, including 1 drug (methotrexate) specific for oncology.

Table 4 presents the strengths and limitations of social media-generated health data identified in the eight included publications. Five publications identified the ability to assess patient perspectives as an important strength [24,25,28-30]. Five publications considered access to patients who have rare diseases or are distributed over wide geographic areas a major strength [26-29,31]. Furthermore, Freifeld et al, Marshall et al, and Pages et al emphasized that social media should complement conventional (pharmacovigilance) methods, since results from social media may differ from those of conventional methods [25,29,30]. For example, patients were shown to report different AEs than health professionals, who traditionally provide this information [30]. Other strengths included efficient collection of patient-reported outcomes [24], the short time needed to survey patients [29,31], and the identification of new or unlabelled AEs [30].

Limitations of social media-generated health data mainly concerned validating authenticity, selection bias, information bias, and the inability to actively probe patients for responses. Validating authenticity refers to the difficulty of verifying the accuracy of information provided through social media [26,29], such as whether posters actually have the disease [27,31] or are actually taking the drugs [24,27] they discuss. Regarding selection bias, publications reported differences between patients who use social media and those who do not; for example, patients using social media are typically more highly educated [24,29], are more likely to be female [26,27], may have a different symptom experience [28], and are generally younger [27,29,31]. With regard to information bias, Freifeld et al and Pages et al reported duplication of posts [25,30], Mao et al reported multiple posts by the same patients [28], and Freifeld et al indicated that patients may not identify AEs correctly [25]. Finally, several publications mentioned that researchers using social media cannot actively probe patients for responses [24,27,29]. For example, patients may use wording other than what researchers anticipate, which could lead to misclassification of symptom experiences [29].
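The duplication and multiple-posting problems described above can be partly mitigated by deduplicating posts and counting distinct patients rather than posts. A minimal sketch, assuming each post carries a hypothetical pseudonymous author identifier and that reposts differ only in case, whitespace, or punctuation (near-duplicates with real rewording would need fuzzier matching):

```python
import re


def normalize(text):
    """Lower-case, collapse whitespace, and drop punctuation so trivial reposts compare equal."""
    text = re.sub(r"\s+", " ", text.lower())
    return re.sub(r"[^a-z0-9 ]", "", text).strip()


def deduplicate(posts):
    """Keep the first copy of each (author, normalized text) pair."""
    seen, unique = set(), []
    for post in posts:
        key = (post["author"], normalize(post["text"]))
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique


# Invented posts; 'author' is a hypothetical pseudonymous user ID.
posts = [
    {"author": "u1", "text": "Severe joint pain on anastrozole."},
    {"author": "u1", "text": "Severe  joint pain on anastrozole!"},  # repost
    {"author": "u1", "text": "Joint pain is getting worse."},
    {"author": "u2", "text": "Joint pain started in week 3."},
]

unique_posts = deduplicate(posts)
mentions = [p for p in unique_posts if "joint pain" in normalize(p["text"])]
patients = {p["author"] for p in mentions}

# Counting distinct authors keeps one patient's repeated posts from
# being counted as several independent reports.
print(len(posts), len(unique_posts), len(mentions), len(patients))  # 4 3 3 2
```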

Regarding the acceptability of using social media to generate health data, Pages et al indicated that pharmaceutical companies already use this type of data to gather information on AEs from the patient perspective [30]. Furthermore, Beusterien et al noted that patient perspectives on disease and treatment impact are commonly accepted in patient-reported outcomes research [24], and both Freifeld et al and van der Heijden et al noted the importance for regulatory authorities of the insights into patient perspectives provided by social media research [25,26]. However, Freifeld et al were also cautious about using social media to generate health data [25], because its role in pharmacovigilance has yet to be established and social media are not yet used in routine surveillance. Additionally, they indicated that data acquisition from social media and its automation need to be improved.

Table 2. Study characteristics of included scientific publications that use social media to collect health data.
StudyStudy designStudy
RespondentsType of social media
used to collect health
Type of health data collected
Beusterien et al 2013 [24]Cross-sectional

52 days15222642 disease-specific forumsAdverse events, physical functioning & emotional impacts
Freifeld et al 2014 [25]Retrospective7 months6,900,000N/AaTwitterAdverse events
van der Heijden et al 2016 [26]Prospective70 monthsN/A272Facebook (patient community)Socio-demographic factors, disease-specific characteristicsb, functional outcome, and QoLc
McCarrier et al 2016 [27]Cross-sectional4 monthsN/A50Online patient platformSocio-demographic factors, disease-specific characteristicsd, experience of symptoms, perceptions about treatment, and QoL
Mao et al 2013 [28]Retrospective

8 years1,235,400N/A12 disease-specific forumsAdverse events and adherence
Marshall et al 2015 [29]Retrospective8 years50,42612,9911 disease-specific forumSymptom occurrence, co-occurrence, and similarity index of 25 preselected symptoms.
Pages et al 2014 [30]Retrospective1 year111665 health forumsAdverse events
Zaid et al 2014 [31]Cross-sectional30 daysN/A57Facebook (support group)Socio-demographic factors, disease-specific characteristicse, and QoL

aN/A: not applicable.

bDisease-specific characteristics include clinical presentation, findings on imaging and biopsy material, type and localization of disease, surgical and adjuvant treatment, local recurrences, and post-operative complications.

cQoL: quality of life.

dDisease-specific characteristics include self-reported current chronic lymphocytic leukaemia stage, performance status, and past and current treatment.

eDisease-specific characteristics include clinical presentation, initial work-up, treatments, past and current disease status, follow-up, and recurrence pattern.

Table 3. Selection of social media platform and use of automated techniques by included literature that use social media to collect health data.
Study | Clear explanation for selection of social media platform | Web crawler used for collecting social media health data | Automated technique used for analysis of health data
Beusterien et al 2013 [24] | Yes | No | No
Freifeld et al 2014 [25] | Yes | No(a) | Yes
van der Heijden et al 2016 [26] | Yes | No(b) | No
McCarrier et al 2016 [27] | Yes | No(b) | No
Mao et al 2013 [28] | Yes | Yes | Yes
Marshall et al 2015 [29] | No | Yes | Yes
Pages et al 2014 [30] | Yes | No | No
Zaid et al 2014 [31] | Yes | No(b) | No

(a) The Twitter application programming interface (API) was used to identify relevant tweets.

(b) A survey was distributed via the social media platform.

Table 4. Strengths and limitations specific to the use of social media to generate health data.
Study | Strengths | Limitations
Beusterien et al 2013 [24] | Patient perspective; efficient and comprehensive collection of PROMs(a) | Validating authenticity; selection bias; no active probing of patient responses; incomplete information of sample
Freifeld et al 2014 [25] | Patient perspective; complementary to pharmacovigilance; rapid information on AEs(b) | Information bias; volume of posts; noisy data
van der Heijden et al 2016 [26] | Access to patients with rare diseases; collection of PROMs; convenient to fill in; long-term follow-up | Validating authenticity; selection bias; low participation rate
McCarrier et al 2016 [27] | Alternative approaches to qualitative data collection; support development of PRO(c) instruments; access to patients with rare diseases; motivated patients; lower costs per enrolled patient | Validating authenticity; selection bias; no active probing of patient responses; not achieving concept saturation; larger sample sizes needed
Mao et al 2013 [28] | Patient perspective; access to patients distributed over wide geographic areas; increased generalizability due to more diverse patient population; observed frequency of key AEs reflected those reported in traditional studies | Selection bias; information bias; frequency data do not indicate AE prevalence
Marshall et al 2015 [29] | Vast quantities of data; easily accessible information; short time-period; access to patients with rare diseases; low costs; patient perspective; complementary to traditional studies | Validating authenticity; selection bias; noisy data; no active probing of patient responses; incomplete information of sample; data quality or format inadequate; ethical considerations; misinterpretation of posts
Pages et al 2014 [30] | Patient perspective; complementary to pharmacovigilance; identification of new or unlabelled AEs | Information bias
Zaid et al 2014 [31] | Access to patients with rare diseases who are distributed over wide geographic areas; short time-period; motivated patients | Validating authenticity; selection bias

(a) PROMs: patient-reported outcome measures.

(b) AE: adverse event.

(c) PRO: patient-reported outcome.

This explorative review demonstrates that, within the field of oncology, social media could be used to assess AEs by collecting health data from forums and to evaluate QoL through Facebook or online patient platforms. Social media provide an opportunity to efficiently assess patient perspectives and to collect health data from patients with rare diseases who are distributed over wide geographic areas. However, the authenticity of health data from social media is difficult to validate, and such data are prone to selection and information bias. Furthermore, this type of data should complement, rather than replace, traditional forms of research. Finally, compared with reviews that focus on social media to inform pharmacovigilance [32,33], this review provides additional insights by focusing on the use of social media to inform relative effectiveness assessments.

Arguably, the results of this review on social media-generated data in oncology may not be generalizable to other fields of medicine, since different types of health data, social media, or analyses may be important in those fields. However, many studies in fields other than oncology similarly focused on identifying AEs [32-38], suggesting that our results are at least partially generalizable. Although little is known about assessing QoL through social media in other fields of medicine, this mode of data collection has potential, since QoL is often difficult to measure in RCTs and observational studies [20]. Finally, as our results show, treatment switching and adherence behavior are further aspects of relative effectiveness that may be assessed through social media; a few pharmaceutical companies already assess them, demonstrating this potential [14,15,39]. Given that social media can generate data on AEs, QoL, and treatment-switching and adherence behavior, social media-generated health data have considerable potential to enrich REA with information on these aspects.

One caveat of using social media to collect health data that requires special attention is the lack of clear methodological guidance. Standardized approaches to collecting health data from social media are necessary to ensure comparability and reproducibility between studies. For example, posts may be extracted either manually or by automated processes, and they may likewise be interpreted manually or automatically. Some argue that automated processes may be unable to correctly interpret sarcasm in social media text [25], while others argue that automated natural language processing could help analyze the vast amounts of data available on social media [33,40,41]. Another methodological issue is the choice of search terms, as posts may include misspellings, non-medical terms, and slang [25,33,42]. Additionally, several studies reported important methodological limitations to consider when assessing data from social media, including validating authenticity (eg, posts may not be genuine) [43-45], selection bias (eg, social media users may differ from non-users in age, gender, ethnicity, and physical location) [42,44,45], and information bias (eg, patients may be taking a specific drug but fail to report the drug or its effects) [43,45]. To manage these limitations, it is important to systematically assess the risk of bias and thereby determine the quality of the health data collected through social media. Because of the issues described above, extracting relevant health data from social media can be challenging. Clear and uniform methodological guidance may improve the extraction, interpretation, and subsequent use of health data collected through social media. An additional caveat that may hamper the use of social media for collecting health data for REA is the perceived risk of easy manipulation.
A recent example of manipulation was the circulation of fake news on social media during the 2016 elections in the United States of America [46-48]. Such examples affect the ability of social media users to discern true and correct information. However, although manipulation may occur, many people still use social media to find information and exchange experiences. Therefore, harnessing and analysing the vast amount of health data available on social media remains important.
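The search-term problem raised in this section (misspellings, non-medical terms, and slang) is often addressed by normalizing post text before matching. The sketch below combines a small hand-made slang lexicon with fuzzy spelling correction from Python's standard `difflib.get_close_matches`. The word lists and the 0.8 similarity cutoff are illustrative assumptions, not values taken from the cited studies.

```python
import difflib
import re

# Illustrative word lists; a real study would need curated, validated lexicons.
DRUG_NAMES = ["anastrozole", "letrozole", "exemestane", "tamoxifen", "methotrexate"]
SLANG = {"chemo": "chemotherapy", "tamox": "tamoxifen"}


def normalize_token(token):
    """Map slang via the lexicon, then snap near-misspellings to a known drug name."""
    if token in SLANG:
        return SLANG[token]
    close = difflib.get_close_matches(token, DRUG_NAMES, n=1, cutoff=0.8)
    return close[0] if close else token


def normalize_post(text):
    """Tokenize a post into lower-case words and normalize each one."""
    return [normalize_token(t) for t in re.findall(r"[a-z]+", text.lower())]


print(normalize_post("Started letrazole after chemo"))
# ['started', 'letrozole', 'after', 'chemotherapy']
```

A high cutoff trades recall for precision: it corrects near-misses such as "letrazole" while leaving ordinary words unchanged, but it will miss heavier misspellings that a lower cutoff might catch at the cost of false matches.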

Although caveats can be recognized in the use of social media-generated health data, the added value of collecting information on patients' perspectives and experiences relevant to relative effectiveness (eg, AEs, quality of life, switching behavior) should be highlighted. For example, health data collected through social media may uncover AEs that occur after long-term use of new drugs, detect AEs earlier than traditional methods [44,49], or provide insights that are not available in the published literature (eg, the experiences of patients with diabetes with laser therapy) [12]. Additionally, social media may be a better source than more traditional methods for identifying AEs that are mild or symptom-related [44]. However, health data collected through social media should be used in conjunction with traditional methods to obtain a comprehensive overview of the aspects that can provide information for REA.

A strength of this review is that we assessed both academic and grey literature, which reduces the likelihood of missing important insights. Additionally, we safeguarded the quality of the review by having data abstraction conducted by two authors, which allowed better substantiation of the deductions made.

One limitation of this review was its focus on oncology, which may have caused us to miss literature on other aspects related to REA that could potentially be collected using social media. For example, PatientsLikeMe, an online patient platform that allows patients to share health data or exchange experiences on conditions and medications, published a few studies on the effectiveness of off-label drug use [43,50]. Additionally, PatientsLikeMe published a study assessing the impact of menopause on disease severity in patients with multiple sclerosis [51]. These types of data may contribute to providing information for REA. The focus on oncology was nonetheless deemed appropriate, since many new drugs are developed in the field of oncology, the studies that assess these new drugs can be small and incomplete, and the European Medicines Agency and the European Network for Health Technology Assessment are also focusing on the assessment of oncological drugs.

A second limitation relates to the search strategy employed in this explorative review. First, the broad definition of social media used in this review may not allow differentiation between passively collecting data (eg, by collecting posts from a forum) and actively collecting data (eg, by posting a survey on Facebook). The information obtained by passively collecting what patients discuss and post on social media may differ from that obtained by actively posing questions to these patients in a survey. Second, by using a single database (PubMed) for our scientific literature search and a single search engine (Google) for our grey literature search, we may have missed studies published in relevant journals not indexed by PubMed, or grey literature not identified by Google. To overcome this limitation to some extent, we hand-searched the reference lists of the included studies, screening by title and abstract, and identified a few articles that had not been captured by the PubMed and Google searches.

Social media may be a source of RWD for REA, particularly on aspects such as AEs, occurrence of disease-specific symptoms, adherence behavior, and QoL. This potential has not yet been fully realized due to the methodological limitations that accompany social media-generated health data, such as information bias and selection bias, as well as the limited acceptability of such data. The degree to which such data are useful for REA should be explored further. Moreover, methodological guidelines and tools should be developed to address the limitations mentioned above.

Acknowledgments

The work leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement number 115546, the resources of which are composed of a financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contribution. The research leading to these results was conducted as part of the GetReal consortium. For further information, please refer to the consortium's website [52]. This paper reflects the personal views of the stated authors.

Conflicts of Interest

None declared.

References

1. Makady A, Goettsch W. GetReal. 2015. Glossary of definitions of common terms. URL: http://www.[...]/Portals/1/Documents/01%20deliverables/D1.3%20-%20Revised%20GetReal%20glossary%20-%20FINAL%20updated%20version_25Oct16_webversion.pdf [accessed 2018-05-05]
2. HTAi. HTA Glossary: Health Technology. [accessed 2018-05-05]
3. Antheunis ML, Tates K, Nieboer TE. Patients' and health professionals' use of social media in health care: motives, barriers and expectations. Patient Educ Couns 2013 Sep;92(3):426-431.
4. Tsuya A, Sugawara Y, Tanaka A, Narimatsu H. Do cancer patients tweet? Examining the twitter use of cancer patients in Japan. J Med Internet Res 2014;16(5):e137.
5. Pew Research Center. 2013. The diagnosis difference. [accessed 2018-05-05]
6. Kapp JM, Peters C, Oliver DP. Research recruitment using Facebook advertising: big potential, big challenges. J Cancer Educ 2013 Mar;28(1):134-137.
7. Khatri C, Chapman SJ, Glasbey J, Kelly M, Nepogodiev D, Bhangu A, on behalf of the STARSurg Committee. Social media and internet driven study recruitment: evaluating a new model for promoting collaborator engagement and participation. PLoS One 2015;10(3):e0118899.
8. Lane TS, Armin J, Gordon JS. Online Recruitment Methods for Web-Based and Mobile Health Studies: A Review of the Literature. J Med Internet Res 2015;17(7):e183.
9. Cavallo DN, Chou WS, McQueen A, Ramirez A, Riley WT. Cancer prevention and control interventions using social media: user-generated approaches. Cancer Epidemiol Biomarkers Prev 2014 Sep;23(9):1953-1956.
10. Laranjo L, Arguel A, Neves AL, Gallagher AM, Kaplan R, Mortimer N, et al. The influence of social networking sites on health behavior change: a systematic review and meta-analysis. J Am Med Inform Assoc 2015 Jan;22(1):243-256.
11. Dizon DS, Graham D, Thompson MA, Johnson LJ, Johnston C, Fisch MJ, et al. Practical guidance: the use of social media in oncology practice. J Oncol Pract 2012 Sep;8(5):e114-e124.
12. Street JM, Braunack-Mayer AJ, Facey K, Ashcroft RE, Hiller JE. Virtual community consultation? Using the literature and weblogs to link community perspectives and health technology assessment. Health Expect 2008 Jun;11(2):189-200.
13. Powell GE, Seifert HA, Reblin T, Burstein PJ, Blowers J, Menius JA, et al. Social Media Listening for Routine Post-Marketing Safety Surveillance. Drug Saf 2016 May;39(5):443-454.
14. Sukkar E. The Pharmaceutical Journal. 2015. Searching social networks to detect adverse reactions. URL: http://www.[...]/news-and-analysis/features/searching-social-networks-to-detect-adverse-reactions/20067624.article [accessed 2018-05-05]
15. Risson V, Saini D, Bonzani I, Huisman A, Olson M. Patterns of Treatment Switching in Multiple Sclerosis Therapies in US Patients Active on Social Media: Application of Social Media Content Analysis to Health Outcomes Research. J Med Internet Res 2016;18(3):e62.
16. ABPI Pharmacovigilance Expert Network. 2013. Guidance notes on the management of adverse events and product complaints from digital media. [accessed 2018-05-05]
17. PatientsLikeMe. 2015. PatientsLikeMe and the FDA Sign Research Collaboration Agreement. URL: http://blog.[...]/2015/06/15/patientslikeme-and-the-fda-sign-research-collaboration-agreement/ [accessed 2018-05-05]
18. Wicks P, Massagli M, Frost J, Brownstein C, Okun S, Vaughan T, et al. Sharing health data for better outcomes on PatientsLikeMe. J Med Internet Res 2010;12(2).
19. European Medicines Agency. 2015. Annual Report 2015. [accessed 2018-05-05]
20. Kleijnen S, Lipska I, Leonardo AT, Meijboom K, Elsada A, Vervölgyi V, et al. Relative effectiveness assessments of oncology medicines for pricing and reimbursement decisions in European countries. Ann Oncol 2016 Sep;27(9):1768-1775.
21. Kaplan AM, Haenlein M. Users of the world, unite! The challenges and opportunities of Social Media. Business Horizons 2010 Jan;53(1):59-68.
22. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Int J Surg 2010;8(5):336-341.
23. SCImago. 2017. SJR SCImago Journal & Country Rank.
24. Beusterien K, Tsay S, Gholizadeh S, Su Y. Real-world experience with colorectal cancer chemotherapies: patient web forum analysis. Ecancermedicalscience 2013;7:361.
25. Freifeld CC, Brownstein JS, Menone CM, Bao W, Filice R, Kass-Hout T, et al. Digital drug safety surveillance: monitoring pharmaceutical products in twitter. Drug Saf 2014 May;37(5):343-350.
26. van der Heijden L, Piner SR, van de Sande MAJ. Pigmented villonodular synovitis: a crowdsourcing study of two hundred and seventy two patients. Int Orthop 2016 Dec;40(12):2459-2468.
27. McCarrier KP, Bull S, Fleming S, Simacek K, Wicks P, Cella D, et al. Concept Elicitation Within Patient-Powered Research Networks: A Feasibility Study in Chronic Lymphocytic Leukemia. Value Health 2016 Jan;19(1):42-52.
28. Mao JJ, Chung A, Benton A, Hill S, Ungar L, Leonard CE, et al. Online discussion of drug side effects and discontinuation among breast cancer survivors. Pharmacoepidemiol Drug Saf 2013 Mar;22(3):256-262.
29. Marshall SA, Yang CC, Ping Q, Zhao M, Avis NE, Ip EH. Symptom clusters in women with breast cancer: an analysis of data from social media and a research study. Qual Life Res 2016 Mar;25(3):547-557.
30. Pages A, Bondon-Guitton E, Montastruc JL, Bagheri H. Undesirable effects related to oral antineoplastic drugs: comparison between patients' internet narratives and a national pharmacovigilance database. Drug Saf 2014 Aug;37(8):629-637.
31. Zaid T, Burzawa J, Basen-Engquist K, Bodurka DC, Ramondetta LM, Brown J, et al. Use of social media to conduct a cross-sectional epidemiologic and quality of life survey of patients with neuroendocrine carcinoma of the cervix: a feasibility study. Gynecol Oncol 2014 Jan;132(1):149-153.
32. Lardon J, Abdellaoui R, Bellet F, Asfari H, Souvignet J, Texier N, et al. Adverse Drug Reaction Identification and Extraction in Social Media: A Scoping Review. J Med Internet Res 2015 Jul 10;17(7):e171.
33. Sarker A, Ginn R, Nikfarjam A, O'Connor K, Smith K, Jayaraman S, et al. Utilizing social media data for pharmacovigilance: A review. J Biomed Inform 2015 Apr;54:202-212.
34. Abou TM, Rossard C, Cantaloube L, Bouscaren N, Roche G, Pochard L, et al. Analysis of patients' narratives posted on social media websites on benfluorex's (Mediator®) withdrawal in France. J Clin Pharm Ther 2014 Feb;39(1):53-55.
35. Liu X, Chen H. Identifying adverse drug events from patient social media: A case study for diabetes. IEEE Intell Syst 2015 May;30(3):44-51.
36. Topaz M, Lai K, Dhopeshwarkar N, Seger DL, Sa'adon R, Goss F, et al. Clinicians' Reports in Electronic Health Records Versus Patients' Concerns in Social Media: A Pilot Study of Adverse Drug Reactions of Aspirin and Atorvastatin. Drug Saf 2016 Mar;39(3):241-250.
37. Wu H, Fang H, Stanhope SJ. Exploiting online discussions to discover unrecognized drug side effects. Methods Inf Med 2013;52(2):152-159.
38. Hughes S, Cohen D. Can online consumers contribute to drug knowledge? A mixed-methods comparison of consumer-generated and professionally controlled psychotropic medication information on the internet. J Med Internet Res 2011;13(3):e53.
39. Petersen C. PM360 The essential resource for pharma marketers. 2015. Concrete Examples of Social Listening's Impact on Pharma.
40. Baldwin T, Cook P, Lui M, MacKinlay A, Wang L. How noisy social media text, how diffrnt social media sources? 2013 Presented at: The 6th International Joint Conference on Natural Language Processing (IJCNLP); 2013; Nagoya, Japan.
41. Reyes A, Rosso P, Buscaldi D. From humor recognition to irony detection: The figurative language of social media. Data & Knowledge Engineering 2012 Apr;74:1-12.
42. Sloane R, Osanlou O, Lewis D, Bollegala D, Maskell S, Pirmohamed M. Social media and pharmacovigilance: A review of the opportunities and challenges. Br J Clin Pharmacol 2015 Oct;80(4):910-920.
43. Frost J, Okun S, Vaughan T, Heywood J, Wicks P. Patient-reported outcomes as a source of evidence in off-label prescribing: analysis of data from PatientsLikeMe. J Med Internet Res 2011;13(1):e6.
44. Golder S, Norman G, Loke YK. Systematic review on the prevalence, frequency and comparative value of adverse events data in social media. Br J Clin Pharmacol 2015 Oct;80(4):878-888.
45. Leng HK. Methodological issues in using data from social networking sites. Cyberpsychol Behav Soc Netw 2013 Sep;16(9):686-689.
46. Allcott H, Gentzkow M. Social Media and Fake News in the 2016 Election. Journal of Economic Perspectives 2017 May;31(2):211-236.
47. Silverman C. Buzzfeed. 2016. This Analysis Shows How Viral Fake Election News Stories Outperformed Real News On Facebook. URL: https://www.[...]/craigsilverman/viral-fake-election-news-outperformed-real-news-on-facebook?utm_term=.ymmRRxdMPk#.mgpLLo5NM9 [accessed 2018-05-05]
48. Perrott K. ABC News. 2016. 'Fake news' on social media influences US election voters, experts say. URL: http://www.[...]/news/2016-11-14/fake-news-would-have-influenced-us-election-experts-say/8024660 [accessed 2018-05-05]
49. Pierce CE, Bouri K, Pamer C, Proestel S, Rodriguez HW, Van LH, et al. Evaluation of Facebook and Twitter Monitoring to Detect Safety Signals for Medical Products: An Analysis of Recent FDA Safety Alerts. Drug Saf 2017 Apr;40(4):317-331.
50. Wicks P, Vaughan TE, Massagli MP, Heywood J. Accelerated clinical discovery using self-reported patient data collected online and a patient-matching algorithm. Nat Biotechnol 2011 May;29(5):411-414.
51. Bove R, Healy BC, Secor E, Vaughan T, Katic B, Chitnis T, et al. Patients report worse MS symptoms after menopause: findings from an online cohort. Mult Scler Relat Disord 2015 Jan;4(1):18-24.
52. IMI-GetReal. New Methods for RWE Collection and Synthesis.

Abbreviations

AE: adverse event
HTA: health technology assessment
NLP: natural language processing
QoL: quality of life
RCT: randomized controlled trial
REA: relative effectiveness assessment
RWD: real-world data

Edited by G Eysenbach; submitted 02.05.17; peer-reviewed by M Lambooij, H Narimatsu, T Kass-Hout, S Golder, A Sarker, L Laranjo, M Colder Carras; comments to author 01.08.17; revised version received 31.10.17; accepted 16.03.18; published 08.06.18


©Rachel R.J. Kalf, Amr Makady, Renske M.T. ten Ham, Kim Meijboom, Wim G. Goettsch, on behalf of IMI-GetReal Workpackage 1. Originally published in JMIR Cancer, 08.06.2018.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Cancer, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.