This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Cancer, is properly cited. The complete bibliographic information, a link to the original publication on https://cancer.jmir.org/, as well as this copyright and license information must be included.
Patients with breast cancer have a variety of worries and need multifaceted information support. Their accumulated posts on social media contain rich descriptions of their daily worries concerning issues such as treatment, family, and finances. It is important to identify these issues to help patients with breast cancer to resolve their worries and obtain reliable information.
This study aimed to extract and classify multiple worries from text generated by patients with breast cancer using Bidirectional Encoder Representations From Transformers (BERT), a context-aware natural language processing model.
A total of 2272 blog posts by patients with breast cancer in Japan were collected. Five worry labels, “treatment,” “physical,” “psychological,” “work/financial,” and “family/friends,” were defined and assigned to each post. Multiple labels were allowed. To assess the label criteria, 50 blog posts were randomly selected and annotated by two researchers with medical knowledge. After the interannotator agreement had been assessed by means of Cohen kappa, one researcher annotated all the blogs. A multilabel classifier that simultaneously predicts five worries in a text was developed using BERT. This classifier was fine-tuned by using the posts as input and adding a classification layer to the pretrained BERT. The performance was evaluated for precision using the average of 5-fold cross-validation results.
Among the blog posts, 477 included “treatment,” 1138 included “physical,” 673 included “psychological,” 312 included “work/financial,” and 283 included “family/friends.” The interannotator agreement values were 0.67 for “treatment,” 0.76 for “physical,” 0.56 for “psychological,” 0.73 for “work/financial,” and 0.73 for “family/friends,” indicating a high degree of agreement. Among all blog posts, 544 contained no label, 892 contained one label, and 836 contained multiple labels. It was found that the worries varied from user to user, and the worries posted by the same user changed over time. The model performed well, though prediction performance differed for each label. The values of precision were 0.59 for “treatment,” 0.82 for “physical,” 0.64 for “psychological,” 0.67 for “work/financial,” and 0.58 for “family/friends.” The higher the interannotator agreement and the greater the number of posts, the higher the precision tended to be.
This study showed that the BERT model can extract multiple worries from text generated by patients with breast cancer. This is the first application of a multilabel classifier using the BERT model to extract multiple worries from patient-generated text. The results will help identify the worries of patients with breast cancer and provide them with timely social support.
Breast cancer is the most commonly diagnosed cancer in women worldwide, and treatment can last for 5 to 10 years, making it a familiar disease that women live with for a long time [
Currently, many patients use social media as a source of medical information [
Document classification by NLP can be used to extract information from text. This technique is useful for automatically identifying worries from patient-generated text and helping patients with breast cancer obtain appropriate information to resolve their worries. Although there are many NLP studies on portals for patients with breast cancer, most of them are content analyses that objectively analyze media content. Content analysis can surface multiple worries, but the worries to be extracted cannot be defined in advance. In contrast, document classification can define target worries in advance and detect them, but so far there have been few document classification studies [
There has been much research on using NLP to extract topics and worries from patient-generated text automatically. Many studies used rule-based, bag-of-words, and topic models such as latent Dirichlet allocation (LDA) [
The purpose of this study was to develop a multilabel classification model using BERT to automatically extract multifaceted worries from text generated by patients with breast cancer.
In this study, blog articles on Life Palette [
Overview of data processing and model function. (A) Data selection criteria and model training and testing process; (B) post label prediction model functions and outputs. *In Japanese sentences, the object is sometimes omitted, so the presumed object was judged from the context and added in parentheses. BERT: Bidirectional Encoder Representations From Transformers.
This study was approved by the ethics committee of the Keio University Faculty of Pharmacy (approval No 191218-2, 190301-1). All procedures were performed in accordance with the Ethical Guidelines for Medical and Health Research Involving Human Subjects (established by the Ministry of Education, Culture, Sports, Science and Technology and the Ministry of Health, Labour and Welfare in Japan) and the Declaration of Helsinki and its later amendments. Consent to use the data from Life Palette for research purposes was obtained at the time of user registration. All data were analyzed anonymously, and informed consent for this research was waived because of the retrospective observational design of the study.
The annotation criteria were defined based on previous studies [
Based on the “Shizuoka Classification” [
In this study, a multilabel classifier was built from the annotated multilabel data set to handle posts that describe multiple worries. To develop the classifier, BERT, a state-of-the-art NLP model that can take context into account, was used. BERT is trained via a two-step learning process. The first step is pretraining on a large amount of text data, and the second step is fine-tuning the pretrained model on new, task-specific data.
The model was built by fine-tuning the pretrained Japanese BERT model of the Inui and Suzuki Laboratory, Tohoku University [
The [CLS] token and the [SEP] token were added at the beginning and at the end of the input text, respectively, and the resulting sequence was used as input to the BERT model. The model consists of the pretrained BERT and a fully connected layer with a sigmoid activation function, which outputs a positive/negative result for each of the five labels. The model was built with reference to the previous study [
BERT is built on a self-attention mechanism, which makes it possible to show which parts of the text the model attended to. Visualizing attention can help in interpreting the results of “black box” machine learning models. Therefore, in this study, the attended parts of each blog post were visualized and used as a reference for interpreting the labeling results.
Model structure developed in this study. The input is the post sentence with [CLS] token and [SEP] token added at the beginning and at the end, respectively. The output is 0/1, corresponding to negative/positive of each label. BERT: Bidirectional Encoder Representations From Transformers; dim: dimension.
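The input and output stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 decision threshold, and the assumption that the 512 limit counts the two special tokens are all illustrative, and the pretrained encoder and the fully connected layer are stood in for by five hand-picked raw scores.

```python
import math

# The five worry labels defined in the study.
LABELS = ["treatment", "physical", "psychological", "work/financial", "family/friends"]
MAX_LEN = 512  # assumption: the limit includes [CLS] and [SEP]


def build_input(tokens, max_len=MAX_LEN):
    """Wrap a tokenized post in [CLS] ... [SEP], truncating the body to fit."""
    body = tokens[: max_len - 2]  # reserve two positions for the special tokens
    return ["[CLS]"] + body + ["[SEP]"]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_labels(logits, threshold=0.5):
    """Map five raw scores to independent 0/1 decisions, one per label.

    Each label gets its own sigmoid, so several labels can be positive
    at once (multilabel), unlike a softmax over mutually exclusive classes.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one score per label")
    return {label: int(sigmoid(z) >= threshold) for label, z in zip(LABELS, logits)}


# Hypothetical scores standing in for the output of the fully connected layer.
example = predict_labels([2.0, -1.0, 0.3, -3.0, -0.1])
print(example)
```

Because every label is thresholded independently, a post can receive zero, one, or several labels, which matches the 0/1 output per label in the model structure.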
A multilabel task was performed to classify five labels simultaneously. The performance was evaluated in terms of precision,
Moreover, to examine the effect of the upper limit on the number of input words on model performance, performance was compared among three sets: posts with more than 512 words, posts with 512 words or fewer, and all posts.
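The length-stratified comparison can be sketched as follows for a single label. The record layout `(n_words, true_label, predicted_label)` and the function names are hypothetical, and the toy records are invented purely to exercise the code; they are not study data.

```python
def precision(pairs):
    """Precision over (true, predicted) pairs; 0.0 when nothing is predicted positive."""
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def precision_by_length(records, limit=512):
    """Compare precision for all posts, posts over the limit, and posts within it.

    records: iterable of (n_words, true_label, predicted_label) for one label.
    """
    groups = {"all": [], "over_limit": [], "within_limit": []}
    for n_words, t, p in records:
        groups["all"].append((t, p))
        key = "over_limit" if n_words > limit else "within_limit"
        groups[key].append((t, p))
    return {name: precision(pairs) for name, pairs in groups.items()}


# Toy records (invented for illustration only).
toy = [(600, 1, 1), (700, 0, 1), (100, 1, 1), (50, 0, 0), (512, 1, 0)]
result = precision_by_length(toy)
print(result)
```

A post with exactly 512 words falls into the "within limit" group here, matching the "512 words or fewer" wording.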
The mean number of words per blog post in the data set was 464.9, the median was 357, and the maximum was 6746. The number of documents with more than 512 words was 723 (31.8% of all blog posts; Figure S1 in
The IAA values were the highest for “physical” and the lowest for “psychological” (
The number of blog posts was highest for “physical” and lowest for “family/friends” (
The IAA^a values and the number of posts for the five labels (N=2272).
Label | IAA^b | Posts, n |
Treatment | 0.67 | 477 |
Physical | 0.76 | 1138 |
Psychological | 0.56 | 673 |
Work/financial | 0.73 | 312 |
Family/friends | 0.73 | 283 |
^a IAA: interannotator agreement.
^b Annotation agreement was evaluated using Cohen kappa.
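For readers unfamiliar with Cohen kappa, the sketch below shows how agreement between two annotators can be computed for one label and mapped to the commonly used Landis and Koch bands (0.41 to 0.60 "moderate," 0.61 to 0.80 "substantial"). The function names and the toy annotations are illustrative assumptions, not the study's annotation data.

```python
def cohen_kappa(a, b):
    """Cohen kappa for two annotators' labels over the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # Chance agreement: product of the annotators' marginal rates per category.
    categories = set(a) | set(b)
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if expected == 1.0:
        return 1.0  # both annotators used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def interpret(kappa):
    """Landis and Koch style verbal bands for a kappa value."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    if kappa >= 0.21:
        return "fair"
    if kappa >= 0.0:
        return "slight"
    return "poor"


# Toy binary annotations for a single label (invented for illustration).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
annotator_2 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2), interpret(kappa))
```

Applied to the values in the table, `interpret(0.56)` gives "moderate" for “psychological,” while the other four labels fall in the "substantial" band.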
The number of labels per blog post (N=2272).
Number of labels | Posts, n (%) |
0 | 544 (23.9) |
1 | 892 (39.3) |
2 | 578 (25.4) |
3 | 199 (8.8) |
4 | 57 (2.5) |
5 | 2 (0.1) |
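The percentages in this table, and the share of posts with multiple labels, can be recomputed directly from the counts. The short check below uses only the numbers reported in the table; the variable names are illustrative.

```python
# Number of posts by number of assigned labels, as reported in the table.
counts = {0: 544, 1: 892, 2: 578, 3: 199, 4: 57, 5: 2}

total = sum(counts.values())
percentages = {k: round(100 * v / total, 1) for k, v in counts.items()}

# Posts carrying two or more labels, and their share of all posts.
multi = sum(v for k, v in counts.items() if k >= 2)
multi_share = round(100 * multi / total, 1)

print(total, percentages, multi, multi_share)
```

The counts sum to 2272, the posts with two or more labels sum to 836, and that group makes up 36.8% of all posts.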
The precision was 0.59 for “treatment,” 0.82 for “physical,” 0.64 for “psychological,” 0.67 for “work/financial,” and 0.58 for “family/friends.” Both the precision and the
The performances of posts with more than 512 words and posts with 512 words or less are presented in
Performance of the model.
Label | Accuracy (SD) | Precision (SD) | Recall (SD) | F1 score (SD) |
Treatment | 0.81 (0.01) | 0.59 (0.09) | 0.39 (0.15) | 0.44 (0.09) |
Physical | 0.81 (0.01) | 0.82 (0.02) | 0.80 (0.02) | 0.81 (0.01) |
Psychological | 0.77 (0.03) | 0.64 (0.04) | 0.54 (0.08) | 0.58 (0.04) |
Work/financial | 0.88 (0.02) | 0.67 (0.10) | 0.28 (0.05) | 0.38 (0.03) |
Family/friends | 0.88 (0.02) | 0.58 (0.11) | 0.33 (0.07) | 0.41 (0.07) |
Macro average | 0.83 (0.01) | 0.66 (0.04) | 0.47 (0.05) | 0.52 (0.03) |
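The per-label metrics and the macro average in this table can be illustrated with a short sketch. It assumes that the last column of the table is an F1 score (the values are consistent with one) and that the macro average is the unweighted mean across the five labels; the function names and the toy predictions are illustrative.

```python
from statistics import mean


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for one label (0/1 values)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def macro_average(values):
    """Unweighted mean of a metric across labels, rounded to two decimals."""
    return round(mean(values), 2)


# Toy predictions for one label (invented for illustration).
toy = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 1, 0, 0, 0, 0])

# Per-label values as reported in the table, in label order.
macro_precision = macro_average([0.59, 0.82, 0.64, 0.67, 0.58])
macro_recall = macro_average([0.39, 0.80, 0.54, 0.28, 0.33])
macro_f1 = macro_average([0.44, 0.81, 0.58, 0.38, 0.41])
print(toy, macro_precision, macro_recall, macro_f1)
```

Averaging the reported per-label values reproduces the macro-average row (0.66, 0.47, 0.52). Because the reported values are means over five folds, a per-label F1 in the table need not equal the harmonic mean of that label's averaged precision and recall.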
This is the first report of a multilabel classifier using the BERT model to extract multiple types of worries in patient-generated text, and our results indicate that BERT is effective for this purpose.
Our model can extract multiple worries from a single post. There have been some NLP studies that have dealt with multiple worries in patient-generated text [
A multilabel classifier may be useful for patients with breast cancer because they may have multiple worries and the nature of their worries may change over time. This study has demonstrated that documents with multiple worries can be handled using BERT. As another approach, a lot of content analysis research has been done using topic models such as LDA for unsupervised learning [
The reliability of the data set was inferred from the annotation results: the IAA was above 0.61, which was “substantial” for all labels except “psychological,” indicating a high degree of agreement. The “psychological” label tended to be judged differently among researchers, compared with the other labels. However, it is considered that the data set was reliable enough as training data because the IAA values exceeded 0.41, which indicates “moderate” reliability. In the data set of posts written by patients with breast cancer, more than one worry was actually described in about 40% of the posts (
To evaluate the reliability of the model, an error analysis was conducted. For “physical,” the label with the highest precision, many of the false-positive cases described physical changes that were not covered by the annotation guidelines. These resembled genuine “physical” descriptions and included postoperative recovery, chest discomfort before diagnosis, and changes in physical condition that seemed unrelated to cancer (eg, “I was surprised that I could lift my arms more than before surgery!” “One day, I was surprised at the size of the difference between my left and right breasts,” or “I drank a little wine and sake and felt dizzy”). Although there is still room for improvement in discriminating between “presence of distress” and “presence of distress caused by breast cancer,” the model should be useful in supporting patients with breast cancer because it was able to extract descriptions of “physical changes that cause distress” in these patients.
First, the BERT model used in this study has great strength in recognizing context, but the upper limit of the number of input words is 512. Although there was concern that the performance might deteriorate with posts having more than 512 input words, it was found that there was almost no difference between the performance only for posts with more than 512 input words and that for all posts. On the other hand, the performance for posts with 512 input words or less was slightly inferior to that for all posts. Based on these results, it was considered that truncation after 512 input words had little effect on the model performance, whereas the lack of information due to a small number of input words had a greater effect in this analysis. This suggests that blog posts containing a larger number of input words than the upper limit would not degrade model performance (
Second, the small number of blog posts for each label in our data set is also a limitation of this study. Our model was built from a data set containing descriptions of five worry types. The prediction performance of the model was different for each label, and the higher the IAA and the greater the number of posts, the higher the precision and the
Third, the patients’ blogs used in this study were written in Japanese. It is important to develop a classification model in Japanese, but the lack of applicability to multiple languages may be a limitation.
Our findings could lead to the development of better patient support systems and methods that can respond to temporal and interindividual changes in worries. Our methodology also facilitates the identification of worries and may promote the sharing of problems among patients. Furthermore, combining sentiment analysis with our model in the future might enrich the interpretation of the findings and deepen the understanding of how the worries of patients with breast cancer influence their emotions. Although this study focused only on worries about breast cancer, many common worries are not specific to breast cancer, and it is expected that the model could be extended to other disease areas.
In conclusion, this study showed that the BERT model can extract multiple worries, such as “treatment,” “physical,” “psychological,” “work/financial,” and “family/friends,” from text generated by patients with breast cancer. This is the first study to deal with multiple patient worries using BERT and demonstrates the usefulness of NLP techniques in dealing with patient-generated text. The results will be helpful to identify breast cancer patients’ worries and give them timely social support.
Supplementary material.
BERT: Bidirectional Encoder Representations From Transformers
IAA: interannotator agreement
LDA: latent Dirichlet allocation
LSTM: long short-term memory
NLP: natural language processing
This study was supported by JSPS KAKENHI Grant Number JP21H03170.
The data consisting of blog articles in the study are available from Mediaid Corporation upon reasonable request.
TW, SY, EA, HK, and SH designed the study. TW and SH conducted annotation. TW performed the data analysis, created the natural language processing (NLP) model, and conducted all experiments. HY owned and provided the data source of Life Palette. SY and EA supervised the study design from the NLP technical perspective. SH supervised the study. TW and SH drafted and completed the manuscript. All authors reviewed and approved the manuscript.
HY is the chief executive officer of Mediaid Corporation, which operates Life Palette. The other authors declare no competing interests.