This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Cancer, is properly cited. The complete bibliographic information, a link to the original publication on https://cancer.jmir.org/, as well as this copyright and license information must be included.
The cancer incidence rate is essential to public health surveillance. The analysis of this information allows authorities to know the cancer situation in their regions, especially to determine cancer patterns, monitor cancer trends, and help prioritize the allocation of health resources.
This study aimed to present the design and implementation of an R Shiny application that helps cancer registries conduct rapid descriptive and predictive analytics in a user-friendly, intuitive, portable, and scalable way. Moreover, we wanted to describe the design and implementation road map to inspire other population registries to exploit their data sets and develop similar tools and models.
The first step was to consolidate the data into the population registry cancer database. These data were cross-validated with the ASEDAT software, then checked and reviewed by experts. Next, we developed an online tool with the R Shiny framework to visualize the data and generate reports that assist decision-making. Currently, the application can generate descriptive analytics using population variables, such as age, sex, and cancer type; cancer incidence in region-level geographical heat maps; line plots to visualize temporal trends; and typical risk factor plots. The application also shows descriptive plots about cancer mortality in the Lleida region. This web platform was built as a microservices cloud platform. The web back end consists of an application programming interface and a database, implemented with NodeJS and MongoDB, respectively. All these parts were encapsulated and deployed with Docker and Docker Compose.
The results provide a successful case study in which the tool was applied to the cancer registry of the Lleida region. The study illustrates how researchers and cancer registries can use the application to analyze cancer databases. Furthermore, the results highlight the analytics related to risk factors, second tumors, and cancer mortality. The application shows the incidence and evolution of each cancer during a specific period by gender, age group, and cancer location, among other functionalities. The risk factors view permitted us to detect that approximately 60% of cancer patients had excess weight at diagnosis. Regarding mortality, the application showed that lung cancer registered the highest number of deaths for both genders. Breast cancer was the lethal cancer in women. Finally, a customization guide was included as a result of this implementation to deploy the architecture presented.
This paper aimed to document a successful methodology for exploiting the data in population cancer registries and propose guidelines for other similar registries to develop similar tools. We intend to inspire other entities to build an application that can help decision-making and make data more accessible and transparent for the community of users.
Cancer morbidity and mortality are increasing worldwide despite the development of new prevention strategies and screening programs. This increase can be attributed to several factors, including population growth, aging, and changes in lifestyle and environmental factors. The authors of [
The cancer incidence rate is essential for public health surveillance [
Over recent decades, there has been an exponential growth in PBCRs. The first volume of the Cancer Incidence in Five Continents (CI5), published in 1966, contained information from 32 registries in 29 countries, whereas the latest volume, published in 2021, included information from 343 PBCRs in 65 countries.
Several data sources are integrated into PBCRs, including hospitals, death certificates, and laboratory services. Moreover, PBCRs follow international procedures, ensuring high-quality and reliable data. These goals are accomplished by performing exhaustive (automatic and manual) validity checks [
PBCRs are commonly used in epidemiological research. Thus, they have a crucial role in providing extensive information about tumor histology, stage at diagnosis, place and nature of the treatment, and survival [
The data sets and databases stored in PBCRs grow year on year. Data visualization is essential for exploring and communicating findings in medical research, especially in epidemiological surveillance. Hence, there is an intrinsic need for rapid raw data visualization. Navigating among descriptive analyses helps users understand the current situation and its historical context before they run the time-consuming predictive or prescriptive models needed to generate alarms, make accurate predictions, or discover hidden trends or patterns.
Previous literature has described the implementation of web platforms to analyze cancer-related data. Petrov and Alexeyenko [
Currently, PBCRs expend resources and time to extract, analyze, and present the data to gain insight into the incidence, mortality, and survival rates for cancer. Moreover, these insights are generated manually.
One approach to solving this limitation is to develop a generic platform based on microservices for PBCRs capable of generating interactive plots, tables, and statistics to determine the epidemiological cancer situation. To address this challenge, in this paper, we propose a platform capable of (1) navigation across time and feature-based data, (2) plotting aggregated and disaggregated data on demand, and (3) automatic integration of new data.
The core activities of the PBCR have expanded beyond the provision of data to perform epidemiological research or the provision of cancer reports and statistics for a region. The data in PBCRs are the basis for estimating the cancer burden and its trends over time and are crucial in the scheduling and evaluation of cancer control programs in the registration area. One of the simplest ways of tackling this problem is to use segregated information to convince authorities about which population segments need more or different attention. For instance, geographical heat maps can be used to spot differences across urban or rural areas, while age pyramids can highlight age group differences. This can help authorities to invest and generate personalized prevention campaigns.
In summary, in this article, we propose a seed to develop this platform. The main contributions are the presentation of a successful case study for Lleida PBCR and guidelines to evolve these into a reference that can be adopted by the community. The platform was designed to be differentiated by end user. One end user is the PBCR professional who analyzes the incidence of cancer in a specific region and makes decisions to research or prevent cancer. Another end user is the nonprofessional user who wants to know the cancer situation in his or her area.
The paper is structured as follows. The next section presents the methodology involved in designing and implementing the web platform. The Results section describes the different views implemented in this application and how the customization works. The presented data visualizations are related to cancer incidence, risk factors, and mortality. Finally, the results are discussed in the Discussion section, which also includes our conclusions.
The application is based on the model-view-controller pattern. For the visual part, we used the open-source programming language R [
Until the implementation of this application, PBCR professionals extracted the data manually on demand. Once the cases were received, they cleaned the data and prepared the tables and plots needed to analyze them. Finally, they added these results to a formal report sent to public health officials.
With the application deployed, the professionals can present the data to public health officials automatically. The data extraction and cleaning steps are done by an extract, transform, and load system deployed on a server; therefore, the professionals do not need to spend time preparing the data. In addition, the application permits real-time comparison of cancer cases across previous years. The following subsections show how the web application has been designed and implemented.
The front end was implemented using the Shiny [
All the plots were made using the plotly library [
The back end consisted of an API and a database for the web application. Both these services were encapsulated using the Docker system, which permits scalability to other infrastructures. The API established the communication between the database and the view. This system was implemented by NodeJS [
The front-end and back-end technologies were encapsulated into Docker containers. Docker is a platform designed to build, share, and run modern applications into containers [
These containers were defined using Docker Compose, which orchestrated all of them. It composes a set of components, each of which is an image and a set of options that specify what the component should have. It uses a configuration file where the user selects the parameters, and when it is executed, it runs the needed processes to build the Docker container. The user can reuse the same image for different components, and these images will be managed in other containers once instantiated [
The case data were extracted from the official Cancer Population Registry in Lleida and the Mortality Registry of Catalonia. Experts from the cancer registry had previously validated these cases to ensure the validity of each tumor record. In the case of mortality, the included individuals were those patients who died from cancer in the Lleida region. The cancer patient records were complemented with the patients' risk factors, extracted from the clinical history records at the time of diagnosis. This information permitted us to build the databases and display them in the front end.
The database was structured into 3 collections: Patients, Tumors, and Mortality. The Patients collection included sociodemographic information and risk factors; the Tumors collection included such information as the diagnosis and the kind of tumor. Finally, the Mortality collection registered sociodemographic information and cause of death (tumor list).
Database collections and their variables.
Collection | Variable | Specification
Patients | sex | Gender (man/woman)
Patients | data_naix | Date of birth (date)
Patients | postal_code | Postal code of city residence (number)
Patients | postal_desc | Name of city residence (characters)
Patients | comarca | Specific region in Lleida (characters)
Patients | comarca_desc | Specific region description in Lleida (characters)
Patients | alcoholism | Alcohol consumption (yes/no)
Patients | diabetes | Diabetes diagnosed (yes/no)
Patients | smoking | Smoking consumption (yes/no)
Patients | bmi | Body mass index (number)
Tumors | data_inc_pobl | Diagnoses date (date)
Tumors | ltum | Tumor location (characters)
Tumors | ltum_desc | Tumor location description (characters)
Tumors | morf | Tumor morphology (characters)
Tumors | morf_descr | Tumor morphology description (characters)
Tumors | metode_dx | Diagnostic method (number)
Tumors | metode_dx_descr | Diagnostic method description (characters)
Mortality | data_naix | Date of birth (date)
Mortality | data_def | Date of death (date)
Mortality | cause10 | Death cause (characters)
Mortality | cause10_desc | Death cause description (characters)
Mortality | sex | Gender (man/woman)
Mortality | comarca | Specific region in Lleida (characters)
Mortality | comarca_desc | Specific region description in Lleida (characters)
Mortality | yeard | Year of death (number)
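As an illustration of the three collections, the following minimal Python sketch lists the documented variables for each collection and checks a sample document against them. It is not the authors' code: the helper `missing_fields`, the `SCHEMAS` mapping, and every sample value (including the code `C50` and the comarca name) are assumptions made for this example only.

```python
from datetime import date

# Variable names are taken from the table of collections;
# the grouping into a SCHEMAS dict is a hypothetical convenience.
SCHEMAS = {
    "Patients": {
        "sex", "data_naix", "postal_code", "postal_desc", "comarca",
        "comarca_desc", "alcoholism", "diabetes", "smoking", "bmi",
    },
    "Tumors": {
        "data_inc_pobl", "ltum", "ltum_desc", "morf", "morf_descr",
        "metode_dx", "metode_dx_descr",
    },
    "Mortality": {
        "data_naix", "data_def", "cause10", "cause10_desc", "sex",
        "comarca", "comarca_desc", "yeard",
    },
}


def missing_fields(collection, document):
    """Return the documented variables that a document lacks (hypothetical helper)."""
    return sorted(SCHEMAS[collection] - set(document))


# A made-up patient document; no value here comes from the registry.
example_patient = {
    "sex": "woman",
    "data_naix": date(1955, 3, 14),
    "postal_code": 25001,
    "postal_desc": "Lleida",
    "comarca": "SEG",
    "comarca_desc": "Segrià",
    "alcoholism": False,
    "diabetes": True,
    "smoking": False,
    "bmi": 27.4,
}

print(missing_fields("Patients", example_patient))
print(missing_fields("Tumors", {"ltum": "C50", "ltum_desc": "Breast"}))
```

A check of this kind could run before new records are loaded, so that incomplete documents are caught before they reach the dashboard.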
All data were anonymized to protect patient privacy and confidentiality. The study was part of the public health response to the impact of cancer on society. It was approved by the Committee of Ethics and Clinical Research of Lleida (CEIC 21/190-P). As it was a retrospective cohort study and the patients were blinded to the investigators, no written informed consent was necessary according to the CEIC. All methods were carried out in accordance with relevant guidelines and regulations.
This web application consisted of an intuitive analytical web platform for rapid analysis of the population cancer registry data set, containing incidence, mortality, and risk factors related to tumor information. The application shows the incidence and evolution of each cancer during a specific period for gender and age groups. It also shows the situation of all cancers in a particular period and subregion of Lleida. The application also summarizes patients’ risk factors detected in the cancer registry and shows results about cancer mortality. These plots enable the number of cases to be analyzed for each year, filtered by tumor location, gender, and age group.
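The filtering described here (cases per year, narrowed by tumor location, gender, and age group) can be sketched in a few lines of Python. This is a hedged illustration rather than the application's implementation: the flattened record layout with `year`, `sex`, `ltum`, and `age` keys, the 5-year age groups with an open `85+` group, and the function names are all assumptions, and the sample cases are invented.

```python
from collections import Counter


def age_group(age, width=5, top=85):
    """Map an age in years to a label such as '60-64' or '85+' (assumed grouping)."""
    if age >= top:
        return f"{top}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def cases_per_year(cases, ltum=None, sex=None, group=None):
    """Count cases per diagnosis year, applying only the filters that are set."""
    counts = Counter()
    for case in cases:
        if ltum is not None and case["ltum"] != ltum:
            continue
        if sex is not None and case["sex"] != sex:
            continue
        if group is not None and age_group(case["age"]) != group:
            continue
        counts[case["year"]] += 1
    return dict(sorted(counts.items()))


# Invented sample records, one per diagnosed tumor.
cases = [
    {"year": 2016, "sex": "woman", "ltum": "C50", "age": 62},
    {"year": 2016, "sex": "man", "ltum": "C34", "age": 71},
    {"year": 2017, "sex": "woman", "ltum": "C50", "age": 64},
    {"year": 2017, "sex": "woman", "ltum": "C50", "age": 48},
]

print(cases_per_year(cases))
print(cases_per_year(cases, ltum="C50", sex="woman"))
```

Leaving a filter as `None` means "all values", which mirrors a dashboard control with no selection.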
The web application was designed as a web browser–based dashboard (see
Main menu of the web application.
Specific incidence view.
This view permits the risk factors’ impact on cancer patients to be analyzed.
Risk factors view.
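As a rough illustration of one statistic this view can report, the sketch below computes the share of patients with excess weight from the `bmi` variable. It assumes the common World Health Organization cut-off of a body mass index of 25 or higher; the source does not state the threshold used, and the function name and sample values are hypothetical.

```python
def excess_weight_share(patients, threshold=25.0):
    """Percentage of patients with a recorded BMI at or above the threshold.

    Patients without a recorded BMI are excluded from the denominator.
    Returns None when no BMI is recorded at all.
    """
    recorded = [p["bmi"] for p in patients if p.get("bmi") is not None]
    if not recorded:
        return None
    with_excess = sum(1 for bmi in recorded if bmi >= threshold)
    return 100.0 * with_excess / len(recorded)


# Invented sample: 4 recorded BMI values, 2 of them at or above 25.
sample_patients = [
    {"bmi": 22.0},
    {"bmi": 27.5},
    {"bmi": 31.0},
    {"bmi": None},
    {"bmi": 24.9},
]

print(excess_weight_share(sample_patients))
```

Excluding missing values from the denominator is a design choice; counting them as "no excess weight" would lower the reported share.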
The last implemented view shows an analysis of Lleida residents affected by tumors. In this case, the observed years were between 2012 and 2019 because the Mortality Register of Catalonia was already available for this time. Therefore, as
This view also contains 4 figures: 3 plots and 1 table. At the top left, a horizontal bar plot represents the 10 tumors with the most deaths. It is recalculated for the chosen period and gender; the cancer location filter does not affect it. On the right, an age pyramid plot shows the mortality in each age group by gender. This plot can also be recalculated by period (in years) and by cancer location. At the bottom, a table lists the tumor locations and the number of patients who died, sorted in descending order. The information is displayed for the chosen period and gender; the cancer location filter does not affect it. Finally, an evolution plot shows the increase or decrease in deaths for all locations or for specific tumors. This plot is recalculated depending on the chosen year, gender, or tumor location.
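The aggregations behind the top-10 bar plot and the evolution plot can be sketched as follows. This is an illustrative Python example under stated assumptions, not the application's code: it uses the `yeard`, `sex`, and `cause10` variables from the Mortality collection, assumes `cause10` holds ICD-10 codes, and the function names and sample deaths are invented.

```python
from collections import Counter


def top_causes(deaths, year_from, year_to, sex=None, n=10):
    """Rank causes of death by count for a period and an optional gender.

    The tumor location filter is deliberately not a parameter here,
    matching the description of the bar plot and the table.
    """
    counts = Counter(
        d["cause10"]
        for d in deaths
        if year_from <= d["yeard"] <= year_to and (sex is None or d["sex"] == sex)
    )
    return counts.most_common(n)


def deaths_per_year(deaths, cause10=None, sex=None):
    """Yearly death counts for the evolution plot, with optional filters."""
    counts = Counter(
        d["yeard"]
        for d in deaths
        if (cause10 is None or d["cause10"] == cause10)
        and (sex is None or d["sex"] == sex)
    )
    return dict(sorted(counts.items()))


# Invented sample records; the codes are ICD-10 examples (C34 lung, C50 breast, C18 colon).
deaths = [
    {"yeard": 2012, "sex": "man", "cause10": "C34"},
    {"yeard": 2013, "sex": "man", "cause10": "C34"},
    {"yeard": 2013, "sex": "woman", "cause10": "C34"},
    {"yeard": 2014, "sex": "woman", "cause10": "C50"},
    {"yeard": 2015, "sex": "woman", "cause10": "C50"},
    {"yeard": 2019, "sex": "man", "cause10": "C18"},
]

print(top_causes(deaths, 2012, 2019))
print(deaths_per_year(deaths, cause10="C34"))
```

Because `most_common` already returns causes in descending order of count, the same result could feed both the bar plot and the sorted table.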
Mortality view.
The research team designed the system for easy deployment. Therefore, users only need to follow these 2 steps:
Deploy the Mongo database by executing the docker-compose file. The system will download the Mongo image (if this is the first run), build the Docker container, and deploy the database. Finally, load the information to be shown in the dashboard web application.
Download the web application project and specify the user and password in the config.js file. Next, execute the docker-compose file to build the containers for the API system and the R Shiny application. On the first run, the system will download the images needed for these containers and then deploy the containers.
The research team designed and implemented a web application to rapidly analyze the cancer situation in the Lleida region. It contains information about the incidence of each cancer by subregion, related risk factors, and the cancer mortality registered in this region. The application can be used in computer and mobile browsers because it has been designed responsively. It has been implemented using open-source technologies such as Docker, MongoDB, NodeJS, and R Shiny, which permit easy deployment for cancer registries in other hospitals. The code is also free to download and can be deployed within 1 day.
Recently, new applications have been designed to facilitate the analysis of data sets. Some studies have suggested that the latest technologies can help to extract information and value from data rapidly and obtain results instantly in different contexts. Luz et al [
This system helped the research team rapidly analyze the cancer information and reach some conclusions about the data and the use of these technologies. Therefore, regarding cancer incidence, the analysis detected that the number of cases is higher in men than in women in all periods and years [
As the incidence showed, the risk factors view also provided the previous situation of patients with cancer. Regarding risky drinking, 2.2% of the patients diagnosed consumed high amounts of alcohol daily [
The cancer mortality registry permitted us to analyze the severity and impact of this disease, considered the second cause of death globally [
The application presents some strengths and limitations that should be noted. This kind of implementation increases the data’s potential and adds value to the cancer registries. It permits an analysis and comparison of cancer information trends in specific areas in real time and helps make decisions about public health and the impact of cancer. The risk factor situation among cancer patients suggests some associations between risk factors and cancer. The scalability of the technologies used helps to deploy them to other cancer registries. Regarding limitations, the map plot has to be adapted to the region where it is deployed. Inconsistencies between the cancer registry and the cancer mortality registry did not permit them to be merged and analyzed in depth. The codification of some risk factors suggested underdiagnosis. A future systematic link between the cancer registry and the primary care medical records could improve the recording of risk factors. Regarding the software, R Shiny presented some restrictions and incompatibilities with some new libraries, although these were replaced with others that were compatible and worked well. MongoDB initially required extra effort to learn, which delayed other parts of the application.
The web application discussed in this study offers an analytical model of population cancer information. In addition, the technologies used to build this system permit its deployment into other cancer registries. Although there are web applications based on similar technologies, none use population cancer registry data to show the cancer situation in a specific region.
The views presented in the platform show the incidence of cancer detected in a specific time and particular areas, allowing it to be filtered by such inputs as year, gender, and tumor location. It also shows the evolution of cancer in the years analyzed. In addition, it studies the impact of some risk factors among the patients in the registry. Finally, it permits users to explore cancer mortality and its evolution in the Lleida region, filtering by year, gender, and tumor location.
Regarding future work, the research team is designing new views to analyze cancer incidence and the impact of the second primary tumor in depth. They are also creating a new risk factor view that filters the risk factors by gender and tumor location, and they are integrating treatment data, such as radiotherapy and chemotherapy. Finally, new web views are being created to build machine learning algorithms, train models, and analyze the results.
API: application programming interface
CEIC: Committee of Ethics and Clinical Research of Lleida
CI5: Cancer Incidence in Five Continents
PBCR: population-based cancer registry
This work was supported by contract 2019-DI-43 from the Industrial Doctorate Program of the Government of Catalonia and by the Spanish Ministry of Economy and Competitiveness under contract PID2020-113614RB-C22. Some of the authors are members of the research group 2014-SGR163, funded by the Generalitat de Catalunya.
The authors wish to thank the Arnau de Vilanova University Hospital, Santa Maria University Hospital, and the Catalan Health Service in Lleida for the support and resources to conduct this study.
The data set is available from the corresponding author upon reasonable request.
None declared.