## Abstract

### Background

Antimicrobial resistance acts as a global problem in many regions of the world. The prevention and treatment of modern medicine are becoming ineffective. Governments all over the world are working effortlessly to overcome this problem and there is a requirement for extra care by the government and healthcare delivery systems to strengthen antimicrobial policy and standardize treatment guidelines.

### Objective

This study aims to forecast the antimicrobial resistance rate for the future and simultaneously to bring awareness of new time-series proportion models available to model the rate/proportion data in the field of clinical and public health by taking an example of antimicrobial resistance rate data.

### Methods

Data on

*Escherichia coli*isolated from blood cultures showing variable susceptibility to different antimicrobial agents has received from a clinical microbiology laboratory of tertiary care hospital, Manipal, Karnataka, between the years June 2015 and December 2019. Beta auto-regressive moving average model is used to forecast the antimicrobial resistance rate data. To help non-statisticians an R shiny app named BARMA. app has been developed for the same.### Results

A resistance rate of a total of 55-time points was used to forecast the resistance rate of

*E.coli*to the antimicrobial Amoxicillin-clavulanic acid. On average, the forecasted resistance rate is 57% (50%–65%).### Conclusion

Forecasting of antimicrobial resistance rate can help to alert healthcare policymakers to have appropriate precautionary measures and to attain the sustainable development goals (SDGs).

## Keywords

## 1. Introduction

According to the 2018 study of WHO's Global Antimicrobial Resistant Surveillance System (GLASS) the most common pathogens showing resistance reported in 22 out of 52 countries are

^{1}

*Escherichia coli*,*Klebsiella pneumoniae*,*Streptococcus pneumoniae*,*Staphylococcus aureus,*and*Salmonella*spp. The resistance map by the Center for Disease Dynamics, Economics and Policy (CDDEP) gives a better idea of the percentage of resistance of each pathogen to each antibiotic in different countries. According to the study by the Indian Council of Medical Research (ICMR),*E.coli*is highly resistant to Fluoroquinolones followed by Cephalosporins (84% and 77% out of 1619 isolates tested in the year 2017) and resistance among*K. Pneumoniae*is higher towards Fluoroquinolones and Cephalosporins (69% and 68% out of 1497 isolates tested in the year 2017). As explained by Lisa Jackson in her study*E.coli*doesn't cause a major problem in the urinary tract, but if it infects blood it can cause bloodstream infection leading to a septic shock.Bloodstream infection or bacteremia is the presence of bacteria in the blood. This infection can frequently be expected in a single human being undergoing major surgery and despite taking advanced precautions for infections the development of antimicrobial resistance (AMR) to bacteremia is increasing day by day. AMR is a serious public health concern worldwide. Compared to Western continents the resistance rate of bacterial and fungal pathogens is apical in Asian and African continents due to irregular strategies of antimicrobial usage and standard health care facilities. AMR is also acting as a major drawback to attaining sustainable development goals (SDGs). Without the development of antimicrobial resistance, the attainment of the SDGs would fail to reach its target. So, it is important to track the antimicrobial resistance in SDG 3 (SDG 3: Ensure healthy lives and promote well-being for all at all ages).

^{4}

Box-Jenkins models are one of the most frequently used time-series models in the applied area of Statistics. Concerning the violations of normality assumptions, many researchers proposed generalized linear and non-linear models. To model count data Poisson or Neg-binomial models are more appropriate

^{6}

^{[}^{,}^{8}

^{]}. Whereas in-case of rate/proportion data Beta or Kumaraswamy models^{[}^{9}

^{,}^{10}

^{,}^{11}

^{,}^{,}^{13}

^{]}are accurate to use.AMR is a growing problem in the area of medical and public health worldwide. Due to the resistance of microorganisms to the most antimicrobial agent, treating these infections is challenging. There is an urgent requirement for change in the policy of infectious disease control set-up to achieve the SDGs (Sustainable Development Goals). Future forecasts using time-series models can help in one way to alert the system. This study focuses on modeling AMR rate data using the Beta auto-regressive moving average (BARMA) model used for future forecasting.

This article focuses on bringing awareness of a suitable time-series model for proportion/rate data and to help non-statisticians an R shiny app has been developed to fit the Beta ARMA model.

## 2. Methods

### 2.1 Study design and ethical approval

A retrospective study was conducted in clinical microbiology laboratory of tertiary care hospital, Manipal, Karnataka, between the years June 2015 and December 2019. An Institutional ethical clearance was obtained from Kasturba medical college and Kasturba hospital Institutional ethics committee (IEC no. 832/2019).

### 2.2 Study procedure and inclusion criteria

AMR data on

*Escherichia coli*(*E.coli*) isolated from blood cultures showing variable susceptibility to different antimicrobial agents has received. Antimicrobial susceptibility tests were performed using Vitek −2 automated method beyond 2015, which is accepted and standardized test methods according to Clinical Laboratory Standards Institute (Ref: CLSI supplement M100. Wayne, PA: Clinical and Laboratory Standards Institute; 2012). Laboratory of microbiology had tested*E. coli*against different antimicrobial agents. The resistance proportion is calculated as the number of isolates of*E. coli*resistant to a particular antimicrobial agent/total number of*E. coli*isolates were tested during that time interval.There were total of twelve antimicrobial agents of which only amoxicillin-clavulanic acid (AMC) has been considered for the study since the series is time-dependent.

### 2.3 Statistical analysis

Exploratory data analysis (EDA) has been carried out to understand the characteristics of the antimicrobial resistance rate data. Beta Auto-regressive moving average model has been used to forecast the antimicrobial resistance rate for the future. Akaike information criterion (AIC) and Bayesian information criterion (BIC) along with mean absolute percentage error (MAPE) and mean square error (MSE) are used to select the best model for the given data. EDA and model fit along with forecasting were conducted using the software R. R Shiny web app has provided for the same.

#### 2.3.1 Beta autoregressive moving average model

Rocha and Cribari-Neto (2009) proposed time-series model to analyze the time-series data which is continuous and Beta distributed. This study used the conditional probability density function.

Let

Where, the mean of the distribution is E(z

*Z*_{t}, t = 1, 2…, n be a random variable and assumes that it depending on past information set*F*_{t}.$\mathrm{f}\phantom{\rule{0.25em}{0ex}}\left({\mathrm{z}}_{\mathrm{t}}|{F}_{t-1}\phantom{\rule{0.25em}{0ex}}\right)=\frac{\mathrm{\u0490}\phantom{\rule{0.25em}{0ex}}\left(\mathrm{\rho}\right)}{\mathrm{\u0490}\left({\mathrm{\mu}}_{\mathrm{t}}\mathrm{\rho}\right)\left(\left(1-{\mu}_{\mathrm{t}}\right)\mathrm{\rho}\right)}\phantom{\rule{0.25em}{0ex}}{\mathrm{z}}_{\mathrm{t}}\phantom{\rule{0.25em}{0ex}}\left({\mathrm{\mu}}_{\mathrm{t}}\phantom{\rule{0.25em}{0ex}}\mathrm{\rho}-1\right)\phantom{\rule{0.25em}{0ex}}\left(1-{\mathrm{z}}_{\mathrm{t}}\right)\phantom{\rule{0.25em}{0ex}}\left(1-{\mathrm{\mu}}_{\mathrm{t}}\right){\phantom{\rule{0.25em}{0ex}}}_{\phantom{\rule{0.25em}{0ex}}}^{\mathrm{\rho}-1}\phantom{\rule{0.25em}{0ex}}\phantom{\rule{0.25em}{0ex}}0\phantom{\rule{0.25em}{0ex}}<\phantom{\rule{0.25em}{0ex}}{\mathrm{z}}_{\mathrm{t}}\phantom{\rule{0.25em}{0ex}}<\phantom{\rule{0.25em}{0ex}}1$

Where, the mean of the distribution is E(z

_{t}|*F*_{t−1}) = μ_{t}and variance is V (z_{t}|*F*_{t−1}) = V (μ_{t})/(1 +ρ); here V (μ_{t}) = μ_{t}(1-μ_{t}) and ρ is the precision parameter.Then the Beta ARMA model is,

Where, p and q are the order (lag) of autoregressive and moving average terms and the mean and variance of error term is,

$\mathrm{g}\left({\mathrm{\mu}}_{\mathrm{t}}\right)={\mathrm{\u0572}}_{\mathrm{t}}=\mathrm{\alpha}+{\mathrm{x}}_{\mathrm{t}}\text{'}\phantom{\rule{0.25em}{0ex}}\mathrm{\beta}+{\mathrm{\u01a9}}_{i=1}^{p}{\mathrm{\phi}}_{\mathrm{i}}\left\{\mathrm{g}\left({\mathrm{z}}_{\mathrm{t}-\mathrm{i}}\right)-{\mathrm{x}}_{\mathrm{t}-\mathrm{i}}\text{'}\phantom{\rule{0.25em}{0ex}}\mathrm{\beta}\right\}+{\mathrm{\u01a9}}_{j=1}^{q}\phantom{\rule{0.25em}{0ex}}{\mathrm{\theta}}_{\mathrm{j}}{\mathrm{\u03f5}}_{\mathrm{t}-\mathrm{j}}$

Where, p and q are the order (lag) of autoregressive and moving average terms and the mean and variance of error term is,

$\mathrm{E}\left({\mathrm{\u03f5}}_{\mathrm{t}}\right)=\mathrm{E}\left({\mathrm{z}}_{\mathrm{t}}-{\mathrm{\mu}}_{\mathrm{t}}\left|{\mathrm{F}}_{\mathrm{t}-1})=\mathrm{E}(\mathrm{g}\left({\mathrm{z}}_{\mathrm{t}}\right)-{\mathrm{\u0572}}_{\mathrm{t}}\right|{\mathrm{F}}_{\mathrm{t}-1}\right)=0$

$\mathrm{V}\left({\mathrm{\u03f5}}_{\mathrm{t}}\right)=\mathrm{V}\left({\mathrm{z}}_{\mathrm{t}}-{\mathrm{\mu}}_{\mathrm{t}}\left|{F}_{t-1})=\mathrm{V}(\mathrm{g}\left({\mathrm{z}}_{\mathrm{t}}\right)-{\mathrm{\u0572}}_{\mathrm{t}}\right|{F}_{t-1}\right)={\left(\mathrm{g}\text{'}\left({\mathrm{\mu}}_{\mathrm{t}}\phantom{\rule{0.25em}{0ex}}\right)\right)}^{2}\phantom{\rule{0.25em}{0ex}}\mathrm{V}\left({\mathrm{\mu}}_{\mathrm{t}}\right)/\left(1+\mathrm{\rho}\right)$

#### 2.3.2 Forecast evaluation criteria

In many cases, the MAPE (mean absolute percentage error) is used for model selection or comparison. However, when data are close to zero other forecast measurement criteria can be used such as MAE (mean absolute error), MSE (mean square error), and RMSE (root mean square error). In the case of time-series analysis ACF (Auto-correlation function), plot is an appropriate one.

### 2.4 R shiny web application

Shiny is a statistical package in R used to develop interactive website applications directly from the R studio. To create this app one doesn't require knowledge of web programming. This app can also extend to JavaScript actions, htmlwidgets and CSS themes. Apps developed by shiny are held in a script called app. R and we can create this app in R studio by going through, File - New File.

- Shiny Web App … Which will create a demo template of codes for the R shiny app with three components say, user interface object (controls the layout and appearance of widgets on the app), server function (holds the instructions that the computer requires to build the app), and a call to the shiny app function (creates app objects from a pair of user interface and server).

Generally, the display or layout of a user interface is divided into two parts say, mainPanel and sidebdarPanel. The sidebar panel displays the special instructions or widgets such as drop-downs or slider or checkbox to the users and the main panel usually display the result outputs. There are different functions available in R under the package shiny to create different widgets. To know more click here.

## 3. Results

An AMR data on

*E. coli*isolated from blood cultures showing variable susceptibility result to different antimicrobial agents for the year June 2015 and December 2019 has obtained from clinical microbiology laboratory of tertiary care hospital, Manipal, Karnataka. The data includes information of a monthly number of*Escherichia coli*isolates and the total number of isolates resistant to the antimicrobial AMC. The rate of resistance has been calculated for the same.The summary of the data along with the time plot has shown in Table 1 and Fig. 1. Auto-correlation function (ACF) and partial auto-correlation function (PACF) graph of the data has shown in Fig. 2, which helped in selecting a maximum lag for AR and MA terms. To check for the stationarity in a series Augmented Dickey-Fuller (ADF) test has carried out and the results of the test are represented in Table 2. As the p-value (0.01311) is less than the 5\% level of significance, we conclude the series is stationary (see Table 3).

Table 1Summary statistics for the data on rate of

*Escherichia coli*resistant to antimicrobial AMC.antimicrobials | Sample size | Range | Mean | Std. deviation | Skewness | Min | Max | Median |
---|---|---|---|---|---|---|---|---|

AMC | 55 | 0.393 | 0.579 | 0.105 | 0.103 | 0.407 | 0.8 | 0.583 |

Table 3Fitted βARMA model for resistance rate data for the antibiotic AMC.

Parameter | Estimate | Std. error | Z stat | p-value |
---|---|---|---|---|

α | 0.107 | 0.026 | 4.017 | 0.0001 |

${\phi}_{5}$ | 0.836 | 0.025 | 32.685 | 0.000 |

${\phi}_{\mathit{5}}$ | −0.264 | 0.017 | 15.783 | 0.000 |

θ_{3} | -2.644 | 0.350 | 7.545 | 0.000 |

θ_{6} | 1.165 | 0.319 | 3.642 | 0.0003 |

ρ | 83.264 | 17.849 | 4.665 | 0.000 |

To explore the effect of lagged series on the current resistance rate, Beta Auto-regressive moving average model has fitted to the data. A software R has been used for the analysis. The analysis has been conducted by dividing the time-series data into training and testing sets for forecast evaluation. Out of 55 monthly time points last 6 observations are considered as testing sets and following forecast evaluation measurement has been used in Table 5.

Here, e(h) is the forecast error, i.e. e(h) = (testing set series (h) – forecasted series (h)) and h is the forecast lead. ${Z}_{n+h}$ is the testing series.

$\mathrm{M}\mathrm{S}\mathrm{E}=\frac{1}{6}\sum _{h=1}^{6}{e\left(h\right)}^{2}$

$\mathrm{M}\mathrm{A}\mathrm{P}\mathrm{E}=\frac{100}{6}\sum _{h=1}^{6}|e\left(h\right)/{Z}_{n+h}|$

$\mathrm{M}\mathrm{A}\mathrm{E}=\frac{1}{6}\phantom{\rule{0.25em}{0ex}}\sum _{h=1}^{6}|e\left(h\right)|$

Here, e(h) is the forecast error, i.e. e(h) = (testing set series (h) – forecasted series (h)) and h is the forecast lead. ${Z}_{n+h}$ is the testing series.

The βARMA model has fitted to the original time-series whereas to fit the ARIMA model the proportion data has converted to normal by using arc sine transformation. There were no major changes in the ACF and PACF plot with and without transformation.

From ACF and PACF plots we can observe, the 10th lag in the ACF plot, 8th lag in the PACF plot are significant before the first drop to the non-significance. As there is no exponential decaying pattern in ACF and PACF plot the following models were fitted to the data.

**Model 1:**In the βARMA model as the lagged terms for AR and MA are open to mention in the software R, we considered only the significant lagged variable from the ACF and PACF plot (Fig. 2). The model is,

${\eta}_{t}=logit\left({\mu}_{t}\right)=\alpha +{\phi}_{8}g\left({z}_{t-8}\right)+{\theta}_{10}{\epsilon}_{t-10}$

After fitting a model to the data as the 10th lagged variable of MA was insignificant, we decided the final model as,

${\eta}_{t}=logit\left({\mu}_{t}\right)=\alpha +{\phi}_{8}g\left({z}_{t-8}\right)$

**Model 2:**In the βARMA model instead of considering only significant variables observed in Fig. 2, we considered the maximum order for (p, q) as (8, 10) and a backward selection procedure has been used to select the model with significant lagged variables. The selected model is,

${\eta}_{t}=logit\left({\mu}_{t}\right)=\alpha +{\phi}_{3}g\left({z}_{t-3}\right)+{\phi}_{5}g\left({z}_{t-5}\right)+{\theta}_{3}{\epsilon}_{t-3}\phantom{\rule{0.25em}{0ex}}+{\theta}_{6}{\epsilon}_{t-6}$

**Model 3**: As we have to provide maximum lag value in order for p and q to fit the ARIMA model using R, from Fig. 2 we decided p = 8 and q = 1 due to convergence problems in the higher order of q. Arc sine transformation has been used to transform the proportion data into normal.

${\mathrm{Z}}_{\mathrm{t}}=\mathrm{\alpha}+\sum _{i=1}^{p}{\phi}_{i}{z}_{t-i}-\sum _{j=1}^{q}{\theta}_{j}{z}_{t-j}$

The best model among three was decided on the basis of forecast evaluation criteria.

From the residual plot (Fig. 3) we can infer, Model 2 and Model 3 follows white noise assumptions, where Model 1 is not as it shows auto-correlation in the residual series. The auto-correlation in residual series is cross verified by using the Ljung-Box test and from Table 4 we conclude the residuals of Models 2 & 3 are independently distributed. Model selection criteria have then been carried out between Models 2 and 3 to select the best model. The forecasted series from Model 2 are 0.537, 0.595, 0.554, 0.641, 0.568, and 0.614. The forecasted series from Model 3 are 0.515, 0.598, 0.578, 0.664, 0.584, and 0.633. The testing series is 0.545, 0.592, 0.700, 0.469, 0.667, 0.600. The following series is also represented in the form of a plot in Fig. 4 for better understanding. From the forecast evaluation (Fig. 4 and Table 5) we chose Model 2 as best among both. Thus we conclude Model 2 fits well for AMC data and Fig. 5 represents the fitted versus observed data along with a forecasted series. Regression diagnostic plots have been provided (Fig. 6) to show that Model 2 fits well with the AMC data.

Table 4Ljung Box Test results.

Model | X-squared | lag | p-value |
---|---|---|---|

Model 1 | 11.563 | 3 | 0.009 |

Model 2 | 3.1134 | 1 | 0.077 |

Model 3 | 5.2828 | 10 | 0.8715 |

Table 5Model selection criteria.

Model | MAPE | MAE | MSE |
---|---|---|---|

Model 2 | 12.40 | 0.014 | 0.008 |

Model 3 | 13.90 | 0.0781 | 0.0103 |

The estimated model is,

Model 2,

where, (

$\stackrel{\u02c6}{\mu}{\phantom{\rule{0.25em}{0ex}}}_{t}^{\phantom{\rule{0.25em}{0ex}}}=\frac{{e}^{\alpha +{\phi}_{3}g\left({{z}_{t}}_{-3}\right)+{\phi}_{5}g\left({{z}_{t}}_{-5}\right)+{\theta}_{3}{\epsilon}_{t-3}\phantom{\rule{0.25em}{0ex}}+{\theta}_{6}{\epsilon}_{t-6}}}{1+{e}^{\alpha +{\phi}_{3}g\left({{z}_{t}}_{-3}\right)+{\phi}_{5}g\left({{z}_{t}}_{-5}\right)+{\theta}_{3}{\epsilon}_{t-3}\phantom{\rule{0.25em}{0ex}}+{\theta}_{6}{\epsilon}_{t-6}}}$

where, (

*α*ˆ*, φ*^{ˆ}_{3}*, φ*^{ˆ}_{5}*,**θˆθˆ3*, θ^{ˆ}_{6}_{,}*ρ*^{ˆ}) = (0.107, 0.836, -0.264, -2.644, 1.165, 83.264). The forecasted series for the year 2020 for the first 6 months is, 0.619, 0.526, 0.608, 0.540, 0.552 and 0.585. Which is on average 57% with a confidence interval (50%–65%).The R code of the fitted BARMA model used in this article is available hereand R shiny web app can be accessed at BARMA.app.

## 4. Discussion

The use of antimicrobial drugs has brought benefits to mankind. However, it also has an increased threat to the development of antimicrobial resistance. Infectious diseases are not static, they spread rapidly. The effectiveness of potential control measures can be understood by studying the disease dynamics. Patients unnecessarily exposed to antibiotics may face serious adverse drug reactions with no clinical benefits. A growing threat to public health is the misuse of antibiotics leading to the growing problem of antibiotic resistance.

In this study, a total of 55-time points were used to forecast the antimicrobial resistance rate of

*E. coli*to the antimicrobial agent Amoxicillin-clavulanic acid. The study found on average the forecasted resistance rate is 57% (50%–65%) for the next six months. Thus, forecasting of AMR rate can help healthcare policymakers to have appropriate precautionary measures and to attain sustainable development goals (SDGs).Antibiotic stewardship programs are considered to be an approach to reducing resistance rates in hospitals. The availability of research information on

*E. coli*resistance is an important step closer to designing focused strategies to address the worldwide AMR crisis.Study includes few limitations. First, the time-series analysis conducted only for one antimicrobial agent whose resistance rate falls in the interval (0, 1) and where the series is time dependent. Secondly, the most important regressor variable antibiotic consumption is not included in the study due to insufficient information to conduct time-series analysis.

## 5. Conclusion

Our study describes the pattern of antimicrobial resistance rate and the appropriate time-series model to forecast the same. There is a need to expand and strengthen antimicrobial policy, standard treatment guidelines, and a country-wide plan for the containment of AMR. To find a solution to this emerging threat it is necessary to allow antimicrobial resistance within the sustainable development goals agenda (SDG 3) before it turns into a global crisis.

This study recommends using the Beta ARMA model to fit rate/proportion data. The developed BARMA. app will surely help the non-statisticians to fit time-series model to their data without the help of Statisticians or computer coding.

## Author roles

JL: Methodology, Investigation, Data curation, Formal Analysis, writing original draft and reviewing and editing; AK: Conceptualization, Methodology and reviewing and editing; VKE: Investigation, Data curation, writing -reviewing and editing.

## Ethics statement

The study was conducted after taking approval from the Kasturba medical college and Kasturba hospital Institutional ethics committee (IEC no. 832/2019).

## Code and data availability

Will be provided on request.

## Sources of funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

## Declaration of competing interest

Authors have no conflicts of interest to disclose.

## Acknowledgment

The author JL would like to acknowledge the support given by the Manipal Academy of Higher Education for providing the TMA Pai Endowment fellowship.

## References

- High Levels of Antibiotic Resistance Found World- Wide, New Data Shows Bangkok: Media centre.(29)January 2018 (Available from)
- Economics Policy. ResistanceMap: Antibiotic Resistance.2018 ([May 31, 2018]. Available from:)
- A common bacterium leads to serious blood infection in many seniors,.
*New Study Finds.*May 2015; 4 (Available from:) - The sepsis syndrome, multi-organ failure: a plea for com- parable definitions.
*Ann Intern Med.*1991; 114 (Available from:): 332 - Antimicrobial resistance—a threat to the world's sustainable development.
*Ups J Med Sci.*2016 Jul 2; 121: 159-164 - Time Series Analysis: Forecasting and Control.John Wiley Sons, 2015 May 29
- Regression Analysis of Count Data.Cambridge University Press, 1998
- Observation-driven models for Poisson counts.
*Biometrika.*2003; 90: 777-790 - Beta regression for modelling rates and proportions.
*J Appl Stat.*2004 Aug 1; 31: 799-815 - The Kumaraswamy distribution: median-dispersion re-parameterizations for regression modeling and simulation-based estimation.
*Stat Pap.*2013 Feb 1; 54: 177-192 - Kumaraswamy regression model with Aranda-Ordaz link func- tion.
*Test.*2020 Jan 23; : 1-21 - Beta autoregressive moving average models.
*Test.*2009 Nov 1; 18: 529 - Kumaraswamy autoregressive moving average models for double bounded environmental data.
*J Hydrol.*2017 Dec 1; 555: 385-396

## Article info

### Publication history

Published online: March 29, 2023

Accepted:
March 27,
2023

Received in revised form:
March 6,
2023

Received:
November 4,
2022

### Identification

### Copyright

© 2023 The Author(s). Published by Elsevier B.V. on behalf of INDIACLEN.

### User license

Creative Commons Attribution – NonCommercial – NoDerivs (CC BY-NC-ND 4.0) | How you can reuse

Elsevier's open access license policy

Creative Commons Attribution – NonCommercial – NoDerivs (CC BY-NC-ND 4.0)

## Permitted

### For non-commercial purposes:

- Read, print & download
- Redistribute or republish the final article
- Text & data mine
- Translate the article (private use only, not for distribution)
- Reuse portions or extracts from the article in other works

## Not Permitted

- Sell or re-use for commercial purposes
- Distribute translations or adaptations of the article

Elsevier's open access license policy