Threshold Selection for Peak Over Threshold Models Using Logistic Regression

Temitope Comfort Iroko; Iliyasu Tukur; Victor Adeyanju

doi:10.34198/ejms.15625.10211036

Temitope Comfort Iroko, Department of Mathematics, University of Wisconsin-Milwaukee, USA
Iliyasu Tukur, Helpman Development Institute, Abuja, Nigeria
Victor Adeyanju, Department of Mathematics, Tai Solarin University of Education, Nigeria

Keywords: extreme value theory, threshold selection, logistic regression, generalized Pareto distribution, tail modeling

Abstract

Threshold selection remains a critical challenge in the application of Extreme Value Theory (EVT), particularly in actuarial science, where accurate modeling of extreme insurance claims is vital for solvency, capital adequacy, and reinsurance pricing. Traditional graphical tools, such as the mean residual life (MRL) plot, depend heavily on visual interpretation and expert judgment, which limits reproducibility and consistency across practitioners. This paper proposes a machine learning approach that uses logistic regression to assign extremeness probabilities to insurance claims. The model is trained on labels generated from an initial quantile-based rule, which classifies claims above the 90th percentile as extreme. The optimal threshold is taken to be the smallest claim amount whose predicted probability exceeds a predefined cut-off (e.g., $90\%$). This probability-based rule provides an objective alternative to visual diagnostics and aligns well with classical EVT tools such as the MRL plot. After the exceedances above the selected threshold are identified, a Generalized Pareto Distribution (GPD) is fitted to them by maximum likelihood estimation. Tail-based risk measures, including Value at Risk (VaR) and Expected Shortfall (ES), are then computed at the $99\%$ confidence level to quantify the severity of potential extreme losses. The proposed framework is interpretable, reproducible, and readily applicable in actuarial workflows, offering a more consistent and automated solution for tail risk modeling.
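
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the claims are synthetic lognormal draws, the logistic regression is fitted by plain gradient descent with a small L2 penalty on the slope (the quantile-based labels are perfectly separable in the claim amount, so an unpenalized fit would not converge), and VaR and ES use the standard peaks-over-threshold estimators, which the abstract does not spell out. All variable names (`claims`, `cutoff`, `l2`, and so on) are illustrative.

```python
import numpy as np
from scipy.stats import genpareto

# Synthetic stand-in for insurance claims (assumption: not the paper's data).
rng = np.random.default_rng(0)
claims = rng.lognormal(mean=0.0, sigma=1.0, size=2000)
n = len(claims)

# Step 1: initial labels from the quantile rule (above the 90th percentile = extreme).
q90 = np.quantile(claims, 0.90)
y = (claims > q90).astype(float)

# Step 2: logistic regression of the label on the standardized claim amount.
# Assumption: gradient descent with a hypothetical L2 penalty on the slope only.
z = (claims - claims.mean()) / claims.std()
X = np.column_stack([np.ones(n), z])
beta = np.zeros(2)
l2 = np.array([0.0, 1.0 / n])  # the intercept is left unpenalized
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    beta -= X.T @ (p - y) / n + l2 * beta  # unit step size
prob = 1.0 / (1.0 + np.exp(-X @ beta))

# Step 3: threshold = smallest claim whose predicted probability exceeds the cut-off.
cutoff = 0.90
u = claims[prob > cutoff].min()

# Step 4: fit a GPD to the excesses over u by maximum likelihood (location fixed at 0).
excess = claims[claims > u] - u
n_u = len(excess)
xi, _, sigma = genpareto.fit(excess, floc=0)

# Step 5: standard POT estimators of VaR and ES at the 99% level (ES requires xi < 1).
level = 0.99
var_99 = u + sigma / xi * ((n / n_u * (1 - level)) ** (-xi) - 1)
es_99 = var_99 + (sigma + xi * (var_99 - u)) / (1 - xi)

print(f"threshold u = {u:.3f}, exceedances = {n_u}")
print(f"xi = {xi:.3f}, sigma = {sigma:.3f}")
print(f"VaR(99%) = {var_99:.3f}, ES(99%) = {es_99:.3f}")
```

Because the labels come from a deterministic rule on the same variable used as the predictor, the fitted probability curve, and therefore the selected threshold, depends on the penalty strength and on the probability cut-off. In this sketch both are tuning choices rather than values taken from the paper.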

E-ISSN: 2581-8147