Threshold Selection for Peak Over Threshold Models Using Logistic Regression
Abstract
Threshold selection remains a critical challenge in applying Extreme Value Theory (EVT), particularly in actuarial science, where accurate modeling of extreme insurance claims is vital for solvency assessment, capital adequacy, and reinsurance pricing. Traditional graphical tools such as the mean residual life (MRL) plot depend heavily on visual interpretation and expert judgment, which limits reproducibility and consistency across practitioners. This paper proposes a machine learning approach that uses logistic regression to assign extremeness probabilities to insurance claims. The model is trained on labels generated by an initial quantile-based rule that classifies claims above the 90th percentile as extreme. The optimal threshold is then taken as the smallest claim amount whose predicted probability exceeds a predefined cut-off (e.g., $90\%$). This probability-based rule provides an objective alternative to visual diagnostics and agrees well with classical EVT tools such as the MRL plot. After the exceedances above the selected threshold are identified, a Generalized Pareto Distribution (GPD) is fitted to them by maximum likelihood estimation. Tail risk measures, including Value at Risk (VaR) and Expected Shortfall (ES), are then computed at the $99\%$ confidence level to quantify the severity of potential extreme losses. The proposed framework is interpretable, reproducible, and readily applicable in actuarial workflows, offering a more consistent and automated approach to tail risk modeling.
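The pipeline summarized above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes NumPy and SciPy, synthetic Pareto-distributed claims, the standardized log claim amount as the only covariate, and an L2 penalty on the logistic slope. The function names `fit_penalized_logistic`, `select_threshold`, and `gpd_var_es` and the penalty weight `lam` are illustrative choices. The penalty matters in this single-feature setting because labels defined by a quantile of the same variable are perfectly separable, so the unpenalized maximum likelihood estimate of the slope does not exist.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import genpareto


def fit_penalized_logistic(z, y, lam=1e-3):
    """Logistic regression of y on z with an L2 penalty on the slope only.

    Assumption of this sketch: the penalty keeps the slope finite, because
    quantile-based labels on the same feature are perfectly separable.
    """
    X = np.column_stack([np.ones_like(z), z])
    n = len(y)

    def objective(beta):
        eta = X @ beta
        # Mean negative log-likelihood; logaddexp(0, eta) = log(1 + exp(eta)), computed stably.
        loss = np.mean(np.logaddexp(0.0, eta) - y * eta) + 0.5 * lam * beta[1] ** 2
        grad = X.T @ (expit(eta) - y) / n
        grad[1] += lam * beta[1]
        return loss, grad

    return minimize(objective, np.zeros(2), jac=True, method="L-BFGS-B").x


def select_threshold(claims, label_quantile=0.90, cutoff=0.90, lam=1e-3):
    """Smallest claim whose predicted extremeness probability exceeds `cutoff`."""
    # Initial quantile rule: claims above the 90th percentile are labeled extreme.
    y = (claims > np.quantile(claims, label_quantile)).astype(float)
    log_c = np.log(claims)
    z = (log_c - log_c.mean()) / log_c.std()  # hypothetical feature: standardized log claim
    beta = fit_penalized_logistic(z, y, lam)
    p_hat = expit(beta[0] + beta[1] * z)
    flagged = claims[p_hat > cutoff]
    if flagged.size == 0:
        raise ValueError("no claim exceeds the probability cut-off")
    return flagged.min()


def gpd_var_es(u, xi, sigma, n, n_u, level=0.99):
    """Peaks-over-threshold VaR and ES at `level` from a GPD fitted above u."""
    tail = (n / n_u) * (1.0 - level)
    if abs(xi) < 1e-8:  # exponential (xi -> 0) limit
        var = u - sigma * np.log(tail)
        return var, var + sigma
    if xi >= 1:
        raise ValueError("ES is infinite for xi >= 1")
    var = u + (sigma / xi) * (tail ** (-xi) - 1.0)
    es = (var + sigma - xi * u) / (1.0 - xi)
    return var, es


# Synthetic, hypothetical claims: classical Pareto with minimum 1000 and tail index 3 (xi = 1/3).
rng = np.random.default_rng(0)
claims = 1000.0 * (1.0 + rng.pareto(3.0, size=5000))

u = select_threshold(claims)
excess = claims[claims > u] - u
xi, _, sigma = genpareto.fit(excess, floc=0)  # MLE; SciPy's shape c plays the role of xi
var99, es99 = gpd_var_es(u, xi, sigma, n=claims.size, n_u=excess.size)
print(f"u={u:.0f}  exceedances={excess.size}  xi={xi:.3f}  VaR99={var99:.0f}  ES99={es99:.0f}")
```

The risk measures use the standard peaks-over-threshold estimators $\mathrm{VaR}_p = u + \frac{\sigma}{\xi}\left[\left(\frac{n}{N_u}(1-p)\right)^{-\xi} - 1\right]$ and $\mathrm{ES}_p = \frac{\mathrm{VaR}_p + \sigma - \xi u}{1-\xi}$ for $\xi < 1$. Here $n$ is the total number of claims and $N_u$ the number of exceedances. The VaR formula is only meaningful when $1-p < N_u/n$, that is, when the target quantile lies above the threshold.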
This work is licensed under a Creative Commons Attribution 4.0 International License.