Tweedie regression on insurance claims example #17200
As said in the example, … Then, there is #15244 for adding a D² score function, analogous to R². Concerning your third question: I, personally, am not a big fan of the Gini index/AUC. Especially in a regression setting like this example, I would strongly advise against using AUC for cross-validation, because it is not a strictly consistent scoring function for the expectation of y, which is what we want to predict (see https://arxiv.org/abs/0912.0902). If by "relativities" you mean the coefficients or weights of the GLMs, those are accessible via the fitted estimators' `coef_` and `intercept_` attributes.
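A minimal sketch (synthetic data, not from the thread) of what reading those fitted weights looks like, assuming the example's `TweedieRegressor` usage:

```python
# Hedged sketch: fit a Tweedie GLM on synthetic data and read the fitted
# weights from the `coef_` and `intercept_` attributes.
import numpy as np
from sklearn.linear_model import TweedieRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Non-negative targets, as required for power in (1, 2)
y = np.exp(X @ np.array([0.3, -0.2, 0.1])) * rng.gamma(2.0, 0.5, size=200)

glm = TweedieRegressor(power=1.5, alpha=1e-3, max_iter=1000).fit(X, y)
print(glm.coef_)       # one weight per feature, on the (log) link scale
print(glm.intercept_)
```

Because the default link is logarithmic here, `np.exp(coef_)` gives the multiplicative effect of a one-unit feature change, which is close to the actuarial notion of a relativity.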
@lorentzenchr how would you select the `power` parameter? Using the Gini criterion for model selection of `power`?
@ogrisel Sorry, couldn't resist 😏 Just to clarify for others, the question is how to choose the `power` parameter of the Tweedie deviance. The following is to the best of my knowledge:
Note that only after selecting a value for `power` … To sum up: the situation is quite similar to negative binomial regression for counts 😏
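One way to select `power` by cross-validation is a plain grid search, keeping the comparison metric fixed across candidates so scores are comparable (scores based on the deviance at each candidate's own power are not). A hedged sketch on synthetic data, not the thread's code:

```python
# Sketch: grid search over the Tweedie `power` parameter, scoring every
# candidate with the SAME fixed metric (here MSE) for comparability.
import numpy as np
from sklearn.linear_model import TweedieRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.exp(X @ np.array([0.2, -0.1, 0.3, 0.0])) * rng.gamma(2.0, 0.5, size=300)

search = GridSearchCV(
    TweedieRegressor(alpha=1e-3, max_iter=1000),
    param_grid={"power": [1.1, 1.3, 1.5, 1.7, 1.9]},
    scoring="neg_mean_squared_error",  # one metric across all powers
    cv=3,
).fit(X, y)
print(search.best_params_["power"])
```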
I don't consider myself an expert on Tweedie regression: @kaskr implemented this in glmmTMB, I believe. I have always gone with approach 1 above (use MLE). The tricky question is whether the estimation is stable enough to try to estimate p simultaneously with the rest of the parameters (which is what glmmTMB does, I think), or whether it is better to profile over p (i.e., do a one-dimensional optimization and/or grid search over p; for each value of p, do an MLE fit with the value of p held constant). I agree with the point about negative binomial regression.
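The "profile over p" idea above can be sketched as an explicit loop: for each fixed p, do a full fit with p held constant, then inspect the resulting profile on a common held-out metric (scikit-learn does not expose the full Tweedie log-likelihood, so this sketch substitutes held-out MSE; this is an assumption, not the thread's method):

```python
# Sketch of profiling over p: one fit per fixed power, then a profile of
# held-out scores. In-sample deviances at different p are NOT comparable,
# hence a single common metric on a held-out split.
import numpy as np
from sklearn.linear_model import TweedieRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = np.exp(X @ np.array([0.2, -0.3, 0.1])) * rng.gamma(2.0, 0.5, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for p in [1.1, 1.3, 1.5, 1.7, 1.9]:
    model = TweedieRegressor(power=p, alpha=1e-3, max_iter=1000).fit(X_tr, y_tr)
    results[p] = mean_squared_error(y_te, model.predict(X_te))

best_p = min(results, key=results.get)  # the profile's minimizer
print(results, best_p)
```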
@bbolker Thank you very much for sharing your insights.
Thanks @jieliang for reaching out. It seems to me that all your questions have received attention. I'm closing this issue. Feel free to keep in touch with the community on Stack Overflow or the scikit-learn mailing list for new questions. Thanks.
https://scikit-learn.org/dev/auto_examples/linear_model/plot_tweedie_regression_insurance_claims.html
I have a question about hyperparameter tuning: how were the values for alpha and the other tunable parameters chosen for the Poisson, Gamma (frequency × severity) and Tweedie models? Did you do something like grid search with cross-validation?
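For illustration only (the example's actual values may well have been chosen by hand), tuning `alpha` with grid search and cross-validation could look like this hedged sketch on synthetic count data:

```python
# Sketch: cross-validated grid search over the regularization strength
# `alpha` of a Poisson GLM, scored with the (negated) Poisson deviance.
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = rng.poisson(np.exp(X @ np.array([0.2, -0.1, 0.3, 0.0])))

search = GridSearchCV(
    PoissonRegressor(max_iter=1000),
    param_grid={"alpha": [1e-4, 1e-3, 1e-2, 1e-1, 1.0]},
    scoring="neg_mean_poisson_deviance",
    cv=3,
).fit(X, y)
print(search.best_params_)
```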
I also thought that it would be nice to have a method to calculate the D-squared score for the composite frequency × severity model, so that it can be compared to the Tweedie model (in the example, the Tweedie GridSearchCV chooses the best value for `power` based on the D-squared score).
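Such a score can be computed from the deviance alone, so it also works for predictions that do not come from a single fitted estimator. A sketch (the helper `d2_tweedie` is hypothetical, not an existing scikit-learn API at the time of this issue):

```python
# Sketch: D² = fraction of Tweedie deviance explained, analogous to R².
# Works on any prediction vector, e.g. a composite freq * severity product.
import numpy as np
from sklearn.metrics import mean_tweedie_deviance

def d2_tweedie(y_true, y_pred, power):
    """1 - deviance(model) / deviance(null model predicting the mean)."""
    dev = mean_tweedie_deviance(y_true, y_pred, power=power)
    dev_null = mean_tweedie_deviance(
        y_true, np.full_like(y_true, np.mean(y_true), dtype=float), power=power
    )
    return 1.0 - dev / dev_null
```

A perfect prediction gives 1.0, and predicting the constant mean gives 0.0, matching the R² convention.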
Another question: since there is a discussion of the Gini index at the end of the example, would it make sense to also use the Gini index as one of the scoring metrics in GridSearchCV? In insurance applications, if coming up with the most accurate rates is the goal, MAE/RMSE may be suitable, while the Gini index is better suited to ranking policyholders by risk.
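Mechanically, plugging a Gini-style metric into GridSearchCV is straightforward via `make_scorer`; the `gini_score` helper below is hypothetical (not a scikit-learn API), and note the earlier caveat that such ranking metrics are not strictly consistent scoring functions for the expectation of y:

```python
# Hedged sketch: a normalized Gini ranking metric wrapped as a scorer.
import numpy as np
from sklearn.metrics import make_scorer

def gini_score(y_true, y_pred):
    """Gini of y_true ordered by y_pred, normalized by the Gini of y_true
    ordered by itself (1.0 = perfect ranking, -1.0 = perfectly reversed)."""
    def gini(actual, pred):
        order = np.argsort(pred)[::-1]  # sort by predicted risk, descending
        cum = np.cumsum(actual[order]) / actual.sum()
        n = len(actual)
        return cum.sum() / n - (n + 1) / (2 * n)
    return gini(y_true, y_pred) / gini(y_true, y_true)

gini_scorer = make_scorer(gini_score, greater_is_better=True)
# e.g. GridSearchCV(model, param_grid, scoring=gini_scorer, cv=3)
```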
A last suggestion: a function for deriving the relativities of features would be really useful.
Thank you!