Parametric and semiparametric regression beyond the mean have become important tools for multivariate data analysis in this world of heteroscedasticity. Among several alternatives, quantile regression is a very popular choice if regression on more than a location measure is desired. This is also due to the inherent robustness of a quantile estimate. However, when moving towards the tails of a distribution, the handling of extreme observations becomes crucial for empirical estimates. M-quantiles handle outliers within the regression analysis by imposing strong robustness on the loss function. However, this loss function is typically not designed to handle heteroscedasticity. An adaptive extension to the degree of robustness within the loss function is proposed along with the implementation of semiparametric predictors in an M-quantile regression model. A practical method to compute confidence intervals is also presented. The methods are supported by extensive simulations and an analysis of childhood malnutrition in Tanzania.
There is growing interest in a data integration approach to survey sampling, particularly where population registers are linked for sampling and subsequent analysis. The reason for doing this is simple: it is only by linking the same individuals in the different sources that it becomes possible to create a data set suitable for analysis. But data linkage is not error free. Many linkages are nondeterministic, based on how likely a linking decision corresponds to a correct match, that is, it brings together the same individual in all sources. High quality linking will ensure that the probability of this happening is high. Analysis of the linked data should take account of this additional source of error when this is not the case. This is especially true for secondary analysis carried out without access to the linking information, that is, the often confidential data that agencies use in their record matching. We describe an inferential framework that allows for linkage errors when sam...
Journal of the Royal Statistical Society Series B: Statistical Methodology, 2020
Data linkage can be used to combine values of the variable of interest from a national survey with values of auxiliary variables obtained from another source, such as a population register, for use in small area estimation. However, linkage errors can induce bias when fitting regression models; moreover, they can create non-representative outliers in the linked data in addition to the presence of potential representative outliers. In this paper, we adopt a secondary analyst’s point of view, assuming that limited information is available on the linkage process, and develop small area estimators based on linear mixed models and M-quantile models to accommodate linked data containing a mix of both types of outliers. We illustrate the properties of these small area estimators, as well as estimators of their mean squared error, by means of model-based and design-based simulation experiments. We further illustrate the proposed methodology by applying it to linked data from the European Su...
This paper extends the concepts of informative selection, population distribution and sample distribution to a spatial process context. These notions were first defined in a context where the output of the random process of interest consists of independent and identically distributed realisations for each individual of a population. It has been shown that informative selection induces a stochastic dependence among realisations on the selected units. In the context of spatial processes, the “population” is a continuous space and realisations for two different elements of the population are not independent. We show how informative selection may induce a different dependence among selected units and how the sample distribution differs from the population distribution.
A new semiparametric approach to model-based small area prediction for counts is proposed and used for estimating the average number of visits to physicians for Health Districts in Central Italy. The proposed small area predictor can be viewed as an outlier-robust alternative to the more commonly used empirical plug-in predictor that is based on a Poisson generalized linear mixed model with Gaussian random effects. Results from the real data application and from a simulation experiment confirm that the proposed small area predictor has good robustness properties and in some cases can be more efficient than alternative small area approaches.