A new discussion on the performance of co-Kriging and ordinary Kriging.
1. Common discussions on CK and OK
There have been many discussions about the comparison between co-Kriging (CK) and
ordinary Kriging (OK) in terms of prediction error and variance. For example, a
technical note
about CK with the gstat package in R compares CK and OK using different covariates with different strengths of correlation with the response.
Also, the documentation of CK in ArcGIS
mentions that “theoretically, you can do no worse than kriging because if there is no cross-correlation,
you can fall back on autocorrelation for the response.”
To summarize, a common conclusion is that
if there is a strong correlation between
a covariate and the response, CK makes better predictions than OK; otherwise,
CK makes no better predictions than OK.
So the answer to the question of whether CK always outperforms OK is “no”: CK is better than OK only if there are strong correlations between the covariates and the response.
Here, “better” means a smaller prediction error and a smaller prediction variance.
2. A new discussion
The conclusion above is not strictly accurate or reliable in all situations: in many cases, even with strongly correlated covariates, any improvement of CK over OK is hard to notice. So a new question is:
when and why does CK fail to outperform OK?
Previous discussions focus on testing the influence of the correlation strength between the covariates and the response.
However, Kriging makes predictions at unsampled locations from values observed at sampled locations, and
a previous post mentions that CK is used when
auxiliary variables are available, but not at all grid nodes.
The sample size and the sampling locations should therefore strongly affect the performance of CK in practice.
CK gives essentially the same predictions as OK when the sampling locations of the auxiliary variable are the same as, or a subset of, the locations where the
response variable is observed.
CK might outperform OK when the auxiliary variable is observed at more locations than the response variable.
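The two statements above can be checked on a toy problem. The following is a minimal sketch, not the gstat implementation: it assumes a one-dimensional setting, unit sills, an exponential correlation function, and an intrinsic coregionalization model in which every direct and cross covariance is proportional to the same correlation function, with an assumed response-covariate correlation r. The function names (rho, solve, cokrige) and all data values are hypothetical. Under these assumptions the co-located CK prediction coincides exactly with the OK prediction; with separately fitted variogram models, as in a real case study, the two would only be expected to be close.

```python
import math

def rho(h, corr_range=1.0):
    """Exponential correlation function (an assumed model, not a fitted one)."""
    return math.exp(-abs(h) / corr_range)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def cokrige(x0, xs1, z1, xs2, z2, r):
    """Ordinary co-Kriging prediction of the response at x0.

    Assumed covariance: Cov(Z_a(x), Z_b(y)) = B[a][b] * rho(x - y),
    with B = [[1, r], [r, 1]] (a = 0 response, a = 1 covariate).
    Passing empty xs2 and z2 reduces the system to ordinary Kriging.
    """
    B = [[1.0, r], [r, 1.0]]
    pts = [(x, 0) for x in xs1] + [(x, 1) for x in xs2]
    n = len(pts)
    size = n + 1 + (1 if xs2 else 0)  # one Lagrange multiplier per variable
    A = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for i, (xi, vi) in enumerate(pts):
        for j, (xj, vj) in enumerate(pts):
            A[i][j] = B[vi][vj] * rho(xi - xj)
        A[i][n + vi] = A[n + vi][i] = 1.0  # unbiasedness constraints
        rhs[i] = B[vi][0] * rho(xi - x0)
    rhs[n] = 1.0  # response weights sum to 1; covariate weights sum to 0
    weights = solve(A, rhs)[:n]
    return sum(w * z for w, z in zip(weights, list(z1) + list(z2)))

# Hypothetical data: response at two sites, prediction target halfway between.
xs1, z1 = [0.0, 2.0], [1.0, 3.0]
z2_same = [10.0, 14.0]  # covariate at the same two sites
r = 0.8                 # assumed response-covariate correlation

ok = cokrige(1.0, xs1, z1, [], [], r)
ck_same = cokrige(1.0, xs1, z1, xs1, z2_same, r)
ck_dense = cokrige(1.0, xs1, z1, xs1 + [1.0], z2_same + [15.0], r)
print(ok, ck_same, ck_dense)
```

In this toy setup the co-located covariate receives zero weight, so ck_same matches ok, while the one extra covariate observation at the target location is enough to move ck_dense away from both.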
The above conclusions are quite intuitive. A case study using the gstat package in R with the commonly used “meuse” data set was conducted to check them.
(0). Select strongly correlated zinc (zn) and lead (pb) as response and covariate, respectively.
(1). Sample training and test data. Make sure zinc and lead are sampled at the same locations in the training data.
Run OK and CK and compute the prediction RMSE of each on the test data.
The results are RMSE.OK = 0.166 and RMSE.CK1 = 0.182, which shows that CK does not outperform OK
when the covariate is sampled at the same locations as the response.
(2). Draw a second sample of lead that contains more observations than the zinc sample.
The prediction RMSE is RMSE.CK2 = 0.087, which is much smaller than the RMSE of the previous OK and CK predictions.
Therefore, in this case, CK outperforms OK when the covariate is sampled more densely than the response.
(3). To visualize the difference, predictions are made on “meuse.grid” data using OK, CK1, and CK2 respectively.
OK, CK1 and CK2 prediction results.
As can be seen from the figure, the OK prediction is almost the same as the CK1 prediction, where zinc and lead are sampled at the same locations.
When more lead data are sampled, the CK2 prediction becomes quite different from both the OK and CK1 predictions.
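The sampling designs in steps (1) and (2) and the RMSE used to score them can be sketched as follows. This is only an illustration of the protocol, not the code behind the numbers above. The 155 sites correspond to the number of observations in the meuse data set, but the split sizes (100 training sites, 30 extra lead sites), the seed, and the function names are hypothetical. How the extra lead locations were chosen is not stated, so drawing them from the held-out test locations is an assumption.

```python
import math
import random

def rmse(predicted, observed):
    """Root-mean-square error between paired predictions and observations."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def make_designs(n_sites, n_train, n_extra, seed=0):
    """Return site indices for the zinc sample, both lead samples, and the test set."""
    rng = random.Random(seed)
    sites = list(range(n_sites))
    rng.shuffle(sites)
    train, test = sites[:n_train], sites[n_train:]
    # ASSUMPTION: extra lead observations are taken at held-out locations.
    extra = rng.sample(test, n_extra)
    return {
        "zinc": sorted(train),                # response, used by OK, CK1 and CK2
        "lead_ck1": sorted(train),            # step (1): same locations as zinc
        "lead_ck2": sorted(train + extra),    # step (2): denser lead sample
        "test": sorted(test),                 # zinc held out for scoring
    }

# Hypothetical sizes; only the total of 155 sites comes from the meuse data.
designs = make_designs(n_sites=155, n_train=100, n_extra=30)
print({name: len(idx) for name, idx in designs.items()})
```

Each set of indices would then select the rows passed to the Kriging model, and rmse would be applied to the zinc predictions at the test sites.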