# Comparing RMSE and models

Your suggestion about using a null model is similar to $R^2$. $R^2$ is defined as $1 - \mathrm{MSE}/V$, where $\mathrm{MSE}$ is the model's mean squared error and $V$ is the variance of the observed output. You can think of the variance as the mean squared error of a null model that always gives the mean as its predicted output. Even here, the question is: how much better can you do? This is very hard to answer. The reason is that it's hard to know whether the error reflects variation in the output that's fundamentally unpredictable from the input (e.g. 'noise', but it could be something else), or whether additional structure is present that the model has simply failed to capture. Sometimes looking at the residuals can give a hint.

Under some circumstances, it's possible to estimate the 'noise' level. For example, if you have many repeated trials where the inputs are identical, you can measure the variability of the output for equal inputs. No model that maps each input to a single prediction can have an expected squared error below this variability, so it gives a bound on the maximum possible performance. You would typically encounter this situation in the context of controlled experiments. Or, you may be able to do something similar if you have access to a known 'correct model' (e.g. in a theoretical setting, or if you're modeling a well understood physical system). Otherwise, it's hard to know whether there's a better model out there.
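As a rough illustration of the two ideas above, here is a minimal sketch that computes $R^2$ as $1 - \mathrm{MSE}/V$ and estimates the noise variance from repeated trials. The function names (`r_squared`, `noise_variance_from_repeats`) and the simulated data are made up for this example, and the noise estimate assumes the noise variance is the same at every input.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - MSE / V, where V is the variance of the observed output."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    # V is the MSE of a null model that always predicts the mean.
    null_mse = np.mean((y_true - y_true.mean()) ** 2)
    return 1.0 - mse / null_mse

def noise_variance_from_repeats(x, y):
    """Pooled within-group variance of y over groups of identical x.

    Assumes (for illustration) that the noise variance is the same
    for every input value.
    """
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    _, group = np.unique(x, return_inverse=True)
    sum_sq, dof = 0.0, 0
    for g in range(group.max() + 1):
        y_g = y[group == g]
        if y_g.size > 1:
            sum_sq += np.sum((y_g - y_g.mean()) ** 2)
            dof += y_g.size - 1
    if dof == 0:
        raise ValueError("need at least one input value with repeated trials")
    return sum_sq / dof

# Hypothetical data: 5 distinct inputs, 200 repeated trials each,
# output = sin(x) + Gaussian noise with standard deviation 0.5.
rng = np.random.default_rng(0)
x = np.repeat(np.linspace(0.0, 3.0, 5), 200)
y = np.sin(x) + rng.normal(scale=0.5, size=x.size)

noise_var = noise_variance_from_repeats(x, y)
# Approximate ceiling on R^2 for any model that is a function of x alone.
r2_ceiling = 1.0 - noise_var / np.var(y)
print(f"estimated noise variance: {noise_var:.3f}")
print(f"approximate R^2 ceiling:  {r2_ceiling:.3f}")
```

A model whose $R^2$ is already close to this ceiling has little room left to improve; one far below it is probably missing structure.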

Comparing the training and test error can give you some idea of the extent to which your model is overfitting (the expected training error would be lower than the expected test error). These estimates can vary quite a bit when you have a small number of samples and/or few repetitions. A gap between training and test error isn't a problem per se, but a large gap might signal one. Even so, a model that overfits might still have better generalization performance than another model that doesn't.
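To make the comparison concrete, here is a small sketch that measures training and test RMSE over repeated random splits. The polynomial model, the simulated data, and the helper names (`rmse`, `train_test_rmse`) are assumptions chosen for illustration, not anything specific to your problem.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def train_test_rmse(x, y, degree, n_repeats=200, test_fraction=0.3, seed=0):
    """Fit a polynomial on random training splits; return per-split RMSEs."""
    rng = np.random.default_rng(seed)
    n = len(x)
    n_test = int(round(test_fraction * n))
    train_errors, test_errors = [], []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        train_errors.append(rmse(y[train_idx], np.polyval(coeffs, x[train_idx])))
        test_errors.append(rmse(y[test_idx], np.polyval(coeffs, x[test_idx])))
    return np.array(train_errors), np.array(test_errors)

# Hypothetical data: a smooth function plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=60)
y = np.sin(3.0 * x) + rng.normal(scale=0.3, size=x.size)

train_err, test_err = train_test_rmse(x, y, degree=5)
print(f"train RMSE: {train_err.mean():.3f} +/- {train_err.std():.3f}")
print(f"test RMSE:  {test_err.mean():.3f} +/- {test_err.std():.3f}")
```

The spread across splits shows how noisy a single train/test comparison can be. To compare two models, look at their test errors directly rather than at the size of their gaps.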

Instead of asking how good your model is, you can also ask how bad it is. You could use a significance testing approach to see whether your prediction is better than 'chance'. For example, you might compare the test error on the real data to the test error on permuted data (where the relationship between input and output has been destroyed, so any apparent performance is due to sampling variability or overfitting).
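One way the permutation idea could look in code is sketched below. The linear model, the single fixed train/test split, and the names `holdout_rmse` and `permutation_test` are illustrative assumptions; in practice you would plug in your own model and validation scheme.

```python
import numpy as np

def holdout_rmse(x_train, y_train, x_test, y_test, degree=1):
    """Fit a polynomial on the training set and return RMSE on the test set."""
    coeffs = np.polyfit(x_train, y_train, degree)
    pred = np.polyval(coeffs, x_test)
    return float(np.sqrt(np.mean((y_test - pred) ** 2)))

def permutation_test(x, y, n_permutations=500, n_test=30, seed=0):
    """Compare test RMSE on real data with test RMSE on permuted outputs."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    observed = holdout_rmse(x[train_idx], y[train_idx], x[test_idx], y[test_idx])

    null_errors = np.empty(n_permutations)
    for i in range(n_permutations):
        # Shuffling y destroys any relationship between input and output.
        y_perm = rng.permutation(y)
        null_errors[i] = holdout_rmse(
            x[train_idx], y_perm[train_idx], x[test_idx], y_perm[test_idx]
        )

    # Fraction of permutations that do at least as well as the real data;
    # the +1 terms count the observed value itself, so p is never exactly 0.
    p_value = (1 + np.sum(null_errors <= observed)) / (1 + n_permutations)
    return observed, null_errors, p_value

# Hypothetical data with a real linear relationship.
rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=x.size)

observed, null_errors, p_value = permutation_test(x, y)
print(f"test RMSE on real data:          {observed:.3f}")
print(f"mean test RMSE on permuted data: {null_errors.mean():.3f}")
print(f"p-value: {p_value:.4f}")
```

A small p-value only says the model beats chance. It says nothing about how close the model is to the best achievable performance.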