# 在R中，當glms具有負二項式分佈時，方差分析（glm1，glm2，test =" Chisq"）告訴我的p值是多少？

``````anova(mnegbin1, mnegbin2, test = "Chisq")
Likelihood ratio tests of Negative Binomial Models

Response: count
Model    theta Resid. df    2 x log-lik.   Test    df      LR stat. Pr(Chi)
1 origin + substrate + sample 18.46502      1232       -2459.886
2          substrate + sample 18.46502      1232       -2459.886 1 vs 2     0 -1.500666e-11       1
``````

When you fit a model, you need to check whether it's overly determined, meaning some terms cannot be determined. We can look at one of your samples, for example only for RB5 and Nocardia_nova_SH22a_RB20_RB56:

``````df = structure(list(X = c(226L, 227L, 228L, 229L, 230L, 231L, 232L,
233L, 234L, 235L, 236L, 237L, 238L, 239L, 240L, 241L, 242L, 243L,
244L, 245L, 246L, 247L, 248L, 249L, 250L, 251L, 252L, 253L, 254L,
255L, 256L, 257L, 258L, 259L, 260L, 261L, 262L, 263L, 264L, 265L,
266L, 267L, 268L, 269L, 270L, 1126L, 1127L, 1128L, 1129L, 1130L,
1131L, 1132L, 1133L, 1134L, 1135L, 1136L, 1137L, 1138L, 1139L,
1140L, 1141L, 1142L, 1143L, 1144L, 1145L, 1146L, 1147L, 1148L,
1149L, 1150L, 1151L, 1152L, 1153L, 1154L, 1155L, 1156L, 1157L,
1158L, 1159L, 1160L, 1161L, 1162L, 1163L, 1164L, 1165L, 1166L,
1167L, 1168L, 1169L, 1170L), sample = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Nocardia_nova_SH22a_RB20_RB56",
"RB56"), class = "factor"), substrate = structure(c(10L, 37L,
29L, 11L, 24L, 9L, 39L, 30L, 42L, 14L, 21L, 40L, 38L, 45L, 28L,
5L, 22L, 3L, 41L, 16L, 32L, 27L, 18L, 26L, 20L, 34L, 7L, 12L,
23L, 17L, 44L, 35L, 4L, 19L, 25L, 13L, 1L, 33L, 31L, 43L, 36L,
6L, 2L, 8L, 15L, 10L, 37L, 29L, 11L, 24L, 9L, 39L, 30L, 42L,
14L, 21L, 40L, 38L, 45L, 28L, 5L, 22L, 3L, 41L, 16L, 32L, 27L,
18L, 26L, 20L, 34L, 7L, 12L, 23L, 17L, 44L, 35L, 4L, 19L, 25L,
13L, 1L, 33L, 31L, 43L, 36L, 6L, 2L, 8L, 15L), .Label = c("acylsphingosine",
"agarose", "alcohol", "alginate", "alpha-glucan", "amylaceous polysaccharides",
"arabinogalactans", "aribinose", "beta-glucan", "cellulose",
"chitin", "cutin", "cyclomaltodextrin", "D-galactose", "D-glucan",
"D-glucose", "dermatan sulfate", "dextran", "fructans", "fructose",
"galactan", "galactose", "galacturonide", "glucan", "glucose",
"glucuronoside", "glycan", "glycans", "glycogen", "hemicellulose",
"hyaluronate", "L-rhamnose", "lacto-N-tetraose", "levan", "lignin",
"maltose", "mannan", "mannose", "pectin", "peptidoglycan", "pullulan",
"trehalose", "Vanillyl-alcohol", "xanthan", "xylose"), class = "factor"),
count = c(2L, 0L, 2L, 2L, 0L, 1L, 2L, 2L, 3L, 0L, 0L, 1L,
0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 3L, 1L, 3L, 3L, 0L, 3L, 3L, 5L, 2L, 0L, 0L, 4L,
0L, 0L, 1L, 2L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L), origin = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("free", "isolate"
), class = "factor")), row.names = c(226L, 227L, 228L, 229L,
230L, 231L, 232L, 233L, 234L, 235L, 236L, 237L, 238L, 239L, 240L,
241L, 242L, 243L, 244L, 245L, 246L, 247L, 248L, 249L, 250L, 251L,
252L, 253L, 254L, 255L, 256L, 257L, 258L, 259L, 260L, 261L, 262L,
263L, 264L, 265L, 266L, 267L, 268L, 269L, 270L, 1126L, 1127L,
1128L, 1129L, 1130L, 1131L, 1132L, 1133L, 1134L, 1135L, 1136L,
1137L, 1138L, 1139L, 1140L, 1141L, 1142L, 1143L, 1144L, 1145L,
1146L, 1147L, 1148L, 1149L, 1150L, 1151L, 1152L, 1153L, 1154L,
1155L, 1156L, 1157L, 1158L, 1159L, 1160L, 1161L, 1162L, 1163L,
1164L, 1165L, 1166L, 1167L, 1168L, 1169L, 1170L), class = "data.frame")
``````

Now we look at whether how it is distributed:

``````df$$sample = droplevels(df$$sample)
table(df$$origin,df$$sample)

Nocardia_nova_SH22a_RB20_RB56 RB56
free                               45    0
isolate                             0   45
``````

You can see, the samples are either free or isolate, and never both. So you cannot discern the effect of origin from samples. This means your model is over-determined and hence you see the error above.

To know the effect of origin, you have to fit, i used a poisson for this example because if i use only the subset above, the counts are not overdispersed hence glm.nb throws some error:

``````fit = glm(count~ origin+substrate,data=df,family=poisson)
summary(fit)

Estimate   Std. Error       z value   Pr(>|z|)
(Intercept)           -2.266968e+01 2.930556e+04 -7.735624e-04 0.99938279
originisolate          6.931472e-01 2.970443e-01  2.333481e+00 0.01962291
substrateagarose       2.282092e-06 4.144430e+04  5.506408e-11 1.00000000
substratealcohol       2.280854e-06 4.144430e+04  5.503420e-11 1.00000000
substratealginate      2.278865e-06 4.144430e+04  5.498621e-11 1.00000000
substratealpha-glucan  2.295736e+01 2.930556e+04  7.833790e-04 0.99937495
``````

You can see that in this model, free is the reference, and isolate is compared to free in the "originisolate". The coefficient 6.931472e-01 means isolate samples are higher than free samples, by exp(6.931472e-01) times.