# Regressing half-hourly TAQ volume data on news volume


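1. Do I need to make the data evenly spaced at 30-minute intervals and fill in zeros for both trading volume and news volume during each day's "non-trading hours"?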

Do not regress the zeros on the zeros. This is similar to how weekends are treated in academic studies: the regressions do not contain, for each week in the sample, five trading days plus two additional days of zeros; they contain just the five trading days (although I do encourage you to read about the weekend effect).
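In practice, "just the trading periods" means dropping the closed half-hour bars before fitting anything. The sketch below assumes a regular 09:30 to 16:00 session, labels each bar by its start time, and ignores exchange holidays and half-days; the helper names `is_trading_bar` and `trading_hours_only` are hypothetical, not part of any TAQ tooling.

```python
from datetime import datetime, time, timedelta

# Assumed regular US equity session; holidays and early closes are NOT handled.
SESSION_OPEN = time(9, 30)
SESSION_CLOSE = time(16, 0)


def is_trading_bar(bar_start: datetime) -> bool:
    """True if a 30-minute bar starting at `bar_start` lies inside the session."""
    if bar_start.weekday() >= 5:  # Saturday = 5, Sunday = 6
        return False
    return SESSION_OPEN <= bar_start.time() < SESSION_CLOSE


def trading_hours_only(bars):
    """Keep only (bar_start, volume, news) tuples that fall in trading hours."""
    return [bar for bar in bars if is_trading_bar(bar[0])]


# Usage: a full calendar of half-hour bars covering Friday 2024-01-05 and Saturday.
start = datetime(2024, 1, 5)
grid = [(start + timedelta(minutes=30 * k), 0, 0) for k in range(96)]
kept = trading_hours_only(grid)
print(len(kept))  # 13 bars: 09:30, 10:00, ..., 15:30 on Friday only
```

Filtering, rather than padding, keeps the regression sample equal to the set of periods in which the hypothesized relationship can actually operate.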

Your hypothesis is that there exists a function $f$ such that $\text{Volume}(t) = f(\text{News}(t)) + e(t)$. When the market is closed, no such relationship can exist, so what would you be estimating with the zeros in the regression equation? If you include the zeros, you are telling the model that during these times $\text{Volume}(t) = 0$ because $\text{News}(t) = 0$. Yet we know this is false: both are zero because $t \in \{\text{Market Closed}\}$.
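To see how much the padded zeros can mislead the fit, here is a small simulation under purely hypothetical assumptions: 13 open and 35 closed half-hour bars per day, news counts drawn from a Poisson(10), and a linear $f$ with intercept 100 and slope 5 during trading hours. The OLS helper `ols` is illustrative and uses only `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 250 days, 13 open and 35 closed half-hour bars per day.
n_days, n_open, n_closed = 250, 13, 35
true_a, true_b = 100.0, 5.0  # assumed intercept and slope of f while the market is open

news_open = rng.poisson(10, size=n_days * n_open).astype(float)
volume_open = true_a + true_b * news_open + rng.normal(0, 10, size=news_open.size)

# Zero padding: append closed bars where both news and volume are 0.
news_padded = np.concatenate([news_open, np.zeros(n_days * n_closed)])
volume_padded = np.concatenate([volume_open, np.zeros(n_days * n_closed)])


def ols(x, y):
    """Return (intercept, slope) from an OLS fit of y on [1, x]."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


a_open, b_open = ols(news_open, volume_open)
a_pad, b_pad = ols(news_padded, volume_padded)
print(f"trading hours only: a={a_open:.1f}, b={b_open:.2f}")
print(f"with zero padding:  a={a_pad:.1f}, b={b_pad:.2f}")
```

The trading-hours fit recovers values near the assumed (100, 5), while the padded fit is pulled toward the cluster of (0, 0) points: its intercept collapses toward zero and its slope is badly inflated, purely because of when the market is closed.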

If you are really concerned about the irregularly spaced time series, you could instead specify a data-generating process that models the closed periods explicitly:

$$\text{Volume}(t) = f(\text{News}(t)) \cdot I(t \in \{\text{Market Open}\}) + c \cdot I(t \in \{\text{Market Closed}\}) + e(t)$$

where $I$ is an indicator function and $c$ is a constant. However, you will notice that this gives identical estimates of the parameters of $f$ (if $f$ is linear with an intercept) as simply estimating the original equation over trading hours only.
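The equivalence follows from the design matrix. With a linear $f(x) = a + b\,x$ and $\text{News}(t) = 0$ whenever the market is closed, every closed bar has regressors $(0, 0, 1)$ and every open bar has $(1, \text{News}(t), 0)$, so the normal equations split into two independent blocks: one for $(a, b)$ using only open bars, and one for $c$ using only closed bars. The sketch below checks this numerically on hypothetical simulated data (the sample sizes and true coefficients are assumptions, not from the original question).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical zero-padded series: open bars first, then closed bars.
n_open, n_closed = 3250, 8750
news_open = rng.poisson(10, size=n_open).astype(float)
volume_open = 100.0 + 5.0 * news_open + rng.normal(0, 10, size=n_open)

news = np.concatenate([news_open, np.zeros(n_closed)])
volume = np.concatenate([volume_open, np.zeros(n_closed)])
is_open = np.concatenate([np.ones(n_open), np.zeros(n_closed)])  # I(t in Market Open)

# Indicator model: Volume = (a + b*News) * I_open + c * I_closed + e
X_ind = np.column_stack([is_open, news * is_open, 1.0 - is_open])
(a_ind, b_ind, c_ind), *_ = np.linalg.lstsq(X_ind, volume, rcond=None)

# Trading-hours-only model: Volume = a + b*News + e, fitted on open bars only.
X_open = np.column_stack([np.ones(n_open), news_open])
(a_only, b_only), *_ = np.linalg.lstsq(X_open, volume_open, rcond=None)

print(np.allclose([a_ind, b_ind], [a_only, b_only]))  # True
print(c_ind)  # numerically close to 0, since every closed bar has Volume = 0
```

Only the point estimates coincide. The residual sum of squares is also the same (the closed bars are fitted exactly by $c$), but the indicator model claims many more residual degrees of freedom, so its default OLS standard errors come out smaller than those from the trading-hours-only fit.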