Regression of TAQ half-hourly stock volume on news volume



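I plan to regress the half-hourly trading volume of a particular stock on its half-hourly news volume, using two years of data. However, I keep wondering how I should handle the non-trading hours of each day.

Specifically:

  1. Should I run the regression only over the exchange's trading hours? That is, the Y values would be the "stock volume" in each 30-minute interval from 9:30 to 16:00 on every day from the start date to the end date of my regression period, and the X values would be the corresponding "news volume" in each of those 30-minute intervals.

OR

  2. Do I need to make the data evenly spaced at 30-minute intervals over the whole day, filling in zeros for both stock volume and news volume during each day's non-trading hours?

I believe the regression results will differ between the two cases. Advice is urgently needed.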


Do not regress the zeros on the zeros. This is similar to how weekends are treated in academic studies: the regressions do not contain five trading days plus two additional days of zeros for each week in the sample, they contain just the five trading days (although I do encourage you to read about the weekend effect).

Your hypothesis is that there exists a function $Volume(t) = f(News(t)) + e(t)$. When the market is closed, no such relationship can exist, so what are you supposedly estimating with the zeros in the regression equation? If you include the zeros, then what you are telling the model is that during these times $Volume(t)=0$ because $News(t)=0$. Yet we know this is false: they are both zero because $t \in \{Market Close\}$.
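To make the effect of the padded zeros concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration rather than taken from the TAQ data: a linear $f$ with intercept 5 and slope 2, Poisson-distributed news counts, 13 half-hour bins per trading day (9:30 to 16:00) and 35 closed bins, and the hypothetical helper `ols`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed layout: 13 half-hour bins while the market is open, 35 while it is closed.
n_days, open_bins, closed_bins = 500, 13, 35

# Simulated trading-hours data with an assumed linear f: Volume = 5 + 2 * News + noise.
news_open = rng.poisson(3.0, size=n_days * open_bins).astype(float)
volume_open = 5.0 + 2.0 * news_open + rng.normal(0.0, 1.0, size=news_open.size)

def ols(x, y):
    """Least-squares fit of y = a + b * x; returns the array [a, b]."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Option 1: trading hours only.
a_open, b_open = ols(news_open, volume_open)

# Option 2: pad every closed bin with (News, Volume) = (0, 0) and fit one line.
zeros = np.zeros(n_days * closed_bins)
a_pad, b_pad = ols(np.concatenate([news_open, zeros]),
                   np.concatenate([volume_open, zeros]))

print("trading hours only: intercept, slope =", a_open, b_open)
print("zero-padded:        intercept, slope =", a_pad, b_pad)
```

Under these assumptions the trading-hours fit should land close to the simulated intercept and slope, while the padded fit is dragged toward the mass of points at the origin: its intercept falls and its slope rises, even though nothing about the open-market relationship has changed.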

If you are really concerned about the irregularly spaced time series, you could consider a more legitimate data generating process:

$$ Volume(t) = f(News(t))*I(t \in \{Market Open\}) + c*I(t \in \{Market Close\}) + e(t)$$

where $I$ is an indicator function. However, you will notice that this gives you the same parameter estimates for $f$ (if $f$ is linear with an intercept) as simply estimating the original equation during trading hours only.
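This equivalence can be checked numerically. The sketch below is only an illustration on simulated data: the linear form $f(News) = a + b \cdot News$, the coefficient values, the Poisson news counts, and the 13 open / 35 closed half-hour bins per day are all assumptions, as are the variable names.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed layout: 13 open and 35 closed half-hour bins per day.
n_days, open_bins, closed_bins = 250, 13, 35
n_open, n_closed = n_days * open_bins, n_days * closed_bins

# Indicator for market-open bins (row order does not matter for least squares).
is_open = np.concatenate([np.ones(n_open), np.zeros(n_closed)])

# News and volume are zero while the market is closed.
news = np.concatenate([rng.poisson(3.0, size=n_open).astype(float),
                       np.zeros(n_closed)])
volume = is_open * (5.0 + 2.0 * news + rng.normal(0.0, 1.0, size=news.size))

# Indicator model: Volume = (a + b * News) * I(open) + c * I(closed) + e.
X_full = np.column_stack([is_open, news * is_open, 1.0 - is_open])
coef_full, *_ = np.linalg.lstsq(X_full, volume, rcond=None)
a_full, b_full, c_full = coef_full

# Original equation, estimated on trading-hours rows only.
mask = is_open == 1.0
X_open = np.column_stack([np.ones(n_open), news[mask]])
coef_open, *_ = np.linalg.lstsq(X_open, volume[mask], rcond=None)

print("indicator model (a, b, c):", a_full, b_full, c_full)
print("trading hours only (a, b):", coef_open)
```

The two sets of regressors never overlap on the same rows, so the least-squares problem splits into an open-hours part and a closed-hours part; the estimate of $c$ is just the mean closed-hours volume, which is zero here. Only the coefficient estimates are compared: conventional standard errors need not match, since the closed-hours rows add observations whose residuals are exactly zero.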