How do you determine the right level of detail when modeling?


32

"All models are wrong, but some are useful" - George E. P. Box

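I usually work on so-called operational problems. There, I rarely have much trouble determining the level of detail the model needs in order to provide value. However, when I happen to work on tactical/strategic problems, I struggle more to find the appropriate level of detail.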

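To ground this discussion, consider the following example: suppose you are working on a problem in which you want to determine the appropriate mix of vehicles for transporting people with disabilities from their homes to day care centers. You know that demand changes every day, but you want to determine which vehicles you need to purchase in order to operate over the next 4 years.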

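Some vehicles have a fixed configuration, while others can be reconfigured (for example, some seats can be folded so that you can fit a wheelchair in place of 2 regular seats). In an operational model, you would certainly take this reconfiguration into account, as it can make the difference between a "good" route and a "bad" route. But when solving a longer-term problem, where you only partially know what demand will look like two months from now, is it necessary to take this into account? Or is it just an extra step that makes the model more complex/slower without really adding value to the solution?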

TL;DR: How do you determine when a model is detailed enough to be useful? When does too much detail actually hurt the model?

23

Interesting! Here are some questions that may be helpful to think through:

  • Will more detail substantially change the optimal solution?
  • Am I including enough detail to answer the question I want to answer?
  • If there are similar models for my problem, what do they include?
  • What other details am I excluding? Could they have more of an effect than the dynamic I'm debating?
  • Will a more precise answer be a more accurate answer? (E.g., if data for the extra details are wrong, including them may give a better solution to the wrong problem)
  • What are my computational limitations? (E.g., if I don't have to solve the problem very often, maybe a longer solution time with a better answer is the way to go.)
  • What level of detail does my client/collaborator need to "trust" the model?

Some downsides to excess detail include:

  • Longer solution times or loss of tractability
  • Risk of obscuring key takeaways
  • Bad data may lead you astray

To figure out whether more detail may affect the solution, one option is to do sensitivity/scenario analysis on the simpler model. For the example you give, that might be running the model with different levels of demand to see how the solution changes. If it doesn't change much, that may indicate you don't need to let it vary.
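As a minimal sketch of that kind of scenario analysis, the Python snippet below brute-forces a toy fleet-mix model and re-solves it at several demand levels. The vehicle types, costs, capacities, and demand figures are all made up for illustration; none of them come from the question.

```python
from itertools import product

# Hypothetical vehicle types: name -> (purchase cost, regular seats, wheelchair spots)
VEHICLES = {
    "minibus": (60, 12, 0),
    "accessible_van": (45, 4, 2),
}


def cheapest_fleet(seated, wheelchair, max_per_type=10):
    """Enumerate fleet mixes and return (cost, mix) for the cheapest one
    that covers both seated and wheelchair demand, or None if none does."""
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(VEHICLES)):
        cost = seats = spots = 0
        for n, (c, s, w) in zip(counts, VEHICLES.values()):
            cost += n * c
            seats += n * s
            spots += n * w
        if seats >= seated and spots >= wheelchair:
            if best is None or cost < best[0]:
                best = (cost, dict(zip(VEHICLES, counts)))
    return best


def sensitivity(base_seated, base_wheelchair, percents=(80, 90, 100, 110, 120)):
    """Re-solve the model with demand scaled to each percentage of the base case."""
    results = {}
    for pct in percents:
        seated = -(-base_seated * pct // 100)          # integer ceiling
        wheelchair = -(-base_wheelchair * pct // 100)  # integer ceiling
        results[pct] = cheapest_fleet(seated, wheelchair)
    return results


if __name__ == "__main__":
    # Assumed base case: 40 seated riders and 6 wheelchair users per day.
    for pct, (cost, mix) in sensitivity(40, 6).items():
        print(f"{pct:>3}% demand: cost={cost}, mix={mix}")
```

If the printed mix stays the same across the demand levels you consider plausible, that is some evidence the simpler model is adequate; if it flips between vehicle types, the extra detail probably deserves a closer look.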

In terms of being able to answer the right question, that sounds obvious, but it is probably good practice to double-check. For the example you give, if the goal is to figure out what mix of vehicles to buy, the model should probably include all of the vehicle types/configuration options.

Looking forward to other responses.


15

I think it's useful to think about timescale, which is related to, but not equivalent to, level of detail. In particular, I think it is usually better to start with a model that does not include multiple, very different, timescales.

Your hypothetical problem involving vehicle mix (strategic question, timescale = years) and seat configuration (operational question, timescale = days) is a great example of this. One would presumably want to start with a model that optimized one or the other, but not both.

But this is not a hard-and-fast rule, and it is worth some experimentation. If the shorter-timescale decisions do not significantly affect the longer-timescale ones, they should not be included in the model. So, if the optimal vehicle mix changes significantly when you include seat configuration in the model, seat configuration should be included in the vehicle mix model. Otherwise, it should not. (Probably it will not change the mix problem, so it should not be included.)

Of course, it is always a tradeoff. As another example, facility location is a strategic problem. So it's worth asking whether we should include tactical decisions like inventory or operational decisions like routing in the facility location model.

In the case of inventory, the inclusion of inventory changes the optimal facility locations, and moreover in at least some location–inventory models, the computational cost of adding inventory is relatively small. Therefore, it seems reasonable to include inventory in the facility location problem.

On the other hand, routing tends not to change the optimal facility locations much (I believe—someone might want to check me on that), and moreover, location–routing models are much harder to solve than straight facility location models, so the tradeoff argues for not including routing, in general.


12

Assuming that (a) the model is going to be used repeatedly, (b) solving the more complex version can be done in a tolerable (if perhaps longer) time, and (c) the complexity does not rise to the level of blowing your credibility with the user, you might try coding both models, running both against a few reasonable scenarios (using different guesstimates for the uncertain parameters), and assessing whether the solution to the more complex model looks meaningfully better than that of the less complex model. If the input data will be uncertain at the time the model is used (as opposed to uncertain at development time but certain at production time), you might also try evaluating the solutions you got in the previous step against other scenarios besides the scenario that generated each one, to get a feel for how robust the models are. This is not as good as a full-fledged simulation experiment, but it is perhaps better than just giving the user one model or the other and crossing your fingers.
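A rough sketch of that cross-evaluation in Python follows. Everything here is an assumption made for illustration: the two toy "models" (one simply covers all demand, the other trades purchase cost against a penalty for unserved riders), the cost parameters, and the three demand scenarios.

```python
def evaluate(fleet_size, demand, capacity=8, vehicle_cost=50, shortfall_penalty=20):
    """Cost of a fleet under one demand scenario: purchases plus a penalty
    for every rider who cannot be served."""
    unmet = max(0, demand - fleet_size * capacity)
    return fleet_size * vehicle_cost + unmet * shortfall_penalty


def simple_model(demand, capacity=8):
    """Simpler model: buy just enough vehicles to cover all demand."""
    return -(-demand // capacity)  # integer ceiling


def detailed_model(demand, max_fleet=20):
    """More detailed model: trade purchase cost against the shortfall penalty."""
    return min(range(max_fleet + 1), key=lambda n: evaluate(n, demand))


def cross_evaluate(models, scenarios):
    """Solve each model on each scenario, then score every resulting
    solution against all scenarios, not only the one that produced it."""
    table = {}
    for model_name, solve in models.items():
        for source, demand in scenarios.items():
            fleet = solve(demand)
            table[model_name, source] = {
                target: evaluate(fleet, d) for target, d in scenarios.items()
            }
    return table


if __name__ == "__main__":
    models = {"simple": simple_model, "detailed": detailed_model}
    scenarios = {"low": 26, "mid": 34, "high": 45}  # guesstimated daily riders
    for (model_name, source), costs in cross_evaluate(models, scenarios).items():
        print(model_name, source, costs, "worst case:", max(costs.values()))
```

Comparing the worst-case (or average) cost per row gives a crude feel for how robust each model's solutions are when the scenario they were built for turns out to be wrong.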


7

Which problems are you trying to solve? What is an acceptable level of failures? The level of detail you need is bounded by those answers. You need a model detailed enough that you can show statistically that failures stay below that level across the variants you expect.

Problems you don't know about, or don't have yet, cannot be solved by the same model, IMHO, so don't try. Simplify instead. If normal operations are simple and not error-prone, deviations will be easier to handle.


11

The best model is the one that will make the users of the model make the best decisions. This requires both that the solutions are high-quality and that the users actually use them to drive decisions. It is therefore important always to keep the end-goal in mind when finding the balance between simplicity and complexity. The following three tips can help you strike that balance!

Be agile. Finding the right level of detail for a model from the start is almost impossible, as it requires you to understand all aspects of the problem upfront, i.e., operational aspects, available data, computational difficulties, and the decision process the model is going to be used in. The most important thing is therefore to take an iterative and agile approach in which you learn new aspects of the problem along the way and tune your model. Keep the end users involved in the process and get their ongoing feedback.

When in doubt, leave it out. Increasing the complexity of a model has a cost. It will be harder for you to explain to the end users what is going on, which can decrease adoption. It will also be more expensive to maintain and more difficult to extend further. Often a model is only for decision support, where the user can fine-tune the solution provided by the model before making the final decision. If a user can spend a few minutes and get the perfect solution, it might not be worth extending the model. Therefore, if you are in doubt about whether something should be added, wait and get some feedback.

Ensure you understand the problem with your current model before you extend it. Users will often list, upfront, constraints and objectives that they think are critical for solving the problem. This doesn't imply that they should be part of your model. Before you add a new constraint, ensure you understand exactly which of the solutions the model currently proposes are infeasible, and why. In the same way, before adding a new objective, ensure you understand when the model proposes a solution where a better one exists. Therefore, if you are not sure whether or why something will improve the decisions, probe the users further to understand where the output of the model is not working.

Finally, spend some time early in the process to find a good way of visualizing your solutions. This will make it much easier to get good feedback from users that will help you tune your model!


7

Something in another dimension that hasn't been mentioned yet.

The "size" of the problem in its simplest form may inform which techniques are most useful. You may even be at the limit of what a single technique can handle and need to combine several. One thing that I have used, for example, is clustering techniques to pre-compute assignments, thus removing a bunch of decision variables. You of course have to be careful, as this may create packing issues if you are using integer assignments.
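To make the clustering idea concrete, here is a minimal sketch in Python. It uses a plain k-means (Lloyd's algorithm) as a stand-in for whatever clustering technique fits your data; the coordinates, the choice of k, and the capacity check are all hypothetical.

```python
import math
from collections import Counter


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm. Centers start at the first k points, which is
    a simplification; a real implementation would use a better initialization."""
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


def oversized_clusters(labels, capacity):
    """Return {cluster: size} for clusters that exceed a vehicle's capacity;
    these are the packing issues that fixed assignments can create."""
    return {c: n for c, n in Counter(labels).items() if n > capacity}


if __name__ == "__main__":
    # Hypothetical customer home coordinates.
    homes = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (11, 10)]
    labels, centers = kmeans(homes, k=2)
    print("customer -> zone:", labels)
    print("zones over capacity 2:", oversized_clusters(labels, capacity=2))
```

Each cluster can then be treated as a fixed customer-to-vehicle (or customer-to-zone) assignment, so the optimization model no longer needs those assignment variables. The capacity check flags clusters that would not fit in one vehicle and need to be split or rebalanced.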

For example, if you wanted to scale your current model to a 10-year time horizon, does it make sense to model the detail, or is something like cash flow more relevant?

Further, you could also start to exploit parts of the problem definition: you mention that it's a pickup and drop-off service. You could perhaps bake in some assumptions that represent the potential routes for each customer; humans are quite predictable, after all. This may remove the need to generate actual routes. The solutions can then be validated using your detailed model.