What's wrong with circular references?


172

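I was in a programming discussion today where I made some statements that basically assumed, as an axiom, that circular references (between modules, classes, whatever) are generally bad. Once I had finished my pitch, my coworker asked, "What's wrong with circular references?"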

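I have strong feelings about this, but I find it hard to put them concisely and concretely. Any explanation I come up with tends to rely on other things that I also treat as axioms ("can't be used in isolation, so can't be tested", "unknown/undefined behavior as state changes in the participating objects", etc.). I'd love to hear a concise reason why circular references are bad that doesn't take the kind of leaps of faith my own brain does, having spent many hours over the years untangling them in order to understand, fix, and extend various bits of code.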

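Edit: I am not asking about homogeneous circular references, such as those in a doubly linked list or a pointer to a parent. This question is really about "larger scope" circular references, such as libA calling libB, which calls back into libA. Substitute "module" for "lib" if you like. Thanks for all the answers so far!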

1

Some garbage collectors have trouble cleaning them up, because each object is being referenced by another.

EDIT: As noted by the comments below, this is true only for an extremely naive attempt at a garbage collector, not one that you would ever encounter in practice.
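To make the edit above concrete, here is a minimal sketch, assuming CPython, which pairs reference counting with a separate cycle detector. The `Node` class and the variable names are hypothetical. With the cycle detector switched off, reference counting alone leaves the two mutually referencing objects alive; an explicit collection pass then reclaims them.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at one other Node."""

    def __init__(self, name):
        self.name = name
        self.other = None


def make_cycle():
    """Build two nodes that reference each other; return a weak ref to one."""
    a = Node("a")
    b = Node("b")
    a.other = b
    b.other = a
    return weakref.ref(a)  # a weak reference does not keep `a` alive


gc.disable()  # leave only reference counting active (CPython-specific behavior)
try:
    probe = make_cycle()
    # The local names are gone, but each node still holds the other.
    alive_with_refcounting_only = probe() is not None
    gc.collect()  # run the cycle detector explicitly
    alive_after_cycle_collection = probe() is not None
finally:
    gc.enable()

print(alive_with_refcounting_only, alive_after_cycle_collection)
```

Other runtimes manage memory differently, so treat this as an illustration of the naive case rather than a statement about garbage collectors in general.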


10

Hmm... that depends on what you mean by circular dependence, because there are actually some circular dependencies which I think are very beneficial.

Consider an XML DOM -- it makes sense for every node to have a reference to its parent, and for every parent to have a list of its children. The structure is logically a tree, but from the point of view of a garbage collection algorithm or similar, the structure is circular.
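One common way to keep such a tree convenient without making parent and child own each other is to hold the parent through a weak reference. The sketch below is not how any particular DOM library does it; `TreeNode`, `append`, and `path` are hypothetical names chosen for illustration.

```python
import weakref


class TreeNode:
    """Hypothetical DOM-like node: strong refs to children, weak ref to parent."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []
        self._parent = None  # will hold a weakref.ref, not the node itself

    @property
    def parent(self):
        # Dereference the weak reference; None if unset or already freed.
        return self._parent() if self._parent is not None else None

    def append(self, child):
        child._parent = weakref.ref(self)
        self.children.append(child)
        return child

    def path(self):
        """Walk upward through parents to build a path like 'html/body/p'."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.tag)
            node = node.parent
        return "/".join(reversed(parts))


root = TreeNode("html")
body = root.append(TreeNode("body"))
para = body.append(TreeNode("p"))
print(para.path())
```

The navigation is still bidirectional, but ownership runs in one direction only, from parent to child.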


2

I'd answer that question with another question:

What situation can you give me where keeping a circular reference model is the best model for what you're trying to build?

From my experience, the best model will pretty much never involve circular references in the way I think you mean. That being said, there are a lot of models where you use circular references all the time; they are just extremely basic: parent -> child relationships, any graph model, etc. But these are well-known models, and I think you're referring to something else entirely.


23

A circular reference is twice the coupling of a non-circular reference.

If Foo knows about Bar, and Bar knows about Foo, you have two things that need changing (when the requirement comes that Foos and Bars must no longer know about each other). If Foo knows about Bar, but a Bar doesn't know about Foo, you can change Foo without touching Bar.

Cyclical references can also cause bootstrapping problems, at least in environments that last for a long time (deployed services, image-based development environments), where Foo depends on Bar working in order to load, but Bar also depends on Foo working in order to load.
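The bootstrapping problem can be reproduced in miniature with Python imports. This is a sketch under stated assumptions: the two module names and their contents are made up, and they are written to a temporary directory only so that the example is self-contained. Each module needs a name from the other at load time, so neither can finish loading.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two hypothetical modules that each need a name from the other at load time.
FOO_SOURCE = "from cycle_demo_bar import BAR_READY\nFOO_READY = True\n"
BAR_SOURCE = "from cycle_demo_foo import FOO_READY\nBAR_READY = True\n"


def try_import_cycle():
    """Write both modules to a temp directory and try to import one of them."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "cycle_demo_foo.py").write_text(FOO_SOURCE)
        Path(tmp, "cycle_demo_bar.py").write_text(BAR_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.import_module("cycle_demo_foo")
            return None
        except ImportError as exc:
            return exc
        finally:
            sys.path.remove(tmp)
            # Drop any half-initialised modules left behind by the failed import.
            sys.modules.pop("cycle_demo_foo", None)
            sys.modules.pop("cycle_demo_bar", None)


error = try_import_cycle()
print(type(error).__name__, error)
```

If only one of the two modules imported the other, the same code would load without complaint, which is the one-way coupling described above.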


15

They may be bad not in themselves but as an indicator of a possibly poor design. If Foo depends on Bar and Bar depends on Foo, it is fair to ask why they are two classes instead of a single FooBar.


9

It's like the chicken-or-the-egg problem.

There are many cases in which circular references are inevitable and useful, but the following case, for example, doesn't work:

Project A depends on project B, and B depends on A. A needs to be compiled before it can be used in B, which requires B to be compiled before A, which requires A to be compiled before B, which ...
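A build tool has to put projects in dependency order, and that is exactly the step a cycle breaks. As a rough sketch, using Python's standard `graphlib` module and a made-up `build_order` helper with made-up project names:

```python
from graphlib import CycleError, TopologicalSorter


def build_order(dependencies):
    """Return a valid build order, or None if the dependencies form a cycle.

    `dependencies` maps each project to the set of projects it needs built first.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None


acyclic = {"A": {"B"}, "B": {"C"}, "C": set()}
cyclic = {"A": {"B"}, "B": {"A"}}

print(build_order(acyclic))
print(build_order(cyclic))
```

For the acyclic graph there is an order to build in; for the cyclic one no such order exists, so the helper has nothing to return.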


17

When you tie two bits of code together, you effectively have one large piece of code. The difficulty of maintaining a bit of code is at least the square of its size, and possibly higher.

People often look at single class (/function/file/etc.) complexity and forget that you really should be considering the complexity of the smallest separable (encapsulatable) unit. Having a circular dependency increases the size of that unit, possibly invisibly (until you start trying to change file 1 and realize that also requires changes in files 2-127).
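One way to picture the "smallest separable unit" is as the set of pieces that can all reach each other through dependencies. The sketch below is only an illustration of that idea; the function names and the three-file graphs are hypothetical.

```python
def reachable(graph, start):
    """Return every node reachable from `start` by following dependency edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def separable_unit(graph, node):
    """Nodes that cannot be pulled apart from `node`: each reaches the other."""
    forward = reachable(graph, node)
    return {node} | {other for other in forward if node in reachable(graph, other)}


# Hypothetical file-level dependencies; the names are illustrative only.
layered = {"ui": ["service"], "service": ["storage"], "storage": []}
tangled = {"ui": ["service"], "service": ["storage"], "storage": ["ui"]}

print(len(separable_unit(layered, "service")))
print(len(separable_unit(tangled, "service")))
```

Adding the single edge from `storage` back to `ui` turns three units of one file each into one unit of three files, even though no individual file got any bigger.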


5

In database terms, circular references with proper PK/FK relationships make it impossible to insert or delete data. If you can't delete from table A unless the record is gone from table B, and you can't delete from table B unless the record is gone from table A, you can't delete at all. The same goes for inserts. This is why many databases do not allow you to set up cascading updates or deletes if there is a circular reference: at some point it becomes impossible. Yes, you can set up these kinds of relationships without the PK/FK being formally declared, but then you will (100% of the time in my experience) have data integrity problems. That's just bad design.
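The insert deadlock can be reproduced with a small sketch, here assuming SQLite (through Python's `sqlite3` module) with foreign keys enforced immediately, statement by statement. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Hypothetical tables: each row must point at an existing row in the other table.
conn.execute(
    "CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER NOT NULL REFERENCES b(id))"
)
conn.execute(
    "CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER NOT NULL REFERENCES a(id))"
)


def try_insert(sql):
    """Run one INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


inserted_a_first = try_insert("INSERT INTO a (id, b_id) VALUES (1, 1)")
inserted_b_first = try_insert("INSERT INTO b (id, a_id) VALUES (1, 1)")
print(inserted_a_first, inserted_b_first)
```

Whichever table you start with, the row it must point to does not exist yet, so neither insert is accepted.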


234

There are a great many things wrong with circular references:

  • Circular class references create high coupling; both classes must be recompiled every time either of them is changed.

  • Circular assembly references prevent static linking, because B depends on A but A cannot be assembled until B is complete.

  • Circular object references can crash naïve recursive algorithms (such as serializers, visitors and pretty-printers) with stack overflows. The more advanced algorithms will have cycle detection and will merely fail with a more descriptive exception/error message.

  • Circular object references also make dependency injection impossible, significantly reducing the testability of your system.

  • Objects with a very large number of circular references are often God Objects. Even if they are not, they have a tendency to lead to Spaghetti Code.

  • Circular entity references (especially in databases, but also in domain models) prevent the use of non-nullability constraints, which may eventually lead to data corruption or at least inconsistency.

  • Circular references in general are simply confusing and drastically increase the cognitive load when attempting to understand how a program functions.

Please, think of the children; avoid circular references whenever you can.
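As an illustration of the point about naive recursive algorithms, here is a small sketch. The dictionary-based object graph and both `describe` functions are hypothetical. The naive pretty-printer recurses until the interpreter gives up, a version that tracks the nodes on the current path stops cleanly, and Python's own `json.dumps` is an example of a serializer that detects the cycle and raises an error instead.

```python
import json


def naive_describe(node):
    """Recursive pretty-printer with no cycle detection."""
    children = ", ".join(naive_describe(child) for child in node["children"])
    return f"{node['name']}({children})"


def safe_describe(node, _active=None):
    """Same printer, but it tracks the nodes on the current path."""
    active = set() if _active is None else _active
    if id(node) in active:
        return f"<cycle back to {node['name']}>"
    active.add(id(node))
    children = ", ".join(safe_describe(child, active) for child in node["children"])
    active.discard(id(node))
    return f"{node['name']}({children})"


# Hypothetical object graph: `a` contains `b`, and `b` points back at `a`.
a = {"name": "a", "children": []}
b = {"name": "b", "children": [a]}
a["children"].append(b)

try:
    naive_describe(a)
    naive_result = "finished"
except RecursionError:
    naive_result = "RecursionError"

try:
    json.dumps(a)
    json_result = "finished"
except ValueError as exc:
    json_result = str(exc)

print(naive_result)
print(safe_describe(a))
print(json_result)
```

Every algorithm that walks the graph has to carry this kind of bookkeeping once cycles are possible.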


6

While I agree with most of the comments here I would like to plead a special case for the "parent"/"child" circular reference.

A class often needs to know something about its parent or owning class: perhaps default behavior, the name of the file the data came from, the SQL statement that selected the column, or the location of a log file.

You can do this without a circular reference by having a containing class, so that what was previously the "parent" is now a sibling, but it is not always possible to refactor existing code to do this.

The other alternative is to pass all the data a child might need in its constructor, which ends up being just plain horrible.


4

I'll take this question from a modelling point of view.

As long as you don't add any relationships that aren't actually there, you are safe. If you do add them, you get less integrity in your data (because there is redundancy) and more tightly coupled code.

The thing with the circular references specifically is that I haven't seen a case where they would be actually needed except one - self reference. If you model trees or graphs, you need that and it is perfectly all right because self-reference is harmless from the code-quality point of view (no dependency added).

I believe that the moment you start to need a non-self reference, you should immediately ask whether you can model it as a graph instead (collapse the multiple entities into one: a node). Maybe there is a case in between, where you need a circular reference but modelling it as a graph is not appropriate, but I highly doubt it.

There is a danger that people think they need a circular reference when in fact they don't. The most common case is the "one-of-many" case. For instance, you have a customer with multiple addresses, one of which should be marked as the primary address. It is very tempting to model this situation as two separate relationships, has_address and is_primary_address_of, but that is not correct. The reason is that being the primary address is not a separate relationship between customers and addresses; it is an attribute of the has_address relationship. Why is that? Because its domain is limited to the customer's addresses and not to all the addresses there are. You pick one of the links and mark it as the strongest (primary).

(Going to talk about databases now.) Many people opt for the two-relationship solution because they understand "primary" as being a unique pointer, and a foreign key is a kind of pointer. So a foreign key should be the thing to use, right? Wrong. Foreign keys represent relationships, but "primary" is not a relationship. It is a degenerate case of an ordering in which one element is above all the others and the rest are unordered. If you needed to model a total ordering, you would of course treat it as an attribute of the relationship, because there is basically no other choice. But once you degenerate it, there is a choice, and quite a horrible one: to model something that is not a relationship as a relationship. The result is relationship redundancy, which is certainly not something to be underestimated. The uniqueness requirement should be imposed in another way, for instance by unique partial indexes.
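A minimal sketch of that last suggestion, assuming SQLite (which supports partial indexes) driven from Python's `sqlite3` module. The schema, the index name, and the sample rows are hypothetical. "Primary" is stored as a flag on the has_address link, and a partial unique index allows at most one flagged link per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT NOT NULL);

    -- The single has_address relationship; "primary" is an attribute of the link.
    CREATE TABLE has_address (
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        address_id  INTEGER NOT NULL REFERENCES address(id),
        is_primary  INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (customer_id, address_id)
    );

    -- Partial unique index: at most one primary link per customer.
    CREATE UNIQUE INDEX one_primary_per_customer
        ON has_address (customer_id) WHERE is_primary = 1;

    INSERT INTO customer VALUES (1, 'Ada');
    INSERT INTO address VALUES (10, 'First St'), (11, 'Second St');
    INSERT INTO has_address VALUES (1, 10, 1);
    INSERT INTO has_address VALUES (1, 11, 0);
    """
)

try:
    # Try to flag a second address as primary for the same customer.
    conn.execute("UPDATE has_address SET is_primary = 1 WHERE address_id = 11")
    second_primary_allowed = True
except sqlite3.IntegrityError:
    second_primary_allowed = False

print(second_primary_allowed)
```

There is no foreign key from customer back to address here, so the schema has no cycle, yet the "exactly one of the customer's own addresses" rule is still enforced by the database.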

So, I wouldn't allow a circular reference to occur unless it is absolutely clear that it comes from the thing I am modelling.

(note: this is slightly biased to database design but I would bet it is fairly applicable to other areas too)


1

Circular references in data structures are sometimes the natural way of expressing a data model. Coding-wise, they are definitely not ideal, and they can (to some extent) be solved by dependency injection, which pushes the problem from code to data.
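A rough sketch of what pushing the problem from code to data can look like. The `Downloader` and `App` classes and the address string are hypothetical. The low-level class is handed a callback rather than a reference to the high-level class, so the dependency in the code runs one way and the path back exists only as a value wired up at runtime.

```python
from typing import Callable, List


class Downloader:
    """Hypothetical low-level component; it knows nothing about its caller."""

    def __init__(self, on_done: Callable[[str], None]):
        self._on_done = on_done  # injected callback instead of a class reference

    def fetch(self, url: str) -> None:
        # Pretend the work happened, then report back through the callback.
        self._on_done(f"fetched {url}")


class App:
    """Hypothetical high-level component; it depends on Downloader only."""

    def __init__(self):
        self.log: List[str] = []
        self.downloader = Downloader(on_done=self.log.append)

    def run(self) -> None:
        self.downloader.fetch("example.invalid/data")


app = App()
app.run()
print(app.log)
```

Because `Downloader` only needs some callable, it can be tested on its own by passing in a plain list's `append` method.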


-3

In my opinion having unrestricted references makes program design easier, but we all know that some programming languages lack support for them in some contexts.

You mentioned references between modules or classes. In that case it's a static thing, predefined by the programmer, and it's clearly possible for the programmer to search for a structure that lacks circularity, though it might not fit the problem cleanly.

The real problem comes with circularity in runtime data structures, where some problems actually can't be defined in a way that gets rid of the circularity. In the end, though, it's the problem that should dictate the design, and requiring anything else forces the programmer to solve an unnecessary puzzle.

I'd say that's a problem with the tools not a problem with the principle.


1

A circular reference construct is problematic, not just from a design standpoint, but from an error catching standpoint as well.

Consider the possibility of a code failure. You haven't placed proper error catching in either class, either because you haven't developed your methods that far yet, or you're lazy. Either way, you don't have an error message to tell you what transpired, and you need to debug it. As a good program designer, you know what methods are related to what processes, so you can narrow it down to those methods relevant to the process that caused the error.

With circular references, your problems have now doubled. Because your processes are tightly bound, you have no way of knowing which method in which class caused the error, or where the error came from, because each class depends on the other. You now have to spend time testing both classes together to find out which one is really responsible for the error.

Of course, proper error catching resolves this, but only if you know when an error is likely to occur. And if you're using generic error messages, you're still not much better off.