Re: Sets and bags / Identity of a link



Date view Thread view Subject view Author view

Arend Rensink (rensink@cs.utwente.nl)
Wed, 24 Jan 2001 11:17:58 +0100


Hm. Let's see what options we have here. 1. An association is a set of links. Each link has (at least) an identity and a data content yielding a tuple of objects. 2a. An association is a set of object tuples. 2c. An association is a list of object tuples. Let me throw in some more possibilities: 3a. An association is a set-valued object function - that is, a function yielding for every object of a given "source" class (the left-hand side of the association) a (possibly empty) set of objects of a "target" class (the right hand side of the association) 3b. An association is a bag-valued object function. 3c. An association is a list-valued object function, where the lists may not contain duplicates. 3d. An association is an arbitrary list-valued object function. How does this help? We were having trouble deciding between three options, now there are seven! (I'm not even being completist here; I could have thrown in bags of links, lists of links and lists of tuples.) Before everyone starts screaming though, a few observation: - 2a and 3a are mathematically equivalent, as are 2b and 3b - so the number of options is actually four. - There is a hierarchy of modelling power: 3d is strictly more powerful (= able to make more distinctions) than 3c, 1 and 3c are both more powerful than 2b/3b, which are in turn more powerful than 2a/3a. Options 1 and 3c, however, are mutually incomparable: both can make distinctions the other can't. - Options 1 and 2x favour a bidirectional interpretation of assocations, 3x a unidirectional interpretation. - Options 3a-3c are all directly supported by OCL. Now let's have the pros and cons. Gonzalo Génova wrote: > > Similarly, the UML defines an Association as a set of tuples (p. 55). > [...] > Unfortunately, the UML standard states that > each tuple may appear at most once in an association. That seems clear enough. The UML chooses 2a. Or does it? An association end may be marked "ordered"; then we get interpretation 3d (sic!). > Guy Genilloud, in his contribution to the UML Semantics FAQ, in the 1999 > Ecoop workshop, proposes to remove this constraint, defining an association > as a bag of tuples (a bag instead of a set). This could work, but the > analogy between links and objects would become weaker. Should we say also > that a class is a bag of objects? Personally, I prefer the other solution. > If we admit that classes are sets of objects, allowing different objects to > have the same data content, why not admitting that associations are sets of > links, allowing in an analogous way that different links have the same data > content (that is, link or relate the same objects)? This argument proposes an analogy between classes and associations, and therefore between objects and links: since objects have identities, so should links; or if links don't have them (we use bags of tuples) then neither should objects (we use bags of records). I think the analogy is false. We don't have object identities for fun; they are absolutely necessary - but frankly, they are a pain. If it were possible to do away with them and use bags of objects, we'd do that in a flash. Why voluntarily introduce the same problems for links? > I think we need that > links have an identity that is different from the composition of the > identities of the related objects, in order to manipulate them (the links) > more clearly. Here you suggest that links are sometimes explicitly manipulated, i.e. should somehow be considered "first class citizens" (to take a term from programming language design). That would be a good reason to have link identities, but in what context does this occur? Certainly not on the implementation level (shudder)! > A link identity makes unnecessary to discuss whether > associations should be sets of tuples or bags of tuples. This is the argument that 1. is more powerful than 2b. True, but there is a tradeoff between power and simplicity. Daniel Jackson wrote: > > 1. a set of links is much more cumbersome to handle semantically. navigation > is no longer just relational image. Mathematical simplicity can be used to reason against 1., but not against 2b. Solutions 3x also correspond directly to well-established mathematical concepts, but navigation is slightly harder since you get sets of sets etc.; you can flatten them (see OCL) but you have to be careful there (see OCL). > 2. it's phenomenologically spurious (see chapter on "designations" in: M. > Jackson, Requirements and Specifications, Addison Wesley 1995). we won't > want an abstract model to make distinctions that we can't make in the real > world. for example, if the parent association is a set of links, we'll have > to explain what the difference is between Alice being Bob's parent under > link L1, and Alice being Bob's parent under link L2. Again, an argument against option 1. What about the other options? Can a bag of tuples (2b) be necessary to model a valid phenomenon? Probably, if we want to allow certain abstractions: I have the same student twice in different courses, when I abstract from the courses I get a bag of students. I'd hate to rule out such abstractions just because the formalisation can't deal with them. What about a list-valued function (3c/3d)? Even easier, relevant orderings exist all over the place. > 3. relations are a more basic notion, and therefore deserve to be built-in. > you can always get create a set of links explicitly using binary relations, > but hiding the links identities is harder. "More basic" - "harder": aren't those questions of taste? You follow a mathematical intuition (as do I) which makes you think relations are easy and identities are hard. Another may think "ease of modelling", draws two lines between the same objects and would like his formalisation to reflect this directly - thus, link identities. > 4. links would increase the complexity of analysis. our Alloy Analyzer can > generate snapshots and check properties of object models automatically. with > links, it would not be able to handle such large models. As noted above, there's a tradeoff between simplicity and power. You can't say "simpler is better" without arguing you don't need the power. (Okay, you did that.) > 5. what would it mean to have two distinct links that link the same objects? > this might sometimes be useful, but then you should just construct a link > object specially. in the general case, it will be tedious and obscure to > have to rule this out. That's the same argument as 3, I believe. Guy Genilloud wrote: > > But this consideration was not my contention with UML. My contention had to > do with the idea that UML should have as few constraints as possible, for > the reason that it is easy to add constraints, and impossible to remove > them... Seems to me you're too late. The deed is done. > 1- Reverse engineering: the impossibility to model in a class diagram some > object implementations. If two objects can have an unbounded number of links > between them, the set of tuples pre-constraint makes that situation > impossible to represent in a class diagram in a natural way (that is, > without introducing a third object class, that has no object instances)... I'd like an example here. Anyway, your argument is against option 2a/3a, but not necessarily in favour of 1. > 2- Not always inforceable. If you are implementing a CORBA object, how do > you make sure that you never get two links to a same object (remember that a > CORBA object may have several different identifiers, that may not be > compared for equality). An interesting point. I suspect you could model this by making the CORBA object identifier, which obviously is NOT the same as the actual object identity, a class in its own right, with an association to the real object. Exit problem. > 3- Modeling experts. Some very respectable experts do believe that having > this pre-constraint is a bad idea. For example, the EXPRESS modeling > language has both bags and sets. Of course, other experts are all in favor > of it. But my argument is that adding the set constraint is easy, while > removing it (if it is in the core of UML) is impossible. An ease-of-modelling argument against 2a/3a; not (I repeat) distinguishing between the other solutions. > 4- Intuition. When I give an example with a customer having a table > reservation at a restaurant, most people get it wrong at first. They don't > realize that the model they propose would make it impossible for a customer > to make 2 reservations for the same table at different dates... I don't see the point immediately. Are you suggesting the different dates should give rise to different links? Why not have a class "appointment" with links to customer, date and table? Overall, I don't see many arguments against solution 2b: arguments of modelling power don't seem to favour 1 over 2b (they do favour 1 over 2a) and arguments of mathematical elegance/simplicity don't seem to favour 2a over 2b (they do favour 2a over 1). UML is self-contradictory (surprise... ;-) since on the one hand it clearly imposes 2a, but in OCL all of 3x can be expressed. I myself prefer 3d, but then I don't like bidirectional associations much anyway. I'm curious to see what you folks think of this. -- Arend Rensink http://www.cs.utwente.nl/~rensink Department of Computer Science mailto:rensink@cs.utwente.nl University of Twente tel: +31 53 489 4862 P.O. Box 217, NL-7500 AE Enschede, Netherlands fax: +31 53 489 3247


Date view Thread view Subject view Author view