# Re: Sets and bags / Identity of a link

Arend Rensink (rensink@cs.utwente.nl)
Wed, 24 Jan 2001 11:17:58 +0100

```
Hm. Let's see what options we have here.

1.  An association is a set of links. Each link has (at least) an identity
and a data content yielding a tuple of objects.

2a. An association is a set of object tuples.

2c. An association is a list of object tuples.

Let me throw in some more possibilities:

3a. An association is a set-valued object function - that is, a function
yielding for every object of a given "source" class (the left-hand
side of the association) a (possibly empty) set of objects of a "target"
class (the right hand side of the association)

3b. An association is a bag-valued object function.

3c. An association is a list-valued object function, where the lists may
not contain duplicates.

3d. An association is an arbitrary list-valued object function.

How does this help? We were having trouble deciding between three options,
now there are seven! (I'm not even being completist here; I could have
thrown in bags of links, lists of links and lists of tuples.) Before
everyone starts screaming though, a few observation:

- 2a and 3a are mathematically equivalent, as are 2b and 3b - so the
number of options is actually four.

- There is a hierarchy of modelling power: 3d is strictly more powerful
(= able to make more distinctions) than 3c, 1 and 3c are both more
powerful than 2b/3b, which are in turn more powerful than 2a/3a.
Options 1 and 3c, however, are mutually incomparable: both can make
distinctions the other can't.

- Options 1 and 2x favour a bidirectional interpretation of assocations,
3x a unidirectional interpretation.

- Options 3a-3c are all directly supported by OCL.

Now let's have the pros and cons.

Gonzalo Génova wrote:
>
> Similarly, the UML defines an Association as a set of tuples (p. 55).
> [...]
> Unfortunately, the UML standard states that
> each tuple may appear at most once in an association.

That seems clear enough. The UML chooses 2a. Or does it? An association
end may be marked "ordered"; then we get interpretation 3d (sic!).

> Guy Genilloud, in his contribution to the UML Semantics FAQ, in the 1999
> Ecoop workshop, proposes to remove this constraint, defining an association
> as a bag of tuples (a bag instead of a set). This could work, but the
> analogy between links and objects would become weaker. Should we say also
> that a class is a bag of objects? Personally, I prefer the other solution.
> If we admit that classes are sets of objects, allowing different objects to
> have the same data content, why not admitting that associations are sets of
> links, allowing in an analogous way that different links have the same data
> content (that is, link or relate the same objects)?

This argument proposes an analogy between classes and associations, and
therefore between objects and links: since objects have identities, so
should links; or if links don't have them (we use bags of tuples) then
neither should objects (we use bags of records).

I think the analogy is false. We don't have object identities for fun;
they are absolutely necessary - but frankly, they are a pain. If it were
possible to do away with them and use bags of objects, we'd do that in a
flash. Why voluntarily introduce the same problems for links?

> I think we need that
> links have an identity that is different from the composition of the
> identities of the related objects, in order to manipulate them (the links)
> more clearly.

Here you suggest that links are sometimes explicitly manipulated, i.e.
should somehow be considered "first class citizens" (to take a term from
programming language design). That would be a good reason to have link
identities, but in what context does this occur? Certainly not on the
implementation level (shudder)!

> A link identity makes unnecessary to discuss whether
> associations should be sets of tuples or bags of tuples.

This is the argument that 1. is more powerful than 2b. True, but there
is a tradeoff between power and simplicity.

Daniel Jackson wrote:
>
> 1. a set of links is much more cumbersome to handle semantically. navigation
> is no longer just relational image.

Mathematical simplicity can be used to reason against 1., but not against 2b.
Solutions 3x also correspond directly to well-established mathematical
concepts, but navigation is slightly harder since you get sets of sets etc.;
you can flatten them (see OCL) but you have to be careful there (see OCL).

> 2. it's phenomenologically spurious (see chapter on "designations" in: M.
> Jackson, Requirements and Specifications, Addison Wesley 1995). we won't
> want an abstract model to make distinctions that we can't make in the real
> world. for example, if the parent association is a set of links, we'll have
> to explain what the difference is between Alice being Bob's parent under

Again, an argument against option 1. What about the other options? Can a
bag of tuples (2b) be necessary to model a valid phenomenon? Probably,
if we want to allow certain abstractions: I have the same student twice
in different courses, when I abstract from the courses I get a bag of
students. I'd hate to rule out such abstractions just because the
formalisation can't deal with them. What about a list-valued function
(3c/3d)? Even easier, relevant orderings exist all over the place.

> 3. relations are a more basic notion, and therefore deserve to be built-in.
> you can always get create a set of links explicitly using binary relations,
> but hiding the links identities is harder.

"More basic" - "harder": aren't those questions of taste? You follow a
mathematical intuition (as do I) which makes you think relations are easy
and identities are hard. Another may think "ease of modelling", draws
two lines between the same objects and would like his formalisation to
reflect this directly - thus, link identities.

> 4. links would increase the complexity of analysis. our Alloy Analyzer can
> generate snapshots and check properties of object models automatically. with
> links, it would not be able to handle such large models.

As noted above, there's a tradeoff between simplicity and power. You can't
say "simpler is better" without arguing you don't need the power. (Okay,
you did that.)

> 5. what would it mean to have two distinct links that link the same objects?
> this might sometimes be useful, but then you should just construct a link
> object specially. in the general case, it will be tedious and obscure to
> have to rule this out.

That's the same argument as 3, I believe.

Guy Genilloud wrote:
>
> But this consideration was not my contention with UML. My contention had to
> do with the idea that UML should have as few constraints as possible, for
> the reason that it is easy to add constraints, and impossible to remove
> them...

Seems to me you're too late. The deed is done.

> 1- Reverse engineering: the impossibility to model in a class diagram some
> object implementations. If two objects can have an unbounded number of links
> between them, the set of tuples pre-constraint makes that situation
> impossible to represent in a class diagram in a natural way (that is,
> without introducing a third object class, that has no object instances)...

I'd like an example here. Anyway, your argument is against option 2a/3a,
but not necessarily in favour of 1.

> 2- Not always inforceable. If you are implementing a CORBA object, how do
> you make sure that you never get two links to a same object (remember that a
> CORBA object may have several different identifiers, that may not be
> compared for equality).

An interesting point. I suspect you could model this by making the CORBA
object identifier, which obviously is NOT the same as the actual object
identity, a class in its own right, with an association to the real object.
Exit problem.

> 3- Modeling experts. Some very respectable experts do believe that having
> this pre-constraint is a bad idea. For example, the EXPRESS modeling
> language has both bags and sets. Of course, other experts are all in favor
> of it. But my argument is that adding the set constraint is easy, while
> removing it (if it is in the core of UML) is impossible.

An ease-of-modelling argument against 2a/3a; not (I repeat) distinguishing
between the other solutions.

> 4- Intuition. When I give an example with a customer having a table
> reservation at a restaurant, most people get it wrong at first. They don't
> realize that the model they propose would make it impossible for a customer
> to make 2 reservations for the same table at different dates...

I don't see the point immediately. Are you suggesting the different dates
should give rise to different links? Why not have a class "appointment"
with links to customer, date and table?

Overall, I don't see many arguments against solution 2b: arguments
of modelling power don't seem to favour 1 over 2b (they do favour 1
over 2a) and arguments of mathematical elegance/simplicity don't seem
to favour 2a over 2b (they do favour 2a over 1). UML is self-contradictory
(surprise... ;-) since on the one hand it clearly imposes 2a, but in OCL
all of 3x can be expressed. I myself prefer 3d, but then I don't like
bidirectional associations much anyway.

I'm curious to see what you folks think of this.

--
Arend Rensink                              http://www.cs.utwente.nl/~rensink
Department of Computer Science                  mailto:rensink@cs.utwente.nl
University of Twente                                    tel: +31 53 489 4862
P.O. Box 217, NL-7500 AE Enschede, Netherlands          fax: +31 53 489 3247
```