Re: Unambiguously describing molecules... and more

From: H. S. Teoh <hsteoh@...>
Date: Sunday, March 18, 2007, 22:44
On Sun, Mar 18, 2007 at 03:29:12PM -0700, Leon Lin wrote:
> Hi,
>
> "H. S. Teoh" <hsteoh@...> wrote:
> On Fri, Mar 16, 2007 at 08:21:40PM -0400, Leon Lin wrote:
> [...]
> >> Can you come up with anything better than SMILES (see
> >>
> >
> >It looks pretty solid to me already. What disadvantages do you see in it
> >that might need improvement?
>
> Perhaps this is more of an impossibility than a linguistic idea, but
> can anyone think of an efficient way to describe the angles at which
> the bonds are? This is probably not important or necessary except
> maybe with proteins, which twist and bend even though they could be
> stretched out into a long sequence of amino acids.
Ahhh, I see. That would be hard... macro-structure is very hard to capture in something that aims to describe every last detail. A more specialized system, perhaps one aimed specifically at describing proteins, would work better, I think. That way, irrelevant details such as repeating the precise structure of common amino acids all the time can be abstracted away in more convenient units. The human brain can only actively keep track of so many things simultaneously; you need to chunk off repetitive parts so that they're easier to handle.
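To illustrate the chunking idea, here is a minimal sketch in Python. It assumes a tiny, hypothetical residue table (`RESIDUE_FRAGMENTS`) covering only glycine and alanine, with stereochemistry left out for brevity, and a made-up helper name `peptide_to_smiles`. It is only meant to show how a short sequence of residue codes stands in for a much longer atom-level string, not to be a real protein notation.

```python
# Hypothetical lookup table: one-letter residue code -> atom-level SMILES
# fragment for that residue's backbone plus side chain.
# Assumption: stereochemistry is omitted to keep the sketch short.
RESIDUE_FRAGMENTS = {
    "G": "NCC(=O)",     # glycine residue
    "A": "NC(C)C(=O)",  # alanine residue
}


def peptide_to_smiles(sequence: str) -> str:
    """Expand a string of one-letter residue codes into atom-level SMILES.

    The trailing "O" completes the carboxylic acid at the C-terminus.
    """
    fragments = []
    for code in sequence:
        if code not in RESIDUE_FRAGMENTS:
            raise ValueError(f"residue {code!r} is not in this toy table")
        fragments.append(RESIDUE_FRAGMENTS[code])
    return "".join(fragments) + "O"


if __name__ == "__main__":
    chunked = "GAG"
    expanded = peptide_to_smiles(chunked)
    # Three characters at the residue level versus a much longer atom-level string.
    print(chunked, "->", expanded)
```

Even for a tripeptide the chunked form is several times shorter, and the gap grows linearly with the chain length.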
> Going back to the graph thing, if I'm not mistaken, SMILES cannot tell
> if something is on the inside or outside of something else. For
> example, if we had a buckyball (see
> with some attachments, you
> would have no idea whether these appendages were on the inside or
> outside of the "ball". This difference affects how the molecule
> interacts with other things.
True. But then, you're trying to represent a 3D structure in what is essentially a 1D medium. This will be non-trivial at best.
> Also, if the language is a human conlang, then readability is an
> issue. A "shortcut" option could be available for describing
> crystals or other things that have a repeated pattern. For example,
> cubane could be described as follows:
>
> Cubane is a molecule with 8 carbon atoms in the structure of a cube
> (with an atom at each vertex). Each carbon atom is bonded to one
> hydrogen atom.
>
> In SMILES, cubane is C12C3C4C1C5C4C3C25. When in the midst of other
> hydrocarbons, one could mistake this for another asymmetrical,
> branching molecule whose structure is difficult to remember. How
> complicated would a description of a crystal be? Surely there is a
> way to describe symmetrical chemical structures? I am not
> criticizing SMILES because I don't think it was created for humans
> to read.
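As an aside, the symmetry hidden in the cubane string quoted above can be recovered mechanically. The following is a minimal Python sketch, not a real SMILES parser: it assumes the input contains only `C` atoms and single-digit ring-closure labels (no branches, bond symbols, or other elements), and the function name `ring_closure_graph` is made up for illustration. It rebuilds the carbon skeleton as a graph and counts each atom's neighbours.

```python
from collections import defaultdict


def ring_closure_graph(smiles: str) -> dict:
    """Build an adjacency map from a restricted SMILES string.

    Assumption: only 'C' atoms and single-digit ring-closure labels appear.
    Consecutive atoms are bonded; a digit seen twice bonds the two atoms
    that carry it.
    """
    neighbors = defaultdict(set)
    open_rings = {}   # ring label -> index of the atom that opened it
    previous = None   # index of the most recently read atom
    atom_count = 0

    for ch in smiles:
        if ch == "C":
            current = atom_count
            atom_count += 1
            neighbors[current]  # make sure the atom is registered
            if previous is not None:
                neighbors[previous].add(current)
                neighbors[current].add(previous)
            previous = current
        elif ch.isdigit():
            if previous is None:
                raise ValueError("ring label before any atom")
            if ch in open_rings:
                partner = open_rings.pop(ch)
                neighbors[partner].add(previous)
                neighbors[previous].add(partner)
            else:
                open_rings[ch] = previous
        else:
            raise ValueError(f"unsupported character {ch!r} in this sketch")

    if open_rings:
        raise ValueError("unclosed ring label(s): " + ", ".join(open_rings))
    return dict(neighbors)


if __name__ == "__main__":
    graph = ring_closure_graph("C12C3C4C1C5C4C3C25")
    bond_count = sum(len(adj) for adj in graph.values()) // 2
    print(len(graph), "atoms,", bond_count, "bonds")
    print(sorted(len(adj) for adj in graph.values()))
```

For the cubane string this gives 8 atoms and 12 bonds, with every carbon bonded to exactly three others, which matches the vertices and edges of a cube. The regularity is all there in the string, but a reader has to reconstruct it; nothing in the notation says "cube".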
[...]

So your system will need some way of abstracting away repeated units, and a consistent system for describing the 3D geometry of the structure formed by these units. I don't know if anybody has done something like this, but it would surely be interesting! I'm very interested in consistent systems of describing geometric structures (esp. in >2 dimensions).

T

--
We are in class, we are supposed to be learning, we have a teacher... Is it too much that I expect him to teach me??? -- RL