[Gambas-user] [slightly OT] No Problem with Gambas or MySql!

Sun Oct 6 20:17:47 CEST 2013

A simple explanation as to why you do not repeat data in a rational
database. (Very simplistic):

Rational or Relational data bases normalize data so that any usage of a
single bit of data is stored only once. This allows the data to be edited
in only one location and also requires the least amount of storage. By
centralizing the bit of data into one location you gain power. For example,
say you have a customer database where the customer(s) must enter their
city each time they place an order. The customer(s). Some customers may
misspell the city name, or use an abbreviation. Allowing such entries into
the data would make using the address information difficult. But it would
also make it difficult to search the database for all customers from a
particular city.

If you allow the same information in multiple places, you will cause
situations where the data must be edited in multiple places or every
search/query must be written to handle an almost endless list of
possibilities for each value of interest.

Say you allow multiple records for the same order. When you update the
order status, one will show the order with the updated status and one
without. Which one is correct? Do you assume the one with the lastest
status is correct? What if someone in the shipping department made an
error? Now you have confusing records.... You also have wasted space which
cost real money for real storage, and real cpu time to locate the correct
or all records, and real geo time for the user to review and decide which
of the multiple records is correct. The user may also have to do additional
work to combine information from multiple records to compile a complete
picture of the data.

By enforcing data normalization (the art of not repeating data values), you
limit the amount of work both the cpu and the end user must do. You build a
system where there is only one entry to edit, or update. You provide a
single source and a single point of contention/resolution to conflicts.

Data normalization is to data what OOP is to programming. It is not
perfect, and can be taken to extremes. But it is a valuable tool and should
be used in both coding and data storage/retrieval. It is not the only tool
and there are the rare edge cases where one may need to break the rules.
But I can pretty much guaranty you that if you don't understand why you
need it, you don't yet understand when you should and should break the
rules. So err on the side of caution and normalize you data until you have
the experience needed to know when to break the rules. If you always
normalize your data you will be correct 99.999991 percent of the time. That
last 0.000009 percent of the time you will may never get to....

On Sun, Oct 6, 2013 at 2:55 AM, Charlie Reinl <Karl.Reinl at ...9...> wrote:

> Am Sonntag, den 06.10.2013, 10:55 +1030 schrieb Bruce:
> > On Sun, 2013-10-06 at 00:58 +0200, Willy Raets wrote:
> > > On Sun, 2013-10-06 at 00:05 +0200, Caveat wrote:
> > > > > If you wanted to delete one row with values 347 200, then which
> row would it be?
> > > >
> > > > It *doesn't matter*!  The rows are the same, so pick a row, ANY row!
> > >
> > > That is exactly what I have been doing for the last 10 years.
> > > In VB code I just looked for the criteria in the table and deleted the
> > > first record that met the criteria, next update table and ready.
> > >
> > > In Gambas I tried the same, now with the table being in MySql, and it
> > > didn't work.
> > >
> > > It never, ever failed me in those 10 years, even without a primary key.
> > > Although most other tables do have primary keys and relations set up,
> > > there never was a need for this table to have a primary key.
> > > And I still do not see the need for a primary key.
> > >
> > > >
> > > > > Without the primary key, you're asking MySQL to guess, and it wont.
> > > >
> > > > Yes it will, if you use LIMIT (see my previous two mails).
>  Otherwise, it'll happily delete all the rows that meet the criteria,
> whether they're duplicates or not.
> > > >
> > >
> > > And even in MySql there is NO need for the table to have a primary key
> > > as the 'SELECT ... limit 1' will do what I expect, delete only 1 record
> > > that meets the criteria.
> > >
> > > > > >"duplicate rows are and always were a mistake in SQL", C.J.Date
> > >
> > > Duplicate rows are a reality in real life data gathering.
> > > I have been working with databases since dbase somewhere in the '80's
> > > and I have had to deal with real life situations concerning data
> > > storage, not with theory for ideal models. Not saying I haven't read my
> > > share on relational database design and take it into account when
> > > designing a database.
> > >
> > > >
> > > > The guy has obviously never read William Kent's "Data and Reality".
> It's
> > > > a book I'd still recommend to anyone working with data and
> particularly
> > > > databases, even though it was written 35 years ago!
> > > >
> > > > <RANT>
> > > > Why *must* every record be uniquely identifiable?  Explain why...
> don't
> > > > just throw random snippets of relational database theory at me!  If I
> > > > ask you to model the contents of your pockets on a database, are you
> > > > going to attach some artificial key to each individual penny?  Why?
> > > > They each buy the same number of penny chews (showing my age!), and
> if
> > > > you take 3 pennies away, you've still got 3 pennies less than you had
> > > > before, no matter how carefully you select those 3 pennies!
> > > > </RANT>
> > > >
> > > > <HINT>
> > > > A tuple is, by definition, unique.  If I have 8 rows on my database
> > > > representing 8 real-world penny objects and I can delete 3 of them...
> > > > are the remaining 5 rows tuples?  How closely does the database model
> > > > reality in this case?
> > > > </HINT>
> > >
> > > I like your penny story ;)
> > >
> > >
> >
> > [ASIDE!]
> >
> > Interesting discussion, but someone recently asked me why I prefer
> > postgresql over mysql and this thread reminded me that I never answered.
> >
> > One of those reasons is that this whole situation cannot occur in
> > postgresql.  It has been designed such that it cannot. Considering a
> > non-normalised table, i.e. one that has no natural (unique) primary key,
> > all rows are still capable of being operated on using the standard sql
> > commands. However, in that case, *all* rows that succeed the WHERE
> > filter clause will be acted on. Fullstop.
> >
> > Putting that another way, postgresql "doesn't really care" whether the
> > table has a primary key set or not. Nor does it care whether a delete or
> > an update query uses a unique row identifier.  That sounds pretty slack,
> > doesn't it?  Well, not at all really.  It relies on the WHERE clause to
> > identify the set of rows to be operated on and that is all.  If such a
> > clause identifies multiple rows then they will all be affected. So, to
> > continue the coin collection analogy and a theoretical sql command:
> > SELL COIN WHERE YEAR="1930" AND ALLOY="Cupronickel";
> > would sell all the coins that match.  The idea that you could write a
> > command like:
> > SELL THE FIRST COIN YOU FIND WHERE YEAR="1930" AND ALLOY="Cupronickel";
> > really, to me brings a sense of randomness into the game ... ??? What is
> > the dbms supposed to imagine "THE FIRST COIN YOU FIND" to mean?
> >
> > Here in the thoroughbred racing industry, we have a somewhat similar
> > real world situation.  All thoroughbreds have a registered name and
> > there are strict controls over the reuse of a name.  However, horse
> > owners are allowed to change the registered name of the horse
> > occasionally and under certain conditions. All pretty good and you could
> > imagine that the name "Lemon Liniment" would uniquely identify a
> > specific horse from any of the other 61,000 we have on our database.
> > Further you might imagine that if the owners changed its' name to
> > "Cherry Zontop" then all we would have to do is, once the name change
> > has been registered, change that "unique" primary key.  Hah ha!  No such
> > simplicity exists on the planet.
> >
> > First off, the horse is really and truly identified by its microchip ID.
> > This is what the register uses as the unique identifier.  However, they
> > don't publish that (and for good reasons if you think about it ...) and
> > secondly name change events are frequently if not often publicised some
> > time after the actual change has been registered.  This is a major,
> > major problem, (to us, logistically not technically).
> >
> > Suppose Jack (the owner of Lemon Liniment) sells it to Jill (on
> > Wednesday) who re-registers it as Cherry Zontop.  Jill then races it
> > once on the next Saturday and we see a race entrance and result appear
> > in the official race records for Cherry Zontop.  We think that Cherry
> > Zontop is a "new untried" horse and add it as a new horse in the
> > database.  On the Monday (or later) the registry publicises the name
> > change...
> >
> > Under MySQL merging the histories of Lemon Liniment into Cherry Zontop
> > was a technical nightmare that essentially required offloading and
> > reloading multiple sets of rows from multiple tables*.  Under postgresql
> > it is the execution of one SQL command:
> > UPDATE horse SET name='Cherry Zontop', namechanged=TRUE, oldname='Lemon
> > Liniment' WHERE name='Lemon Liniment';
> >
> > * I have simplified the story a fair bit, there are around 18 underlying
> > tables that have the horse name as the foreign key or as part of the
> > foreign key.
> >
> > postgres, because it properly supports "ON UPDATE CASCADE" constraints
> > on those tables and their foreign key columns handles it all internally.
> >
> > Anyway, as I said, [ASIDE] and just one of the reasons for preferring
> > postgresql.
> >
> > regards
> > Bruce
>
> Salut Bruce,
>
> how do you look back to 'Lemon Liniment' history ?
>
>
> --
> Amicalement
> Charlie
>
>
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
> from
> the latest Intel processors and coprocessors. See abstracts and register >
> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
> _______________________________________________
> Gambas-user mailing list
> Gambas-user at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gambas-user
>

-- 
If you ask me if it can be done. The answer is YES, it can always be done.
The correct questions however are... What will it cost, and how long will
it take?