[Gambas-user] [slightly OT] No Problem with Gambas or MySql!
Caveat
Gambas at ...1950...
Mon Oct 7 14:04:55 CEST 2013
Thanks Randall, we got some great discussions going here. I think the
closing paragraph sums it up quite well...
"
So err on the side of caution and normalize your data until you have
the experience needed to know when to break the rules. If you always
normalize your data you will be correct 99.999991 percent of the time. That
last 0.000009 percent of the time you may never get to....
"
Of course, as I later said, my pennies example was a prime candidate for
normalisation! And, I think, the case Willy is dealing with would have
been normalised (in an ideal world) from:
| ID  | Gewicht |
| 345 |     100 |
| 345 |     100 |
| 345 |      50 |
| 347 |     200 |
| 347 |     200 |
| 347 |     200 |
| 347 |     200 |
to:
| ID  | Gewicht | Count |
| 345 |     100 |     2 |
| 345 |      50 |     1 |
| 347 |     200 |     4 |
with a primary key composed of ID and Gewicht (weight).
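
In SQL terms the clean-up is a one-off aggregation. A minimal sketch,
assuming MySQL and made-up table names (legacy and normalised):

CREATE TABLE normalised (
    ID      INT NOT NULL,
    Gewicht INT NOT NULL,
    `Count` INT NOT NULL,
    PRIMARY KEY (ID, Gewicht)
);

-- collapse the duplicate rows into counts
INSERT INTO normalised (ID, Gewicht, `Count`)
SELECT ID, Gewicht, COUNT(*)
FROM legacy
GROUP BY ID, Gewicht;

After that, "delete one 347-200 row" becomes a single unambiguous
UPDATE of the Count column on a uniquely keyed row.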
But, Willy is working with a legacy system, and we don't know for sure
why someone decided to make (for example) 4 identical 347-200 rows,
instead of one 347-200 row with a count of 4. Perhaps it's a very busy
system, and the designer thought that letting many concurrent users
create and delete arbitrary numbers of ID-weight rows would be easier
than having them queue up to modify the count on a single normalised
347-200 row? Who knows!
Kind regards,
Caveat
On 06/10/13 20:24, Randall Morgan wrote:
> If you need some books on relational databases, I can give you private
> access to my library of digital/scanned books. They will explain things
> much better than we have here and teach you SQL. Also, know that relational
> databases are only one type of database. There are other types of
> databases not yet directly supported by Gambas. But 99% of databases are
> indeed relational, and even those that are not typically carry over some
> of the ideas from relational databases.
>
>
>
>
> On Sun, Oct 6, 2013 at 11:17 AM, Randall Morgan <rmorgan62 at ...626...> wrote:
>
>> A simple explanation as to why you do not repeat data in a relational
>> database (very simplistic):
>>
>> Relational databases normalize data so that any single piece of data is
>> stored only once. This allows the data to be edited in only one location
>> and also requires the least amount of storage. By centralizing each piece
>> of data into one location you gain power. For example, say you have a
>> customer database where customers must enter their city each time they
>> place an order. Some customers may misspell the city name, or use an
>> abbreviation. Allowing such entries into the data would make using the
>> address information difficult. It would also make it difficult to search
>> the database for all customers from a particular city.
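>>
>> For example, something like this (a sketch, with made-up names) keeps each
>> city name in exactly one place, so a typo can only ever exist in one row:
>>
>> CREATE TABLE city (
>>     city_id INT PRIMARY KEY,
>>     name    VARCHAR(64) NOT NULL UNIQUE
>> );
>>
>> CREATE TABLE customer_order (
>>     order_id INT PRIMARY KEY,
>>     city_id  INT NOT NULL,
>>     FOREIGN KEY (city_id) REFERENCES city (city_id)
>> );
>>
>> Fixing a misspelt city is then one UPDATE on the city table, and every
>> order that points at it is correct at once.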
>>
>> If you allow the same information in multiple places, you will cause
>> situations where the data must be edited in multiple places or every
>> search/query must be written to handle an almost endless list of
>> possibilities for each value of interest.
>>
>> Say you allow multiple records for the same order. When you update the
>> order status, one record will show the order with the updated status and
>> one without. Which one is correct? Do you assume the one with the latest
>> status is correct? What if someone in the shipping department made an
>> error? Now you have confusing records... You also have wasted space, which
>> costs real money for real storage, real CPU time to locate the correct
>> record (or all of them), and real human time for the user to review and
>> decide which of the multiple records is correct. The user may also have to
>> do additional work to combine information from multiple records to compile
>> a complete picture of the data.
>>
>> By enforcing data normalization (the art of not repeating data values),
>> you limit the amount of work both the CPU and the end user must do. You
>> build a system where there is only one entry to edit or update. You
>> provide a single source of truth and a single point of
>> contention/resolution for conflicts.
>>
>> Data normalization is to data what OOP is to programming. It is not
>> perfect, and can be taken to extremes. But it is a valuable tool and should
>> be used in both coding and data storage/retrieval. It is not the only tool,
>> and there are rare edge cases where one may need to break the rules.
>> But I can pretty much guarantee you that if you don't understand why you
>> need it, you don't yet understand when you should and should not break the
>> rules. So err on the side of caution and normalize your data until you have
>> the experience needed to know when to break the rules. If you always
>> normalize your data you will be correct 99.999991 percent of the time. That
>> last 0.000009 percent of the time you may never get to....
>>
>>
>>
>>
>> On Sun, Oct 6, 2013 at 2:55 AM, Charlie Reinl <Karl.Reinl at ...9...>wrote:
>>
>>> On Sunday, 06.10.2013 at 10:55 +1030, Bruce wrote:
>>>> On Sun, 2013-10-06 at 00:58 +0200, Willy Raets wrote:
>>>>> On Sun, 2013-10-06 at 00:05 +0200, Caveat wrote:
>>>>>>> If you wanted to delete one row with values 347 200, then which
>>>>>>> row would it be?
>>>>>> It *doesn't matter*! The rows are the same, so pick a row, ANY row!
>>>>> That is exactly what I have been doing for the last 10 years.
>>>>> In VB code I just looked for the criteria in the table, deleted the
>>>>> first record that met the criteria, then updated the table, and done.
>>>>>
>>>>> In Gambas I tried the same, now with the table being in MySql, and it
>>>>> didn't work.
>>>>>
>>>>> It never, ever failed me in those 10 years, even without a primary
>>>>> key.
>>>>> Although most other tables do have primary keys and relations set up,
>>>>> there never was a need for this table to have a primary key.
>>>>> And I still do not see the need for a primary key.
>>>>>
>>>>>>> Without the primary key, you're asking MySQL to guess, and it
>>>>>>> won't.
>>>>>> Yes it will, if you use LIMIT (see my previous two mails).
>>>>>> Otherwise, it'll happily delete all the rows that meet the criteria,
>>>>>> whether they're duplicates or not.
>>>>> And even in MySql there is NO need for the table to have a primary key,
>>>>> as 'DELETE ... LIMIT 1' will do what I expect: delete only 1 record
>>>>> that meets the criteria.
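>>>>> For example, something like this (with an illustrative table name, not
>>>>> my real schema):
>>>>>
>>>>> DELETE FROM gewichten
>>>>> WHERE ID = 347 AND Gewicht = 200
>>>>> LIMIT 1;   -- MySQL extension: removes exactly one matching row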
>>>>>
>>>>>>>> "duplicate rows are and always were a mistake in SQL", C.J.Date
>>>>> Duplicate rows are a reality in real life data gathering.
>>>>> I have been working with databases since dBase somewhere in the '80s,
>>>>> and I have had to deal with real-life situations concerning data
>>>>> storage, not with theory for ideal models. Not saying I haven't read my
>>>>> share on relational database design and take it into account when
>>>>> designing a database.
>>>>>
>>>>>> The guy has obviously never read William Kent's "Data and Reality".
>>>>>> It's a book I'd still recommend to anyone working with data and
>>>>>> particularly databases, even though it was written 35 years ago!
>>>>>>
>>>>>> <RANT>
>>>>>> Why *must* every record be uniquely identifiable? Explain why... don't
>>>>>> just throw random snippets of relational database theory at me! If I
>>>>>> ask you to model the contents of your pockets on a database, are you
>>>>>> going to attach some artificial key to each individual penny? Why?
>>>>>> They each buy the same number of penny chews (showing my age!), and if
>>>>>> you take 3 pennies away, you've still got 3 pennies less than you had
>>>>>> before, no matter how carefully you select those 3 pennies!
>>>>>> </RANT>
>>>>>>
>>>>>> <HINT>
>>>>>> A tuple is, by definition, unique. If I have 8 rows on my database
>>>>>> representing 8 real-world penny objects and I can delete 3 of them...
>>>>>> are the remaining 5 rows tuples? How closely does the database model
>>>>>> reality in this case?
>>>>>> </HINT>
>>>>> I like your penny story ;)
>>>>>
>>>>>
>>>> [ASIDE!]
>>>>
>>>> Interesting discussion, but someone recently asked me why I prefer
>>>> postgresql over mysql and this thread reminded me that I never answered.
>>>>
>>>> One of those reasons is that this whole situation cannot occur in
>>>> postgresql. It has been designed such that it cannot. Considering a
>>>> non-normalised table, i.e. one that has no natural (unique) primary key,
>>>> all rows are still capable of being operated on using the standard SQL
>>>> commands. However, in that case, *all* rows that pass the WHERE
>>>> filter clause will be acted on. Full stop.
>>>>
>>>> Putting that another way, postgresql "doesn't really care" whether the
>>>> table has a primary key set or not. Nor does it care whether a delete or
>>>> an update query uses a unique row identifier. That sounds pretty slack,
>>>> doesn't it? Well, not at all really. It relies on the WHERE clause to
>>>> identify the set of rows to be operated on and that is all. If such a
>>>> clause identifies multiple rows then they will all be affected. So, to
>>>> continue the coin collection analogy with a theoretical SQL command:
>>>> SELL COIN WHERE YEAR="1930" AND ALLOY="Cupronickel";
>>>> would sell all the coins that match. The idea that you could write a
>>>> command like:
>>>> SELL THE FIRST COIN YOU FIND WHERE YEAR="1930" AND ALLOY="Cupronickel";
>>>> really, to me, brings a sense of randomness into the game... What is
>>>> the dbms supposed to imagine "THE FIRST COIN YOU FIND" to mean?
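>>>> In real SQL terms, if "selling" a coin simply meant deleting its row,
>>>> that first command would be something like (the coin table is
>>>> hypothetical):
>>>> DELETE FROM coin WHERE year = 1930 AND alloy = 'Cupronickel';
>>>> and it affects *all* matching rows; postgresql's DELETE deliberately
>>>> offers no "just one of them" LIMIT clause.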
>>>>
>>>> Here in the thoroughbred racing industry, we have a somewhat similar
>>>> real world situation. All thoroughbreds have a registered name and
>>>> there are strict controls over the reuse of a name. However, horse
>>>> owners are allowed to change the registered name of the horse
>>>> occasionally and under certain conditions. All pretty good and you could
>>>> imagine that the name "Lemon Liniment" would uniquely identify a
>>>> specific horse from any of the other 61,000 we have on our database.
>>>> Further, you might imagine that if the owners changed its name to
>>>> "Cherry Zontop" then all we would have to do is, once the name change
>>>> has been registered, change that "unique" primary key. Hah ha! No such
>>>> simplicity exists on the planet.
>>>>
>>>> First off, the horse is really and truly identified by its microchip ID.
>>>> This is what the register uses as the unique identifier. However, they
>>>> don't publish that (and for good reasons if you think about it ...) and
>>>> secondly, name change events are often publicised only some
>>>> time after the actual change has been registered. This is a major,
>>>> major problem (to us, logistically, not technically).
>>>>
>>>> Suppose Jack (the owner of Lemon Liniment) sells it to Jill (on
>>>> Wednesday) who re-registers it as Cherry Zontop. Jill then races it
>>>> once on the next Saturday and we see a race entrance and result appear
>>>> in the official race records for Cherry Zontop. We think that Cherry
>>>> Zontop is a "new untried" horse and add it as a new horse in the
>>>> database. On the Monday (or later) the registry publicises the name
>>>> change...
>>>>
>>>> Under MySQL merging the histories of Lemon Liniment into Cherry Zontop
>>>> was a technical nightmare that essentially required offloading and
>>>> reloading multiple sets of rows from multiple tables*. Under postgresql
>>>> it is the execution of one SQL command:
>>>> UPDATE horse SET name='Cherry Zontop', namechanged=TRUE,
>>>>     oldname='Lemon Liniment' WHERE name='Lemon Liniment';
>>>>
>>>> * I have simplified the story a fair bit; there are around 18 underlying
>>>> tables that have the horse name as the foreign key or as part of the
>>>> foreign key.
>>>>
>>>> postgres, because it properly supports "ON UPDATE CASCADE" constraints
>>>> on those tables and their foreign key columns, handles it all internally.
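>>>>
>>>> For example, a dependent table declared along these lines (the names are
>>>> illustrative, and it assumes horse.name is declared UNIQUE) follows the
>>>> rename automatically:
>>>>
>>>> CREATE TABLE race_result (
>>>>     horse_name text NOT NULL
>>>>         REFERENCES horse (name) ON UPDATE CASCADE,
>>>>     race_date  date NOT NULL,
>>>>     placing    integer
>>>> );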
>>>>
>>>> Anyway, as I said, [ASIDE] and just one of the reasons for preferring
>>>> postgresql.
>>>>
>>>> regards
>>>> Bruce
>>> Salut Bruce,
>>>
>>> how do you look back at the 'Lemon Liniment' history?
>>>
>>>
>>> --
>>> Amicalement
>>> Charlie
>>>
>>>
>>>
>>>
>>
>>
>> --
>> If you ask me if it can be done, the answer is YES, it can always be done.
>> The correct questions, however, are... What will it cost, and how long will
>> it take?
>>
>
>