[Gambas-user] help with data import into postgres - funny characters> import fails

Robert Moss the.at.robert at ...626...
Fri Feb 27 05:16:36 CET 2009


Sorry to post two messages, but I just thought of something. Maybe they are
nulls, and maybe Postgres has a problem importing nulls into a string
datatype. Just a thought to look into. DEFINITELY look at the hex
representation (ideally next to the character representation) and see if you
can identify what the characters are. UTF-16 characters often take this form
(little-endian: low-order byte first, high-order byte second): 24 04 | 52 04.
Nulls are obviously 00 00, and the bit-shifted versions would be 40 45 | 20 42.
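A minimal Python sketch of the "hex next to the characters" check, assuming you can read the raw bytes of the offending HL7 line. It prints a hex dump, lists any NUL (0x00) bytes (PostgreSQL text values cannot contain NUL), and reports the first byte that is not valid UTF-8, which is roughly what Postgres is complaining about. The helper names and the sample bytes are hypothetical, not taken from the real file.

```python
# Minimal sketch (not from the original thread): dump a suspicious fragment as
# hex next to its printable characters, and report NUL bytes and the first
# byte that is not valid UTF-8.
from typing import List, Optional


def hex_dump(data: bytes, width: int = 8) -> List[str]:
    """Return 'offset  hex bytes  |chars|' lines; non-printable bytes show as '.'."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad the hex column so the character column lines up on short rows.
        lines.append(f"{offset:06x}  {hex_part:<{width * 3 - 1}}  |{text_part}|")
    return lines


def nul_offsets(data: bytes) -> List[int]:
    """Offsets of 0x00 bytes; PostgreSQL text values cannot contain NUL."""
    return [i for i, b in enumerate(data) if b == 0]


def first_invalid_utf8(data: bytes) -> Optional[int]:
    """Offset of the first byte that breaks UTF-8 decoding, or None if valid."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return err.start


if __name__ == "__main__":
    # Hypothetical bytes standing in for the unknown characters after
    # "Continued"; replace with the real bytes read from the HL7 file.
    sample = b"Continued\x00\x00\xa0\xff/2"
    for line in hex_dump(sample):
        print(line)
    print("NUL bytes at:", nul_offsets(sample))
    print("first invalid UTF-8 byte at:", first_invalid_utf8(sample))
```

NULs are checked separately because Python's UTF-8 decoder accepts 0x00 as a valid character, whereas Postgres rejects it in text.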

On Thu, Feb 26, 2009 at 8:11 PM, Robert Moss <the.at.robert at ...626...> wrote:

> Is the error a Gambas error or a Postgres error? It might be that they are
> using a charset with special keys specific to their industry, which might
> indicate more data, or, more likely, the data is erroneous and got
> bit-shifted, and now the high-order bit might be too large to be part of
> the UTF-8 charset. I would see if you can get the binary/hex of those keys
> and shift the bits each way by one (try two also); if one of them makes
> sense, you know it's a communication error (one that went uncorrected???
> That doesn't sound like TCP).
>
> Since you've never encountered these characters, the most likely scenario
> is data corruption. Try to correct it with a bitshift, and if that works,
> write code to search for high-order bits that are incorrect, and shift the
> sequence appropriately.
>
> Let me know what you think ^_^
>
> -Robert
>
>
> On Thu, Feb 26, 2009 at 5:57 PM, richard terry <rterry at ...1946...> wrote:
>
>> Hi,
>>
>> I'm trying to import the HL7 from a local radiology provider.
>> It's got funny characters in the file, and Postgres baulks when I go to
>> save the line with the message attached; it basically says "invalid byte
>> sequence for encoding UTF8".
>>
>> My Postgres database was created with UTF8 encoding:
>>
>> CREATE DATABASE "27Feb09"
>>  WITH OWNER = richard
>>       ENCODING = 'UTF8';
>>
>> Here is the offending bit (snipped):
>>
>> with a maximum AP diameter\.br\of 1.0cm and a less than 50% stenosis in
>> its proximal segment.  \.br\\.br\Continued�����/2\.br\In the mid
>> superficial femoral artery there is staccato flow and an
>>
>> I've imported thousands and thousands of lines from a pathology provider
>> and never encountered these characters.
>>
>> Thanks in anticipation.
>>
>>
>> _______________________________________________
>> Gambas-user mailing list
>> Gambas-user at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/gambas-user
>>
>
>
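The bit-shift test suggested in the quoted reply (shift the bits each way by one, then by two, and see whether anything makes sense) can be tried mechanically. A minimal Python sketch, assuming the whole fragment was shifted by one uniform amount; `shift_bits` and `try_shifts` are hypothetical helper names, not part of Gambas or PostgreSQL.

```python
# Minimal sketch (not from the original thread): treat a fragment as one bit
# stream, shift it left/right by 1 and 2 bits, and see which results decode
# as UTF-8. Assumes the whole fragment was shifted by the same amount.
from typing import Dict, Iterable, Optional


def shift_bits(data: bytes, shift: int) -> bytes:
    """Shift the bytes as a single big-endian bit stream (positive = left).

    Bits pushed off either end are lost and vacated bits are filled with 0,
    so the result always has the same length as the input.
    """
    bits = len(data) * 8
    value = int.from_bytes(data, "big")
    if shift >= 0:
        value = (value << shift) & ((1 << bits) - 1)
    else:
        value >>= -shift
    return value.to_bytes(len(data), "big")


def try_shifts(data: bytes,
               shifts: Iterable[int] = (-2, -1, 1, 2)) -> Dict[int, Optional[str]]:
    """Map each shift to its UTF-8 decoding, or None if it is not valid UTF-8."""
    results: Dict[int, Optional[str]] = {}
    for s in shifts:
        try:
            results[s] = shift_bits(data, s).decode("utf-8")
        except UnicodeDecodeError:
            results[s] = None
    return results


if __name__ == "__main__":
    # Simulate corruption: "Hi" shifted left by one bit becomes b"\x90\xd2",
    # which is not valid UTF-8 (0x90 is a stray continuation byte).
    garbled = shift_bits(b"Hi", 1)
    for s, text in try_shifts(garbled).items():
        print(f"shift {s:+d}: {text!r}")
```

In this toy case the shifts -2, -1 and +2 all decode cleanly ("$4", "Hi", "CH"), but only -1 recovers the original text. A valid decode alone does not prove a fix, so whether a result "makes sense" still needs a human eye.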


