[Gambas-user] Help with some parsing

Tue Feb 17 08:16:59 CET 2009

richard terry wrote:
> I'm importing some old data from windows. The data is exported from msAccess97 
> and some of the text fields have a carriage return of some sort in the 
> middle. The same character seems to be used by access as the end of line for 
> each record, so when I try and import it, I'm getting a truncated line, which 
> then mucks up the next line
>
> i.e. there is a character in there, which I've designated here as [xx], that is 
> interpreted as a new line
>
>
> 1 Doe|john|01/02/1950|some text in here saying something
> 2 Smith|Peter|19/02/1944|also some text [xx]
> 3 but is split onto a new line so the parser crashes
> 3 Brown|Michael|17/05/1966|but this line is ok
>
> So my question is: how do I discover what the character is (Chr$(10)? Chr$(13)?)
> and how do I eliminate those before parsing?
>   
You could open your data file with a hex editor, or use hexdump or 
similar, and compare how the correct lines and the split ones end. 
If the endings differ, you could read the file in binary mode and 
reconstruct the records in code.
But I suspect that all the lines end in the same way: CR-LF.
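
If a hex editor is not at hand, the same check can be done in code. What follows is only a rough sketch in Python (not Gambas), under the assumption that real records end in CR-LF while the embedded break is a bare LF or a bare CR. The function names and the sample bytes are made up for illustration, and your file may well behave differently.

```python
import re


def count_line_endings(data: bytes) -> dict:
    """Count CR-LF pairs and the CR / LF bytes that are not part of a pair."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "bare LF": data.count(b"\n") - crlf,
        "bare CR": data.count(b"\r") - crlf,
    }


def strip_embedded_breaks(data: bytes, record_end: bytes = b"\r\n") -> bytes:
    """Replace any CR or LF that is not part of record_end with a space.

    Assumption: record_end (CR-LF by default) marks the true end of a record.
    """
    parts = data.split(record_end)
    cleaned = [re.sub(rb"[\r\n]", b" ", part) for part in parts]
    return record_end.join(cleaned)


# Hypothetical sample: the second record contains a bare LF in its text field.
sample = (
    b"Doe|john|01/02/1950|some text\r\n"
    b"Smith|Peter|19/02/1944|also some text\nbut is split\r\n"
)

print(count_line_endings(sample))
print(strip_embedded_breaks(sample))
```

If the counts show only CR-LF and nothing bare, the endings are all the same and this approach cannot tell a real record end from an embedded one.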

Another way, probably better, is to rely on the number of fields a line 
should have; say, every line must have 3 fields and therefore 2 pipes "|". 
Read the file line by line and, if a line does not contain the expected 
number of pipes, treat it as a continuation and append it to the line 
read before. In the excerpt you reported, line 3 contains no pipes, 
so it should be joined with the previous one.
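
The pipe-counting idea can be sketched as follows, again in Python rather than Gambas and only as an illustration. It assumes the stray break always falls in the last field, as in your excerpt, so that a short line is always the tail of the previous record. The name join_continuations, the joiner parameter and the sample lines (your excerpt without the leading line numbers) are my own assumptions.

```python
def join_continuations(lines, expected_pipes, joiner=" "):
    """Merge lines with too few pipes into the record read just before them."""
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # drop the line terminator, whatever it is
        if records and line.count("|") < expected_pipes:
            # Too few separators: treat as a continuation of the previous record.
            records[-1] += joiner + line
        else:
            records.append(line)
    return records


# Sample modelled on the excerpt: 4 fields per record, hence 3 pipes.
lines = [
    "Doe|john|01/02/1950|some text in here saying something\r\n",
    "Smith|Peter|19/02/1944|also some text\r\n",
    "but is split onto a new line so the parser crashes\r\n",
    "Brown|Michael|17/05/1966|but this line is ok\r\n",
]

for record in join_continuations(lines, expected_pipes=3):
    print(record)
```

If a break can also occur in one of the earlier fields, a short line might be the start of a record rather than the tail of one, and this simple rule would no longer be enough.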

Hope this helps - regards,

-- 
Doriano Blengino

"Listen twice before you speak.
This is why we have two ears, but only one mouth."