[Gambas-user] Help needed from regexp gurus

Fernando Cabral fernandojosecabral at ...626...
Sat Jun 17 19:38:49 CEST 2017


Still beating my head against the wall due to my lack of knowledge about
the PCRE methods and properties... Because of this, I have progressed not
only very slowly but also -- I feel -- in a very inelegant way. So perhaps
you guys who are more acquainted with PCRE might be able to point me to a
better solution.

I want to search a long string that can contain a sentence, a paragraph or
even a full text. I want to find and isolate every word it contains. A word
is defined as any sequence of alphabetic characters followed by a
non-alphabetic character.

The sample code below does work, but I don't feel it is as elegant or as
fast as it could and should be, especially the way I am traversing the
string from beginning to end. It looks awkward and slow. There must be a
more efficient way, like working only with offsets and lengths instead of
copying the string again and again.

Dim Alphabetics As String = "abc...zyzABC...ZYZ"
Dim re As New RegExp
Dim matches As New String[]
Dim RawText As String
Dim i As Integer

re.Compile("([" & Alphabetics & "]+?)([^" & Alphabetics & "]+)", RegExp.UTF8)
RawText = "abc12345def ghi jklm mno p1"

' Take the first match, keep its word, then drop the matched prefix.
Do While RawText
     re.Exec(RawText)
     matches.Add(re[1].Text)
     RawText = String.Mid(RawText, String.Len(re.Text) + 1)
Loop

For i = 0 To matches.Count - 1
  Print matches[i]
Next


The above code correctly finds "abc", "def", "ghi", "jklm", "mno" and "p".
But the tricks I have used are cumbersome (like advancing with String.Mid()
and resorting to re[1].Text and re.Text).
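
To make the "offsets and lengths" idea concrete, here is a minimal sketch in
Python rather than Gambas. It makes no claim about what gb.pcre offers for
repeated matching. It assumes that "alphabetic" means ASCII letters only, and
the names WORD and find_words are made up for illustration. The lookahead
(?=...) is ordinary PCRE syntax: it requires a non-alphabetic character after
the word without consuming it, so no second group is needed.

```python
import re

# Assumption: "alphabetic" means ASCII letters; widen the class for accented text.
# A word is a run of letters that is followed by a non-letter (checked, not consumed).
WORD = re.compile(r"[A-Za-z]+(?=[^A-Za-z])")


def find_words(text):
    """Return (offset, length, word) for every word, scanning the text once."""
    return [(m.start(), m.end() - m.start(), m.group())
            for m in WORD.finditer(text)]


raw_text = "abc12345def ghi jklm mno p1"
words = find_words(raw_text)

for offset, length, word in words:
    print(offset, length, word)
```

The subject string is never copied or shortened here: each match only reports
where the word starts and how long it is. A trailing word with nothing after
it is skipped, which follows the definition of a word given above.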

-- 
Fernando Cabral


