Help with RegEx. Finding paragraphs containing a string


joy8388608
 

I'm having trouble with the RegEx to extract paragraphs containing a string.
For the following sample text, I would like to create a document containing
three paragraphs from the R: to the blank line (2020-07-24, 2020-07-26 and 2020-08-01)

In other words, extract all paragraphs containing @T@.

I've tried (?s)R:\d{4}.*?@T@.*?(\R\R|\Z) but it matches one OR MORE
paragraphs until it finds one containing @T@.

I was planning to use GetDocListAll to grab them all and paste into a new doc.

Thanks for the RegEx help.

Joy

R:2020-07-23 Thu
Beef and 2.5mg Pred and 250mcg B12

R:2020-07-24 Fri
7:38 AM Beef
7:56 AM Beef
8:03 AM @T@ This happened again Aug 2.
Beef and 250mcg B12

R:2020-07-25 Sat
Beef and 2.5mg Pred and 250mcg B12
Very slight stomach sounds around 5-6 AM

R:2020-07-26 Sun
Beef and 250mcg B12
5:35 PM @T@ after eating

R:2020-08-01 Sat
Beef and 250mcg B12
10:55 AM @T@ hb
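
A minimal Python sketch of why the first attempt over-matches, offered as an illustration rather than as NoteTab behaviour. Assumptions: Python's re module has no \R, so \n stands in for it, and SAMPLE is a hypothetical name holding the sample text above. Under (?s) the lazy .*? may cross blank lines, so a match that begins in a paragraph without @T@ keeps growing until it reaches the next @T@.

```python
import re

# The sample data from the post; SAMPLE is a hypothetical variable name.
SAMPLE = """\
R:2020-07-23 Thu
Beef and 2.5mg Pred and 250mcg B12

R:2020-07-24 Fri
7:38 AM Beef
7:56 AM Beef
8:03 AM @T@ This happened again Aug 2.
Beef and 250mcg B12

R:2020-07-25 Sat
Beef and 2.5mg Pred and 250mcg B12
Very slight stomach sounds around 5-6 AM

R:2020-07-26 Sun
Beef and 250mcg B12
5:35 PM @T@ after eating

R:2020-08-01 Sat
Beef and 250mcg B12
10:55 AM @T@ hb
"""

# Assumption: \n replaces \R because Python's re does not support \R.
FIRST_TRY = re.compile(r"(?s)R:\d{4}.*?@T@.*?(\n\n|\Z)")

matches = [m.group(0) for m in FIRST_TRY.finditer(SAMPLE)]

# Show where each match begins.
print([m[:12] for m in matches])
# → ['R:2020-07-23', 'R:2020-07-25', 'R:2020-08-01']
```

The first match starts at 2020-07-23 and swallows 2020-07-24 as well, because nothing in the pattern stops .*? at a blank line.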


John Shotsky
 

One way would be to add a token to the paragraphs that contain the text you want, then use getdoc to capture them, then remove the tokens.
Another would be to delete all the groups that don't contain the @T@ so what remains are the ones you want.
The second would probably be easiest because you could delete anything that starts with R:, contains no @ (only [^@] characters), and ends with a blank line.
Regards,
John
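
A rough Python sketch of the second idea (delete what you don't want), not a tested NoteTab clip. Assumptions: \n stands in for \R, the check is for @T@ specifically rather than for any @, and SAMPLE and DROP are hypothetical names. Each line is guarded by a negative lookahead, and the pattern only succeeds if it reaches a blank line or the end of the text, so a paragraph that does contain @T@ is never partly deleted.

```python
import re

# A shortened copy of the sample data; SAMPLE is a hypothetical name.
SAMPLE = """\
R:2020-07-23 Thu
Beef and 2.5mg Pred and 250mcg B12

R:2020-07-24 Fri
7:38 AM Beef
7:56 AM Beef
8:03 AM @T@ This happened again Aug 2.
Beef and 250mcg B12

R:2020-07-25 Sat
Beef and 2.5mg Pred and 250mcg B12
Very slight stomach sounds around 5-6 AM
"""

# ^R:            a record start
# (?!.*@T@).+\n  one non-empty line with no @T@ anywhere on it
# (?:\n|\Z)      the paragraph must end here (blank line or end of text)
DROP = re.compile(r"^R:(?:(?!.*@T@).+\n)+(?:\n|\Z)", re.MULTILINE)

# Remove every paragraph that lacks @T@; what remains is what we want.
kept = DROP.sub("", SAMPLE)
print(kept)
```

Only the 2020-07-24 paragraph survives here. The negative search is more awkward to read than a positive one, which is a fair reason to prefer matching the wanted paragraphs directly.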

-----Original Message-----
From: Clips@Notetab.groups.io <Clips@Notetab.groups.io> On Behalf Of joy8388608 via groups.io
Sent: Friday, December 25, 2020 10:29 AM
To: Clips@Notetab.groups.io
Subject: [NTB-Clps] Help with RegEx. Finding paragraphs containing a string



Axel Berger
 

John Shotsky wrote:
One way would be

I'd begin with a Join Lines to get rid of the pesky paragraph breaks. After that
you won't need a dotall any more and Joy's Regex should work.

And best wishes for whomever those data pertain to.
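
One possible reading of the join-lines idea, sketched in Python rather than with NoteTab commands. Assumptions: each paragraph is joined into a single line using a placeholder character (SEP, hypothetical, chosen so it cannot occur in the data), the search is then a plain line-wise one without dotall, and the placeholder is turned back into line breaks afterwards.

```python
import re

# A shortened copy of the sample data; SAMPLE is a hypothetical name.
SAMPLE = """\
R:2020-07-23 Thu
Beef and 2.5mg Pred and 250mcg B12

R:2020-07-24 Fri
7:38 AM Beef
7:56 AM Beef
8:03 AM @T@ This happened again Aug 2.
Beef and 250mcg B12

R:2020-07-25 Sat
Beef and 2.5mg Pred and 250mcg B12
Very slight stomach sounds around 5-6 AM
"""

# Hypothetical placeholder (ASCII unit separator); must not occur in the data.
SEP = "\x1f"

# Step 1: join the lines of each paragraph so one paragraph = one line.
paragraphs = SAMPLE.strip("\n").split("\n\n")
joined = "\n".join(p.replace("\n", SEP) for p in paragraphs)

# Step 2: without dotall, "." stops at the line end, i.e. at the paragraph end.
hits = re.findall(r"^R:\d{4}.*@T@.*$", joined, flags=re.MULTILINE)

# Step 3: restore the original line breaks in the paragraphs that matched.
restored = [h.replace(SEP, "\n") for h in hits]
print("\n\n".join(restored))
```

The cost of this route is the extra join and restore steps around the search.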




Flo
 

Try...

^R:(.+\R)*.*@T@.*\R(.+\R)*

Regards,
Flo
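
For anyone studying the pattern, here is a piece-by-piece breakdown as a small Python sketch. Assumptions: Python's re module has no \R, so \n is used instead; ^ needs re.MULTILINE to match at every line start; SAMPLE and FLO are hypothetical names; and the text ends with a line break. The key point is that .+ needs at least one character, so (.+\n)* can never step over a blank line and the match stays inside one paragraph.

```python
import re

# The sample data from the original post; SAMPLE is a hypothetical name.
SAMPLE = """\
R:2020-07-23 Thu
Beef and 2.5mg Pred and 250mcg B12

R:2020-07-24 Fri
7:38 AM Beef
7:56 AM Beef
8:03 AM @T@ This happened again Aug 2.
Beef and 250mcg B12

R:2020-07-25 Sat
Beef and 2.5mg Pred and 250mcg B12
Very slight stomach sounds around 5-6 AM

R:2020-07-26 Sun
Beef and 250mcg B12
5:35 PM @T@ after eating

R:2020-08-01 Sat
Beef and 250mcg B12
10:55 AM @T@ hb
"""

# The same pattern with \R written as \n, spread out in verbose mode.
FLO = re.compile(r"""
    ^R:          # a line that begins a record
    (.+\n)*      # zero or more non-empty lines; a blank line stops the loop
    .*@T@.*\n    # one line that contains @T@
    (.+\n)*      # the remaining non-empty lines of the same paragraph
""", re.MULTILINE | re.VERBOSE)

matches = [m.group(0) for m in FLO.finditer(SAMPLE)]

# Characters 2..11 of each match are the record date.
dates = [m[2:12] for m in matches]
print(dates)
# → ['2020-07-24', '2020-07-26', '2020-08-01']
```

A paragraph without @T@ fails because the engine backtracks through every line of it and never finds a line satisfying the third piece before the blank line.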


joy8388608
 

On Fri, Dec 25, 2020 at 10:55 AM, John Shotsky wrote:

One way would be to add a token to the paragraphs that contain the text you
want, then use getdoc to capture them, then remove the tokens.
Another would be to delete all the groups that don't contain the @T@ so what
remains are the ones you want.
The second would probably be easiest because you could delete anything
starting with R:, does not contain [^@] and ends with a blank line.

Thanks but I'm still stuck. Unless I'm missing something, for the first suggestion, "@T@" IS the token I want.
For the second, it's the same problem but harder because now it's a negative search.

And, to be clear, the data does contain other @ sequences such as @A@ and @F@ but those are not being considered in the match.

Ah, I love regex but it continues to stump me quite a bit.

Joy


joy8388608
 

On Fri, Dec 25, 2020 at 12:00 PM, <flo.gehrke@t-online.de> wrote:

Try...
^R:(.+\R)*.*@T@.*\R(.+\R)*

Thank you, Flo. I knew it could be done and I knew you would know how. Now I just have to study this...

I so appreciate your help. I could have done it in a clip, but this way is so much cleaner and a learning experience.

Joy