Removing duplicate entries


WV- Mike
 

I have a list that I want to remove duplicate entries from.
Can anyone help with a clip to accomplish this?
See: https://EpicRoadTrips.us/2021/rhodie_list/ww_rhodies_list.txt

Examples:

A. Bedford
A. BedfordLowinsky -5 10yrs: 6'
unknown x ponticum

Aberham Lincoln
Aberham Lincoln
Acclaim
AcclaimDexter Hybrid
[Pygmalion X haematodes x wellfleet]
-5 10yrs: 5'

Reformatted to:

A. Bedford Lowinsky -5 10yrs: 6'
unknown x ponticum

Aberham Lincoln

Acclaim Dexter Hybrid
[Pygmalion X haematodes x wellfleet]
-5 10yrs: 5'

The duplicate names represent the labels from photos and the text label for the photos.
See:  https://www.rhodiesrus.com/plants-c4fi

Thanks,

--
Mike Breiding
www.EpicRoadTrips.us


John Shotsky
 

You can sort them using the menu - Modify lines, sort ascending.
Then, you can remove dupes with the following:
^!Jump 1
^!Replace "^(.+\R)\K\1" >> "" ARSW
Regards,
John

-----Original Message-----
From: Clips@Notetab.groups.io <Clips@Notetab.groups.io> On Behalf Of WV- Mike
Sent: Sunday, January 31, 2021 6:56 AM
To: Clips@Notetab.groups.io
Subject: [NTB-Clps] Removing duplicate entries



WV- Mike
 

On 1/31/2021 8:06 AM, John Shotsky wrote:
You can sort them using the menu - Modify lines, sort ascending.
Then, you can remove dupes with the following:
^!Jump 1
^!Replace "^(.+\R)\K\1" >> "" ARSW
Regards,
John

Hi John,
I ran the clip after sorting and got these results for the first several lines:

'Anne Hardgrove' X (#2 red(T. Ring) same as 'Anna Delp x 'Delp's Cindy Lou)
'Arctic Gold' X 'Hardgrove's Deepest Yellow'
'Degram Group' X 'Atrier Group'
'Delp's Small Fry' X 'Hi Tech'
'Diane' X 'Gomer Waterer'
'Everestianum' X 'Hindustan'
'Gletschernacht'
'Goldbukett' X 'Nippon'

Without sorting I got these results:

A. Bedford
A. BedfordLowinsky -5 10yrs: 6'
unknown x ponticum

Aberham Lincoln
Acclaim
AcclaimDexter Hybrid
[Pygmalion X haematodes x wellfleet]
-5 10yrs: 5'

Thanks,
-Mike




Flo
 

Mike,

I have tested this...

; Add an empty line between groups
^!Replace "^(.+)(\R\1.*)" >> "\r\n$0" WARS
; Remove first line in a group
^!Replace "^(.+)(\R\1.*)" >> "" WARS1
; Reduce empty lines
^!Replace "^\R{2,}" >> "\r\n" WARS

against your complete ww_rhodies_list.txt (1,292 quite irregular lines).

By "group" I mean a line being followed by a complete or partial duplicate. NT counts 391 groups like that in your file.

Maybe this will help to get a little closer to a solution of your task.
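
For anyone who wants to try the same three passes outside NoteTab, here is a rough Python sketch. It is an emulation under stated assumptions, not the clip itself: Python's re module has no \R, so "\n" stands in for the line terminator, and the trailing 1 in WARS1 is assumed to limit the replacement to subpattern 1 (which is what the "Remove first line in a group" comment suggests). The function name group_and_dedupe is made up for this example.

```python
import re

# A line followed by a line that starts with the same text ("group").
PAIR = re.compile(r"^(.+)(\n\1.*)", re.MULTILINE)


def group_and_dedupe(text: str) -> str:
    """Emulate the three replace passes on text with "\n" line endings."""
    # Pass 1: put an empty line in front of every group.
    text = PAIR.sub(r"\n\g<0>", text)
    # Pass 2: blank out the first line of each group
    # (assumed meaning of the "1" in WARS1: replace subpattern 1 only).
    text = PAIR.sub(r"\2", text)
    # Pass 3: collapse runs of empty lines into a single empty line.
    return re.sub(r"^\n{2,}", "\n", text, flags=re.MULTILINE)


sample = (
    "A. Bedford\n"
    "A. BedfordLowinsky -5 10yrs: 6'\n"
    "unknown x ponticum\n"
    "\n"
    "Aberham Lincoln\n"
    "Aberham Lincoln\n"
    "Acclaim\n"
    "AcclaimDexter Hybrid\n"
    "[Pygmalion X haematodes x wellfleet]\n"
    "-5 10yrs: 5'\n"
)

if __name__ == "__main__":
    print(group_and_dedupe(sample))
```

In this emulation each group ends up separated by one empty line, and the text starts with an empty line because the very first entry is itself a group.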

Regards,
Flo


WV- Mike
 

On 2/1/2021 6:09 AM, Flo wrote:
Mike,

I have tested this...

; Add an empty line between groups
^!Replace "^(.+)(\R\1.*)" >> "\r\n$0" WARS
; Remove first line in a group
^!Replace "^(.+)(\R\1.*)" >> "" WARS1
; Reduce empty lines
^!Replace "^\R{2,}" >> "\r\n" WARS

against your complete ww_rhodies_list.txt (1,292 quite irregular lines).

By "group" I mean a line being followed by a complete or partial duplicate. NT counts 391 groups like that in your file.

Maybe this will help to get a little closer to a solution of your task.

Regards,
Flo

Thanks, Flo.
That did most of the "clean-up" work and I can take care of the rest manually.
-Mike




Art Kocsis
 

On 01-31-2021 07:06, John Shotsky wrote:
You can sort them using the menu - Modify lines, sort ascending.
Then, you can remove dupes with the following:
^!Jump 1
^!Replace "^(.+\R)\K\1" >> "" ARSW
A little late to the party, but I thought John's proposed solution was quite clever despite a minor mistake: the line terminator "\R" needs to be excluded from the first capturing group (which will be retained) and included with the subsequent matching pattern (which will be deleted).

His (corrected) one-liner RegX pattern: "^(.+)\K\R\1"
   removes all partial and full duplicates,
   maintains the original line spacing,
   does not require a sort,
   does not require a clip - just stick it in the F&R dialog, replace with an empty string and click "Replace All".

A neat little technique to stash in your programmer's toolkit. Thanks John.

Art

Just in case you're wondering how it works:
   ^   Start search at BOL (beginning of line)
   (   Start RegX group definition (implicitly named "\1")
   .+  One or more (any) characters (stop at next criterion)
   )   End of group "\1"
   \K  Reset the match start: text matched so far is kept, not replaced (but the group string is remembered)
   \R  Line terminator (stopping criterion for group "\1")
   \1  Back reference for group "\1" string (Note: Captured string, not pattern)

Create a RegX group of the entirety of the next text line (excluding line terminator)
Disregard the captured string & reset the search (but remember the group string)
If subsequent text line begins with an identical string ("\1"):
   Delete line terminator
   Delete group "\1" character string on second line
If "Replace All" (A) option in effect, repeat for all subsequent lines
If "Whole Doc" (W) option in effect, start from Beginning of Text (clips only)
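
To compare the two patterns outside NoteTab, here is a small Python sketch. It is only an approximation: Python's re module supports neither \K nor \R, so the text to keep is captured and written back via \1, and "\n" stands in for the line terminator. The function names are invented for this example.

```python
import re


def dedupe_exact(text: str) -> str:
    """Original pattern "^(.+\\R)\\K\\1": the group includes the terminator,
    so only a following line that is an exact copy is removed."""
    return re.sub(r"^(.+\n)\1", r"\1", text, flags=re.MULTILINE)


def dedupe_prefix(text: str) -> str:
    """Corrected pattern "^(.+)\\K\\R\\1": the terminator is outside the group,
    so the terminator plus the repeated text on the next line are removed."""
    return re.sub(r"^(.+)\n\1", r"\1", text, flags=re.MULTILINE)


def dedupe_prefix_repeated(text: str) -> str:
    """Rerun dedupe_prefix until nothing changes. In this emulation one pass
    works on pairs of lines, so three identical lines need a second pass."""
    while True:
        result = dedupe_prefix(text)
        if result == text:
            return text
        text = result


sample = (
    "A. Bedford\n"
    "A. BedfordLowinsky -5 10yrs: 6'\n"
    "unknown x ponticum\n"
    "\n"
    "Aberham Lincoln\n"
    "Aberham Lincoln\n"
    "Acclaim\n"
    "AcclaimDexter Hybrid\n"
)

if __name__ == "__main__":
    print(dedupe_exact(sample))
    print(dedupe_prefix(sample))
```

On this sample, dedupe_exact only drops the second "Aberham Lincoln", which is the same result Mike reported for the unsorted list. dedupe_prefix also folds the name-only lines into the longer lines that follow them, though it does not insert a space ("AcclaimDexter Hybrid" stays as is).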




WV- Mike
 

On 2/3/2021 3:42 AM, Art Kocsis via groups.io wrote:
On 01-31-2021 07:06, John Shotsky wrote:
You can sort them using the menu - Modify lines, sort ascending.
Then, you can remove dupes with the following:
^!Jump 1
^!Replace "^(.+\R)\K\1" >> "" ARSW
A little late to the party, but I thought John's proposed solution was quite clever despite a minor mistake: the line terminator "\R" needs to be excluded from the first capturing group (which will be retained) and included with the subsequent matching pattern (which will be deleted).

His (corrected) one-liner RegX pattern: "^(.+)\K\R\1"
   removes all partial and full duplicates,
   maintains the original line spacing,
   does not require a sort,
   does not require a clip - just stick it in the F&R dialog, replace with an empty string and click "Replace All".

A neat little technique to stash in your programmer's toolkit. Thanks John.
Art
Thanks for this alternate way to modify the list.
Flo's clip worked well for me since it separated the list into "groups", which makes it easier to read.
See:
https://EpicRoadTrips.us/2021/rhodie_list/ww_rhodies_list.txt

WV-Mike


Art Kocsis
 

On 03-02-2021 07:53, WV- Mike wrote:
Thanks for this alternate way to modify the list.
Flo's clip worked well for me since it separated the list into "groups", which makes it easier to read.
See:
https://EpicRoadTrips.us/2021/rhodie_list/ww_rhodies_list.txt
No argument. I just wanted to highlight the technique and give credit where credit is due.

Also, as an old assembly language programmer/mathematician, I tend to appreciate and strive for the compact, elegant solution - sometimes at the expense of timeliness. That is the direct opposite of Axel, whose figure of merit is the quickest way to something that works. His approach yields productivity; mine gives me satisfaction. Both are valid.

Art