[NTB-Bsic] replace first occurrence of a string in each line


John Shotsky
 

Thomas,
(I added the clips group to your request, as questions of regex should usually be directed there.)
In reading your request originally, I noted the double commas in the later part of your example. I assume that is/was simply an error, and that single commas are what is expected, since a double comma is not valid in most cases. If something needs to be done to 'fix' the double comma issue, that would require a bit more code. In fact, there are multiple places where the csv is in error in terms of spaces, commas, and double quotes. There is a relatively simple way to just 'fix' all the csv, which would also fix the issue you present.
So, my first observation is that the target bit of text always ends with: quote, comma, space, no quotes or further commas, xls, comma, quote:
Attributes", Common 397 June 2010.xls,"E:\pathname
You only want to change the first instance in each line.
So, my solution would be:
Capture only the first instance of the desired text and surround it with quotes. Remove the unneeded space at the beginning. This solution requires the leading space to be present in order to work. If it's not there, a small change would be required (add a '?' after the space following the \K).
---
^!Replace "\",\K ([^\r\n,"]+?xls)(?=,\")" >> "\"$1\"" ARSW
---
You can't see the space after the \K in email, but it's there. This regex simply says find the first part, but don't capture it. Find the target part and capture it. Find the last part but don't capture it.
Replace the target part with the target part surrounded with quotes.
Your request mentioned to only 'fix' one end of the target code, but that would probably not be correct, as the target text should probably be surrounded by quotes.
However, to just add that one quote following the .xls, the following would work:
^!Replace "\", [^\r\n,"]+?xls\K(?=,\")" >> "\"" ARSW
Nothing is captured, but the quote is added.
Regex is very rich, and usually provides multiple methods of accomplishing a thing. I make HEAVY use of both the \K and the assertion, ensuring that my code nearly always targets only what is intended. One should always observe the problem carefully, as it is easy to miss some important element of the goal. In this instance, the questions would be:
Is that first space always there, and should it be deleted if so?
Is only the one added double quote wanted, or should that particular text be surrounded with quotes?
Are the other instances of 'broken' csv important at all?
Regards,
John
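
For readers who want to try the two replacements above outside NoteTab, here is a minimal Python sketch. It is an approximation, not John's clip: Python's re module has no \K, so the first pattern uses a lookbehind and the second uses a capture group instead. The function names are made up for illustration.

```python
import re

# Stand-in for:  "\",\K ([^\r\n,"]+?xls)(?=,\")"  >>  "\"$1\""
# The lookbehind replaces \K, which Python's re module does not support.
QUOTE_BOTH = re.compile(r'(?<=",) ([^\r\n,"]+?xls)(?=,")')

# Stand-in for:  "\", [^\r\n,"]+?xls\K(?=,\")"  >>  "\""
# The part before \K is captured and written back, followed by the new quote.
ADD_CLOSING = re.compile(r'(", [^\r\n,"]+?xls)(?=,")')


def quote_both_ends(text: str) -> str:
    """Drop the leading space and wrap the file name in double quotes."""
    return QUOTE_BOTH.sub(r'"\1"', text)


def add_closing_quote(text: str) -> str:
    """Only add the double quote after .xls, leaving the rest untouched."""
    return ADD_CLOSING.sub(r'\1"', text)


sample = r'Attributes", Common 397 June 2010.xls,"E:\pathname'
print(quote_both_ends(sample))
print(add_closing_quote(sample))
```

Both patterns depend on the quote, comma, space sequence in front of the file name, so they only apply to text shaped like the fragment shown above.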

-----Original Message-----
From: Basic@Notetab.groups.io <Basic@Notetab.groups.io> On Behalf Of Thomas Gruber
Sent: Sunday, December 29, 2019 5:47 AM
To: Basic@Notetab.groups.io
Subject: [NTB-Bsic] replace first occurrence of a string in each line

Hi,
I need to replace a specific string in a text document, but only the first occurrence of it in each line. Can that be done using find/replace, using RegEx (or even without it?)? Or with a clip?
My problem in detail:
I have a .csv file generated by a tool (Treesize Pro), that uses "," (comma) as a delimiter as usual, but doesn't put the first item in the line into surrounding quotes (it does it for the following items). Unfortunately sometimes the first item itself contains commas in the string, which corrupts the structure of course. So I want to put the first item into surrounding quotes. I managed to put an initial double quote character into each line, but so far didn't manage the ending double quote. I want to search for the first occurrence of
," (comma followed by double quote)
and replace it by
"," (double quote comma double quote)
but only once (first time) in each line.
Preferably without a clip, but if this can't be done by RegEx then a clip would of course also be fine.
Any help appreciated
Thomas

Here's a sample of the heading line plus 2 lines from the actual file (contents modified to shorten and simplify it):

"Name","Path","Size","Last Change","Last Access","File Type","Owner","Attributes", Common 397 June 2010.xls,"E:\pathname","30,005.0 KB",7/30/2010,6/24/2015,".xls (XLS File)","TengD",, Common 397 June 2010.xls,"E:\pathname2","30,005.0 KB",7/30/2010,8/21/2019,".xls (XLS File)","Administrators",,

and this is what I want to get:

"Name","Path","Size","Last Change","Last Access","File Type","Owner","Attributes", "Common 397 June 2010.xls","E:\pathname","30,005.0 KB",7/30/2010,6/24/2015,".xls (XLS File)","TengD",, "Common 397 June 2010.xls","E:\pathname2","30,005.0 KB",7/30/2010,8/21/2019,".xls (XLS File)","Administrators",,
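
The request above (change only the first occurrence of ," in each line) can also be sketched without any regular expression. The following Python snippet is only an illustration, assuming each record sits on its own line and that lines already starting with a double quote (such as the heading line) should be left alone; quote_first_field is a hypothetical helper name.

```python
def quote_first_field(line: str) -> str:
    """Wrap the first field in double quotes, splitting at the first ," only."""
    if line.startswith('"'):
        # Already quoted (e.g. the heading line): leave unchanged.
        return line
    head, sep, tail = line.partition(',"')
    if not sep:
        # No ," in this line, so there is nothing to split on.
        return line
    return f'"{head}","{tail}'


record = r'Common 397 June 2010.xls,"E:\pathname","30,005.0 KB",7/30/2010,6/24/2015,".xls (XLS File)","TengD",,'
print(quote_first_field(record))
```

Because str.partition splits at the first match only, later occurrences of ," in the same line are not touched.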


Thomas Gruber
 

Hi John,
I wasn’t aware that RegEx questions were handled in the Clips group - thanks for adding it.
I’ve now removed the „Basic“ group from this trail, to avoid double posting.
There are no unnecessary double commas in the example - I think some formatting (line breaks) got lost through email. If you have a look at the original post on the groups web site, you’ll see, it had 3 lines in the example, and then another 3 lines showing what I wanted to achieve. Here it is again:

--------------
Here's a sample of the heading line plus 2 lines from the actual file (contents modified to shorten and simplify it):

"Name","Path","Size","Last Change","Last Access","File Type","Owner","Attributes",
Common 397 June 2010.xls,"E:\pathname","30,005.0 KB",7/30/2010,6/24/2015,".xls (XLS File)","TengD",,
Common 397 June 2010.xls,"E:\pathname2","30,005.0 KB",7/30/2010,8/21/2019,".xls (XLS File)","Administrators",,

and this is what I want to get:

"Name","Path","Size","Last Change","Last Access","File Type","Owner","Attributes",
"Common 397 June 2010.xls","E:\pathname","30,005.0 KB",7/30/2010,6/24/2015,".xls (XLS File)","TengD",,
"Common 397 June 2010.xls","E:\pathname2","30,005.0 KB",7/30/2010,8/21/2019,".xls (XLS File)","Administrators",,
———————

The only double commas are at the end of each line, and they mean that there’s an empty column (Attributes), which is correct CSV syntax. So I have no problem with those. All lines also have a trailing comma, which I believe can be included or not, doesn’t make any difference.

The only problem with this file is the first field (Name) - which isn’t enclosed in double quotes, but should be as the field value itself can contain commas (yes, we have users who put commas into file names :-( ). As it’s generated by a utility program I have no influence on the structure, have to amend it once it has been generated.

I managed to generate the leading double quotes in each line using RegEx, but not the ones closing the first field.
The RegEx expressions Axel and Art sent cover this (haven’t yet tested Art’s solution but I’m sure it will work). So the problem is solved, thanks a lot.
Kind regards
Thomas

John Shotsky
 

Thomas,
If commas are permitted in file names, just remove the comma in the negative class: [^\r\n"]. It still won't get past that last comma.
Understanding the request, and understanding what might be encountered by regex leads to better solutions.
Regards,
John
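
The effect of dropping the comma from the negated class can be checked with a small Python sketch. This is an approximation under stated assumptions: a lookbehind stands in for \K (which Python's re module lacks), and the file name with a comma is invented for the test.

```python
import re

# Class as originally posted: commas are excluded from the file name.
STRICT = re.compile(r'(?<=",) ([^\r\n,"]+?xls)(?=,")')
# Class with the comma removed: only quotes and line breaks are excluded.
RELAXED = re.compile(r'(?<=",) ([^\r\n"]+?xls)(?=,")')

# Hypothetical file name containing a comma.
sample = r'Attributes", Budget, final 2010.xls,"E:\pathname'

strict_result = STRICT.sub(r'"\1"', sample)
relaxed_result = RELAXED.sub(r'"\1"', sample)

print(strict_result)   # left unchanged, the comma blocks the match
print(relaxed_result)  # file name is quoted, comma included
```

The relaxed class still cannot run past the closing comma, because the lazy quantifier stops at the first xls that is followed by a comma and a double quote.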

Thomas Gruber
 

Hi John,
Here's the solution I have now tested, which encloses the first "column" of the CSV file in double quotes on each line. It is done as a search/replace via the menu, not as a clip.

Reg.Ex.: ^\b(.*?)(,")(.*)$
Replace with: "$1","$3

The requirement for this to work: the 2nd "column" in the CSV file must be enclosed in double quotes, otherwise the search for (,") doesn't work. So this fixes a CSV file where the first "column" isn't enclosed in double quotes but the 2nd "column" is, by enclosing the 1st column in double quotes.
By "column" I mean a logical entity (in my case a file name), which may contain commas or other separator characters (spaces, ...).
I created this by combining Axel's find/replace solution with my own bit - thanks to you all.
Thomas
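
The search/replace pair above can be tried on the three-line sample with a short Python sketch. It assumes Python's re module behaves like NoteTab's engine for this pattern; $1 and $3 become \1 and \3, re.MULTILINE makes ^ and $ work per line, and the second data line (a file name with a comma) is invented for the test.

```python
import re

# Thomas's pattern, applied line by line via MULTILINE.
FIRST_FIELD = re.compile(r'^\b(.*?)(,")(.*)$', re.MULTILINE)


def quote_first_column(text: str) -> str:
    """Enclose the first field of every matching line in double quotes."""
    return FIRST_FIELD.sub(r'"\1","\3', text)


lines = [
    r'"Name","Path","Size","Last Change","Last Access","File Type","Owner","Attributes",',
    r'Common 397 June 2010.xls,"E:\pathname","30,005.0 KB",7/30/2010,6/24/2015,".xls (XLS File)","TengD",,',
    r'Budget, final 2010.xls,"E:\pathname2","30,005.0 KB",7/30/2010,8/21/2019,".xls (XLS File)","Administrators",,',
]
result = quote_first_column("\n".join(lines))
print(result)
```

In this Python version the heading line is skipped because \b cannot match between the start of a line and a double quote. The same \b would also skip any file name that begins with a non-word character, such as a parenthesis.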


Axel Berger
 

Thomas Gruber wrote: "Done as a search/replace via the menu, not as a clip."

As I said, I dislike doing that because it requires typing every time. I forgot to elaborate: this is not (only) because I'm lazy (which I am), but rather because every time I type I'm apt to make mistakes. I usually test clips on short excerpts from files. Often the first tests reveal serious blunders, but after it works once, it works every time without fail. I can't do that using the menu.

