Explore all Regex open source software, libraries, packages, source code, cloud functions and APIs.

Popular New Releases in Regex

path-to-regexp

Named Capturing Groups

re2

hyperscan

Hyperscan 5.4.0

xregexp

JavaVerbalExpressions

Popular Libraries in Regex

z

by rupa (Shell) | 13416 stars | WTFPL

z - jump around

JSVerbalExpressions

by VerbalExpressions (JavaScript) | 11978 stars | MIT

JavaScript Regular expressions made easy

Command-line-text-processing

by learnbyexample (Shell) | 9596 stars | license not specified

:zap: From finding text to search and replace, from sorting to beautifying text and more :art:

regexr

by gskinner (JavaScript) | 7650 stars | GPL-3.0

RegExr is an HTML/JS-based tool for creating, testing, and learning about Regular Expressions.

path-to-regexp

by pillarjs (TypeScript) | 6377 stars | MIT

Turn a path string such as `/user/:name` into a regular expression

re2

by google (C++) | 6180 stars | BSD-3-Clause

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

regulex

by CJex (TypeScript) | 4673 stars | MIT

:construction: Regular Expression Excited!

any-rule

by any86 (TypeScript) | 4312 stars | MIT

🦕 A collection of commonly used regular expressions, with multi-platform support for web / VS Code / IDEA / Alfred Workflow

hyperscan

by intel (C++) | 3205 stars | NOASSERTION

High-performance regular expression matching library

Trending New libraries in Regex

HaE

by gh0stkey (Java) | 974 stars | Apache-2.0

HaE - BurpSuite Highlighter and Extractor

regex2fat

by 8051Enthusiast (Rust) | 946 stars | Unlicense

Turn your favourite regex into FAT32

learn_gnuawk

by learnbyexample (Shell) | 748 stars | MIT

Example based guide to mastering GNU awk

regexploit

by doyensec (Python) | 470 stars | Apache-2.0

Find regular expressions which are vulnerable to ReDoS (Regular Expression Denial of Service)

nomino

by yaa110 (Rust) | 315 stars | NOASSERTION

Batch rename utility for developers

hgrep

by rhysd (Rust) | 314 stars | MIT

Grep with human-friendly search results

MANSPIDER

by blacklanternsecurity (Python) | 268 stars | license not specified

Spider entire networks for juicy files sitting on SMB shares. Search filenames or file content - regex supported!

super-expressive-php

by bassim (PHP) | 259 stars | MIT

super-expressive-php is a PHP library that allows you to build regular expressions in almost natural language

pcre2

by PhilipHazel (C) | 213 stars | NOASSERTION

PCRE2 development is now based here.

Top Authors in Regex

1. regexhq (36 libraries, 641 stars)
2. jonschlinkert (30 libraries, 590 stars)
3. sindresorhus (24 libraries, 2348 stars)
4. mathiasbynens (20 libraries, 3065 stars)
5. k4m4 (15 libraries, 192 stars)
6. VerbalExpressions (13 libraries, 19204 stars)
7. micromatch (12 libraries, 3025 stars)
8. kevva (9 libraries, 461 stars)
9. google (8 libraries, 9413 stars)
10. facelessuser (7 libraries, 308 stars)


Trending Kits in Regex

Regex libraries are essential for searching text, validating form input, and turning wildcard searches into more specific, accurate matches. Using regex, you can parse out email addresses, URLs, phone numbers, or any other specific phrase from a larger document. Regular expressions serve as both a tool and a language understood by most software applications, and regex libraries are among the most efficient and reliable ways to parse text: they can search, edit, and delete data with precision. They come in handy when working with text-based data such as documents, emails, and scripts, using regular expressions to filter useful information out of the text. In this kit, we will look at some of the best Python regex libraries.

re.sub is a function in Python's re module. It substitutes a string pattern with another string, replacing all occurrences of the pattern with a specified replacement string.


re.sub supports several different kinds of substitutions:

Regular expression substitution: replace text matching a character pattern, such as a class of ASCII letters or characters.

String replacement: replace all occurrences of a string with another string.

Numeric substitution: replace all occurrences of a number with another number.

Character class substitution: replace all occurrences of a character class with another character class.
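A short sketch of these substitution types using re.sub (all strings here are invented examples):

```python
import re

text = "Order 12 apples and 7 oranges for agent 007."

# Regular expression substitution: replace every run of digits.
print(re.sub(r"\d+", "N", text))          # Order N apples and N oranges for agent N.

# String replacement: replace a literal word (re.escape guards any metacharacters).
print(re.sub(re.escape("apples"), "pears", text))

# Character class substitution: replace every vowel.
print(re.sub(r"[aeiou]", "_", "banana"))  # b_n_n_
```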


re.sub also offers several options for matching and replacing strings. These options include:

Case-insensitive matching: makes the search case-insensitive, so uppercase and lowercase letters match interchangeably (the re.IGNORECASE flag).

Range matching: limits the search to a certain range of characters.

Greedy matching: allows the search to match as many characters as possible.

Regex matching: allows regex patterns in the search.

Unicode matching: allows the search to match Unicode characters.
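Some of these options correspond to re flags and pattern syntax; a brief illustration (example strings invented):

```python
import re

# Case-insensitive matching with the re.IGNORECASE flag.
print(re.sub(r"cat", "dog", "Cat, cat, CAT", flags=re.IGNORECASE))  # dog, dog, dog

# Greedy vs. non-greedy: .* grabs as much as possible, .*? as little as possible.
print(re.sub(r"<.*>", "", "<b>bold</b> text"))   # " text"
print(re.sub(r"<.*?>", "", "<b>bold</b> text"))  # "bold text"
```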


Python's re.sub helps perform complex string manipulations and substitutions. It allows developers to search for patterns within strings and replace them with other characters or strings, apply formatting to strings, extract substrings, and more. Because re.sub supports regular expressions, a powerful pattern-matching language, developers can work with complex patterns. It is a valuable tool for text processing that simplifies difficult and time-consuming programming tasks.


Another Python module, itertools, is a collection of tools for working with iterators. It provides functions such as chain, product, and zip_longest that let you create, combine, and manipulate iterators and process data efficiently. The module also makes it easier to work with generators. Using its functions, you can simplify complex programming tasks that process data.
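A quick look at the three itertools functions named above:

```python
from itertools import chain, product, zip_longest

# chain concatenates iterables into one stream.
print(list(chain([1, 2], [3])))                       # [1, 2, 3]

# product yields the Cartesian product of its inputs.
print(list(product("ab", "12")))                      # [('a', '1'), ('a', '2'), ('b', '1'), ('b', '2')]

# zip_longest pads the shorter iterable with fillvalue.
print(list(zip_longest("ab", "xyz", fillvalue="-")))  # [('a', 'x'), ('b', 'y'), ('-', 'z')]
```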


By understanding re.sub, you can build powerful string operations. re.sub is an important tool for improving your Python programming skills: it enables advanced string-manipulation and pattern-matching operations that are otherwise difficult in Python. With it you can extract patterns from a string, split strings into groups, perform substitutions, and more.


With a better understanding of re.sub, you can write complex scripts with fewer lines of code and make your code more efficient. re.sub is also widely used in web development and data analysis, so a good command of it can make you more valuable in the workplace.


Replacing patterns is a common text-processing task. Search-and-replace operations appear in data mining, data cleansing, and text mining, where they help identify patterns in large datasets and make analysis easier. For example, a medical researcher might use pattern replacement to identify common symptoms of a particular disease, and a financial analyst might use it to spot recurring trends in the stock market.


Regex (regular expressions) is a way of defining patterns in strings in Python. It is a tool used to search, edit, and manipulate text. Regex can verify that a string contains a given pattern, validate user input, and help extract information from a string.


Here is an example of replacing a specific pattern using regex in Python. 



In this solution, we replace a specific pattern using regex in Python.
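The snippet itself is loaded interactively on kandi and is not reproduced here; the following is only a minimal sketch of the kind of replacement described, using a hypothetical email-masking pattern (the variable name new_s matches the one referenced in the steps below):

```python
import re

s = "Contact us at support@example.com or sales@example.com"

# Replace every email address with a placeholder. The pattern is a simplified,
# illustrative email regex, not a fully RFC-compliant one.
new_s = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "<email>", s)
print(new_s)  # Contact us at <email> or <email>
```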

Instructions

Follow the steps below to reproduce the output:

  1. Install Jupyter Notebook on your computer.
  2. Open a terminal and install the required libraries with the following commands.
  3. Copy the code using the "Copy" button above and paste it into your IDE's Python file.
  4. Remove the last line.
  5. Add a line: print(new_s)
  6. Run the file.


I hope you found this useful. I have added links to the dependent libraries and version information in the following sections.


I found this code snippet by searching for "Replacing specific pattern using regex in Python" in kandi. You can try any such use case!

Dependent Libraries


FAQ  

What is a compiled regular expression object, and how does it work with re.sub in Python?

A compiled regular expression object is a pre-compiled version of a regular expression pattern, created with re.compile(). Because the pattern is parsed and analyzed once up front, reusing the compiled object is faster than re-processing a pattern string on every call. With re.sub, a compiled object can perform a search-and-replace operation on a string: pass the compiled object (with a replacement string) to re.sub, or call its own .sub() method, and every instance of the pattern in the string is replaced with the replacement string.
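A minimal sketch (pattern and strings invented for illustration):

```python
import re

# Compile once; the parsed pattern is reused across calls.
digits = re.compile(r"\d+")

print(digits.sub("#", "room 101, floor 3"))      # room #, floor #
# The module-level function also accepts a compiled pattern:
print(re.sub(digits, "#", "room 101, floor 3"))  # room #, floor #
```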


How do I use the match object argument in re.sub in Python?

If you pass a function as the replacement argument to re.sub, that function is called once for each match and receives a match object describing it. The function's return value is used as the replacement text, which lets you compute the replacement from the matched string.
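When the replacement argument to re.sub is a function, it is called with a match object for each match; a small sketch (the function name double is invented):

```python
import re

def double(m):
    # m is a match object; m.group(0) is the text that matched \d+
    return str(int(m.group(0)) * 2)

print(re.sub(r"\d+", double, "3 apples, 10 pears"))  # 6 apples, 20 pears
```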


How can I access the regex match objects produced when using re.sub in Python?

re.sub itself returns only the resulting string, not match objects. To inspect each match, pass a function as the replacement argument (it receives every match object), or iterate over the matches separately with re.finditer(). You can also use the optional count argument of re.sub() to limit the number of replacements, and re.findall() to get all the matched substrings.
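A short demonstration of count and the related helpers (the example string is invented):

```python
import re

s = "a-b-c-d"

# count limits how many replacements re.sub performs (left to right).
print(re.sub("-", "+", s, count=2))  # a+b+c-d

# re.findall returns every matching substring.
print(re.findall("-", s))            # ['-', '-', '-']

# re.subn performs the substitution and reports how many were made.
print(re.subn("-", "+", s))          # ('a+b+c+d', 3)
```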


Which pattern features does Python's module for regular expressions (re) support?

Python's re module supports pattern features such as

  • literal strings, 
  • wildcards, 
  • character classes, 
  • sets, 
  • repetition, 
  • groupings, 
  • anchors, 
  • look-ahead, 
  • look-behinds, and 
  • backreferences. 


How can I use re.sub to search for zero or more occurrences of a given string?

You can use (string){0,}, where "string" is the string you want to search for; it is equivalent to (string)*. This expression matches any run of consecutive repetitions of the given string, including zero occurrences.
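For instance (a non-capturing group (?:...) is used here so findall returns whole matches; the strings are invented):

```python
import re

# (?:ha){0,} is equivalent to (?:ha)*: zero or more repeats of "ha".
print(re.findall(r"(?:ha){0,}!", "ha! hahaha! !"))  # ['ha!', 'hahaha!', '!']
```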


When using re, how do I determine if my data set has a matching substring?

You can use re.search() to find out whether there is a matching substring in your data set. re.search() scans through the string for any substring that matches the pattern you have specified and returns a match object if one is found (and None otherwise).
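A minimal example (the data string is invented):

```python
import re

data = "error: disk full at 03:15"
m = re.search(r"\d{2}:\d{2}", data)  # first substring matching the pattern, or None
if m:
    print(m.group(0))  # 03:15
```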


Does the Unicode support in Python affect how we compare strings when using re functions such as sub()?

In Python 3, the re module works with Unicode strings natively: patterns and subject strings are compared as sequences of Unicode characters, so Unicode characters match like any other character. Byte-level matching is only used when you apply a bytes pattern to bytes data.


Can I assign a group name to each matching pattern using re functions like sub()?

Yes. You can name a capturing group with the (?P<name>...) syntax. The named group can then be retrieved from a match object with group('name'), or referenced in a sub() replacement string as \g<name>.
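Named groups use the (?P<name>...) syntax; a small sketch (the group names user and host are arbitrary):

```python
import re

m = re.search(r"(?P<user>\w+)@(?P<host>[\w.]+)", "write to admin@example.org")
print(m.group("user"))  # admin
print(m.group("host"))  # example.org

# Named groups can be referenced in the replacement string as \g<name>:
print(re.sub(r"(?P<user>\w+)@(?P<host>[\w.]+)", r"\g<user> AT \g<host>",
             "write to admin@example.org"))  # write to admin AT example.org
```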


What escape sequence should prevent errors when passing strings into functions like sub()?

Use the backslash ("\") escape sequence to escape any special characters in a string that might otherwise cause errors when passed into a function. To escape an entire arbitrary string at once, the re.escape() helper does this for you.
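Besides manual backslash-escaping, the standard library's re.escape() escapes a whole string at once; a small example (values invented):

```python
import re

price = "cost: $5.99"

# "$" and "." are metacharacters; re.escape backslash-escapes them all,
# so the pattern matches the literal text "$5.99".
pattern = re.escape("$5.99")
print(re.sub(pattern, "$6.49", price))  # cost: $6.49
```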


Are there any alternative methods for accessing matches made with regex patterns, other than through the sub() function?

Yes. Alternatives include the findall() function, the search() function, and the split() function. findall() searches for all occurrences of a pattern and returns them as a list of strings. search() finds the first occurrence of the pattern and returns a corresponding match object. split() splits a string into a list of strings based on a given pattern.
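A quick tour of these three functions (the example string is invented):

```python
import re

s = "one1two22three"

print(re.findall(r"\d+", s))         # ['1', '22']
print(re.search(r"\d+", s).group())  # 1
print(re.split(r"\d+", s))           # ['one', 'two', 'three']
```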

If you do not have the regex dependency required to run this code, you can install it by clicking on the above link and copying the pip install command from the respective page in kandi.


You can search for any dependent library on kandi like regex

Environment Tested


I tested this solution in the following versions. Be mindful of changes when working with other versions.

  1. The solution is created in Python 3.9.6
  2. The solution is tested on re version 2.2.1


Using this solution, we are able to replace specific patterns using regex in Python

Support

  1. For any support on kandi solution kits, please use the chat
  2. For further learning resources, visit the Open Weaver Community learning page.


Trending Discussions on Regex

Escaping metacharacters in a Raku regex (like Perl's quotemeta() or \Q...\E)?

python-docx adding bold and non-bold strings to same cell in table

What Raku regex modifier makes a dot match a newline (like Perl's /s)?

How can the Raku behavior on capturing group in alternate be the same as Perl

Difference in Perl regex variable $+{name} and $-{name}

Combine 2 string columns in pandas with different conditions in both columns

Lookaround regex and character consumption

Parsing binary files in Raku

Regex to match nothing but zeroes after the first zero

python regex where a set of options can occur at most once in a list, in any order

QUESTION

Escaping metacharacters in a Raku regex (like Perl's quotemeta() or \Q...\E)?

Asked 2022-Mar-29 at 23:38

How can I escape metacharacters in a Raku regex the way I would with Perl's quotemeta function (\Q..\E)?

That is, the Perl code

my $sentence = 'The quick brown fox jumped over the lazy dog';
my $substring = 'quick.*?fox';
$sentence =~ s{$substring}{big bad wolf};
print $sentence

treats each of ., *, and ? as metacharacters and thus prints The big bad wolf jumped over the lazy dog. But if I change the second-to-last line to $sentence =~ s{\Q$substring\E}{big bad wolf};, then Perl treats .*? as literal characters and thus prints The quick brown fox jumped over the lazy dog.

How can I treat characters literally in a Raku regex?

ANSWER

Answered 2022-Feb-10 at 00:03
Your question's answer:

You can treat characters in a Raku regex literally by surrounding them with quotes (e.g., '.*?') or by using regular variable interpolation (e.g., $substring inside the regex, where $substring is a string containing metacharacters).

Thus, to translate the Perl program with \Q...\E from your question into Raku, you could write:

my $sentence = 'The quick brown fox jumped over the lazy dog';
my $substring = 'quick.*?fox';
$sentence ~~ s/$substring/big bad wolf/;
print $sentence

This would treat .*? as literal characters, not metacharacters. If you wanted to avoid interpolation with literal text rather than a variable, you could change the substitution regex to s/quick '.*?' fox/big bad wolf/. Conversely, if you want to use the $substring variable as part of a regex (that is, if you do want .*? to be metacharacters), you'd need to change the substitution regex to s/<$substring>/big bad wolf/. For more details, you can consult the Regex interpolation docs.

How you could have found this answer without waiting for SO

What should you do when you don't know how to do something in Raku? Asking either on the IRC channel or here on Stack Overflow is an option – and asking a clear Q on SO has the benefit of making the answer more searchable for anyone else who has the same question in the future.

But both IRC and SO are asynchronous – so you'll probably need to wait a bit for an answer. There are other ways that folks interested in Raku frequently get good/great answers to their questions more easily and quickly than they could from IRC/SO, and the remainder of this answer provides some guidance about these ways. (I've numbered the steps in the general order I'd recommend, but there's no reason you need to follow that order).

Easily get better answers more quickly than asking SO Qs

Step -1: Let Raku answer the question for you

Raku strives to have awesome error messages, and sometimes you'll be lucky enough to try something in a way that doesn't work but where Raku can tell what you were trying to do.

In those cases, Raku will just tell you how to do what you wanted to do. And, in fact, \Q...\E is one such case. If you'd tried to do it the Perl way

/\Q$substring\E/

you'd have gotten the same answer I gave above (use $substring or quotes) in the form of the following error message:

Unsupported use of \Q as quotemeta.  In Raku please use: quotes or
literal variable match.

So, sometimes, Raku will solve the problem for you! But that's not something that will happen all the time and, any time you're tempted to ask a SO question, it's a good bet that Raku didn't answer your question for you. So here are the steps you'd take in that case:

Step 0: check the docs

The first true step should, of course, be to search the Raku docs for anything useful. I bet you did this – the docs currently don't return any relevant results for \Q..\E. In fact, the only true positive match of \Q...\E in those results is from the Perl to Raku guide - in a nutshell: "using String::ShellQuote (because \Q…\E is not completely right) ...". And that's obviously not what you're interested in.

The docs website doesn't always yield a good answer to simple questions. Sometimes, as we clearly see with the \Q...\E case, it doesn't yield any answer at all for the relevant search term.

Step 1: Search Stack Overflow

Again, you probably did this, but it's good to keep in mind: You can limit your SO search questions/answers tagged as related to Raku by adding [raku] to your query. Here, a query of [raku] "\Q...\E" wouldn't have yielded anything relevant – but, thanks to your question, it will in the future :)

Step 2: Archived/historical "spec" docs

Raku's design was written up in a series of "spec" docs written principally by Larry Wall over a two-decade period.

(The word "specs" is short for "specification speculations". The docs are both authoritative, detailed, and precise specifications of the Raku language, authored primarily by Larry Wall himself, and mere speculations, because everything was subject to implementation. The two aspects are left entangled, and are now out of date. So don't rely on them 100%, but don't ignore them either.)

The "specs", aka design docs, are a fantastic resource. You can search them using google by entering your search terms in the search box at design.raku.org.


A search for \Q...\E lists 7 pages. The only useful match is Synopsis 5: Regexes and Rules ("24 Jun 2002 — \Q$var\E / ..."). If I click it and then do an in-page search for \Q, I get 2 matches that, together, answer your question (at least with respect to variables – they don't mention literal strings):

In Raku / $var / is like a Perl / \Q$var\E /

\Q...\E sequences are gone.

Step 3: IRC chat logs

In this case, searching the design docs answered your question. But what if it hadn't/we didn't understand the answer?

In that case, searching the IRC logs can be a great option (as previously discussed in the Quicker answers section of an answer to a past Q). The IRC logs are an incredibly rich mine of info with outstanding search features. Please read that section for clear general guidance.

In this particular case, if we'd searched for \Q in the old Raku channel, we would have gotten a bunch of useful matches. None of the first few fully answer your question, but several do (or at least make the answer clear) if read in context – but it's the need to read the surrounding context that makes me put searching the IRC logs below the previous steps.

Source https://stackoverflow.com/questions/71057626

QUESTION

python-docx adding bold and non-bold strings to same cell in table

Asked 2022-Feb-26 at 21:23

I'm using python-docx to create a document with a table I want to populate from textual data. My text looks like this:

01:02:10.3
a: Lorem ipsum dolor sit amet,
b: consectetur adipiscing elit.
a: Mauris a turpis erat.
01:02:20.4
a: Vivamus dignissim aliquam
b: Nam ultricies
(etc.)

I need to organize it in a table like this (using ASCII for visualization):

+---+--------------------+---------------------------------+
|   |         A          |                B                |
+---+--------------------+---------------------------------+
| 1 | 01:02:10.3         | a: Lorem ipsum dolor sit amet,  |
| 2 |                    | b: consectetur adipiscing elit. |
| 3 |                    | a: Mauris a turpis erat.        |
| 4 | ------------------ | ------------------------------- |
| 5 | 01:02:20.4         | a: Vivamus dignissim aliqua     |
| 6 |                    | b: Nam ultricies                |
+---+--------------------+---------------------------------+

however, I need to make it so everything after "a: " is bold, and everything after "b: " isn't, while they both occupy the same cell. It's pretty easy to iterate and organize this the way I want, but I'm really unsure about how to make only some of the lines bold:

IS_BOLD = {
    'a': True,
    'b': False
}

row_cells = table.add_row().cells

for line in lines:
    if is_timestamp(line):  # function that uses regex to discern between columns
        if row_cells[1]:
            row_cells = table.add_row().cells

        row_cells[0].text = line

    else:
        row_cells[1].text += line

        if IS_BOLD[line.split(":")[0]]:
            # make only this line within the cell bold, somehow.

(this is sort of pseudo-code, I'm doing some more textual processing but that's kinda irrelevant here). I found one probably relevant question where someone uses something called run but I'm finding it hard to understand how to apply it to my case.

Any help? Thanks.

ANSWER

Answered 2022-Feb-26 at 21:23

You need to add a run to the cell's paragraph. This way you can control which specific text you wish to bold.

Full example:

from docx import Document
from docx.shared import Inches
import os
import re


def is_timestamp(line):
    # it's flaky, I saw you have your own method and probably you did a better job parsing this.
    return re.match(r'^\d{2}:\d{2}:\d{2}', line) is not None


def parse_raw_script(raw_script):
    current_timestamp = ''
    current_content = ''
    for line in raw_script.splitlines():
        line = line.strip()
        if is_timestamp(line):
            if current_timestamp:
                yield {
                    'timestamp': current_timestamp,
                    'content': current_content
                }

            current_timestamp = line
            current_content = ''
            continue

        if current_content:
            current_content += '\n'

        current_content += line

    if current_timestamp:
        yield {
            'timestamp': current_timestamp,
            'content': current_content
        }


def should_bold(line):
    # i leave it to you to replace with your logic
    return line.startswith('a:')


def load_raw_script():
    # I placed here the example from your question. read from file instead I presume

    return '''01:02:10.3
a: Lorem ipsum dolor sit amet,
b: consectetur adipiscing elit.
a: Mauris a turpis erat.
01:02:20.4
a: Vivamus dignissim aliquam
b: Nam ultricies'''


def convert_raw_script_to_docx(raw_script, output_file_path):
    document = Document()
    table = document.add_table(rows=1, cols=3, style="Table Grid")

    # add header row
    header_row = table.rows[0]
    header_row.cells[0].text = ''
    header_row.cells[1].text = 'A'
    header_row.cells[2].text = 'B'

    # parse the raw script into something iterable
    script_rows = parse_raw_script(raw_script)

    # create a row for each timestamp row
    for script_row in script_rows:
        timestamp = script_row['timestamp']
        content = script_row['content']

        row = table.add_row()
        timestamp_cell = row.cells[1]
        timestamp_cell.text = timestamp

        content_cell = row.cells[2]
        content_paragraph = content_cell.paragraphs[0]  # using the cell's default paragraph here instead of creating one

        for line in content.splitlines():
            run = content_paragraph.add_run(line)
            if should_bold(line):
                run.bold = True

            run.add_break()

    # resize table columns (optional)
    for row in table.rows:
        row.cells[0].width = Inches(0.2)
        row.cells[1].width = Inches(1.9)
        row.cells[2].width = Inches(3.9)

    document.save(output_file_path)


def main():
    script_dir = os.path.dirname(__file__)
    dist_dir = os.path.join(script_dir, 'dist')

    if not os.path.isdir(dist_dir):
        os.makedirs(dist_dir)

    output_file_path = os.path.join(dist_dir, 'so-template.docx')
    raw_script = load_raw_script()
    convert_raw_script_to_docx(raw_script, output_file_path)


if __name__ == '__main__':
    main()

Result (file should be in ./dist/so-template.docx):



BTW - if you prefer sticking with your own example, this is what needs to be changed:

101:02:10.3 
2a: Lorem ipsum dolor sit amet,  
3b: consectetur adipiscing elit.
4a: Mauris a turpis erat. 
501:02:20.4 
6a: Vivamus dignissim aliquam
7b: Nam ultricies
8(etc.)
9+---+--------------------+---------------------------------+
10|   |         A          |                B                |
11+---+--------------------+---------------------------------+
12| 1 | 01:02:10.3         | a: Lorem ipsum dolor sit amet,  |
13| 2 |                    | b: consectetur adipiscing elit. |
14| 3 |                    | a: Mauris a turpis erat.        |
15| 4 | ------------------ | ------------------------------- |
16| 5 | 01:02:20.4         | a: Vivamus dignissim aliqua     |
17| 6 |                    | b: Nam ultricies                |
18+---+--------------------+---------------------------------+
19IS_BOLD = { 
20    'a': True
21    'b': False
22}
23
24row_cells = table.add_row().cells
25
26for line in lines: 
27    if is_timestamp(line): # function that uses regex to discern between columns
28        if row_cells[1]:
29            row_cells = table.add_row().cells
30
31        row_cells[0].text = line
32
33    else 
34        row_cells[1].text += line
35
36        if IS_BOLD[ line.split(&quot;:&quot;)[0] ]:
37            # make only this line within the cell bold, somehow.
from docx import Document
from docx.shared import Inches
import os
import re


def is_timestamp(line):
    # it's flaky, I saw you have your own method and probably you did a better job parsing this.
    return re.match(r'^\d{2}:\d{2}:\d{2}', line) is not None


def parse_raw_script(raw_script):
    current_timestamp = ''
    current_content = ''
    for line in raw_script.splitlines():
        line = line.strip()
        if is_timestamp(line):
            if current_timestamp:
                yield {
                    'timestamp': current_timestamp,
                    'content': current_content
                }

            current_timestamp = line
            current_content = ''
            continue

        if current_content:
            current_content += '\n'

        current_content += line

    if current_timestamp:
        yield {
            'timestamp': current_timestamp,
            'content': current_content
        }


def should_bold(line):
    # i leave it to you to replace with your logic
    return line.startswith('a:')


def load_raw_script():
    # I placed here the example from your question. read from file instead I presume

    return '''01:02:10.3
a: Lorem ipsum dolor sit amet,
b: consectetur adipiscing elit.
a: Mauris a turpis erat.
01:02:20.4
a: Vivamus dignissim aliquam
b: Nam ultricies'''


def convert_raw_script_to_docx(raw_script, output_file_path):
    document = Document()
    table = document.add_table(rows=1, cols=3, style="Table Grid")

    # add header row
    header_row = table.rows[0]
    header_row.cells[0].text = ''
    header_row.cells[1].text = 'A'
    header_row.cells[2].text = 'B'

    # parse the raw script into something iterable
    script_rows = parse_raw_script(raw_script)

    # create a row for each timestamp row
    for script_row in script_rows:
        timestamp = script_row['timestamp']
        content = script_row['content']

        row = table.add_row()
        timestamp_cell = row.cells[1]
        timestamp_cell.text = timestamp

        content_cell = row.cells[2]
        content_paragraph = content_cell.paragraphs[0]  # using the cell's default paragraph here instead of creating one

        for line in content.splitlines():
            run = content_paragraph.add_run(line)
            if should_bold(line):
                run.bold = True

            run.add_break()

    # resize table columns (optional)
    for row in table.rows:
        row.cells[0].width = Inches(0.2)
        row.cells[1].width = Inches(1.9)
        row.cells[2].width = Inches(3.9)

    document.save(output_file_path)


def main():
    script_dir = os.path.dirname(__file__)
    dist_dir = os.path.join(script_dir, 'dist')

    if not os.path.isdir(dist_dir):
        os.makedirs(dist_dir)

    output_file_path = os.path.join(dist_dir, 'so-template.docx')
    raw_script = load_raw_script()
    convert_raw_script_to_docx(raw_script, output_file_path)


if __name__ == '__main__':
    main()
149
IS_BOLD = {
    'a': True,
    'b': False
}

row_cells = table.add_row().cells

for line in lines:
    if is_timestamp(line):
        if row_cells[1]:
            row_cells = table.add_row().cells
        row_cells[0].text = line

    else:
        run = row_cells[1].paragraphs[0].add_run(line)
        if IS_BOLD[line.split(":")[0]]:
            run.bold = True

        run.add_break()
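The parsing half of the approach can be sanity-checked on its own, without python-docx. This is a minimal standalone sketch (the sample lines are trimmed from the question's transcript) that uses the same timestamp regex to group content lines under the most recent timestamp:

```python
import re

def is_timestamp(line):
    return re.match(r'^\d{2}:\d{2}:\d{2}', line) is not None

lines = ["01:02:10.3", "a: Lorem ipsum", "b: consectetur",
         "01:02:20.4", "a: Vivamus"]

# group content lines under the most recent timestamp
groups = {}
current = None
for line in lines:
    if is_timestamp(line):
        current = line
        groups[current] = []
    elif current is not None:
        groups[current].append(line)

assert list(groups) == ["01:02:10.3", "01:02:20.4"]
assert groups["01:02:10.3"] == ["a: Lorem ipsum", "b: consectetur"]
```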

Source https://stackoverflow.com/questions/71150313

QUESTION

What Raku regex modifier makes a dot match a newline (like Perl's /s)?

Asked 2022-Feb-09 at 23:24

How do I make the dot (.) metacharacter match a newline in a Raku regex? In Perl, I would use the "dot matches newline" modifier (/s).

ANSWER

Answered 2022-Feb-07 at 10:40

TL;DR The Raku equivalent of "Perl dot matches newline" is just ., and the equivalent of \Q...\E is ordinary quoting ('...').

There are ways to get better answers (more authoritative and comprehensive than SO ones) to most questions like these more easily (typically by just typing the search term of interest) and quickly (typically seconds, a couple of minutes tops). I address that in this answer.

What is Raku equivalent for "Perl dot matches newline"?

Just .

If you run the following Raku program:

/./s

you'll see the following error message:

/./s
Unsupported use of /s.  In Raku please use: .  or \N.

If you type . in the doc site's search box it lists several entries. One of them is . (regex). Clicking it provides examples and says:

An unescaped dot . in a regex matches any single character. ...
Notably . also matches a logical newline \n
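As a cross-language aside (Python, not Raku, for illustration): Python still spells the Perl /s behavior as the re.DOTALL flag, whereas Raku's bare . already matches newline.

```python
import re

# without DOTALL, . refuses to match the newline
assert re.match(r'a.b', 'a\nb') is None

# with DOTALL, . matches any character including \n (Perl's /s)
assert re.match(r'a.b', 'a\nb', re.DOTALL) is not None
```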

My guess is you either didn't look for answers before asking here on SO (which is fair enough -- I'm not saying don't; that said you can often easily get good answers nearly instantly if you look in the right places, which I'll cover in this answer) or weren't satisfied by the answers you got (in which case, again, read on).

In case I've merely repeated what you've already read, or it's not enough info, I will provide a better answer below, after I write up an initial attempt to give a similar answer for your \Q...\E question -- and fail when I try the doc step.

What is Raku equivalent for Perl \Q...\E?

'...', or $foo if the ... was metasyntax for a variable name.

If you run the following Raku program:

/\Qfoo\E/

you'll see the following error message:

/\Qfoo\E/
Unsupported use of \Q as quotemeta.  In Raku please use: quotes or
literal variable match.
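For comparison, the job \Q...\E does in Perl is handled in Python by re.escape (a Python aside for illustration, not Raku):

```python
import re

needle = 'a.b*c'                      # metacharacters that must be taken literally
pattern = re.escape(needle)           # escapes . and * so they match themselves

assert re.search(pattern, 'xx a.b*c yy') is not None
assert re.fullmatch(pattern, 'aXbYc') is None   # '.' no longer matches any char
```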

If you type \Q...\E in the doc site's search box it lists just one entry: Not in Index (try site search). If you go ahead and try the search as suggested, you'll get matching pages according to google. For me the third page/match listed (Perl to Raku guide - in a nutshell: "using String::ShellQuote (because \Q…\E is not completely right) ...") is the only true positive match of \Q...\E among 27 matches. And it's obviously not what you're interested in.

So, searching the doc for \Q...\E appears to be a total bust.


How does one get answers to a question like "what is the Raku equivalent of Perl's \Q...\E?" if the doc site ain't helpful (and one doesn't realize Rakudo happens to have a built in error message dedicated to the exact thing of interest and/or isn't sure what the error message means)? What about questions where neither Rakudo nor the doc site are illuminating?

SO is one option, but what lets folk interested in Raku frequently get good/great answers to their questions easily and quickly when they can't get them from the doc site because the answer is hard to find or simply doesn't exist in the docs?

Easily get better answers more quickly than asking SO Qs

The docs website doesn't always yield a good answer to simple questions. Sometimes, as we clearly see with the \Q...\E case, it doesn't yield any answer at all for the relevant search term.

Fortunately there are several other easily searchable sources of rich and highly relevant info that often work when the doc site does not for certain kinds of info/searches. This is especially likely if you've got precise search terms in mind such as /s or \Q...\E and/or are willing to browse info provided it's high signal / low noise. I'll introduce two of these resources in the remainder of this answer.

Archived "spec" docs

Raku's design was written up in a series of "spec" docs written principally by Larry Wall over a 2 decade period.

(The word "specs" is short for "specification speculations". They are both ultra-authoritative, detailed, and precise specifications of the Raku language, authored primarily by Larry Wall himself, and mere speculations -- because everything was subject to implementation. The two aspects are left entangled, and are now out of date. So don't rely on them 100% -- but don't ignore them either.)

The "specs", aka design docs, are a fantastic resource. You can search them using google by entering your search terms in the search box at design.raku.org.


A search for /s lists 25 pages. The only useful match is Synopsis 5: Regexes and Rules ("24 Jun 2002 — There are no /s or /m modifiers (changes to the meta-characters replace them - see below)." Click it. Then do an in-page search for /s (note the space). You'll see 3 matches:

There are no /s or /m modifiers (changes to the meta-characters replace them - see below)

A dot . now matches any character including newline. (The /s modifier is gone.)

. matches an anything, while \N matches an anything except what \n matches. (The /s modifier is gone.) In particular, \N matches neither carriage return nor line feed.


A search for \Q...\E lists 7 pages. The only useful match is again Synopsis 5: Regexes and Rules ("24 Jun 2002 — \Q$var\E / ..."). Click it. Then do an in-page search for \Q. You'll see 2 matches:

In Raku / $var / is like a Perl / \Q$var\E /

\Q...\E sequences are gone.

Chat logs

I've expanded the Quicker answers section of my answer to one of your earlier Qs to discuss searching the Raku "chat logs". They are an incredibly rich mine of info with outstanding search features. Please read that section of my prior answer for clear general guidance. The rest of this answer will illustrate for /s and \Q...\E.


A search for the regex / newline . ** ^200 '/s' / in the old Raku channel from 2010 thru 2015 found this match:

. matches an anything, while \N matches an anything except what \n matches. (The /s modifier is gone.) In particular, \N matches neither carriage return nor line feed.

Note the shrewdness of my regex. The pattern is the word "newline" (which is hopefully not too common) followed within 200 characters by the two character sequence /s (which I suspect is more common than newline). And I constrained the search to 2010-2014 because a search for that regex over the entire 15 years of the old Raku channel would tax Liz's server and time out. I got the hit I've quoted above within a couple of minutes of trying to find a suitable match of /s (no sarcasm intended!).


A search for \Q in the old Raku channel was an immediate success. Within 30 seconds of the thought "I could search the logs" I had a bunch of useful matches.

Source https://stackoverflow.com/questions/70996158

QUESTION

How can the Raku behavior on capturing group in alternate be the same as Perl

Asked 2022-Jan-29 at 21:40

How can Raku's behavior for capture groups in an alternation be made to match Perl's? For example:

> 'abefo' ~~ /a [(b) | (c) (d)] (e)[(f)|(g)]/
「abef」
 0 => 「b」
 2 => 「e」
 3 => 「f」

The desired result is the 'usual' Perl-style monotonic numbering (while keeping Raku's indexing scheme):

$0 = 'b'
$1 = undef
$2 = undef
$3 = e
$4 = f

Any useful guidance would be appreciated.

ANSWER

Answered 2022-Jan-29 at 15:38

Quoting the Synopsis 5: Regexes and Rules design speculation document:

it is still possible to mimic the monotonic Perl 5 capture indexing semantics

Inserting a $3= for the (e):

/ a [ (b) | (c) (d) ] $3=(e) [ (f) | (g) ] /

andthen say 'abefo' ~~ $_

「abef」
 0 => 「b」
 3 => 「e」
 4 => 「f」

I've briefly looked for a mention of this in the doc but didn't see it.

So maybe we should file doc issues for mentioning this, presumably in Capture numbers and $ ($1, $2, ...).

Source https://stackoverflow.com/questions/70904460

QUESTION

Difference in Perl regex variable $+{name} and $-{name}

Asked 2022-Jan-18 at 09:12

What is the difference between Perl regex variables $+{name} and $-{name} when both are used to refer to the same regex group from Perl statement/expression code?

ANSWER

Answered 2022-Jan-18 at 06:36

While $+{name} holds the captured substring referred to by name as a single scalar value, $-{name} holds a reference to an array of all the capture groups with that name.
Here is a tiny example:

#!/usr/bin/perl

use strict;
use warnings;

'12' =~ /(?<foo>\d)(?<foo>\d)/; # '1' and '2' will be captured individually

print $+{'foo'}, "\n";          # prints '1'

for (@{$-{'foo'}}) {            # $-{'foo'} is a reference to an array
    print $_, "\n";             # prints '1' and '2'
}

As $+{name} can hold only a single scalar value, it is set to the first (leftmost) defined capture group with that name.
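Python's re module, by contrast, rejects duplicate group names outright, which is why it needs no $- analogue: each name maps to exactly one capture (a Python aside for comparison):

```python
import re

m = re.match(r'(?P<a>\d)(?P<b>\d)', '12')
assert m.groupdict() == {'a': '1', 'b': '2'}

# duplicate group names are a compile-time error in Python's re
try:
    re.compile(r'(?P<foo>\d)(?P<foo>\d)')
    assert False, 'expected re.error'
except re.error:
    pass
```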

Source https://stackoverflow.com/questions/70750715

QUESTION

Combine 2 string columns in pandas with different conditions in both columns

Asked 2021-Dec-21 at 13:18

I have 2 columns in pandas, with data that looks like this.

code fx         category
AXD  AXDG.R     cat1
AXF  AXDG_e.FE  cat1
333  333.R      cat1
....

There are other categories but I am only interested in cat1.

I want to combine everything from the code column with everything after the . in the fx column, and replace the code column with that combination, without affecting the other rows.

code    fx         category
AXD.R   AXDG.R     cat1
AXF.FE  AXDG_e.FE  cat1
333.R   333.R      cat1
.....

Here is my code. I think I have to use regex, but I'm not sure how to combine the columns in this way.

df.loc[df['category'] == 'cat1', 'code'] = df[df['category'] == 'cat1']['code'].str.replace(r'[a-z](?=\.)', '', regex=True).str.replace(r'_?(?=\.)', '', regex=True).str.replace(r'G(?=\.)', '', regex=True)

I'm not sure how to select the second column also. Any help would be greatly appreciated.

ANSWER

Answered 2021-Dec-19 at 18:10

We can get the expected result using split, like so:

>>> df['code'] = df['code'] + '.' + df['fx'].str.split(pat=".", expand=True)[1]
>>> df
    code    fx          category
0   AXD.R   AXDG.R      cat1
1   AXF.FE  AXDG_e.FE   cat1
2   333.R   333.R       cat1

To filter only on cat1, as @anky did very well, we can add a where statement:

>>> df['code'] = (df['code'] + '.' + df['fx'].str.split(pat=".", expand=True)[1]).where(df['category'].eq("cat1"), df['code'])
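Putting the answer together as a self-contained sketch (the cat2 row is a made-up addition to show that other categories are left untouched; requires pandas):

```python
import pandas as pd

df = pd.DataFrame({
    'code':     ['AXD',    'AXF',       '333',   'ZZZ'],
    'fx':       ['AXDG.R', 'AXDG_e.FE', '333.R', 'ZZZ.Q'],
    'category': ['cat1',   'cat1',      'cat1',  'cat2'],
})

# append everything after the '.' of fx, but only for cat1 rows
df['code'] = (df['code'] + '.' + df['fx'].str.split(pat='.', expand=True)[1]) \
    .where(df['category'].eq('cat1'), df['code'])

print(df['code'].tolist())   # the cat2 row keeps its original code
```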

Source https://stackoverflow.com/questions/70413959

QUESTION

Lookaround regex and character consumption

Asked 2021-Dec-20 at 12:26

Based on the documentation for Raku's lookaround assertions, I read the regex / <?[abc]> <alpha> / as saying "starting from the left, match but do not consume one character that is a, b, or c and, once you have found a match, match and consume one alphabetic character."

Thus, this output makes sense:

'abc' ~~ / <?[abc]> <alpha> /     # OUTPUT: «「a」␤ alpha => 「a」»

Even though that regex has two one-character terms, one of them does not capture so our total capture is only one character long.
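The same non-consuming behavior can be seen with a Python lookahead, which plays a role analogous to Raku's <?[...]> (a Python aside for illustration):

```python
import re

# the lookahead checks that the next char is a/b/c but consumes nothing;
# the following [a-z] then consumes that very same character
m = re.match(r'(?=[abc])[a-z]', 'abc')
assert m.group(0) == 'a'   # total match is one character, not two
```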

But the next expression confuses me:

'abc' ~~ / <?[abc\s]> <alpha> /     # OUTPUT: «「ab」␤ alpha => 「b」»

Now, our total capture is two characters long, and one of those isn't captured by <alpha>. So is the lookaround capturing something after all? Or am I misunderstanding something else about how the lookaround works?

ANSWER

Answered 2021-Dec-20 at 12:26

<?[ ]> and <![ ]> do not seem to support some backslashed character classes. \n, \s, \d and \w show similar results.

<?[abc\s]> behaves the same as <[abc\s]> when \n, \s, \d or \w is added.

\t, \h, \v, \c[NAME] and \x61 seem to work as normal.

Source https://stackoverflow.com/questions/69004383

QUESTION

Parsing binary files in Raku

Asked 2021-Nov-09 at 11:34

I would like to parse binary files in Raku using its regex/grammar engine, but I haven't found a way to do it, because the input is coerced to a string.

Is there a way to avoid this string coercion and use objects of type Buf or Blob?

I was thinking maybe it is possible to change something in the Metamodel?

I know that I can use unpack, but I would really like to use the grammar engine instead, for more flexibility and readability.

Am I hitting an inherent limit of Raku's capabilities here?

And before someone tells me that regexes are for strings and that I shouldn't do this, I should point out that Perl's regex engine can match bytes as far as I know, and I could probably use it with Regexp::Grammars, but I prefer not to and to use Raku instead.

Also, I don't see any fundamental reason why regexes should be reserved for strings; an NFA from automata theory isn't intrinsically made for characters rather than bytes.

ANSWER

Answered 2021-Nov-09 at 11:34

Is there a way to avoid this string coercion and use objects of type Buf or Blob ?

Unfortunately not at present. However, one can use the Latin-1 encoding, which gives a meaning to every byte, so any byte sequence will decode successfully and could then be matched using a grammar.
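The Latin-1 trick is easy to demonstrate in Python (a Python aside, not Raku): every byte maps to exactly one codepoint in U+0000..U+00FF, so decoding never fails and round-trips losslessly, letting the string regex engine match raw bytes.

```python
import re

data = bytes([0x00, 0xFF, 0x10, 0x20, 0xFF])
text = data.decode('latin-1')            # cannot fail: every byte is meaningful

assert text.encode('latin-1') == data    # lossless round-trip

m = re.search('\xff[\x00-\x7f]', text)   # match a byte pattern via the str engine
assert m is not None and m.start() == 1
```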

Also, I don't see any fundamental reason why regexes should be reserved for strings; an NFA from automata theory isn't intrinsically made for characters rather than bytes.

There isn't one; it's widely expected that the regex/grammar engine will be rebuilt at some point in the future (primarily to deal with performance limitations), and that would be a good point to also consider handling bytes and also codepoint level strings (Uni).

Source https://stackoverflow.com/questions/69892053

QUESTION

Regex to match nothing but zeroes after the first zero

Asked 2021-Nov-03 at 11:42

Using regular expressions, how can I make sure there are nothing but zeroes after the first zero?

ABC1000000 - valid
3212130000 - valid
0000000000 - valid
ABC1000100 - invalid
0001000000 - invalid

The regex without validation would be something like this - [A-Z0-9]{10}, making sure it is 10 characters.

ANSWER

Answered 2021-Nov-03 at 11:42

You could update the pattern to:

^(?=[A-Z0-9]{10}$)[A-Z1-9]*0+$

The pattern matches:

  • ^ Start of string
  • (?=[A-Z0-9]{10}$) Positive lookahead, assert 10 allowed chars
  • [A-Z1-9]* Optionally match any char of [A-Z1-9]
  • 0+ Match 1+ zeroes
  • $ End of string

Regex demo

If a value without zeroes is also allowed, the last quantifier can be * to match 0 or more times (and, per @Deduplicator's comment, a slightly shorter version uses a negated character class):

^(?=[A-Z0-9]{10}$)[^0]*0*$

An example with JavaScript:

const regex = /^(?=[A-Z0-9]{10}$)[^0]*0*$/;
["ABC1000000", "3212130000", "0000000000", "ABC1000100", "0001000000"]
.forEach(s =>
  console.log(`${s} --> ${regex.test(s)}`)
);

As an alternative without lookarounds, you could also match what you don't want, and capture in group 1 what you want to keep.

To make sure there is nothing but zeroes after the first zero, you could stop the match as soon as you match a 0 followed by one character from the same range excluding 0.

In the alternation, the second part can then capture 10 chars of range A-Z0-9.

^(?:[A-Z1-9]*0+[A-Z1-9]|([A-Z0-9]{10})$)

The pattern matches:

  • ^ Start of string
  • (?: Non capture group for the alternation |
    • [A-Z1-9]*0+[A-Z1-9] Match what should not occur, in this case a zero followed by a char from the range without a zero
    • | Or
    • ([A-Z0-9]{10}) Capture group 1, match 10 chars in range [A-Z0-9]
  • $ End of string
  • ) Close non capture group

Regex demo
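Both patterns can be exercised side by side in Python against the question's samples (a Python aside; the patterns themselves are copied from the answer):

```python
import re

lookahead = re.compile(r'^(?=[A-Z0-9]{10}$)[A-Z1-9]*0+$')
alternation = re.compile(r'^(?:[A-Z1-9]*0+[A-Z1-9]|([A-Z0-9]{10})$)')

samples = ['ABC1000000', '3212130000', '0000000000', 'ABC1000100', '0001000000']
expected = [True, True, True, False, False]

assert [lookahead.match(s) is not None for s in samples] == expected
# the alternation matches invalid input too, but only fills group 1 on valid input
assert [bool(m and m.group(1)) for m in map(alternation.match, samples)] == expected
```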

Source https://stackoverflow.com/questions/69798705

QUESTION

python regex where a set of options can occur at most once in a list, in any order

Asked 2021-Nov-03 at 10:04

I'm wondering if there's any way in Python or Perl to build a regex where a set of options can each appear at most once, in any order. So for example I would like a derivative of foo(?: [abc])*, where a, b, c could each appear only once. So:

foo a b c
foo b c a
foo a b
foo b

would all be valid, but

foo b b

would not be

ANSWER

Answered 2021-Oct-08 at 07:56

You may use this regex with a capture group and a negative lookahead:

For Perl, you can use this variant with forward referencing:

^foo((?!.*\1) [abc])+$

RegEx Demo

RegEx Details:

  • ^: Start
  • foo: Match foo
  • (: Start a capture group #1
    • (?!.*\1): Negative lookahead to assert that we don't match what we have in capture group #1 anywhere in input
    • [abc]: Match a space followed by a or b or c
  • )+: End capture group #1. Repeat this group 1+ times
  • $: End

As mentioned earlier, this regex is using a feature called Forward Referencing which is a back-reference to a group that appears later in the regex pattern. JGsoft, .NET, Java, Perl, PCRE, PHP, Delphi, and Ruby allow forward references but Python doesn't.


Here is a work-around of same regex for Python that doesn't use forward referencing:

^foo(?!.* ([abc]).*\1)(?: [abc])+$

Here we use a negative lookahead before the repeated group to fail the match if any of the allowed substrings, i.e. [abc], repeats.

RegEx Demo 2
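The Python workaround can be verified directly (pattern copied from the answer):

```python
import re

pattern = re.compile(r'^foo(?!.* ([abc]).*\1)(?: [abc])+$')

# every option appears at most once, in any order
assert all(pattern.match(s) for s in ['foo a b c', 'foo b c a', 'foo a b', 'foo b'])
assert pattern.match('foo b b') is None   # a repeated option is rejected
```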

Source https://stackoverflow.com/questions/69486648

Community Discussions contain sources that include Stack Exchange Network
