Lesson # 12. Theory
- Regular expressions allow us to search text for specific patterns.
- A pattern string describes the text to find; it can contain wildcards (special characters) that match whole classes of characters.
using System.Text.RegularExpressions; // main namespace
.NET classes for regular expressions:
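Static methods of the Regex class:
1. Match Regex.Match(s, pattern): the first match
2. bool Regex.IsMatch(s, pattern): whether there is at least one match
3. MatchCollection Regex.Matches(s, pattern): all matches
4. string Regex.Replace(s, pattern, replace_s): the string with every match replaced
5. string[] Regex.Split(s, pattern): the parts of the string between the matches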
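A minimal sketch that calls each of the five static methods on one made-up string. It is written as a top-level program (C# 9 or later), and the text and patterns are only illustrations:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "cat 12, dog 7, bird 345";
string digits = @"\d+"; // one or more digits in a row

// 1. Match: only the first match
Match first = Regex.Match(text, digits);
Console.WriteLine(first.Value);                // 12

// 2. IsMatch: is there at least one match?
bool hasDigits = Regex.IsMatch(text, digits);
Console.WriteLine(hasDigits);                  // True

// 3. Matches: all matches
MatchCollection all = Regex.Matches(text, digits);
Console.WriteLine(all.Count);                  // 3

// 4. Replace: every match is replaced by the given string
string hidden = Regex.Replace(text, digits, "#");
Console.WriteLine(hidden);                     // cat #, dog #, bird #

// 5. Split: the matches of the pattern act as separators
string[] parts = Regex.Split(text, @",\s*");
Console.WriteLine(string.Join(" | ", parts));  // cat 12 | dog 7 | bird 345
```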
Instance methods (to reuse a single pattern many times):
var r = new Regex(pattern);
r.Match(s)
r.IsMatch(s)
r.Matches(s)
r.Replace(s, replace_s)
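When one pattern is applied to many strings, it can be created once as a Regex object and reused. A small sketch; the pattern (words that contain at least one digit) and the inputs are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// The Regex object is created once and reused for every input
var wordWithDigit = new Regex(@"\b\w*\d\w*\b");

string[] inputs = { "room 101", "no digits here", "v2 and r2d2" };
int[] counts = new int[inputs.Length];

for (int i = 0; i < inputs.Length; i++)
{
    // Instance methods take only the input string; the pattern is already stored
    counts[i] = wordWithDigit.Matches(inputs[i]).Count;
    Console.WriteLine($"{inputs[i]}: {wordWithDigit.IsMatch(inputs[i])}, {counts[i]} match(es)");
}

string masked = wordWithDigit.Replace("v2 and r2d2", "*");
Console.WriteLine(masked); // * and *
```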
Properties of a Match object:
string s = "one two three four two five alice two";
Match m = Regex.Match(s, "two");
// 1. m.Success: whether a match was found
Console.WriteLine(m.Success); // True
// 2. m.Value: the matched text
Console.WriteLine(m.Value); // two
// 3. m.Index: the position of the match in s (counting from 0)
Console.WriteLine(m.Index); // 4
// 4. m.Length: the length of the matched text
Console.WriteLine(m.Length); // 3
// 5. m.NextMatch(): the next match of the same pattern
Console.WriteLine(m.NextMatch().Index); // 19
MatchCollection class: the Count property gives the number of matches, and the matches can be iterated with foreach:
foreach (Match m in Regex.Matches(s, pattern)) // each m is a Match object
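A short sketch of Count, foreach and indexing on a MatchCollection, using the same string as in the Match example (a top-level program, C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

string s = "one two three four two five alice two";
MatchCollection matches = Regex.Matches(s, "two");

// Count: how many matches were found
Console.WriteLine(matches.Count);     // 3

// foreach: each element is a Match object
foreach (Match m in matches)
{
    Console.Write(m.Index + " ");     // 4 19 34
}
Console.WriteLine();

// Indexer: matches[i] is the i-th match (counting from 0)
Console.WriteLine(matches[2].Index);  // 34
```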
Sample 1:
//...
// including the main namespace of RegularExpressions
using System.Text.RegularExpressions;
//...
// just some string
string s = " one two three four two five alice two";
var m = Regex.Match(s, "two"); // "two" is the pattern
// methods:
Console.WriteLine(m.Index); // output: 5
Console.WriteLine(m.NextMatch().Index); // output: 20
Console.WriteLine(m.NextMatch().NextMatch().Index); // output: 35
Sample 2:
// including the main namespace of RegularExpressions
using System.Text.RegularExpressions;
//...
// just some string
string s = " one two three four two five alice two";
// Using a loop to iterate through all matches
foreach (Match m in Regex.Matches(s, "two"))
{
    Console.Write(m.Index + " "); // output: 5 20 35
}
Sample 3:
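// including the main namespace of RegularExpressions
using System.Text.RegularExpressions;
//...
// just some string
string s = " one two three four two five alice two";
// 1. String.Split: split at single spaces and drop the empty entries
var ss1 = s.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
// 2. Regex.Split: split at runs of one or more spaces
// (s starts with a space, so the first element is an empty string)
var ss2 = Regex.Split(s, " +");
// an array is printed element by element with string.Join
Console.WriteLine(string.Join(",", ss1)); // one,two,three,four,two,five,alice,two
Console.WriteLine(string.Join(",", ss2)); // ,one,two,three,four,two,five,alice,two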
Examples of regular expressions
Text to search | Pattern |
---|---|
asdasdasdhelloasdasdasd | @"hello" |
hello | @"^hello$" |
asdasdelholasdasdasd | @"[hello]" |
asdasdeeeeeedfgdgdg | @"[hello]" or @"[hello]{6}" |
SdfuiyuiewrR345 | @"[a-zA-Z0-9]" or @"\w" |
452341 | @"[0-9]" or @"\d" |
jsdf8H?& | @"." |
Example:
string s = "asdasdasdhelloasdasdasd";
Match m = Regex.Match(s, @"hello");
Console.WriteLine(m.Success); // True
m = Regex.Match(s, @"^hello");
Console.WriteLine(m.Success); // False (s does not start with "hello")
Metacharacters and escaping
The following metacharacters have a special purpose in regular expressions:
( ) { } [ ] ? * + - ^ $ . | \
If you want these characters to be taken literally (e.g. . as a period), you need to «escape» them: put a \ before the character.
Of course, \ is also the escape character in C# string literals. To get a literal \, you need to double it in a regular string literal (i.e. the literal "\\" is a string of length one). Alternatively, C# has verbatim string literals, prefixed with @, in which escape sequences are not processed. Thus, the following two strings are equal:
"c:\\Docs\\Source\\a.txt"
@"c:\Docs\Source\a.txt"
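A sketch showing the difference between an unescaped and an escaped period, written both with a regular and with a verbatim string literal. It also uses Regex.Escape, a standard method that puts a \ before the metacharacters of a string for you; the sample strings are illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "a.c abc";

// Unescaped '.' matches any character: both "a.c" and "abc" match
int anyChar = Regex.Matches(text, "a.c").Count;

// \. matches only a period. In a regular literal the backslash is doubled for C#...
int regularLiteral = Regex.Matches(text, "a\\.c").Count;
// ...in a verbatim literal it is written once
int verbatimLiteral = Regex.Matches(text, @"a\.c").Count;

Console.WriteLine($"{anyChar} {regularLiteral} {verbatimLiteral}"); // 2 1 1

// Regex.Escape escapes the metacharacters (here + and ?)
string escaped = Regex.Escape("1+1=2?");
Console.WriteLine(escaped);                            // 1\+1=2\?
Console.WriteLine(Regex.IsMatch("Is 1+1=2?", escaped)); // True
```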
Wildcards and quantifiers
Wildcards | Explanation | Example | Sample Match |
---|---|---|---|
\d | one digit from 0 to 9 | file_\d\d | file_25 |
\w | «word character»: Unicode letter, ideogram, digit, or connector (such as _) | \w-\w\w\w | A-b_1 |
\s | «whitespace character»: a space, tab, line break or other Unicode white space | a\sb\sc | a b c |
\D | One character that is not a digit | \D\D\D | ABC |
\W | One character that is not a word character as defined by your engine’s \w | \W\W\W\W\W | *-+=) |
\S | One character that is not a whitespace character as defined by your engine’s \s | \S\S\S\S | Yoyo |
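\b | Word boundary: a position where a word character meets a non-word character or the start or end of the string | \bcat\b | cat (in «the cat», but not in «concat») |
\B | Non-word boundary: any position that is not a word boundary | \Bcat | cat (in «concat») |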
. | Any character except line break | a.c | abc |
\. | A period (special character: needs to be escaped by a \) | a\.c | a.c |
\ | Escapes a special character | \[\{\(\)\}\] | [{()}] |
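A few of the wildcards from the table above, checked in code. The sample strings are made up; note in particular that \b and \B match positions, not characters:

```csharp
using System;
using System.Text.RegularExpressions;

string code = "file_25 B7";
string fileName = Regex.Match(code, @"file_\d\d").Value; // file_25
int digitCount = Regex.Matches(code, @"\d").Count;       // 3: '2', '5', '7'
int spaceCount = Regex.Matches(code, @"\s").Count;       // 1
int nonWordCount = Regex.Matches(code, @"\W").Count;     // 1: the space ('_' is a word character)
Console.WriteLine($"{fileName} {digitCount} {spaceCount} {nonWordCount}"); // file_25 3 1 1

string text = "concat the cat";
// \bcat\b: "cat" as a separate word
int wholeWord = Regex.Match(text, @"\bcat\b").Index;     // 11
// \Bcat: "cat" preceded by another word character (inside "concat")
int insideWord = Regex.Match(text, @"\Bcat").Index;      // 3
Console.WriteLine($"{wholeWord} {insideWord}");          // 11 3
```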
Quantifier | Explanation | Example | Sample Match |
---|---|---|---|
+ | One or more | Version \w-\w+ | Version A-b1_1 |
{3} | Exactly three times | \D{3} | ABC |
{2,4} | Two to four times | \d{2,4} | 156 |
{3,} | Three or more times | \w{3,} | regex_tutorial |
* | Zero or more times | A*B*C* | AAACC |
? | Once or none | plurals? | plural |
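[…] | Any one of the characters within the square brackets | [32] | 3 |
\| | Alternation: the expression before OR the expression after it | cat\|dog | cat (in sdfcatsdf) |
Zero-length assertions (anchors) | | | |
^ | Beginning of the string (of each line with RegexOptions.Multiline) | ^hello | hello (in «hello world») |
$ | End of the string (of each line with RegexOptions.Multiline) | world$ | world (in «hello world») |
\b | Position on a word boundary | \bS\S* | Splendid (in «A Thousand Splendid Suns») |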
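A sketch that tries several quantifiers, alternation and anchors from the table; all inputs are made-up examples:

```csharp
using System;
using System.Text.RegularExpressions;

// {2,4} is greedy: it takes as many digits as allowed
string greedy = Regex.Match("abc 156789", @"\d{2,4}").Value;  // 1567

// ? makes the preceding character optional; ^ and $ anchor the whole string
bool one  = Regex.IsMatch("plural",   @"^plurals?$");         // True
bool two  = Regex.IsMatch("plurals",  @"^plurals?$");         // True
bool many = Regex.IsMatch("pluralss", @"^plurals?$");         // False

// * allows zero repetitions, so A*B*C* matches "AAACC" without any B
string stars = Regex.Match("AAACC", "^A*B*C*$").Value;        // AAACC

// | chooses between whole alternatives
string animal = Regex.Match("sdfcatsdf", "cat|dog").Value;    // cat

// ^ and $ test positions: the start and the end of the string
bool starts = Regex.IsMatch("say hello", "^hello");            // False
bool ends   = Regex.IsMatch("say hello", "hello$");            // True

Console.WriteLine($"{greedy} {one} {two} {many} {stars} {animal} {starts} {ends}");
// 1567 True True False AAACC cat False True
```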
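Replacements with the help of regular expressions
string s = "10+2=12";
// $0 in the replacement string stands for the whole match
string r1 = Regex.Replace(s, @"\d+", "<$0>"); // <10>+<2>=<12>
// a lambda (MatchEvaluator) computes the replacement for each match
string r2 = Regex.Replace(s, @"\d+", m => (int.Parse(m.Value) * 2).ToString()); // 20+4=24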
Examples
Example 1:
The following example matches words that start with ‘S’
using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        private static void showMatch(string text, string expr)
        {
            Console.WriteLine("The Expression: " + expr);
            MatchCollection mc = Regex.Matches(text, expr);
            foreach (Match m in mc)
            {
                Console.WriteLine(m);
            }
        }

        static void Main(string[] args)
        {
            string str = "A Thousand Splendid Suns";
            Console.WriteLine("Matching words that start with 'S': ");
            showMatch(str, @"\bS\S*");
            Console.ReadKey();
        }
    }
}
Result:
Matching words that start with 'S':
The Expression: \bS\S*
Splendid
Suns
Example 2:
The following example matches words that start with ‘m’ and end with ‘e’:
using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        private static void showMatch(string text, string expr)
        {
            Console.WriteLine("The Expression: " + expr);
            MatchCollection mc = Regex.Matches(text, expr);
            foreach (Match m in mc)
            {
                Console.WriteLine(m);
            }
        }

        static void Main(string[] args)
        {
            string str = "make maze and manage to measure it";
            Console.WriteLine("Matching words that start with 'm' and end with 'e':");
            showMatch(str, @"\bm\S*e\b");
            Console.ReadKey();
        }
    }
}
Result:
Matching words that start with 'm' and end with 'e':
The Expression: \bm\S*e\b
make
maze
manage
measure
Example 3:
This example replaces extra white space
using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "Hello   World   ";
            string pattern = "\\s+";
            string replacement = " ";
            Regex rgx = new Regex(pattern);
            string result = rgx.Replace(input, replacement);
            Console.WriteLine("Original String: {0}", input);
            Console.WriteLine("Replacement String: {0}", result);
            Console.ReadKey();
        }
    }
}
Result:
Original String: Hello   World   
Replacement String: Hello World 
Labs and Tasks
To do: Ask the user to input a phone number. Check whether the entered number is a Rostov phone number in the federal format, i.e. it must have one of the formats:
+7 (863) 3**-**-**
+7 (863) 2**-**-**
where * means any digit. Write a function that returns a Boolean value (true or false).
Note 1: The string must not contain any other text except the phone number, so the regular expression must contain the ^ and $ anchors.
Note 2: Since the +, ( and ) symbols have a special meaning in regular expressions, they must be escaped, like this: \+.
Note 3: To check that the program works properly, write three or four automatic tests using Debug.Assert.
Result example:
Tests are done well
Please input phone number:
+7 (863) 323-22-12
True
+++++++++++++++++
Tests are done well
Please input phone number:
7 (863) 323-22-12
False
+++++++++++++++++
Tests are done well
Please input phone number:
+7 (8634) 323-22-12
False
[Solution and Project name: Lesson_12Lab1, file name L12Lab1.cs]
✍ How to do:
- Create a new project with the name and file name given in the task.
- Place the cursor at the top of the code, where the namespaces are included, and include the following namespace to use the regular expression methods:
    //...
    using System.Text.RegularExpressions;
    //...
- The automatic tests use the Debug class, so one more namespace must be added. Place the following code after the previous one:
    //...
    using System.Text.RegularExpressions;
    using System.Diagnostics;
    //...
- Let's build the pattern symbol by symbol for the formats +7 (863) 3**-**-** and +7 (863) 2**-**-**:
  - + is a special character, so to use it in the pattern we put \ before it to escape it:
    \+
  - 7 is a particular digit, so it needs no wildcard or quantifier:
    \+7
  - whitespace character: the \s wildcard matches one whitespace character:
    \+7\s
  - ( is a special character, so it is escaped with \:
    \+7\s\(
  - 863 are particular digits, so they are written as they are:
    \+7\s\(863
  - ) also has to be escaped with \:
    \+7\s\(863\)
  - whitespace character: \s again:
    \+7\s\(863\)\s
  - 3 or 2: square brackets [] match any one of the characters inside them:
    \+7\s\(863\)\s[32]
  - two more digits (the ** after the first digit): \d matches one digit from 0 to 9, and curly braces give the number of repetitions:
    \+7\s\(863\)\s[32]\d{2}
  - - is an ordinary character here, so it is written as it is:
    \+7\s\(863\)\s[32]\d{2}-
  - any two digits:
    \+7\s\(863\)\s[32]\d{2}-\d{2}
  - - once more:
    \+7\s\(863\)\s[32]\d{2}-\d{2}-
  - any two digits:
    \+7\s\(863\)\s[32]\d{2}-\d{2}-\d{2}
  - According to the note in the task, the string must contain nothing but the phone number. So the pattern has to begin with the ^ anchor and end with the $ anchor, which mean the beginning and the end of the string:
    ^\+7\s\(863\)\s[32]\d{2}-\d{2}-\d{2}$
- Create a function named IsPhonenumber() that has one parameter (the input string) and returns a bool value, true or false:
    static bool IsPhonenumber(string number) { ... }
- Inside the function, use the static IsMatch method of the Regex class. It has two parameters, the input string and the regular expression, and indicates whether the regular expression finds a match in the input string, returning true or false:
    static bool IsPhonenumber(string stringNumber)
    {
        return Regex.IsMatch(stringNumber, @"^\+7\s\(863\)\s[32]\d{2}-\d{2}-\d{2}$");
    }
- Within the Main function, call the IsPhonenumber method through automatic tests, using the Assert method of the Debug class. Debug.Assert(bool) checks a condition: if the condition is false, it reports a failure with the call stack (in .NET Framework it shows a message box; in .NET Core and later it stops the program with an error message). If the condition is true, nothing happens. Debug.Assert works only in the Debug configuration.
- First, call the method with an incorrect phone number. The method returns false, so to get true as the condition we use the negation operator !:
    Debug.Assert(!IsPhonenumber("+7 (800) 231-45-84"));
- Run the application. There is no output. The phone number was incorrect, but since we placed !, the condition is true and there is no error message.
- Then call the method with a correct phone number:
    Debug.Assert(IsPhonenumber("+7 (863) 231-45-84"));
- And then call it with an incorrect phone number once more:
    Debug.Assert(!IsPhonenumber("+7 (8631) 21-45-84"));
- After all the automatic tests, output a message that the tests are done:
    Console.WriteLine("Tests are done well");
- Run the application again and check the output. The only output should be «Tests are done well».
- Finally, ask the user to enter a number and check whether it is correct:
    Console.WriteLine("Please input phone number:");
    string number = Console.ReadLine();
    Console.WriteLine(IsPhonenumber(number));
- Run the application again and check the output.
- Add comments with the text of the task and save the project. Upload the .cs file to the Moodle system.
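For reference, the steps of this lab can be assembled into one file roughly as follows. This is a sketch, not the official solution: it is written as a top-level program (C# 9 or later) instead of a class with a Main method, the fourth test (a wrong first digit) is an extra, and instead of reading from the console it checks the three inputs from the result example. Debug.Assert runs only in the Debug configuration.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Automatic tests
Debug.Assert(!IsPhonenumber("+7 (800) 231-45-84"));  // wrong area code
Debug.Assert(IsPhonenumber("+7 (863) 231-45-84"));   // correct number
Debug.Assert(!IsPhonenumber("+7 (8631) 21-45-84"));  // wrong format
Debug.Assert(!IsPhonenumber("+7 (863) 531-45-84"));  // extra test: must start with 3 or 2
Console.WriteLine("Tests are done well");

// The inputs from the result example
string[] samples = { "+7 (863) 323-22-12", "7 (863) 323-22-12", "+7 (8634) 323-22-12" };
foreach (string sample in samples)
{
    Console.WriteLine($"{sample}: {IsPhonenumber(sample)}"); // True, False, False
}

// true if the whole string is +7 (863) 3**-**-** or +7 (863) 2**-**-**
static bool IsPhonenumber(string stringNumber)
{
    return Regex.IsMatch(stringNumber, @"^\+7\s\(863\)\s[32]\d{2}-\d{2}-\d{2}$");
}
```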
To do: Ask the user to input a date. Check whether the date has the format dd-mm-yyyy, where dd is the day (two digits; a one-digit day is written with a leading zero, e.g. 02), mm is the month (two digits, also with a leading zero if needed), and yyyy is the year.
Note: Create a function to check the input. To check that the program works properly, write three or four automatic tests using Debug.Assert.
Expected output:
Tests are done well
Please input a date:
12/03/1975
The date format is incorrect
+++++++++
Tests are done well
Please input a date:
2-3-1975
The date format is incorrect
+++++++++
Tests are done well
Please input a date:
12-03-1975
The date format is correct
+++++++++
[Solution and Project name: Lesson_12Task1, file name L12Task1.cs]
To do: Create a function that determines how many zip codes there are in a given string (a zip code consists of 6 digits in a row).
Note 1: Create a method to make the calculations.
Note 2: The Count method of the Regex class must be used (it exists in .NET 7 and later; in earlier versions, use the Count property of the collection returned by Regex.Matches).
Note 3: To check that the program works properly, write three or four automatic tests using Debug.Assert.
Result example:
Tests are done well
For the string '123: zip code 367824 is norther than 123712' we have 2 zip codes
[Solution and Project name: Lesson_12Lab2, file name L12Lab2.cs]
✍ How to do:
- Create a new project with the name and file name given in the task.
- Place the cursor at the top of the code, where the namespaces are included, and include the following namespace to use the regular expression methods:
    //...
    using System.Text.RegularExpressions;
    //...
- The automatic tests use the Debug class, so one more namespace must be added. Place the following code after the previous one:
    //...
    using System.Text.RegularExpressions;
    using System.Diagnostics;
    //...
- Create a function called CountZip with one argument, the input string. The function has to return an integer value, the number of all occurrences:
    static int CountZip(string zip) { ... }
- Now let's create a pattern.
- First, put word boundaries at the beginning and at the end of the pattern, so that a zip code cannot be part of a longer run of digits or letters:
    @"\b...\b"
- Zip codes must have 6 digits in a row. \d matches any digit, and {6} means that there have to be exactly 6 of them. So the pattern is:
    @"\b\d{6}\b"
    \b      begin the match at a word boundary
    \d{6}   any digit, 6 of them in a row
    \b      end the match at a word boundary
- We're going to use the standard Matches method to find how many times the pattern matches the string. Matches(String) searches the input string for all occurrences of the regular expression and returns a collection (MatchCollection) of the Match objects found. If no matches are found, the method returns an empty collection. Place the following code inside the created method:
    var m = Regex.Matches(zip, @"\b\d{6}\b");
    return m.Count;
- Within the Main function, call the CountZip method through automatic tests, using the Assert method of the Debug class. Debug.Assert(bool) checks a condition: if the condition is false, it reports a failure with the call stack; if it is true, nothing happens. First, call the method with a string that contains two zip codes. To get true as the condition, check that the result is equal to 2:
    Debug.Assert(CountZip("344113 34116 15 152566 14254124 12515 hello") == 2);
- Run the application. There is no output, which means that the test has passed.
- Then call the method with a string that contains no zip code:
    Debug.Assert(CountZip("hello") == 0);
- That's enough. Output the message that the tests are done:
    Console.WriteLine("Tests are done well");
- Run the application again and check the output. The only output should be «Tests are done well».
- Finally, output the number of zip codes in a particular string. Declare a variable and assign the string to it:
    string zipCode = "123: zip code 367824 is norther than 123712";
    Console.WriteLine($"For the string '{zipCode}' we have {CountZip(zipCode)} zip codes");
- Run the application again and check the output.
- Add comments with the text of the task and save the project. Upload the .cs file to the Moodle system.
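The steps of this lab put together into one file. This is a sketch rather than the official solution: it is written as a top-level program (C# 9 or later) instead of a class with a Main method, and Debug.Assert runs only in the Debug configuration.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Automatic tests
Debug.Assert(CountZip("344113 34116 15 152566 14254124 12515 hello") == 2);
Debug.Assert(CountZip("hello") == 0);
Console.WriteLine("Tests are done well");

string zipCode = "123: zip code 367824 is norther than 123712";
Console.WriteLine($"For the string '{zipCode}' we have {CountZip(zipCode)} zip codes");

// Counts runs of exactly 6 digits that stand alone as whole "words"
static int CountZip(string zip)
{
    MatchCollection m = Regex.Matches(zip, @"\b\d{6}\b");
    return m.Count;
}
```

On .NET 7 and later, Regex.Count(zip, @"\b\d{6}\b") should return the same number without building the collection, which is presumably what Note 2 of the task refers to.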
To do: Create a function that calculates how many emoticons there are in a given string.
An emoticon can consist of the following characters:
- ; (semicolon) or : (colon), exactly once;
- the minus sign (-), any number of times, including zero;
- the brackets (, ), [, ].
Note 1: Create a method to make the calculations.
Note 2: The Count method of the Regex class must be used (it exists in .NET 7 and later; in earlier versions, use the Count property of the collection returned by Regex.Matches).
Note 3: To check that the program works properly, write three or four automatic tests using Debug.Assert.
Result example:
Tests are done well
For the string 'Hello, daddy :) I miss you :-(' we have 2 emoticons
[Solution and Project name: Lesson_12Task2, file name L12Task2.cs]
To do: Create a function that deletes extra white spaces from a given string (there can be double, triple, or any number of white spaces in a row). The Replace method must be used.
Note 1: Create a method with three arguments to make the replacement: the original string, the pattern string, and the replacement string. The replacement string has to be equal to " " (a single space is placed instead of each run of spaces).
Note 2: The Replace method of the Regex class must be used.
Note 3: To check that the program works properly, write three or four automatic tests using Debug.Assert.
Result example:
Tests are done well
Original String: 'Hello   World   '
Replacement String: 'Hello World '
[Solution and Project name: Lesson_12Lab3, file name L12Lab3.cs]
✍ How to do:
- Create a new project with the name and file name given in the task.
- Place the cursor at the top of the code, where the namespaces are included, and include the following namespace to use the regular expression methods:
    //...
    using System.Text.RegularExpressions;
    //...
- The automatic tests use the Debug class, so one more namespace must be added. Place the following code after the previous one:
    //...
    using System.Text.RegularExpressions;
    using System.Diagnostics;
    //...
- Create a function called ReplaceSpaces with three arguments: the input string, the pattern, and the string to replace the pattern with. The function has to return a string value, the resulting string:
    static string ReplaceSpaces(string input, string pattern, string replacement) { ... }
- Now let's create a pattern. The \s wildcard matches one whitespace character. But there can be many whitespace characters in a row, so we add the + quantifier, which means one or more. The pattern is:
    @"\s+"
- Within the Main function, assign the pattern to a variable called pattern:
    string pattern = @"\s+";
- Then declare one more variable to store the replacement string. Each run of extra spaces will be replaced by " ", a single space:
    string replacement = " ";
- We're going to use the standard Replace method. The instance method Replace(string input, string replacement) replaces, in the input string, all substrings that match the regular expression pattern with the replacement string. Place the following code inside the created method:
    Regex rgx = new Regex(pattern);
    string result = rgx.Replace(input, replacement);
    return result;
- Within the Main function, call the ReplaceSpaces method through an automatic test first, using the Assert method of the Debug class. Debug.Assert(bool) checks a condition: if the condition is false, it reports a failure with the call stack; if it is true, nothing happens. Call the method with a string that has extra spaces and compare the result with the expected string. Place the following code inside the Main function:
    Debug.Assert(ReplaceSpaces("  Good   day  !", pattern, replacement) == " Good day !");
    Console.WriteLine("Tests are done well");
- Run the application. Only the label «Tests are done well» appears in the console. It means that the function works properly.
- Finally, replace extra spaces in a particular string. Declare a variable and assign the string to it:
    string input = "Hello   World   ";
    Console.WriteLine($"Original String: '{input}'");
    Console.WriteLine($"Replacement String: '{ReplaceSpaces(input, pattern, replacement)}'");
    Console.ReadKey(); // keeps the console window open while debugging
- Run the application again and check the output.
- Add comments with the text of the task and save the project. Upload the .cs file to the Moodle system.
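The whole lab in one file, as a sketch: it is a top-level program (C# 9 or later) instead of a class with a Main method, and the variables in it are named spacePattern, singleSpace and text (illustrative names) so that they do not look like the parameters of ReplaceSpaces. Debug.Assert runs only in the Debug configuration.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

string spacePattern = @"\s+"; // one or more whitespace characters in a row
string singleSpace = " ";     // each such run is replaced by one space

// Automatic test
Debug.Assert(ReplaceSpaces("  Good   day  !", spacePattern, singleSpace) == " Good day !");
Console.WriteLine("Tests are done well");

string text = "Hello   World   ";
Console.WriteLine($"Original String: '{text}'");
Console.WriteLine($"Replacement String: '{ReplaceSpaces(text, spacePattern, singleSpace)}'");

// Replaces every match of pattern in input with replacement
static string ReplaceSpaces(string input, string pattern, string replacement)
{
    Regex rgx = new Regex(pattern);
    return rgx.Replace(input, replacement);
}
```

The output should be 'Hello   World   ' followed by 'Hello World ': the trailing run of spaces also becomes one space, because \s+ does not treat the end of the string specially.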
To do: Check whether the value of a string variable contains text framed by single asterisks (*text*). Replace such text with the same text inside the <em></em> tag. Do not change text framed by double asterisks (**text**).
Note: Create a function that makes these replacements (with the signature static void ConvertText(ref string s)). To check that the program works properly, write three or four automatic tests using Debug.Assert.
Expected output:
Tests are done well
input: *this is italic*
output: <em>this is italic</em>
+++++++++
input: **bold text (not italic)**
output: **bold text (not italic)**
[Solution and Project name: Lesson_12Task3, file name L12Task3.cs]
To do: A string value is given. Find all IPv4 addresses in it (in decimal notation with dots as separators), store them in a new string variable, and print out the value of this variable.
Note 1: In this task, an IPv4 address in dotted decimal notation has the format xxx.255.255.255: the first part must be a three-digit number from 100 to 255, and each of the other parts is a number from 1 to 255.
Note 2: Create a function that makes the search (with the signature static string FindAddresses(string s)).
Note 3: For a neat output, use the escape sequence \n to start a new line.
Expected output:
for text 444.34.56.78 125.34.56.78 125.34.56.78 12.34.56.78 words 255.133.255.133 255.1333.255.133
addresses are:
125.34.56.78
125.34.56.78
255.133.255.133
[Solution and Project name: Lesson_12Task4, file name L12Task4.cs]
To do: Determine whether a string is a domain name with the http or https protocol, with an optional slash (/) at the end.
Note: Create a function that returns a boolean value to determine it.
Expected output:
for text http://example.com/ result is true
for text http:/example.com/ result is false
for text http//example.com/ result is false
for text https://example.com/ result is true
for text https://example.ru result is true
for text http://exampleru/ result is false
[Solution and Project name: Lesson_12Task5, file name L12Task5.cs]