Lesson # 12. Theory
- Regular expressions allow us to search text for specific patterns.
- A pattern string combines literal characters with wildcards (metacharacters).
using System.Text.RegularExpressions; // main namespace
.NET classes for regular expressions:
Static methods of Regex class
1. Match Regex.Match(s, pattern)
2. bool Regex.IsMatch(s, pattern)
3. MatchCollection Regex.Matches(s, pattern)
4. string Regex.Replace(s, pattern, replace_s)
5. string[] Regex.Split(s, pattern)
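As a quick illustration of the five static methods listed above, here is a minimal sketch. The sample string, the patterns, and the class name `StaticRegexDemo` are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticRegexDemo
{
    public static void Main()
    {
        string s = "one two three";

        Match first = Regex.Match(s, @"t\w+");            // first match only
        bool hasDigit = Regex.IsMatch(s, @"\d");          // is there any match at all?
        MatchCollection all = Regex.Matches(s, @"t\w+");  // every match
        string replaced = Regex.Replace(s, @"t\w+", "#"); // replace every match
        string[] parts = Regex.Split(s, @"\s+");          // split on the pattern

        Console.WriteLine(first.Value);   // two
        Console.WriteLine(hasDigit);      // False
        Console.WriteLine(all.Count);     // 2
        Console.WriteLine(replaced);      // one # #
        Console.WriteLine(parts.Length);  // 3
    }
}
```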
Instance methods (for reusing a single pattern)
var r = new Regex(pattern);
r.Match(s)
r.IsMatch(s)
r.Matches(s)
r.Replace(s, replace_s)
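The instance methods pay off when the same pattern is applied many times. The sketch below keeps one `Regex` object in a static field and reuses it. The names `ReusablePatternDemo`, `CountNumbers` and `MaskNumbers` are hypothetical, chosen only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReusablePatternDemo
{
    // One Regex instance, built once and reused for every call.
    private static readonly Regex Digits = new Regex(@"\d+");

    // Counts the runs of digits in the text.
    public static int CountNumbers(string text) => Digits.Matches(text).Count;

    // Replaces every run of digits with '#'.
    public static string MaskNumbers(string text) => Digits.Replace(text, "#");

    public static void Main()
    {
        Console.WriteLine(CountNumbers("room 12, floor 3")); // 2
        Console.WriteLine(MaskNumbers("room 12, floor 3"));  // room #, floor #
    }
}
```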
Match class variables and their properties
string s = "one two three four two five alice two";
Match m = Regex.Match(s, "two");
// 1. m.Success
Console.WriteLine(m.Success); // True
// 2. m.Value
Console.WriteLine(m.Value); // two
// 3. m.Index
Console.WriteLine(m.Index); // 4
// 4. m.Length
Console.WriteLine(m.Length); // 3
// 5. m.NextMatch().Index
Console.WriteLine(m.NextMatch().Index); // 19
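MatchCollection class: the Count property returns the number of matches, and the collection can be iterated with foreach.
foreach (Match m in matches) // matches is a MatchCollection, each m is a Match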
Sample 1:
//...
// including the main namespace of RegularExpressions
using System.Text.RegularExpressions;
//...
// just some string
string s = " one two three four two five alice two";
var m = Regex.Match(s, "two"); // pattern
// methods:
Console.WriteLine(m.Index); // output: 5
Console.WriteLine(m.NextMatch().Index); // output: 20
Console.WriteLine(m.NextMatch().NextMatch().Index); // output: 35
Sample 2:
// including the main namespace of RegularExpressions
using System.Text.RegularExpressions;
//...
// just some string
string s = " one two three four two five alice two";
// Using a loop to iterate through text
foreach (Match m in Regex.Matches(s, "two"))
{
    Console.Write(m.Index + " "); // output: 5 20 35
}
Sample 3:
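// including the main namespace of RegularExpressions
using System.Text.RegularExpressions;
//...
// just some string
string s = " one two three four two five alice two";
// Option 1: String.Split, dropping the empty entries
var ss1 = s.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
// Option 2: Regex.Split on runs of spaces (trim first to avoid an empty first element)
var ss2 = Regex.Split(s.Trim(), " +");
Console.WriteLine(string.Join(",", ss2)); // output: one,two,three,four,two,five,alice,two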
Examples of regular expressions
| Text to search | Pattern string |
|---|---|
| asdasdasdhelloasdasdasd | @"hello" |
| hello | @"^hello$" |
| asdasdelholasdasdasd<br>asdasdeeeeeedfgdgdg | @"[hello]" or @"[hello]{6}" |
| SdfuiyuiewrR345 | @"[a-zA-Z0-9]" or @"\w" |
| 452341 | @"[0-9]" or @"\d" |
| jsdf8H?& | @"." |
Example:
string s = "asdasdasdhelloasdasdasd";
Match m = Regex.Match(s, @"hello");
Console.WriteLine(m.Success); // True
m = Regex.Match(s, @"^hello");
Console.WriteLine(m.Success); // False
Metacharacters and escaping
The following metacharacters have a special purpose in regular expressions:
( ) { } [ ] ? * + - ^ $ . | \
If you want one of these characters to be matched literally (e.g. . as a period), you need to escape it. This is done by preceding the character with a \.
Of course, a \ is also the escape character in C# string literals. To get a literal \, you need to double it in your string literal (i.e. "\\" is a string of length one). Alternatively, C# also has verbatim string literals, prefixed with @, in which escape sequences are not processed. Thus, the following two strings are equal:
"c:\\Docs\\Source\\a.txt"
@"c:\Docs\Source\a.txt"
Wildcards and quantifiers
| Wildcard | Explanation | Example | Sample Match |
|---|---|---|---|
| \d | One digit from 0 to 9 | file_\d\d | file_25 |
| \w | "Word character": Unicode letter, ideogram, digit, or connector (such as the underscore) | \w-\w\w\w | A-b_1 |
| \s | "Whitespace character": any Unicode separator | a\sb\sc | a b c |
| \D | One character that is not a digit | \D\D\D | ABC |
| \W | One character that is not a word character as defined by your engine's \w | \W\W\W\W\W | *-+=) |
| \S | One character that is not a whitespace character as defined by your engine's \s | \S\S\S\S | Yoyo |
| \b | Word boundary | | |
| \B | Not a word boundary | | |
| . | Any character except a line break | a.c | abc |
| \. | A period (special character: needs to be escaped by a \) | a\.c | a.c |
| \ | Escapes a special character | \[\{\(\)\}\] | [{()}] |
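The rows of the table above can be checked directly in code. This is a minimal sketch: `FullMatch` is a hypothetical helper written for this example (it anchors the pattern with ^ and $ so that the whole text must match), and the last line uses a made-up string to show what \b does.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardDemo
{
    // Hypothetical helper: true when the whole text matches the pattern.
    public static bool FullMatch(string text, string pattern) =>
        Regex.IsMatch(text, "^(?:" + pattern + ")$");

    public static void Main()
    {
        Console.WriteLine(FullMatch("file_25", @"file_\d\d")); // True
        Console.WriteLine(FullMatch("A-b_1", @"\w-\w\w\w"));   // True
        Console.WriteLine(FullMatch("a b c", @"a\sb\sc"));     // True
        Console.WriteLine(FullMatch("abc", @"a.c"));           // True: '.' is any character
        Console.WriteLine(FullMatch("abc", @"a\.c"));          // False: '\.' is only a period

        // \b keeps "cat" from matching inside "concat".
        Console.WriteLine(Regex.Matches("cat concat", @"\bcat\b").Count); // 1
    }
}
```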
| Quantifier | Explanation | Example | Sample Match |
|---|---|---|---|
| + | One or more | Version \w-\w+ | Version A-b1_1 |
| {3} | Exactly three times | \D{3} | ABC |
| {2,4} | Two to four times | \d{2,4} | 156 |
| {3,} | Three or more times | \w{3,} | regex_tutorial |
| * | Zero or more times | A*B*C* | AAACC |
| ? | Once or none | plurals? | plural |
| […] | Any one character within the brackets | | |
| \| | Alternation: the expression before OR the expression after it | cat\|dog | sdfcatsdf |
| Zero-length assertions | | | |
| ^ | Matches at the beginning of the string (or of a line in multiline mode) | | |
| $ | Matches at the end of the string (or of a line in multiline mode) | | |
| \b | Position on a word boundary | | |
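A few of the quantifiers and zero-length assertions from the table, tried out on short strings. This is only a sketch: the inputs "12345" and the anchored patterns are made up for this example. Note that quantifiers are greedy by default, so \d{2,4} takes four digits when four are available.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    public static void Main()
    {
        Console.WriteLine(Regex.Match("12345", @"\d{2,4}").Value); // 1234 (greedy)
        Console.WriteLine(Regex.Match("AAACC", "A*B*C*").Value);   // AAACC (B* matched zero times)
        Console.WriteLine(Regex.IsMatch("plural", "^plurals?$"));  // True ('s' is optional)
        Console.WriteLine(Regex.IsMatch("sdfcatsdf", "cat|dog"));  // True
        Console.WriteLine(Regex.IsMatch("sdfcatsdf", "^cat"));     // False: not at the beginning
        Console.WriteLine(Regex.IsMatch("sdfcatsdf", "sdf$"));     // True: at the end
    }
}
```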
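Replacements with the help of regular expressions
string s = "10+2=12";
// $0 stands for the whole match
string tagged = Regex.Replace(s, @"\d+", "<$0>"); // <10>+<2>=<12>
// a lambda computes the replacement for each match
string doubled = Regex.Replace(s, @"\d+", m => (int.Parse(m.Value) * 2).ToString()); // 20+4=24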
Examples
Example 1:
The following example matches words that start with ‘S’
using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        private static void showMatch(string text, string expr)
        {
            Console.WriteLine("The Expression: " + expr);
            MatchCollection mc = Regex.Matches(text, expr);
            foreach (Match m in mc)
            {
                Console.WriteLine(m);
            }
        }

        static void Main(string[] args)
        {
            string str = "A Thousand Splendid Suns";
            Console.WriteLine("Matching words that start with 'S': ");
            showMatch(str, @"\bS\S*");
            Console.ReadKey();
        }
    }
}
Result:
Matching words that start with 'S':
The Expression: \bS\S*
Splendid
Suns
Example 2:
The following example matches words that start with 'm' and end with 'e'
using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        private static void showMatch(string text, string expr)
        {
            Console.WriteLine("The Expression: " + expr);
            MatchCollection mc = Regex.Matches(text, expr);
            foreach (Match m in mc)
            {
                Console.WriteLine(m);
            }
        }

        static void Main(string[] args)
        {
            string str = "make maze and manage to measure it";
            Console.WriteLine("Matching words start with 'm' and ends with 'e':");
            showMatch(str, @"\bm\S*e\b");
            Console.ReadKey();
        }
    }
}
Result:
Matching words start with 'm' and ends with 'e':
The Expression: \bm\S*e\b
make
maze
manage
measure
Example 3:
This example replaces extra white space
using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "Hello   World   "; // several spaces in a row
            string pattern = "\\s+";
            string replacement = " ";
            Regex rgx = new Regex(pattern);
            string result = rgx.Replace(input, replacement);
            Console.WriteLine("Original String: {0}", input);
            Console.WriteLine("Replacement String: {0}", result);
            Console.ReadKey();
        }
    }
}
Result:
Original String: Hello   World   
Replacement String: Hello World 
Labs and Tasks
To do: Ask the user to input a phone number. Check whether the entered number is a Rostov phone number in the federal format, that is, whether it has the form
+7 (863) 3**-**-** or +7 (863) 2**-**-**, where * means any digit. Write a function that returns a Boolean value (true or false).
Note 1: The string must not contain any text other than the phone number, so the regular expression must contain the ^ and $ markers.
Note 2: Since the +, ( and ) symbols have a special meaning in regular expressions, they must be escaped, for example: \+.
Note 3: To check that the program works properly, write three or four automatic tests using Debug.Assert.
Result example:
Tests are done well
Please input phone number:
+7 (863) 323-22-12
True
+++++++++++++++++
Tests are done well
Please input phone number:
7 (863) 323-22-12
False
+++++++++++++++++
Tests are done well
Please input phone number:
+7 (8634) 323-22-12
False
[Solution and Project name: Lesson_12Lab1, file name L12Lab1.cs]
✍ How to do:
- Create a new project with the project name and file name given in the task.
- Place your cursor at the top of the code, where the namespaces are included. Include the following namespace to use the regular expression methods:
//...
using System.Text.RegularExpressions;
//...
- The automatic tests need one more namespace, System.Diagnostics, which contains the Debug class. Add it after the previous one:
//...
using System.Text.RegularExpressions;
using System.Diagnostics;
//...
- Let's consider the phone number +7 (863) 3**-**-** or +7 (863) 2**-**-** symbol by symbol:
  - +: it is a special character, so to use it in our pattern we need to put \ before it to escape it. So we have:
\+
  - 7: it is a particular digit, so we don't need any quantifier or wildcard. Now we have:
\+7
  - whitespace character: the \s wildcard stands for a whitespace in a pattern:
\+7\s
  - (: it is a special character, so we need to put \ before it to escape it:
\+7\s\(
  - 863: these are particular digits, so we don't need any quantifier or wildcard. Now we have:
\+7\s\(863
  - ): we need to put \ before it to escape the special character:
\+7\s\(863\)
  - whitespace character: we use \s again:
\+7\s\(863\)\s
  - 3 or 2: square brackets in a pattern match any one of the characters inside them:
\+7\s\(863\)\s[32]
  - any digit: the \d wildcard matches one digit from 0 to 9. Two more digits follow (** in the phone number), so we use curly braces with the number of digits:
\+7\s\(863\)\s[32]\d{2}
  - -: it is a particular character, so we don't need any quantifier or wildcard. Now we have:
\+7\s\(863\)\s[32]\d{2}-
  - any digit: we use \d with curly braces again to match 2 digits:
\+7\s\(863\)\s[32]\d{2}-\d{2}
  - -: it is a particular character:
\+7\s\(863\)\s[32]\d{2}-\d{2}-
  - any digit: the last 2 digits:
\+7\s\(863\)\s[32]\d{2}-\d{2}-\d{2}
- The task notes that the string must not contain any text other than the phone number. So the pattern has to begin with ^ and end with $, which mean the beginning and the end of the string:
^\+7\s\(863\)\s[32]\d{2}-\d{2}-\d{2}$
- Create a function named IsPhonenumber() that has one parameter, the input string, and returns a Boolean value, true or false:
static bool IsPhonenumber(string stringNumber) { ... }
- Inside the created function we're going to use the IsMatch static method of the Regex class, which has two parameters: the input string and the regular expression. The IsMatch method indicates whether the specified regular expression finds a match in the specified input string. The method returns true or false.
static bool IsPhonenumber(string stringNumber)
{
    return Regex.IsMatch(stringNumber, @"^\+7\s\(863\)\s[32]\d{2}-\d{2}-\d{2}$");
}
- Within the Main function we're going to call the created IsPhonenumber method. But we need to do it using an automatic test: the Assert method of the Debug class. Assert(bool) checks a condition; if the condition is false, it displays a message box that shows the call stack. If the condition is true, no failure message is sent and the message box is not displayed.
- First, let's call the method with an incorrect phone number. To get true as the result, we negate the call with the ! operator:
Debug.Assert(!IsPhonenumber("+7 (800) 231-45-84"));
- Run the application. There is no output: the phone number is incorrect, so IsPhonenumber returns false, the ! turns that into true, and the assertion passes silently.
- After that, let's call the method with a correct phone number:
Debug.Assert(IsPhonenumber("+7 (863) 231-45-84"));
- And then we call the method with an incorrect phone number one more time:
Debug.Assert(!IsPhonenumber("+7 (8631) 21-45-84"));
- After we've placed all the automatic tests, we should output a message that the tests are done:
Console.WriteLine("Tests are done well");
- Run the application again and check the output. The only output should be the message "Tests are done well".
- And at last, we need to ask the user to enter a number and check whether it is correct:
Console.WriteLine("Please input phone number:");
string number = Console.ReadLine();
Console.WriteLine(IsPhonenumber(number));
- Run the application again and check the output.
- Add comments with the text of the task and save the project. Upload the .cs file to the Moodle system.
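Putting the steps of this lab together, the complete file might look like the sketch below. The pattern uses [32]\d{2} for the first group so that only numbers starting with 2 or 3 pass, as the task requires. The fourth assertion and the `?? ""` guard against a missing input line are additions made for this sketch.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class L12Lab1
{
    // True only for "+7 (863) 2**-**-**" or "+7 (863) 3**-**-**".
    public static bool IsPhonenumber(string stringNumber)
    {
        return Regex.IsMatch(stringNumber, @"^\+7\s\(863\)\s[32]\d{2}-\d{2}-\d{2}$");
    }

    public static void Main()
    {
        Debug.Assert(!IsPhonenumber("+7 (800) 231-45-84")); // wrong area code
        Debug.Assert(IsPhonenumber("+7 (863) 231-45-84"));  // valid number
        Debug.Assert(!IsPhonenumber("+7 (8631) 21-45-84")); // wrong grouping
        Debug.Assert(!IsPhonenumber("+7 (863) 531-45-84")); // first digit is not 2 or 3
        Console.WriteLine("Tests are done well");

        Console.WriteLine("Please input phone number:");
        string number = Console.ReadLine() ?? ""; // ReadLine returns null at end of input
        Console.WriteLine(IsPhonenumber(number));
    }
}
```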
To do: Ask the user to input a date. Check whether the date has the format dd-mm-yyyy, where:
- dd is the two-digit day (a single-digit day is written with a leading zero, e.g. 02);
- mm is the two-digit month, also with a leading zero if there is only one digit;
- yyyy is the four-digit year.
Note: Create a function to check the input. To check that the program works properly, write three or four automatic tests using Debug.Assert.
Expected output:
Tests are done well
Please input a date:
12/03/1975
The date format is incorrect
+++++++++
Tests are done well
Please input a date:
2-3-1975
The date format is incorrect
+++++++++
Tests are done well
Please input a date:
12-03-1975
The date format is correct
+++++++++
[Solution and Project name: Lesson_12Task1, file name L12Task1.cs]
To do: Create a function that determines how many zip codes there are in the specified string (a zip code consists of 6 digits in a row).
Note 1: Create a method to make the calculations.
Note 2: The Count property of the MatchCollection returned by Regex.Matches must be used.
Note 3: To check that the program works properly, write three or four automatic tests using Debug.Assert.
Result example:
Tests are done well
For the string '123: zip code 367824 is norther than 123712' we have 2 zip codes
[Solution and Project name: Lesson_12Lab2, file name L12Lab2.cs]
✍ How to do:
- Create a new project with the project name and file name given in the task.
- Place your cursor at the top of the code, where the namespaces are included. Include the following namespace to use the regular expression methods:
//...
using System.Text.RegularExpressions;
//...
- The automatic tests need one more namespace, System.Diagnostics, which contains the Debug class. Add it after the previous one:
//...
using System.Text.RegularExpressions;
using System.Diagnostics;
//...
- Create a function called CountZip with one argument, the input string. The function has to return an integer value, the number of all occurrences:
static int CountZip(string zip) { ... }
- Now we're going to create a pattern.
- First, we put word boundaries at the start and at the end of the pattern, so that only standalone runs of digits match:
@"\b...\b"
- Zip codes must have 6 digits in a row. So we can use \d for any digit, and {6} means that there have to be 6 digits:
@"\b\d{6}\b"
- So what we have in our pattern:
\b begin the match at a word boundary
\d{6} any digit, 6 of them in a row
\b end the match at a word boundary
- We're going to use the standard Matches method to see how many times our pattern matches the string. The Matches(String) method searches the specified input string for all occurrences of a regular expression. It returns a collection of the Match objects found by the search. If no matches are found, the method returns an empty collection object.
- Place the following code inside the created method:
var m = Regex.Matches(zip, @"\b\d{6}\b");
return m.Count;
- Within the Main function we're going to call the created CountZip method. But we need to do it using an automatic test: the Assert method of the Debug class. Assert(bool) checks a condition; if the condition is false, it displays a message box that shows the call stack. If the condition is true, no failure message is sent and the message box is not displayed.
- First, let's call the method with a string that contains two zip codes. To get true as the result, we check whether the returned value is equal to 2:
Debug.Assert(CountZip("344113 34116 15 152566 14254124 12515 hello") == 2);
- Run the application. There is no output, which means that the test passed.
- After that, let's call the method with a string that contains no zip code:
Debug.Assert(CountZip("hello") == 0);
- That's enough. Let's output a message that the tests are done:
Console.WriteLine("Tests are done well");
- Run the application again and check the output. The only output should be the message "Tests are done well".
- And at last, we need to output the number of zip codes within a particular string. So we'll declare a variable and assign that string to it:
string zipCode = "123: zip code 367824 is norther than 123712";
Console.WriteLine($"For the string '{zipCode}' we have {CountZip(zipCode)} zip codes");
- Run the application again and check the output.
- Add comments with the text of the task and save the project. Upload the .cs file to the Moodle system.
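Assembled from the steps of this lab, the whole program could look roughly like this. It is a sketch, not the only possible layout: the class name `L12Lab2` follows the file name from the task, and the method is made public only so that it can be called from outside the class.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class L12Lab2
{
    // Counts standalone runs of exactly six digits.
    public static int CountZip(string zip)
    {
        MatchCollection m = Regex.Matches(zip, @"\b\d{6}\b");
        return m.Count;
    }

    public static void Main()
    {
        // 344113 and 152566 count; shorter and longer runs of digits do not.
        Debug.Assert(CountZip("344113 34116 15 152566 14254124 12515 hello") == 2);
        Debug.Assert(CountZip("hello") == 0);
        Console.WriteLine("Tests are done well");

        string zipCode = "123: zip code 367824 is norther than 123712";
        Console.WriteLine($"For the string '{zipCode}' we have {CountZip(zipCode)} zip codes");
    }
}
```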
To do: Create a function that calculates how many emoticons there are in the specified string.
An emoticon consists of the following characters:
- ; (semicolon) or : (colon), exactly once;
- the - (minus) symbol, any number of times (including zero);
- a bracket: (, ), [ or ].
Note 1: Create a method to make the calculations.
Note 2: The Count property of the MatchCollection returned by Regex.Matches must be used.
Note 3: To check that the program works properly, write three or four automatic tests using Debug.Assert.
Result example:
Tests are done well
For the string 'Hello, daddy :) I miss you :-('
we have 2 emoticons
[Solution and Project name: Lesson_12Task2, file name L12Task2.cs]
To do: Create a function that deletes extra white spaces from the specified string (there can be double or triple white spaces, or any number of white spaces in a row).
The Replace method must be used.
Note 1: Create a method with three arguments to make the replacement. The three arguments are: the original string, the pattern string, and the replacement string. The replacement string has to be equal to " " (a single white space has to be placed instead of several white spaces in a row).
Note 2: The Replace method of the Regex class must be used.
Note 3: To check that the program works properly, write three or four automatic tests using Debug.Assert.
Result example:
Tests are done well
Original String: 'Hello   World   '
Replacement String: 'Hello World '
[Solution and Project name: Lesson_12Lab3, file name L12Lab3.cs]
✍ How to do:
- Create a new project with the project name and file name given in the task.
- Place your cursor at the top of the code, where the namespaces are included. Include the following namespace to use the regular expression methods:
//...
using System.Text.RegularExpressions;
//...
- The automatic tests need one more namespace, System.Diagnostics, which contains the Debug class. Add it after the previous one:
//...
using System.Text.RegularExpressions;
using System.Diagnostics;
//...
- Create a function called ReplaceSpaces with three arguments: the input string, the pattern, and the string that replaces the pattern. The function has to return a string value, the resulting string:
static string ReplaceSpaces(string input, string pattern, string replacement) { ... }
- Now we're going to create a pattern. The \s wildcard stands for a white space in a pattern. But there can be many white spaces in a row, so we need to add +, which means one or more characters. The pattern will be:
@"\s+"
- Within the Main function, assign the created pattern to a variable called pattern:
string pattern = @"\s+";
- After that, declare one more variable to store the replacement string; the extra spaces in a row will later be replaced by " ", a single white space:
string replacement = " ";
- We're going to use the standard Replace method to do the task. Replace(string input, string replacement): in a specified input string, replaces all strings that match a regular expression pattern with a specified replacement string.
- Place the following code inside the created method:
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
return result;
- Within the Main function we're going to call the created ReplaceSpaces method. But first we need to do it using an automatic test: the Assert method of the Debug class. Assert(bool) checks a condition; if the condition is false, it displays a message box that shows the call stack. If the condition is true, no failure message is sent and the message box is not displayed.
- Let's call the method, providing it with a string that has extra white spaces. To get true as the result, place the following code as the condition of the Assert method (you must do it inside the Main function):
Debug.Assert(ReplaceSpaces("  Good   day  !", pattern, replacement) == " Good day !");
Console.WriteLine("Tests are done well");
- Run the application. There is only the message "Tests are done well" on the console. It means that the function works properly.
- And at last, we need to replace the extra white spaces within a particular string. So we'll declare a variable and assign that string to it:
string input = "Hello   World   ";
Console.WriteLine($"Original String: '{input}'");
Console.WriteLine($"Replacement String: '{ReplaceSpaces(input, pattern, replacement)}'");
Console.ReadKey(); // keeps the console window open when running under the debugger
- Run the application again and check the output.
- Add comments with the text of the task and save the project. Upload the .cs file to the Moodle system.
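For reference, the pieces of this lab fit together roughly as follows. This is a sketch under a few assumptions: the sample strings contain several spaces in a row, the class name `L12Lab3` follows the file name from the task, and Console.ReadKey is left out so that the program also runs without an interactive console.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class L12Lab3
{
    // Replaces every match of the pattern in the input with the replacement.
    public static string ReplaceSpaces(string input, string pattern, string replacement)
    {
        Regex rgx = new Regex(pattern);
        return rgx.Replace(input, replacement);
    }

    public static void Main()
    {
        string pattern = @"\s+";  // one or more whitespace characters
        string replacement = " "; // a single white space

        Debug.Assert(ReplaceSpaces("  Good   day  !", pattern, replacement) == " Good day !");
        Console.WriteLine("Tests are done well");

        string input = "Hello   World   ";
        Console.WriteLine($"Original String: '{input}'");
        Console.WriteLine($"Replacement String: '{ReplaceSpaces(input, pattern, replacement)}'");
    }
}
```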
To do: Check whether the value of a string variable contains text framed by single asterisks. Replace the asterisks around this text with the tags <em></em>. Do not change text framed by double asterisks.
Note: Create a function to make these replacements (with the signature static void ConvertText(ref string s)). To check that the program works properly, write three or four automatic tests using Debug.Assert.
Expected output:
Tests are done well
input: *this is italic*
output: <em>this is italic</em>
+++++++++
input: **bold text (not italic)**
output: **bold text (not italic)**
[Solution and Project name: Lesson_12Task3, file name L12Task3.cs]
To do: A string is given. Find all IPv4 addresses in it (in decimal notation with dots as separators) and store them in a new variable of string type. Print out the value of this variable.
Note 1: For this task, an IPv4 address in dotted-decimal notation has the format xxx.255.255.255: the first part must be a three-digit number (from 100 to 255), and each of the other parts can be from 1 through 255.
Note 2: Create a function to make the search (with the signature static string FindAddresses(string s)).
Note 3: To get a readable output, don't forget to use the escape sequence \n to start a new line.
Expected output:
for text
444.34.56.78 125.34.56.78 125.34.56.78 12.34.56.78 words 255.133.255.133 255.1333.255.133
addresses are:
125.34.56.78
125.34.56.78
255.133.255.133
[Solution and Project name: Lesson_12Task4, file name L12Task4.cs]
To do: Determine whether the string is a domain name preceded by the http or https protocol, with an optional slash (/) at the end.
Note: Create a function which returns a boolean type to determine it.
Expected output:
for text http://example.com/ result is true
for text http:/example.com/ result is false
for text http//example.com/ result is false
for text https://example.com/ result is true
for text https://example.ru result is true
for text http://exampleru/ result is false
[Solution and Project name: Lesson_12Task5, file name L12Task5.cs]