Lesson # 12. Regular expressions

Programming in C# in Microsoft Visual Studio. Using regular expressions.

Lesson # 12. Theory

  • Regular expressions allow us to search for specific patterns of text.
  • A pattern string describes the text to find; it can contain wildcards (special characters such as \d or .).
using System.Text.RegularExpressions; // main namespace

.NET classes for regular expressions:

Static methods of Regex class

    1. Match Regex.Match(s,pattern)
    2. bool Regex.IsMatch(s,pattern)
    3. MatchCollection Regex.Matches(s,pattern)
    4. string Regex.Replace(s,pattern,replace_s)
    5. string[] Regex.Split(s,pattern)
    

    Instance methods (to reuse one pattern many times)

    var r = new Regex(pattern);
    r.Match(s) 
    r.IsMatch(s) 
    r.Matches(s) 
    r.Replace(s,replace_s)
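The sketch below creates one Regex object and reuses it for each instance method listed above, plus Split, which Regex also provides as an instance method. It uses top-level statements (C# 9 or later); the string text and the name digits are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern is parsed once, then the same object is reused
var digits = new Regex(@"\d+");
string text = "room 12, floor 3";

Console.WriteLine(digits.IsMatch(text));                 // True
Console.WriteLine(digits.Match(text).Value);             // 12 (first match only)
Console.WriteLine(digits.Matches(text).Count);           // 2
Console.WriteLine(digits.Replace(text, "#"));            // room #, floor #
Console.WriteLine(string.Join("|", digits.Split(text))); // room |, floor |
```

Split returns an empty last element here because the string ends with a match.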

    Properties of the Match class

    string s = "one two three four two five alice two";
    Match m = Regex.Match(s, "two"); 
     
    //1.  m.Success
    Console.WriteLine(m.Success); // True
    //2. m.Value
    Console.WriteLine(m.Value); //  two  
    //3. m.Index 
    Console.WriteLine(m.Index); //  4    
    //4. m.Length
    Console.WriteLine(m.Length); //  3  
    //5. m.NextMatch().Index
    Console.WriteLine(m.NextMatch().Index); //  19

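    MatchCollection class: its Count property holds the number of matches, and foreach goes through them one by one.

    foreach (Match m in Regex.Matches(s, "two"))
    {
        // here m is of type Match
        // (with var instead of Match, m is inferred as Match only on .NET Core 2.0 / .NET 5 and later)
    }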
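A small sketch that puts Count and foreach together on the string from the Match example (top-level statements, C# 9 or later). The indexes in the comments are for this string, which has no leading space.

```csharp
using System;
using System.Text.RegularExpressions;

string s = "one two three four two five alice two";
MatchCollection matches = Regex.Matches(s, "two");

Console.WriteLine(matches.Count); // 3

// Declaring the loop variable as Match works on every .NET version
foreach (Match m in matches)
{
    Console.WriteLine($"{m.Value} at {m.Index}"); // two at 4, two at 19, two at 34
}
```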

    Sample 1:

    //...
    // including the main namespace of RegularExpressions
    using System.Text.RegularExpressions;
    //...
    // just some string
    string s = " one two three four two five alice two";
    var m = Regex.Match(s, "two"); // pattern
     
    // properties of the first match and of the following matches:
    Console.WriteLine(m.Index); // output: 5
    Console.WriteLine(m.NextMatch().Index); // output: 20
    Console.WriteLine(m.NextMatch().NextMatch().Index); // output: 35

    Sample 2:

    // including the main namespace of RegularExpressions
    using System.Text.RegularExpressions;
    //...
    // just some string
    string s = " one two three four two five alice two";
    // Using a loop to iterate through text
    foreach (Match m in Regex.Matches(s, "two"))
    {
        Console.Write(m.Index + " "); // output: 5 20 35
    }

    Sample 3:

    // including the main namespace of RegularExpressions
    using System.Text.RegularExpressions;
    //...
    // just some string
    string s = " one two three four two five alice two";
     
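    // splitting with String.Split, empty entries removed
    var ss1 = s.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    // splitting with Regex.Split on one or more spaces
    var ss2 = Regex.Split(s, " +");
    Console.WriteLine(string.Join(",", ss1)); // one,two,three,four,two,five,alice,two
    Console.WriteLine(string.Join(",", ss2)); // ,one,two,three,four,two,five,alice,two (the leading space gives an empty first element)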

Examples of regular expressions

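Text to search                Pattern string
asdasdasdhelloasdasdasd       @"hello"
hello                         @"^hello$"
asdasdelholasdasdasd          @"[hello]"
asdasdeeeeeedfgdgdg           @"[hello]{6}"
SdfuiyuiewrR345               @"[a-zA-Z0-9]"  or  @"\w"
452341                        @"[0-9]"  or  @"\d"
jsdf8H?&                      @"."

Note: \w and \d are broader than the ranges next to them: \w also matches _ and letters of other alphabets, and \d matches any Unicode decimal digit.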

Example:

string s = "asdasdasdhelloasdasdasd";
Match m = Regex.Match(s, @"hello");
Console.WriteLine(m.Success); //  True 
m = Regex.Match(s, @"^hello");
Console.WriteLine(m.Success); //  False
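The same kind of check can be run for the other rows of the table. A minimal sketch (top-level statements, C# 9 or later) that uses the sample texts from the table:

```csharp
using System;
using System.Text.RegularExpressions;

// ^ and $ force the whole string to be exactly "hello"
Console.WriteLine(Regex.IsMatch("hello", @"^hello$"));   // True
Console.WriteLine(Regex.IsMatch("hello!", @"^hello$"));  // False

// [hello] is ONE character from the set h, e, l, o; {6} asks for six in a row
Console.WriteLine(Regex.Match("asdasdelholasdasdasd", @"[hello]").Value);   // e
Console.WriteLine(Regex.IsMatch("asdasdelholasdasdasd", @"[hello]{6}"));    // False (only 5 in a row)
Console.WriteLine(Regex.Match("asdasdeeeeeedfgdgdg", @"[hello]{6}").Value); // eeeeee

// \w, \d and . return the first word character, digit and character
Console.WriteLine(Regex.Match("SdfuiyuiewrR345", @"\w").Value); // S
Console.WriteLine(Regex.Match("452341", @"\d").Value);          // 4
Console.WriteLine(Regex.Match("jsdf8H?&", @".").Value);         // j
```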

Metacharacters and escaping

The following metacharacters have a special purpose in regular expressions:

(   )   {   }   [   ]   ?   *   +   -   ^   $   .   |   \

If you want one of these characters to be matched literally (for example, . as a period), you have to «escape» it by preceding it with a \. (The hyphen - is special only inside square brackets.)

Of course, \ is also an escape character in C# string literals. To get a literal \, you need to double it in your string literal (i.e. «\\» is a string of length one). Alternatively, C# has verbatim string literals, prefixed with @, in which escape sequences are not processed. Thus, the following two strings are equal:

"c:\\Docs\\Source\\a.txt"
@"c:\Docs\Source\a.txt"
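A short sketch of escaping in practice (top-level statements, C# 9 or later). It also uses Regex.Escape, a standard method of the Regex class that inserts the backslashes for you so that a literal text can be used as a pattern; the sample strings are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(Regex.IsMatch("abc", "a.c"));    // True: . matches any character
Console.WriteLine(Regex.IsMatch("abc", @"a\.c"));  // False: \. matches only a period
Console.WriteLine(Regex.IsMatch("a.c", "a\\.c"));  // True: the same pattern as @"a\.c"

// Regex.Escape turns a literal text into a pattern that matches exactly that text
string escaped = Regex.Escape("1+1=2");
Console.WriteLine(escaped);                             // 1\+1=2
Console.WriteLine(Regex.IsMatch("is 1+1=2?", escaped)); // True
```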

Character classes, quantifiers and anchors

    Characters
    Wildcard  Explanation                                                Example          Sample match
    \d        One digit from 0 to 9 (in .NET, any Unicode digit)         file_\d\d        file_25
    \w        «Word character»: Unicode letter, ideogram, digit,         \w-\w\w\w        A-b_1
              or connector such as _
    \s        «Whitespace character»: space, tab, line break,            a\sb\sc          a b c
              or any Unicode separator
    \D        One character that is not a digit                          \D\D\D           ABC
    \W        One character that is not a word character                 \W\W\W\W\W       *-+=)
    \S        One character that is not a whitespace character           \S\S\S\S         Yoyo
    .         Any character except a line break (\n)                     a.c              abc
    \.        A period (a special character, so it is escaped by \)      a\.c             a.c
    \         Escapes a special character                                \[\{\(\)\}\]     [{()}]

    Quantifiers
    Quantifier  Explanation            Example          Sample match
    +           One or more times      Version \w-\w+   Version A-b1_1
    {3}         Exactly three times    \D{3}            ABC
    {2,4}       Two to four times      \d{2,4}          156
    {3,}        Three or more times    \w{3,}           regex_tutorial
    *           Zero or more times     A*B*C*           AAACC
    ?           Once or none           plurals?         plural

    Other constructs
    Construct   Explanation                                            Example      Sample match
    [ ]         Any one character listed within the square brackets    [32]\d\d     323
    |           Alternation: the expression before | OR the one after  cat|dog      cat (in sdfcatsdf)

    Zero-length directives (anchors)
    ^     Beginning of the string (of each line with RegexOptions.Multiline)
    $     End of the string (of each line with RegexOptions.Multiline)
    \b    Word boundary: a position between a \w and a \W character, or between a \w character and the start or end of the string
    \B    Not a word boundary
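A minimal table-driven sketch (top-level statements, C# 9 or later) that runs pattern/sample pairs of the kind listed above and prints what each pattern matches. The array name rows is only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Each pair: a pattern and a sample text it should match
(string Pattern, string Text)[] rows =
{
    (@"file_\d\d", "file_25"),
    (@"\w-\w\w\w", "A-b_1"),
    (@"a\sb\sc", "a b c"),
    (@"\D\D\D", "ABC"),
    (@"\W\W\W\W\W", "*-+=)"),
    (@"\S\S\S\S", "Yoyo"),
    (@"a.c", "abc"),
    (@"a\.c", "a.c"),
    (@"\[\{\(\)\}\]", "[{()}]"),
    (@"Version \w-\w+", "Version A-b1_1"),
    (@"\D{3}", "ABC"),
    (@"\d{2,4}", "156"),
    (@"\w{3,}", "regex_tutorial"),
    (@"A*B*C*", "AAACC"),
    (@"plurals?", "plural"),
    (@"[32]\d\d", "323"),
    (@"cat|dog", "sdfcatsdf"),
};

// Print the pattern, the text, and the first match (cat|dog prints only "cat")
foreach (var (pattern, text) in rows)
{
    Console.WriteLine($"{pattern,-18} {text,-16} -> {Regex.Match(text, pattern).Value}");
}
```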

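    Replacements with the help of regular expressions

    string s = "10+2=12";
    // $0 in the replacement string stands for the whole match
    string r1 = Regex.Replace(s, @"\d+", "<$0>"); // <10>+<2>=<12>
    // a lambda (MatchEvaluator) computes the replacement for each match
    string r2 = Regex.Replace(s, @"\d+",
            m => (int.Parse(m.Value) * 2).ToString()); // 20+4=24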

     

    Examples

    Example 1:
    The following example matches words that start with ‘S’

    using System;
    using System.Text.RegularExpressions;
     
    namespace RegExApplication {
       class Program {
          private static void showMatch(string text, string expr) {
             Console.WriteLine("The Expression: " + expr);
             MatchCollection mc = Regex.Matches(text, expr);
     
             foreach (Match m in mc) {
                Console.WriteLine(m);
             }
          }
          static void Main(string[] args) {
             string str = "A Thousand Splendid Suns";
     
             Console.WriteLine("Matching words that start with 'S': ");
             showMatch(str, @"\bS\S*");
             Console.ReadKey();
          }
       }
    }

    Result:

      Matching words that start with 'S':
      The Expression: \bS\S*
      Splendid
      Suns
      

    Example 2:
    The following example matches words that start with ‘m’ and end with ‘e’

    using System;
    using System.Text.RegularExpressions;
     
    namespace RegExApplication {
       class Program {
          private static void showMatch(string text, string expr) {
             Console.WriteLine("The Expression: " + expr);
             MatchCollection mc = Regex.Matches(text, expr);
     
             foreach (Match m in mc) {
                Console.WriteLine(m);
             }
          }
          static void Main(string[] args) {
             string str = "make maze and manage to measure it";
     
             Console.WriteLine("Matching words start with 'm' and ends with 'e':");
             showMatch(str, @"\bm\S*e\b");
             Console.ReadKey();
          }
       }
    }

    Result:

      Matching words start with 'm' and ends with 'e':
      The Expression: \bm\S*e\b
      make
      maze
      manage
      measure
      

    Example 3:
    This example replaces extra white space

    using System;
    using System.Text.RegularExpressions;
     
    namespace RegExApplication {
       class Program {
          static void Main(string[] args) {
             string input = "Hello   World   ";
             string pattern = "\\s+";
             string replacement = " ";
     
             Regex rgx = new Regex(pattern);
             string result = rgx.Replace(input, replacement);
     
             Console.WriteLine("Original String: {0}", input);
             Console.WriteLine("Replacement String: {0}", result);    
             Console.ReadKey();
          }
       }
    }

    Result:

      Original String: Hello   World   
      Replacement String: Hello World
      

Labs and Tasks

Lab 1. Regex class and patterns
  
To do: Ask the user to input a phone number. Check whether the entered number is a Rostov phone number in federal format, that is, it has the format +7 (863) 3**-**-** or +7 (863) 2**-**-**, where * means any digit. Write a function that returns a Boolean value (true or false).

Note 1: The string must not contain any text except the phone number, so the regular expression must contain the ^ and $ anchors.

Note 2: Since the +, ( and ) symbols have a special meaning in regular expressions, they must be escaped, like this: \+.

Note 3: To check whether the program works properly, write three or four automatic tests. Use Debug.Assert to do it.  

Result example:

Tests are done well
Please input phone number:
+7 (863) 323-22-12
True
+++++++++++++++++
Tests are done well
Please input phone number:
7 (863) 323-22-12
False
+++++++++++++++++
Tests are done well
Please input phone number:
+7 (8634) 323-22-12
False

 
[Solution and Project name: Lesson_12Lab1, file name L12Lab1.cs]

✍ How to do:

  1. Create a new project with a name and file name as it is given in the task.
  2. At the top of the file, after the existing using directives, include the following namespace to use the regular expression classes:
  3. //...
    using System.Text.RegularExpressions;
    //...
    
  4. To write automatic tests, also include the System.Diagnostics namespace, which contains the Debug class. The using directives should now look like this:
  5. //...
    using System.Text.RegularExpressions;
    using System.Diagnostics;
    //...
    
  6. Let’s consider the phone number symbol by symbol:
    +7 (863) 3**-**-** or +7 (863) 2**-**-**

    • + : a metacharacter (the «one or more» quantifier), so to match a literal plus sign we escape it with \:
    • \+
      
    • 7 : an ordinary character, so it is written as is:
    • \+7
      
    • space : the character class \s matches one whitespace character:
    • \+7\s
      
    • ( : a metacharacter (it opens a group), so it must be escaped too:
    • \+7\s\(
      
    • 863 : ordinary characters, written as is:
    • \+7\s\(863
      
    • ) : escaped in the same way:
    • \+7\s\(863\)
      
    • space : one more \s:
    • \+7\s\(863\)\s
      
    • 3 or 2 : square brackets [ ] match any one of the characters listed inside them:
    • \+7\s\(863\)\s[32]
      
    • ** : \d matches one digit from 0 to 9, and the quantifier {2} repeats it exactly two times, which gives the two remaining digits of the first group:
    • \+7\s\(863\)\s[32]\d{2}
      
    • - : outside square brackets the hyphen is an ordinary character, so it is written as is:
    • \+7\s\(863\)\s[32]\d{2}-
      
    • ** : two more digits:
    • \+7\s\(863\)\s[32]\d{2}-\d{2}
      
    • - : another hyphen:
    • \+7\s\(863\)\s[32]\d{2}-\d{2}-
      
    • ** : the last two digits:
    • \+7\s\(863\)\s[32]\d{2}-\d{2}-\d{2}
      
  7. The task says that the input must not contain any text except the phone number. So the pattern has to begin with the ^ anchor and end with the $ anchor, which match the beginning and the end of the string:
  8. ^\+7\s\(863\)\s[32]\d{2}-\d{2}-\d{2}$
    
  9. Create a function named IsPhonenumber that takes one parameter (the input string) and returns a bool value (true or false):
  10. static bool IsPhonenumber(string number)
      {
        ...
      }
    
  11. Inside the function, use the static IsMatch method of the Regex class. It takes two parameters: the input string and the regular expression.
  12. The IsMatch method indicates whether the regular expression finds a match in the input string, and returns true or false:
    static bool IsPhonenumber(string stringNumber)
      {
        return Regex.IsMatch(stringNumber, @"^\+7\s\(863\)\s[32]\d{2}-\d{2}-\d{2}$");
      }
    
  13. Within the Main function, call the created IsPhonenumber method, but first do it inside automatic tests, using the Assert method of the Debug class.
    Debug.Assert(bool) checks a condition. If the condition is false, the assertion fails: the program stops and reports the failure with the call stack (in .NET Framework desktop applications this is shown in a message box). If the condition is true, nothing happens. Debug.Assert calls are executed only in the Debug build configuration.
  14. First, call the method with an incorrect phone number. To get true as the condition, use the logical negation operator !:
  15. Debug.Assert(!IsPhonenumber("+7 (800) 231-45-84"));
    
  16. Run the application. There is no output: IsPhonenumber returns false for this number, ! turns it into true, so the assertion passes silently.
  17. Next, call the method with a correct phone number:
  18. Debug.Assert(IsPhonenumber("+7 (863) 231-45-84"));
    
  19. Then call the method with an incorrect phone number once more:
  20. Debug.Assert(!IsPhonenumber("+7 (8631) 21-45-84"));
    
  21. After all the automatic tests, output a message saying that the tests passed:
  22. Console.WriteLine("Tests are done well");
    
  23. Run the application again and check the output. Only the line «Tests are done well» should appear; if an assertion fails, the pattern or the test is wrong.
  24. Finally, ask the user to enter a number and check whether it is correct:
  25. Console.WriteLine("Please input phone number:");
    string number = Console.ReadLine();
    Console.WriteLine(IsPhonenumber(number));
    
  26. Run the application again and check the output.
  27. Add comments with the text of the task and save the project. Upload the .cs file to the Moodle system.
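For reference, the steps above can be assembled into one file. This is a minimal sketch written with top-level statements (C# 9 or later, the default console template since .NET 6) instead of an explicit Main method. The pattern requires the first digit after (863) to be 3 or 2, as the task states; the fourth test is an extra illustrative case, not part of the original steps.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Automatic tests: Debug.Assert is evaluated only in the Debug configuration
Debug.Assert(!IsPhonenumber("+7 (800) 231-45-84"));  // wrong area code
Debug.Assert(IsPhonenumber("+7 (863) 231-45-84"));   // correct number starting with 2
Debug.Assert(!IsPhonenumber("+7 (8631) 21-45-84"));  // extra digit in the area code
Debug.Assert(!IsPhonenumber("+7 (863) 531-45-84"));  // first digit is neither 3 nor 2
Console.WriteLine("Tests are done well");

Console.WriteLine("Please input phone number:");
string number = Console.ReadLine() ?? "";  // ReadLine returns null at end of input
Console.WriteLine(IsPhonenumber(number));

// True if the whole string has the form +7 (863) 3**-**-** or +7 (863) 2**-**-**
static bool IsPhonenumber(string stringNumber)
{
    return Regex.IsMatch(stringNumber, @"^\+7\s\(863\)\s[32]\d{2}-\d{2}-\d{2}$");
}
```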

Task 1:

To do: Ask the user to input a date. Check whether the date has the format dd-mm-yyyy, where:

  • dd means the two digits of the day; a one-digit day starts with 0, e.g. 02;
  • mm means the two digits of the month, also starting with 0 if there is only one digit;
  • yyyy means the four digits of the year.
  
    Note: Create a function to check the input. To check whether the program works properly, write three or four automatic tests. Use Debug.Assert to do it.
         

    Expected output:

    Tests are done well
    Please input a date:
    12/03/1975
    The date format is incorrect
    +++++++++
    Tests are done well
    Please input a date:
    2-3-1975
    The date format is incorrect
    +++++++++
    Tests are done well
    Please input a date:
    12-03-1975
    The date format is correct
    +++++++++
    

    [Solution and Project name: Lesson_12Task1, file name L12Task1.cs]

    Lab 2. Counting matches with the Regex class
      
    To do: Create a function that determines how many zip codes there are in the specified string (a zip code consists of 6 digits in a row).

    Note 1: Create a method to make the calculations.

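    Note 2: Use the Regex class to count the matches (the Count property of the collection returned by Regex.Matches; .NET 7 and later also offer a Regex.Count method).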

    Note 3: To check to see if the program works properly you must make three or four automatic tests. Use Debug.Assert to do it.  

    Result example:

    Tests are done well
    For the string '123: zip code 367824 is norther than 123712' 
    we have 2 zip codes
    

     [Solution and Project name: Lesson_12Lab2, file name L12Lab2.cs]

    ✍ How to do:

    1. Create a new project with a name and file name as it is given in the task.
    2. At the top of the file, after the existing using directives, include the following namespace to use the regular expression classes:
    3. //...
      using System.Text.RegularExpressions;
      //...
      
    4. To write automatic tests, also include the System.Diagnostics namespace, which contains the Debug class. The using directives should now look like this:
    5. //...
      using System.Text.RegularExpressions;
      using System.Diagnostics;
      //...
      
    6. Create a function called CountZip with one argument — the inputted string. The function has to return an integer value — the number of all occurrences:
    7.  
      static int CountZip(string zip)
              {
                  ...
              }
      
    8. Now we’re going to create a pattern.
      • First, put word boundaries \b at the start and at the end of the pattern, so that the digits cannot be part of a longer number or word:
      • @"\b...\b"
        
      • A zip code has 6 digits in a row. \d matches any digit, and {6} means exactly 6 of them:
      • @"\b\d{6}\b"
        
      • So what we have in our pattern:
      • \b	begin the match at a word boundary
        \d{6}   any digit, 6 of them in a row
        \b	end the match at a word boundary
        
    9. Use the Matches method to find every place where the pattern matches the string.
    10. Matches(String) method searches the specified input string for all occurrences of a regular expression. It returns a collection of the Match objects found by the search. If no matches are found, the method returns an empty collection object.
    11. Place the following code inside the created method:
    12.  
      var m = Regex.Matches(zip, @"\b\d{6}\b");
      return m.Count;
      
    13. Within the Main function, call the created CountZip method, but first do it inside automatic tests, using the Assert method of the Debug class.
      Debug.Assert(bool) checks a condition. If the condition is false, the assertion fails: the program stops and reports the failure with the call stack (in .NET Framework desktop applications this is shown in a message box). If the condition is true, nothing happens. Debug.Assert calls are executed only in the Debug build configuration.
    14. First, let’s call the method with a string with two zip codes. To have true as a result we’ll need to check to see if it is equal to 2:
    15. Debug.Assert(CountZip("344113 34116 15 152566  14254124    12515 hello") == 2);
      
    16. Run the application. There is no output, which means that the test passed.
    17. Next, call the method with a string that contains no zip code:
    18. Debug.Assert(CountZip("hello") == 0);
    19. That’s enough. Let’s output the message that the tests are done:
    20. Console.WriteLine("Tests are done well");
    21. Run the application again and check the output. Only the line «Tests are done well» should appear.
    22. Finally, output the number of zip codes in a particular string. Declare a variable and assign the string to it:
    23. string zipCode = "123: zip code 367824 is norther than 123712";
      Console.WriteLine($"For the string '{zipCode}' we have {CountZip(zipCode)} zip codes");
    24. Run the application again and check the output.
    25. Add comments with the text of the task and save the project. Upload the .cs file to the Moodle system.
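The steps above can be assembled into one file. A minimal sketch using top-level statements (C# 9 or later) instead of an explicit Main method; the third test ("1234567") is an extra illustrative case showing that the word boundaries reject longer numbers.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Automatic tests (evaluated only in the Debug configuration)
Debug.Assert(CountZip("344113 34116 15 152566  14254124    12515 hello") == 2);
Debug.Assert(CountZip("hello") == 0);
Debug.Assert(CountZip("1234567") == 0);  // seven digits in a row are not a zip code
Console.WriteLine("Tests are done well");

string zipCode = "123: zip code 367824 is norther than 123712";
Console.WriteLine($"For the string '{zipCode}' we have {CountZip(zipCode)} zip codes");

// Counts six-digit numbers that stand alone, i.e. are surrounded by word boundaries
static int CountZip(string zip)
{
    return Regex.Matches(zip, @"\b\d{6}\b").Count;
}
```

On .NET 7 or later, Regex.Count(zip, @"\b\d{6}\b") returns the same number without building a MatchCollection.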
    Task 2:

    To do: Create a function that calculates how many emoticons there are in the specified string.
    The emoticons can consist of the following characters:

  • the first character is either ; (semicolon) or : (colon) exactly once;
  • then the - (minus) symbol, repeated any number of times (zero times is allowed too);
  • at the end, one or more identical brackets from the set (, ), [, ];
  • no other characters can appear inside the emoticon.
  
    Note 1: Create a method to make the calculations.

    Note 2: Use the Regex class to count the matches (the Count property of the collection returned by Regex.Matches; .NET 7 and later also offer a Regex.Count method).

    Note 3: To check to see if the program works properly you must make three or four automatic tests. Use Debug.Assert to do it.  

    Result example:

    Tests are done well
    For the string 'Hello, daddy :) I miss you :-(' 
    we have 2 emoticons
    

    [Solution and Project name: Lesson_12Task2, file name L12Task2.cs]

    Lab 3. Replace method of Regex class
      
    To do: Create a function that deletes extra white spaces from the specified string (there can be two, three, or any number of white spaces in a row). The Replace method must be used.

    Note 1: Create a method with three arguments to make the replacement: the original string, the pattern string, and the replacement string. The replacement string has to be equal to " " (a single white space is placed instead of several white spaces in a row).

    Note 2: The Replace method of Regex class must be used.

    Note 3: To check to see if the program works properly you must make three or four automatic tests. Use Debug.Assert to do it.  

    Result example:

    Tests are done well
    Original String: 'Hello   World   '
    Replacement String: 'Hello World '
    

     
    [Solution and Project name: Lesson_12Lab3, file name L12Lab3.cs]

    ✍ How to do:

    1. Create a new project with a name and file name as it is given in the task.
    2. At the top of the file, after the existing using directives, include the following namespace to use the regular expression classes:
    3. //...
      using System.Text.RegularExpressions;
      //...
    4. To write automatic tests, also include the System.Diagnostics namespace, which contains the Debug class. The using directives should now look like this:
    5. //...
      using System.Text.RegularExpressions;
      using System.Diagnostics;
      //...
    6. Create a function called ReplaceSpaces with three arguments — the inputted string, the pattern, and the string to replace the pattern. The function has to return a string value — the replacement (resulting) string:
    7. static string ReplaceSpaces(string input, string pattern, string replacement)
              {
                  ...
              }
    8. Now we’re going to create a pattern.
      \s is the character class for one whitespace character. Since there can be many whitespace characters in a row, we add the quantifier +, which means one or more occurrences. The pattern is:
    9. @"\s+"
      
    10. Within the Main function assign the created pattern to a variable called pattern:
    11. string pattern = @"\s+";
    12. After, declare one more variable to store the replacement string, the extra spaces in a row later will be replaced by " " — single white space:
    13. string replacement = " ";
    14. Use the Replace method of the Regex class to do the task.
    15. Replace(string input, string replacement): In a specified input string, replaces all strings that match a regular expression pattern with a specified replacement string.
    16. Place the following code inside the created method:
    17. Regex rgx = new Regex(pattern);
      string result = rgx.Replace(input, replacement);
      return result;
    18. Within the Main function, call the created ReplaceSpaces method, but first do it inside an automatic test, using the Assert method of the Debug class.
      Debug.Assert(bool) checks a condition. If the condition is false, the assertion fails: the program stops and reports the failure with the call stack (in .NET Framework desktop applications this is shown in a message box). If the condition is true, nothing happens. Debug.Assert calls are executed only in the Debug build configuration.
    19. Let’s call the method providing it a string with extra white space. To have true as a result we’ll need to place the following code as a condition of the Assert method (you must do it inside the Main function):
    20. Debug.Assert(ReplaceSpaces(" Good day  !",pattern, replacement) == " Good day !");
      Console.WriteLine("Tests are done well");
    21. Run the application. There is only the label «Tests are done well» on the console. It means that the function works properly.
    22. And at last, we need to replace extra white space within the particular string. So we’ll declare a variable and assign that string to it:
    23. string input = "Hello   World   ";
      Console.WriteLine($"Original String: '{input}'");
      Console.WriteLine($"Replacement String: '{ReplaceSpaces(input, pattern, replacement)}'");
      Console.ReadKey(); // keeps the console window open until a key is pressed
    24. Run the application again and check the output.
    25. Add comments with the text of the task and save the project. Upload the .cs file to the Moodle system.
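The steps above can be assembled into one file. A minimal sketch using top-level statements (C# 9 or later) instead of an explicit Main method. The top-level variables are named spacePattern, singleSpace and text (illustrative names) so that they do not clash with the parameters of ReplaceSpaces; the last two tests are extra illustrative cases.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

string spacePattern = @"\s+";  // one or more whitespace characters in a row
string singleSpace = " ";      // each such run is replaced by one space

// Automatic tests (evaluated only in the Debug configuration)
Debug.Assert(ReplaceSpaces(" Good day  !", spacePattern, singleSpace) == " Good day !");
Debug.Assert(ReplaceSpaces("no_spaces", spacePattern, singleSpace) == "no_spaces");
Debug.Assert(ReplaceSpaces("a\t\t b", spacePattern, singleSpace) == "a b"); // tabs are whitespace too
Console.WriteLine("Tests are done well");

string text = "Hello   World   ";
Console.WriteLine($"Original String: '{text}'");
Console.WriteLine($"Replacement String: '{ReplaceSpaces(text, spacePattern, singleSpace)}'");

// Replaces every match of pattern in input with replacement
static string ReplaceSpaces(string input, string pattern, string replacement)
{
    Regex rgx = new Regex(pattern);
    return rgx.Replace(input, replacement);
}
```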
    Task 3:

    To do: Check whether the value of a string variable contains text framed by single asterisks (*like this*). Replace each such fragment with the same text wrapped in the <em></em> tag. Do not change text framed by double asterisks.

    Note: Create a function to make these replacements (with a signature: static void ConvertText(ref string s)). To check to see if the program works properly you must make three or four automatic tests. Use Debug.Assert to do it.
         

    Expected output:

    Tests are done well
    input:
    *this is italic*
    output:
    <em>this is italic</em>
    +++++++++
    input:
    **bold text (not italic)**
    output:
    **bold text (not italic)**
    

    [Solution and Project name: Lesson_12Task3, file name L12Task3.cs]

    Task 4:

    To do: A string with a value is given. Find all IPv4 addresses (in decimal notation with dots as separators) and store them in a new string variable. Print out the value of this variable.

    Note 1: Here an IPv4 address in decimal notation with dots as separators has the format xxx.255.255.255: the first part must be a three-digit number (from 100 to 255), and each of the other parts must be a number from 1 to 255.

    Note 2: Create a function to make a search (with a signature: static string FindAddresses(string s)).

    Note 3: To get a readable output, don't forget to use the escape sequence \n for a new line.      

    Expected output:

    for text 444.34.56.78 125.34.56.78  125.34.56.78   12.34.56.78 words 255.133.255.133 255.1333.255.133
    addresses are: 
    125.34.56.78
    125.34.56.78
    255.133.255.133
    

    [Solution and Project name: Lesson_12Task4, file name L12Task4.cs]

    Task 5:

    To do: Determine whether the string is a domain name with the http or https protocol, with an optional slash (/) at the end.

    Note: Create a function which returns a boolean type to determine it.

    Expected output:

    for text http://example.com/ result is true
    for text http:/example.com/ result is false 
    for text http//example.com/ result is false
    for text https://example.com/ result is true
    for text https://example.ru result is true
    for text http://exampleru/ result is false
    

    [Solution and Project name: Lesson_12Task5, file name L12Task5.cs]