Regular Expression Matching

To solve this coding challenge, we need to implement a regular expression matching algorithm that supports two special characters:
  • ‘.’ which matches any single character.
  • ‘*’ which matches zero or more of the preceding element.
The goal is to determine if the input string
s
matches the pattern
p
entirely, rather than partially. This algorithm can be efficiently solved using dynamic programming.

Explanation

We will utilize a 2D boolean array (
dp
), where
dp[i][j]
indicates whether the substring
s[0...i-1]
matches the pattern
p[0...j-1]
. Here, we use bottom-up dynamic programming approach.
  1. Initialize the
    dp
    array:
    • dp[0][0]
      is
      True
      because an empty string matches an empty pattern.
    • For patterns where `j` is greater than `0` and the pattern contains `
      , we need to use the property of
      ` to match the preceding character zero times, thus setting values for `dp[0][j]`.
  2. Filling up the
    dp
    array:
    • Iterate through each character in the string
      s
      and the pattern
      p
      .
    • For each character:
      • If the current characters in both string and pattern match (
        s[i-1] == p[j-1]
        or
        p[j-1] == '.'
        ), set
        dp[i][j]
        to
        dp[i-1][j-1]
        .
      • If the current character in the pattern is
        '*'
        , we check two conditions:
        • If the pattern without this
          '*'
          and its preceding character matches (i.e.,
          dp[i][j-2]
          ).
        • If the preceding character of
          '*'
          in pattern matches the current character in the string or the preceding character itself is a ‘.’ and the substring
          s[0...i-1]
          excluding this character matches the pattern
          p[0...j]
          (i.e.,
          dp[i-1][j]
          ).
  3. Final check:
    • Return
      dp[len(s)][len(p)]
      which shows whether the entire string matches the pattern.
Here is the detailed pseudocode along with comments:
                                            
# Initialize a 2D array dp where dp[i][j] will be True if s[0...i-1] matches p[0...j-1]
Initialize dp array of size (len(s) + 1) x (len(p) + 1) to False

# An empty string matches an empty pattern
dp[0][0] = True

# Handle patterns with '*' that can match zero elements
for j from 1 to length of p:
    if p[j - 1] == '*':
        dp[0][j] = dp[0][j - 2]

# Iterate over each character of s and p
for i from 1 to length of s:
    for j from 1 to length of p:
        # When characters match or when there is a '.'
        if p[j - 1] == s[i - 1] or p[j - 1] == '.':
            dp[i][j] = dp[i - 1][j - 1]
        # When there is a '*' character in pattern
        elif p[j - 1] == '*':
            # Case when '*' matches zero preceding characters
            dp[i][j] = dp[i][j - 2]
            # Case when '*' matches one or more preceding characters
            if p[j - 2] == s[i - 1] or p[j - 2] == '.':
                dp[i][j] = dp[i][j] or dp[i - 1][j]

# Return whether the entire string matches the pattern
return dp[len(s)][len(p)]

                                        

Step-by-Step Explanation/Detailed Steps in Pseudocode

Step 1: Initialize the dp array
  • Create a 2D array
    dp
    of size
    (len(s) + 1) x (len(p) + 1)
    initialized to
    False
    .
  • Set
    dp[0][0]
    to
    True
    because an empty string matches an empty pattern.
Step 2: Handle patterns with '*'
  • Loop through the pattern
    p
    from index
    1
    to
    length of p
    :
    • If the current character in the pattern is
      *
      , it can either match zero elements of the preceding character. This means for
      dp[0][j]
      , check
      dp[0][j - 2]
      .
Step 3: Iterate through each character of s and p to fill dp array
  • Loop through the string
    s
    from index
    1
    to
    length of s
    .
    • Loop through the pattern
      p
      from index
      1
      to
      length of p
      .
      • If the character at
        s[i-1]
        matches
        p[j-1]
        or
        p[j-1]
        is
        .
        :
        • Update
          dp[i][j]
          as
          dp[i-1][j-1]
          .
      • If the character at
        p[j-1]
        is
        *
        :
        • Initially, assume
          *
          matches zero characters, i.e.,
          dp[i][j] = dp[i][j-2]
          .
        • Check if the preceding character in
          p
          matches the current character in
          s
          or if it is a
          .
          , then update
          dp[i][j]
          to
          dp[i][j]
          or
          dp[i-1][j]
          .
Step 4: Return final result
  • Return the value
    dp[len(s)][len(p)]
    , indicating whether the entire string
    s
    matches the pattern
    p
    .
By following these detailed steps and using the pseudocode, we can implement the regular expression matching algorithm efficiently.