Substring With Concatenation Of All Words

To solve this coding challenge, we need to identify all starting indices in string
s
from which a concatenated substring of
words
begins. The concatenated substring must include every string in
words
exactly once and in any order.

Explanation:

To tackle this problem, we need to employ a sliding window technique combined with hash maps to efficiently check for the required concatenation of substrings.
Steps to Solve:
  1. Input Checks and Early Exit :
    • First, check if the input string
      s
      or the list of words
      words
      is empty. If so, immediately return an empty list as there can be no valid substrings.
  2. Initial Setup :
    • Calculate the length of an individual word (
      word_len
      ).
    • Find the total count of words (
      words_count
      ).
    • Calculate the length of the entire concatenated substring (
      concat_len
      ), which is simply
      word_len
      multiplied by
      words_count
      .
  3. Frequency Dictionary for Words :
    • Build a frequency dictionary
      word_freq
      that stores how many times each word appears in the list
      words
      .
  4. Sliding Window :
    • Iterate through the string
      s
      from index
      0
      to
      (len(s) - concat_len + 1)
      . This ensures that we only consider starting points of substring windows within bounds.
    • For each starting point, initialize an empty dictionary
      seen
      to keep track of the words seen in the current window.
  5. Check Words in Window :
    • For each starting point, check fixed-length segments of the string
      s
      that match the length of words in
      words
      .
    • Update the
      seen
      dictionary with the counts of each segment.
    • If a segment isn't part of the words list or if any word appears more times than it should, break out of the loop early.
  6. Validity Check :
    • If the segment perfectly matches the required frequency of words, record the start index in the result list.
  7. Return the Result :
    • After we have checked all possible starting points, return the list of starting indices.
    Pseudocode:
                                                
    # Check if input string or words list is empty
    if s is empty or words is empty:
    return empty list
    
    # Initialize required variables
    word_len = length of the first word in words
    words_count = number of words in words
    concat_len = word_len * words_count
    
    # Build frequency dictionary for words
    word_freq = empty dictionary
    for each word in words:
    if word is in word_freq:
    increment word_freq[word] by 1
    else:
    set word_freq[word] to 1
    
    result = empty list
    
    # Iterate through each possible starting index in s
    for i from 0 to (length of s - concat_len + 1):
    seen = empty dictionary
    for j from 0 to words_count - 1:
    word_start = i + j * word_len
    word = substring of s from word_start to word_start + word_len
    
    # Check if word is in word_freq
    if word is in word_freq:
    if word is in seen:
    increment seen[word] by 1
    else:
    set seen[word] to 1
    
    # If seen word count exceeds expected count in word_freq, break
    if seen[word] > word_freq[word]:
    break
    else:
    break
    
    # If we have checked all words and counts match, add start index to result
    if j + 1 == words_count:
    append i to result
    
    # Return the result list of starting indices
    return result
    
                                            
    Detailed Steps in Pseudocode:
  8. Start by checking if the input string
    s
    and
    words
    list are empty. Return an empty list if true.
  9. Calculate the length of the words and the total expected length of the concatenated substring (
    concat_len
    ).
  10. Build a frequency dictionary to count occurrences of each word in
    words
    .
  11. Use a for loop to iterate through the string
    s
    from the start to
    (length of s - concat_len + 1)
    .
  12. For each starting index
    i
    , initialize an empty
    seen
    dictionary.
  13. Use an inner for loop to check fixed-length segments from the current starting index
    i
    .
  14. Extract the segment and check if it exists in the frequency dictionary.
  15. Update the
    seen
    dictionary and ensure the word count doesn't exceed what's expected.
  16. If all words match perfectly in the current window, record the starting index.
  17. Finally, return the resultant list of valid starting indices.
By following these detailed steps within the pseudocode, we ensure the solution is efficient, understandable, and accurate.