Can someone help me understand this code please?

asked Apr 6, 2022 by Etude Udem (140 points)

I am new to progamming and there is a lot of variables which I cannot follow. Thank you!

Given two sequences X and Y , their shuffle product, denoted XY , is the set of words obtained in interleaving the letters of X and Y in all possible ways. For example, if X = ACA and Y = GT then, X Y = {GTACA, GATCA, GACTA, GACAT, AGTCA, AGCTA, AGCAT, ACGTA, ACGAT, ACAGT}

Give the pseudo-code of a backtracking algorithm which returns these X and Y sequences.

Thank you!

Here is the code below.

#include <iostream>

#include<vector>

using namespace std;

vector<int> ans;

// Check whether the matching subsequence we have obtained is actually the result of two strings

bool check(vector<int> &v1, string &t, string &s) {

    string x, y;

    for (int i = 0, j = 0; i < s.size(); ++i) {

        if (j < v1.size() && v1[j] == i) {

            ++j;

            x += s[i];

        } else {

            y += s[i];

        }

    }

    int k, j;

    for ( j = 0, k = 0; j < t.size(); ++j) {

        if (y[k] == t[j])

            ++k;

    }

    if (k == y.size())

        return true;

    return false;

}

bool lcs(string &s, string &t, vector<int> arr, int idx1, int idx2) {

// when we have reached the end of either of the indice

    if (idx1 == s.size() || idx2 == t.size()) {

    // call check function that checks if the indices we have stored are correct

        if (check(arr, t, s)) {

        // if this is true then this array of indices is our answer

            ans = arr;

            return true;

        }

        return false;

    }

    // if there is a match then push this index into the array and increment both indices

    if (s[idx1] == t[idx2]) {

        arr.push_back(idx1);

        return lcs(s, t, arr, idx1 + 1, idx2 + 1);

    }

    // else increment both indices and return the OR value

    return lcs(s, t, arr, idx1 + 1, idx2) || lcs(s, t, arr, idx1, idx2 + 1);

}

int main() {

    string s, t;     // We create the strings S and T and we ask for an input on the next line

    cin >> s >> t;   // Input of S and T from the terminal





    // If not lcs

    if(!lcs(s, t, {}, 0, 0)) {

    cout << "Not possible to separate into two\n";

    return 0;

    }

    string x, y;

    // Finally reconstruct the two strings from the indices

    for (int i = 0, j = 0; i < s.size(); ++i) {

        if (j < ans.size() && ans[j] == i) {

            ++j;

            x += s[i];

        } else {

            y += s[i];

        }

    }

    cout << x << '\n' << y << '\n';

}

2 Answers

answered Apr 11, 2022 by Peter Minarik (86,040 points)

Problems with the question

First of all, the original writer of the code could best explain what the code is doing. Maybe s/he is not available to ask?

I've spent now quite the time trying to understand this code. It's indeed not simple and I cannot claim I have a 100% correct understanding.

Let me start by stating that the code does not quite do what the problem asked for "Give the pseudo-code of a backtracking algorithm which returns these X and Y sequences."

First of all, the code you shared is not a pseudo-code, but an actual C++ code.
The code you shared only returns one of the possible X, Y sequences, not all possibilities.

Also, the original description was faulty in a way that it did not specify the input and output formats.

To be continued in another post as I've exceeded the 8000 character limit...

answered Apr 11, 2022 by Peter Minarik (86,040 points)

... continued from the other post of mine as we exceeded the 8000 characters limit.

The code

I've got a few notes here too.

Though the program has comments that (somewhat) help, not everything is commented properly (e.g. what exactly is lcs -- longest common sequence? --, what does it do). Other times trivial things are commented (e.g. string s, t;) such comments are not helpful. :)

Variable names x, y, i, j, k, v1, etc are again poor choices as their names convey not much meaning. Most of the time it is more important of what we store in a variable than references to the type of the variable (as most IDEs can help with the latter).

What does it do?

Enough of those complaints. Let's see what the code does:

There are 3 functions here:

main(): to drive the program, gets the user input and prints the output
lcs(): It seems to be a longest common sequence implementation, i.e., finding character sequences in order (but may potentially skip characters) that are in the same order in both input strings
check(): it is meant to verify that the found solution for the problem is a product of the two input strings of this method.

check()

Parameters:

vector<int> &v1: a proposed solution to check
string &t: one of the strings build from the solution
string &s: the other string build from the solution

What it does:

Checks if the two input strings are fully used up if one takes the indices from the proposed solution vector.

In my opinion, this is an overcomplicated function. All it should do is just check if both of the input strings (s, t) are fully processed, i.e., the relevant indices reached the end of both of the strings.

check() iterates through the string s via index i and separates s into two other strings x and y, depending on if the vector v1's next element (v1[j]) is matching the value of index i.

Then the function iterates through the string t via index j, and steps over every element in y if it is contained by t in the same order.

In the end, if the index k (denoting the last march index of y[k] == t[j]) is the same as the length of the string y, then every character is processed, we have a valid solution in the input vector v1.

lcs()

Parameters:

string &s: an input string of a possible combination of X and Y yielding one specific X Y
string &t: another input string of a possible combination of X and Y yielding one(other) specific X Y
vector<int> arr: the solution vector
int idx1: the last processed index in s
int idx2: the last processed index in t

What it does:

Tries to create a solution vector (arr) that lists the indices where s[idx1] and t[idx2] match.

Being a backtrack function, it should have a step that stops the recursion. And indeed, the first condition serves this purpose: if either input strings are at its end it calls check() to verify the validity of the proposed solution vector. If the proposed solution vector (arr) checks out good, the solution is set (ans = arr;) and the function returns true. Otherwise, false is returned (without setting the solution).

And now, the recursive part comes. If the next elements in the input strings match (s[idx1] == t[idx2]) the last checked index of string s is added to the proposed solution vector (arr.push_back(idx1);) and we call lcs() again by both indices incremented and return whatever value this provides.

If the above condition failed and did not enter the body of the if, then we return with calling lcs() first by incrementing the left index, then, if the left path failed, by incrementing the right index.

The above recursive logic traverses the entirety of both s and t strings and produces a proposed solution vector arr.

main()

It reads two input strings (t, s), that are both supposed to be a combination of X and Y, i.e., a valid X Y.

lcs() is called to try to find a valid X and Y input strings that resulted in two X Y: t and s. If it is not possible, a message is printed and the program exits (return 0;).

If an lcs() can be acquired, then the X and Y original strings (as x and y) are reconstructed and printed to the user.

Final words

I know this has been a lot. Read it a few times, and let it sink in. I hope in the end you get closer to understanding what the code is supposed to do.

Good luck!