Algorithms and Data Structures 2024/2025 (L.EIC011) - DCC/FCUP & DEI/FEUP

Practical Class #02 - Complexity and Asymptotic Analysis + Mooshak
(30/09 to 04/10)

Exercises for submission

In what concerns the continuous evaluation solving exercises during the semester, the exercises you can submit in this class are:

[AED000] This is not a problem! (weight 20%) (main exercises)
[AED001] Prime Numbers (weight 40%) (main exercises)
[AED002] Sequence Game (weight 40%) (main exercises)
[AED003] Saving Pipefish (weight 40%) (extra exercises)
[AED004] Number Spiral (weight 40%) (challenge exercises)

Submission Deadline: October 19th (submit to AED's Mooshak)

You are encouraged to talk to the professor and to your colleagues if you have difficulties. However, any more direct help you have received from other colleagues should be acknowledged in the comments of the code you submit.
After the deadline the problems will still be available on Mooshak, but the submissions will not count towards your grade.
Each class is worth 10 per cent of the grade for this component. As there will be 11 classes with submissions, you can get full marks even if you haven't done everything.
For a problem to count, you have to get all the tests right (i.e. accepted). Even if you solve all the problems, the maximum in a single class is 100 per cent.
it is guaranted that the main exercises on each class will sum to at least 100% of the class grade.

Lectures Content

This class is about complexity and asymptotic analysis. It is therefore convenient that you know what was discussed on the lectures:

Slides: 2 - Complexity and Asymptotic Analysis

Recall that:

\( f(n) \in O(g(n)) \) iff there exist constants \( c \in \mathbb R^+ \) and \(n_0 \in \mathbb Z^+ \) such that \( f(n) \leq cg(n) \), for all \(n \geq n_0 \).
\( f(n) \in \Omega(g(n)) \) iff there exist constants \( c \in \mathbb R^+ \) and \( n_0 \in \mathbb Z^+ \) such that \( f(n) \geq cg(n) \), for all \(n \geq n_0 \).
\( f(n) \in \Theta(g(n)) \) iff there exist constants \( c_1,c_2 \in \mathbb R^+ \) and \( n_0 \in \mathbb Z^+ \) such that \( c_1 g(n) \leq f(n) \leq c_2 g(n) \), for all \(n \geq n_0 \).

1. Explain why each statement is true, using the definition of the classes \( O(\cdot) \), \( \Omega(\cdot) \) and \( \Theta(\cdot) \).

\(3n^2+100 \in \Theta(n^2)\)
\(100n+3n\log_2 n \in \Theta(n\log_2 n)\)
\(f(n) \in O(f(n))\), \(\; f(n) \in \Omega(f(n))\), and \(\; f(n) \in \Theta(f(n))\), for all functions \(f\).
\(\frac{3}{5}n^2-\frac{1}{2} \in \Omega(n^2) \cap O(n^2)\)
\(O(an) = O(bn)\), for all constants \(a,b\in\mathbb R^+\)
\(O(n) \subseteq O(n^p)\), for all \(p\in \mathbb Z ^+\)
\(\Theta(\log_2 n) = \Theta(\log_2(n^p))\), for all \(p\in \mathbb Z^+\)

2. Justify the truthfulness or falsity of each statement.

\(50n^3-2n+100 \in \Theta(\frac{1}{100}n^3) \)
\(3n^2-n+10 \in \Omega(20n^2)\)
\(\log_2 n \in \Omega(1000+\log_2 n)\)
\(n \notin \Omega(2^{10}\log_2n)\) because \(n<2^{10}\log_{2} (n)\), for all \(1 < n \leq 2^{13}\) (actually for \(1 < n \leq 14115\))

3. Characterize the asymptotic time complexity of the function remove, where the array x has at least \(n\) elements. Describe what characterizes best case and worst case instances.

int remove(int x[], int n, int v) {
    int t = 0;
    for (int j=0; j<n; j++) 
       if (x[j] <= v) 
           x[t++] = x[j];
    return t;
}

4. Characterize the asymptotic time complexity of the function symmetric, where x is a square matrix defined with \( \mathtt N \) columns at most, and \(1 \leq \mathtt n\leq \mathtt N\). Describe best case and worst case instances.

bool symmetric(int x[][N], int n) {
    for (int i=1; i < n; i++) 
        for(int j = 0; j < i; j++)
           if (x[i][j] != x[j][i]) return false;
    return true;
}

5. Characterize the asymptotic time complexity of the function product, where x, y and z square matrices (all of the same dimension \(n\)), defined with \(\mathtt N\) columns at most, and \( 1 \leq \mathtt n \leq \mathtt N\). Describe best case and worst case instances.

void product(int x[][N], int y[][N], int z[][N], int n) {
    for (int i=0; i < n; i++) 
        for(int j = 0; j < n; j++) {
           z[i][j] = 0;
           for(int k = 0; k < n; k++) 
              z[i][j] += x[i][k]*y[k][j];
        }
}

6. Characterize the asymptotic time complexity of my_3SUM, as a function of the size of vector x. Describe best case and worst case instances.

#include <iostream>
#include <vector>

using namespace std;

bool my_3SUM(const vector<int> & x) {
    int n = x.size();
    for (int i=0; i < n-2; i++) 
        for(int j = i+1; j < n-1; j++)
           for(int k = j+1; k < n; k++) 
              if (x[i]+x[j]+x[k] == 0) {
                  cout << i << ' ' << j << ' ' << k << endl;
                  return true;
              }
    return false;
}

7. Assume that v and w are vectors and both contain a sequence of nonnegative integers that ends with 0 (and has no other zeros). Let \(a_1, a_2,\ldots, a_m,0\) and \(b_1,b_2,\ldots, b_n,0\) be such sequences, with \(m\geq 1\) and \(n\geq 1\). The function returns an integer which is either positive, negative or 0, if the sequence given by v is lexicographically greater than, less than or equal to the sequence given by w, respectively. Recall that all \(a_i\)'s and \(b_i\)'s are positive integers.

int lex_comp(const vector<int> &v, const vector<int> &w) {
    int j = 0;
    while (w[j] == v[j] && v[j] != 0) 
        j++;
    return v[j]-w[j];
}

Characterize the asymptotic time complexity of lex_comp, as a function of \(m\) and \(n\) (you can use \(\min(m,n)\) or \(\max(m,n)\) in the expression, if you need it). Start by describing instances that lead to the best case and the worst case.
(*) Write a loop invariant that allows us to conclude that the function returns a correct value for all instances. Give a brief explanation.

8. Recall the problem \(\mathtt{Partition}\) which, given a set \(S=\{a_1,a_2,\ldots, a_n\}\) of \(n\) positive integers, asks whether there is a set \(A\subset S\) such that \(\sum_{x\in A} x = \sum_{x \in S\setminus A} x\). The following function solves the problem when we call search(x,0,n,0,0,res,nres), with nres=0, and x and res being arrays with at least \(n\geq 2\) elements. In addition, it returns a set \(A\) in res. It implements a brute force search algorithm.

bool search(int x[],int i, int n, int s, int r, int res[], int & nres) {
    if (i == n) return s == r;
    s += x[i];  res[nres++] = i;
    if (search(x,i+1,n,s,r,res,nres)) return true;
    s -= x[i];   nres--;   r += x[i];
    return search(x,i+1,n,s,r,res,nres);
}

Explain why the time complexity is \(\Theta(2^{n})\) in the worst case and \(\Theta(n)\) in the best case. Describe best case and worst case instances.
Write an improved (still exponential time) version search2 in which \(A=\{\mathtt{res}[i]\;|\; 0\leq i < \mathtt{nres}\}\) and \(\texttt{s} = \sum_{i\in A} \tt x[i]\), as above, but \(\texttt{r} =\sum_{i\in S\setminus A} \tt x[i]\), in each (recursive) call. When we call search2 for the first time, \(\texttt{s}\) must be 0 and \(\texttt{r}\) must be \(\sum_{i=0}^{n-1} \mathtt x[i]\). If \(\mathtt s > \mathtt r\), at some call, then search2 returns false in that call, because it is not possible to complete the partial solution. This is the improvement, as it can backtrack before \(i=n\) to explore another branch. If \(\mathtt s = \mathtt r\), at some call, then res defines a solution.

9. My first Mooshak submissions

On this course we will use Mooshak to automatically evaluate your C++ code (see Mooshak's help).

For all your submissions you can visualize the detailed result obtained in each of the test cases made. It is also possible to access any public tests for which your program did not receive the Accepted veredict. For seeing all this detailed feedback you only need to click the result of your submission that appears underlined in the listing.
To access the system you can use your login (with the word "up") and password that was sent to your institutional email (message sent on the 30th of September with the title "[AED] Mooshak password - login")
(if you have trouble accessing it, send a message on discord to @PedroRibeiro)
We will not enforce any editor on this course. We do however advise you to use an editor that is available on your computer labs, so that you get acquainted to the environment you will have available on the practice test days. We suggest you to use VScode or CLion. You should already know how to compile and run C++ code from your previous courses, but if you need any help you can ask in Discord.

In this problem the goal is to make your first submissions in Mooshak, all for the [AED000] This is not a problem!. The problem statements will also be available inside Mooshak. Always start by carefully reading the problem statement given!

Start by submitting an Accepted in Mooshak. You can use the following example code given in aed000.cpp to get you started (you can right click the file name and save it on a suitable directory). The file is already correct and you can submit it directly 🙂

You can experiment on your own computer with the provided input examples to check the result (just copy+paste the inputs)
If you saved your input in a file named input.txt you can also use redirection to feed your program with that input using executable < input.txt
All problem statement will follow roughly the same format, specifying the problem, the input, the output, the constraints and giving at least one input example and its corresponding output.
You do not need to verify the constraints on your code. On the contrary, the constraints give you guarantees on the input that will be given to your program. You can assume the test cases we will feed your code obey the constraints (e.g. this allows to know which variable types to use on this program, since you know the given integers will never exceed 10⁹).
All the lines of input and output should be terminated with the newline character (including the last line).
You should only output what is given in the statement. Any extra output will can your code be marked as incorrect.
After submitting, you can see that the word Accepted is underlined on the listing (see this picture for an example of a list). If you click on it you will se a table detailing your result on each input (see this picture for an example of the feedback given) .
You can click on the submission number on the left of each listing row to recover your submitted code.
If you press on more... you can see an interface for seeing submissions that allows you to select only a certain student, a certain problem, etc.
If you submit a file that is exactly the same as one that you already submitted, Mooshak will not accept and report a Duplicate submission.

You should now experiment with some common errors to see the behavior of Mooshak in these cases (see the detailed feedback to access the first input where your program fails):

Try to get a Compile Time Error - for instance you could remove the last } or the last } and resubmit.
Try to get a Wrong Answer - for instance you could change the output line to cout << (count+1) << endl;
Try to get a Time Limit Exceeded - for instance you could add while(true); before the final return 0 to make the program get stuck on a loop
Try to get a Presentation Error - for instance you could change the output line to cout << count;
This would make your output not have the endline character, as requested on all lines.
If your output is correct with the exception of any whitespace (e.g. spaces, newlines, tabs) Mooshak will mark your submissions with a Presentation Error).

10. My first Mooshak problem in which efficiency matters

For this exercise you will be solving the problem [AED001] Prime Numbers.

Start by submitting the example code given in aed0001.cpp and check the result.
Can you understand why the temporal complexity of your code is \(O(Q \times N)\)? (where \(Q\) is the quantity of numbers and \(N\) is the maximum possible number)
Unless explicitly stated on the problem statement, the default time limit per problem is 1 second (the default memory limit is 256MB).
This is the reason why this program gets a Time Limit Exceeded veredict, and only pass the first tests (failing the larger one.
(you can test the first line your program fails to see how much time it would take)
Your task is now to improve your program to the desired time complexity of \(O(Q \times \sqrt{N})\), which will give you an Accepted.
Inside the isPrime function, do you really need to test all possible divisors up until \(n\)? Until which number can you go and why?

11. Maximum Subarray Sum

For this exercise you will be solving the problem [AED002] Sequence Game.

The goal of this exercise is to produce algorithms with different time complexities and check their efficiency when submitting. You should start by carefully reading the problem statement.

A first solution with temporal complexity \(\Theta(n^3)\)

What are all the possible contiguous subsequences? Let v be a vector containing the sequence and \(n\) its length. The possible sequences are therefore all the subsequences \(v[i \dots j]\) such that \(0 \leq i \leq j \lt n\).

An exhaustive "brute force" would be to go through all the subsequences and for each of them compute the respective sum, choosing the best possible sum.
Supposing we already read the input to a vector v, a way of doing this would be calling the following the following maxSum function that returns the best sum:

int maxSum(const vector<int> &v) {
  int n = v.size();

  int maxSoFar = v[0]; // why is this a good choice for the best initial sum?
  for (int i=0; i<n; i++) // all possible initial positions
    for (int j=i; j<n; j++) { // all possible final positions
      int sum = 0;
      for (int k=i; k<=j; k++) // compute sum between positions i and j
        sum += v[k];
      if (sum > maxSoFar) maxSoFar = sum;
    }
  
  return maxSoFar;
}

Understand what the code above does, implement the rest of the code (reading the input and calling the function) and submit it to Mooshak. See how many tests this code passes and why it fails on the bigger tests (remember you can click on the result to see detailed feedback on the submission).

Improving the solution to \(\Theta(n^2)\)
Intuitively, looking at the previous code, we can note we are repeating to many calculations. When we go from from \(sum(v[i \ldots j])\) to \(sum(v[i \ldots j+1])\) we do not need to recompute everything (starting again at position \(i\)), and we can just add \(v[j+1]\) to the previous sum.

In other words, \(sum(v[i \ldots j+1]) = sum(v[i \ldots j]) + v[j+1]\). We can use this to remove the third cycle with \(k\) that we had on our previous solution..

Implement this new solution and submit it. Can you see you now passed in more tests? Why? But we still need to improve to have an accepted solution...

Improving even further to \(\Theta(n)\)

A quadratic solution is still not enough and need to improve it to linear time. For that we will use Kadane's algorithm.

Considere that \(best(i)\) represents the best subsequence that ends on position \(i\). We know as a base case that \(best(0) = v[0]\) (it's the only possible subsequence that ends on the first position).

If we know how to compute \(best(i)\), how can we compute the value of \(best(i+1)\)?

If \(best(i)\) is a positive value, then \(best(i+1) = best(i) + v[i+1]\), since using the best that terminates exactly "behind" will improve the solution. Imagine for instance the vector \([4,-2,1,5]\):

\(i\)	0	1	2	3
\(v[i]\)	4	-2	1	5
\(best(i)\)	4	2	3	8

\(best(2) = 3\), obtained with the subsequence \([4,-2,1]\).
\(best(3) = best(2) + v[3] = 3 + 5 = 8\), obtained with the subsequence \([4,-2,1,5]\) (we extend the previous subsequence).

If \(best(i)\) is a negative value, then \(best(i+1) = v[i+1]\), since what is "behind" can only decrease the sum of the subsequence terminating at position \(i+1\). Imagine for instance the vector \([-5,2,-3,4]\):

\(i\)	0	1	2	3
\(v[i]\)	-5	2	-3	4
\(best(i)\)	-5	2	-1	4

\(best(2) = -1\), obtained with the subsequence \([2,-1]\).
\(best(3) = v[3] = 4\), obtido através do subarray \([4]\) (we did not extended the previous subsequence).

At the end, the best sum is the maximum value of \(best(i)\) for any possible \(i\) (the best subsequence can terminate in any position).

To compute this we can traverse only once the vector v and in each iteration compute in constant time the value of the best sum terminating on the current position using the best sum terminating on the previous position. The temporal complexity of this approach is linear.

Implement this solution, submit and verify it passes all test cases, obtaining your deserved Accepted 🙂

Extra Exercise for Consolidating your knowledge [extra]

AED Mooshak classes will usually include at least one extra exercise that you can use of you want to keep testing your knowledge of the material. For this class we propose another problem for which temporal efficiency is needed and the exhaustive "brute force" solution is not enough.

For this exercise you will be solving the problem [AED003] Saving Pipefish.

This exercise is an adaptation of a problem that was the easiest one on the qualification for a past National Olympiads in Informatics qualification phase, an high-school algorithmic contest.

A "brute force" solution that tries all possible sections and for each of them tests all positions to see how many of them are larger than \(T\) is not enough, since it would spend \(N^2\) time (can you see why? imagine a case with \(K=N/2\)). We need a better solution...

(click to see hints; try to solve the problem first if you do not want spoilers)

Hints

- Imagine you already processed the section \([a,b]\), having obtained \(Q_a\), the amount of elements larger or equal than \(T\) in that section.

- The next section is \([a+1,b+1]\). What changes in that interval? Only the position \(a\) is removed and the position \(b+1\) is inserted. The (new) value of \(Q_{a+1}\) is therefore equal to \(Q_a\), subtracting 1 in case the depth of position \(a\) is \(\ge T\) and adding 1 in case the depth of position \(b+1\) is \(\ge T\).

- For each new interval we only need to make a constant number of operations (considering the element to remove and the element to insert). Regarding efficiency this corresponds to an initial cycle for the first section of size \(K\) and for the remaining \(N-K\) sections we spend constant time. Another way of reasoning about this is that each position is added only once to the current section, and removed once from it. This is therefore a solution in time \(\Theta(N)\) obtaining the desired Accepted results.

- This idea of reusing the result of the previous section to compute the result of the next one is known as the sliding window technique, since we are "sliding" the windows of the current section along the input.

Challenge Exercise [challenge]

AED Mooshak classes will usually include one challenge exercise, which can be much harder the previous ones and might even need knowledge that was not given explicitly on the course.

The main idea is to give you the opportunity of improving your problem solving abilities.

For this exercise you will be solving the problem [AED004] Number Spiral.

Because this is the first challenge, the selected problem is not too hard and can be solved by any student without any knowledge of advanced algorithms or data structures. You just need to think a little bit and of course go beyond the naive "brute force" approach 😉

Since this is a challenge, I will not give you any hints, but if you are stuck just contact @PedroRibeiro on Discord. Feel free to also contact @PedroRibeiro to discuss your accepted solution.