Algorithms and Data Structures 2024/2025 (L.EIC011) - DCC/FCUP & DEI/FEUP

Practical Class #11 - Hash Tables
(09/12 to 13/12)

Exercises for submission

In what concerns the continuous evaluation solving exercises during the semester, the exercises you can submit in this class are:

[AED067] Hash Tables with Separate Chaining (weight 50%) (main exercises)
[AED068] DNA Motifs (weight 50%) (main exercises)
[AED069] Hash Tables with Open Addressing (weight 50%) (extra exercises)
[AED070] Tower of Babel (weight 50%) (extra exercises)
[AED071] Alien Words (weight 50%) (challenge exercises)

Submission Deadline: January 4th (submit to AED's Mooshak)

You are encouraged to talk to the professor and to your colleagues if you have difficulties. However, any more direct help you have received from other colleagues should be acknowledged in the comments of the code you submit.
After the deadline the problems will still be available on Mooshak, but the submissions will not count towards your grade.
Each class is worth 10 per cent of the grade for this component. As there will be 11 classes with submissions, you can get full marks even if you haven't done everything.
For a problem to count, you have to get all the tests right (i.e. accepted). Even if you solve all the problems, the maximum in a single class is 100 per cent.
it is guaranted that the main exercises on each class will sum to at least 100% of the class grade.

Lectures Content

This class is about hash tables. It is therefore convenient that you know what was discussed on the lectures:

Slides: 13 - Hash Tables

Given that this is a week with the 2nd practical test and that we are near the end of the semester, you will have a lower amount of problems to solve 😉

1. Implementing Hash Tables with Separate Chaining (a.k.a. Open Hashing)

For this exercise you will be solving the problem [AED067] Hash Tables with Separate Chaining .
Desired time complexity: \(O(1)\) in average for each of the 3 asked methods, assuming the hash function is "good" and the time it takes to compute the hash function is also constant.

The way a hash table with separate chaining works was explained in the lectures: 13 - Hash Tables [slides 12 and 13]

If you want you can have a look at an Open Hashing visualization.

You should start by carefully seeing the base code given: class HashTableSC<KeyType> (see code | download code). Your task is to implement the methods contains, insert and remove,

Hints

- contains is simply going to the list in table[hash(k) % size] and seeing if the key is there

- insert is similar: if the element is not there, just do a push_back(k) on the respective list; don't forget to increase numberKeys and to return true/false

- remove is also similar: find the element and remove using for instance the erase method of the respective list; don't forget to decrease numberKeys and to return true/false

2. DNA Motifs

For this exercise you will be solving the problem [AED068] DNA Motifs.
Desired time complexity: \(O(K \times |S|)\) (on average, where K is the motif size and S the DNA string)

This exercise is designed to let you try the hash tables as implemented by C++ STL:

Which one(s) should you use here?

Note that conceptually these containers work like their binary search tree "orderer" counterparts (set, map, multiset and multimap), with the main difference that the unordered versions use hash table and therefore have constant average expected time complexity - \(O(1)\) - but they can be linear in the worst case - \(O(n)\) - while the ordered versions have guaranteed logarithmic time complexity - \(O(\log n)\).

We encourage you to try to use the hash table (unordered) versions of the containers, but this problem is also solvable with the other containers (the ordered ones).

Hints

- Go trough the DNA string and generate all possible K-strings using for instance the substr function

- Use an unordered_map<string, int> to keep the frequencies: for each motif just increment the respective frequency

- At the end check what is the maximum frequency and how many motifs have that frequency (you can traverse the map - see here for a simple example)

Extra Exercises for Consolidating your knowledge [extra]

3. Implementing Hash Tables with Open Addressing (a.k.a. Closed Hashing)

For this exercise you will be solving the problem [AED069] Hash Tables with Open Addressing.
Desired time complexity: \(O(1)\) in average for each of the 3 asked methods, assuming the hash function is "good", the table has a low load factor and the time it takes to compute the hash function is also constant.

The way a hash table with open addressing works was explained in the lectures: 13 - Hash Tables [slides 16 to 24]

If you want you can have a look at an Closed Hashing visualization.

You should start by carefully seeing the base code given: class HashTableOA<KeyType> (see code | download code). Your task is to implement the methods contains, insert and remove,

Hints

This code is trickier than separate chaining

- contains: after staring on position hash(k) % size, do linear probing, going one position by one and check if you reach the desired key (returning true) or an empty bucket (returning false); be careful with deleted buckets, since you also need to traverse them

- insert: first see if it is contained: only if it's not should you continue; in that case do linear probing until you reach the first non-active bucket (it can be deleted or empty) and insert the key there; don't forget to update numberActive and numberDeleted and to return true/false; be careful with an almost full table: you can only insert if after inserting the table would still have an empty bucket

- remove: can you do this one by yourself? 😊

4. Tower of Babel

For this exercise you will be solving the problem [AED070] Tower of Babel.
Desired time complexity: \(O(|T| \times N)\) (on average, where T is the text and N the number of languages)

Can you think on how to use the STL hash table containers on this problem? The "hardest" part is to parse the text...

Hints

- Each language can be represented by a string name, an integer count and a unordered_set<string> containing the respective list of words

- To parse the text just find the non-processed first letter (e.g. using the isalpha function) and build a string until you find the first non-letter (or the end of the text, always storing the lower case of the character (e.g. using the tolower method) to make it easier to compare to the stored words; keep repeating this to find all the words

- For each word just check on the unordered_set of all the languages if it exists there and increment its count in that case

- In the end you just need to traverse the languages and print their name and count

5. More Exercise Suggestions

You wanted more exercises? Here are extra suggestions if you want explore more this topic of hash tables:

Using customized classes with the unordered containers: in the context of this class we only use standard types (e.g. int and string) that already have implemented hash functions in C++. What if the type we want to put in an unordered_set, for example, is a class made by you? How can we make our own hash and equality function and then pass that to the STL?

Explore this possibility by trying to create an unordered_set or unordered_map where the key is of the type of a class/struct made by you.

Some reading suggestions:
. std::hash on cppreference (see code example)
. How to create an unordered_map of user defined class in C++? (GeeksforGeeks)
. Combining Hash Values (Boost C++ Libraries)
Complete the hash tables implementation: there are still many things left to do in order to get "usable" hash tables classes from your implementations of exercises 1 and 3. For instance:

. Implement table resizing and rehashing (when should you do it? to what size will it make sense to increase? what if the table is very empty after removing?)
. Implement other types of probing on open addressing such as the ones discussed on the lectures (quadratic probing and double hashing)

Challenges Exercises [challenge]

The main idea of the challenge problems is to give you the opportunity of improving your problem solving abilities.

For this exercise you will be solving the problem [AED071] Alien Words.

For this class, the problem is not too hard, and it can help to use hash tables (of course 😁)

Since this is a challenge, I will not give you any more hints for now, but if you are stuck just contact @PedroRibeiro on Discord. Feel free to also contact @PedroRibeiro to discuss your accepted solution.

Happy coding! 😊

Algorithms and Data Structures 2024/2025 (L.EIC011) - DCC/FCUP & DEI/FEUP

Practical Class #11 - Hash Tables (09/12 to 13/12)

Exercises for submission

Lectures Content

Extra Exercises for Consolidating your knowledge [extra]

Challenges Exercises [challenge]

Practical Class #11 - Hash Tables
(09/12 to 13/12)