In what concerns the continuous evaluation solving exercises grade during the semester, you should submit until 23:59 of January 4th
(this exercise will still be available for submission after that deadline, but without couting towards your grade)
[to understand the context of this problem, you should read the class #11 exercise sheet]
Given a piece of text, can you figure out what language it was written in? One possible methodology is to see how many words in the text belong to each known language. To help you, in addition to a text string T describing the text to be analyzed in this problem, you also receive a dictionary of each language you need to consider, that is, a list of words of that language.
You need to discover how many words of the text are belong to each of the known languages (if a word appears twice in the text, it also counts twice in the answer).
The words in the dictionaries are guaranteed to be in lowercase and are made up only of letters, but the text can contain characters other than letters, and also uppercase and lower case letters (you are however guaranteed that there will be no accentuated characters). For the purposes of this problem, a word is a contiguous sequence of letters separated by non-letter characters. When seeing if a word appears in a dictionary, you should ignore the case of the letters.
Given a text string T and N dictionaries of different languages, each containing Wi words, your task is to compute how many words of text T belong to each of the N languages, that is, appears in the respective dictionary.
The first line of input contains an integer N, indicating the amount of languages to consider.
The following N lines each describe one language, first with a string of lowercase letters indicating the name of the language, then an integer Wi indicating the quantity of words in that language, followed by the Wi words in that dictionary, all formed by lowercase letters and separated by a single space.
The last line of the input contains the text T to be analyzed and can contains both letters and non-letter characters (but will never contain accentuated characters).
The output should contain N lines.
Each of these lines should describe the amount of words of a text in a language (in the same order as the languages come in the input) in the format "LANGUAGE_NAME: NUMBER_WORDS".The following limits are guaranteed in all the test cases that will be given to your program:
1 ≤ |T| ≤ 500 000 | Size of the text to consider | |
1 ≤ N ≤ 10 | Number of languages | |
1 ≤ W1 + W2 + ... + WN≤ 200 000 | Sum of the words in all languages |
Example Input | Example Output |
3 pt 5 algoritmos estruturas dados tabela dispersao es 5 algoritmos estructuras datos tabla dispersion en 5 algorithms structures data table hash Algoritmos e Estruturas de Dados: uma UC da LEIC :) |
pt: 3 es: 1 en: 0 |
Explanation of Example Input
The text is "Algoritmos e Estruturas de Dados: uma UC da LEIC :)" with 9 words: Algoritmos, e, Estruturas, de, Dados, uma, UC, da, LEIC
3 of the words appear in the "pt" dictionary: Algoritmos, Estruturas e Dados.
1 of the words appears in the "es" dictionary: Algoritmos.
None of the words appear in the "en" dictionary.