next up previous
Next: What to deliver?

Artificial Intelligence: Assignment 4 - Decision Trees
Submit by: 28/05/2017
April 25th, 2017

Decision Trees

Using as a reference the material available in Sections 18.1 to 18.3 of our textbook, Artificial Intelligence: a Modern Approach, by Peter Norvig and Stuart Russell (3ed), implement and algorithm for induction of decision trees (similar to the ID3, shown in the book, Figure 18.5). Use as node selection function the information gain defined in page 704.

The input to your algorithm will be a set of examples in CSV (Comma Separated Value) format. This set will have several attributes (columns of your CSV table), being the last one the variable of interest for classification. The output of your program must be in the following format:

<attribute>
    value1:
        <attribute>
            value1: class1 (counter1)
            value2: class2 (counter2)
    value2: class3 (counter3)
    value3:
        <attribute>
            value1:
               <attribute>
                   value1: class4 (counter4)
                   value2: class2 (counter5)
            value2: class3 (counter6)

Where, attribute is the root of each subtree, value# is one of the values of the attribute (one of the branches of your tree), class# is the class value assigned to that branch in the tree and counter# is a counter of the number of examples corresponding to that tree branch.

Examples of datasets for testing can be found in the following files: http://www.dcc.fc.up.pt/~ines/aulas/1617/IA/t4/restaurant.csv, http://www.dcc.fc.up.pt/~ines/aulas/1617/IA/t4/weather.csv and http://www.dcc.fc.up.pt/~ines/aulas/1617/IA/t4/iris.csv, described below.

In addition to generate a decision tree, your program must also be prepared to accept as input a file with test examples, i.e., after generating your tree, you must be able to apply your tree to new examples and be able to classify them appropriately. For example, suppose you generated a tree to the restaurant problem. Now, you can enter new examples (without any class/label) and be able to give them a proper class.

Important note: The dataset iris contains numerical values. You need to implement a way of discretizing these values in order to minimize the size of your decision tree.




next up previous
Next: What to deliver?
Inês Dutra 2017-04-25