A report containing:
- Raw data
- Describe the variables according to their types:
interval-scaled, binary, nominal, ordinal, ratio-scaled. Be
aware that there are specific methods suitable for each type of
variable.
- Preliminary analysis (summaries, histograms, boxplots,
spread measures, density). These are useful to apply
to the raw data to uncover inconsistencies, outliers,
duplicates, etc.
- List of the main changes that need to be made to the raw data.
- Preprocessed data
- Basic description (summaries, histograms, boxplots, spread measures, density).
- Analysis
- Bivariate analysis (correlations, regression)
- Multivariate analysis (multiple-variable regression, mutual
information, cluster analysis). Once more, be aware that some
methods used to calculate similarity (dissimilarity) depend on
the type of the variable. In the context of this dataset, when
performing cluster analysis we are interested in knowing
whether there are groups of similar patients.
- Predictive Models (use of supervised machine
learning. Suggestion: use WEKA)
- Decision Tree learning
- Support Vector Machines
- Comparison
- Discussion and Main Conclusions
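
As an illustration of the preliminary analysis item above, the following is a minimal sketch in Python (standard library only) of summary and spread measures, a boxplot-style outlier check, and duplicate detection. The function names, the 1.5 x IQR rule and the sample values are assumptions made for this example; they are not part of the assignment or of the dataset.

```python
import statistics


def summarize(values):
    """Return basic summary and spread measures for one numeric variable."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "iqr": q3 - q1,
    }


def iqr_outliers(values, k=1.5):
    """Return values outside [q1 - k*iqr, q3 + k*iqr] (the usual boxplot rule)."""
    s = summarize(values)
    low = s["q1"] - k * s["iqr"]
    high = s["q3"] + k * s["iqr"]
    return [v for v in values if v < low or v > high]


def duplicate_rows(rows):
    """Return every row (a dict) that repeats an earlier row exactly."""
    seen = set()
    duplicates = []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent, hashable key
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
    return duplicates


# Hypothetical values, used only to exercise the functions.
ages = [50, 52, 54, 56, 58, 60, 62, 64, 66, 300]
rows = [{"id": 1, "age": 50}, {"id": 2, "age": 60}, {"id": 1, "age": 50}]
print(summarize(ages))
print(iqr_outliers(ages))
print(duplicate_rows(rows))
```

A value flagged by this rule is only a candidate for inspection; whether it is an error or a legitimate extreme measurement has to be decided from the data dictionary.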
This work is to be performed by groups of at most two people.
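
Regarding the remark in the list above that similarity (dissimilarity) measures depend on the type of the variable: one common option for mixed data is a Gower-style dissimilarity, sketched below in Python. This is only one possible choice and is not prescribed by the assignment. The variable names, the type labels and the patient records are hypothetical; ordinal variables are assumed to be coded as ranks and treated as numeric.

```python
def gower_dissimilarity(a, b, types, ranges):
    """Gower-style dissimilarity between two records with mixed variable types.

    a, b   : dicts mapping variable name -> value (None means missing)
    types  : dict mapping variable name -> "numeric" or "categorical"
             (binary and nominal variables are handled as "categorical")
    ranges : dict mapping each numeric variable -> (max - min) in the dataset
    """
    total = 0.0
    compared = 0
    for name, kind in types.items():
        x, y = a.get(name), b.get(name)
        if x is None or y is None:
            continue  # skip variables missing in either record
        if kind == "numeric":
            spread = ranges[name]
            d = abs(x - y) / spread if spread > 0 else 0.0
        else:
            d = 0.0 if x == y else 1.0  # simple matching for categories
        total += d
        compared += 1
    if compared == 0:
        raise ValueError("no variable is present in both records")
    return total / compared


# Hypothetical patients and variable descriptions, for illustration only.
types = {"age": "numeric", "sex": "categorical", "smoker": "categorical"}
ranges = {"age": 40}
p1 = {"age": 40, "sex": "F", "smoker": True}
p2 = {"age": 60, "sex": "M", "smoker": True}
print(gower_dissimilarity(p1, p2, types, ranges))
```

The result lies between 0 (identical on all compared variables) and 1, and a matrix of such pairwise values can be given to a clustering method that accepts precomputed dissimilarities.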
Inês de Castro Dutra
2015-10-21