Data Structure Implementation and Empirical Analysis
For your 1st project of this curricular unit you are expected to implement some of the data structures we talked about in the programming language of your preference. You should send the project by email to me and schedule a (remote) presentation of your work (at a date/time convenient for both).
- Deadline: 14th of April (presentations: preferably before the 20th of April)
- Groups: preferably with 2 students (doing individually is allowed) [send me an email when you have formed a group]
- Programming language: you can use whatever language you want (but you must use the same language for all data structures implemented, for comparison purposes)
The expected deliverables are the following:
- Code (with at least basic comments allowing me for looking at what you did)
- Instructions/Quick Manual on how to compile/execute your code (ex: Makefile or README text file)
- Datasets that you used to test your code (ex: generated by you)
- Brief report on the algorithmic ideas used, complexities associated and basic running and/or memory empirical results using the datasets you used/created. This report should be delivered as PDF, preferably prepared with LaTeX (if you use other software, please be sure to make it presentable). You should include thinks like worst and average case scenario, comparison between different data structures or variations, etc.
You are expected to produce the code on your own. Any ideas/code borrowed by someone else should be explicitly pointed out by you on the code or report, complete with references to the material you used (ex: urls).
You should implement at least two dynamic set data structures OR two spatial data structures (of the following) and do some tests to show pros and cons. You should try to implement the same basic set of operations (at last inserting and searching; possibly also removing and range search) so that you directly compare efficiency of data structures on the datasets.
Dynamic Set Data Structures:
- Deterministic Balanced Binary Search Trees (at least one):
- AVL Trees
- Red Black Trees
- Splay Trees
- Probabilistic Data Structures (at least one):
- Treaps (with randomized priorities)
- Skip Lists
- Bloom Filters
Spatial Data Structures:
- Quadtrees (and variations)
- KD-trees
- 2D range trees
If you want to add value to your work, you can do the following (in this order of priority - if you have other improvement ideas let me know before you start doing them):
- Add more data structures and/or variations (ex: cover both topics and/or add more than two data structures from the list or from some other suggestion you have - make sure you message me so that I can help you evaluate the amount of work needed)
- Add more operations (removing, range search, lower/upper bound, minimum, maximum, frequency count, etc)
- Add more general support for several data types (allow any comparable type and not just ints, strings, etc)
- Add a generic number of dimensions (as a parameter of the spatial data structures)
- Add visualization (show visually the state of the data structures)