EAD model
Classes
April 15th
Data at scale
(9:00 am recorded class), (11:30 am recorded class)
To complement the machine learning recap: Nilsson's draft book, Chapter 1
To read for the class on Wednesday, April 22nd:
April 20th
Apache Beam (9:00 am recorded class), (11:30 am recorded class)
Practical #1 (you will need a text file for this practical; I am using this one):
(9:00 am recorded practical class)
(the 11:30 am practical is included in the 11:30 am recorded class above)
April 22nd
Worksheet #1: Parallel DBs vs. MapReduce (Proposed answers)
Recap on Parallel Programming (up to slide 11) (9:00 am recorded class), (11:30 am recorded class)
To read:
Software engineering for scientific big data analysis
April 27th
Worksheet #2: Software engineering for scientific big data analysis (Proposed answers)
ML pipeline using Apache Beam (9:00 am recorded class), (11:30 am recorded class)
TensorFlow and Apache Beam
Practical #2: full ML pipeline
Colab notebook to run the Molecules ML pipeline
April 29th
Recap on Parallel Programming (cont., slides 11 to 28) (9:00 am recorded class), (11:30 am recorded class)
May 4th
Practical #3: Multiprocessing in python (9:00 am recorded class), (11:30 am recorded class)
To read:
Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence
From Persistent Identifiers to Digital Objects to Make Data Science More Efficient
May 6th
Worksheet #3: Tools for data science and ML (paper 1) and Digital Objects (paper 2) (Proposed answers)
Recap on Parallel Programming (cont., slide 29 to the end) (9:00 am recorded class), (11:30 am recorded class)
A View of Value and Veracity
To read:
A Scalable Data Science Workflow Approach for Big Data Bayesian Network Learning
May 11th
Practical #4 (9:00 am recorded class), (11:30 am recorded class)
May 13th
A View of Value and Veracity (cont.) (9:00 am recorded class), (11:30 am recorded class)
May 18th
Practical #5: Revisiting our full ML pipeline (9:00 am recorded class), (11:30 am recorded class)
Have you ever tried these tasks?
May 20th
A View of Value and Veracity (cont., slide 26 to the end) (9:00 am recorded class), (11:30 am recorded class)
GPGPUs
May 25th
GPGPUs (cont., slide 14 to the end)
Practical #6 (9:00 am recorded class), (11:30 am recorded class)
May 27th
Worksheet #4: A Scalable Data Science Workflow Approach for Big Data Bayesian Network Learning (Proposed answers)
Utilizing Database Architecture and Data Cleaning to Increase ‘Time to Science’ and Decrease Resources Needed in Research Data Science Workflows (to complement the slides; work by Christine Kirkpatrick)
atSNP Infrastructure, a Case Study for Searching Billions of Records While Providing Significant Cost Savings over Cloud Providers (to complement the slides; work by Christopher Harrison)
(9:00 am recorded class), (11:30 am recorded class)
Recommended Reading