BDCC - project 1

Big Data & Cloud Computing

Eduardo R. B. Marques, DCC/FCUP

Updates:

SUBMISSION INSTRUCTIONS AVAILABLE HERE

Summary

In this project you will develop an AppEngine app that provides information about images taken from the Open Images dataset. The app also uses a TensorFlow model for image classification, derived using AutoML Vision.

A partially completed application is provided as a starting point, including a functional TensorFlow model. A demo application with all the features can be tested at https://bdcc22project.appspot.com.

The primary work will consist in:

Some additional challenges are also proposed (15% of the grade).

Project work and delivery

Getting started

Create a new project

Recommended: in the Google Cloud console, create a new project for all development work, to avoid confusion with other work you may have done previously.

Get the code

Download the ZIP file for the initial application code - app.zip.

You may then upload it to the Cloud Shell environment and unzip it there.

$ unzip app.zip

Set up Cloud Shell

Make sure you have the Pillow package installed.

$ pip3 install Pillow
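
As a quick sanity check, a short Python snippet can confirm that Pillow is importable before you run the app. This is only a minimal sketch: the helper name has_module is hypothetical and is not part of the provided code. Note that Pillow is imported under the module name PIL.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Pillow installs under the import name "PIL".
    if has_module("PIL"):
        print("Pillow is available")
    else:
        print("Pillow is missing; run: pip3 install Pillow")
```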

Run the app for the first time

$ cd app
$ python3 main.py

Use the Web preview for development, and deploy only from time to time to test more stable versions.

Deploy it for the first time

$ gcloud app deploy

Note:

Data model

To get access to the data in the BigQuery console use Add Data > Pin a project > Enter project name and specify bdcc22project.

You can use the bdcc22project.openimages BigQuery data set for initial development of the app, but at some point you should define your own BigQuery data set, PROJECT_ID.openimages, and use it in your app. For that purpose, write a Python script or Colab notebook that defines the necessary BigQuery data set. Check the following links to obtain the data in CSV format:
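
One possible shape for the script that defines your own data set is sketched below, assuming the google-cloud-bigquery client library is installed and that each CSV file has a header row. The function names table_id and load_csv, the use of schema autodetection, and any table names you pass in are assumptions made for illustration; they are not taken from the provided code.

```python
def table_id(project_id: str, dataset: str, table: str) -> str:
    """Build a fully qualified BigQuery table id: project.dataset.table."""
    return f"{project_id}.{dataset}.{table}"


def load_csv(project_id: str, table: str, csv_path: str,
             dataset: str = "openimages") -> int:
    """Create the data set if needed and load one CSV file into a table."""
    # Imported lazily so that table_id() works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    # exists_ok=True makes the script safe to run more than once.
    client.create_dataset(f"{project_id}.{dataset}", exists_ok=True)

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumption: the CSV has a header row
        autodetect=True,      # assumption: let BigQuery infer the schema
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    with open(csv_path, "rb") as f:
        job = client.load_table_from_file(
            f, table_id(project_id, dataset, table), job_config=job_config
        )
    job.result()  # block until the load job finishes
    return job.output_rows
```

For example, load_csv("PROJECT_ID", "some_table", "some_table.csv") would replace the contents of PROJECT_ID.openimages.some_table with the rows of the CSV file. You may prefer an explicit schema over autodetection once you know the column types.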

References:

Application endpoints

The following application endpoints are already implemented:

You must implement the remaining ones:

The visual aspect of generated HTML pages may differ, but the displayed data and the functionality provided through HTML links should be similar.

References:

TensorFlow model using AutoML

Using a Python script or Colab notebook you must derive an AutoML dataset for a minimum of 10 image classes such that:

An example CSV format for an AutoML data set is provided in static/tflite/automl.csv (and also here: https://bdcc22project.appspot.com/static/tflite/automl.csv).
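
The generation of such a CSV file could be sketched as follows, assuming the usual AutoML Vision import layout of one row per image with three columns: the set (TRAIN, VALIDATION or TEST), a gs:// URI, and the label. The function names, the bucket layout gs://BUCKET/IMAGE_ID.jpg, and the 80/10/10 split are hypothetical choices for illustration; check them against the example file and the requirements of this project.

```python
import csv
import io
import random


def automl_rows(images_by_label, bucket, train=0.8, validation=0.1, seed=0):
    """Return (set, gcs_uri, label) rows, splitting each label separately."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rows = []
    for label, image_ids in sorted(images_by_label.items()):
        ids = list(image_ids)
        rng.shuffle(ids)
        n_train = int(len(ids) * train)
        n_val = int(len(ids) * validation)
        for i, image_id in enumerate(ids):
            if i < n_train:
                subset = "TRAIN"
            elif i < n_train + n_val:
                subset = "VALIDATION"
            else:
                subset = "TEST"
            # Hypothetical bucket layout: one JPEG per image id.
            rows.append((subset, f"gs://{bucket}/{image_id}.jpg", label))
    return rows


def to_csv(rows) -> str:
    """Serialize rows as CSV text without a header line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

In practice images_by_label would be filled from a BigQuery query over your own data set, and the resulting text would be written to a file and uploaded to the bucket used by AutoML.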

You may use the score_image.py program to test your TensorFlow model from the Cloud Shell command line.

$ ./score_image.py file1.jpg file2.jpg ... 

Additional challenges

Use of the Cloud Vision API

Develop an alternative app endpoint for image classification that makes use of label detection through the Google Cloud Vision API using the corresponding Python client API.

Explain what you did in the project report.
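
As a starting point, label detection with the Python client could look roughly like the sketch below, assuming the google-cloud-vision package is installed and the Cloud Vision API is enabled for your project. The function names summarize_labels and detect_labels and the 0.5 score threshold are hypothetical; how the result is wired into an app endpoint is left open.

```python
def summarize_labels(annotations, min_score=0.5):
    """Keep (description, score) pairs at or above min_score, best first."""
    pairs = [(a.description, a.score) for a in annotations
             if a.score >= min_score]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


def detect_labels(image_bytes: bytes, min_score: float = 0.5):
    """Send raw image bytes to Cloud Vision label detection."""
    # Imported lazily so that summarize_labels() works without the library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    response = client.label_detection(image=vision.Image(content=image_bytes))
    if response.error.message:
        raise RuntimeError(response.error.message)
    return summarize_labels(response.label_annotations, min_score)
```

An endpoint could read the uploaded file's bytes, call detect_labels on them, and render the returned list of (label, score) pairs in the same way as the TensorFlow-based classification results.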

Define a Docker image for the app

AppEngine runs apps in containers internally. Why not define your own container explicitly for the app? All it takes is a Dockerfile :)

The app should run in the Cloud Shell environment as a container instance, and then also through Cloud Run.

Explain what you did in the project report, in particular the meaning of the commands in your Dockerfile.