{ "cells": [ { "cell_type": "markdown", "id": "048b6704", "metadata": {}, "source": [ "## Some NetworkX exercises\n", "\n", "Made for the network science course @ DCC/FCUP (2024/2025 edition).\n", "\n", "The exercises are not mandatory but can give you hands-on experience on using NetworkX while you (hopefully) you have some fun with synthetic and real world datasets.\n", "\n", "### Main Links\n", "\n", "- [NetworkX website](https://networkx.org/) | [Install](https://networkx.org/documentation/stable/install.html) | [Reference](https://networkx.org/documentation/stable/reference/index.html) | [Tutorial](https://networkx.org/documentation/stable/tutorial.html) | [Gallery](https://networkx.org/documentation/stable/auto_examples/index.html)\n", "\n", "### Importing the library into Python\n", "\n", "After starting Python, you can import the networkx module:" ] }, { "cell_type": "code", "execution_count": 18, "id": "60f40b59", "metadata": {}, "outputs": [], "source": [ "import networkx as nx" ] }, { "cell_type": "markdown", "id": "ae1372eb", "metadata": {}, "source": [ "### 1) Revisiting homework #01 key network properties\n", "\n", "Start by downloading the [homework #01](https://www.dcc.fc.up.pt/~pribeiro/aulas/ns2425/homework/homework1.pdf) statement.\n", "\n", "**1.1)** Create a NetworkX graph representing the network given in exercise 1 of the homemwork and draw it with labels on the nodes, checking if the output seems correct." ] }, { "cell_type": "code", "execution_count": 19, "id": "93b1a03d", "metadata": {}, "outputs": [], "source": [ "# Code for 1.1)\n" ] }, { "cell_type": "markdown", "id": "511f3603", "metadata": {}, "source": [ "**1.2)** Print the **degree** and (normalized) **degree distribution**." ] }, { "cell_type": "code", "execution_count": 20, "id": "0d490c90", "metadata": {}, "outputs": [], "source": [ "# Code for 1.2)" ] }, { "cell_type": "markdown", "id": "2e36ad4b", "metadata": {}, "source": [ "**1.3)** Print the **local clustering coefficient** (one node per line, labeled and rounded to 3 decimal places) and the **average clustering coefficient** of each node on the last line" ] }, { "cell_type": "code", "execution_count": 21, "id": "aae1abd4", "metadata": {}, "outputs": [], "source": [ "# Code for 1.3)" ] }, { "cell_type": "markdown", "id": "0101c345", "metadata": {}, "source": [ "**1.4)** Compute the (normalized) **betweenness** and **closeness** centrality. Print each one in decreasing order of importance as a list of tuples or a dictionary." ] }, { "cell_type": "code", "execution_count": 22, "id": "fea1dc7d", "metadata": {}, "outputs": [], "source": [ "# Code for 1.4)" ] }, { "cell_type": "markdown", "id": "225294a2", "metadata": {}, "source": [ "### 2) The Network Science Co-Authorship Network\n", "\n", "Start by downloading the [co-authorhips in network science graph file](https://websites.umich.edu/~mejn/netdata/netscience.zip) available in Mark Newman's [Network Data page](https://websites.umich.edu/~mejn/netdata/). You should answer all the following questions using only NetworkX.\n", "\n", "**2.1)** Read the file in NetworkX. How many **nodes** and **edges** does it have?" ] }, { "cell_type": "code", "execution_count": 23, "id": "fcd317b9", "metadata": {}, "outputs": [], "source": [ "# Code for 2.1)" ] }, { "cell_type": "markdown", "id": "40e3c669", "metadata": {}, "source": [ "**2.2)** How many **connected components** exist? What is the size of the **largest component**? How many components have less that 5 nodes?" ] }, { "cell_type": "code", "execution_count": 24, "id": "482de253", "metadata": {}, "outputs": [], "source": [ "# Code for 2.3)" ] }, { "cell_type": "markdown", "id": "9670aa6b", "metadata": {}, "source": [ "**2.3)** Consider only the largest component and compute for it the values of **closeness** centrality, **betweenness** centrality and **pagerank**. Print a table with the top-10 scientists for each metric. Are there any differences? Are there nodes which are always important (check in google scholar if the respective scientists are highly citeed and what type of work they produce)? Can you produce a sorted list of nodes in which the value is a weighted average of the three metrics?" ] }, { "cell_type": "code", "execution_count": 25, "id": "23fbc856", "metadata": {}, "outputs": [], "source": [ "# Code for 2.3) " ] }, { "cell_type": "markdown", "id": "eea05d16", "metadata": {}, "source": [ "**2.4)** Consider only the largest component and compute two possible partitions into **non-overlapping communities** using at least two different methods (we suggest louvain and label propagation). Were both equaly fast to execute? Which ones gives an higher value of modularity? Can you draw the network giving different colors for different partitions? (extra: if you wanted to attribute meaning to the community, what could you do?)" ] }, { "cell_type": "code", "execution_count": 26, "id": "2e017524", "metadata": {}, "outputs": [], "source": [ "# Code for 2.4) " ] }, { "cell_type": "markdown", "id": "a624d33e", "metadata": {}, "source": [ "### 3) Movie Galaxies\n", "\n", "For this exercise you will prepare something for your homework #2.\n", "\n", "Start by downloading the [Movie Galaxies dataset](https://doi.org/10.7910/DVN/T4HBA3).\n", "\n", "**3.1**) Run the Louvain algorithm (and any method you would also like to try) for all **773 movie networks** and compute the **modularity** of the respective partition. Print a list of all movies in decreasing order of modularity. Is there any high modularity movie that you know of? Perhaps you can use it for the homework? :)" ] }, { "cell_type": "code", "execution_count": 27, "id": "b3d3746c", "metadata": {}, "outputs": [], "source": [ "# Code for 3.1) " ] }, { "cell_type": "markdown", "id": "92d4de18", "metadata": {}, "source": [ "### 4) European Road Network\n", "\n", "For this (extra) exercise we will use a real world network of major european roads. Start by downloading the [euroroads network](http://konect.cc/networks/subelj_euroroad/) from [Konect](http://konect.cc/).\n", "\n", "**4.1)** Your first task is to create a NetworkX from the data, using the city names as labels (or at least as attributes) of the nodes. How many nodes and edges are there? What is the degree distribution? Does it make sense? What are the most central nodes in terms of degree, closeness, betweenness and pagerank centrality? Are they what you were expecting? Is there a single connnected component or more than one?" ] }, { "cell_type": "code", "execution_count": 28, "id": "658ed4dd", "metadata": {}, "outputs": [], "source": [ "# Code for 4.1) " ] }, { "cell_type": "markdown", "id": "16ad4ef4", "metadata": {}, "source": [ "**4.2)** Compute the shortest path from Porto to Stockholm. Follow the nodes in the path and check if it makes sense. Compute the average path length. Check the shortest path for other pairs of cities that you might find interesting." ] }, { "cell_type": "code", "execution_count": 29, "id": "c88ccae1", "metadata": {}, "outputs": [], "source": [ "# Code for 4.2) " ] }, { "cell_type": "markdown", "id": "c85b12e8", "metadata": {}, "source": [ "**4.3)** Update the graph so that the **edge weights reflect actual physical distance** between cities. We suggest you use the [geopy](https://geopy.readthedocs.io/en/stable/) module, which allows you for instance to query for the latitude and longitude of a city ([example](https://www.tutorialspoint.com/how-to-get-the-longitude-and-latitude-of-a-city-using-python)) and to [compute distances](https://geopy.readthedocs.io/en/stable/#module-geopy.distance).\n", "\n", "Repeat the previous questions calculations now incorporating the weights and see if they make sense and if the functions you are using are incorporating the weights." ] }, { "cell_type": "code", "execution_count": 30, "id": "e341a016", "metadata": {}, "outputs": [], "source": [ "# Code for 4.3) " ] }, { "cell_type": "markdown", "id": "4f9eebbe", "metadata": {}, "source": [ "**4.4)** Using geopy can you also collect the countries for each node and **add `country` as an attribute**? Can you create and draw a compressed graph were each node is a country and there is an edge between two countries if they have at least one road between them? The edge weight should represent the number of roads between the two countries and the drawing should represent that weight (both as a label and as the thickness of the respetive line). Can you also make a layout that incorporates the geographical location of the countries in order to give a realistic depiction?" ] }, { "cell_type": "code", "execution_count": 31, "id": "532ea3bf", "metadata": {}, "outputs": [], "source": [ "# Code for 4.4)" ] }, { "cell_type": "markdown", "id": "c153fd9c", "metadata": {}, "source": [ "-----------------------------------------------\n", "If you want more exercises just choose a (relatively small) dataset from you preferred network respository and experiment with it, or toy a bit with the many graph generators available. The possibilities are almost endless :)\n", "\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }