Consider again the LA Restaurants and Market Health data set available at Kaggle: https://www.kaggle.com/cityofLA/la-restaurant-market-health-data. In particular, we consider the two CSV files: one for inspections; the other for violations.
1. Write an SQL script "firstname_lastname_load.sql" that does the following:
a. Creates a table, "inspections" for the inspection data set; and a table violations for the violation data set. Your tables should be stored in a database called inf551 with both user & password being inf551.
b. Loads the data in the csv files into the respective tables. You may refer to: https://dev.mysql.com/doc/refman/5.7/en/load-data.html for details on "load data" statement in MySQL.
Note that load.sql will assume the two data sets are located at the same directory in the name of "violations.csv" and inspections.csv, no need to give arguments from the command line.
2. Write an SQL query for each of the following questions. Submit the same in sql files, "firstname_lastname_a.sql", firstname_lastname_b.sql, and so on.
a. Find out names of facilities whose name contains "cafe" (case insensitive) and had a violation with code F030.
b. Find out names of facilities that have the highest inspection scores.
c. Find out which facility (by id) has the largest number of violations. Output the names of such facilities (ascending order).
d. Find out which facilities that had inspections done but do not have any violations (as recorded in the violations data set). Output names of such facilities (ascending order).
e. For each different letter grade in inspections, output the average score of facilities receiving the letter grade.
3. Write a Python script "firstname_lastname_good.py" that answers the question 2.d above. Note that your script should use Python MySQL connector to connect to the inf551 database mentioned above. Output the results to a file whose name is specified in command line.