You also have the option to redesign your database based on the feedback you have received from the instructor and the TAs and on your experience developing the TBP App so far. If you choose to redesign your database in a significant way, please include your new ER diagram, named TBP_ERD_v2.???, with your submission. Note that adding or deleting several attributes or adding a few new relationships is not a significant redesign. If you do choose this option, please make sure that the relations in your database correspond (i.e., have the same names) to the entity sets and relationships in your ER diagram.
Develop a larger amount of data for your database and load it into your relations using MySQL's LOAD DATA command. The data you generate and load should be on the order of:
To create more data for your relations, you can write a program in any programming language you like. The program should create large files consisting of records in an ASCII text-only format acceptable to the MySQL bulk loader. Then you should load the data into your TBP relations. If you are using real data, your program will need to transform the data into files of records conforming to your TBP schema. Otherwise, write a program to fabricate data: your program will generate either random or nonrandom (e.g., sequential) records conforming to your schema. Note that it is acceptable for your data values to be meaningless (i.e., a random sequence of characters instead of actual names).
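As an illustration, here is a minimal Python sketch of a data-fabrication program. The table name (Users) and its columns (username, name, email) are assumptions for the example; substitute the relations and attributes from your own TBP schema. It writes tab-separated records, one per line, which the MySQL bulk loader accepts by default.

```python
import csv
import random
import string

def random_string(length=8):
    """Return a random lowercase string; the values can be meaningless."""
    return "".join(random.choices(string.ascii_lowercase, k=length))

def generate_users(n, path):
    """Write n fabricated user records as tab-separated ASCII text,
    one record per line, suitable for MySQL's LOAD DATA command.
    Schema (username, name, email) is hypothetical -- adapt to yours."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for i in range(n):
            # Key attributes must be unique, so derive them sequentially
            # rather than randomly to avoid accidental duplicates.
            username = f"user{i}"
            name = random_string()
            email = f"{username}@example.com"
            writer.writerow([username, name, email])

generate_users(10000, "users_large.txt")
```

Note the design point in the comment: non-key attributes can be random, but key attributes should be generated sequentially so that uniqueness is guaranteed.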
If the semantics of your schema includes relations that are expected to be relatively small (e.g., states in the US), it is fine to use some small relations, but please ensure that you have relations of the sizes prescribed above as well. When writing a program to fabricate data, there are two important points to keep in mind:
You will need to submit your new create_db_large.sql and populate_db_large.sql scripts along with all data files referenced in your populate_db_large.sql script.
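For reference, a populate_db_large.sql script typically consists of one LOAD DATA statement per relation. A sketch of one such statement follows; the file name users_large.txt and table Users are assumptions for the example, and the field and line terminators must match whatever your generator program writes:

SELECT 'Loading Users...' AS message;
LOAD DATA LOCAL INFILE 'users_large.txt'
  INTO TABLE Users
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n';

Depending on your server configuration, you may need to enable local file loading (e.g., start the client with --local-infile=1) for LOAD DATA LOCAL to work.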
To demonstrate the size of your new relations, please generate a script file called count.sql that finds the number of tuples in every one of your relations.
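A count.sql script can simply contain one COUNT(*) query per relation. For example, assuming hypothetical tables Users, Tweets, and Follows:

SELECT COUNT(*) FROM Users;
SELECT COUNT(*) FROM Tweets;
SELECT COUNT(*) FROM Follows;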
Extra credit: Develop a single query that returns the name of every table along with its size. For example, if you had only three tables: Users, Tweets, and Follows and their sizes were 10000, 50000, and 100000, the result of your single query should look like this:
+------------+------------+
| table_name | table_size |
+------------+------------+
| Users      |      10000 |
| Tweets     |      50000 |
| Follows    |     100000 |
+------------+------------+
Develop and test at least ten SQL queries using the JOIN syntax discussed in class. At least five of your queries should involve aggregation. To receive full credit, you should use all JOIN variations discussed in class (ON, USING, NATURAL, OUTER) and aggregation both with and without a GROUP BY clause. Please generate a script file called query5.sql containing all of your join and aggregation queries. Your queries must not produce empty results and must complete in a reasonable amount of time (a few minutes). If any of your queries takes a long time to complete, consider modifying it by adding more conditions that limit the size of the intermediate results or the final result. For example, suppose your query finds all pairs of users that follow at least one user in common:
SELECT DISTINCT F_1.username_follower AS user1, F_2.username_follower AS user2
FROM Follows F_1 JOIN Follows F_2 USING (username_leader)
WHERE F_1.username_follower <> F_2.username_follower;

If it takes 30 minutes or more and produces millions of results, consider adding conditions that limit the users drawn from F_1 and F_2. For example, you can consider only usernames starting with the letters "a" and "b":
SELECT DISTINCT F_1.username_follower AS user1, F_2.username_follower AS user2
FROM Follows F_1 JOIN Follows F_2
  ON (F_1.username_leader = F_2.username_leader
      AND F_1.username_follower LIKE "a%"
      AND F_2.username_follower LIKE "b%")
WHERE F_1.username_follower <> F_2.username_follower;
Note that you do need to submit the scripts that populate your large database as well as the corresponding data files.
You will submit your homework using the hwsubmit command. Please create a new directory called hw5_<your_username>, e.g. hw5_evtimov, and move all of the relevant scripts and all data files there. For this assignment, there should be four scripts: create_db_large.sql, populate_db_large.sql, count.sql, and query5.sql, along with all the large data files and, if you have chosen to redesign your database, your new ER diagram TBP_ERD_v2.???. Then you can submit your work by typing the following (<path_to> is the directory path to your hw5 directory):
hwsubmit cspp53001 <path_to>/hw5_<your_username>
Please make sure that all of the relevant files and no others are in that directory.