========= SQL Notes ========= Creating tables --------------- Create a table with information about candidates for federal election using data from the FEC: .. code:: sql CREATE TABLE candidate (committee_id varchar(10), lastname varchar(50), firstname varchar(50), party varchar(6), city varchar(20), state varchar(2), zip integer, cand_id varchar(10), district varchar(2), constraint pk_candidate primary key (committee_id)); Import data into the table using ``sqlite3`` command-line interface: .. code:: .separator "," .import candidate.csv candidate Create a table for contribution data from the same course. .. code:: SQL CREATE TABLE contribution (cand_id varchar(10), amount integer, city varchar(20), state varchar(10), zip integer, month integer, year integer, constraint fk_contribution foreign key (cand_id) references candidate (cand_id)); Basic SELECT-FROM-WHERE queries: -------------------------------- List all candidates from a specific party using basic selection: .. code:: SQL SELECT * FROM candidate WHERE party = "REP"; Add clause to indicate that case should be ignored when doing the equality check: .. code:: SQL SELECT * FROM candidate WHERE party = "rep" COLLATE NOCASE; List the names of all the candidates using projection: .. code:: SQL SELECT lastname, firstname FROM candidate; Combine selection and projection to list the names of all the republican candidates: .. code:: SQL SELECT lastname, firstname FROM candidate WHERE party = "REP"; Add clause to limit the size of the output: .. code:: SQL SELECT lastname, firstname FROM candidate WHERE party = "REP" LIMIT 10; Basic Aggregation ----------------- Count the number of republicans using basic aggregation: .. code:: SQL SELECT count(*) FROM candidate WHERE party = "REP"; Add in a name for the aggregate value using AS: .. code:: SQL SELECT count(*) AS cnt FROM candidate WHERE party = "REP"; Count the number of candidates in each party using grouping: .. code:: SQL SELECT party, count(*) AS cnt FROM candidate GROUP BY party; Add in sorting by counts: .. code:: SQL SELECT party, count(*) AS cnt FROM candidate GROUP BY party ORDER BY cnt; Change sort order and use limit to pick out top 10 parties by number of candidates: .. code:: SQL SELECT party, count(*) AS cnt FROM candidate GROUP BY party ORDER BY cnt DESC LIMIT 10; Filter on group values using HAVING to find all the parties with at least 100 candidates. .. code:: SQL SELECT party, count(*) AS cnt FROM candidate GROUP BY party HAVING cnt > 100 ORDER BY cnt; Aggregation examples using contribution data -------------------------------------------- What was the total number of contributions made by residents of each state? .. code:: SQL SELECT state, count(*) AS total FROM contribution GROUP BY state ORDER BY state; If you want to compute the total amount per state: replace ``count(*)`` with ``sum(amount)``. If you want to compute the maximum contribution from each state replace ``count(*)`` with ``max(amount)``. Replacing ``count(*)`` with ``min(amount)`` illustrates an issue: -1 as a contribution amount. Here is a query that ignores contributions with negative amounts: .. code:: SQL SELECT state, min(amount) AS total FROM contribution WHERE amount > -1 GROUP BY state ORDER BY state; Basic joins ----------- Here are two example tables:: payroll_ledger: employee_number check_amount 001 $1850 002 $2200 003 $1000 003 $1000 employee_records: employee_number pay_per_period 001 $1850 003 $2000 004 $1975 005 $2450 Compute a cross product of these tables, that is, match every tuple from one table with every tuple from the other table: .. code:: SQL SELECT * FROM payroll_ledger,employee_records; Here is the result:: employee_number check_amount employee_number pay_per_period --------------- ------------ --------------- -------------- 001 1850 001 1850 001 1850 003 2000 001 1850 004 1975 001 1850 005 2450 002 2200 001 1850 002 2200 003 2000 ... 003 1000 004 1975 003 1000 005 2450 Add a JOIN condition to extract the rows in which the employee id is the same in both employee id columns: .. code:: SQL SELECT * FROM payroll_ledger AS pl JOIN employee_records AS er ON pl.employee_number = er.employee_number; Here is the result:: employee_number check_amount employee_number pay_per_period --------------- ------------ --------------- -------------- 001 1850 001 1850 003 1000 003 2000 003 1000 003 2000 This type of join is known as an inner-join. Notice that IDs 002, 004 and 005 do not appear in the result because they occur in one table but not the other. In a left outer join: every row in the left-hand table occurs in the output. If a row does not have a mate in the right-hand table then the associated columns will be NULL. For example, the query: .. code:: SQL SELECT * FROM payroll_ledger AS pl LEFT JOIN employee_records AS er ON pl.employee_number = er.employee_number yields:: employee_number check_amount employee_number pay_per_period --------------- ------------ --------------- -------------- 001 1850 001 1850 002 2200 003 1000 003 2000 003 1000 003 2000 Right outer joins are the same except the roles of left and right are swapped. ``sqlite3`` does not support right outer joins. Full joins are the obvious combination of left-outer and right-outer join. ``sqlite3`` does not support full joins. JOIN examples using the FEC data -------------------------------- Find the candidate name, candidate home state, contribution home state, and contribution amount for all contributions: .. code:: SQL SELECT cand.lastname, cand.firstname, cand.state AS cand_state, cont.state AS cont_state, cont.amount FROM candidate AS cand JOIN contribution AS cont ON cand.cand_id = cont.cand_id; Select the top three candidates in terms of total contributions: .. code:: SQL SELECT candidate.cand_id, lastname, firstname, SUM(amount) AS total FROM candidate JOIN contribution ON candidate.cand_id = contribution.cand_id GROUP BY candidate.cand_id ORDER BY total DESC LIMIT 3; Find the top three candidates in terms of instate contributions: .. code:: SQL SELECT candidate cand.id_lastname, firstname, SUM(amount) AS total FROM candidate, contribution ON candidate.cand_id = contribution.cand_id WHERE candidate.state = contribution.state GROUP BY candidate.cand_id ORDER BY total DESC LIMIT 3; Indexes ------- Indexes are used to speed up joins, which can be helpful, but they take-up space. .. code:: SQL CREATE INDEX cont_amount ON contribution (amount); Nested queries -------------- It is possible to nest queries. Here are a couple of examples: How many candidates had contributions? .. code:: SQL SELECT COUNT(*) FROM candidate WHERE candidate.cand_id IN (SELECT cand_id FROM contribution); List candidates who had at least one contribution of more than three times the average contribution .. code:: SQL SELECT DISTINCT lastname, firstname FROM candidate JOIN contribution ON candidate.cand_id = contribution.cand_id WHERE amount > 3 * (SELECT AVG(amount) FROM contribution); Conditional clause ------------------ You can use the ``CASE`` clause to specify values conditionally. Here is an example query that uses a nested query with a ``CASE`` to compute an intermediate relation that separates in-state from out-of-state contributions: .. code:: SQL SELECT lastname, candidate.state AS homestate, candidate.cand_id AS cand_id, CASE WHEN candidate.state= contribution.state THEN amount else 0 END AS isc, CASE WHEN candidate.state != contribution.state THEN amount else 0 end AS oosc FROM candidate INNER JOIN contribution ON candidate.cand_id = contribution.cand_id) We can use this query in a more complex query that computes a relation that includes a candidate's last name, his home state, the total number of in state contributions he received, and the total number of out of state contributions he received. .. code:: SQL SELECT lastname, homestate, SUM(isc), SUM(oosc) FROM (SELECT lastname, candidate.state AS homestate, candidate.cand_id AS cand_id, CASE WHEN candidate.state= contribution.state THEN amount else 0 END AS isc, CASE WHEN candidate.state != contribution.state THEN amount else 0 end AS oosc FROM candidate INNER JOIN contribution ON candidate.cand_id = contribution.cand_id) GROUP BY cand_id; Modifying tables ---------------- The most efficient way to add information to a table is to do a bulk import. It is also possible to add one tuple at a time, using the ``INSERT`` statement: .. code:: SQL INSERT INTO VALUES (X,Y,Z) Here is an example use: .. code:: SQL INSERT INTO candidate VALUES ("HHHH", "DUPONT", "TODD", "LIB", "CHICAGO", "IL", "60637", "CCCCC", "01"); You can also specify the particular columns that will be populated: .. code:: SQL INSERT INTO
(, , ) VALUES (X,Y,Z) You can update an existing tuple using the ``UPDATE`` statement: .. code:: SQL UPDATE
SET = , = , ... WHERE For example, .. code:: SQL UPDATE candidate SET committee_id = "H55555" WHERE lastname="DUPONT"; You can delete tuples from a table using the ``DELETE`` statement: .. code:: SQL DELETE FROM
WHERE For example: .. code:: SQL DELETE FROM candidate WHERE lastname="DUPONT";