SQL Notes¶
Creating tables¶
Create a table with information about candidates for federal election using data from the FEC:
CREATE TABLE candidate
(committee_id varchar(10),
lastname varchar(50),
firstname varchar(50),
party varchar(6),
city varchar(20),
state varchar(2),
zip integer,
cand_id varchar(10),
district varchar(2),
constraint pk_candidate primary key (committee_id));
Import data into the table using sqlite3
command-line interface:
.separator ","
.import candidate.csv candidate
Create a table for contribution data from the same course.
CREATE TABLE contribution
(cand_id varchar(10),
amount integer,
city varchar(20),
state varchar(10),
zip integer,
month integer,
year integer,
constraint fk_contribution foreign key (cand_id)
references candidate (cand_id));
Basic SELECT-FROM-WHERE queries:¶
List all candidates from a specific party using basic selection:
SELECT * FROM candidate WHERE party = "REP";
Add clause to indicate that case should be ignored when doing the equality check:
SELECT * FROM candidate WHERE party = "rep" COLLATE NOCASE;
List the names of all the candidates using projection:
SELECT lastname, firstname FROM candidate;
Combine selection and projection to list the names of all the republican candidates:
SELECT lastname, firstname FROM candidate WHERE party = "REP";
Add clause to limit the size of the output:
SELECT lastname, firstname FROM candidate WHERE party = "REP" LIMIT 10;
Basic Aggregation¶
Count the number of republicans using basic aggregation:
SELECT count(*) FROM candidate WHERE party = "REP";
Add in a name for the aggregate value using AS:
SELECT count(*) AS cnt FROM candidate WHERE party = "REP";
Count the number of candidates in each party using grouping:
SELECT party, count(*) AS cnt
FROM candidate
GROUP BY party;
Add in sorting by counts:
SELECT party, count(*) AS cnt
FROM candidate
GROUP BY party
ORDER BY cnt;
Change sort order and use limit to pick out top 10 parties by number of candidates:
SELECT party, count(*) AS cnt
FROM candidate
GROUP BY party
ORDER BY cnt DESC
LIMIT 10;
Filter on group values using HAVING to find all the parties with at least 100 candidates.
SELECT party, count(*) AS cnt
FROM candidate
GROUP BY party
HAVING cnt > 100
ORDER BY cnt;
Aggregation examples using contribution data¶
What was the total number of contributions made by residents of each state?
SELECT state, count(*) AS total
FROM contribution
GROUP BY state
ORDER BY state;
If you want to compute the total amount per state: replace
count(*)
with sum(amount)
. If you want to compute the maximum
contribution from each state replace count(*)
with
max(amount)
. Replacing count(*)
with min(amount)
illustrates an issue: -1 as a contribution amount.
Here is a query that ignores contributions with negative amounts:
SELECT state, min(amount) AS total
FROM contribution
WHERE amount > -1
GROUP BY state
ORDER BY state;
Basic joins¶
Here are two example tables:
payroll_ledger:
employee_number check_amount
001 $1850
002 $2200
003 $1000
003 $1000
employee_records:
employee_number pay_per_period
001 $1850
003 $2000
004 $1975
005 $2450
Compute a cross product of these tables, that is, match every tuple from one table with every tuple from the other table:
SELECT * FROM payroll_ledger,employee_records;
Here is the result:
employee_number check_amount employee_number pay_per_period
--------------- ------------ --------------- --------------
001 1850 001 1850
001 1850 003 2000
001 1850 004 1975
001 1850 005 2450
002 2200 001 1850
002 2200 003 2000
...
003 1000 004 1975
003 1000 005 2450
Add a JOIN condition to extract the rows in which the employee id is the same in both employee id columns:
SELECT *
FROM payroll_ledger AS pl JOIN employee_records AS er
ON pl.employee_number = er.employee_number;
Here is the result:
employee_number check_amount employee_number pay_per_period
--------------- ------------ --------------- --------------
001 1850 001 1850
003 1000 003 2000
003 1000 003 2000
This type of join is known as an inner-join. Notice that IDs 002, 004 and 005 do not appear in the result because they occur in one table but not the other.
In a left outer join: every row in the left-hand table occurs in the output. If a row does not have a mate in the right-hand table then the associated columns will be NULL. For example, the query:
SELECT *
FROM payroll_ledger AS pl LEFT JOIN employee_records AS er
ON pl.employee_number = er.employee_number
yields:
employee_number check_amount employee_number pay_per_period
--------------- ------------ --------------- --------------
001 1850 001 1850
002 2200
003 1000 003 2000
003 1000 003 2000
Right outer joins are the same except the roles of left and right are
swapped. sqlite3
does not support right outer joins. Full joins
are the obvious combination of left-outer and right-outer join.
sqlite3
does not support full joins.
JOIN examples using the FEC data¶
Find the candidate name, candidate home state, contribution home state, and contribution amount for all contributions:
SELECT cand.lastname, cand.firstname, cand.state AS cand_state, cont.state AS cont_state, cont.amount
FROM candidate AS cand JOIN contribution AS cont
ON cand.cand_id = cont.cand_id;
Select the top three candidates in terms of total contributions:
SELECT candidate.cand_id, lastname, firstname, SUM(amount) AS total
FROM candidate JOIN contribution
ON candidate.cand_id = contribution.cand_id
GROUP BY candidate.cand_id
ORDER BY total DESC
LIMIT 3;
Find the top three candidates in terms of instate contributions:
SELECT candidate cand.id_lastname, firstname, SUM(amount) AS total
FROM candidate, contribution ON candidate.cand_id = contribution.cand_id
WHERE candidate.state = contribution.state
GROUP BY candidate.cand_id
ORDER BY total DESC
LIMIT 3;
Indexes¶
Indexes are used to speed up joins, which can be helpful, but they take-up space.
CREATE INDEX cont_amount ON contribution (amount);
Nested queries¶
It is possible to nest queries. Here are a couple of examples:
How many candidates had contributions?
SELECT COUNT(*)
FROM candidate
WHERE candidate.cand_id IN (SELECT cand_id FROM contribution);
List candidates who had at least one contribution of more than three times the average contribution
SELECT DISTINCT lastname, firstname
FROM candidate JOIN contribution ON candidate.cand_id = contribution.cand_id
WHERE amount > 3 * (SELECT AVG(amount) FROM contribution);
Conditional clause¶
You can use the CASE
clause to specify values conditionally. Here
is an example query that uses a nested query with a CASE
to compute an
intermediate relation that separates in-state from out-of-state
contributions:
SELECT lastname, candidate.state AS homestate, candidate.cand_id AS cand_id,
CASE WHEN candidate.state= contribution.state THEN amount else 0 END AS isc,
CASE WHEN candidate.state != contribution.state THEN amount else 0 end AS oosc
FROM candidate INNER JOIN contribution ON candidate.cand_id = contribution.cand_id)
We can use this query in a more complex query that computes a relation that includes a candidate’s last name, his home state, the total number of in state contributions he received, and the total number of out of state contributions he received.
SELECT lastname, homestate, SUM(isc), SUM(oosc)
FROM (SELECT lastname, candidate.state AS homestate, candidate.cand_id AS cand_id,
CASE WHEN candidate.state= contribution.state THEN amount else 0 END AS isc,
CASE WHEN candidate.state != contribution.state THEN amount else 0 end AS oosc
FROM candidate INNER JOIN contribution ON candidate.cand_id = contribution.cand_id)
GROUP BY cand_id;
Modifying tables¶
The most efficient way to add information to a table is to do a bulk
import. It is also possible to add one tuple at a time, using the
INSERT
statement:
INSERT INTO <table>
VALUES (X,Y,Z)
Here is an example use:
INSERT INTO candidate
VALUES ("HHHH", "DUPONT", "TODD", "LIB", "CHICAGO", "IL", "60637", "CCCCC", "01");
You can also specify the particular columns that will be populated:
INSERT INTO <table> (<col name>, <col name>, <col name>)
VALUES (X,Y,Z)
You can update an existing tuple using the UPDATE
statement:
UPDATE <table>
SET <col name> = <expr>, <col name> = <expr>, ...
WHERE <expr>
For example,
UPDATE candidate
SET committee_id = "H55555"
WHERE lastname="DUPONT";
You can delete tuples from a table using the DELETE
statement:
DELETE FROM <table>
WHERE <expr>
For example:
DELETE FROM candidate
WHERE lastname="DUPONT";