SQL Notes¶

Creating tables¶

Create a table with information about candidates for federal election using data from the FEC:

CREATE TABLE candidate
  (committee_id varchar(10),
   lastname varchar(50),
   firstname varchar(50),
   party varchar(6),
   city varchar(20),
   state varchar(2),
   zip integer,
   cand_id varchar(10),
   district varchar(2),
   constraint pk_candidate primary key (committee_id));

Import data into the table using sqlite3 command-line interface:

.separator ","
.import candidate.csv candidate

Create a table for contribution data from the same course.

CREATE TABLE contribution
  (cand_id varchar(10),
   amount integer,
   city varchar(20),
   state varchar(10),
   zip integer,
   month integer,
   year integer,
   constraint fk_contribution foreign key (cand_id)
    references candidate (cand_id));

Basic SELECT-FROM-WHERE queries:¶

List all candidates from a specific party using basic selection:

SELECT * FROM candidate WHERE party = "REP";

Add clause to indicate that case should be ignored when doing the equality check:

SELECT * FROM candidate WHERE party = "rep" COLLATE NOCASE;

List the names of all the candidates using projection:

SELECT lastname, firstname FROM candidate;

Combine selection and projection to list the names of all the republican candidates:

SELECT lastname, firstname FROM candidate WHERE party = "REP";

Add clause to limit the size of the output:

SELECT lastname, firstname FROM candidate WHERE party = "REP" LIMIT 10;

Basic Aggregation¶

Count the number of republicans using basic aggregation:

SELECT count(*) FROM candidate WHERE party = "REP";

Add in a name for the aggregate value using AS:

SELECT count(*) AS cnt FROM candidate WHERE party = "REP";

Count the number of candidates in each party using grouping:

SELECT party, count(*) AS cnt
FROM candidate
GROUP BY party;

Add in sorting by counts:

SELECT party, count(*) AS cnt
FROM candidate
GROUP BY party
ORDER BY cnt;

Change sort order and use limit to pick out top 10 parties by number of candidates:

SELECT party, count(*) AS cnt
FROM candidate
GROUP BY party
ORDER BY cnt DESC
LIMIT 10;

Filter on group values using HAVING to find all the parties with at least 100 candidates.

SELECT party, count(*) AS cnt
FROM candidate
GROUP BY party
HAVING cnt > 100
ORDER BY cnt;

Aggregation examples using contribution data¶

What was the total number of contributions made by residents of each state?

SELECT state, count(*) AS total
FROM contribution
GROUP BY state
ORDER BY state;

If you want to compute the total amount per state: replace count(*) with sum(amount). If you want to compute the maximum contribution from each state replace count(*) with max(amount). Replacing count(*) with min(amount) illustrates an issue: -1 as a contribution amount.

Here is a query that ignores contributions with negative amounts:

SELECT state, min(amount) AS total
FROM contribution
WHERE amount > -1
GROUP BY state
ORDER BY state;

Basic joins¶

Here are two example tables:

payroll_ledger:

  employee_number    check_amount
  001                $1850
  002                $2200
  003                $1000
  003                $1000

employee_records:

  employee_number    pay_per_period

  001                $1850
  003                $2000
  004                $1975
  005                $2450

Compute a cross product of these tables, that is, match every tuple from one table with every tuple from the other table:

SELECT * FROM payroll_ledger,employee_records;

Here is the result:

employee_number  check_amount  employee_number  pay_per_period
---------------  ------------  ---------------  --------------
            1850          001              1850
            1850          003              2000
            1850          004              1975
            1850          005              2450
            2200          001              1850
            2200          003              2000
   ...
            1000          004              1975
            1000          005              2450

Add a JOIN condition to extract the rows in which the employee id is the same in both employee id columns:

SELECT *
FROM payroll_ledger AS pl JOIN employee_records AS er
ON  pl.employee_number = er.employee_number;

Here is the result:

employee_number  check_amount  employee_number  pay_per_period
---------------  ------------  ---------------  --------------
001              1850          001              1850
003              1000          003              2000
003              1000          003              2000

This type of join is known as an inner-join. Notice that IDs 002, 004 and 005 do not appear in the result because they occur in one table but not the other.

In a left outer join: every row in the left-hand table occurs in the output. If a row does not have a mate in the right-hand table then the associated columns will be NULL. For example, the query:

SELECT *
FROM payroll_ledger AS pl LEFT JOIN employee_records AS er
ON pl.employee_number = er.employee_number

yields:

employee_number  check_amount  employee_number  pay_per_period
---------------  ------------  ---------------  --------------
001              1850          001              1850
002              2200
003              1000          003              2000
003              1000          003              2000

Right outer joins are the same except the roles of left and right are swapped. sqlite3 does not support right outer joins. Full joins are the obvious combination of left-outer and right-outer join. sqlite3 does not support full joins.

JOIN examples using the FEC data¶

Find the candidate name, candidate home state, contribution home state, and contribution amount for all contributions:

SELECT cand.lastname, cand.firstname, cand.state AS cand_state, cont.state AS cont_state, cont.amount
FROM candidate AS cand JOIN contribution AS cont
ON cand.cand_id = cont.cand_id;

Select the top three candidates in terms of total contributions:

SELECT candidate.cand_id, lastname, firstname, SUM(amount) AS total
FROM candidate JOIN contribution
ON candidate.cand_id = contribution.cand_id
GROUP BY candidate.cand_id
ORDER BY total DESC
LIMIT 3;

Find the top three candidates in terms of instate contributions:

SELECT candidate cand.id_lastname, firstname, SUM(amount) AS total
FROM candidate, contribution ON candidate.cand_id = contribution.cand_id
WHERE candidate.state = contribution.state
GROUP BY candidate.cand_id
ORDER BY total DESC
LIMIT 3;

Indexes¶

Indexes are used to speed up joins, which can be helpful, but they take-up space.

CREATE INDEX cont_amount ON contribution (amount);

Nested queries¶

It is possible to nest queries. Here are a couple of examples:

How many candidates had contributions?

SELECT COUNT(*)
FROM candidate
WHERE candidate.cand_id IN (SELECT cand_id FROM contribution);

List candidates who had at least one contribution of more than three times the average contribution

SELECT DISTINCT lastname, firstname
FROM candidate JOIN contribution ON candidate.cand_id = contribution.cand_id
WHERE amount > 3 * (SELECT AVG(amount) FROM contribution);

Conditional clause¶

You can use the CASE clause to specify values conditionally. Here is an example query that uses a nested query with a CASE to compute an intermediate relation that separates in-state from out-of-state contributions:

SELECT lastname, candidate.state AS homestate, candidate.cand_id AS cand_id,
       CASE WHEN candidate.state= contribution.state THEN amount else 0 END AS isc,
       CASE WHEN candidate.state != contribution.state THEN amount else 0 end AS oosc
FROM candidate INNER JOIN contribution ON candidate.cand_id = contribution.cand_id)

We can use this query in a more complex query that computes a relation that includes a candidate’s last name, his home state, the total number of in state contributions he received, and the total number of out of state contributions he received.

SELECT lastname, homestate, SUM(isc), SUM(oosc)
FROM (SELECT lastname, candidate.state AS homestate, candidate.cand_id AS cand_id,
             CASE WHEN candidate.state= contribution.state THEN amount else 0 END AS isc,
             CASE WHEN candidate.state != contribution.state THEN amount else 0 end AS oosc
      FROM candidate INNER JOIN contribution ON candidate.cand_id = contribution.cand_id)
GROUP BY cand_id;

Modifying tables¶

The most efficient way to add information to a table is to do a bulk import. It is also possible to add one tuple at a time, using the INSERT statement:

INSERT INTO <table>
VALUES (X,Y,Z)

Here is an example use:

INSERT INTO candidate
VALUES ("HHHH", "DUPONT", "TODD", "LIB", "CHICAGO", "IL", "60637", "CCCCC", "01");

You can also specify the particular columns that will be populated:

INSERT INTO <table> (<col name>, <col name>, <col name>)
VALUES (X,Y,Z)

You can update an existing tuple using the UPDATE statement:

UPDATE <table>
SET <col name> = <expr>, <col name> = <expr>, ...
WHERE <expr>

For example,

UPDATE candidate
SET committee_id = "H55555"
WHERE lastname="DUPONT";

You can delete tuples from a table using the DELETE statement:

DELETE FROM <table>
WHERE <expr>

For example:

DELETE FROM candidate
WHERE lastname="DUPONT";