Loading Data in MySQL

Overview
Creating the Data File
Loading Your Data
Entering NULL Values

Overview

To load data in MySQL, you need to create a data file, which specifies what data should be loaded and then run a command that specifies how data should be loaded.

Creating the Data File

Each line in the data file specifies one tuple to be loaded into <tableName>. It lists, in order, values for the attributes in the list specified in the LOAD command (described below), separated by <separator>. As a concrete example, consider relation Likes(drinker, beer) and a data file likes.dat that might look like:

zarko|Guinness
karin|Bud
karin| BudLite

As a result of loading likes.dat, the following tuples are inserted into Likes:

('zarko', 'Guinness')
('karin', 'Bud')
('karin', ' BudLite')

Warning: Note that the third line of likes.dat has a blank after "|". This blank is not ignored by the loader. The value to be loaded for attribute beer is ' Bud', a four-character string with a leading blank. It is a common mistake to assume that 'Bud', a three-character string with no leading blank, will be loaded instead. This can lead to some very frustrating problems that you will not notice until you try to query your loaded data, because ' Bud' and 'Bud' are different strings.

Loading Your Data

A simple load command has the following form:

LOAD DATA
LOCAL INFILE "<dataFile>"
REPLACE INTO TABLE <tableName>
FIELDS TERMINATED BY '<separator>'
(<list of all attribute names to load>)

<dataFile> is the name of the data file. The optional keyword LOCAL means that the filename is interpreted with respect to the client end of the connection, i.e. your home directory rather than the mysql server (interlibrary.cs.uchicago.edu)
<tableName> is the name of the table to which data will be loaded. Of course, it should have been created already before the bulk load operation.
The optional keyword REPLACE controls handling of input records that duplicate existing records on unique key values. REPLACE means that when there are duplicates the new tuples are used and the old ones are deleted. IGNORE means that the new tuples are ignored and the old ones kept.
<separator> specifies the field separator for your data file. This can be any single character. It is a good idea to use a character that you know will never appear in the data, so the separator will not be confused with data fields.
Finally, list the names of attributes of <tableName> that are set by your data file, separated by commas and enclosed in parentheses. This list need not be the complete list of attributes in the actual schema of the table, nor must it be arranged in the same order as the attributes when the table was created. Any attributes unspecified in the list of attributes will be set to their default values, if declared, and NULL otherwise.

As a concrete example, here is a LOAD command for loading likes.dat into Likes:

LOAD DATA
LOCAL INFILE "likes.dat"
REPLACE INTO TABLE Likes
FIELDS TERMINATED BY '|'
(drinker, beer)

For more information on the LOAD command consult the online MySQL documentation.

Entering NULL Values

You may specify NULL values simply by entering \N. For example, if we were entering integer values into a table with schema (a, b, c) specified in the LOAD command, the following lines in the data file:

3|\N|5
\N|2|4
1|\N|6
\N|\N|7

would result in inserting the following tuples in the relation:

(3, NULL, 5)
(NULL, 2, 4)
(1, NULL, 6)
(NULL, NULL, 7)

Keep in mind that any primary keys or other constraints requiring that values be non-NULL will reject tuples for which those attributes are unspecified.

If you do not wish to enter values for any row of a given column, you can, as mentioned above, leave that column out of the attribute list altogether.

This document was written originally for Prof. Jeff Ullman's CS145 class in Autumn, 1997; revised by Jun Yang for Prof. Jennifer Widom's CS145 class in Spring, 1998; further revisions by Jeff Ullman, Autumn, 1998; further revisions by Srinivas Vemuri for Prof. Jeff Ullman's CS145 class in Autumn, 1999; further revisions by Nathan Folkert for Prof. Jennifer Widom's CS145 class in Spring, 2001; further revisions by Xuehai Zhang for CS235 at the University of Chicago in Autumn 2002 and Autumn 2003. Adapted for MySQL by Svetlozar Nestorov for CS235 in Autumn, 2005; further revisions by Svetlozar Nestorov in Spring 2007; further revisions by Svetlozar Nestorov in Fall 2009, Spring 2010, Autumn 2010, Spring 2011, Fall 2011, Spring 2012, Fall 2012 for CS5PP3001.