tags:
- SQL
Let's open movies.db in SQLite. Running `.schema` shows us the tables created in this database. To implement the many-to-many relationship between the entities Person and Movie from the ER diagram, we have a join table here called stars that references the ID columns of both people and movies as foreign key columns. To get a feel for the movies table, we can select from the table and limit the results.
SELECT * FROM "movies" LIMIT 5;
SELECT * FROM "movies"
WHERE "title" = 'Cars';
SQLite has a command, `.timer on`, that enables us to time our queries. Running the query above takes noticeable time because it performs a scan of movies — that is, the table movies was scanned top to bottom, one row at a time, to find all the rows with the title Cars. To speed this up, we can create an index on the "title" column in the movies table.
CREATE INDEX "title_index" ON "movies" ("title");
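The effect of such an index can be sketched with Python's built-in `sqlite3` module on synthetic data (the table contents and row count here are invented purely for illustration):

```python
import sqlite3
import time

# In-memory database with a synthetic "movies" table (made-up data).
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "movies" ("id" INTEGER PRIMARY KEY, "title" TEXT)')
db.executemany('INSERT INTO "movies" ("title") VALUES (?)',
               [("Movie %d" % i,) for i in range(200_000)])

def lookup_seconds():
    start = time.perf_counter()
    db.execute('SELECT * FROM "movies" WHERE "title" = ?',
               ("Movie 123456",)).fetchall()
    return time.perf_counter() - start

before = lookup_seconds()  # full scan: every row is checked
db.execute('CREATE INDEX "title_index" ON "movies" ("title")')
after = lookup_seconds()   # binary search through the sorted index
print(f"scan: {before:.4f}s  indexed: {after:.4f}s")
```

The exact numbers depend on the machine, but the indexed lookup should be dramatically faster than the scan.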
To see how SQLite will run a query, we can add EXPLAIN QUERY PLAN before any query. To remove an index, we can drop it:
DROP INDEX "title_index";
Running EXPLAIN QUERY PLAN again with the SELECT query will demonstrate that the plan reverts to scanning the entire table.

Do databases not have implicit algorithms to optimize searching?
SQLite automatically creates an index for primary key columns, but for a regular column like "title", there would be no automatic optimization.

Would it be advisable to create a different index for every column in case we need it?

Probably not: indexes take up additional space and slow down writes, so it is best to create them only for the columns we frequently search on.
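We can watch the query plan change as an index is added, here sketched with Python's built-in `sqlite3` module rather than the SQLite CLI (the exact wording of the plan varies by SQLite version):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "movies" ("id" INTEGER PRIMARY KEY, "title" TEXT)')

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable step.
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql)]

query = 'SELECT * FROM "movies" WHERE "title" = \'Cars\''
without_index = plan(query)
db.execute('CREATE INDEX "title_index" ON "movies" ("title")')
with_index = plan(query)
print(without_index)  # e.g. ['SCAN movies']
print(with_index)     # e.g. ['SEARCH movies USING INDEX title_index (title=?)']
```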
SELECT "title" FROM "movies"
WHERE "id" IN (
SELECT "movie_id" FROM "stars"
WHERE "person_id" = (
SELECT "id" FROM "people"
WHERE "name" = 'Tom Hanks'
)
);
Let's run EXPLAIN QUERY PLAN ahead of this query again. This shows us that the query requires two scans — of people and stars. The table movies is not scanned, because we are searching movies by its ID, for which an index is automatically created by SQLite! To speed up the remaining scans, we can create indexes on the columns being searched:
CREATE INDEX "person_index" ON "stars" ("person_id");
CREATE INDEX "name_index" ON "people" ("name");
Now, let's run EXPLAIN QUERY PLAN with the same nested query. We can observe that people uses something called a COVERING INDEX, meaning all the data the query needs from people can be found within the index itself, without touching the table. To have stars also use a covering index, we can add "movie_id" to the index we created for stars. This will ensure that the information being looked up (movie ID) and the value being searched on (person ID) are both in the index. Let's drop the existing index and create a new one on the stars table.
DROP INDEX "person_index";
CREATE INDEX "person_index" ON "stars" ("person_id", "movie_id");
EXPLAIN QUERY PLAN
SELECT "title" FROM "movies" WHERE "id" IN (
SELECT "movie_id" FROM "stars" WHERE "person_id" = (
SELECT "id" FROM "people" WHERE "name" = 'Tom Hanks'
)
);
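The COVERING INDEX step in the plan can be reproduced in miniature with Python's built-in `sqlite3` module (schema reduced to the two relevant columns):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript('''
    CREATE TABLE "stars" ("person_id" INTEGER, "movie_id" INTEGER);
    CREATE INDEX "person_index" ON "stars" ("person_id", "movie_id");
''')

# Both the searched column (person_id) and the selected column (movie_id)
# live in the index, so SQLite never has to read the stars table itself.
steps = [row[-1] for row in db.execute(
    'EXPLAIN QUERY PLAN SELECT "movie_id" FROM "stars" WHERE "person_id" = 1')]
print(steps)  # the plan mentions a COVERING INDEX
```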
With `.timer on`, let us execute the above query to find all the movies Tom Hanks has starred in, and observe the time it takes to run. The query now runs a lot faster than it did without indexes (in lecture, an order of magnitude faster)!

Consider how an index actually works, using the "title" column of the table movies. If the movie titles were sorted alphabetically, it would be a lot easier to find a particular movie by using binary search. An index is a copy of the "title" column. This copy is sorted and then linked back to the original rows within the movies table by pointing to the movie IDs.
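The idea can be sketched in a few lines of Python: keep a sorted copy of the titles, each paired with its row ID, and binary-search the copy (a toy model, not how SQLite's B-trees are actually laid out):

```python
from bisect import bisect_left

movies = {1: "Toy Story", 2: "Cars", 3: "Up", 4: "Brave"}   # id -> title

# The "index": a sorted copy of the title column, pointing back at row IDs.
title_index = sorted((title, movie_id) for movie_id, title in movies.items())

def find_id(title):
    # Binary search the sorted copy, then follow the pointer to the row.
    i = bisect_left(title_index, (title,))
    if i < len(title_index) and title_index[i][0] == title:
        return title_index[i][1]
    return None

print(find_id("Cars"))  # 2
```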
When we know that queries will often pertain only to a subset of rows, we can create a partial index. For example, to index only the titles of recently released movies:
CREATE INDEX "recents" ON "movies" ("title")
WHERE "year" = 2023;
EXPLAIN QUERY PLAN
SELECT "title" FROM "movies"
WHERE "year" = 2023;
This shows us that the movies table is scanned using the partial index.

Are indexes saved in the schema?

Yes: we can run `.schema` and we will see the indexes created listed in the database schema.

While indexes make queries faster, they take up additional space, and they slow down writes (such as INSERT), because every index on the table has to be updated too. To see the size of movies.db on the terminal, we can use a Unix command:
du -b movies.db
After dropping an index, that space is not reclaimed until we vacuum the database:
DROP INDEX "person_index";
VACUUM;
This might take a second or two to run. On running the Unix command to check the size of the database again, we should see a smaller size. Once we drop all the indexes and vacuum again, the database will be considerably smaller than 158 MB (in lecture, around 100 MB).

Is it possible to vacuum faster?
If a query to delete some rows doesn’t actually delete them, but only marks them as deleted, could we still retrieve these rows?
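A rough sketch of this size effect, using Python's built-in `sqlite3` module on a throwaway database file (the row count is invented for illustration):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "movies.db")
db = sqlite3.connect(path, isolation_level=None)  # autocommit mode
db.execute('CREATE TABLE "movies" ("id" INTEGER PRIMARY KEY, "title" TEXT)')
db.execute("BEGIN")
db.executemany('INSERT INTO "movies" ("title") VALUES (?)',
               [("Movie %d" % i,) for i in range(50_000)])
db.execute("COMMIT")
db.execute('CREATE INDEX "title_index" ON "movies" ("title")')
size_with_index = os.path.getsize(path)

db.execute('DROP INDEX "title_index"')
db.execute("VACUUM")  # rebuild the file so the freed pages are returned
size_after_vacuum = os.path.getsize(path)
print(size_with_index, "->", size_after_vacuum)  # the file shrinks
```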
Next, let's consider a table called accounts that stores account balances. Transferring money from Alice to Bob takes two updates: one to Bob's balance and one to Alice's. If someone were to read from the accounts database after the first update to Bob's account but before the second update to Alice's account, they could get an incorrect understanding of the total amount of money held by the bank. Let's open bank.db in our terminal so we can implement a transaction for transferring money from Alice to Bob! First, let's look at the accounts table.
SELECT * FROM "accounts";
We note here that Bob's ID is 2 and Alice's ID is 1, which will be useful for our query. To make the transfer atomic, we wrap it in a transaction:
BEGIN TRANSACTION;
UPDATE "accounts" SET "balance" = "balance" + 10 WHERE "id" = 2;
UPDATE "accounts" SET "balance" = "balance" - 10 WHERE "id" = 1;
COMMIT;
Notice that the UPDATE statements are written in between the commands to begin the transaction and to commit it. If we execute the query after writing the UPDATE statements, but without committing, neither of the two UPDATE statements will be run! This helps keep the transaction atomic: by updating our table in this way, we are unable to see the intermediate steps.

What if Alice didn't have enough money for the transfer? (Recall that the "balance" column in accounts has a check constraint to ensure that it has a non-negative value. We can run `.schema` to check this.) In such cases, transactions also give us ROLLBACK. Once we begin a transaction and write some SQL statements, if any of them fail, we can end it with a ROLLBACK to revert all values to their pre-transaction state. This helps keep transactions consistent.
BEGIN TRANSACTION;
UPDATE "accounts" SET "balance" = "balance" + 10 WHERE "id" = 2;
UPDATE "accounts" SET "balance" = "balance" - 10 WHERE "id" = 1; -- Invokes constraint error
ROLLBACK;
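The same rollback behavior can be sketched with Python's built-in `sqlite3` module (the balances here are made up; Alice starts with only 5 so the second UPDATE must fail):

```python
import sqlite3

# isolation_level=None gives autocommit mode, so we can issue
# BEGIN / COMMIT / ROLLBACK ourselves, mirroring the SQL above.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute('CREATE TABLE "accounts" ('
           '"id" INTEGER PRIMARY KEY, '
           '"balance" INTEGER CHECK("balance" >= 0))')
db.executemany('INSERT INTO "accounts" VALUES (?, ?)', [(1, 5), (2, 20)])

try:
    db.execute("BEGIN TRANSACTION")
    db.execute('UPDATE "accounts" SET "balance" = "balance" + 10 WHERE "id" = 2')
    # Alice only has 5, so this violates the CHECK constraint...
    db.execute('UPDATE "accounts" SET "balance" = "balance" - 10 WHERE "id" = 1')
    db.execute("COMMIT")
except sqlite3.IntegrityError:
    db.execute("ROLLBACK")  # ...and the rollback undoes Bob's +10 as well

balances = db.execute(
    'SELECT "id", "balance" FROM "accounts" ORDER BY "id"').fetchall()
print(balances)  # [(1, 5), (2, 20)]: both rows are back to their old values
```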
How do we decide when a transaction can get an exclusive lock? How do we prioritize different kinds of transactions?
What is the granularity of locking? Do we lock a database, a table or a row of a table?
To take out a lock manually, we can begin an exclusive transaction:
BEGIN EXCLUSIVE TRANSACTION;
If we do not complete this transaction now, and try to connect to the database through a different terminal to read from the table, we will get an error that the database is locked! This, of course, is a very coarse way of locking because it locks the entire database. Because SQLite is coarse in this manner, it has a module for prioritizing transactions and making sure an exclusive lock is obtained only for the shortest necessary duration.
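We can reproduce the locked-database error from a single Python script by opening two connections to the same file (a sketch using Python's built-in `sqlite3` module, assuming the default rollback-journal mode; the 0.1-second timeout just keeps the reader from retrying for the default five seconds):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bank.db")
writer = sqlite3.connect(path, isolation_level=None)
writer.execute('CREATE TABLE "accounts" '
               '("id" INTEGER PRIMARY KEY, "balance" INTEGER)')
writer.execute("BEGIN EXCLUSIVE TRANSACTION")  # locks the whole database

reader = sqlite3.connect(path, timeout=0.1)    # a second "terminal"
try:
    reader.execute('SELECT * FROM "accounts"').fetchall()
    locked = False
except sqlite3.OperationalError as e:
    locked = True
    print(e)  # database is locked

writer.execute("COMMIT")  # release the lock as soon as possible
```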