SQL tutorial: Learn SQL on PostgreSQL
SQL tutorial: Learn SQL on PostgreSQL
Follow this step-by-step guide to install PostgreSQL, load data, connect to a database, and execute SQL queries including table joins
PostgreSQL is an open-source, object-relational (also called extended relational) database management system. Modern relational database features in PostgreSQL include complex queries, foreign keys, triggers, updatable views, transactional integrity, and multi-version concurrency control. Users can extend PostgreSQL with new data types, functions, operators, aggregate functions, index methods, and procedural languages.
With more than 20 years of development and deployment behind it, PostgreSQL is a solid open-source database that rivals even commercial relational databases in many respects. You can install it on Linux (all recent distributions), Windows (Windows 2000 SP4 and later), FreeBSD, OpenBSD, NetBSD, MacOS, AIX, HP/UX, and Solaris. You can also find a hosted high-performance version of PostgreSQL in Amazon Aurora, and a wire-compatible distributed implementation in CockroachDB.
From here on in, I will use the names PostgreSQL and Postgres (not all-caps) interchangeably, as is the general practice. I can pronouncePostgres; I never know whether to say Post-gres-Q-L or Post-gres-sequel.
In this tutorial, we will step through installing PostgreSQL, connecting to the database, loading some data, and running a variety of SQL queries. To cap it all off, we’ll execute a query against two related tables by using SQL join clauses. Along the way, I will suggest some GUI clients for Postgres you might want to use, and provide a brief overview of useful PostgreSQL extensions. Finally, I’ll leave you with a few good references to help you take your next steps with Postgres and SQL.
Install PostgreSQL
My InfoWorld colleague Serdar Yegulalp wrote a strong tutorial on installing and configuring PostgreSQL, “Get started with PostgreSQL 10.” I used that to inform my installation on an iMac, and wound up downloading and running the Mac-only Postgres.App shown in the screenshot below.
In addition to installing the latest production database version (10.5) with default configuration settings, I added the Postgres command-line utilities to my computer's path and installed both the cross-platform pgAdmin 4and the Mac-only Postico GUI tools, as shown below.
Since I have an active JetBrains subscription, I also installed DataGrip, a multi-database client tool. If I were doing this exercise on Windows, I might well use the visual SQL query builder and SQL command line in Alpha Anywhere.
Connect to a PostgreSQL database
The key things to know to connect to your Postgres database are its host, its assigned port, the database that you’d like to use initially, and your username and password. Several of these have well-known default values.
If you install Postgres on the same computer as the client, the host will be localhost. By default, there will be two visible databases, one named postgres and one using your user name. The port will be 5432 unless it was changed, typically at database startup.
The information above is independent of the client you use. For most purposes, the psql command line works just fine at the beginning. Later on, when you're trying to construct queries with complicated joins, you might want to use a graphical client, primarily to save yourself a lot of time and error-prone typing.
To connect with psql if your database uses the defaults and is on the computer you are using, make sure your database is running, and type
psql <enter>
at the command line:
If you instead see an error message to the effect that the command was not found, you may either add the Postgres command-line utilities to your path and restart your shell, or fully qualify the location of the executable.
Once you have psql running, if you type
help <enter>
you’ll see something like this:
The prompt tells you what database you are using. You can find out more with
\conninfo
, and change the connection with \c
:
Let’s try changing databases:
And now let’s switch back, in my case to database mheller:
Your default database and user names will differ from mine.
To execute SQL commands and queries, add a semicolon (
;
) when the query is complete and then press <enter>. Here we create a new database, connect to it, connect back to our default database, and finally drop the new database:
Database creates and drops can also be accomplished from the system shell. The relevant command-line utilities are
createdb
and dropdb
.Load data into a PostgreSQL database
Database design is an art, not a science, and takes considerable experience to get right. Even expert database designers often have to modify their designs over time to match the application data usage pattern, for example by adding indexes for common queries, or by adding foreign key constraints to enforce referential integrity.
Rather than taking weeks (or months) to learn database design, let’s load a stock database that’s already in Postgres format. I wanted to use one of the PgFoundry Sample Databases, but alas, that site is no longer active.
One alternative would be to use the three-table database from the PostgreSQL Exercises site; it’s simple enough, but I’ll leave that for you to work through later, since you can do all the exercises online, and I want you to use a database locally now. The SportsDB and DVD Rental (Pagila)samples are interesting, but a bit too complicated for a first go-round.
Instead, I want you to open https://github.com/pthom/northwind_psqland either clone or download and unpack the repository. Northwind was originally a Microsoft sample, but GitHub user pthom has adapted it for Postgres. I like Northwind mostly because I used to use it to teach SQL to developers, and became familiar with the data. The database diagram is as below:
Entity-relationship diagrams like this one come out of tools; this image looks a lot like it was generated by SQL Server Management Studio. Essentially, the rectangles are tables, the columns are shown within the rectangles, and the primary and foreign keys are marked with key icons. The connections between tables are the foreign key contraints.
You can load the database into PostgreSQL using the
create_db
script, assuming that you are on a machine with a Bash shell. If you are running on Windows, try renaming create_db.sh
to create_db.cmd
and deleting the first line of the script.
Open a shell in the directory where you cloned or unpacked northwind_psql and run the script. A successful database load will look something like this:
Now we can try querying the database.
Run SQL queries in PostgreSQL
To begin with, let’s try looking at a single table. From a command line run
psql
against the northwind
database. Your default username will probably work; if not, use the credentials user northwind_user
and password thewindisblowing
.
What are the regions defined in the database? The
select
query should look like the following. I got the table name from the diagram above.
That was easy. The asterisk notation says to return all fields in the table. Suppose we just wanted to see the region names in alphabetical order. That requires specifying the desired field name, the name of which we learned from the previous query, and adding an
order by
clause. Ascending order is the default.
Now let’s look at the product categories:
And let’s look at the beverage products, which we just saw have
category_id=1
. That requires a where
clause.
There are a lot of fields in
employees
, so let’s just look at a few of them. Here we are ordering by two fields, title
and last_name
.SQL tutorial: Learn SQL on PostgreSQL
Follow this step-by-step guide to install PostgreSQL, load data, connect to a database, and execute SQL queries including table joins
I happen to know that there are a lot of customers, so let’s just look at a few of them, using a
limit
clause. This table uses alphabetic customer IDs, so the default order will make sense.
How many rows are actually in the
customers
table? The count
function, one of many aggregates available in SQL, can tell us, without generating lots of output:Use SQL joins in PostgreSQL
At this point, we are going to create and run a query against multiple tables using
join
clauses, almost always inner join
clauses (the default). Join clauses can get squirrely to write by hand even if you refer to the database diagram, so I suggest using a graphical database client with a SQL builder.
The general pattern for a
join
clause is that you list the two related tables, and then specify the fields that must match. In the example below we are relating employees
and territories
through an intermediate table, employee_territories
. That’s a typical database design pattern when tables have a many to many relationship: One employee can have many territories, and one territory can belong to many employees. The on
portion of the join
clause defines the variables that must match for the rows to be included. Inner joins require that both sides be present in order to return the row in the result; outer joins allow either side to be present; and left and right joins require only the respective side to be present.
The query to list employees and their territories looks like this:
In the DataGrip IDE, the query (top right) and result table (bottom left) look like this:
49 rows were returned; I could scroll through the window to see them all.
Understand PostgreSQL extensions
Beyond standard SQL, PostgreSQL supports several advanced features: views, foreign keys, transactions, window functions, and schema inheritance. In addition to these, it supports geometric types, text-search types, JSON and JSONB types, and array types. When you look at the internals of PostgreSQL, you’ll find generic index types that make it reasonably easy to add custom data types and still search them efficiently.
Views are essentially saved queries, which can be treated as though they were tables. Foreign keys enforce table relationships, for example (in the case of school enrollments) forbidding you from adding a non-existent student to a class roster, or from deleting a class roster that has students enrolled.
Transactions allow a series of SQL statements to be rolled back if any of them fail, for example (in the case of bank transfers) rolling back a credit if the corresponding debit fails (e.g. for lack of funds). Window functions are like aggregates, but still display the rows being aggregated. Schema inheritance sounds convenient for defining tables, but currently has limitations that restrict its utility.
JSON, JSONB, and array types allow PostgreSQL to be used as a MongoDB-like document database. Geometric types support spatial and geographic information. Text-search types allow PostgreSQL to be used for Google-like full-text search. There’s also an XML type, but the implementation is currently incomplete.
Take your next steps with PostgreSQL and SQL
To learn database design and SQL in depth, you will want to work through at least one book or course, and probably more than one. I happen to like Database Design for Mere Mortals and SQL Queries for Mere Mortals, but you can’t really go wrong with any of the books from Addison-Wesley or O’Reilly. You also can’t really go wrong with any of the relevant online courses from reputable publishers and academic institutions.
Note that there are many dialects of SQL that differ from the Postgres dialect, especially when you get into specialized areas such as GIS and JSON support, or support for stored procedures. An inclusive list of Postgres books can be found on the PostgreSQL site.
For Postgres-specific information, an essential reference is the PostgreSQL documentation. You can get some of the same information locally at the command line with
man postgres
, and from within psql with the \?
and \h
commands, such as the following:
One very useful free interactive online resource is the PostgreSQL Exercises site. Additional online resources are listed on the PostgreSQL site.
Comments
Post a Comment