layout | author | title | revision | version | description |
---|---|---|---|---|---|
default |
mattmc3 |
Modern SQL Style Guide |
2019-01-17 |
1.0.1 |
A guide to writing clean, clear, and consistent SQL. |
select * from modern.sql_style_guide as guide where guide.attributes in ('clean', 'clear', 'consistent') and guide.look = 'beautiful'
Purpose
These guidelines are designed to make SQL statements easy to write, easy to
read, easy to maintain, and beautiful to see. This document is to be used as a
guide for anyone who would like to codify a team’s preferred SQL style.
This guide is opinionated in some areas and relaxed in others. You can use this
set of guidelines, fork them, or make your own — the key here is that you pick a
style and stick to it. The odds of making everyone happy are low, so compromise
is a guiding principle to achieve cohesion.
It is easy to include this guide in Markdown format as a part of a project’s
code base or reference it here for anyone on the project to freely read.
This guide is based on various existing attempts at SQL standards including:
http://www.sqlstyle.guide and
Kickstarter guide. Due to its origins, it is
licensed under a Creative Commons Attribution-ShareAlike 4.0 International
License.
The example SQL statements used are based on tables in the
AdventureWorks database. Note that due to the use of the
existing AdventureWorks schema, some of the guidelines in this document
are not always followed, especially with regards to naming conventions.
Those discrepancies will be called out as they appear.
NOTE: This style guide is written for use with Microsoft SQL
Server, but much of it can be applied to any SQL database with some
simple modifications.
Principles
- We take a disciplined and practical approach to writing code.
- We treat SQL like any other source code, which should be checked into
source control, peer reviewed, and properly maintained. - We believe consistency in style is important, and we value craftsmanship, but
not to the exclusion of other practical concerns. - We demonstrate intent explicitly in code, via clear structure and comments
where needed. - We adhere to a consistent style for handwritten SQL so that our code can
thrive in an environment with many authors, editors, and readers.
Quick look
Before getting into all the specifics, here is a quick look at some examples
showing well formatted, beautiful SQL that matches the recommendations in
this style guide:
-- basic select example select p.Name as ProductName , p.ProductNumber , pm.Name as ProductModelName , p.Color , p.ListPrice from Production.Product as p join Production.ProductModel as pm on p.ProductModelID = pm.ProductModelID where p.Color in ('Blue', 'Red') and p.ListPrice < 800.00 and pm.Name like '%frame%' order by p.Name
-- basic insert example insert into Sales.Currency (CurrencyCode, Name, ModifiedDate) values ('XBT', 'Bitcoin', getutcdate()) , ('ETH', 'Ethereum', getutcdate())
-- basic update example update p set p.ListPrice = p.ListPrice * 1.05 , p.ModifiedDate = getutcdate() from Production.Product as p where p.SellEndDate is null and p.SellStartDate is not null
-- basic delete example delete cc from Sales.CreditCard as cc where cc.ExpYear < '2003' and cc.ModifiedDate < dateadd(year, -1, getutcdate())
Rules
General guidance
-
Favor using a «river» for vertical alignment so that a query can be
quickly and easily be scanned by a new reader. -
Comments should appear at the top of your query or script, and should explain
the intent of the query, not the mechanics. -
Try to comment things that aren’t obvious about the query (e.g., why a
particular filter is necessary, why an optimization trick was needed, etc.) -
Favor being descriptive over terseness:
GOOD:
select emp.LoginID as EmployeeUserName
BAD:
select emp.LoginID as EmpUsrNm
-
Follow any existing style in the script before applying this style guide.
The SQL script should have one clear style, and these rules should not be
applied to existing scripts unless the whole script is being changed to
adhere to the same style. -
Favor storing
datetime
anddatetime2
in UTC unless embedding timezone
information (datetimeoffset
) so that times are clear and convertible.
Use ISO-8601 compliant time and date information
(YYYY-MM-DD HH:MM:SS.SSSSS
) when referring to date/time data.
Casing
Do not SHOUTCASE or «Sentence case» SQL keywords (e.g., prefer select
, not
SELECT
or Select
). SHOUTCASED SQL is an anachronism, and is not appropriate
for modern SQL development. Using lowercase keywords is preferred because:
- UPPERCASE words are harder to type and
harder to read. - SQL syntax is not case-sensitive, and thus lowercase keywords work correctly
in all variants of SQL - No other modern languages use ALLCAPS keywords.
- Modern editors color code SQL keywords, so there is not a need to distinguish
keywords by casing. - If you are in an environment where your keywords are not colored (i.e. as a
string in another language), using a river for formatting provides a similar
benefit of highlighting important keywords without resorting to CAPS. - UPPERCASE IS ASSOCIATED WITH SHOUTING WHEN SEEN IN TEXT, IS HARD TO READ, AND
MAKES SQL FEEL MORE LIKE COBOL THAN A MODERN LANGUAGE.
If the SQL script you are editing already uses SHOUTCASE keywords, match that
style or change all keywords to lowercase. Favor bending the rules for the
sake of consistency rather than mixing styles.
Naming guidance
-
Names should be
underscore_separated
orPascalCase
but do not mix styles.GOOD:
select count(*) as the_tally, sum(*) as the_total ...
BAD:
select count(*) as The_Tally, sum(*) as theTotal ...
Tables
-
Do not use reserved words for table names if possible.
-
Prefer the shortest commonly understood words to name a table.
-
Naming a table as a plural makes the table easier to speak about. (e.g.
favoremployees
overemployee
) -
Do not use object prefixes or Hungarian notation (e.g.
sp_
,prc_
,vw_
,
tbl_
,t_
,fn_
, etc). -
Tables with semantic prefixes are okay if they aid understanding the nature
of a table (e.g. in a Data Warehouse where it is common to use prefixes like
Dim
andFact
). -
Avoid giving a table the same name as one of its columns.
-
Use a joining word for many-to-many joining tables (cross references) rather
than concatenating table names (e.g.Xref
):GOOD:
drivers_xref_cars
BAD:
drivers_cars
-
Tables should always have a primary key. A single column, auto-number
(identity) surrogate key is preferable. -
Natural keys or composite keys can be enforced with unique constraints in
lieu of making them a primary key. -
Composite keys make for verbose and slow foreign key joins.
int
/bigint
primary keys are optimal as foreign keys when a table gets large. -
Tables should always have
created_at
andupdated_at
metadata fields in
them to make data movement between systems easier (ETL). Also, consider
storing deleted records in archival tables, or having adeleted_at
field for
soft deletes. -
Don’t forget the needs of data analysts and ETL developers when designing your
model.
Columns
- Do not use reserved words for column names if possible.
- Prefer not simply using
id
as the name of the primary identifier for the
table if possible. - Do not add a column with the same name as its table and vice versa.
- Avoid common words like
Name
,Description
, etc. Prefer a descriptive
prefix for those words so that they don’t require aliases when joined to other
tables with similarly named columns. (NOTE: This guide uses
the AdventureWorks database, which commonly has columns namedName
against
this guide’s advice. Remember that an existing convention may be in place that
is beyond your control. ) - Do not use
Desc
as an abbreviation forDescription
. Spell it out, or use
some other non-keyword.
Aliases
- Aliases should relate in some way to the object or expression they are aliasing.
- As a rule of thumb the alias can be the first letter of each word in the object’s
name or a good abbreviation. - If there is already an alias with the same name then append a number.
- When using a subquery, prefix aliases with an
_
to differentiate them from
aliases in the outer query. - Always include the
as
keyword. It makes the query easier to read and is
explicit. - For computed data (i.e.
sum()
oravg()
) use the name you would give it were
it a column defined in the schema.
Whitespace
- No tabs. Use spaces for indents.
- Configure your editor to 4 spaces per indent, but prefer your SQL to indent
to the «river», and not to a set indent increment. - No trailing whitespace.
- No more than two blank lines between statements.
- No empty lines in the middle of a single statement.
- One final newline at the end of a file
- Use an .editorConfig file to enforce reasonable whitespace
rules if your SQL editor supports it:
# .editorConfig is awesome: https://EditorConfig.org # SQL files [*.{sql,tsql,ddl}] charset = utf-8 indent_style = space indent_size = 4 end_of_line = crlf trim_trailing_whitespace = true insert_final_newline = true
River formatting
Spaces may be used to line up the code so that the root keywords all end on the
same character boundary. This forms a «river» down the middle making it easy for
the reader’s eye to scan over the code and separate the keywords from the
implementation detail. Rivers are bad in typography, but helpful here.
Celko’s book describes using a river to vertically align your query.
Right align keywords to the river if you chose to use one. The on
clause in
the from
may have its own river to help align information vertically.
Subqueries should create their own river as well.
-- a river in the 7th column helps vertical readability select prdct.Name as ProductName , prdct.ListPrice , prdct.Color , cat.Name as CategoryName , subcat.Name as SubcategoryName from Production.Product as prdct left join Production.ProductSubcategory as subcat on prdct.ProductSubcategoryID = subcat.ProductSubcategoryID left join Production.ProductCategory as cat on subcat.ProductCategoryID = cat.ProductCategoryID where prdct.ListPrice <= 1000.00 and prdct.ProductID not in ( select _pd.ProductID from Production.ProductDocument _pd where _pd.ModifiedDate < dateadd(year, -1, getutcdate()) ) and prdct.Color in ('Black', 'Red', 'Silver') order by prdct.ListPrice desc, prdct.Name
-- alternately, a river in the a different column is fine if that is preferred -- due to longer keywords, but know that indenting can feel "off" if the -- `select` is not in the first column for the query select prdct.Name as ProductName , prdct.ListPrice , prdct.Color , cat.Name as CategoryName , subcat.Name as SubcategoryName from Production.Product as prdct left join Production.ProductSubcategory as subcat on prdct.ProductSubcategoryID = subcat.ProductSubcategoryID left join Production.ProductCategory as cat on subcat.ProductCategoryID = cat.ProductCategoryID where prdct.ListPrice <= 1000.00 and prdct.ProductID not in ( select _pd.ProductID from Production.ProductDocument _pd where _pd.ModifiedDate < dateadd(year, -1, getutcdate()) ) and prdct.Color in ('Black', 'Red', 'Silver') order by prdct.ListPrice desc, prdct.Name
Indent formatting
Using a river can be tedious, so if this alignment is not preferred by your
team, then a standard 4 space indent can be used in place of a river.
Major keywords starting a clause should occupying their own line. Major keywords
are:
- Select statement
select
into
from
where
group by
having
order by
- Insert statement additions
insert into
values
- Update statement additions
update
set
- Delete statement additions
delete
All other keywords are minor and should appear after the indent and not
occupy a line to themselves. Other than this section, this guide will stick to
showing «river» formatting examples.
-- Editors tend to handle indenting style better than river alignment. River -- formatting has advantages over indent formatting, but this style is -- acceptable. select prdct.Name as ProductName ,prdct.ListPrice ,prdct.Color ,cat.Name as CategoryName ,subcat.Name as SubcategoryName from Production.Product as prdct left join Production.ProductSubcategory as subcat on prdct.ProductSubcategoryID = subcat.ProductSubcategoryID left join Production.ProductCategory as cat on subcat.ProductCategoryID = cat.ProductCategoryID where prdct.ListPrice <= 1000.00 and prdct.Color in ('Black', 'Red', 'Silver') order by prdct.ListPrice desc, prdct.Name
select
clause
Select the first column on the same line, and align all subsequent columns
after the first get their own line.
select prdct.Color , cat.Name as CategoryName , count(*) as ProductCount from ...
If three or fewer columns are selected, have short names, and don’t need
aliased, you may chose to have them occupy the same line for brevity.
-- shortcut for small columns select p.Color, c.Name, p.ListPrice from ...
If using a select
modifier like distinct
or top
, put the first column
on its own line.
-- treat the first column differently when using distinct and top select distinct p.Color , c.Name as CategoryName from ...
Use commas as a prefix as opposed to a suffix. This is preferred because:
- It makes it easy to add new columns to the end of the column list, which is
more common than at the beginning - It prevents unintentional aliasing bugs (missing comma)
- It makes commenting out columns at the end easier
- When statements take multiple lines like windowing functions and
case
statements, the prefix comma makes it clear when a new column starts - It does not adversely affect readability
The comma should border the «river» on the keyword side.
GOOD:
select Name
, ListPrice
, Color
, CategoryName
...
BAD:
-- whoops! forgot a trailing comma because it's hard to see, making an -- accidental alias of `ListPrice Color` select Name, ListPrice Color, CategoryName ...
Always use as
to rename columns. as
statements can be used for additional
vertical alignment but don’t have to be:
GOOD:
select prdct.Color as ProductColor , cat.Name as CategoryName , count(*) as ProductCount from ... ...
BAD:
select prdct.Color ProductColor , cat.Name CategoryName , count(*) ProductCount from ... ...
Always rename aggregates, derived columns (e.g. case
statements), and
function-wrapped columns:
select ProductName , sum(UnitPrice * OrderQty) as TotalCost , getutcdate() as NowUTC from ...
Always use table alias prefixes for all columns when querying from more than one
table. Single character aliases are fine for a few tables, but are less likely
to be clear as a query grows:
select prdct.Color , subcat.Name as SubcategoryName , count(*) as ItemCount from Production.Product as prdct left join Production.ProductSubcategory as subcat on ...
Do not bracket-escape table or column names unless the names contain keyword
collisions or would cause a syntax error without properly qualifying them.
GOOD:
-- owner and status are keywords select Title , [Owner] , [Status] from Production.Document
BAD:
-- extra brackets are messy and unnecessary select [Title] , [Owner] , [Status] from [Production].[Document]
Windowing functions
Long Window functions should be split across multiple lines: one for each
clause, aligned with a river. Partition keys can share the same line, or be
split. Ascending order is an intuitive default and thus using an explicit asc
is not necessary whereas desc
is. All window functions should be aliased.
select p.ProductID , p.Name as ProductName , p.ProductNumber , p.ProductLine , row_number() over (partition by p.ProductLine , left(p.ProductNumber, 2) order by right(p.ProductNumber, 4) desc) as SequenceNum , p.Color from Production.Product p order by p.ProductLine , left(p.ProductNumber, 2) , SequenceNum
case
statements
case
statements aren’t always easy to format but try to align when
, then
,
and else
together inside case
and end
.
then
can stay on the when
line if needed, but aligning with else
is
preferable.
select dep.Name as DepartmentName , case when dep.Name in ('Engineering', 'Tool Design', 'Information Services') then 'Information Technology' else dep.GroupName end as NewGroupName from HumanResources.Department as dep order by NewGroupName, DepartmentName
from
clause
Only one table should be in the from
part. Never use comma separated
from
-joins:
GOOD:
select cust.AccountNumber , sto.Name as StoreName from Sales.Customer as cust join Sales.Store as sto on cust.StoreID = sto.BusinessEntityID ...
BAD:
select cust.AccountNumber , sto.Name as StoreName from Sales.Customer as cust, Sales.Store as sto where cust.StoreID = sto.BusinessEntityID ...
Favor not using the extraneous words inner
or outer
when joining tables.
Alignment is easier without them, they don’t add to the understanding of the
query, and the full table list is easier to scan without excessive staggering:
GOOD:
-- this is easier to format and read select * from HumanResources.Employee as emp join Person.Person as per on emp.BusinessEntityID = per.BusinessEntityID left join HumanResources.EmployeeDepartmentHistory as edh on emp.BusinessEntityID = edh.BusinessEntityID left join HumanResources.Department as dep on edh.DepartmentID = dep.DepartmentID
BAD:
-- verbosity for the sake of verbosity is not helpful -- `join` by itself always means `inner join` -- `outer` is an unnecessary optional keyword select * from HumanResources.Employee as emp inner join Person.Person as per on emp.BusinessEntityID = per.BusinessEntityID left outer join HumanResources.EmployeeDepartmentHistory as edh on emp.BusinessEntityID = edh.BusinessEntityID left outer join HumanResources.Department as dep on edh.DepartmentID = dep.DepartmentID
The on
keyword and condition can go on its own line, but is easier to scan if
it lines up on the join
line. This is an acceptable style alternative:
-- this is an easier format to scan visually, but comes at the cost of longer -- lines of code. select * from HumanResources.Employee as emp join Person.Person as per on emp.BusinessEntityID = per.BusinessEntityID left join HumanResources.EmployeeDepartmentHistory as edh on emp.BusinessEntityID = edh.BusinessEntityID left join HumanResources.Department as dep on edh.DepartmentID = dep.DepartmentID ...
Additional filters in the join
go on new indented lines. Line up using the
on
keyword:
GOOD:
select emp.JobTitle from HumanResources.Employee as emp left join HumanResources.EmployeeDepartmentHistory as edh on emp.BusinessEntityID = edh.BusinessEntityID left join HumanResources.Department as dep on edh.DepartmentID = dep.DepartmentID and dep.Name <> dep.GroupName -- multi-conditions start a new line where dep.DepartmentID is null
BAD:
select emp.JobTitle from HumanResources.Employee as emp left join HumanResources.EmployeeDepartmentHistory as edh on emp.BusinessEntityID = edh.BusinessEntityID left join HumanResources.Department as dep on edh.DepartmentID = dep.DepartmentID and dep.Name <> dep.GroupName -- needs a new line where dep.DepartmentID is null
Begin with inner join
s and then list left join
s, order them semantically,
and do not intermingle left join
s with inner join
s unless necessary. Order
the on
clause with joining aliases referencing tables top-to-bottom:
GOOD:
select * from Production.Product as prd join Production.ProductModel as prm on prd.ProductModelID = prm.ProductModelID left join Production.ProductSubcategory as psc on prd.ProductSubcategoryID = psc.ProductSubcategoryID left join Production.ProductDocument as doc on prd.ProductID = doc.ProductID
BAD:
select * from Production.Product as prd left join Production.ProductSubcategory as psc on psc.ProductSubcategoryID = prd.ProductSubcategoryID -- backwards join Production.ProductModel as prm -- intermingled on prm.ProductModelID = prd.ProductModelID -- backwards left join Production.ProductDocument as doc on prd.ProductID = doc.ProductID
Avoid right joins
as they are usually better written with a left join
GOOD:
select * from Production.Product as prd left join Production.ProductSubcategory as psc on ...
BAD:
select * from Production.ProductSubcategory as psc right join Production.Product as prd on ...
where
clause
Multiple where
clauses should go on different lines and align to the river:
select * from Production.Product prd where prd.Weight > 2.5 and prd.ListPrice < 1500.00 and Color in ('Blue', 'Black', 'Red') and SellStartDate >= '2006-01-01' ...
When mixing and
and or
statements, do not rely on order of operations and
instead always use parenthesis to make the intent clear:
select * from Production.Product prd where (prd.Weight > 10.0 and Color in ('Red', 'Silver')) or Color is null
Always put a semicolon on its own line when using them. This prevents common
errors like adding conditions to a where
clause and neglecting to move the
trailing semicolon:
GOOD:
-- The prefix semicolon is clear and easy to spot when adding to a `where` delete prd from Production.Product prd where prd.ListPrice = 0 and weight is null and size is null ; ...
BAD:
-- A trailing semicolon is sinister. -- We added some where conditions and missed it. -- This is a destructive bug. delete prd from Production.Product prd where prd.ListPrice = 0; -- dangerous and weight is null -- syntax error here, but the bad delete is valid and size is null ...
group by
clause
Maintain the same column order as the select
clause in the group by
:
GOOD:
select poh.EmployeeID , poh.VendorID , count(*) as OrderCount , avg(poh.SubTotal) as AvgSubTotal from Purchasing.PurchaseOrderHeader as poh group by poh.EmployeeID , poh.VendorID
BAD:
-- messing with the 'group by' order makes it hard to scan for accuracy select poh.EmployeeID , poh.VendorID , count(*) as OrderCount , avg(poh.SubTotal) as AvgSubTotal from Purchasing.PurchaseOrderHeader as poh group by poh.VendorID -- out of order , poh.EmployeeID
having
clause
A having
clause is just a where
clause for aggregate functions. The same
rules for where
clauses apply to having
.
Example:
select poh.EmployeeID , poh.VendorID , count(*) as OrderCount , avg(poh.SubTotal) as AvgSubTotal from Purchasing.PurchaseOrderHeader as poh group by poh.EmployeeID , poh.VendorID having count(*) > 1 and avg(poh.SubTotal) > 3000.00
order by
clause
Do not use the superfluous asc
in order by
statements:
GOOD:
-- asc is implied and obvious select per.LastName , per.FirstName from Person.Person per order by per.LastName , per.FirstName
BAD:
-- asc is clutter - it's never ambiguous when you wanted to sort ascending select per.LastName , per.FirstName from Person.Person per order by per.LastName asc -- useless asc , per.FirstName asc
Ordering by column number is okay, but not preferred:
-- This is okay, but not great. select per.FirstName + ' ' + per.LastName as FullName , per.LastName + ', ' + per.FirstName as LastFirst from Person.Person per order by 2
The by
keyword can sit on the other side of a 7th column river, but align
the order by columns:
select per.FirstName , per.LastName from Person.Person per order by per.LastName , per.FirstName
If three or fewer columns are in the order by
and have short names you may
chose to have them occupy the same line for brevity.
-- shortcut for small columns select per.FirstName, per.LastName from Person.Person per order by per.LastName, per.FirstName
Descriptive SQL Style Guide
Copyright (c) 2020 by Peter Gulutzan. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
All references to «this program» or «software» mean «this guide», the Descriptive SQL Style Guide.
The copy of the GNU General Public License is the final section of this guide.
Version 0.1.2 2022-06-15
This is a descriptive style guide, like a dictionary that tells you what is common usage, with citations.
To find common usage I looked at vendor manuals, expert blogs, and prescriptive style guides
(guides that tell you what you should do). Status = alpha.
I think that anyone might use this to decide what code conventions are best for their own organizations,
based on orthodoxy, logic, style used for another language’s style guide, and chosen DBMS vendor.
The prescriptive style guides are the ones on GitHub, in English, that have more than 100 stars at time of writing:
Konstantin Taranov,
SQL Server Name Convention and T-SQL Programming Style,
841 stars.
Simon Holywell,
SQL Style guide.
811 stars. Partly based on the book «Joe Celko’s SQL Programming Style».
Matt Mazur,
Mazur’s SQL Style Guide.
441 stars. Mr Mazur says his guide is «opinionated».
Fred Benenson,
KickStarter, SQL Style Guide
240 stars. Some PostgreSQL emphasis.
Philipp Salvisberg,
«Trivadis PL/SQL & SQL Coding Guidelines»
62 stars + favourable reviews for example by Steven Feuerstein.
Also available as
pdf. Oracle emphasis.
Mark Donnelly,
SQL Style guide.
119 stars. Last updated in 2013.
For the «Names» sections I also consult these non-GitHub web pages because they have details about specific object types:
Tim Hall,
Oracle naming conventions.
Peter Gulutzan,
«SQL Naming Conventions».
Sehrope Sarkuni,
«How I Write SQL, Part 1: Naming Conventions».
Craig Mullins,
«On DB2 Naming Conventions».
Also at
datavail.com
and
db2portal.
Jeffrey Keller,
«An Unemotional Logical Look at SQL Server Naming Conventions».
Ben Brumm,
«SQL Best Practices and Style Guide».
For the «Format» sections I also consult this book chapter:
Phil Factor.
«SQL Server Team-Based Development. Chapter 1: Writing Readable SQL».
Section = Code Layout.
The vendor manuals are:
Oracle Release 20 or 19 SQL Language Reference.
DB2 manual 11.5.
SQL Server: 2019.
MySQL: 8.0.
MariaDB 10.
Tarantool 2.3.
Although I am a former or current employee of some of these companies,
I am only using information that is publicly available.
For SQL Developer’s
«code editor» also called «worksheet»
I rely on
www.thatjeffsmith.com
and
totierne.blogspot.com.
Finally, I will sometimes quote the ANSI/ISO SQL standard,
and many bloggers whom I will identify as they come up.
Inconsistency happens so I try to look for multiple examples.
In following sections I will identify style guides by the author’s surname,
and identify vendor manuals by the product’s name.
My definition of SQL style is:
consistently choosing words or formats that do not affect what the DBMS returns,
often describable as a list of rules that formalize what words or formats to choose,
for example saying INTEGER rather than int.
Or, more simply: what to say when there are two ways to say (almost) exactly the same thing.
I do not bother with rules that only affect one vendor.
I begin most sections with the word Choice: followed by settings that you can pick.
Then I describe what the choices mean, then what prescriptive guides say,
what I happen to have seen in vendor manuals (warning: sampling may be affected by chance),
and what some bloggers or other sources might have said if I happened to notice them.
I want to avoid looking biased for one style or one vendor,
so I do not use a consistent style in examples.
Contents
Choices
Keywords and letter case
Semicolons
Not-equal operator
Unnecessary keywords and operators
SELECT *
ORDER BY ordinal
Comments
Multiple-line comments
New or old style inner join
Data types
Literals
Long literals
Names (meaning)
Names (letter case)
Names (legal characters
Names (length)
Names (delimiter)
Names (qualifier)
Names (prefix or suffix)
Names of tables
Names of temporary tables
Names of views
Names of columns
Names of aliases and range variables
Names of constraints
Names of indexes
Names of triggers
Names of sequences
Names of functions (or: names of routines)
Names of savepoints
Names of collations
Names of variables or parameters
Dynamic SQL
Format terminology
Format choices terminology
Format symbols
Format parentheses
Format comments
Format line length
Indenting units
Indenting
Format clauses deciding what is a clause
Format clauses by indenting
Format clauses by right-aligning keywords
Format clauses by aligning contents
Format lists
Format conditions
Format subqueries
Format WITH
Format joins
Format INSERT
Format UPDATE
Format CREATE TABLE
Format CREATE VIEW
Format CREATE PROCEDURE or CREATE FUNCTION
Format CREATE TRIGGER
Format CASE expression
Format blocks the usual way
Format blocks with analogies to other guides and choices
Format DECLARE
Format overflow
Formatters or pretty printers
Contributors
GPL Version 2 License
Choices
Choice: worry about style?
I will put «Choice: …?» questions in most sections.
You do not need to care if …
You have a good-enough tool that does the job for you.
You only work with SQL occasionally, and mostly for yourself.
You see that fussing about appearance should be low priority.
You have read one of the prescriptive guides and are satisfied.
If any of those things apply to you, good, you are in the
majority and now you can go off and find better things to do.
You only need to care if …
You want to come up with a better tool.
You are in a team whose boss insists on consistency.
You would like to know what, if anything, justifies the rules.
If any of those things apply to you, too bad,
now you will have to check «worry about style = yes»
and make similar checks for all the «Choice:» matters
that follow.
Choices are not inheritable because I do not classify well.
Keywords and letter case
Choice: Keywords upper case or lower case?
Google Ngrams shows relative popularity of words and phrases in books, and it distinguishes upper versus lower case.
I used it for terms that are likely to appear only in database books:
CREATE TABLE is more popular than create table.
SELECT DISTINCT is more popular than select distinct.
TO SAVEPOINT is more popular than to savepoint.
EXECUTE IMMEDIATE is more popular than execute immediate.
INTEGER PRIMARY KEY is more popular than integer primary key.
So saying «in general keywords are upper case» has some evidence.
Similarly, a poll by Lukas Eder resulted in
a majority for SELECT in upper case.
Exception #1: when a keyword is not being used as a keyword.
For example, IMMEDIATE is in the SQL standard’s «non-reserved word» list, so I can use it as a column name.
I have trouble believing that the guides which say «capitalize keywords» mean that I should capitalize IMMEDIATE
in that circumstance. I think they really meant
«capitalize words that are not identifiers», which coincidentally excludes reserved words.
Exception #2: when trying to avoid SHOUTING or when trying to be like most other languages.
This might be reasonable when entering statements with a client that highlights properly.
See
this argument on stackoverflow.
Exception #3: data type names. See later section = Data types.
Prescriptive guides:
Benenson, Donnelly, Holywell, Salvisberg say: upper case.
Taranov says: upper case, except for data types.
Mazur says: lower case («It’s just as readable as uppercase SQL and you will not have to constantly be holding down a shift key.»)
Vendor manual examples:
MySQL, MariaDB, Oracle, SQL Server, Tarantool: upper case for keywords.
DB2: upper case for all words, whether or not they are keywords.
Oracle SQL Developer has an option «Case change» with choices «UPPER», «lower», «keep unchanged», and «Init cap»
(which presumably is intended for identifiers more than keywords).
Bloggers:
The Drupal manual says:
«Make SQL reserved words UPPERCASE. This is not a suggestion.
Drupal db abstraction commands will fail if this convention is not followed.»
Ian Hellström in «Execution Plans»
notes that Oracle uses a hash of the statement text to see whether it is in the
library cache. Thus even a tiny difference between two statements
can cause a cache miss and affect performance.
According to Don Burleson
capitalization of keywords does not matter, but
according to
Morton and Osborne and Sands
it does matter. Probably they’re looking at different versions.
Anyway, that means that maybe sometimes in theory inconsistency will affect performance.
A TPC-H example has all lower case.
Semicolons
Choice: End all statements with semicolons?
In standard SQL «;» is not part of a statement, it is a signal that the statement is over,
so it is required for direct SQL (such as SQL typed to a client program) and within BEGIN … END,
but not dynamic SQL (such as execute immediate, or SQLExecDirect in the call level interface).
Thus if a JDBC driver rejects a semicolon-terminated statement with
«SQLSyntaxErrorException: ORA-00911: invalid character»,
which has been known to happen, it is within its rights.
Also see this cartoon.
But all the vendor manuals either show that «;» is an optional part of a statement
(as in
this SQL Server bnf), or show it in examples.
The key problems are:
(1) if you do not know when statements end, you cannot run a script containing multiple statements
(2) it is hard to know when statements end, for example «SELECT 5 BEGIN» is a valid single statement
(in a certain dialect where BEGIN is not a reserved word and [AS] is not required),
but simple parsers could easily think it is more than one.
Of course, a client
might get confused and think that <semicolon> <newline> is end-of-statement
(this has happened with Toad [] within comments,
and with other clients within BEGIN … END blocks).
However, vendors have options for changing what the client thinks is the
terminator —
SET TERMINATOR for DB2,
DELIMITER for MySQL/MariaDB,
SET SQLTERMINATOR for Oracle,
set delimiter for Tarantool.
Of course, I like to emphasize that there is a cleaner solution —
use a client with a
recognizer that supports
client-side predictive parsing.
It is not true that Microsoft has deprecated the feature
«Not ending Transact-SQL statements with a semicolon».
Read the fine print of Microsoft’s
Deprecated Notice.
It says that the feature will be «supported in the next version»
although it «will be deprecated in a later version», which they have been saying
for
over a decade
and lack of ; has been known to
cause an error,
but they probably are holding back because there is so much installed code.
Prescriptive guides: Taranov says end with «;». Factor says «generally speaking» end with «;»
Vendor manuals:
Usually «;» is not in the BNFs or railroad diagrams
(although Oracle does show it and Microsoft does show it as [;] i.e. optional);
however, most manuals’ examples end with semicolons (except DB2’s).
Bloggers:
Dan Guzman says end with «;».
Don Burleson says it depends.
Jeff Smith says it depends.
Factor says «Use the semicolon to aid the reading of code, even where SQL syntax states that it is only optional.»
The best explanation of SQL Server anomalies that I have seen is in
«Rules of SQL formatting – Terminating SQL statements with semicolons».
Not-equal operator
Choice: Not-Equal Operator: <> or != or ^= or ¬=?
The main variants are != and <> which are accepted by all major vendors.
DB2 and Oracle also support ^= and ¬=.
¬ is the Unicode standard symbol U+00AC «NOT SIGN» but since it is not in 7-bit ASCII.
it will not be recognized if you have the wrong code page.
Accordingly Oracle
says
«Some forms of the inequality condition may be unavailable on some platforms»
and IBM
says
«A logical not sign (¬) can cause parsing errors in statements passed from one DBMS to another.»
!= is found in C, Java, Perl, Python and the like.
The Python manual used to accept <> along with the warning
«The <> spelling is considered obsolescent.»
… now it does not mention <> at all.
The idea is that ! is a «not sign» (if you know C and do not know Unicode) and
= is an equals sign, so this should be easy reading,
<> is found in Access, BASIC, Pascal, and Rexx (remember them?).
The idea is that <= means «less than or equal», so in a consistent world <> means «less than or greater than».
<> is standard but != is more common in Oracle examples that I have seen.
That reminds me that once upon a time someone reported a case where
!= was faster than <> and
spawned many answers.
Prescriptive guides:
Mazur says: Use != over <> … Simply because != reads like «not equal» which is closer to how we’d say it out loud.»
Benenson uses != in an example.
Vendor manuals:
Oracle’s inequality test example uses !=
MySQL’s example uses both != and <>
SQL Server example uses both != and <>
DB2’s example uses <>
Bloggers:
Itzik Ben-Gan says «This case should be a nobrainer: go for the standard one!»
An iBATIS thread shows a
user getting into trouble because XML complains about an SQL statement that contains <>
and seeing two proposed solutions: use cdata, or switch to !=.
Unnecessary keywords and operators
Choice: Add unnecessary keyword INNER in INNER JOIN?
Choice: Add unnecessary keyword OUTER in LEFT OUTER JOIN?
Choice: Add unnecessary keyword INTO in INSERT INTO?
Choice: Add unnecessary parentheses around low-precedence expressions?
Choice: Add unnecessary keyword COLUMN for ALTER TABLE ADD?
This is pleonasm, which Cambridge
defines as
«the use of more words than are needed to express a meaning, done either unintentionally or for emphasis;»
According to Merriam-Webster
pleonasm means «the use of more words than those necessary to denote mere sense»
and «Pleonasm is commonly considered a fault of style, but it can also serve a useful function.»
That is perhaps the reason that I can not find prescriptive guides or vendor examples
that advocate SELECT ALL, or UNION DISTINCT or ORDER BY x ASC.
But saying Add unnecessary keywords and operators = yes
should have these three effects, which are activated in at least one prescriptive guide or blog:
[INNER] before JOIN
Mazur says: «Include inner for inner joins», probably the reasoning again is that it makes something easier to read.
Examples in vendor manuals seem to show INNER more often than not.
[INTO] after INSERT
The word INTO is optional in MySQL/MariaDB and SQL Server,
but compulsory in DB2, Oracle, and the SQL standard.
Holywell says:
«Why you would willingly choose a proprietary solution when a standard SQL method already exists is beyond me.»
Parentheses around OR conditions
That is, x OR y becomes (x OR y).
The reasoning is that, if there is an AND in the vicinity,
it will take precedence — but who remembers precedence rules?
As the
Java style guide says:
(regarding mixed operators)
«you should not assume that other programmers know precedence as well as you do».
But add-unnecessary-keywords-and-operators = yes does not mean:
[TRANSACTION] or [TRAN] or [WORK] after COMMIT.
This is the opposite of the INSERT INTO phenomenon —
in this case not every vendor supports the unnecessary word.
So, naturally, you will not see the same usage of these keywords in all vendor manuals.
Prescriptive guides:
Salvisberg says: «Never initialize variables with NULL. … Variables are initialized to NULL by default.»
(but probably he does not mean DEFAULT NULL).
Holywell says:
«Keep code succinct and devoid of
redundant SQL—such as unnecessary quoting or parentheses or WHERE clauses that can otherwise be derived.»
Vendor manual examples:
Oracle: does not allow LEFT JOIN with OUTER
MariaDB, MySQL: show LEFT JOIN without OUTER
Bloggers:
«Modern SQL Style Guide»,
explaining why «left outer join» is bad, says: «`outer` is an unnecessary optional keyword».
Scott Czepiel says:
Omit the “outer” when doing a left join … Likewise omit “inner” from inner joins
SELECT *
Choice: Allow SELECT *?
The main complaint about SELECT * is that the definition of the table might change.
However, that is true even if we select with column names, if ALTER can be used to
change a column definition.
To be consistent, Allow SELECT * = no should also affect other syntax
that is based on assumptions about stable table structures, such as
INSERT without INTO, or NATURAL JOIN.
And maybe it is not truly pleonastic to specify column names,
if there is a chance that table definition will change someday.
So maybe SELECT * can be replaced by
SELECT column-name [, column-name …]
and maybe INSERT INTO table-name can be replaced by
INSERT INTO table-name (column-name [, column-name …]).
Vendor manual examples:
Since I expect that SELECT * will be used for examples, I did not check.
Bloggers:
Factor
says: use SELECT * for «ad-hoc» work, not «production» work.
ORDER BY ordinal
Choice: Allow ORDER BY ordinal?
«SELECT … ORDER BY 1;» was once legal in standard SQL but became illegal in the 1999 version.
However, all major vendors still support it.
MySQL and Tarantool and
DB2 with
NPS-mode even support «GROUP BY 1».
«ORDER BY ordinal» would be convenient for
SELECT very_long_expression, very_long_expression FROM … ORDER BY 2;
However, all major vendors support aliases too.
ORDER BY ordinal» would be convenient for
VALUES (‘b’, ‘c’) UNION ALL VALUES (‘x’, ‘y’) ORDER BY 2;
However, not all vendors support ORDER BY in such contexts.
Prescriptive guides:
Taranov, Salvisberg say: specify columns.
Salvisberg says: «Always specify column names instead of positional references in ORDER BY clauses.»
Vendor manuals:
Oracle:
shows ORDER BY position as well as ORDER BY name
DB2:
has examples of ORDER BY name only.
SQL Server:
says «Avoid specifying integers in the ORDER BY clause as positional representations of the columns in the select list.»
Bloggers:
Claire Carroll
posted
«Write better SQL: In defense of group by 1»
… the gitlab folks would probably approve.
Comments
Choice: /* comment */ or —comment?
The main variants of comments are:
bracketed (start with /* and end with */) also called multi-line or block or C-style or slash star,
simple (start with — and end with newline) also called single-line or line or double-dash.
MySQL allows replacing — with # for simple comments but that is rare.
Although the SQL standard said that simple comments are mandatory and bracketed comments are optional,
nowadays all vendors support both types.
Oddly, there are cases where bracketed comments are legal but simple comments are illegal,
for example with DB2’s DSNTIAUL
and in standard PREPARE texts (General Rule 6 =
«If P [the contents of the SQL statement variable]
does not conform to the Format, Syntax Rules, and Access Rules of a <preparable statement>,
or if P contains a <simple comment> then … an exception condition is raised: syntax error or access rule violation.»).
I do not know why.
Maybe it has something to do with the fact that some types of SQL injection attack
depend on simple comments, as
described on netsparker.com.
Or maybe it has something to do with caching, as a
DB2 document hints that only
statements with bracketed comments will get into the dynamic cache.
There are
documentation generators
that have SQL support and that require a particular comment style, often starting with /**.
The other style question is: where to put comments?
ApexSQL has advice for where to put bracketed comments inside stored procedures.
but I did not find specific advice about location in other documents (except Don Burleson’s blog).
Vendor manual examples show comments preceding the statement (on a separate line),
or comments following a clause (on the same line, at the end of the line),
but I found none that show comments following the statement or comments within a line.
Prescriptive guides:
Taranov says «Always use multi-line comment».
Holywell says «Use the C style opening /* and closing */ where possible» and
«Avoid nesting comments» (with an example of a bracketed comment inside a simple comment).
Salvisberg says: «Inside a program unit only use the line commenting technique —«.
Vendor manuals: show both styles, no apparent preference.
Oracle SQL Developer has an option «Put — comments between /* … */» but not the other way around,
which I guess is a hint about where their sentiments lie.
Bloggers:
Don Burleson
says: use simple comments except inside 3GL programs that use the Oracle Precompilers.
Multiple-line comments
Choice: multiple-line comment style 1, or 2, or 3, or 4, or 5?
Choice: precede multiple-line comment with a blank line?
Choice: follow multiple-line comment with a blank line?
Bracketed /* … */ comments can take up multiple lines.
Usually they precede what they are commenting on.
It may be hard to reformat them so it is good if each line is short.
I have seen or read about five styles.
Usually they are preceded or followed by newlines.
/* This comment is Style 1: Align start and end. */ /* * This comment is Style 2: Align asterisks. */ /* ** This comment is Style 3: Lines start with **. */ /** * This comment is Style 4: Comment starts with **. */ /* This comment is style 5: Comment ends on line end. */
Style 3 is recommended (but not followed) by the SQL Server manual.
Style 4 is a signal to tools like Doxygen that this is for documentation as mentioned earlier.
Vendor examples:
Oracle: style 5.
DB2: style 2.
SQL Server: style 1.
MySQL: style 1.
MariaDB: style 5.
Bloggers:
Apex SQL
has recommendations for what should be in the comment, and options for inserting empty lines before/after
the comment.
New or old style inner join
Choice: join syntax: old-style or new-style?
«FROM a, b [WHERE join-condition etc.]» is old style, also called FROM-join.
«FROM a … JOIN b ON join-condition» is new style, also called ANSI or ISO or SQL92 join.
Back in 2005 Doug Burns described
his pro-and-con thoughts, which I hereby distort:
Pro: changing to an OUTER join is easy because it has almost the same syntax
Pro: it is more unlikely that you will miss a condition and go Cartesian
Pro: mixing up the join-condition with other conditions in the WHERE clause might mislead,
Con: new style is more verbose
Con: old style has a long-established base.
Some bloggers give the wrong impression that Microsoft has deprecated old style joins and/or they are non-standard.
In fact Microsoft only deprecated old style outer joins
(see
this red-gate post
and
this Microsoft post).
In fact the syntax «FROM table_name1 , table_name2» is legal according to the ISO/IEC («ANSI»)
SQL:2016 rules for <table reference list>
and Microsoft
is aware of that.
Prescriptive guides:
Benenson, Salvisberg, Taranov say: use new style.
Vendor manuals:
DB2: the
«Inner join»
examples mix old and new style without recommendations.
Oracle: the
«Using Join Queries: Examples» show only old style.
Remember that we’re only talking about inner joins here —
for outer joins, Oracle recommends
against
using the old style and its Oracle-specific join operator.
MariaDB: recommends new style.
SQL Server: says new style is
«recommended».
Tarantool: has an introductory section using only old style.
Bloggers:
Salesforce
apparently is in the group that thinks that old style is
not standard.
Joe Celko says:
«But what I found is that people who use the traditional notation think in sets,
while those who use the infix notation are stuck with a procedural linear mindset.»
He also says:
«Weaker SQL programmers use INNER JOIN, since it is sequential and a familiar, procedural infix binary operator.
Stronger SQL programmers use the WHERE syntax because it is set-oriented, obeys the law of proximity and is an n-ary operator.»
Data types
Choice: abbreviated data type names?
Choice: upper case or lower case data type names?
In standard SQL you can create a column with data type int,
but in information_schema it will show up as INTEGER —
the canonical form is always the unabbreviated word in upper case.
Unfortunately not every vendor converts canonically but this
shows that for the standard INTEGER is a better word.
However, Aaron Bertrand
(who wrote a series of blog posts touching on SQL style) switched to int,
because of a SQL Server quirk.
In SQL Server, if you are querying a system table, it makes a difference
what collation you used when you originally defined the whole database.
It can mean that you will not find columns defined as ‘int’ when you search for columns defined as ‘INT’ or ‘INTEGER’.
(In SQL Server built-in data type names are not
not reserved words.)
For another indicator, again using Ngrams …
INTEGER PRIMARY KEY is more popular than INT PRIMARY KEY
but varchar is more popular than VARCHAR or CHARACTER VARYING.
Although the INT-versus-INTEGER questions are not settled, there is no question
that CHAR(n) is preferred instead of CHARACTER(n).
By the way, if you did not know how to pronounce CHAR, see
this blog post.
For the other data type abbreviation — DEC versus DECIMAL — I found no guidance.
Sometimes I have seen that the length, and/or precision and scale, is skipped when it is default.
This is probably related to the Pleonasm question — should unnecessary words be skipped?
Prescriptive guides:
Salvisberg says: «Avoid declaring NUMBER variables, constants or subtypes with no precision.»
Holywell says: «It is best to avoid the abbreviated keywords and use the full length ones where available»
(but he is not talking about data types)
Taranov says: «Always specify a length to any text-based data type such as varchar, nvarchar, char, nchar:»
Vendor Manuals:
I just looked at the data types that were used in the manuals’ CREATE TABLE examples.
DB2: CHAR(n), CHAR (n), CLOB, CLOB(n), DATE, DECIMAL(n,n), DOUBLE, INTEGER, SMALLINT, TIMESTAMP, VARCHAR(n)
(I thought it was interesting that DOUBLE was used although DOUBLE PRECISION is standard)
Oracle: BLOB, CHAR(n), CLOB, DATE, NUMBER,, NUMBER(n), NUMBER(n,n), number(n), NCLOB, VARCHAR2(n), varchar2(n)
(I thought it was interesting that CHAR(n) was used when n <= 2 and NVARCHAR(n) was used when n > 2 but doubt that is a rule)
SQL Server: char(n), DATETIME, datetime, float, INT, int, money, nvarchar(n), smallint, VARBINARY, VARCHAR(n)
MySQL: BLOB, CHAR(n), DATE, DATETIME, INT, VARCHAR(n)
MariaDB: bigint, BLOB, CHAR(n), DATETIME INT, int, varchar(n)
Tarantool: INT, INTEGER, SCALAR, STRING
… Summary: vendors do not follow style guides in this area.
Oracle SQL Developer has an option «Case change», and if you pick «UPPER»
then VARCHAR2 is upper case, that is, it is a keyword like any other.
This redgate forum post
shows that there are different options for keywords UPPERCASE and data types lowercase.
Literals
Choice: allow non-standard literal formats?
In standard SQL the format of a literal determines its type:
inside » — CHAR
inside X» — BINARY or VARBINARY
inside DATE » — DATE
exponential notation — DOUBLE or REAL or FLOAT
[sign]digits[[period][digits]] INTEGER or SMALLINT or DECIMAL or NUMERIC
and so on.
But vendors vary, and due to implicit casting the format does not really indicate much.
What then should we do if we want to put ‘ within »?
MySQL/MariaDB and SQL Server sometimes let us say «…’…» (they have to use something else for delimiting identifiers).
Some DBMSs let us escape by saying ‘…’…’.
Some JDBC APIs let us escape by saying {escape ‘escape character’}
Oracle lets us change what the character string delimiter is with q'{…’…}’
Perhaps all DBMSs let us say ‘…»…’.
Prescriptive guides:
Nobody says anything.
Vendor manuals:
Examples usually show ‘…»…’ as it is the only standard unless the DBMS supports Unicode escaping.
Blogs:
Louis Davidson
suggests functions quotename() and concat() for putting strings together.
Long literals
Choice: use standard syntax, or use a continuation character, or use ||
This situation comes up with both character and binary strings:
the string is too long (presumably that means «Maximum line length» is exceeded).
The string has to be broken up and placed on multiple lines.
«use standard syntax» means: depend on the fact that in
standard SQL ‘A’ /* whitespace */ ‘B’ is interpreted as ‘AB’. So
SELECT 'video meliora proboque, ' 'deteriora sequor.' ...
But, alas, some vendors do not allow it.
And even one that does — MySQL/MariaDB — does not allow X’41’ ’42’.
«use a continuation character» means depend on the client
(or possibly the server) to see a line-continuation signal. So
SELECT 'video meliora proboque, ' 'deteriora sequor.' ...
But, alas, some vendors use different continuation characters.
«use ||» means depend on support of || for concatenation. So
SELECT 'video meliora proboque, ' || 'deteriora sequor.' ...
This is the best bet for vendor support; even MySQL/MariaDB
can be forced to accept || for concatenation sometimes.
The problem is not that || might make a performance difference.
The problem is that we have to specify a special format rule
regarding placement of the operator (see section Format symbols).
Prescriptive guides:
I have no information from any prescriptive guide.
Vendor manuals:
Oracle: In SQL/Plus, the continuation character is a hyphen.
DB2: line continuation is possible with but this is in the client
SQL Server: line continuation is possible with and
this says that is part of T-SQL not the client
Names (meaning)
Choice: Names should mean something?
We all know the setting should be ‘yes’ but
if you are trying to make a point about syntax,
then a placeholder name
has all the meaning that you need, because the referent could be anything at all.
So in a document like this I use names like
TABLE_NAME or Column1 but expect that people
would be more specific when representing something specific.
As far as I can tell the
foobar placeholders,
known in other languages, are not frequent in SQL contexts.
Perhaps we have fewer reasons to swear.
Prescriptive guides:
No need to quote. Everybody would agree.
Names (letter case)
Choice: Names UPPER CASE or lower case or snake_case or PascalCase or camelCase?
Terminology:
camelCase = (all words except the first start with a capital letter)
PascalCase = (all words start with a capital letter) sometimes called upper camel case
SNAKE_CASE = all words all capital letters, underscore separator)
snake_case = all words all lower case letters, underscore separator)
…
Inevitably, in the following Names sections, I am going to have to
give examples which use a particular case.
This does not mean I recommend a particular case, I leave recommendations to prescriptive guides.
One argument against lower case is that it does not reflect what the
standard information_schema name will look like, since the standard
rule is that the name will be folded to upper case before being stored.
PostgreSQL ignores the SQL standard
(see the blog post
«Sometimes MySQL is more standards-compliant than PostgreSQL»), but some other DBMSs do not.
One argument against snake_case is the academic conference presentation by
David Binkley and Marcia Davis and Dawn Lawrie and Christopher Morrell,
«To camelcase or under-score»
which concluded:
«… it be-comes evident that the camel
case style leads to better allaround performance once a subject is trained on this style.»
But their experimenting was not in an SQL situation
where names are distinguished by indentation or by having all non-identifiers in upper case.
And anyway there is another study by
Bonita Sharif and Jonathan I. Maletic,
«An Eye Tracking Study on camelCase and under_score Identifier Styles»,
which concluded:
«While results indicate no difference in accuracy between the two styles,
subjects recognize identifiers in the underscore style more quickly.»
(I have only read the abstract.)
Prescriptive guides:
Holywell says: «Use underscores where you would naturally include a space in the name» …
«Avoid … CamelCase—it is difficult to scan quickly.» (Apparently this is a reference to what I have called PascalCase.)
Salvisberg says: «write all names in lower case»
Benenson gives an example of snake_case: «SELECT COUNT(*) AS backers_count»
and says «Variable names should be underscore separated:»
(I consider «underscore separated» to be a synonym of «snake_case»).
Sarkuni says: snake_case
Mullins says: SNAKE_CASE
Taranov says: PascalCase for everything except database, schema, and synonym.
Factor says: «Schema objects are, I believe, better capitalized.» (example) «This_Is_Capitalized»
Vendor manual examples:
MariaDB, MySQL, Oracle, Tarantool: usually snake_case
SQL Server: usually PascalCase
DB2: usually SNAKE_CASE
SQL Standard: SNAKE_CASE
For example these are extracts from the manuals’ CREATE TABLE pages.
CREATE TABLE employees_demo … (Oracle)
CREATE TABLE EMPLOYEE_SALARY … (DB2)
CREATE TABLE CREATE TABLE dbo.PurchaseOrderDetail … (SQL Server)
CREATE TABLE client_firms … (MySQL)
CREATE TABLE table_name … (MariaDB)
CREATE TABLE modules … (Tarantool)
Names (legal characters)
Choice: Names can include $?
Choice: Names can include letters other than A-Z?
What characters should be legal for regular identifiers of most objects?
In standard SQL, the answer is _ or digit or any Unicode character that is considered to be a letter, so Cyrillic / Japanese kana / Chinese / etc. are all okay
But in practice the answers vary widely, as one can see by looking at the vendor documentation of
DB2,
Oracle,
SQL Server,
MariaDB,
Tarantool (see «Identifiers»).
The minimum common denominator is A-z, a-z, 0-9, _ and — for some reason I have never understood — $.
The maximum is any Unicode letter, digit, _, $, and sometimes # or @.
I used to expect that (say) Russian developers would regard Latin-alphabet names as inferior to Cyrillic-alphabet names.
But evidence for that expectation is totally lacking.
Possibly one thing that holds people back is the fear that some client or driver will
misplace the character set and show garbage?
Improve your vocabulary: the word for such misplacement is mojibake.
Prescriptive guides:
Holywell says: «Only use letters, numbers and underscores in names.»
Taranov says: «use only Latin symbols [A-z] and numeric [0-9].»
Vendor manuals:
DB2 «Naming conventions».
DB2’s
«Some ODBC Applications May Not Be Able to Handle Special Characters in Table/Column Names»
warning is that there are problems with drivers with some special characters.
Names (length)
Choice: maximum length: 18 or 30 or 64 or 128?
There is a vendor-dependent maximum length.
Usually it is 128 (DB2, Oracle, SQL Server), or 64 (MySQL/MariaDB), or more than 20000 (Tarantool).
Shorter names were required in olden times and may still be required for a few objects.
But the desirable column-name length will be less than the maximum if column names are used as headers
in reports. Here I am assuming that the name will inevitably be the label, although
JDBC
thinks otherwise.
Length considerations might induce people to use abbreviations,
and that must be okay because the SQL standard uses abbreviations too.
Prescriptive guides:
Mullins says: recommended maximum length = 128
Holywell says: «Keep the length to a maximum of 30 bytes»
Taranov says: 128 for most things
Salvisberg says:
«Avoid using abbreviations unless the full name is excessively long.
Avoid long abbreviations. Abbreviations should be shorter than 5 characters.
Any abbreviations must be widely known and accepted.
Create a glossary with all accepted abbreviations.»
Vendor examples:
Microsoft’s
«Column name limitations»
warning is that there are problems with drivers with long names and special characters.
Bloggers:
Northwestern University
«Data base object naming standards: Abbreviations»
is a huge list prefaced by the words
«In general, abbreviations should be used only when length restrictions prevent use of fully spelled-out words in object names.»
Names (delimiter)
Choice: Should identifiers be delimited?
If possible this means enclosing «…»s as in standard SQL, but may mean enclosing `…`s or enclosing […]s.
Again, there might be some reason for explanatory names for reporting requirements,
but for programming requirements the only possible reasons would be:
() You are using more than one SQL dialect and you want to maintain the same names in both dialects
() You prefer case sensitive names and your SQL dialect follows the SQL standard requirement for delimited identifiers.
Re […]: Microsoft’s rules for
SET QUOTED_IDENTIFIER
are so complicated that nobody can be blamed for using []s instead.
Re `…`: Although people call them backticks when used for delimiting, the real (Unicode) name is U+0060 «grave accent».
If you have a French keyboard where a grave accent is a
«dead key»,
it might be troublesome.
Prescriptive guides:
Sarkuni says: «Avoid quotes.»
Donnelly says: «Named objects should not be surrounded by backticks»
Salvisberg says: «avoid double quoted identifiers» … «Never use quoted identifiers.»
Holywell says: «Avoid … Quoted identifiers — if you must use them then stick to SQL92 double quotes for portability».
Bloggers:
Apex’s
«Rules of SQL formatting» say:
Adding square brackets around all identifiers can be visually distracting.
but they have an option named «Remove unnecessary brackets».
Cloudera’s
Impala Guide recommends:
«consider adopting coding conventions (especially for any automated scripts or in packaged applications)
to always quote all identifiers with backticks.»
Names (qualifier)
Choice: Should names be qualified?
Saying «SELECT employees.id FROM employees;» looks like pleonasm.
Saying «SELECT employees.id FROM employees, departments;» looks like pleonasm but maybe serves a «useful function».
Saying «SELECT id FROM schema_name.table_name;» looks like a necessity if there is no default schema.
I have rarely seen the four-level catalog_name.schema_name.table_name.column_name.
At MySQL we played with it for a while for information_schema but in my opinion we
should have reserved the top level for the server source, which is something that
I think SQL Server has approached.
People who get deep in qualifier mires might be tempted to add aliases.
Prescriptive guides:
Sarkuni says: «All field names in non-trivial SQL statements (i.e. those with more than one table) should be explicitly qualified and prefixing as a form of namespacing field names is a bad idea.»
Mazur says: «Include the table when there is a join, but omit it otherwise» (i.e. qualify if there is a join).
Blogs:
Tim Hall passes on advice about
avoiding name clashes in PL/SQL.
Aaron Bertrand
says that leaving out a schema name can have side effects.
Steven Feuerstein
says «Qualify every identifier in SQL statements.»
Names (prefix or suffix)
Choice: Should any object name have any prefix or suffix?
Some say that an object’s name should indicate its type, and some say otherwise.
I will just note here what the otherwise-sayers say in prescriptive guides.
Sarkuni says: «Object names should not include the object type in them.»
Mullins says: «it is a bad idea to embed specialized meaning into database object names»
(he gives an example of a suffix that means unique, and asks what if it is also a foreign key).
Salvisberg says: «Avoid adding redundant or meaningless prefixes and suffixes to identifiers. Example: CREATE TABLE emp_table.»
Holywell says: «Avoid … Descriptive prefixes or Hungarian notation such as sp_ or tbl.»
Improve your vocabulary again …
Factor
says:
«The habit most resistant to eradication is «Tibbling,» the use of reverse Hungarian notation,
a habit endemic among those who started out with Microsoft Access.
A tibbler will prefix the name of a table with «tbl,» thereby making it difficult to pronounce.»
Keller says: «Avoid prefixes and suffixes for tables and views, such as tblTable.
Hungarian notation (which was always intended to identify variable usage)
slipped into common SQL Server naming conventions, but it is widely derided.
Object identifiers should describe what is contained within, not the object itself.»
But other prescriptive guides say: go for it.
So in later sections I will repeat the advice of people who advocate prefixes and suffixes,
but those sections all have an implicit caveat: others disagree with all such paraphernalia.
If you decide «Should any object name have any prefix or suffix=no», then
ignore any prefix/suffix Choices in following sections.
Names of tables
Choice: Should table names be plural?
Sometimes no. In English and many other languages pluralizing can be confusing
or inconsistent. Some tables by definition have exactly one row so demanding
plurals everywhere can be excessive. And of course sometimes noted authorities
say use singular, so just go along to get along.
One authority that is sometimes brought in is Codd.
He used singular for relations.
But
Fabian Pascal says
«In other words, there is no relational (i.e., formal theoretical) reason for preferring one over the other — Codd’s
reference to the «Supplier relation» should not be taken to mean imposition of the singular.»
One authority that is sometimes brought in is
ISO 11179.
But I have been unable to find where that states that singular names are better,
and it seems that
other people
have also been unable.
Prescriptive guides:
Hall says: All table names should be plural. If the table name contains several words, only the last one should be plural.
Example: APPLICATION_FUNCTION_ROLES
Sarkuni says: «Tables, views, and other relations that hold data should have singular names, not plural.
Rather than going into the relational algebra explanation of why this is correct I will give a few practical reasons.
They’re Consistent. It’s possible to have a relation that holds a single row. Is it still plural?
They’re unambiguous. Using only singular names means you do not need to determine how to pluralize nouns.
Ex: Does a «Person» object go into a «Persons» relation or a «People» one? How about an «Octopus» object? Octopuses? Octopi? Octopodes?
Straightforward 4GL Translation. Singular names allow you to directly translate from 4GL objects to database relations. You may need to remove some underscores and switch to camel case but the name translation will always be straight forward.
Ex: team_member unambiguously becomes the class TeamMember in Java or the variable team_member in Python.»
Keller says: «At first glance, it’s natural to think of a collection of objects in the plural. A group of several individuals or companies would be Customers. Therefore, a table (being a collection of objects) should be named in the plural. An individual row in that table would be a single customer.
The ISO/IEC naming principles, while dated, recommend pluralized table names and singular column names.
Most SQL Server system tables use plural names (sysnotifications, sysoperators), but this is inconsistent. Why sysproxylogin and not sysproxylogins?»
… «Because pluralization of words can vary in so many ways (customers, mice, moose, children,
crises, syllabi, aircraft), non-native speakers have additional challenges.
Sticking with singular object names avoids this problem entirely.»
Salvisberg says: «Plural name of what is contained in the table (unless the table is designed to always hold one row only – then you should use a singular name).»
and «A jar containing beans is labeled «beans».»
Mazur says: «Table names should be a plural snake case of the noun»
Holywell says: «Avoid … Plurals—use the more natural collective term where possible instead. For example staff instead of employees or people instead of individuals.
… «Use a collective name or, less ideally, a plural form. For example (in order of preference) staff and employees.»
Taranov says: not plural.
In some SQL standard examples, there are plurals when contents can be plural, singular when contents can only be singular
Bloggers:
Peter Gulutzan (also known as «I») said long ago in
«SQL Naming Conventions»:
Example: SELECT * FROM beans; not SELECT * FROM bean;
Names of temporary tables
Choice: temporary table name prefix = # or tmp or temp or nothing?
With some DBMSs, it is possible to create a table named A and a temporary table named A.
References to A will only find the temporary table, and the result is occasional confusion.
So it might be okay to recommend, or force, people to distinguish temporary table names.
Prescriptive guides:
Salvisberg says temporary tables can have an optional suffix _tmp.
Vendor manuals:
In
SQL Server
a prefix # or ## is compulsory. (The Unicode name for the # symbol is «number sign», not
«hash mark» and, alas, not the lovely word
«octothorpe»).
Bloggers:
Stephen Faroult and Pascal l’Hermite in
«Refactoring SQL Applications»
suggest that it is common for temporary tables to
have a prefix or suffix of TMP or TEMP.
Names of views
Choice: view name prefix = v_ or vi_ or vw_ or nothing?
Choice: view name suffix = _v or _vi or _vw or nothing?
View names are table names and are not distinguished from base table names.
Views are tables so the Choices in the previous section are applicable.
However, sometimes people want to distinguish them with a prefix or suffix.
In particular, I have seen a suggestion that,
if a view is of a single base table, its name may be
the base table name plus the suffix _v.
Keller says: view suffix = _v.
Taranov says: View prefix = VI_
Salvisberg says: optionally add _v
Bloggers:
Joe Celko
says
«VIEWs are often named incorrectly. A VIEW is a table, so it is named just like any other table.
The name tells us what set of things it represents in the data model.
The most common offender is the “Volkswagen” coder who prefixes or suffixes
the VIEW name with “vw_” or “VIEW_” in violation of ISO-11179 rules.
We do not mix data and meta data in a name.
This is as silly as prefixing every noun in a novel with “n_” so the reader will
know that the word is a noun in English grammar.»
(That reminds me that once upon a time writers
did distinguish nouns by capitalizing them. Germans still do.)
Sharad Maheshwari and Ruchin Jain,
«DBMS – Complete Practical Approach», say: «All view names must begin with of the following prefixes: vw, VW, v, or V.»
Mullins says: «avoid embedding a ‘T’, or any other character, into table names to indicate that the object is a table.
Likewise, indicator characters should be avoided for any other table-like object (i.e. aliases, synonyms, and views).»
isbe.net
«Naming conventions» says: prefix with vw.
Names of columns
Choice: Should column name include table name or abbreviated table name?
Column names are singular, although I suppose that columns with an array or multiset data type might not be
(the only guide that suggests such an exception is Salvisberg).
The name «id»,
which is an abbreviation of
identification,
usually for a unique-key column with a numeric data type, is popular.
But opinions vary whether a column should be named id or have a suffix _id.
If you join two tables, and they both have a column named id, then you are forced to qualify.
But if you avoid that by putting the table name in the column name,
for example the employees table has employee_id and the departments table has department_id,
then you are being redundant because you already know what tables the columns are in.
An advantage of naming all primary keys id is: you know what it is without looking it up.
Names in other computer languages have hints about types, for example is_fat is probably a boolean.
But opinions vary whether this should occur in SQL.
Prescriptive guides:
Hall says: do not include table name or alias.
Sarkuni says: name = id is okay («This means that when you’re writing SQL you don’t have to remember the names of the fields to join on»)
and a column referencing it should have referenced-table-name and suffix _id.
But Keller says: if there is a column referencing it, then name = CustomerID is better than Customer.ID.
Holywell says: name = id is not okay («avoid»).
Mazur says:
«Boolean fields should be prefixed with is_, has_, or does_. For example, is_customer, has_unsubscribed, etc.»
Vendor manuals:
I checked what column names were in examples for the CREATE TABLE statement.
MariaDB: a, b, expires, x
MySQL: id, name, a, b, adate, s1, s2
Oracle: employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct,
manager_id, department_id, dn, id, col1, col2, title, author, department_id, department_name, etc.
DB2: DEPTNO, DEPTNAME, MGRNO, ADMRDEPT, PROJNO, PROJNAME, RESPEMP, PRSTAFF, PRSTDATE, EMPNO, SALARY, ID, NAME, LIVING_DIST,
SSN, VOICE, PHOTO, HIREDATE, COMM, ACTNO, EMPTiME, etc.
SQL Server: col1, x, EmployeeID, PurchaseOrderID, LineNumber, ProductID, UnitPrice, OrderQty,
ReceivedQty, RejectedQty, DueDate, rowguid, LineTotal, StockedQty, LName, FName, GUID, low, high, etc.
Tarantool: s1, s2
… Notice the frequency of _id or NO suffix, DB2’s love of abbreviations, and
how both MySQL and Tarantool use s1 and s2 (you can blame me for that, s stands for Spalte which is the German word for column).
Notice that SQL Server does capitalize when there is only one word, but probably this is not a rule.
Blogs:
Lukas Eder
points out that qualifying id in SQL statements is not good enough
if you also have to distinguish in a Java client.
Power BI
(apparently thinking about things like report headings)
says:
«For example, you should use “Sales Amount” instead of “SalesAmt” or “SalesAmount”.
You can use space and special characters in column names of a view.
The goal is to simplify the life to the user,
and not to simplify the life to a programmer who has to type a column name in the keyboard.»
Names of aliases and range variables
Choice: Should aliases / range variables be abbreviations?
In a select list a column can be named with [AS] alias-name.
Aliases can be used in ORDER BY, and in MySQL/MariaDB can be used in GROUP BY,
and with
(rarely supported)
«lateral column aliasing» they can even be used later in the select list,
but the main purpose is just so that the result name can be short and legal.
Vendors might invent names for expressions but they can vary and can be odd.
Also in FROM etc. an [AS] name clause can appear.
Often the name is called an alias but in standard SQL terms
like «correlation name» and «range variable» are preferred.
One main purpose is to avoid repeating long qualified names.
For columns the common idea is to be comprehensible,
for tables the common idea is to be short.
Prescriptive guides:
Holywell, Mazur, and Taranov say: Always be explicit and say AS name. Pleonasm does not apply.
Donnelly says: Always be explicit and say AS name — for columns. «Column aliases should always use the keyword AS»
Donnelly says: «Tiny names for table aliases can sometimes work as abbreviations.»
Holywell says: maybe, alias name should be first letter of each word in a table name
Mazur says: «Avoid aliasing table names most of the time»
Benenson says: «Always rename all columns when selecting with table aliases»
Salvisberg says: «Always use table aliases when your SQL statement involves more than one source.»
Vendor manuals:
Oracle: «AVG(salary) AS avgsal»
SQL Server: «AVG(UnitPrice) AS [Average Price]» …
an
old version of the manual
says: «The AS clause … is the preferred syntax to use in SQL Server 2005.»
Perhaps one could interpret that as meaning the word AS should be explicit, but I prefer to
believe that it means: stop using the
deprecated alternative, an equal sign
MySQL: «CONCAT(last_name,’, ‘,first_name) AS full_name», «id AS ‘Customer identity'» (very non-standard) …
«it is good practice to be in the habit of using AS explicitly when specifying column aliases»
Blogs:
ApexSQL
has an option named «Use first three letters for alias» and an example «HumanResources.EmployeeDepartmentHistory edh».
Aaron Bertrand says: one to three letters, but no random single letters please.
Names of constraints
Choice: Use automatically generated names where possible?
Choice: Constraint (Primary Key) Suffix = _pk or _PK or nothing
Choice: Constraint (Primary Key) Prefix = pk or PK or pk_ or PK_ or nothing
Choice: Constraint (Unique Key) Suffix = uq or UQ or _uq or _UQ or _uk or _UK or nothing
Choice: Constraint (Unique Key) Prefix = uq or UQ or uq_ or UQ_ or uk_ or UK_ or nothing
Choice: Constraint (Foreign Key) Suffix = fk or FK or _fk or _FK or nothing
Choice: Constraint (Foreign Key) Prefix = fk or FK or fk_ or FK_ or nothing
Choice: Constraint (Check) Suffix = ck or CK or ck_ or CK_ or nothing
Choice: Constraint (Check) Prefix = ck or CK or _ck or _CK or nothing
Sometimes constraint names are decided by the vendor, for example when the
word UNIQUE is used in a CREATE TABLE statement. This is fine if you
intend to use the vendor’s style, but usually it looks machine-generated.
For example if you say CREATE TABLE m (… s1 INTEGER UNIQUE);
for Tarantool it is named unique_unnamed_M_1 (type + name + table-name + integer),
for DB2 it is a timestamp,
for MySQL/MariaDB it is the column name.
If you think that as a human you can do better, you will want to
override the DBMS, always, by specifying with your own system.
And there is no reason that the name of a constraint should differ wildly from the name of an index that it depends on.
Typically a constraint name will include a table name (sometimes abbreviated or aliased),
a column name if applicable, and a suffix indicating the constraint type.
Some suggesters,
for example this bugs.mysql.com issue#66051
and cakephp’s «Database Conventions»,
say that for foreign keys the table-name part should be singular, for example if
the table name is orders then the table-name part of the foreign key name should be order.
Prescriptive guides:
Holywell says: depend on the vendor name (generally it is «sufficiently intelligible»), otherwise make a custom name
Keller says: prefix PK or CK or FK or UQ «sometimes»
Hall says: table + _PK or _FK or _CK. For foreign key: referencing-table + referenced-table + _FK. For check: something + _CHK.
Taranov says: PK_ or CK_ or FK_
Salvisberg says:
For primary key: table + _pk.
For unique key: table + _uk.
For check: table + column-or-role + _ck + optional number
For foreign key: referencing-table + referenced-table + _fk + optional number
Mullins asks: what if a constraint is both foreign-key and unique, what do the suffix-lovers do? I have seen no answers.
Vendor manuals:
Oracle: (for foreign key) fk_deptno, fk_empid_hiredate,
(for check) check_divname, check_office,
(for not null) nn_qty, emp_salary_nn_demo
(for unique) promo_id_u, wh_unq, emp_email_uk_demo
(for primary key) loc_id_pk, sales_pk.
DB2: (for check) phoneno_length, YEARSAL,
(for unique) EMP_ACT_UNIQ,
(for foreign key) EMP_ACT_PROJ
SQL Server: (for check) CK_emp_id
(for default) DF_PurchaseOrderDetail_ModifiedDate
(for primary key) PK_PurchaseOrderDetail_PurchaseOrderID_LineNumber, Guid_PK
MySQL: (for check) c1_nonzero, c2_positive
(for foreign key) child_ibfk_1
MariaDB: (for check) a_greater
Bloggers:
Microsoft’s
«Schema Generation Wizard»
has options for prefixes of names of primary-key columns and foreign-key columns: PK and FK.
Shane O’Neill has arguments for
why all constraints should have names.
Names of indexes
Choice: Include table name in index name?
Choice: Index name prefix = idx or ix or nothing?
Choice: Index name suffix = i or idx or ix or nothing?
Often an index will be associated with a constraint, and so the rules
for «Names of constraints» will apply to indexes. Otherwise, the
components of an index name are:
maybe a prefix, maybe the table name, maybe one or more column names,
maybe a suffix, maybe an ordinal.
For the typical index that is a result of something like
CREATE INDEX index-name ON table-name (column-name [, column-name …])
the index is in the table’s namespace, so there is no problem
having an index named I for one table and another index named I for another table.
Therefore it is redundant to include table-name in index-name, but
it happens anyway.
Prescriptive guides:
Sarkuni says: «Indexes should be explicitly named and include both the table name and the column name(s) indexed.
Including the column names make it much easier to read through SQL explain plans.»
Keller says: prefix = IX.
Hall says: name = table + column(s) + _I
Salvisberg says: name = table + column(s)-or-purposes + _idx.
Vendor manual examples:
Oracle: «cust_eff_ix ON customers», «ord_customer_ix ON orders», «idx_personnel ON CLUSTER personnel»,
«area_index ON xwarehouses», «upper_ix ON employees (UPPER(last_name))»,
«income_ix ON employees(salary + (salary*commission_pct))», «cust_last_name_ix ON customers (cust_last_name)» etc.
DB2: «UNIQUE_NAM ON PROJECT(PROJNAME)», «JOB_BY_DPT ON EMPLOYEE (WORKDEPT, JOB)», «IDX1 ON TAB1 (col1)»,
«MYDOCSIDX ON MYDOCS(DOC)» etc.
SQL Server: «i1 ON t1 (col1)», «IX_VendorID ON ProductVendor (VendorID)»
«AK_UnitMeasure_Name ON Production.UnitMeasure(Name)», «IX_INDEX_1 ON T1 (C2)» etc.
MySQL: «part_of_name ON customer (name(10))», «idx1 ON t1 ((col1 + col2))», «id_index ON lookup (id)»
MariaDB: «HomePhone ON Employees(Home_Phone)», «xi ON xx5 (x)»
… None of the manuals have a consistent style, but we do see some index names that have prefixes/suffixes,
table names, column names, or function names.
Names of triggers
Choice: Trigger name suffix = _trg or _TRG or _tr or or _TR or nothing?
Choice: Trigger name prefix = _tr or TR_ or nothing?
If a trigger involves an action on a table, everyone seems to agree
that the trigger name should say something about the action and the table.
Not everyone agrees which comes first,
and not everyone agrees what «something» is. It could be the verb as a suffix
(for example _insert) or it could be an abbreviation (for example I or ins)
or it could merely be a «suggestion» of the verb (for example new_).
If for column names you decide «do not include the table name»,
why for trigger names should you decide «do include the table name»?
Here is a possible excuse: columns are parts of tables, but
triggers are independent objects, so this is a different matter.
In PostgreSQL a trigger is
«local» to a table
(a departure from standard SQL) so I do not know whether the usual rules should apply there.
Prescriptive guides:
Hall says: «Trigger names should be made up of the table name, an acronym representing the triggering action and the suffix «_TRG».»
Example: APPLICATION_BIS_TRG where BIS stands for BEFORE INSERT STATEMENT-LEVEL»
Taranov says: TR_ prefix and _DML or _DDL suffix
Salvisberg says:
either object-name + b or io (for before-row or instead of) i or u or d (for insert or update or delete) without _trg suffix,
or object-name + suggestion-of-the-verb + _trg.
Vendor manuals:
I looked at examples in the CREATE TRIGGER pages and sub-pages.
Oracle: t, «order_info_insert INSTEAD OF INSERT ON order info», dept_emplist_tr
, «maintain_employee_salaries FOR UPDATE OF salary ON employees
DB2: «NEW_HIRED AFTER INSERT ON EMPLOYEE», «RAISE_LIMIT AFTER UPDATE OF SALARY ON EMPLOYEE»
«NEWPROD NO CASCADE BEFORE INSERT ON PRODUCT»
SQL Server: reminder1, «Purchasing.LowCredit ON Purchasing.PurchaseOrderHeader AFTER INSERT»
«connection_limit_trigger ON ALL SERVER WITH EXECUTE AS ‘login_test’ FOR LOGON»
MySQL: «ins_sum BEFORE INSERT ON account», «ins_transaction BEFORE INSERT ON account»,
«upd_check BEFORE UPDATE ON account»
MariaDB: «increment_animal AFTER INSERT ON animals»
Tarantool: «stores_before_insert BEFORE INSERT ON stores»
Names of sequences
Choice: Sequence name suffix = _seq or nothing?
Choice: Sequence name prefix = sq_ or nothing?
If a sequence is associated with a single other object, such as a table
or the primary-key column of a table, then there is agreement that
the sequence name should say something about that object.
There is no agreement, as I said earlier, to have a prefix or suffix to
indicate object type.
Prescriptive guides:
Salvisberg says: if the sequence is for a table’s primary-key generation: table-name-or-abbreviation + _seq
Taranov says: sq_ prefix
Vendor manuals:
Oracle: customers_seq
SQL Server: Test.CountBy1, Test.CountByNeg1, ID_Seq, TestSequence, SmallSeq, DecSeq
(apparently they’re naming based on increment-value and on data type, as well as object-name + suffix)
DB2: ORDER_SEQ
MariaDB: s, s2, s3
Here is
an Oracle document saying _SEQ.
Here is
Oracle’s
«2-Day Developer’s Guide» saying
«Tip:
When you plan to use a sequence to populate the primary key of a table,
give the sequence a name that reflects this purpose. (This topic uses the naming convention table_name_SEQ.)»
Bloggers:
Richard Foote and
Mandeep K Sandhu and
Vinish Kapoor
uses _seq.
The
progress.com manual says:
Do not define ORACLE sequences with names ending in _SEQ unless the dataserver manual instructs you to do so.
The DataServer uses ORACLE sequences whose names end in _SEQ for internal purposes.
The
PL/SQL Style Guide
says all sequence names should end in _seq
SunCom says: add _SEQ
Names of routines
Choice: Use prefix or suffix for function or procedure name?
Function names show what the returned value is, and optionally how it was obtained.
Some people
wonder whether a «get_» prefix is necessary.
Example: first_initial_of_name
Example: name_in_upper_case_according_to_german_collation
Built-in function names are upper case.
They are not keywords and not reserved (at least not always),
and they are like ordinary functions,
so if we put them in upper case it is not because we are following a general rule that only keywords are
upper case.
However, it is popular to put them in upper case, as some
vendors’ manuals show.
Example: SELECT ABS(1); not SELECT abs(1);
Prescriptive guides:
Salvisberg says: (re function)
Name is built from a verb followed by a noun in general. Nevertheless, it is not sensible to call a function get_… as a function always gets something.
The name of the function should answer the question “What is the outcome of the function?”
Optionally prefixed by a project abbreviation.
Salvisberg says: (re procedure)
Name is built from a verb followed by a noun. The name of the procedure should answer the question “What is done?”
e.g. calculate_salary
Bloggers:
ApexSQL’s
Rules of SQL formatting – SQL naming conventions and capitalization rules says:
«Also, the prefix in the name of the stored procedure should not be sp_».
Names of savepoints
Savepoint names may contain a hint of what went before,
for example after inserting the name could be: inserting.
However, there is no common practice, so there is no need for a «Choice: …» here.
Vendor manual examples:
Oracle: do_insert my_savepoint
DB2: SAVEPOINT1, SAVEPOINT2, SAVEPOINT3
SQL Server: ProcedureSave
Tarantool: x
Names of collations
Choices: Collation names: BCP style or Microsoft style or roll-your-own?
This would apply only for vendors which allow users to create collations, which are rare.
BCP style is what I call the use of the
Unicode Locale Data Markup Language (LDML), for example
«da-u-ks-level1» where da = language tag for Danish,
u stands for BCP 47’s locale extension,
ks is the key for strength
from the
Collation Settings table
and level1 means level1. No character set or code page.
Microsoft style is what I call the
suffixes
and Microsoft’s
fixed rules:
The suffix _ai («accent insensitive») and the suffix _ci («case insensitive»).
Although _ci_ai might sometimes be very very roughly equivalent to a standard level1 collation,
level1 really should also mean things like «ignore Japanese katakana/hiragana differences»
and «ignore differences between S and Sharp S»
and much else that has absolutely nothing to do with accents and cases.
The roll-your-own choice would be something more descriptive but less «standard»
than the other choices, for example GermanPhoneBook.
Prescriptive style guides and vendor manual examples: none.
Names of variables or parameters
Choice: input-parameter prefix: i_ or in_ or nothing
Choice: output-parameter prefix: out_ or nothing
Choice: local-variable prefix: l_ or v_ or nothing
Choice: parameter names: UPPER CASE or snake_case or PascalCase or camelCase
There is an incentive to use a prefix. Consider a MySQL procedure:
CREATE TABLE employees (thing INTEGER); INSERT INTO employees VALUES (0); CREATE PROCEDURE select_thing() BEGIN DECLARE thing INTEGER DEFAULT 1; SELECT thing FROM employees; END;
Will it select 0 or 1?
Yes there is a rule — MySQL bug 5967
where Konstantin Osipov improves our vocabulary again by mentioning «name shadowing» —
but it is easy to forget it.
So if there was a convention that always distinguished column names
from variable names, this particular confusion could be avoided.
Standard SQL allows for qualifying variable names with the block label,
but not all vendors allow that.
Prescriptive guides:
Hall says: «PL/SQL variables are prefixed with a single letter, if possible, to indicate their type or usage.»
e.g. l_ + local variable name, i_ for input parameter name
Salvisberg says: in_ prefix for input parameter, out_ prefix for output parameter, l_ prefix for local variable
Taranov says: «Parameters name should be in camelCase»
Vendor manual examples:
I only looked at DECLARE examples:
Oracle: no suffixes or prefixes, snake_case
DB2: Prefix v_ then camelCase, for example v_rowsChanged
SQL Server: Variable names always start with @, after that the case examples are inconsistent.
Bloggers:
Malcolm Coxall,
«Oracle Quick Guides», says
«Parameters names must begin with p_. Variable names must begin with v_ …
IN parameter can be named _in» OUT parameter can be named _out …
INOUT parameter can be named _inout».
Dynamic SQL
Choice: Use host language’s multiple quoting?
Choice: Put SQL statement text in variables?
This applies for EXECUTE IMMEDIATE but also for non-SQL programs that call SQL.
The problem is always that execute(«sql-statement») will not look nice if
sql-statement contains «s, forcing escapes. Escapes — usually s —
make a statement less easy to read.
Languages such as Lua support
multiple quoting,
so the Tarantool manual suggests things like
execute([[SELECT ‘string-literal’ FROM «delimited-table-name»;]])
instead of
execute(«SELECT ‘string-literal’ FROM «delimited-table-name»;»)
But maybe you should not pass string literals, you should instead
assign the literals to variables and pass variables.
It is common in other languages to use a variable or a macro for
any constant that might be re-used, and the same could apply
for SQL.
Prescriptive guides:
Salvisberg says: «Always use a character variable to execute dynamic SQL.»
Examples from vendor manuals:
Oracle shows an EXECUTE IMMEDIATE statement with a character variable.
DB2 shows an embedded-SQL example with a character variable.
Blogs:
«Pro Oracle SQL» says that using bind variables may decrease parsing overhead.
Format terminology
Choice: format?
Formatting, also called layout, involves
adding or removing white space (spaces or tabs or newlines).
Effects include indenting and aligning.
Choosing format = no means: leave the text as is.
Choosing format = yes means: remove white space that is already in the text,
except in comments or literals or delimited identifiers.
Then add white space according to Choices described in following Format sections.
Indenting means: adding
fixed amounts
(«units») of whitespace at the start of a line.
For example one can say that the indent amount is «4 spaces».
Then in this statement the second line is indented:
TEXT AT LEFT MARGIN, ALSO CALLED THE ZERO POINT TEXT INDENTED BY ONE UNIT
The implication is that the second line is «within», «subsidiary to», «a level below», the first line.
If there is a third line which is subsidiary to the second line, then that is a multi-level indent.
TEXT AT LEFT MARGIN, ALSO CALLED THE ZERO POINT TEXT INDENTED BY ONE UNIT (AT LEVEL ONE) TEXT INDENTED BY TWO UNITS (AT LEVEL TWO) TEXT INDENTED BY ONE UNIT (AT LEVEL ONE AGAIN)
I say the first line is at «indent zero», the second line is «indent +1»,
and the third line is also at «indent +1» (because «+» will mean «relative to the previous line»).
It is not necessary to show «indent -1» because text end is obvious, it is before the next text starts.
Aligning, also called lining up, means: placing
non-white-space text underneath non-white-space text on an earlier line.
Left-align is the same as «indent at same level» (i.e. «indent +0 levels»), it
means the first characters align, for example
TEXT AT LEFT MARGIN A BIT OF TEXT
Right-align-of-first-word means the last characters of
the first word align, for example
TEXT AT LEFT MARGIN A BIT OF TEXT
Right-align-of-last-word means the last characters of the last
word align, for example
TEXT AT LEFT MARGIN A BIT OF TEXT
Right-aligning may involve shifting earlier shorter strings
to match the longest string, for example
TEXT AT LEFT MARGIN A BIT OF TEXT A MUCH LONGER BIT OF TEXT
Phrase means: an uninterrupted series of keywords. Phrases may be treated as units,
and people avoid putting newlines in phrases. Examples:
CREATE OR REPLACE TRIGGER, DROP VIEW IF EXISTS, ROLLBACK TO SAVEPOINT, LEFT OUTER JOIN.
Also I call >= ALL a phrase, though I acknowledge it does not fit my definition exactly.
Clause start means: a word or phrase that begins a clause.
When discussing indenting or aligning, a statement start may be treated as a clause start.
Blogs:
Jeff Smith has examples of formatting with slightly different choices of words.
Format choices terminology
Format choices are often expressible as:
Change <word-or-phrase> to [<whitespace> +] <word-or-phrase> [+ <whitespace>].
For example, instead of expressing a choice as
Choice: Add spaces around arithmetic operators?
I say
Choice: Replace arithmetic operator with space + arithmetic operator + space?
For one Choice, that does not look simpler.
But for 30 Choices, it is much simpler because all Choices have the same «BNF».
Whitespace is zero or more
«space» U+0020
«tab» U+0009 rarely needed if indent units are expressed as spaces
«newline» go to start of next line, i.e. this is carriage return + line feed. also called line break.
«line feed» go to same position on next line, rarely needed
«+1 indent» adding one level of indentation
«-1 indent» removing one level of indentation, rarely needed
«+0 indent» to the same level, but this would always be assumed after newline
«0 indent» to level 0, which is the left margin
There are two default assumptions which are not Choices because they are universal defaults:
«Where K is keyword or literal, Change K + K to K + space + K»
«Change whitespace + newline to newline».
Choices are not cumulative. Once a Choice has been applied for a token, go to the next token.
The initial general assumption is:
All text is changed so that all tokens are separated by a single space.
As we will see in later sections, that assumption can be overridden,
especially when there are comments and parentheses.
So most formatting choices can be phrased as:
Choice: [For a particular situation]
[before] add N newlines, indent N levels or left-align or right-align
[after] add N newlines
Example = «Choice: change FROM to 1 newline + indent+1 + FROM + [nothing]
«Choice: change WHERE to 1 newline + indent+0 + WHERE + [nothing]
«Choice: change AND to 1 newline + right-align + AND + [nothing]
SELECT skill FROM employees WHERE 0 = 0 AND 1 = 1;
Format symbols
Choice: change arithmetic operator to space + arithmetic operator + space?
Choice: change comparison operator to space + comparison operator + space?
Choice: && and || meaning OR are equivalent to AND and OR?
Choice: change . to space + . + space, or to .?
Choice: change || meaning concatenate to || + newline?
Choice: change semicolon to space + semicolon?
Choice: change comma to comma + space?
In this context a symbol is a token which is made up of non-alphabetic
characters, such as = or <>.
SQL does not require that symbols must be separated by white space,
for example (‘x’||’y’=’z’AND-1=»5″) and ( ‘x’ || ‘y’ = ‘z’ AND -1 = «5» ) are the same.
Therefore there are Choices for symbols but not for other tokens.
A binary operator is a symbol that separates two operands.
The operation may be arithmetic, concatenation, bitwise, comparison, or assignment.
Generally it does not matter — all binary operators except . have the same rule,
and the unanimous opinion of prescriptive guides is that that rule should be:
change binary operator to space + binary operator + space = yes.
But there are three possible exceptions: for && and || meaning OR, for ., and for || meaning concatenate.
By the way the SQL standard term is «dyadic operator» but
according to Google Ngrams the term «binary operator» is much more common.
«&& and || are equivalent to AND and OR: yes» would only apply for
MYSQL/MariaDB. See «Format AND / OR». There is no guidance for this setting.
«change . to space + . + space» means that qualified identifiers
look like «table_name.column_name» instead of «table_name . column_name».
It is clear from vendor examples that «table_name.column_name» is the
preference, but I have seen exceptions.
«change || meaning concatenate to || + newline»
is a Choice because there is a formatter that has special options for ||.
I suspect that this choice exists for the sake of long literals.
See section Long literals.
Another, more general, exception is: operators can be aligned if they
are in lists.
See section Format lists».
There is no Choice for other (monadic) operators such as unary minus
or ~ for negation. They are always: space + operator, with no space before the next token.
There is no choice for : when it is used with a label in SQL/PSM.
It is always : + space, with no space before the previous token.
«change semicolon to space + semicolon» is an uncommon choice but I have seen » ;»
in the Microsoft manual
and I have read that in eclipse it is
considered a bug
if the formatter always removes space before semicolon.
«change comma to comma space» is applicable only for commas
that are not in a list (see Format lists).
So the Choice affects function arguments like substring(1,2)
or declarations like DECIMAL(5,3).
Style guides for C, such as
this one at umd.edu, say
«Leave one space after a comma», but are probably not applicable.
Prescriptive guides:
Taranov gives an example where + instead of || is used for concatenation.
Holywell says: «Although not exhaustive always include spaces: before and after equals (=) after commas (,) surrounding apostrophes (‘) where not within parentheses or with a trailing comma or semicolon.»
Vendor Manuals:
In the MariaDB manual there is sometimes no spacing around = or before *.
Some Oracle examples are
here.
Oracle SQL Developer has three options:
«Spaces around operators (= < > + — * /)», «Spaces around commas», «Spaces around brackets [i.e. parentheses]»
Bloggers:
ApexSQL: there is an option for «add space inside parentheses»
Toad has a single option for «Plus-Minus-Mul-Div-Concat».
Format parentheses
Choice: change table-name + ( to table-name + space + (?
Choice: change built-in-routine-name + ( to built-in-routine-name + space + (?
Choice: change user-defined-routine-name + ( to user-defined-routine-name + space + (?
Choice: change data-type + ( to data-type + space + (?
Choice: change ( to ( + space?
Choice: change ) to space + )?
Choice: align ) with statement start?
«change table-name + ( to table-name + space + (?»
affects things like
table-name(column-name) as in CREATE TABLE table-name(column-name),
INSERT INTO table-name(column-name), CREATE INDEX index-name ON table-name(column-name).
The most common setting is: yes.
«change built-in-routine-name + ( to built-in-routine-name + space + (?»
is the decision whether to say UPPER(…) or UPPER (…).
«change user-defined-routine-name + ( to user-defined-routine-name + space + (?»
is the decision whether to say user_function(…) or user_function (…)
Often the setting is No for both.
But sometimes there are recommendations to say Yes for one and No for the other,
so that it will be clear from the syntax what kind of routine you are calling.
I am guessing that this happened because of C guidelines,
for example this one at cmu.edu, that say
«Do not put parens next to keywords. Put a space between.
Do put parens next to function names.
Keywords are not functions. By putting parens next to keywords keywords and function names are made to look alike.»
The
Java Code Conventions are similar.
But if that is the reasoning, it is an example of how C/Java rules get applied to SQL unnecessarily.
«change data-type + ( to data-type + space + (»
is the decision to say VARCHAR (10) instead of VARCHAR(10).
Really, nobody does. But I state it as a choice because it seems to be the
only case where reserved-word + ( does not become reserved-word + space + (.
Think of «IF («, «IN («, «WHILE («, «>= ALL («, «SELECT («, «VALUES («.
«change ( to ( + space» and «change ) to space + )»
is rare, but I have seen a recommendation for it.
Notice that this would change (((...)))
to ( ( ( ... ) ) )
.
For operator (expression) see Format symbols.
For (expression list) or (column list) see Format lists.
Prescriptive guides:
Mazur says: «Avoid spaces inside of parenthesis».
Vendor manuals:
In all vendor manuals, I was able to find parentheses in conditional expressions
which did not contain a space after ( or a space before ).
The SQL standard document uses ( ( (
consistently.
Bloggers:
Michael Kruckenberg and Jay Pipes, in
«pro SQL», say:
«you may want to have your style guide
require that SQL statements always include a space after the function name
when using stored functions.» This tip might depend on the proper setting
of sql_mode=ignore_spaces in MySQL or MariaDB.
«Choice: Change ( to ( + newline + indent+1» in combination with
«Choice: align ) with statement start = yes» is something I have
only seen for
CREATE TABLE … ( (Holywell)
ALTER TABLE …( (Taranov)
and WITH … (. (Taranov, Mazur, Benenson).
That is, the effect is
VERB name-or-expression ( )
Lukas Eder
has some recommendations about when parentheses are good, and when they are not.
Format comments
Choice: change simple comment to simple comment + 1 newline?
This is not really a choice, because nothing else makes sense.
I cannot think of another time in SQL where <newline> cannot be eliminated or replaced by some other white space.
Therefore it might be handy to put — at the end of a line, to prevent a formatter program from removing the newline.
Choice: Put bracketed comment with following keyword.
Choice: Put bracketed comment after comma in a list.
Choice: Put bracketed comment on separate line, left-align?
Given
SELECT skill /* per test */ FROM employees;
only a human could know whether the comment is about the
column or about the FROM clause. Therefore, the human
who writes the statement should signal.
One way is to stick to the advice in section Comments
and always put bracketed comments before their subjects.
Another way is to put extra whitespace around the
comment, and a formatter program should take that into account.
(That is especially applicable for a comment within a phrase.)
If neither of those solutions can be trusted, the comment
will have to go on a line of its own.
Prescriptive guides:
Holywell has an example of a bracketed comment preceding the statement, on its own line.
Vendor examples:
Oracle: Bracketed comments may precede or may follow.
DB2: Bracketed comments are at end of line (same as simple comments) but can include newlines (unlike simple comments).
SQL Server: Bracketed multi-line comment precedes the whole statement.
MySQL: Hints come after the word SELECT and clearly belong with it.
MariaDB: Comments come at end of statement but before semicolon.
Tarantool: Comments come on the same line either before after the semicolon.
Blogs:
Toad has folding for: «multi-line comments, multiple single-line comments on consecutive lines»
Format line length
Choice: Left margin: N?
Choice: Right margin: N?
Choice: Threshold = N?
Choice: If SQL statement length < Maximum line length: skip all other format choices?
Ordinarily Left margin = 1 because, as in editors, the minimum column number on a line is 1, not 0.
However, for indenting a left edge is called a
«zero point» as in editors, so «indent 0» means the Left margin.
Therefore Right margin — Left margin = Maximum line length.
«Maximum characters per line» would be a better term
but «Line length» is a term that infests style guides
so I stick with it.
There is no consensus what Maximum line length should be, and
computer
style guides for other languages
vary greatly.
Suggestions include: 72, 80, 120.
It might be useful to ask how wide is a screen, or how wide is a printed page,
but those too are things which vary greatly,
so it is prudent to assume that different people will choose different values.
«Threshold» — not a common term — is a position before
the Right margin. The correct effects of passing the Threshold
are not all clear to me, but I speculate that they include
squeezing the contents because space is getting dear.
For example, if Right margin = 80 and Threshold = 60
and length-of-text = 70, then one might ignore any changes
which require further indenting,
and ignore changes which require adding unnecessary spaces.
«If SQL statement length < Maximum line length: skip all other format choices = yes»
means you do not have to indent anything if the entire statement can fit on one line.
In other words, if you decide = yes, then this short line remains as is:
SELECT skill FROM employees;
But if you decide = no,
or,
if SQL statement length >= Maximum length,
then there is a need for more than one line.
The newlines should, if at all possible,
be on some lexical boundary.
Usually that means «newline for new clause» and I will devote several sections
to clauses, as well as to «Format lists».
Prescriptive guides:
Donnelly says: «Newlines should be used for any query that is at all complex or longer than 72 characters.»
Mazur says:
«The only time you should place all of your SQL on a single line
is when you’re selecting one thing and there’s no additional complexity in the query:»
Vendor manuals:
Oracle SQL Developer once had options for «schema type = small SQL»
and «threshold for small SQL = N», and
(
as far as I can make out)
you could use this for the equivalent of «Choice: no indenting if statement length less than N».
This is no longer true.
Nevertheless I quote an earlier description:
«If you set the ‘Schema Type’ to small SQL — you have to mind the ‘Threshold for small SQL’
as that allows you to leave code lines that don’t meet that threshold untouched,
which is nice if you want to avoid unnecessary line breaks for very small lines of code.»
Bloggers:
Factor says: «SQL code doesn’t have to be broken into short lines like a Haiku poem
… to specify that there must always be a line-break between each phrase
(before the FROM, ON, and WHERE clauses, for example) can introduce an unnecessary amount of white space into code.»
Fontology says:
«To determine line length for optimum readability, a good guideline is between 9 and 12 words for unjustified text.»
That establishes that «line length» can be the right phrase. But SQL Developer has an option «Max char line width».
This
exchange on toadworld.com
shows that Toad formatter by default goes as far as line length even if there are
clauses, but (in Toad parlance) «folds» if line length is small.
Also
this
shows how Toad «folds» when you change right margin.
It is pretty clear that it initially will not fold, but will fold on clause boundaries as you reduce right margin.
This is not quite the same as
Choice: No new line and indenting if statement length less than maximum line length?
it is more like
Choice: No new line and indenting if CLAUSE length less than maximum line length?
… I think the meaning is: fold on clause boundary if and only if line too long.
In another toadworld.com exchange, a Quest employee
says
«I think it’s more intuitive that folding occurs at logical positions like open and close parens,
entire statements, entire blocks, etc. Folding on individual clauses, expressions,
and other pieces makes for a big headache and opens the door for all kinds of error.»
Indenting units
Choice: The indenting unit is a tab, or a space?
Choice: For fixed indenting, the number of units is: N
Choice: For fixed indenting, after the first level, the number of units is: N
To be exact, a tab is Unicode character U+0009 CHARACTER TABULATION
and a space is U+0020 SPACE. Usually the indenting unit is a space,
and with a fixed-width font there is no problem with that.
So choosing «number of units = N» means «fixed indenting is 2 spaces»
for level 1, 2 * 2 spaces for level 2, and so on.
Suggestions for a fixed indenting amount are: 2, 3, or 4 spaces.
Ordinarily, the indent amount is the same for every level.
Rarely, the indent amount is a larger amount (for example 4) for the first level,
and a smaller amount (for example 2) for later levels.
Prescriptive guides:
Benenson says: 2 spaces. Salvisberg says: 3 spaces. Mazur says: 4 spaces.
Holywell says: Indent column definitions by four (4) spaces within the CREATE definition.
Vendors:
Oracle SQL Developer has options for «indent N spaces» and
«indent amount = space|tab», which to me indicates that
they realize that some people prefer tabs.
The SQL standard uses 4 for the first level, 2 for levels after the first level.
Bloggers:
Factor says: usually 2 or 3 spaces.
The Toad formatter default is 3 spaces.
Andy Mallon
goes into detail why spaces are preferable.
Bloggers:
ApexSQL, in
SQL query basics – How to improve readability by formatting commas, spacing and alignment, says
«you can also choose to add a fixed number of spaces after each column» (in a select list)
option for «each clause and each clause set argument begins on separate lines»
Format clauses deciding what is a clause
Choice: Decide what is a clause according to common sources?
Choice: Decide what is a clause according to the statement’s BNF?
When we’re talking about
indenting or aligning, we’re talking about lining up with what a
clause or expression is subordinate to.
Before saying «what to do with clauses» I must say «what is a clause».
There is no official definition that is appropriate.
We can point at a statement like
SELECT something FROM something WHERE something ORDER BY something;
and note that everybody agrees that SELECT / FROM / WHERE / ORDER BY
are «clause starts» (statement starts are a special form of clause starts).
So the general form for them is
something-that-is-not-clause-start keyword-phrase-as-clause-start [content]
But not every keyword phrase is a clause start. Define more narrowly.
My definition of «common SQL statements» is: statements
that are defined in the SQL standard 9075-2 Foundation document,
and supported in Oracle plus DB2 plus SQL Server.
My definition of «common clauses» is: clauses that cause indenting
in examples from the vendor manuals. I do not mean «always cause
indenting», in fact the examples are a minority of all cases.
The clauses that caused level-1 indenting in at least two examples
in two different vendor manuals are:
In ALTER TABLE: ADD, ALTER, DISABLE, DROP, ENABLE, REBUILD, WITH
In CREATE TABLE: ON, WITH
In CREATE TRIGGER: AFTER, BEFORE, FOR, REFERENCING
In CREATE VIEW: AS, SELECT
In DELETE: WHERE
In GRANT: TO
In INSERT: SELECT, VALUES
In SELECT: SELECT, FROM, JOIN, WHERE, GROUP BY, ORDER BY, LIMIT
In UPDATE: SET, WHERE
… This is a very conservative list.
… SELECT would cause indenting anyway because subqueries are clause boundaries.
If prescriptive guides are not explicit and vendor examples are lacking,
there is still the official BNF
Backus-Naur form
or railroad diagram.
Nobody explicitly says the BNF is an authority,
but if you decide that it is, it might help you
decide when deciding «when does a clause end» or «is this clause part of that clause».
The arbitrary and simplistic rules for using a BNF are:
1. A keyword plus a series of non-keywords is a clause.
(So a clause-start is the first keyword or phrase, and ends
when there is another keyword after the series of non-keywords),
or ends when the statement ends.)
2. The first clause (the statement start) may have sub-clauses.
3. Ignore BNF [options] words that are not in the statement that you are writing.
How well do they work?
I will take BNFs for some statements from the SQL standard, and apply them to minimal statements.
Assume that the clause choice is: change clause_start to indent-+1 + clause_start.
Assume that the lists choice is (we will define later in the Format lists section).
Assume that the threshold is 0.
BNF = CREATE ROLE <role name> [ WITH ADMIN <grantor> ]
The clauses are CREATE ROLE … and WITH … So:
CREATE ROLE r WITH ADMIN g;
BNF = CREATE [ RECURSIVE ] VIEW <table name> <view specification> AS <query expression> [ WITH [ <levels clause> ] CHECK OPTION ]
My example will not use the optional <levels clause> so the clauses are
CREATE VIEW … and AS … and WITH CHECK OPTION.
It is also decided (see «Format clauses») that SELECT is a clause too
but it is not in the main BNF and is therefore a sub-clause. So:
CREATE VIEW AS SELECT * FROM t WITH CHECK OPTION;
BNF = DECLARE <cursor name> <cursor properties> FOR <cursor specification>
The clauses are DECLARE and FOR. So:
DECLARE c CURSOR FOR SELECT * FROM t;
BNF = FETCH <fetch orientation> FROM <cursor name> INTO <fetch target list>
The clause starts are FETCH and FROM and INTO. So:
FETCH c FROM c INTO c;
BNF = GRANT <privileges> TO <grantee> [ { <comma><grantee> }… ] [ WITH HIERARCHY OPTION ] [ WITH GRANT OPTION ] [ GRANTED BY <grantor>]
The clause starts are GRANT, TO, WITH, GRANTED. So:
GRANT UPDATE (c) ON t TO c WITH GRANT OPTION;
BNF = PREPARE <SQL statement name> [ <attributes specification> ] FROM <SQL statement variable>
The clause starts are PREPARE, FROM. So:
PREPARE s FROM s;
BNF = SET CONSTRAINTS <constraint name list> { DEFERRED | IMMEDIATE }
The clause starts are SET, DEFERRED, IMMEDIATE. So:
SET CONSTRAINTS c DEFERRED;
BNF = SET SESSION CHARACTERISTICS AS <session characteristics list>
There are no clauses below SET. However, <session characteristics list>
is subject to the
Format lists
choice like any other list and let’s say the Choice was to start with newline + indent+1.
So:
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BNF = UPDATE <target table> [ FOR PORTION OF <application time period name> FROM <point in time 1> TO <point in time 2> ] [ [ AS ] <correlation name> ] SET <set clause list> [ WHERE <search condition> ]
I will not use FOR or AS or or FROM or TO.
Therefore the clauses are UPDATE… SET … WHERE … So:
UPDATE t SET c = v WHERE TRUE;
In all cases the results are acceptable.
Prescriptive guides:
No prescriptive guide explicitly suggests making a clause list by either of the methods in this section.
Examples from vendor manuals:
For «Decide what is a clause according to common sources?» I looked at what
more than one vendor says. Of course, if you use only one vendor, your list could be larger.
I looked for examples that clearly do not work with
any combination of Choices along with:
Use the statement’s BNF to decide what is a clause = yes.
Despite looking at a large sample, I found only one:
DB2:
GRANT SELECT, UPDATE ON TABLE DSN8A10.DEPT TO PUBLIC;
In this case IBM seems to have decided that TO … is a clause start but ON … is not.
Bloggers:
This
stackexchange thread
had some people attempting to define «clause»:
Tako Lee’s
list of options for sqlparser
is large because Tako Lee chose to allow different settings for different clauses.
I chose to list the settings only once unless I saw that prescriptive guides or vendors
were saying things like «indent +2 if it is JOIN, indent +4 if it is AND, etc.».
Format clauses by indenting
Choice: change clause-start to newline + indent+0 + clause-start
Or change clause-start to newline + indent+0 + clause-start
Or change clause-start to clause-start + newline + indent+1
Or change clause-start to newline + indent+1 + clause-start
And
Choice: change SELECT to SELECT + newline + indent+1
Or
Choice: change clause-start to newline + indent+0 + clause-start + newline + indent-+1
Usually the prescriptive guides use the term «left align» where I use «indent+0».
Let us see the effect of each Choice setting on this SELECT statement:
SELECT DISTINCT id FROM employees WHERE id > 5 GROUP BY id;
Assume indent amount is 4 spaces.
«change clause-start to newline + indent+0 + clause-start»
SELECT DISTINCT id FROM employees WHERE id > 5 GROUP BY id;
«change clause-start to newline + indent+0 + clause-start + newline + indent+1»
SELECT DISTINCT id FROM employees WHERE id > 5 GROUP BY id;
«change clause-start to newline + indent+1 + clause-start»
SELECT DISTINCT id FROM employees WHERE id > 5 GROUP BY id;
This indenting helps emphasize that FROM and WHERE and GROUP BY
are «subordinate» to SELECT.
«change SELECT to SELECT + newline + indent+1»
Benenson suggests «SELECT goes on its own line» and «Align all columns to the first column on their own line».
SELECT DISTINCT id FROM employees WHERE id > 5 GROUP BY id;
Format clauses by right-aligning keywords
change clause-start to newline + right-align word + clause-start
Or change clause-start to newline + right-align phrase + clause-start
«change clause-start to newline + right-align word + clause-start»
SELECT DISTINCT id FROM employees WHERE id > 5 GROUP BY id;
«change clause-start to newline + right-align phrase + clause-start»
SELECT DISTINCT id FROM employees WHERE id > 5 GROUP BY id;
Notice in this example that there is a
«river of white».
All the clause-starts have ended up on the left side of the river;
all the clause-ends (id and employees and id > 5 and id) have ended up on the right side.
«Right align word» usually produces rivers but «right align phrase» does it more.
Prescriptive guides:
Donnelly says: new clauses makes new start of line. ON is not a clause-starter. rivers: yes, right-aligned indenting.
Donnelly describes rivers of white thus:
«The keywords that begin a clause should be right-aligned.
The idea is to make a single character column between the keywords and their objects.»
Donnelly even shows an example where the statement-start (SELECT) is not at Left margin
because SELECT must right-align by phrase with GROUP BY:
SELECT key_column, COUNT(1) FROM tablename GROUP BY key_column;
Holywell says: rivers: usually, right-aligned first-word indenting.
«Rivers are bad in typography, but helpful here.»
In Taranov examples,
it looks like ON and AND are right-aligned, but FROM/WHERE/JOIN/GROUP BY/ORDER BY are indent+0.
This was not explained so I did not suggest a Choice for it.
Salvisberg says: rivers: yes, right-aligned indenting. i.e. align on first word, not align on phrase.
I am going to be repetitious now. Are subordinate clause-starts left-aligned or right-aligned?
Well, this is what left-align looks like:
UPDATE IGNORE SET WHERE OR
This is what right-align looks like:
UPDATE IGNORE SET WHERE OR
Thus, the last character of each word is in the same position.
This is what right-align with the last word looks like:
UPDATE IGNORE SET WHERE OR
Thus, when the clause-start is a phrase rather than a single
word, the alignment is with the last word of the longest clause-start (IGNORE in this case)
rather than with the first word (UPDATE in this case).
If right-aligning is used, then number-of-spaces in indenting units will not matter.
* Oracle SQL Developer has an option: «Right-Align Master Keywords»
(but the number of «master» keywords is limited (as in SELECT … INTO … WHERE)), and
an option «Indent Main Keyword 2x» (they mean the lines after the verb),
It is
relatively new.
James Thigpen
has
an example like Donnelly’s, where SELECT is moved over to right-align with GROUP BY.
Format clauses by aligning contents
Choice: Format clauses by aligning contents?
If we are aligning contents, then all the clause starts are on the left of the
statement or subquery, and all the contents (the clause ends) are on the right.
This can also be accomplished with right-aligning phrases, but when we align
contents we are putting the spaces after the keyword phrase, not before the phrase.
Extract from an example from Taranov:
(Notice that the commas are not part of the deal — see also the dbvis support question.)
SELECT t1.Value1 AS Val1 , t1.Value2 AS Val2 , t2.Value3 AS Val3 INTO #Table3 FROM CTE_MyCTE AS t1 ORDER BY t2.Value2;
Extract from the
TPC-H benchmark requirements:
select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice*(1-l_discount)) as sum_disc_price, sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price,avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' -interval '[DELTA]' day (3) group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus
Prescriptive guides:
Mazur says:
«Some IDEs have the ability to automatically format SQL
so that the spaces after the SQL keywords are vertically aligned.
This is cumbersome to do by hand (and in my opinion harder to read anyway)
so I recommend just left aligning all of the keywords:»
Format lists
Choice: Lists maximum number of items per line = N?
Choice: Lists start with: space, or newline + usual fixed indent + N levels, or newline + align past preceding word?
Choice: Lists , i.e. end-of-line comma becomes , + newline or newline + , or newline + , + space?
Choice: Lists have similar parts align?
Choice: Lists ( i.e. initial left parenthesis becomes ( or newline + ( or newline + ( + newline?
Choice: Lists ) i.e. final right parenthesis becomes ) or newline + )?
For this section, list means: a comma-delimited list of columns or expressions.
Examples of lists are SELECT expression-list FROM …,
INSERT INTO t (column-list) VALUES (expression-list),
IN (expression-list),
CREATE TABLE … (table-element-list)
routine-call (argument-list),
routine-definition (parameter-list).
I have seen formatters that have separate options for each type of list,
but none of the prescriptive guides recommend that.
I will also discuss CREATE TABLE … (table-element-list) in a later section.
SQL people are used to seeing tables. Tables have rows and columns.
Therefore SQL people are used to seeing items on separate lines, with
the start of each item aligned.
Therefore in SQL this is okay even if other languages do not do it.
Sometimes a list which is vertical is called a «stack» or «stacked list».
Suppose that we’re looking at
SELECT DISTINCT alpha + 100 AS c1 /* one */, beta AS c200 /* two */, epsilon — 2 AS c3.
Set maximum number of items per line = 1.
That means that for every item there is a newline:
SELECT DISTINCT alpha + 1 AS c1 /* one */, beta AS c200 /* two */, epsilon - 2 AS c3
I have seen formatters that allow for setting N to
a number greater than 1, although the option is not in prescriptive guides.
Set start with = newline.
That means that there is a newline before the first item:
SELECT DISTINCT alpha + 100 AS c1 /* one */, beta AS c200 /* two */, epsilon - 2 AS c3
Set comma becomes newline + comma + space.
That means the commas come at the start rather than at the end:
SELECT DISTINCT alpha + 100 AS c1 /* one */ , beta AS c200 /* two */ , epsilon - 2 AS c3
Arguments that I have seen for comma-at-start style include:
* it is easier to comment out columns (you do not have to comment out the line and then remove the comma on the previous line)
* it is easier to add a comment after the expression (I do not understand this argument)
* commas-at-end = yes may look natural to people who use left-to-right alphabets, but lots of people use right-to-left.
Set have similar parts align = yes.
That means not only the item starts, but also
the operators and ASes and comments align:
SELECT DISTINCT alpha + 100 AS c1 /* one */ , beta AS c200 /* two */ , epsilon - 2 AS c3
Again, I have seen formatters that allow for this, although
the option is not in prescriptive guides (except for other statements).
Now put the list inside parentheses.
Set ( = newline + ( + newline.
That means we go now have a ( on its own line aligned with the commas:
SELECT DISTINCT ( alpha + 100 AS c1 /* one */ , beta AS c200 /* two */ , epsilon - 2 AS c3)
Set ) = newline + )
That means the parentheses are now aligned:
SELECT DISTINCT ( alpha + 100 AS c1 /* one */ , beta AS c200 /* two */ , epsilon - 2 AS c3 )
Thus there are only 6 basic «choices» but some of the choices
have more than 2 possible settings, and the choices can be
combined independently, and sometimes people will want to
have different settings for each type of list.
So there are hundreds of possible settings.
Prescriptive guides:
Salvisberg, Taranov say: commas-at-start = yes.
And Salvisberg says: space-after-the-comma = yes.
Celko, Mazur, Holywell say: commas-at-start = no.
Salvisberg says «operators aligned» — perhaps he means assignment operators.
Vendor manuals:
Oracle: comma-lists mostly on same line but indenting when line too long, comma-at-end, no AS, ) at end
DB2: keywords right aligned = no (2 spaces) and comma-lists mostly on same line
SQL Server: comma-lists mostly on same line
SQL Developer has an option «Before comma» which is equivalent to «Commas first = yes».
Bloggers / Others:
Joe Celko’s programming style» says:
«“Put commas at the end of a line, not the start.
A comma, semicolon, question mark, or periods are visual signals that something has just ended, not that it is starting.
Having a comma at the start of a line will make the eye tick leftward as it looks for that missing word that was expected
before the comma.”
Simon Holywell says: «There’s no way of being polite here — this [i.e. comma-at-start = yes]
looks hideous, weird and totally unconventional in a bad way.»
ApexSQL says:
«However, most often commas as list-separators are written at the beginning of a line
to make commenting of list members easier in development. This is also a standard in Adventureworks2012 database:»
Factor says: «You’d probably also want to insist on a new line after each column definition.»
(but in SELECT etc.)
«Now, no typesetter would agree to arrange this [list of my favourite cheeses]
in a vertical list, because the page would contain too much white space…
… …and they’d be most unlikely to want to put commas at the beginning of list elements.
However, if the list elements consisted of longer strings, then it would be perfectly acceptable.
In the same way, the rules for formatting SQL have to take into account the type of SQL statement
being formatted, and the average length of each list element
…
Commas, used as list separators, are often put at the beginning of lines.
I realize that it makes the «commenting out» of list members easier during development,
but it makes it difficult for those of us who are used to reading English text in books.
Commas come at the end of phrases, with no space before them, but if they are followed
by a word or phrase on the same line, then there is a space after the comma.»
Factor also says: «Put a line-break between list items only when each list item averages more than thirty or so characters.»
Re parentheses, see
Firefox Data Documentation SQL Style Guide re parentheses.
Oracle SQL Developer has multiple options which would especially
affect select lists and UPDATE … SET lists and WHERE lists with
multiple conditions: «Align Equal signs (= < > …)»,
«Align Operator signs (* — + …)», «Align AS keyword»,
«Align on commas», «Align line comments [i.e. simple comments]»,
«Align || at end of line», «Align variable declaration for stored procedures».
Taken together, these options all correspond to «Choices»
and presumably a procedure would look like this if they were all on:
CREATE PROCEDURE p ( first_parameter CHAR(1) := 'a'; second_parameter CHAR(22) := 'b'; ) INSERT INTO t1 SELECT first_column AS first_alias, second_column AS b , -- comment really_big_column AS c -- comment FROM t2 WHERE first_column = 12345 -- comment AND second_column <= 0 -- comment ORDER BY fifth_column || sixth_column ...
Bloggers:
Benenson says: SELECT goes on its own line».
From an explanation on
support.dbvis.com
»
1) The formatter indents the column names, ignoring the commas. So
with an indentation size of 4, the column name is indented to start in
column 4. If you have also selected Line Break Before Comma, the
comma will be appear before the column name. This is the intended
behavior.
«
Thus: with indenting or aligning, plus commas before, you are not trying to line up
the commas, you are trying to line up the contents.
The formatter comes from
SQLinForm.
Oracle SQL Developer had an option «For «Number of columns per line»
Toad had an option for «Stacked on more than N» (default 3), see
here and
here.
Oracle PLSQL Coding and Naming Standards
says «Parameters must be stacked unless less than three.»
… These would all correspond with «Choice: Lists maximum number of items per line = 3».
Format conditions
Choice: same as for Format lists?
or
change AND/OR to newline + indent+1 + AND/OR
or
change AND/OR to newline + indent+0 + AND/OR
or
change AND/OR to newline + right-align + AND/OR
or
change AND/OR to AND/OR + newline + indent-+0
Conditions are really just expressions that can return boolean values,
so they can appear anywhere, not just after WHERE or HAVING or ON or IF or WHERE.
But here I will illustrate with WHERE.
For simple conditions without AND/OR most examples show: WHERE condition.
The «same as for Format lists» setting would mean that a
a series of conditions can be regarded
as a list, with each AND/OR condition being one «item» of the list.
Potentially that would imply that a
string of conditions would have all their parts aligned, a setting
that we might also see for CASE operations, later.
The «change AND/OR to AND/OR + newline + indent-+0» setting
means AND/OR go to the end of a line, not the start of the next line.
This would be compatible with the idea that AND/OR are not «clause
starts», they are «operators».
And operators belong at the end of a line, if we follow Java conventions.
Prescriptive guides:
Extract from a Taranov example:
INNER JOIN dbo.Table3 AS t2 ON t1.Value1 = t2.Value1 WHERE t1.Value1 > 1 AND t2.Value2 >= 101
This fits with «clause starts on the left, clause ends on the right»,
and «change AND/OR to newline + right-align + AND/OR»
Holywell:
WHERE ... OR ... OR ...
This fits with «same as for Formats list», «similar parts aligned», and
«change AND/OR to newline + indent+1 + AND/OR».
Mazur fits with «change AND/OR to AND/OR + newline + indent+0» (i.e. indent+1 from the where)
where ... and ...
Thus Mazur is the only one who says AND/OR-at-end.
«all conditions within WHERE are indented (4 spaces), AND/OR is at end of line»
Benenson fits with «change AND/OR to newline + indent+0 + AND/OR»
WHERE ... AND ...
Salvisberg, Holywell fit with: «change AND/OR to newline + right-align + AND/OR» (i.e. right-align from the where)
WHERE ... AND ... WHERE ... OR ...
Vendor manuals:
Example from
Oracle:
WHERE NOT (job_id = 'PU_CLERK' AND department_id = 30)
Example from
SQL Server:»
WHERE EmployeeKey <= 500 AND LastName LIKE '%Smi%' AND FirstName LIKE '%A%'; Examples from DB2: WHERE WORKDEPT = 'E11' AND NOT JOB = 'ANALYST' + WHERE EDLEVEL > 12 AND (WORKDEPT = 'E11' OR WORKDEPT = 'E21')
… Quick summary: none of the vendor manuals’ examples follow any special rules.
SQL Developer has an option for «Indent AND/OR» (they mean one additional fixed-units indent so AND/OR is one level after WHERE).
Bloggers:
From an article in
Oracle Magazine:
WHERE ... AND ...;
This fits with «same as for Formats list», «similar parts aligned», and
«change AND/OR to newline + indent+1 + AND/OR».
Format subqueries
Choice: change subquery start to subquery start, or newline + indent-+1 + subquery start
The «subquery start» is ( or ALL/IN/ANY/EXISTS ( or SELECT.
The Choices for ()s are the same as in section
Format lists; for examples I chose the newline settings.
If «change subquery start to newline + indent-+1 + subquery start»
WHERE ( SELECT ... )
If «change SELECT to newline + indent+1 + SELECT + newline» and
«change FROM to newline + indent+2 + FROM».
CREATE VIEW v AS SELECT c FROM t;
Notice that the FROM is indented 4 spaces under the SELECT,
not 4 spaces under the CREATE, because sub-clauses are always
indented from the main clause, not from the statement start.
Prescriptive guides:
Salvisberg gives this example:
SELECT emp.last_name ,emp.first_name FROM employees emp WHERE emp.employee_id IN (SELECT j.employee_id FROM jobs j WHERE j.job_title like '%Manager%');
Benenson says: (subquery-is-clause-start: yes)
Format WITH
Choice: none, setting is as described in section Format parentheses.
In section Format subqueries the Choices involved putting ) at the end of
a SELECT without a newline, or aligning ) with (.
But in examples for WITH I regularly see a third way: align ( with WITH.
For example (this is from a
mozilla.org SQL Style Guide)
WITH sample AS ( SELECT client_id, FROM main_summary WHERE sample_id = '42' )
Prescriptive guides:
Taranov, Mazur, Benenson examples all show Align right parenthesis with WITH = yes.
Format joins
Choice: change JOIN to newline + indent+1 + JOIN, or newline + indent+0 + JOIN?
Choice: change ON to ON, or newline + indent+1 + ON?
Choice: old style join like JOIN?
JOIN is a sub-clause of FROM, so «indent+1» means «indent one level
after FROM», which may or may not be the same thing as «indent one level
after SELECT». ON is a sub-clause of JOIN, so «indent+1″ means indent one level
from JOIN».
«old style join like JOIN = yes» would suggest that however you would format «CROSS JOIN» is how you would format «,».
But the vendor manual examples which have old style joins (Oracle, DB2, Tarantool)
have them as comma-separated lists («PARTS, PRODUCTS», «employees, departments», «t1 AS a, t1 AS b»).
Prescriptive guides:
Taranov shows: change JOIN to newline + right-align word, change ON to newline + right-align phrase + ON
SELECT INNER JOIN ... ON ...
Holywell shows: change JOIN to newline + indent+1 + JOIN, change ON to newline + indent+0 + ON
FROM ... JOIN ... ON ...
Holywell explains: «Joins should be indented to the other side of the river and grouped with a new line where necessary.»
But in another example Holywell shows JOIN and ON right-aligned with SELECT and FROM.
Salvisberg shows: change JOIN to newline + indent+1 + JOIN, + names align (as if it is a list) and
change ON to ON
FROM ... JOIN ... ON ...
Mazur, Benenson, Donnelly say or show: change JOIN to newline + indent+0 + JOIN, change ON to ON
FROM ... JOIN ... ON ...
This is also the style of blogger James Thigpen.
Mazur suggests that ON might be on a different line if the condition is complex.
Examples in vendor manuals:
Oracle = inconsistent
DB2 = inconsistent
SQL Server = inconsistent
SQL Developer has an option for «JOIN statements» (they mean JOIN clauses).
Format INSERT
Choices: none needed.
If there is a list of target columns,
or if there is a list of VALUES columns, then see Format lists.
If there is a VALUES or SELECT clause, then see Format clauses.
So the Choices are already made, there is nothing to do now
except give an example. This is one from Salvisberg, who
believes that the list of target columns should always be
specified, and consistently chooses the setting that puts
commas in front, and believes that a clause start should be
changed to newline + indent+1 + clause start.
INSERT INTO departments (department_id ,department_name ,manager_id ,location_id) VALUES (departments_seq.nextval ,'Support' ,100 ,10);
Format UPDATE
Choices: none needed
As with INSERT, the Choices are already made, there is
nothing to do now except give an example. This one is
from Holywell, who believes clause starts should be right-aligned
and commas in comma lists should be at end.
/* Updating the file record after writing to the file */ UPDATE file_system SET file_modified_date = '1980-02-22 13:19:01.00000', file_size = 209732 WHERE file_name = '.vimrc';
Format CREATE TABLE
Choice: see Format lists.
The table-element list in CREATE TABLE is a list.
Therefore the same 6 choices apply, although you might choose different settings.
This is what a CREATE TABLE will look like if the choices are:
Lists maximum number of items per line = 2
Lists start with: newline + usual fixed indent + 1 level
Lists , i.e. end-of-line comma becomes , + newline
Lists have similar parts align = yes
Lists ( becomes (
Lists ) becomes )
Alternate phrasing: Inside CREATE TABLE any list of columns or constraints is
a comma-delimited list so the «Format lists» rules apply.
CREATE TABLE employees (id INTEGER PRIMARY KEY, class INTEGER, name CHAR(255) UNIQUE DEFAULT 5, markup INTEGER);
Probably «items per line = 2» is
not an optimal choice; recommendations are either 1 or infinite.
The «similar parts align» works well because the parts
(name, data type, column constraint, default) are well defined and ordered,
but would look less lovely if the DEFAULT clause was
long and came before UNIQUE.
And, although saying «) becomes )» saves screen space, it is not the most common choice.
So people may be more likely to choose «items per line = 1»
and «similar parts align = no» and ) becomes newline + )
which results in
CREATE TABLE employees (id INTEGER PRIMARY KEY, class INTEGER, name CHAR(255) UNIQUE DEFAULT 5, markup INTEGER );
If you decide that you like aligning parentheses, then you will probably want
to be consistent and align CASE with END, or BEGIN with END.
Choice: table constraints come after column definitions?
or Choice: column constraint immediately after column data type?
I can imagine it is good to add «UNIQUE (class, name)» as a final item on
a separate line, so that it does not interrupt the column list.
But I can imagine it is good to put «UNIQUE (class, name)» after the
definition of name and class.
(Remember that NOT NULL is a column constraint and it comes immediately after
the data type, so why not other constraints?)
If there is a big long foreign-key constraint, it might be broken up
because the earlier «Choices» determined that column lists get indented
with fixed indenting, aligned parentheses, etc. For example
CONSTRAINT constraint_name FOREIGN KEY REFERENCES table_name (column_name, column_name )
If there is a long check constraint, it might be broken up
because the earlier «Indent clauses» choices decided what should
happen with AND / OR indenting.
Prescriptive guides:
Holywell says: «If it make senses to do so align each aspect of the query on the same character position.
For example all NOT NULL definitions could start at the same character position.
This is not hard and fast, but it certainly makes the code much easier to scan and read.»
Holywell example:
CREATE TABLE staff ( PRIMARY KEY (staff_num), staff_num INT(5) NOT NULL, first_name VARCHAR(100) NOT NULL, pens_in_drawer INT(2) NOT NULL, CONSTRAINT pens_in_drawer_range CHECK(pens_in_drawer >= 1 AND pens_in_drawer < 100) );
This example also is consistent with Align right parenthesis with statement start = yes.
+ column constraint immediately after column data type = yes
Examples from vendor manuals
DB2:
CREATE TABLE EMPLOYEE_SALARY (DEPTNO CHAR(3) NOT NULL, DEPTNAME VARCHAR(36) NOT NULL, EMPNO CHAR(6) NOT NULL, SALARY DECIMAL(9,2) NOT NULL WITH DEFAULT)
Choices: no semicolon, everything upper case, INTEGER rather than INT, adjectives left-aligned, commas follow.
SQL Server:
CREATE TABLE dbo.PurchaseOrderDetail ( PurchaseOrderID int NOT NULL REFERENCES Purchasing.PurchaseOrderHeader(PurchaseOrderID), LineNumber smallint NOT NULL, ProductID int NULL REFERENCES Production.Product(ProductID), UnitPrice money NULL, ... )
Notice: semicolons, keywords upper case, object names Pascal Case, adjectives not aligned, commas follow, INT preference.
+ REFERENCES (because it is a constraint?) is newline + more indent
Oracle:
CREATE TABLE departments_demo ( department_id NUMBER(4) , department_name VARCHAR2(30) CONSTRAINT dept_name_nn NOT NULL , manager_id NUMBER(6) , location_id NUMBER(4) , dn VARCHAR2(300) ) ;
Notice: semicolons at end, keywords upper case, object names snake case, adjectives left-aligned, commas precede.
+ space before semicolon
+ column constraint immediately after column data type
Another
Oracle CREATE TABLE example:
CREATE TABLE books (title VARCHAR2(100), author person_t);
Tarantool:
CREATE TABLE employees_demo (employee_id INTEGER, first_name STRING, last_name STRING NOT NULL, email STRING NOT NULL, phone_number STRING);
SQL Standard 9075-12 document:
CREATE TABLE INFORMATION_SCHEMA_CATALOG_NAME ( CATALOG_NAME SQL_IDENTIFIER, CONSTRAINT INFORMATION_SCHEMA_CATALOG_NAME_PRIMARY_KEY PRIMARY KEY ( CATALOG_NAME ), CONSTRAINT INFORMATION_SCHEMA_CATALOG_NAME_CHECK CHECK ( 1 = ( SELECT COUNT(*) FROM INFORMATION_SCHEMA_CATALOG_NAME ) ), CONSTRAINT INFORMATION_SCHEMA_CATALOG_NAME_FK_CATALOG_NAME FOREIGN KEY ( CATALOG_NAME ) REFERENCES DEFINITION_SCHEMA.CATALOG_NAME );
Final ) is on a separate line aligned with ), space after (, space before ) — those are choices in
Section Format parentheses.
But for the CHECK clause ) ) is on the last line of the subquery — that is a choice in Section subqueries.
All words upper case — those are choices in Names sections.
Format CREATE VIEW
Choice: AS SELECT becomes AS SELECT, or newline + AS SELECT, or AS + newline + fixed-indent + SELECT?
Choice: WITH CHECK OPTION aligns with AS?
The «AS SELECT» setting looks like:
CREATE VIEW view_name AS SELECT
The «newline + AS SELECT» setting looks like
CREATE VIEW view_name AS SELECT
The «AS + newline + fixed-indent + SELECT» setting looks like
CREATE VIEW view_name AS SELECT
I cannot think of any principle that applies, except perhaps that AS is a special clause.
None of the prescriptive guides says anything about this, so I depend entirely on examples
from vendors and blogs.
In this list, some vendors appear more than once because they use more than one setting.
AS SELECT: nobody
newline + AS SELECT: DB2, Celko
newline + AS + fixed-indent + SELECT: nobody
AS + newline + fixed-indent + SELECT: Oracle
newline + AS + newline + SELECT: SQL Server, Factor
newline + fixed-indent + AS SELECT: Oracle, DB2
Vendors:
Oracle: see
here. Inconsistent. WITH READ ONLY aligns with AS.
DB2: see
here.
SQL Server: see
here. Consistent. WITH CHECK OPTION aligns with AS.
Blogs:
Joe Celko,
SQL for Smarties: Advanced SQL Programming shows
WITH CHECK OPTION is level-1 indent from CREATE.
Sharad Maheshwari and Ruchin Jain,
«DBMS – Complete Practical Approach»
say: «Everything after the AS keyword must be enclosed in parentheses.»
The
Teradata manual has
CREATE VIEW dept AS SELECT deptno(TITLE 'Department Number'), deptname(TITLE 'Department Name'), loc (TITLE 'Department Location'), mgrno(TITLE 'Manager Number') FROM department;
which is consistent with «AS + newline + fixed-indent + SELECT».
Format CREATE PROCEDURE or CREATE FUNCTION
Choice: indent first line of routine?
Choice: AS as in CREATE VIEW?
«indent first line of routine» is a Choice
because I have seen it occasionally.
But zero prescriptive guides recommend it and zero vendor examples show it.
I believe that there is an influence from other languages that have:
function_name_definition () { statements; }
The equivalent of { … } is BEGIN … END, so:
create_routine_statement_start () BEGIN ... END;
I call that «the usual way».
The parameter list might be followed by a descriptive phrase,
for which I know of no rule.
I have no guidance for whether LANGUAGE or other characteristics phrases are clause starts.
For «AS as in CREATE VIEW» I chose not to repeat all
the Choices in Format CREATE VIEW. I have assumed that,
whatever you like for newlines and indenting there,
you will like here too. (This applies only for the
dialects that have CREATE … AS … or CREATE … IS …).
Vendor examples:
Oracle:
the usual way.
DB2:
inconsistent.
SQL Server:
the usual way, with newlines before and after AS.
MySQL:
the usual way.
MariaDB:
inconsistent.
Bloggers:
Factor: (in Chapter 1 «Writing Readable SQL», Listing 1-7)
shows the usual way, with newlines before and after AS.
Format CREATE TRIGGER
Choice: change create-trigger clause to newline + indent+0 + create-trigger clause?
This is not really different from the Choice for
Format CREATE PROCEDURE or CREATE FUNCTION.
We have an initial CREATE … object_name, some modifiers, and then a triggered-action
statement which is usually a compound BEGIN/END statement.
According to our guidance via the SQL standard BNF, ON should be treated as a clause start.
However, the only prescriptive guide that has a CREATE TRIGGER example (Salvisberg) shows otherwise.
And the vendor manual examples are inconsistent.
Prescriptive guides:
Salvisberg shows: change create-trigger clause to newline + indent+0 + create-trigger clause = yes:
CREATE OR REPLACE TRIGGER dept_br_u BEFORE UPDATE ON departments FOR EACH ROW BEGIN END;/
Vendor manual examples:
Oracle shows BEFORE and ON as clauses. So is REFERENCING. So is FOR EACH ROW.
DB2 shows AFTER and FOR EACH ROW and WHEN are clauses. Indents the BEGIN/END! Sometimes indents twice!
SQL Server shows ON and FOR are clauses.
Mysql shows
BEFORE is BEFORE. FOR EACH ROW is newline + indent+0 + FOR EACH ROW.
Or
BEFORE is newline + indent+1 + BEFORE. FOR EACH ROW is newline + indent+1 + FOR EACH ROW.
inconsistent.
MariaDB shows indent+1 for AFTER, indent+2 for UPDATE.
Tarantool shows BEFORE and FOR EACH ROW are not indented. BEGIN is indented!
Format CASE expression
Choice: change WHEN/END to newline + indent+0 + WHEN/END (i.e. they left-align with CASE)?
Choice: change WHEN/END to newline + indent+1 + WHEN/END?
Choice: change THEN to THEN, or to newline + indent+2 + THEN?
Choice: Change END to END, or to newline + indent+0 + END?
The indenting is relative to CASE.
CASE … END is analogous to a statement.
So whenever a prescriptive guide says indent=yes or left-align=yes for a statement,
I would expect it to say something similar for CASE … END.
So in a consistent world there would be no need for this «Format CASE expressions»
section, but I have to add it because prescriptive guides and formatters have
options for it.
If there is an AS alias in a select list, it follows END on the same line.
I have not seen suggestions that a series of
WHEN … THEN … should be treated as a «list» and therefore
be aligned as in Format lists. I have, however, seen
a suggestion
that CASE expressions should be treated as items in a list, for example
SELECT '' AS a, CASE WHEN END AS b, '' AS c ...
This could be stated for subqueries too.
Prescriptive guides:
Taranov shows: change WHEN to WHEN, change THEN to newline + indent+0 (align with WHEN), END aligns with CASE
CASE WHEN ... THEN ... WHEN ... THEN ... END
Holywell shows: change WHEN/END to newline + indent+0 + WHEN/END = yes: and THEN on the same line
CASE ... WHEN ... THEN ... WHEN ... THEN ... END
Benenson says (re CASE statements) «try to align WHEN, THEN, and ELSE together inside CASE and END»
CASE WHEN ... THEN ... ELSE ... END
Not quite like Taranov because ELSE is elsewhere.
Mazur, Salvisberg show: change WHEN/ELSE to newline + indent+1 + WHEN/ELSE = yes: and THEN on the same line
CASE ... WHEN ... THEN ... WHEN ... THEN ... ELSE ... END
Salvisberg adds «Try to use CASE rather than an IF statement with multiple ELSIF paths.»
Mazur adds «Each when
should be on its own line (nothing on the case line)
and should be indented one level deeper than the case line.
The then
can be on the same line or on its own line below it, just aim to be consistent.»
Vendor examples:
Oracle: inconsistent
DB2: same as Mazur, Salvisberg
SQL Server: same as Mazur, Salvisberg
MySQL: inconsistent
MariaDB: single line
Tarantool: single line
Oracle SQL Developer has separate options for indent of CASE, WHEN, THEN, and AND/OR.
Bloggers:
Toad formatter has an option «Position THEN on the same line».
Factor shows: change WHEN/ELSE to newline + indent+1 + WHEN/ELSE = yes,
change THEN to newline + indent+2 + THEN
CASE ... WHEN ... THEN ... WHEN ... THEN ... END
SQL Prompt has options for placing WHEN / ELSE / THEN on new lines, and for aligning ELSE with WHEN.
Holywell example:
SELECT CASE postcode WHEN 'BN1' THEN 'Brighton' WHEN 'EH1' THEN 'Edinburgh' END AS city FROM office_locations WHERE country = 'United Kingdom' AND opening_time BETWEEN 8 AND 9 AND postcode IN ('EH1', 'BN1', 'NN1', 'KW1');
… It looks like «right align word» but then there is no river. Or, there are two rivers.
Format blocks the usual way
Choice: Format blocks the usual way?
For this purpose a «block» is a group of statements enclosed by a «control statement».
In standard SQL (SQL/PSM) the control statements that enclose statements are:
BEGIN … END, CASE … END CASE, IF … END IF, LOOP … END LOOP, WHILE … END WHILE,
REPEAT … END REPEAT, FOR … END FOR.
Naturally in Oracle or Oracle imitators (PL/SQL) the list differs, but
the effect of saying «yes» is the same:
Any statements between the verb and the END are indented N spaces.
The END aligns with the verb.
WHEN and ELSE are treated as control verbs.
IF condition THEN and WHEN condition THEN are single lines.
Example (with indent-amount = 2):
BEGIN sql-statement; LOOP sql-statement; IF condition THEN sql-statement; ELSE sql-statement; END IF; END LOOP; END;
That is what I have seen in every source that has a consistent set of examples.
Therefore I think it is okay to say that is the «usual» way.
Choice: labels on same line as control verb?
If labels on same line = yes, using an SQL/PSM label for the example:
BEGIN loop_label: LOOP sql-statement; END LOOP loop_label; END;
If labels on same line = no, using a PL/SQL label for the example:
BEGIN <<loop_label>> LOOP sql-statement; END LOOP loop_label; END;
Notice, in the first example, how END aligns with loop_label not with LOOP.
I do not know whether this is common, but it is in the MySQL manual.
Choice: FOR … LOOP is all one line: yes or no.
If all one line = yes:
FOR iteration LOOP
If all one line = no:
FOR iteration
LOOP
That is the only example that I could find of a difference between
Salvisberg’s recommendations and the Oracle manual.
Choice: a short IF … END IF; can go on one line: yes or no
This is about whether the «usual» form is followed rigidly for blocks,
even if for non-blocks it is okay to avoid «haiku».
Remember:
Modifiers of verbs go with the verb (see definition of «phrase»), so BEGIN ATOMIC has ATOMIC on the same line as BEGIN.
And DECLARE EXIT HANDLER FOR NOT FOUND is a single line.
Prescriptive guides:
Most of the prescriptive guides do not have anything to say about control statements.
Salvisberg says: indent blocks the usual way: yes. labels on same line: no. FOR … LOOP is all one line: no.
Salvisberg also advises: avoid goto because there is no indenting that can indicate its effect on the flow,
and always have labels for loops. Example:
<<while_loop>> WHILE (i <= co_max_value) LOOP i := i + co_increment; END LOOP while_loop;
… So different line for LOOP, and label has its own line.
Vendor manual examples:
Oracle:
indent blocks the usual way: yes. labels on same line: no. FOR … LOOP is all one line: yes.
a short IF … END IF; can go one line: no
DB2:
THEN is on same line as IF: yes.
a short IF … END IF; can go one line: no
SQL Server:
the examples given on this page are so wildly inconsistent that we’d have to say: there is no rule.
a short IF … END IF; can go one line: no
MySQL:
indent blocks the usual way: yes. labels on same line: yes. FOR … LOOP is all one line: n/a.
a short IF … END IF; can go one line: yes
MariaDB: not consistent, ignored
Oracle SQL Developer has an option «After statements» and one of the
options is «double break», that is, put two newlines after each statement.
Format blocks with analogies to other guides and choices
Choice: Format blocks as you do for Format lists?
The previous section was about formatting blocks the usual way.
The reason you might want indenting some other way would, I suppose, be:
«because that is what we do in [insert language name here]».
I will illustrate WHILE in SQL/PSM blocks but the same considerations
are applicable for other conditional statements and for PL/SQL.
Once again, we can do «Choice re-use».
Start with the six Choices at the beginning of section Format lists.
Substitute «statement» for «item», substitute «;» for «,», substitute «DO» for «(«, substitute «END» for «)».
Now we have:
Choice: Lists maximum number of statements per line = N?
Choice: Lists start with: space, or newline + usual fixed indent + N levels, or newline + align past preceding word?
Choice: Lists ; i.e. end-of-line semicolon becomes ; + newline or newline + ; or newline + ; + space?
Choice: Lists have similar parts align?
Choice: Lists DO i.e. initial left parenthesis becomes DO or newline + DO or newline + DO + newline?
Choice: Lists END i.e. final right parenthesis becomes END or newline + END?
For example, if
maximum number of statements per line = 1,
lists start with usual fixed indent + 1 level,
end-of-line semicolon becomes ; + newline,
similar parts align = no,
DO becomes newline + DO + newline,
END becomes END,
we get:
WHILE condition DO
statement;
statement; END WHILE;
Now that we are talking about SQL as a procedural language,
it is more legitimate to compare with indentation styles in other languages.
The functional equivalent for SQL block begin … end is usually { … } braces.
So it is useful to look at the Wikipedia article
Indentation style
section
«Brace placement in compound statements».
The following is an SQL variation of the table in that section.
I am assuming indentation with 4 spaces, and space-before-semicolon = no.
DO ... END placement Style -------------------- ----- WHILE condition DO K&R and variants statement; statement; END WHILE; WHILE condition Allman DO statement; statement; END WHILE; WHILE condition GNU DO statements END WHILE; WHILE condition Whitesmiths DO statements END WHILE; WHILE condition Horstmann + Pico DO statement; statement; END WHILE; WHILE condition DO Ratliff statement; statement; END WHILE; WHILE condition Lisp DO statement; statement; END WHILE; WHILE condition Haskell DO statement ; statement ; END WHILE
Format DECLARE
Choices: none needed
As with INSERT UPDATE etc., the Choices are already made, there is
nothing to do now except give an example. This one is
from Taranov, who believes definitions should be aligned
(as in Format CREATE TABLE), and data types should be lower case
(as in Data types).
DECLARE @myGoodVarchareVariable varchar(50);
DECLARE @myGoodNVarchareVariable nvarchar(90);
DECLARE @myGoodCharVariable char(7);
DECLARE @myGoodNCharVariable nchar(10);
If there was a default then it would not cause a newline,
because DEFAULT was not established to be a clause start
according to any of the criteria shown in previous sections.
If there was a multiple-column declaration then
the Choice that would apply is as in Format lists.
Format overflow
Choice: if overflow ignore Maximum line length, or word wrap, or format = no.
An «overflow» occurs when a line of SQL text can go past the Right margin.
The previous sections have shown many ways to format an SQL statement
onto multiple lines. Therefore overflow should be rare.
But — hopefully very rarely — a single clause or a single list item will
cause overflow past the maximum length. What then?
Setting «ignore Maximum line length» means «go ahead and overflow».
It is not a big deal if the line is on a screen, just result will be
a scroll bar or a return to the leftmost column position.
Setting «word wrap» means what in CSS would be called
word-wrap: normal
or in my terminology it means «if the next word would overflow, newline and indent 0».
Setting «format = no», that is, «do not try to format a statement that would
cause overflow», might help readability if the original format contained newlines.
Since «format = no» implies that you will have to figure out something readable by yourself,
the
Java code conventions
might help because the «general principles» are applicable to SQL if the too-long line is due to a single expression. They are:
«When an expression will not fit on a single line, break it according to these general principles:
•Break after a comma.
•Break before an operator.
•Prefer higher-level breaks to lower-level breaks.
•Align the new line with the beginning of the expression at the same level on the previous line.
•If the above rules lead to confusing code or to code that’s squished up against the right margin, just indent 8 spaces instead.»
Bloggers:
Commonwealth of Pennsylvania Department of Public Welfare,
«Oracle PLSQL Coding and Naming Standards»,
says «It is allowed to overshoot right margin if code cannot be accommodated in 78 characters.»
Formatters or pretty printers
If you have a good GUI client, it will add colour highlighting on your screen.
If you have a good formatter, it will automatically take care of some of the Choices that I raised here.
If your group uses Toad, SQL Developer, SQL Server Management Studio,
SQLinForm, etc., just accept what they offer.
Tao Klerks and Steve Culshaw have come up with a github.com page
«SQL Formatter comparisons»
which lists more formatters than I dreamed existed, and some of them are
open source, and some of them are online.
Google «sql lint» and you will see a few more products that will advise but not change.
Bloggers:
Lester I. McCann (writing about C)
says:
«What about programs that will automatically indent source code for you?
Such «crutches» do exist, but I discourage their use on code that is being written from scratch
because you are likely to develop bad habits if you begin to rely on the program to fix your
mistakes for you.»
Contributors
The final section of this guide is a copy of the GNU General Public License (GPL).
It is possible to use GPL for other than software, as explained at
the gnu.org FAQ.
FLOSS manuals are under GPL.
So is this document.
All references to «this program» or «software» mean «this guide», the Descriptive SQL Style Guide.
GPL Version 2 License
GNU GENERAL PUBLIC LICENSE Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.The precise terms and conditions for copying, distribution and
modification follow.GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
- This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
- You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
- You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
- You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.<one line to give the program's name and a brief idea of what it does.> Copyright (C) <year> <name of author> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details.
The hypothetical commands
show w' and
show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other thanshow w' and
show c'; they could even be
mouse-clicks or menu items--whatever suits your program.You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker., 1 April 1989
Ty Coon, President of ViceThis General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License.
Рекомендации, описанные в этом руководстве, во многом пересекаются с описанными в книге Джо Селко «Стиль программирования Джо Селко на SQL» (оригинал: SQL Programming Style). Это, в частности, найдут полезным те, кто уже знаком с этой книгой. Тем не менее автор этого руководства в некоторых аспектах более категоричен, нежели Джо Селко, а в других, напротив, более гибок. И, конечно, нельзя не отметить, что это руководство значительно короче и лаконичнее книги Селко — здесь вы не встретите ни весёлых историй из жизни, наглядно объясняющих, как и почему лучше не делать, ни длинных повествований, мотивирующих на использование той или иной рекомендации.
Руководство написано в формате Markdown, что позволяет легко включить его в проект или просто сослаться на него оттуда, что гораздо удобнее, нежели работать с большой бумажной книгой.
Основные положения
Хороший стиль
- Идентификаторы и имена. Осмысленные и в едином стиле.
- Пробелы и отступы. Логично расставленные для лучшей читаемости кода.
- Дата и время. Соответствующие стандарту ISO 8601:
YYYY-MM-DD HH:MM:SS.SSSSS
. - Функции SQL. Стандартные вместо специфичных (определяемых поставщиком) с целью лучшей переносимости.
- Код. Лаконичный и без излишеств, как например: ненужные кавычки или скобки или неуместное использование оператора
WHERE
. - Комментарии. Предпочтительно в стиле C —
/*
(начало) и*/
(конец). Либо--
перед комментарием, тогда окончанием будет новая строка.
SELECT file_hash -- stored ssdeep hash
FROM file_system
WHERE file_name = '.vimrc';
/* Updating the file record after writing to the file */
UPDATE file_system
SET file_modified_date = '1980-02-22 13:19:01.00000',
file_size = 209732
WHERE file_name = '.vimrc';
Плохой стиль
- CamelCase. Неудобочитаем.
- Префиксы и венгерская нотация. Префиксы наподобие
sp_
илиtbl_
избыточны. - Множественное число. Лучше использовать более естественно звучащие собирательные понятия. Например,
staff
вместоemployees
илиpeople
вместоindividuals
. - Идентификаторы в кавычках. Если они обязательно нужны, тогда используйте двойные кавычки, определённые в стандарте SQL-92 с целью лучшей переносимости в дальнейшем.
- Принципы объектно-ориентированного проектирования. Не нужно применять к SQL или структуре базы данных.
Соглашения о наименовании
Общее
- Убедитесь в том, что имя уникально и его нет в списке зарезервированных ключевых слов.
- Ограничивайте длину имени 30 байтами (это 30 символов, если не используется многобайтный набор символов).
- Начинайте имена с буквы и не заканчивайте их символом подчёркивания.
- Используйте в именах только буквы, цифры и символ подчёркивания.
- Избегайте нескольких подряд идущих символов подчёркивания.
- Используйте символ подчёркивания там, где вы бы поставили пробел в реальной жизни (например,
first name
станетfirst_name
). - Избегайте сокращений. Если их всё же нужно использовать, убедитесь в том, что они общепонятны.
SELECT first_name
FROM staff;
Таблицы
- Используйте собирательные имена или, что менее предпочтительно, форму множественного числа. Например,
staff
иemployees
(в порядке убывания предпочтения). - Не используйте описательные префиксы вида
tbl_
и венгерскую нотацию в целом. - Не допускайте совпадений названия таблицы с названием любого из её столбцов.
- По возможности избегайте объединения названий двух таблиц для построения таблицы отношений. Например, вместо названия
cars_mechanics
лучше подойдётservices
.
Столбцы
- Названия всегда давайте в единственном числе.
- По возможности не используйте
id
в качестве первичного идентификатора таблицы. - Не создавайте в таблице столбцов с таким же названием, как у неё самой.
- Названия всегда пишите со строчной буквы. Могут быть исключения, например использование имени собственного.
Псевдонимы/корреляции
- Должны так или иначе быть связаны с объектами или выражениями, псевдонимом которых они являются.
- Имя корреляции обычно составляется из первых букв каждого слова в имени объекта.
- Добавьте цифру к имени, если такое уже существует.
- Всегда используйте ключевое слово
AS
для лучшей читаемости. - Для вычислимых данных (
SUM()
илиAVG()
) используйте такие имена, которые вы бы дали, будь они столбцами в таблице.
SELECT first_name AS fn
FROM staff AS s1
JOIN students AS s2
ON s2.mentor_id = s1.staff_num;
SELECT SUM(s.monitor_tally) AS monitor_total
FROM staff AS s;
Хранимые процедуры
- Имя должно содержать глагол.
- Не используйте описательные префиксы вида
sp_
и венгерскую нотацию в целом.
Универсальные суффиксы
Приведённые ниже суффиксы универсальны, что гарантирует простоту понимания значения столбцов из кода SQL.
_id
— уникальный идентификатор, например первичный ключ._status
— флаг или любой статус, напримерpublication_status
._total
— общее количество или сумма значений._num
— поле, содержащее число._name
— любое имя, напримерfirst_name
._seq
— непрерывная последовательность значений._date
— колонка, содержащая дату._tally
— счётчик._size
— размер или величина чего-либо, например размер файла._addr
— физический или абстрактный адрес, напримерip_addr
.
Синтаксис запросов
Зарезервированные слова
Зарезервированные ключевые слова всегда пишите прописными буквами, например SELECT
, WHERE
.
Не используйте сокращённый вариант ключевого слова, если имеется полный. Например, используйте ABSOLUTE
вместо ABS
.
Не используйте специфичные для какого-либо поставщика СУБД ключевые слова, если в ANSI SQL есть ключевые слова, выполняющие такие же функции. Это сделает ваш код более переносимым.
SELECT model_num
FROM phones AS p
WHERE p.release_date > '2014-09-30';
Пробельные символы
Для лучшей удобочитаемости кода важно правильно использовать пробельные символы. Не нужно нагромождать код или удалять пробелы, присущие естественному языку.
Пробелы
Можно и нужно использовать пробелы для выравнивания основных ключевых слов по их правому краю. В типографике получающиеся таким образом «коридоры» стараются избегать, в то же время в нашем случае они, напротив, помогают лучше вычленять важные ключевые слова.
(SELECT f.species_name,
AVG(f.height) AS average_height, AVG(f.diameter) AS average_diameter
FROM flora AS f
WHERE f.species_name = 'Banksia'
OR f.species_name = 'Sheoak'
OR f.species_name = 'Wattle'
GROUP BY f.species_name, f.observation_date)
UNION ALL
(SELECT b.species_name,
AVG(b.height) AS average_height, AVG(b.diameter) AS average_diameter
FROM botanic_garden_flora AS b
WHERE b.species_name = 'Banksia'
OR b.species_name = 'Sheoak'
OR b.species_name = 'Wattle'
GROUP BY b.species_name, b.observation_date);
Обратите внимание, что ключевые слова SELECT
, FROM
и т.д. выровнены по правому краю, при этом названия столбцов и различные условия — по левому.
Помимо этого, старайтесь расставлять пробелы:
- до и после знака равно (
=
) - после запятых (
,
) - до открывающего и после закрывающего апострофов (
'
), если последний не внутри скобок, или без последующих запятой или точки с запятой, или не в конце строки
SELECT a.title, a.release_date, a.recording_date
FROM albums AS a
WHERE a.title = 'Charcoal Lane'
OR a.title = 'The New Danger';
Переводы строки
Всегда делайте перенос строки:
- перед
AND
илиOR
- после точки с запятой (для разделения запросов)
- после каждого основного ключевого слова
- после запятой (при выделении логических групп столбцов)
Следуя принципу, что ключевые слова выравниваются по правому краю, а всё остальное — по левому, мы добиваемся достаточно удобного расположения частей кода, вследствие чего улучшается зрительная навигация по нему.
INSERT INTO albums (title, release_date, recording_date)
VALUES ('Charcoal Lane', '1990-01-01 01:01:01.00000', '1990-01-01 01:01:01.00000'),
('The New Danger', '2008-01-01 01:01:01.00000', '1990-01-01 01:01:01.00000');
UPDATE albums
SET release_date = '1990-01-01 01:01:01.00000'
WHERE title = 'The New Danger';
SELECT a.title,
a.release_date, a.recording_date, a.production_date -- grouped dates together
FROM albums AS a
WHERE a.title = 'Charcoal Lane'
OR a.title = 'The New Danger';
Отступы
Для того, чтобы SQL был удобочитаем, важно также следовать стандартам расстановки отступов.
JOIN
Объединения (JOIN
) должны располагаться по правую часть «коридора». При необходимости между ними можно добавить пустую строку.
SELECT r.last_name
FROM riders AS r
INNER JOIN bikes AS b
ON r.bike_vin_num = b.vin_num
AND b.engine_tally > 2
INNER JOIN crew AS c
ON r.crew_chief_last_name = c.last_name
AND c.chief = 'Y';
Подзапросы
Подзапросы тоже должны быть выровнены по правому краю «коридора», а внутри них самих применяются те же правила форматирования, что и в любом другом запросе. Если используются вложенные подзапросы, может иметь смысл поставить закрывающую скобку на новой строке ровно под парной ей открывающей скобкой.
SELECT r.last_name,
(SELECT MAX(YEAR(championship_date))
FROM champions AS c
WHERE c.last_name = r.last_name
AND c.confirmed = 'Y') AS last_championship_year
FROM riders AS r
WHERE r.last_name IN
(SELECT c.last_name
FROM champions AS c
WHERE YEAR(championship_date) > '2008'
AND c.confirmed = 'Y');
Формальные тонкости
- Используйте
BETWEEN
, где возможно, вместо нагромождения условийAND
. - Таким же образом старайтесь использовать
IN()
вместоOR
. - Используйте
CASE
, если значение должно быть интерпретировано до окончания выполнения запроса. С помощьюCASE
можно также формировать сложные логические структуры. - По возможности избегайте использования
UNION
и временных таблиц.
SELECT CASE postcode
WHEN 'BN1' THEN 'Brighton'
WHEN 'EH1' THEN 'Edinburgh'
END AS city
FROM office_locations
WHERE country = 'United Kingdom'
AND opening_time BETWEEN 8 AND 9
AND postcode IN ('EH1', 'BN1', 'NN1', 'KW1');
Синтаксис CREATE
При разработке схемы данных важно создавать человекочитаемый код. Убедитесь в том, что объявления столбцов логически структурированы и сгруппированы.
Внутри объявления CREATE
делайте отступ, равный 4 пробелам.
Типы данных
- По возможности не используйте специфичные для той или иной СУБД типы данных. Это может негативно сказаться на переносимости, а также этих типов может не оказаться в старых версиях этих же СУБД.
- Для работы с плавающей точкой используйте только
REAL
илиFLOAT
, но где нет необходимости в подобных вычислениях, всегда используйтеNUMERIC
иDECIMAL
. Ошибки округления в операциях с плавающей точкой могут оказаться очень некстати.
Значения по умолчанию
- Значение по умолчанию всегда должно совпадать по типу со столбцом. Если, скажем, столбец объявлен как
DECIMAL
, не нужно в качестве умолчания указывать значение типаINTEGER
. - Значения по умолчанию должны располагаться после объявления типа столбца и перед пометкой
NOT NULL
.
Ограничения и ключи
Ограничения и их подмножество, ключи, — важная часть любой структуры базы данных, поэтому важно следовать стандартам их объявления, чтобы избежать трудностей в последующей поддержке написанного.
Ключи
Выбор столбцов, которые будут играть роль ключей, должен быть обоснован и предельно выверен, поскольку от них напрямую зависит производительность и целостность данных.
- Ключ должен быть в какой-то степени уникальным.
- Должна быть согласованность по типу данных для значения во всей схеме, а также чем ниже вероятность того, что это изменится в будущем, тем лучше.
- Можно ли проверить значение на соответствие стандарту (например, ISO)?
- Ключ должен быть как можно проще, чтобы можно было без трудностей использовать составные ключи.
Это своего рода конвенции, которые нужно сформулировать при проектировании базы данных. Если требования впоследствии будут разрастаться, можно и нужно вносить изменения в структуру базы, чтобы поддерживать её в актуальном состоянии.
Ограничения
Как только решено, какие ключи должны использоваться, нужно определить их в базе с помощью ограничений наряду с валидацией значений полей.
Общее
- У каждой таблицы должен быть хотя бы один ключ.
- Ограничениям нужно присваивать вразумительные имена. Для
UNIQUE
,PRIMARY KEY
иFOREIGN KEY
подобные имена создаются автоматически, поэтому нужно позаботиться об остальных ограничениях.
Расположение и порядок
- Первичный ключ должен быть объявлен в самом начале, сразу после оператора
CREATE TABLE
. - Ограничения должны быть объявлены строго ниже столбца, с которым они связаны. Расставьте отступы так, чтобы объявление ограничения начиналось после названия столбца.
- В случае ограничений, затрагивающих несколько столбцов, старайтесь объявлять их как можно ближе к описанию последнего из них. В крайнем случае объявляйте ограничение в конце тела
CREATE TABLE
. - Ограничения целостности уровня таблицы должны располагаться в конце.
- Используйте алфавитный порядок там, где
ON DELETE
предшествуетON UPDATE
. - Внутри запроса можно выравнивать каждый уровень по-своему. Например, можно добавить отступы после названия столбцов, чтобы типы данных начинались с одной позиции, а затем ещё добавить отступов в нужном количестве, чтобы все объявления
NOT NULL
тоже были выровнены по левому краю. Подобное форматирование позволит быстрее ориентироваться в коде.
Валидация
- Используйте
LIKE
иSIMILAR TO
для обеспечения целостности строк с известным форматом. - Если диапазон числовых значений для столбца известен, используйте
CHECK()
для предотвращения внесения в базу некорректных данных или скрытого отсечения части значения слишком больших данных. Обычно проверка делается на то, что значение больше нуля. CHECK()
должен быть объявлен как отдельное ограничение для упрощения последующей отладки.
Пример
CREATE TABLE staff (
PRIMARY KEY (staff_num),
staff_num INT(5) NOT NULL,
first_name VARCHAR(100) NOT NULL,
pens_in_drawer INT(2) NOT NULL,
CONSTRAINT pens_in_drawer_range
CHECK(pens_in_drawer BETWEEN 1 AND 99)
);
Чего следует избегать
- Не применяйте объектно-ориентированные принципы, поскольку они далеко не всегда оптимально ложатся на реляционную модель баз данных.
- Не разносите по разным столбцам значения и единицы измерения. Нужно создавать столбцы так, чтобы единицы измерения были чем-то самим собой разумеющимся. Для проверки корректности вставляемых в столбец данных используйте
CHECK()
. - Избегайте паттерна EAV (Entity Attribute Value). Вместо него используйте специальные продукты, предназначенные для работы с неструктурированными данными.
- Не разбивайте данные, логически принадлежащие одной таблице, по разным таблицам на основании условностей, например архивации по времени или географическим атрибутам. Впоследствии для работы с несколькими подобными таблицам придётся часто использовать
UNION
вместо простых запросов к одной таблице.
Public user contributions licensed under
cc-wiki license with attribution required
Mozilla Data Documentation
SQL Style Guide
Table of Contents
- Consistency
- Reserved Words
- Variable Names
- Be Explicit
- Aliasing
- Joins
- Grouping Columns
- Left Align Root Keywords
- Code Blocks
- Parentheses
- Boolean at the Beginning of Line
- Nested Queries
- About this Document
Consistency
From Pep8:
A style guide is about consistency.
Consistency with this style guide is important.
Consistency within a project is more important.
Consistency within one module or function is the most important.However, know when to be inconsistent —
sometimes style guide recommendations just aren’t applicable.
When in doubt, use your best judgment.
Look at other examples and decide what looks best.
And don’t hesitate to ask!
Reserved Words
Always use uppercase for reserved keywords like SELECT
, WHERE
, or AS
.
Variable Names
- Use consistent and descriptive identifiers and names.
- Use lower case names with underscores, such as
first_name
.
Do not use CamelCase. - Functions, such as
cardinality
,approx_distinct
, orsubstr
,
are identifiers
and should be treated like variable names. - Names must begin with a letter and may not end in an underscore.
- Only use letters, numbers, and underscores in variable names.
Be Explicit
When choosing between explicit or implicit syntax, prefer explicit.
Aliasing
Always include the AS
keyword when aliasing a variable or table name,
it’s easier to read when explicit.
Good
SELECT
date(submission_timestamp) AS day
FROM
telemetry.main
LIMIT
10
Bad
SELECT
date(submission_timestamp) day
FROM
telemetry.main
LIMIT
10
Joins
Always include the JOIN
type rather than relying on the default join.
Good
-- BigQuery Standard SQL Syntax
SELECT
submission_date,
experiment.key AS experiment_id,
experiment.value AS experiment_branch,
count(*) AS count
FROM
telemetry.clients_daily
CROSS JOIN
UNNEST(experiments.key_value) AS experiment
WHERE
submission_date > '2019-07-01'
AND sample_id = '10'
GROUP BY
submission_date,
experiment_id,
experiment_branch
Bad
-- BigQuery Standard SQL Syntax
SELECT
submission_date,
experiment.key AS experiment_id,
experiment.value AS experiment_branch,
count(*) AS count
FROM
telemetry.clients_daily,
UNNEST(experiments.key_value) AS experiment -- Implicit JOIN
WHERE
submission_date > '2019-07-01'
AND sample_id = '10'
GROUP BY
1, 2, 3 -- Implicit grouping column names
Grouping Columns
In the previous example, implicit grouping columns were discouraged, but there are cases where it makes sense.
In some SQL flavors (such as Presto) grouping elements must refer to the expression before any aliasing is done. If you are grouping by a complex expression it may be desirable to use implicit grouping columns rather than repeating the expression.
Good
-- BigQuery SQL Syntax
SELECT
submission_date,
normalized_channel IN ('nightly', 'aurora', 'beta') AS is_prerelease,
count(*) AS count
FROM
telemetry.clients_daily
WHERE
submission_date > '2019-07-01'
GROUP BY
submission_date,
is_prerelease -- Grouping by aliases is supported in BigQuery
Good
-- Presto SQL Syntax
SELECT
submission_date,
normalized_channel IN ('nightly', 'aurora', 'beta') AS is_prerelease,
count(*) AS count
FROM
telemetry.clients_daily
WHERE
submission_date > '20190701'
GROUP BY
1, 2 -- Implicit grouping avoids repeating expressions
Bad
-- Presto SQL Syntax
SELECT
submission_date,
normalized_channel IN ('nightly', 'aurora', 'beta') AS is_prerelease,
count(*) AS count
FROM
telemetry.clients_daily
WHERE
submission_date > '20190701'
GROUP BY
submission_date,
normalized_channel IN ('nightly', 'aurora', 'beta')
Left Align Root Keywords
Root keywords should all start on the same character boundary.
This is counter to the common «rivers» pattern
described here.
Good:
SELECT
client_id,
submission_date
FROM
main_summary
WHERE
sample_id = '42'
AND submission_date > '20180101'
LIMIT
10
Bad:
SELECT client_id,
submission_date
FROM main_summary
WHERE sample_id = '42'
AND submission_date > '20180101'
Code Blocks
Root keywords should be on their own line.
For example:
Good:
SELECT
client_id,
submission_date
FROM
main_summary
WHERE
submission_date > '20180101'
AND sample_id = '42'
LIMIT
10
It’s acceptable to include an argument on the same line as the root keyword,
if there is exactly one argument.
Acceptable:
SELECT
client_id,
submission_date
FROM main_summary
WHERE
submission_date > '20180101'
AND sample_id = '42'
LIMIT 10
Do not include multiple arguments on one line.
Bad:
SELECT client_id, submission_date
FROM main_summary
WHERE
submission_date > '20180101'
AND sample_id = '42'
LIMIT 10
Bad
SELECT
client_id,
submission_date
FROM main_summary
WHERE submission_date > '20180101'
AND sample_id = '42'
LIMIT 10
Parentheses
If parentheses span multiple lines:
- The opening parenthesis should terminate the line.
- The closing parenthesis should be lined up under
the first character of the line that starts the multi-line construct. - The contents of the parentheses should be indented one level.
For example:
Good
WITH sample AS (
SELECT
client_id,
FROM
main_summary
WHERE
sample_id = '42'
)
Bad (Terminating parenthesis on shared line)
WITH sample AS (
SELECT
client_id,
FROM
main_summary
WHERE
sample_id = '42')
Bad (No indent)
WITH sample AS (
SELECT
client_id,
FROM
main_summary
WHERE
sample_id = '42'
)
Boolean at the Beginning of Line
AND
and OR
should always be at the beginning of the line.
For example:
Good
...
WHERE
submission_date > 20180101
AND sample_id = '42'
Bad
...
WHERE
submission_date > 20180101 AND
sample_id = '42'
Nested Queries
Do not use nested queries.
Instead, use common table expressions to improve readability.
Good:
WITH sample AS (
SELECT
client_id,
submission_date
FROM
main_summary
WHERE
sample_id = '42'
)
SELECT *
FROM sample
LIMIT 10
Bad:
SELECT *
FROM (
SELECT
client_id,
submission_date
FROM
main_summary
WHERE
sample_id = '42'
)
LIMIT 10
About this Document
This document was heavily influenced by https://www.sqlstyle.guide/
Changes to the style guide should be reviewed by at least one member of
both the Data Engineering team and the Data Science team.