Red-Green-Code

Deliberate practice techniques for software developers

  • Home
  • About
  • Contact
  • Project 462
  • CP FAQ
  • Newsletter

CPFAQ: SELECT Queries

By Duncan Smith Apr 4 0

PIVOT query

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

Data in a relational database is often not arranged in a way that makes sense to the end user. Instead, it’s optimized to reduce duplication and improve query performance. So when it’s retrieved from the database, it needs to be converted into a more optimal format through a combination of SQL queries and application logic. As an example, I’ll use data from my question database.

Outer Joins

In this example, the user interface is a spreadsheet. I’m using a spreadsheet for editing since it saves the time required to build my own UI on top of the database. In the spreadsheet, each row contains a question and its associated properties. Some properties, like Link (URL) and Number of Followers are unique to each question. But other properties, like Canonical Question Title, contain values that can be the same for multiple questions (since multiple questions can ask about different aspects of a single canonical question).

In the spreadsheet, if two questions have the same canonical question title, it’s convenient to show the full canonical question title for each one. Since the two questions may appear in different parts of the spreadsheet, we don’t want to have to scroll around to find the canonical title.

But in a database, storing an identical title multiple times would be wasteful. Instead, I have a CanonicalQuestion table that stores the unique titles. Then each question record has a foreign key reference to the corresponding canonical question record. So returning questions and canonical questions just requires a join:

SELECT q.Title, q.Url, q.NumFollowers, cq.Title AS [CanonicalTitle]
FROM Question q
LEFT JOIN CanonicalQuestion cq ON cq.CanonicalQuestionId = q.CanonicalQuestionId

The LEFT JOIN (short for LEFT OUTER JOIN) returns all rows from the left table (the one mentioned first, Question) and matches them where possible with rows from the right table (the one mentioned second, CanonicalQuestion). If no match is possible, it shows NULL for CanonicalTitle.

This is the appropriate join to use in this case because canonical titles are created manually, so they won’t be available for every question. A NULL just means that a canonical title needs to be written or chosen for that question.

The same query could be written with RIGHT JOIN just by reversing the tables:

SELECT q.Title, q.Url, q.NumFollowers, cq.Title AS [CanonicalTitle]
FROM CanonicalQuestion cq
RIGHT JOIN Question q ON q.CanonicalQuestionId = cq.CanonicalQuestionId

But I find it simpler to stick with LEFT JOIN and order the tables accordingly.

Changing Rows to Columns with STUFF and PIVOT

For the one-to-many relationship between a single canonical question and multiple questions, it’s easy enough to join the two tables and have the canonical question title appear as a column in the result set. The many-to-many relationship between tags and questions is trickier.

To allow one question to have multiple tags, and one tag to be associated with multiple questions, I’m using an associative table, QuestionTag. The only data in this table is foreign key data: each row has a question ID and a tag ID, representing a single tag associated with a single question.

To retrieve all questions with their associated tags, it’s possible to use a join query like those in the previous section:

SELECT q.QuestionId, q.Title, t.Alias
FROM Question q
LEFT JOIN QuestionTag qt on qt.QuestionId = q.QuestionId
LEFT JOIN Tag t on t.TagId = qt.TagId

Each row in the result set for this query contains a question and an associated tag, or NULL if a question has no tags (which can happen since I’m adding tags manually, not importing them from Quora). The problem is that if a question has five tags, five rows are returned. This doesn’t match the design of the spreadsheet I’m trying to create, which has only one row per question.

Since my target spreadsheet has a column per tag, but my result set has a row per tag, I need a way to convert rows to columns. The Stack Overflow question Efficiently convert rows to columns in SQL server has some useful techniques.

The T-SQL STUFF command is used to insert a string into another string, or just concatenate strings. Here’s how it can be used to create a comma-separated list of tags for each question:

SELECT QuestionId, Alias =
    STUFF((SELECT ',' + Alias
    FROM Tag t2
    JOIN QuestionTag qt2 ON qt2.TagId = t2.TagId
    WHERE qt2.QuestionId = qt.QuestionId
    FOR XML PATH('')), 1, 1, '')
FROM QuestionTag qt
GROUP By QuestionId

FOR XML PATH converts the result of the subquery into a single value, so it can be assigned to the Alias column. Passing the zero-length string ('') means we don’t want any XML tags, just a regular string.

I could stop here and just parse the comma-separated list in C#, but there are a few more SQL tricks available in the Stack Overflow answer. Consider this adaption of the example from the answer:

select @cols = STUFF((SELECT ',' + QUOTENAME(t.Alias)
    FROM Question q
    CROSS JOIN Tag t
    LEFT JOIN QuestionTag qt ON qt.QuestionId = q.QuestionId
    AND qt.TagId = t.TagId
    WHERE q.QuestionId = @qid
    FOR XML PATH(''), TYPE
    ).value('.', 'NVARCHAR(MAX)') 
    ,1,1,'')

set @query = N'SELECT ' + @cols + N' from 
    (
    SELECT CASE q.Title WHEN NULL THEN NULL ELSE ''x'' END
        AS [Title], t.Alias
    From Question q
    JOIN QuestionTag qt on qt.QuestionId = q.QuestionId
    JOIN Tag t on t.TagId = qt.TagId
    WHERE q.QuestionId = ' + @qid + N'
) x
PIVOT 
(
    MAX(Title)
    for Alias in (' + @cols + N')
) p '

exec sp_executesql @query;

For a single question @qid, this produces the tag portion of the spreadsheet row: a column for each tag name, marked with an x if the tag applies to the question.

The idea is to use the STUFF command as described above to generate a comma-separated list of column names. Since T-SQL queries uses comma-separated lists in this format, the list can then be concatenated into a SELECT query and executed as dynamic SQL. The PIVOT command is used at the end to turn rows into columns.

The SQL way of doing things can seem complicated to developers who are used to thinking procedurally (i.e., most of them). But it can perform better than the equivalent procedural solution, assuming good database design and indexes.

Categories: CPFAQ

Prev
Next

Stay in the Know

I'm trying out the latest learning techniques on software development concepts, and writing about what works best. Sound interesting? Subscribe to my free newsletter to keep up to date. Learn More
Unsubscribing is easy, and I'll keep your email address private.

Getting Started

Are you new here? Check out my review posts for a tour of the archives:

  • Lessons from the 2020 LeetCode Monthly Challenges
  • 2019 in Review
  • Competitive Programming Frequently Asked Questions: 2018 In Review
  • What I Learned Working On Time Tortoise in 2017
  • 2016 in Review
  • 2015 in Review
  • 2015 Summer Review

Archives

Recent Posts

  • LeetCode 1022: Sum of Root To Leaf Binary Numbers January 27, 2021
  • LeetCode 1288: Remove Covered Intervals January 20, 2021
  • LeetCode 227: Basic Calculator II January 13, 2021
  • A Project for 2021 January 6, 2021
  • Lessons from the 2020 LeetCode Monthly Challenges December 30, 2020
  • Quora: Are Math Courses Useful for Competitive Programming? December 23, 2020
  • Quora: Are Take-Home Assignments a Good Interview Technique? December 17, 2020
  • Quora: Why Don’t Coding Interviews Test Job Skills? December 9, 2020
  • Quora: How Much Time Should it Take to Solve a LeetCode Hard Problem? December 2, 2020
  • Quora: Quantity vs. Quality on LeetCode November 25, 2020
Red-Green-Code
  • Home
  • About
  • Contact
  • Project 462
  • CP FAQ
  • Newsletter
Copyright © 2021 Duncan Smith