CPFAQ: A Question Database, Part 2

Question Data

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

Last week, I created the first version of a database schema that will store the source content for creating the FAQ. The database has a record for each question, along with the number of people following that question, a set of tags to classify it, and a canonical question title that represents a potential FAQ entry.

This week, I’ve been working on a tool to import data into the question database.

Accessing a SQL Server Database Using C♯

Here’s a quick and dirty C# approach to retrieve data from a local SQL Server database, without using anything fancy like Entity Framework. This code creates a connection to the server, opens the connection, executes a SELECT query, and builds a list of objects containing the results.

using (var connection = 
  new SqlConnection("Server=(local);Database=Quora;Trusted_Connection=True;"))
{
    connection.Open();
    const string sql = "SELECT * FROM Tag";
    var cmd = new SqlCommand(sql, connection);

    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            var t = new Tag()
            {
                Alias = (string)reader["Alias"],
                Description = (string)reader["Description"],
                Name = (string)reader["Name"],
                TagId = (int)reader["TagId"]
            };

            Tags.Add((string)reader["Alias"], t);
        }
    }
}

To insert a record, I used this approach:

const string sql = "INSERT INTO Question (Title, Url, NumFollowers) " +
    "VALUES (@Title, @Url, @NumFollowers) SET @QuestionId = SCOPE_IDENTITY()";
var cmd = new SqlCommand(sql, connection);
cmd.Parameters.AddWithValue("@Title", title);
cmd.Parameters.AddWithValue("@Url", url);
cmd.Parameters.AddWithValue("@NumFollowers", numFollowers);
cmd.Parameters.AddWithValue("@CanonicalQuestionId", canonicalQuestionId);
cmd.Parameters.Add("@QuestionId", SqlDbType.Int, 4).Direction =
    ParameterDirection.Output;
cmd.ExecuteScalar();
var questionId = (int)cmd.Parameters["@QuestionId"].Value;

Notice that I’m getting the primary key for the new record as an output parameter.

Importing and Exporting Questions

Using the SqlCommand approach, I can import and export a list of questions and associated properties.

Import

As I read and classify questions, I keep track of them using a spreadsheet containing the question title, URL, number of followers (for sorting and prioritizing), and canonical title. Each of these properties is in a separate spreadsheet column. The first three are imported from source HTML, while the canonical title is entered manually. I also have a list of tags, one per column. I put an x in each tag column that applies to the question.

To import this spreadsheet into the database, I first manually insert a Tag record for each tag column in the spreadsheet, and then use the following algorithm for each spreadsheet row:

  • If the canonical question already exists, retrieve its ID. Otherwise, create it.
  • Insert the Question record, with its title, link, number of followers, and canonical question ID.
  • Insert a QuestionTag record for each associated tag.

Export

Although this process is designed to facilitate editing in a spreadsheet, I might find a scenario where it’s more convenient to make edits directly in the database (or from an application that connects to the database). To get the questions back into the spreadsheet, I’ll need to use the opposite of the process above, creating a row for each question and an x in each tag column.

Database Queries

The main advantage of having question data in the database is being able to use T-SQL queries to analyze the data. Here are a few examples.

SELECT AVG(NumFollowers) FROM Question

Calculate the average number of question followers

SELECT * FROM Question q
JOIN QuestionTag qt ON qt.QuestionId = q.QuestionId
JOIN Tag t ON t.TagId = qt.TagId
WHERE t.Name = 'algorithms-and-data-structures'

Return all questions with the algorithms and data structures tag

SELECT COUNT(*)
FROM Question
WHERE CanonicalQuestionId IS NULL

Count the number of questions that still need to be assigned a canonical question

SELECT *
FROM Question
WHERE Title LIKE '%TopCoder%'
ORDER BY NumFollowers DESC

Return all questions that mention TopCoder in the question title