CPFAQ: A Question Database, Part 2

Question Data

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

Last week, I created the first version of a database schema that will store the source content for creating the FAQ. The database has a record for each question, along with the number of people following that question, a set of tags to classify it, and a canonical question title that represents a potential FAQ entry.

This week, I’ve been working on a tool to import data into the question database.

« Continue »

CPFAQ: A Question Database

Database Schema

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

So far this year, I’ve been building tools that operate on text files in tab-separate value (TSV) format. The advantage of this format is that it’s easy to read from and write to in code, and it imports directly into Excel for manual processing. For example, last week I worked on a TSV file in which each line contains one Quora question title, link, and follower count. I extracted this information automatically using one of my tools. I then imported the TSV file into Excel so I could manually edit each row to add a canonical question and set of tags.

As I classify each question, I find it useful to review how previous questions are classified. That helps ensure that question classification is consistent. For example, I classified one question as follows:

I later came across the question What are some must-do problems on Codeforces?, which I thought should have the same classification. So I looked through the current list of classified questions to copy the classification decisions I made earlier.

If I only had a few questions classified, it would be easy enough to scan through the list and find a similar one. But as the list grows longer, that becomes impractical. So I decided this week that it’s time to upgrade my storage technology.

« Continue »

CPFAQ: Classifying Quora Questions

Sorting

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

On Quora, it’s common to see the same questions, or variations of the same questions, show up repeatedly. The competitive programming topic is no exception to this rule. Merging similar questions is an option, but it presents a couple of challenges:

A competitive programming FAQ that is independent of Quora can resolve these problems. If QCR doesn’t want questions to be merged, they can still be listed under the same question in the FAQ. And the FAQ can categorize questions in a way that makes it easy to find appropriate merge targets.

Using some of the data collection from previous weeks, I have an initial list of categorized questions to use as a starting point for the FAQ categories.

« Continue »

CPFAQ: Classifying Quora Topics

Competitive Programming Wiki

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

If you study the current Quora topic ontology for competitive programming, it’s clear that it needs some work. The six top-level topics don’t cover all of the subjects that people ask about. And many related topics aren’t even in the ontology, usually because people created them without selecting parent topics.

Here are two steps to create a better ontology: 1) Evaluate the related topics that already exist, and add each one to the appropriate place in the ontology, and 2) Write a clear description of what questions belong in each topic.

« Continue »