CPFAQ: Patterns in Question Titles

Question Titles

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

Now that I have a database of competitive programming questions, I thought it would be interesting to look for patterns in the way that questions are asked. Those patterns will be useful in writing canonical question titles, which in turn determine the set of questions included in the FAQ.

I briefly considered loading the list of questions into a text mining program, and even tried out a free online tool. But ultimately I decided that was overkill, so I wrote a simple program to find the words that are most frequently used to start the questions in my list.
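A minimal sketch of that kind of program, assuming the titles are already available as a list of strings. The function name `leading_word_counts` and all but the first sample title are made up for illustration; this is not the actual tool.

```python
from collections import Counter


def leading_word_counts(titles, n_words=1):
    """Count how often each opening word (or n-word prefix) starts a title."""
    counts = Counter()
    for title in titles:
        words = title.lower().split()
        # Skip titles that are shorter than the requested prefix length.
        if len(words) >= n_words:
            counts[" ".join(words[:n_words])] += 1
    return counts


# Hypothetical sample data; the real list would come from the question database.
sample_titles = [
    "What are some must-do problems on Codeforces?",
    "How do I get started with competitive programming?",
    "What is the best way to practice for programming contests?",
    "How can I improve my rating?",
    "What are good resources for learning algorithms?",
]

for prefix, count in leading_word_counts(sample_titles).most_common():
    print(prefix, count)
```

Raising `n_words` to 2 or 3 groups titles by longer openings such as "what are" or "how do i", which is closer to the phrasing a canonical title would use.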

CPFAQ: SELECT Queries

PIVOT query

Data in a relational database is often not arranged in a way that makes sense to the end user. Instead, it’s organized to reduce duplication and improve query performance. So when it’s retrieved from the database, it needs to be converted into a format better suited to the reader through a combination of SQL queries and application logic. As an example, I’ll use data from my question database.
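To illustrate the idea, here is a small sketch that turns one-row-per-tag data into one row per question with a column per tag. It assumes SQLite and a hypothetical `question_tags` table with invented rows, and it uses conditional aggregation (`MAX(CASE ...)`) as a portable stand-in for a PIVOT clause, so it may differ from the query in the full article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalized data: one row per (question, tag) pair.
conn.executescript("""
    CREATE TABLE question_tags (question_id INTEGER, tag TEXT);
    INSERT INTO question_tags VALUES
        (1, 'practice'), (1, 'codeforces'),
        (2, 'getting-started'),
        (3, 'practice');
""")

# Pivot with conditional aggregation: each tag becomes a 0/1 column.
pivot_sql = """
    SELECT question_id,
           MAX(CASE WHEN tag = 'practice' THEN 1 ELSE 0 END) AS practice,
           MAX(CASE WHEN tag = 'codeforces' THEN 1 ELSE 0 END) AS codeforces,
           MAX(CASE WHEN tag = 'getting-started' THEN 1 ELSE 0 END) AS getting_started
    FROM question_tags
    GROUP BY question_id
    ORDER BY question_id
"""
rows = conn.execute(pivot_sql).fetchall()
for row in rows:
    print(row)
```

The tag columns are hard-coded here. With a growing tag list, the column expressions would have to be generated in application code, which is one place where the SQL and the application logic meet.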

CPFAQ: A Question Database, Part 2

Question Data

Last week, I created the first version of a database schema that will store the source content for creating the FAQ. The database has a record for each question, along with the number of people following that question, a set of tags to classify it, and a canonical question title that represents a potential FAQ entry.
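One way such a schema could look, sketched with SQLite. The table and column names, the canonical title, and the follower count are all assumptions made for illustration; the actual schema is described in the earlier article.

```python
import sqlite3

# Hypothetical schema: questions point at an optional canonical question,
# and tags live in a separate table so a question can have several.
SCHEMA = """
CREATE TABLE canonical_questions (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL UNIQUE
);
CREATE TABLE questions (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    followers    INTEGER NOT NULL DEFAULT 0,
    canonical_id INTEGER REFERENCES canonical_questions(id)
);
CREATE TABLE question_tags (
    question_id INTEGER NOT NULL REFERENCES questions(id),
    tag         TEXT NOT NULL,
    PRIMARY KEY (question_id, tag)
);
"""


def create_database(path=":memory:"):
    """Create the tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
# Placeholder rows: the canonical title and follower count are invented.
conn.execute(
    "INSERT INTO canonical_questions (id, title) VALUES (?, ?)",
    (1, "Which problems should I practice?"),
)
conn.execute(
    "INSERT INTO questions (id, title, followers, canonical_id) VALUES (?, ?, ?, ?)",
    (1, "What are some must-do problems on Codeforces?", 12, 1),
)
conn.execute("INSERT INTO question_tags VALUES (?, ?)", (1, "practice"))

# Look up each question together with its canonical title.
lookup = conn.execute("""
    SELECT q.title, q.followers, c.title
    FROM questions AS q
    JOIN canonical_questions AS c ON c.id = q.canonical_id
""").fetchall()
print(lookup)
```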

This week, I’ve been working on a tool to import data into the question database.

CPFAQ: A Question Database

Database Schema

So far this year, I’ve been building tools that operate on text files in tab-separated values (TSV) format. The advantage of this format is that it’s easy to read from and write to in code, and it imports directly into Excel for manual processing. For example, last week I worked on a TSV file in which each line contains one Quora question title, link, and follower count. I extracted this information automatically using one of my tools. I then imported the TSV file into Excel so I could manually edit each row to add a canonical question and set of tags.
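As a rough sketch of how little code the format needs, the following reads lines of title, link, and follower count into dictionaries. The function name, the column order, the example link, and the follower count are assumptions for illustration.

```python
import csv
import io


def read_questions(tsv_file):
    """Parse lines of `title<TAB>link<TAB>followers` into dictionaries."""
    # QUOTE_NONE keeps quotation marks inside titles as literal characters.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    questions = []
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        title, link, followers = row
        questions.append(
            {"title": title, "link": link, "followers": int(followers)}
        )
    return questions


# Hypothetical sample line; a real run would pass an open file instead.
sample = io.StringIO(
    "What are some must-do problems on Codeforces?\thttps://example.com/q1\t12\n"
)
parsed = read_questions(sample)
print(parsed)
```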

As I classify each question, I find it useful to review how previous questions are classified. That helps ensure that question classification is consistent. For example, I classified one question as follows:

I later came across the question What are some must-do problems on Codeforces?, which I thought should have the same classification. So I looked through the current list of classified questions to copy the classification decisions I made earlier.

If I only had a few questions classified, it would be easy enough to scan through the list and find a similar one. But as the list grows longer, that becomes impractical. So I decided this week that it’s time to upgrade my storage technology.

CPFAQ: Classifying Quora Questions

Sorting

On Quora, it’s common to see the same questions, or variations of the same questions, show up repeatedly. The competitive programming topic is no exception to this rule. Merging similar questions is an option, but it presents a couple of challenges:

A competitive programming FAQ that is independent of Quora can resolve these problems. If QCR doesn’t want questions to be merged, they can still be listed under the same question in the FAQ. And the FAQ can categorize questions in a way that makes it easy to find appropriate merge targets.

Using some of the data collection from previous weeks, I have an initial list of categorized questions to use as a starting point for the FAQ categories.

CPFAQ: Classifying Quora Topics

Competitive Programming Wiki

If you study the current Quora topic ontology for competitive programming, it’s clear that it needs some work. The six top-level topics don’t cover all of the subjects that people ask about. And many related topics aren’t even in the ontology, usually because people created them without selecting parent topics.

Here are two steps to create a better ontology: 1) evaluate the related topics that already exist and add each one to the appropriate place in the ontology, and 2) write a clear description of which questions belong in each topic.

CPFAQ: Quora Topic Ontology

Quora Topic Ontology

This week, I published a post to the Quora Topic Gnomery blog on the subject of competitive programming topic cleanup. It generated some discussion about how competitive programming topics on Quora should be organized, and prompted me to look into the Quora Topic Ontology.

CPFAQ: Quora Topic Cleanup

Gnome

As part of the FAQ research process, I’m creating a canonical list of Quora topics related to competitive programming. Like many things on Quora, topics in this area are a bit messy. Or as the Quora Topic Gnomes say:

Quora’s Topics are a free-for-all, and that often creates greatness, but it really requires crowdsourced curation to make it all it can be.

CPFAQ: Collecting Quora Topics

Quora Related Topics

In recent weeks, I have been experimenting with ways to collect Quora questions, especially those that don’t appear in standard views like search engine results and the All Questions page. But I also want to make sure that the questions I collect from alternative sources are relevant, since I eventually need to manually evaluate the best questions for a FAQ. Last week I started filtering on the Competitive Programming topic tag, but I realized that this can filter out relevant questions. To see why that is, I’m investigating how Quora topics work.
