Duncan Smith, Author at Red-Green-Code

CPFAQ: Patterns in Question Titles, Part 2

By Duncan Smith, Apr 21

Question Titles 2

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

Last week, I did some simple text mining to classify Quora Competitive Programming questions based on the first word (How, What, Why, etc.) of the question title. This week, I’m extending that a bit by looking at starting phrases containing 3-4 words each.
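As a rough illustration of this kind of analysis (a sketch, not the program used for the article), the following Python counts the n-word phrases that begin each title. The function name `starting_phrase_counts` is an assumption, and the last sample title is hypothetical; the first three appear elsewhere in this series.

```python
from collections import Counter


def starting_phrase_counts(titles, n):
    """Count the n-word phrases that begin each title (case-insensitive)."""
    counts = Counter()
    for title in titles:
        words = title.lower().split()
        if len(words) >= n:  # ignore titles shorter than n words
            counts[" ".join(words[:n])] += 1
    return counts


# Sample data: the last title is a hypothetical example.
sample_titles = [
    "What are some good competitive programming problems?",
    "What are some must-do problems on Codeforces?",
    "What are the data structures and algorithms used in competitive programming?",
    "How do I start competitive programming?",
]

three_word = starting_phrase_counts(sample_titles, 3)
four_word = starting_phrase_counts(sample_titles, 4)

for phrase, count in three_word.most_common():
    print(count, phrase)
```

Longer phrases split the titles into more, smaller groups, so comparing the 3-word and 4-word counts shows where a common opening starts to diverge.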

« Continue »

CPFAQ: Patterns in Question Titles

By Duncan Smith, Apr 14

Question Titles

Now that I have a database of competitive programming questions, I thought it would be interesting to look for patterns in the way that questions are asked. This will be useful in writing canonical question titles, which will determine the set of questions included in the FAQ.

I briefly considered loading the list of questions into a text mining program, and even tried out a free online tool. But ultimately I decided that was overkill, so I wrote a simple program to find the words that are most frequently used to start the questions in my list.
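A program like that can be quite short. Here is a minimal sketch in Python, assuming the titles are already loaded into a list; the function name `first_word_counts` and the sample titles are hypothetical, and this is not necessarily how the original program was written.

```python
from collections import Counter


def first_word_counts(titles):
    """Count how often each word appears as the first word of a title."""
    counts = Counter()
    for title in titles:
        words = title.split()
        if words:  # skip blank titles
            counts[words[0].lower()] += 1
    return counts


# Hypothetical sample titles, not taken from the real question list.
sample_titles = [
    "How do I get better at competitive programming?",
    "What are some good competitive programming problems?",
    "How should I prepare for a programming contest?",
    "Why is dynamic programming hard?",
]

counts = first_word_counts(sample_titles)
for word, count in counts.most_common():
    print(word, count)
```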

« Continue »

CPFAQ: SELECT Queries

By Duncan Smith, Apr 4

PIVOT query

Data in a relational database is often not arranged in a way that makes sense to the end user. Instead, it’s organized to reduce duplication and improve query performance. So when it’s retrieved from the database, it needs to be converted into a more readable format through a combination of SQL queries and application logic. As an example, I’ll use data from my question database.
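To make the idea concrete, here is a small sketch using Python's built-in sqlite3 module. The table and column names are assumptions, not the actual schema, and the full article may use a different database and query style. A SELECT with joins retrieves one row per question/tag pair, and a few lines of application logic regroup those rows into one entry per question.

```python
import sqlite3

# In-memory database with a hypothetical normalized schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE question_tag (
        question_id INTEGER NOT NULL REFERENCES question(id),
        tag_id INTEGER NOT NULL REFERENCES tag(id),
        PRIMARY KEY (question_id, tag_id)
    );
    INSERT INTO question VALUES
        (1, 'What are some good questions on CodeChef from which I will learn more algorithms?');
    INSERT INTO tag VALUES
        (1, 'competitive-programming'),
        (2, 'specific-online-judge'),
        (3, 'competitive-programming-problems');
    INSERT INTO question_tag VALUES (1, 1), (1, 2), (1, 3);
""")

# SQL step: join the normalized tables into (title, tag) rows.
rows = conn.execute("""
    SELECT q.title, t.name
    FROM question AS q
    JOIN question_tag AS qt ON qt.question_id = q.id
    JOIN tag AS t ON t.id = qt.tag_id
    ORDER BY q.id, t.name
""").fetchall()

# Application step: collapse the rows into one tag list per question.
tags_by_title = {}
for title, tag_name in rows:
    tags_by_title.setdefault(title, []).append(tag_name)

for title, tags in tags_by_title.items():
    print(title, "->", ", ".join(tags))
```

Storing each tag once avoids duplication, but the reader wants to see a question with its tags on one line, so the regrouping happens after the query.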

« Continue »

CPFAQ: A Question Database, Part 2

By Duncan Smith, Mar 28

Question Data

Last week, I created the first version of a database schema that will store the source content for creating the FAQ. The database has a record for each question, along with the number of people following that question, a set of tags to classify it, and a canonical question title that represents a potential FAQ entry.
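One way such a schema could look is sketched below with Python's sqlite3 module. Every table and column name here is an assumption made for illustration; the actual schema, and the database engine it runs on, may differ.

```python
import sqlite3

# Hypothetical schema: one row per question, with a follower count,
# a link to a canonical title, and a many-to-many relationship to tags.
SCHEMA = """
CREATE TABLE canonical_question (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL UNIQUE
);
CREATE TABLE question (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    followers INTEGER NOT NULL DEFAULT 0,
    canonical_id INTEGER REFERENCES canonical_question(id)
);
CREATE TABLE tag (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE question_tag (
    question_id INTEGER NOT NULL REFERENCES question(id),
    tag_id INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (question_id, tag_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(table_names)
```

Keeping canonical titles in their own table means several Quora questions can point at the same potential FAQ entry.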

This week, I’ve been working on a tool to import data into the question database.

« Continue »

CPFAQ: A Question Database

By Duncan Smith, Mar 21

Database Schema

So far this year, I’ve been building tools that operate on text files in tab-separated values (TSV) format. The advantage of this format is that it’s easy to read from and write to in code, and it imports directly into Excel for manual processing. For example, last week I worked on a TSV file in which each line contains one Quora question title, link, and follower count. I extracted this information automatically using one of my tools. I then imported the TSV file into Excel so I could manually edit each row to add a canonical question and set of tags.
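To show how little code the TSV format needs, here is a sketch of a reader in Python. It assumes three tab-separated columns per line (title, link, follower count) with no header row; the names `QuestionRow` and `read_questions` are hypothetical, and the sample links and follower counts are placeholders rather than real data.

```python
import csv
import io
from typing import NamedTuple


class QuestionRow(NamedTuple):
    title: str
    link: str
    followers: int


def read_questions(tsv_file):
    """Parse lines of title<TAB>link<TAB>followers into QuestionRow records."""
    # QUOTE_NONE keeps any quotation marks in a title as literal text.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    questions = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        title, link, followers = row
        questions.append(QuestionRow(title, link, int(followers)))
    return questions


# Placeholder data: the links and follower counts are made up.
sample_tsv = (
    "What are some must-do problems on Codeforces?\thttps://example.com/q1\t12\n"
    "What are some good competitive programming problems?\thttps://example.com/q2\t7\n"
)

questions = read_questions(io.StringIO(sample_tsv))
for q in questions:
    print(q.followers, q.title)
```

With a real file, `open(path, newline="", encoding="utf-8")` would take the place of the `io.StringIO` wrapper.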

As I classify each question, I find it useful to review how previous questions are classified. That helps ensure that question classification is consistent. For example, I classified one question as follows:

Title: What are some good questions on CodeChef from which I will learn more algorithms?
Canonical title: What are some good competitive programming problems?
Tags: competitive-programming, specific-online-judge, competitive-programming-problems

I later came across the question “What are some must-do problems on Codeforces?”, which I thought should have the same classification. So I looked through the current list of classified questions to copy the classification decisions I made earlier.

If I only had a few questions classified, it would be easy enough to scan through the list and find a similar one. But as the list grows longer, that becomes impractical. So I decided this week that it’s time to upgrade my storage technology.

« Continue »

CPFAQ: Classifying Quora Questions

By Duncan Smith, Mar 14

Sorting

On Quora, it’s common to see the same questions, or variations of the same questions, show up repeatedly. The competitive programming topic is no exception to this rule. Merging similar questions is an option, but it presents a couple of challenges:

1) Quora Content Review (QCR, a human + bot system) has its own merging rules, and will unmerge questions that it thinks are insufficiently similar. For example, here’s a question with over 2000 followers and 60 answers: “What basic data structures and algorithms should one learn before starting competitive programming?” I tried to merge in a similar question, “What are the data structures and algorithms used in competitive programming?” (3 followers, 1 answer), but it was quickly unmerged by QCR. A FAQ could collect these similar questions, even when they can’t be merged on Quora.
2) It can be time-consuming to find the best merge target, especially given QCR rules. Quora’s merge suggestion list often doesn’t provide the best merge choice. A Google search works better, but for frequently asked questions, it’s more efficient to search a FAQ than to search all of Quora.

A competitive programming FAQ that is independent of Quora can resolve these problems. If QCR doesn’t want questions to be merged, they can still be listed under the same question in the FAQ. And the FAQ can categorize questions in a way that makes it easy to find appropriate merge targets.

Using some of the data collection from previous weeks, I have an initial list of categorized questions to use as a starting point for the FAQ categories.

« Continue »

CPFAQ: Classifying Quora Topics

By Duncan Smith, Mar 7

Competitive Programming Wiki

If you study the current Quora topic ontology for competitive programming, it’s clear that it needs some work. The six top-level topics don’t cover all of the subjects that people ask about. And many related topics aren’t even in the ontology, usually because people created them without selecting parent topics.

Here are two steps to create a better ontology: 1) Evaluate the related topics that already exist, and add each one to the appropriate place in the ontology, and 2) Write a clear description of what questions belong in each topic.

« Continue »

CPFAQ: Quora Topic Ontology

By Duncan Smith, Feb 28

Quora Topic Ontology

This week, I published a post to the Quora Topic Gnomery blog on the subject of competitive programming topic cleanup. It generated some discussion about how competitive programming topics on Quora should be organized, and prompted me to look into the Quora Topic Ontology.

« Continue »

CPFAQ: Quora Topic Cleanup

By Duncan Smith, Feb 21

Gnome

As part of the FAQ research process, I’m creating a canonical list of Quora topics related to competitive programming. Like many things on Quora, topics in this area are a bit messy. Or as the Quora Topic Gnomes say:

Quora’s Topics are a free-for-all, and that often creates greatness, but it really requires crowdsourced curation to make it all it can be.

« Continue »

CPFAQ: Collecting Quora Topics

By Duncan Smith, Feb 14

Quora Related Topics

In recent weeks, I have been experimenting with ways to collect Quora questions, especially those that don’t appear in standard views like search engine results and the All Questions page. But I also want to make sure that the questions I collect from alternative sources are relevant, since I eventually need to manually evaluate the best questions for a FAQ. Last week I started filtering on the Competitive Programming topic tag, but I realized that this can filter out relevant questions. To see why that is, I’m investigating how Quora topics work.

« Continue »