CPFAQ: Patterns in Question Titles, Part 3

Word Cloud

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

I’m mining my Quora question corpus to find patterns and collect data to help write a list of canonical questions. In recent weeks, I’ve been looking at the words used to start question titles. This week, I’m analyzing the full text of the question titles in the list.
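A full-text analysis like this comes down to counting every word across all the titles and ignoring common filler words. Here is a minimal sketch of that count. It assumes the titles are already loaded as a list of strings. The stop-word list and the `word_frequencies` helper are hypothetical and are not the tool used for this series.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would probably use a longer one.
STOP_WORDS = {"a", "an", "the", "in", "to", "for", "of", "is", "i",
              "how", "what", "do", "should", "and"}

def word_frequencies(titles, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears across all question titles."""
    counts = Counter()
    for title in titles:
        # Lowercase and keep runs of letters, digits, '+' and '#' so "C++" and "C#" survive.
        words = re.findall(r"[a-z0-9+#]+", title.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts

titles = [
    "How do I start competitive programming?",
    "What is the best language for competitive programming?",
    "Is C++ better than Java for competitive programming?",
]
for word, n in word_frequencies(titles).most_common(3):
    print(word, n)
```

You can pass the resulting counts to a word-cloud generator, or read them directly to see which topics dominate.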

CPFAQ: Patterns in Question Titles, Part 2

Question Titles 2

Last week, I did some simple text mining to classify Quora Competitive Programming questions based on the first word (How, What, Why, etc.) of the question title. This week, I’m extending that a bit by looking at starting phrases containing 3-4 words each.
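Counting starting phrases generalizes the first-word count. Instead of looking at one word, the program groups titles by their first *n* words. The sketch below is a minimal version that assumes a plain list of title strings. The `starting_phrases` helper is hypothetical, and it skips titles shorter than *n* words.

```python
import re
from collections import Counter

def starting_phrases(titles, n):
    """Count the first n words of each title; titles with fewer than n words are skipped."""
    counts = Counter()
    for title in titles:
        # Lowercase and split into word tokens, keeping apostrophes ("what's") and C++/C#.
        words = re.findall(r"[\w'+#]+", title.lower())
        if len(words) >= n:
            counts[" ".join(words[:n])] += 1
    return counts

titles = [
    "How do I start competitive programming?",
    "How do I improve my rating on Codeforces?",
    "What is the best way to learn dynamic programming?",
    "What is the best language for competitive programming?",
]
for n in (3, 4):
    print(n, starting_phrases(titles, n).most_common(2))
```

Running it with n = 3 and n = 4 side by side shows where a common opener such as "how do i" splits into more specific questions.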

CPFAQ: Patterns in Question Titles

Question Titles

Now that I have a database of competitive programming questions, I thought it would be interesting to look for patterns in the way the questions are asked. This will help me write canonical question titles, which will determine the set of questions included in the FAQ.

I briefly considered loading the list of questions into a text mining program, and even tried out a free online tool. But ultimately I decided that was overkill, so I wrote a simple program to find the words that are most frequently used to start the questions in my list.
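A program like that needs only a few lines. Below is a minimal sketch, assuming the titles are available as a list of strings. The `first_word_counts` name is hypothetical. The sketch normalizes capitalization and strips trailing punctuation so that "how" and "How" count as the same word.

```python
from collections import Counter

def first_word_counts(titles):
    """Count the first word of each question title, ignoring case and trailing punctuation."""
    counts = Counter()
    for title in titles:
        words = title.split()
        if words:  # skip empty titles
            counts[words[0].strip("?,.!").capitalize()] += 1
    return counts

titles = [
    "How do I start?",
    "What is DP?",
    "how should I practice?",
    "Why is my solution getting TLE?",
]
print(first_word_counts(titles).most_common())
```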

CPFAQ: SELECT Queries

PIVOT query

Data in a relational database is often not arranged in a way that makes sense to the end user. Instead, it's organized to reduce duplication and improve query performance. So when it's retrieved from the database, it needs to be converted into a more readable format through a combination of SQL queries and application logic. As an example, I'll use data from my question database.
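Here is a small illustration of that division of labor. The SQL query does the grouping and counting, and application code pivots the "long" result rows into a table that is easier to read. This is a sketch only. The `questions` table and its `year` and `topic` columns are hypothetical, not the actual schema of my question database. It uses an in-memory SQLite database, and because SQLite has no `PIVOT` operator, the pivot step happens in Python.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema: one row per question, with the year it was asked and a topic label.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, year INTEGER, topic TEXT)")
conn.executemany(
    "INSERT INTO questions (year, topic) VALUES (?, ?)",
    [(2016, "getting started"), (2016, "languages"), (2017, "getting started"),
     (2017, "getting started"), (2017, "algorithms")],
)

# SQL does the grouping, but the result is still "long": one row per (year, topic) pair.
rows = conn.execute(
    "SELECT year, topic, COUNT(*) FROM questions GROUP BY year, topic ORDER BY year, topic"
).fetchall()

def pivot(rows):
    """Turn (year, topic, count) rows into a {year: {topic: count}} table."""
    table = defaultdict(dict)
    for year, topic, count in rows:
        table[year][topic] = count
    return dict(table)

pivoted = pivot(rows)
topics = sorted({topic for _, topic, _ in rows})

# Print one row per year and one column per topic, filling missing cells with 0.
print("year", *topics, sep="\t")
for year in sorted(pivoted):
    print(year, *(pivoted[year].get(t, 0) for t in topics), sep="\t")
```

Database systems that support a `PIVOT` clause, such as SQL Server, can do the same reshaping inside the query. Keeping the pivot in application code avoids depending on that feature.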