CPFAQ: Scraping with Selenium

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

When you’re logged in to Quora, you see more information than an anonymous user does. For example, on the all_questions page for a topic, logged-in users see a title for each question along with how many answers it has, when it was last followed or requested, how many followers it has, and various available actions. Anonymous users just see the question titles.

When I started collecting Quora questions for the FAQ, I noticed this discrepancy between the anonymous and logged-in experiences. To collect as much information as possible, I often manually saved pages while logged in, and then ran my tools on the saved HTML. But for individual question pages, this wasn’t practical, since I’m tracking over 15,000 questions. For those, I wrote a program to download pages automatically. And since that program did not log in, some useful information was not available.

It would be ideal to combine the convenience of automation with the extra data provided to logged-in users. This week, I experimented with using the Selenium testing framework to achieve this. It turned out to be a simple process.
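As a rough illustration of the idea, here is a minimal Python sketch, not the actual tool from this series. It assumes Selenium's Python bindings and Chrome, and it assumes you have already logged in once using the browser profile directory you pass in. The names `make_logged_in_driver` and `save_pages`, the output file naming, and the delay between requests are all hypothetical choices made for the example.

```python
import time
from pathlib import Path


def make_logged_in_driver(profile_dir):
    """Start Chrome with an existing profile so its saved login session is reused.

    Assumption: you have already logged in to the site once with this profile.
    """
    # Imported lazily so the rest of the module works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(f"--user-data-dir={profile_dir}")
    return webdriver.Chrome(options=options)


def save_pages(driver, urls, out_dir, delay_seconds=2.0):
    """Load each URL in the browser and write the rendered HTML to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        driver.get(url)
        path = out_dir / f"page_{index:05d}.html"
        # page_source is the HTML as the logged-in browser currently sees it.
        path.write_text(driver.page_source, encoding="utf-8")
        saved.append(path)
        time.sleep(delay_seconds)  # pause between page loads
    return saved
```

Because `save_pages` only needs an object with a `get` method and a `page_source` attribute, the saved files can be fed to the same tools that already process manually saved HTML, and the function can be exercised with a stand-in driver before pointing it at a real browser.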

CPFAQ: Canonical Question Statistics, Part 2

I have now classified over 1000 Quora questions, using 552 canonical titles, and I think that’s […]

CPFAQ: Good Answers to Bad Questions

As I mentioned at the end of last week’s post, it’s hard to write a good […]

CPFAQ: The Value of Canonical Questions

Last week I discussed how question merging works for Quora and CPFAQ. Related to question merging […]

CPFAQ: Merging Questions

If a CPFAQ page has a canonical title and contains a list of Quora questions that […]

CPFAQ: Canonical Question Statistics

As I mentioned last week, I’m currently creating FAQ pages, and those FAQ pages rely on […]

CPFAQ: Creating a FAQ Page

For at least the next few weeks, I’ll be creating competitive programming FAQ pages for the […]

CPFAQ: Adding Wiki Pages

We’re officially halfway through the year, as measured by weekly blog posts. That means I’m also […]

CPFAQ: CPWiki

With the halfway point of 2018 approaching, it’s time to focus on the website that will […]

CPFAQ: Document Classification

To organize my list of Quora questions, I have started giving each one a primary category […]