Intro to classifiers

Training a classifier

To train a classifier in Amplified you need three things:

A tag
Positive training examples: examples of relevant patents
Negative training examples: examples of patents that are not relevant but are not random noise either (ideally from the same field)

The secret to training top-notch classifiers? Quality > Quantity.

Classifiers can only learn from what they have seen. Let's say you give a classifier hundreds of red apples as positive examples and green pears as negative examples. Now you show it a green apple. The classifier has never seen a green apple, so it doesn't know whether you would consider it positive or negative. In other words, are you trying to identify apples? Or to separate red fruit from green fruit?

The lesson here is that a classifier needs to see a range of examples to understand what you want. In the example above, adding a single green apple to the training data will have a larger impact than adding another 100 red apples.
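To make the apple example concrete, here is a toy sketch in Python. It is purely illustrative and assumes a 1-nearest-neighbour classifier over two hand-made features (is_red, is_apple_shaped); it is not how Amplified's classifier works internally. With only red apples and green pears, a green apple is equally close to both classes, and adding 100 more red apples does not change that, while a single green apple does.

```python
# Toy illustration only: 1-nearest-neighbour over two hypothetical binary
# features. This is NOT Amplified's model, just a way to show the idea.

# Each fruit is (is_red, is_apple_shaped)
RED_APPLE = (1, 1)
GREEN_PEAR = (0, 0)
GREEN_APPLE = (0, 1)


def hamming(a, b):
    """Number of features on which two fruits differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_labels(training, query):
    """Return the set of labels held by the closest training example(s)."""
    best = min(hamming(features, query) for features, _ in training)
    return {label for features, label in training if hamming(features, query) == best}


# Hundreds of red apples (positive) and green pears (negative).
training = [(RED_APPLE, "positive")] * 300 + [(GREEN_PEAR, "negative")] * 300
print(nearest_labels(training, GREEN_APPLE))  # a tie: both labels are equally close

# Another 100 red apples does not help with the green apple...
more_red = training + [(RED_APPLE, "positive")] * 100
print(nearest_labels(more_red, GREEN_APPLE))  # still a tie

# ...but a single green apple resolves the ambiguity.
one_green = training + [(GREEN_APPLE, "positive")]
print(nearest_labels(one_green, GREEN_APPLE))  # {'positive'}
```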

Let's get training!

Once you have a tag with both positive and negative example patents, click Manage and then select Train classifier. After training completes, clicking Manage will show "View classifier" instead of "Train classifier".

Using a classifier to predict tags

Now we can use our trained classifier to predict tags! Let's open a project with the patents that we want to run the classifier on.

Use the Predict All shortcut button to run the classifier on your entire result set, or select specific patents and then open the Actions menu.

Predicting tags on your entire result set

Predicting tags on selected patents only

Reviewing predicted tags

After running your classifier, you will see predicted tags on your results. Each predicted tag shows a confidence score between 0.0 and 1.0.

What do the green and red colors mean?

Predicted tags have a Cutoff value. Confidence scores above the cutoff are predicted positive, and scores below the cutoff are predicted negative. Scores close to the cutoff are low-confidence predictions; scores far from the cutoff are high-confidence predictions.

Predicted positive tags are green
Predicted negative tags are red
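The color and confidence rules above can be sketched in a few lines of Python. This is a hypothetical illustration, not Amplified's code: the Prediction type, the placeholder patent IDs and the 0.5 cutoff are made up, and a score exactly equal to the cutoff is assumed to count as positive (the rules above only cover scores above and below it). "Certainty" here is simply the distance between the score and the cutoff.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    patent: str   # hypothetical placeholder patent identifier
    score: float  # confidence score between 0.0 and 1.0


def color(pred: Prediction, cutoff: float) -> str:
    """Green (predicted positive) above the cutoff, red (predicted negative) below.

    Assumption: a score exactly equal to the cutoff is treated as positive.
    """
    return "green" if pred.score >= cutoff else "red"


def certainty(pred: Prediction, cutoff: float) -> float:
    """Distance from the cutoff: small = low-confidence, large = high-confidence."""
    return abs(pred.score - cutoff)


cutoff = 0.5  # illustrative value only; Amplified calculates its own cutoff
preds = [Prediction("US-A", 0.95), Prediction("US-B", 0.52), Prediction("US-C", 0.03)]
for p in preds:
    print(p.patent, color(p, cutoff), round(certainty(p, cutoff), 2))
```

In this sketch US-B is green but only barely, so it is a low-confidence positive, while US-A and US-C are high-confidence predictions on opposite sides of the cutoff.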

Sort by confidence score

You can also use the predicted scores to sort the results in your list. For example, if we want to see the patents that are most likely positive for Qubit generation, we can predict on the entire result set and then select Qubit generation from the Predicted tag score option in the Sorted by dropdown.
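Sorting by predicted tag score amounts to ordering patents from highest to lowest score. A minimal Python sketch, assuming a hypothetical set of scores for the Qubit generation tag keyed by placeholder patent IDs:

```python
# Hypothetical predicted scores for the "Qubit generation" tag, keyed by patent.
scores = {"patent-1": 0.41, "patent-2": 0.97, "patent-3": 0.08, "patent-4": 0.73}

# Highest score first: the most likely positives rise to the top of the list.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for patent, score in ranked:
    print(f"{patent}: {score:.2f}")
```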

Customize cutoff values

Amplified calculates cutoff values automatically, but you can customize them. This is helpful when processing large volumes of data, because the cutoff controls which patents are labelled predicted positive and which are labelled predicted negative. For example, you may want to treat everything with a score below 0.20 as negative. To do that, set the cutoff value to 0.20.
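The effect of moving the cutoff can be shown with a small Python sketch. The scores and patent IDs are hypothetical, 0.5 stands in for an automatically calculated cutoff, and scores at or above the cutoff are assumed to count as positive. Lowering the cutoff to 0.20 means only patents scoring below 0.20 are labelled negative.

```python
def split_by_cutoff(scores, cutoff):
    """Partition patents into predicted positive / negative for a given cutoff.

    Assumption: scores at or above the cutoff count as positive.
    """
    positive = [p for p, s in scores.items() if s >= cutoff]
    negative = [p for p, s in scores.items() if s < cutoff]
    return positive, negative


# Hypothetical scores for a handful of placeholder patents.
scores = {"patent-1": 0.41, "patent-2": 0.97, "patent-3": 0.08,
          "patent-4": 0.15, "patent-5": 0.22}

for cutoff in (0.5, 0.20):  # 0.5 is an illustrative stand-in for the default
    pos, neg = split_by_cutoff(scores, cutoff)
    print(f"cutoff={cutoff:.2f}: {len(pos)} positive, {len(neg)} negative")
```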

From the tag management page, click Manage and then View classifier. This opens the classifier page, where you can see the current Cutoff value. Click the three red dots to change it.

Here you can select Custom, use the slider to pick the desired value, and click Save. Alternatively, select one of Amplified's presets and click Recalculate.

Retraining: giving feedback and improving the classifier

Classifiers often need some tweaking to do exactly what you want. The best way to do this is to predict tags, review some of the predictions, and mark each one as positive or negative with the check or X button. Sorting by confidence score helps you prioritize your review around the most and least confident predictions.
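One way to read "most and least confident" is: the highest and lowest scores are the most confident predictions, and the scores nearest the cutoff are the least confident. The Python sketch below builds a review shortlist on that assumption. The function name, the n parameter and the example scores are hypothetical; this is not an Amplified feature.

```python
def review_queue(scores, cutoff, n=2):
    """Pick patents worth reviewing first (hypothetical helper).

    Returns the n highest scores, the n lowest scores, and the n scores
    closest to the cutoff (the least confident predictions).
    """
    by_score = sorted(scores, key=scores.get, reverse=True)
    by_uncertainty = sorted(scores, key=lambda p: abs(scores[p] - cutoff))
    return {
        "most_likely_positive": by_score[:n],
        "most_likely_negative": by_score[-n:],
        "closest_to_cutoff": by_uncertainty[:n],
    }


scores = {"a": 0.97, "b": 0.73, "c": 0.52, "d": 0.41, "e": 0.08}
for group, patents in review_queue(scores, cutoff=0.5).items():
    print(group, patents)
```

Reviewing the confident ends checks that the classifier got the obvious cases right, while reviewing near the cutoff gives it feedback exactly where it is least sure.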

You can then retrain the classifier so that it learns from the training data updated by your feedback. To retrain a classifier, go back to the tag management page, click Manage and then View classifier. This opens the classifier page, where you can click Train.
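Conceptually, the feedback step turns reviewed predictions into additional training examples. The Python sketch below is a hypothetical illustration, not Amplified's implementation: a tag's training data is modelled as two sets of placeholder patent IDs, and each review decision (check = relevant, X = not relevant) moves a patent into the matching set, replacing any earlier label for it.

```python
def apply_feedback(training, feedback):
    """Return new training data with reviewer decisions merged in (hypothetical).

    training: {"positive": set of patent ids, "negative": set of patent ids}
    feedback: {patent id: True for a check mark, False for an X}
    """
    updated = {"positive": set(training["positive"]),
               "negative": set(training["negative"])}
    for patent, is_relevant in feedback.items():
        # A new decision replaces any previous label for the same patent.
        updated["positive"].discard(patent)
        updated["negative"].discard(patent)
        updated["positive" if is_relevant else "negative"].add(patent)
    return updated


training = {"positive": {"patent-1"}, "negative": {"patent-2"}}
feedback = {"patent-3": True, "patent-2": True, "patent-4": False}
print(apply_feedback(training, feedback))
```

The original training data is copied rather than modified, so each retraining round starts from an explicit, updated set of examples.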

After retraining, you can predict tags again to overwrite the previous scores. Repeat the cycle of reviewing and retraining until you're confident in both the predictions and the cutoff value.