Wednesday, September 23, 2015

Automated Essay Scoring Updates

Today, September 23rd, 2015, we are rolling out the most significant change to our Automated Essay Scoring system in its history.  This involves many improvements summarized below:

  • Enhanced usage of grammar features in our predictive models
  • Ensemble methods for increased accuracy and reduced variability
  • Elimination of length bias
  • Additional predictors
  • More uniform distribution of scores
  • Incorporation of other Machine Learning and NLP techniques...
Although these changes improve our AES technology, we recognize that classrooms and individual writers may track how the score of a thesis or other written work changes over time, and that a sudden shift in scoring could disrupt that process. To mitigate this issue and ease the transition, we are blending the scores from our previous AES model with scores generated by our new scoring models. As always, we welcome any feedback on the new scoring system.
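As a rough illustration only, one simple way to blend two models' outputs is a weighted average. The sketch below assumes both models produce numeric scores on the same scale and uses a hypothetical `new_weight` parameter; the post does not say how PaperRater actually weights its old and new models.

```python
def blended_score(old_score: float, new_score: float, new_weight: float = 0.5) -> float:
    """Weighted average of an old-model score and a new-model score.

    `new_weight` is a hypothetical blending weight: 0.0 keeps only the old
    model's score, 1.0 keeps only the new model's score.
    """
    if not 0.0 <= new_weight <= 1.0:
        raise ValueError("new_weight must be between 0 and 1")
    return (1.0 - new_weight) * old_score + new_weight * new_score
```

Raising `new_weight` gradually over time would move users toward the new model without an abrupt jump in their scores.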

We hope to continue with another round of major enhancements to the automated grader in the summer of 2016 when it will likely be less disruptive to most users of our service.

Monday, September 21, 2015

Anatomy of a Plagiarism Checker

If you've ever wondered how our plagiarism checker works on the inside or what our originality score means, then this article is required reading. The green plus icon and the "100% originality" are a wonderful reassurance for writers who submit their work to our service, but what do they mean? Similarly, if you receive an Originality of 70%, should you be concerned? We will answer each of these questions as we take a look under the hood of our plagiarism detector.

A Daunting Task

When we say that we are checking for plagiarism, we are attempting to discover whether portions of the given text might have been taken from other, previously written texts. Fifty years ago, a teacher might have been concerned that a work was copied from a book, from another student in the same class, or from a class years earlier. Today, there are many reasons to want a measure of a text's originality, and the task is as daunting as ever. While our ability to process text has improved dramatically since then, so has the amount of text available to would-be plagiarizers. In fact, the text publicly available on the Internet already exceeds trillions of pages and continues to grow exponentially. Plagiarism detection is all about finding the proverbial needle in the haystack.

How It's Done

For the curious, the precise details of storing and rapidly searching massive amounts of text fall under the field of Information Retrieval. For our purposes, however, a high-level overview will suffice. As stated earlier, we wish to search trillions of documents efficiently, so we turn to the companies that already do exactly this: search engines. Google, Microsoft, and Yahoo maintain the software and thousands of computers necessary to track, store, and search the massively growing index of Internet content, and they offer access to that content through search APIs. By using these APIs, we tap into their vast data stores without the overhead of crawling the entire Internet ourselves.

Specifically, when a document comes into our plagiarism detection service, we chop it up into small snippets of text and run a sample of those snippets through the search APIs. Consider the following snippets pulled from a paper on Abraham Lincoln:

  • Lincoln grew up on the western frontier in Kentucky
  • confronted Radical Republicans, who demanded harsher treatment of the South
  • remaining land he held in Kentucky in
  • became an able and successful lawyer with a reputation as a formidable
  • compensation for the owners, enforcement to capture fugitive slaves
  • ....
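The post doesn't specify how a document is chopped up, so the following is only a minimal sketch under stated assumptions: snippets are consecutive, non-overlapping runs of a fixed number of words, and the sample is drawn at random. The function names and defaults (8 words per snippet, 100 snippets) are hypothetical, not PaperRater's actual settings.

```python
import random
from typing import Optional


def make_snippets(text: str, size: int = 8) -> list[str]:
    """Split text into consecutive, non-overlapping snippets of `size` words.

    The last snippet may be shorter when the word count isn't a multiple of `size`.
    """
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def sample_snippets(snippets: list[str], k: int = 100, seed: Optional[int] = None) -> list[str]:
    """Pick up to `k` snippets at random; return them all if there are k or fewer."""
    if len(snippets) <= k:
        return list(snippets)
    rng = random.Random(seed)  # a seeded generator makes the sample reproducible
    return rng.sample(snippets, k)
```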
Imagine pulling 100 snippets from this document and then running a Google phrase search on each of these snippets. How many of the 100 snippets would match a document on the Internet? Since these excerpts were taken from the Wikipedia article on Abraham Lincoln, we would expect all (or nearly all) of them to return at least that page in the search results. If these excerpts came from a completely original source, we would expect the searches to come back with no matches (or perhaps a few false positives). This is approximately the approach taken by our plagiarism detection service. The originality score that you receive is represented by this simple formula:

1 - (Number of Searches with Matches / Total Number of Searches)

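According to this formula, the originality is 0% when all of the searches have matches, which is exactly what we would expect for the Lincoln paper, and 100% when none of them do. Likewise, an originality of 70% means that about 30% of the sampled snippets matched text found on the Internet. This is, of course, a simplified overview of what is actually a much more complicated process, but it conveys the general methodology used at PaperRater.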
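The formula translates directly into code. In this sketch, `has_match` is a hypothetical placeholder for a call to a search engine's phrase-search API; it simply returns True when a snippet turns up at least one result.

```python
from typing import Callable, Iterable


def originality(snippets: Iterable[str], has_match: Callable[[str], bool]) -> float:
    """Return 1 - (searches with matches / total searches), a fraction in [0, 1]."""
    results = [has_match(snippet) for snippet in snippets]
    if not results:
        raise ValueError("at least one snippet is required")
    matched = sum(results)  # True counts as 1, False as 0
    return 1 - matched / len(results)


# Stand-in search: pretend the first 30 of 100 snippets were found online.
snippets = [f"snippet {i}" for i in range(100)]
found_online = set(snippets[:30])
print(f"{originality(snippets, lambda s: s in found_online):.0%}")  # prints 70%
```

With 100 sampled snippets, 30 matches produce an originality of 70%.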

Checking Against Past Submissions

One question we receive from time to time is whether past submissions are used in calculating the originality score. The answer is 'No', but this deserves an explanation. Sites like Turnitin bank previous submissions and check against them in addition to using search APIs. This raises concerns about false positives and privacy that we would rather avoid. Imagine submitting an original paper to our service before you turn it in and then being accused of plagiarism when your teacher checks it with the same service one week later. Rest assured that PaperRater checks papers using only the search APIs.

Tuesday, June 30, 2015

Sentence Beginnings

Just how do we start that perfect sentence? It can be a tough decision. Sentence beginnings are like first impressions, and we want to make sure they’re right. And while there’s no single correct way to start a sentence, we need to vary them to ensure our writing doesn’t get stale or boring.

So, what makes a bad sentence beginning anyway? Are there rules? Well, no, but let’s look at some examples of how repeated simple sentence openings can become stilted and tough to read. For example:

John went to the store. Anthony went with him. They bought food and drinks. They bought flowers for John’s mother. The man at the cash register smiled at them and gave them a discount. They returned home and put away the groceries.

What do all these sentence beginnings have in common? As we can see, each one consists of a noun followed by a verb. Unfortunately, these simple sentences quickly become boring, decreasing the impact of our writing. Let’s see what happens when we vary them a bit.

John and Anthony went to the store, where they bought food, drinks, and flowers for John’s mother. Smiling, the man at the cash register gave them a discount. Returning home, the two put the groceries away.

See how much better this flows? Here, we’ve kept the simple noun-verb structure for the first sentence, but in the second, we’ve used “smiling” as a participial or transitional word. And in the third, we’ve emphasized the action of returning home rather than the two boys.

Let’s look at some other ways to rearrange simple sentence beginnings (i.e., common noun-verb constructions) and add some variety to our work. Remember, our goal here is not just to minimize the use of simple sentence openings - it’s to become better writers!

Example #1

The dog barked with great ferocity.

With great ferocity, the dog barked.

Here, we’ve placed “with great ferocity” at the top of the sentence. Notice how it highlights the action of the dog barking rather than the dog itself.

Example #2

The car’s engine made a loud boom from around the corner.

From around the corner came the loud boom of the car’s engine.

Here, we’ve rearranged the sentence so the initial subject (the car’s engine) is placed at the end. The second sentence highlights a location and a sound rather than the car’s engine.

Example #3

Billy stayed home from school, sad about the loss of his grandmother.

Sad about the loss of his grandmother, Billy stayed home from school.

Here, we’ve started with an adjective. In a normal noun-verb sentence opening, the adjective would follow, rather than lead, the noun and verb.

Example #4

Briana danced the night away.

Here we have a very simple sentence, which may work well. However, you could try adding a present or past participle to the beginning, describing a little more action or detail.

Laughing at her own silliness, Briana danced the night away.

Not only does it paint a better picture, but it also breaks up the noun-verb opening.

Example #5

The furious man shook his fist at the car turning the corner.

Enraged, the man shook his fist at the car turning the corner.

In this example, we can leave most of the sentence intact and simply add a transitional word or phrase to the front for a little variety.

We can also try this with an adverb.

Furiously, the man shook his fist at the car turning the corner.

Or, an appositive.

A furious old man, Robert shook his fist at the car turning the corner.

Example #6

Now, let’s synthesize some of the above examples in the context of a full paragraph. First, let’s read a paragraph made of simple noun-verb sentence openings.

Jacob ran into the street. Anne met him. They ran together down the block. Their shoes flew off the pavement. A car passed by and nearly hit them. They didn’t care. They held hands and kept running. They swerved out of the street and onto the grass. They each grabbed onto the giant oak tree trunk and climbed up. They sat there, catching their breath.

Did you notice that you started to tune out while reading? While the actions described in this paragraph are clear, we lose interest, because each sentence has the same structure.

Anne and Jacob met in the street and ran together down the block, shoes flying off the pavement. Around the corner came the car, nearly hitting them, but they held hands and kept running. Suddenly, they swerved out of the street and onto the grass. Up the giant oak tree they climbed, then sat there, catching their breath.

See how this paragraph keeps our attention? Varied sentence beginnings emphasize different actions and locations, whereas a simple noun-verb construction always emphasizes the subject.

Our goal is to keep our sentences diverse, which makes for more interesting reading. Remember, sentence beginnings (and sentence structure in general) create a tone and rhythm for the reader - and that tone is as important as the content itself!

For more modules and other helpful instructional writing pieces, visit our blog at:

PaperRater’s FREE Plagiarism and Grammar Checker

Remember to visit PaperRater for our FREE plagiarism and grammar checking tool. Get detailed reports on all your writing, including grammar, readability, transitional phrase reports - and even a comprehensive “grade” on any piece of writing!

Tuesday, May 19, 2015

Passive Voice vs Active Voice

Passive Voice

English and grammar teachers love to tell their students to use the active voice because it tends to make sentences shorter, clearer, and more impactful.

But is it a crime to write in the passive voice? Absolutely not. In many cases, the passive voice is actually preferable to the active voice. However, it does present many dangers that could make our writing wordy or unclear. Let’s define the active and passive voices, then discuss some potential problems with passive writing.

Examples of Active and Passive Voices

Whether a sentence is active or passive depends on the relationship between the verb and the subject. In the active voice, a subject performs a verb. For example:

Barry Bonds hit 762 home runs.

Barry Bonds (subject) hit (verb) 762 home runs (object).

In the passive voice, the subject is switched, so the object of the active sentence becomes the subject of the passive sentence.

762 home runs were hit by Barry Bonds.

Unlike the subject in the active voice, the subject in the passive voice does nothing. In other words, the subject, home runs, takes no action. Instead, the home runs are acted upon.

Another example of an active sentence:

My parents bought groceries for my sister’s birthday party.

The subject of the sentence, the parents, performs the action of buying groceries. The parents are the focus of the sentence.

However, in the passive voice, the subject is switched, so the object of the active sentence becomes the subject in the passive.

Groceries were bought by my parents for my sister’s birthday party.

The subject, or focus, of the sentence takes no action. Instead, the groceries are acted upon by the parents.

Problems with Passive Voice

The passive voice can create confusion; it often disrupts rhythm and makes a sentence harder to understand. In many cases, verbs and subjects become vague or ambiguous.

Evidence was presented to support the idea that homelessness is experienced by more than 600,000 people.

A couple of questions: Who is presenting this evidence? And how is the number of homeless people an idea? Also, because the subject, the evidence, doesn’t perform any action, the sentence is inherently confusing.

Let’s clarify this sentence with a few simple fixes:

The U.S. Census Bureau estimated the number of homeless people at 600,000.

Here, the U.S. Census Bureau becomes the subject who drives the estimation of the number of homeless people. The focus of the sentence has shifted, creating a simple, straightforward structure. Another example:

A talk was given by the college professor; she cited a paper that said homelessness went down last year.

Again, the subject of the sentence, the “talk,” doesn’t do anything. Here, the passive voice creates a clunky break that requires a semicolon to keep the sentence grammatically correct. The subject of the first part of the sentence is the “talk,” but the talk doesn’t cite the paper; the professor does. See how confusing the subject can become in a passive sentence?

The college professor cited a paper stating homelessness decreased last year.

By changing the subject of the sentence to the doer of the action (the college professor), we get a simple, easy-to-read statement.

How to Identify the Passive Voice

The easiest way to identify the passive voice is to look for the following in any sentence:

passive voice = form of “to be” + past participle (verb)

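A past participle is the verb form used in perfect tenses and in the passive voice (e.g., hit, bought, driven); for regular verbs, it matches the past tense form and ends in “-ed.” Look for it in conjunction with a form of “to be,” which usually includes words like is, are, am, was, were, has been, have been, had been, will be, will have been, and being.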

Also, look for the word “by” introducing the perceived doer of the action.

When the car was driven by the racer, he sped out of control and hit the guardrail.
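The rule of thumb above can be turned into a rough automated check. This is a naive heuristic sketch, not PaperRater’s “Passive Voice” module: it flags a form of “to be” followed by a likely past participle (a word ending in “-ed,” or one from a small, hypothetical list of irregular participles), allowing one “-ly” adverb in between. Expect false positives with adjectives such as “tired” or “red.”

```python
import re

# Forms of "to be" from the list above. Compound forms such as "has been"
# or "will be" are caught by their final word ("been", "be").
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}

# Hypothetical, deliberately short list of irregular past participles.
IRREGULAR_PARTICIPLES = {
    "hit", "bought", "driven", "given", "taken", "written",
    "done", "made", "seen", "known", "held", "built",
}


def looks_like_past_participle(word: str) -> bool:
    """Regular participles end in -ed; irregular ones come from the list."""
    return word.endswith("ed") or word in IRREGULAR_PARTICIPLES


def find_passive(sentence: str) -> list[str]:
    """Return the "to be + participle" pairs that suggest passive voice."""
    words = re.findall(r"[a-z']+", sentence.lower())
    hits = []
    for i, word in enumerate(words[:-1]):
        if word not in BE_FORMS:
            continue
        candidate = words[i + 1]
        # Allow one intervening -ly adverb, as in "was quickly driven".
        if candidate.endswith("ly") and i + 2 < len(words):
            candidate = words[i + 2]
        if looks_like_past_participle(candidate):
            hits.append(f"{word} {candidate}")
    return hits
```

On the racer sentence above, `find_passive` returns `["was driven"]`, while the active “Barry Bonds hit 762 home runs.” returns an empty list.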

Reasons to Use the Passive Voice

Remember, we want our writing to be clear. So when we talk about passive versus active voices, keep in mind that either voice can work, depending on the situation.

Here are a few occasions where the passive voice may be preferable to the active voice:

  1. When the receiver of the action is more important than the agent. Take the example: “My car was hit.” We want to focus on the car itself, since we care more about the car being damaged than about who damaged it.
  2. When the agent of action is a secret or an authoritative figure. Take common disclaimers like “Trespassers will be prosecuted,” or “Access is denied.”
  3. When we want variety in our writing. Any kind of writing, no matter how active, tends to grow dull after a while. Sentence rhythm and structure will feel stilted and repetitive, especially when each sentence is focused only on the drivers of action.

How can PaperRater help you construct more active sentences?

Use our free essay checker today and receive a full report on all your writing, including grammar correction, plagiarism checking, and a “Passive Voice” analysis that automatically scans your document for passive sentence constructions. Instantly improve your writing today with PaperRater’s FREE electronic spelling, grammar, word choice, vocabulary and style grades!

Tuesday, May 5, 2015

Effective Use of Sentence Length

The issue of sentence length leaves many writers scratching their heads. Short, long, or medium-length sentences - which are better? Does it make a difference? Why should we pay attention to sentence length anyway?

For one, sentence length adds as much meaning to a text as the words you choose. It conveys a specific mood and rhythm and can match the actions being described. For example, if you were writing about a tense car race, shorter sentences may help heighten the suspense of the scene. On the other hand, longer sentences may work better when writing about complex philosophical abstractions.

Let’s look at a couple of examples.

“As the number one car slammed its brakes around the turn, my foot hit the gas, and I swung around him, crossing the finish line and winning the race.”

It’s not bad, but let’s see what happens when we break it up into several sentences.

“The number one car slammed its brakes around the turn. My foot hit the gas, and I swung around him. I crossed the finish line, winning the race.”

We find that shorter sentences help tighten the action, accentuating the descriptions of “slamming the brakes,” “hitting the gas” and “crossing the finish line.”

Other texts may demand longer sentences:

“Descartes stated that the mind is mental. The body is physical. Mind and body are, therefore, not identical. This is the mind-body problem.”

Philosophical problems are often complex and may work better with longer sentences and more description:

“Descartes stated that since the mind is mental and the body physical, the two cannot be identical. This dilemma is known as the mind-body problem.”

See how much clearer this version reads? The longer sentence length creates a nice, flowing structure that leads logically from one idea to the next.

Let’s look at more examples in which short and long sentences can be problematic, followed by some strategies for correcting them.

Short Sentences

Short sentences are useful for supplying small bits of information. They cut to the chase and emphasize one, maybe two, points. But their stop-and-start rhythms can make them difficult to read:

“Short sentences are hard to read. They stop and start. What happens when you read them? You feel like you’re stuttering. They break up the thought process. Sometimes they’re useful. Other times they’re not. They’re frustrating. Right?”

How do we stretch out these short sentences so they’re not so clunky? Try lengthening them with conjunctions, which are words that join clauses together, turning two short sentences into one.

Specifically, let’s look at coordinating and subordinating conjunctions. Coordinating conjunctions include words like and, but, or, nor, for, yet, and so. Common subordinating conjunctions include although, because, once, unless, wherever, and many more.

Let’s rework our short-sentence paragraph:

“Short sentences are hard to read because they stop and start, making you feel like you’re stuttering. Although they’re useful for breaking up the thought process, they can be quite frustrating to read. What do you think?”

See how conjunctions create a simple chain of ideas to help round out the sentences’ rhythms? Try using them the next time you find yourself writing sentences that are too short or do not reflect the proper mood.

Long Sentences

Long sentences provide more detail and information than shorter sentences and are used to investigate in-depth ideas. However, they, too, can be problematic, for repeated use of long sentences can bore the reader. They may also become difficult to read, since the reader must hold several ideas in his or her head at once.

Let’s look at an example of a long, somewhat complicated sentence:

“Although I prefer to write long sentences, they are also problematic, as they quickly become boring and long-winded; in turn, their inherent difficulty can disengage the reader, causing him or her to stop reading.”

It’s not a completely terrible sentence, but it is long, complex and may be more effective if we break it up into several sentences:

“Although I prefer to write long sentences, they quickly become boring and long-winded. They’re also difficult to read and may cause the reader to stop reading.”

See how our points become sharper? Instead of five or six ideas, each sentence contains two, making them easily digestible.

How Can PaperRater Help You With Your Sentence Length?

Check out PaperRater’s FREE sentence length module (part of its online proofreader and grammar correction) to help keep your sentence length within an acceptable range. 

By analyzing the number of short and long sentences in your document, we’ll show you where you might need to lengthen or trim your sentences. Instantly improve your writing by combining our sentence length tool with our spelling, grammar, and transitional phrases modules and more!
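A sentence length report of this kind can be approximated in a few lines of code. The sketch below is an illustration under stated assumptions rather than PaperRater’s module: it splits sentences after “.”, “!” or “?” followed by whitespace, counts words as runs of letters, digits, and apostrophes, and uses hypothetical cut-offs of 8 words or fewer for “short” and 25 or more for “long.”

```python
import re

SHORT_MAX = 8  # hypothetical: sentences this short or shorter count as "short"
LONG_MIN = 25  # hypothetical: sentences this long or longer count as "long"


def sentence_lengths(text: str) -> list[int]:
    """Word count of each sentence, splitting after ., ! or ? plus whitespace."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(re.findall(r"[A-Za-z0-9'’]+", s)) for s in sentences]


def length_report(text: str) -> dict:
    """Summarize sentence count, average length, and short/long counts."""
    lengths = sentence_lengths(text)
    return {
        "sentences": len(lengths),
        "average": sum(lengths) / len(lengths) if lengths else 0.0,
        "short": sum(1 for n in lengths if n <= SHORT_MAX),
        "long": sum(1 for n in lengths if n >= LONG_MIN),
    }
```

Applied to the two car-race versions earlier in this post, the single sentence comes out at 29 words (flagged as long), while the rewrite yields sentences of 10, 10, and 8 words.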

Tuesday, April 21, 2015

Readability Indices

It’s widely known that easy and enjoyable reading helps learning and comprehension. Most people prefer reading “plain English,” and tend to turn off when a passage is too difficult to read. So when we speak of a text’s readability score, or of a readability index, what we mean is: How easy is it for readers to understand?

Measuring readability is important for a number of reasons. For one, teachers need to know whether their students are capable of writing at their grade level or need more schooling in a particular area. Readability scores also help teachers and school systems judge whether a textbook is right for their students.

Writers also need readability scores - especially those who write for children and pre-teens. They’ve definitely helped me make sure I’m writing for the correct audience; plus, they’ve taught me how to increase or decrease my usage of complex sentences depending on my readership. Never write outside your audience!

PaperRater Premium-Only Module: Readability Scores

PaperRater is proud to offer a NEW series of twelve readability scores as part of our premium service. Just sign in, visit our proofreader, enter your text, and receive a full, side-by-side comparison of the most common readability indices.

Why do you offer twelve indices? Shouldn’t I use just one?

Each readability index uses different criteria to create a score. As you’ll see, some use input based on syllables, and others based on word length. Plus, the equations used are slightly different.

So we give you a set of twelve to reduce the bias of relying on any single formula and to provide a wider range of input. That way, you can choose for yourself which ones to incorporate into your work.

Here’s a quick breakdown of each index we provide:

Automated Readability Index (ARI)

The ARI grades text using two measures: the number of characters per word and the number of words per sentence. Because it’s difficult for computers to count syllables, the ARI relies on characters per word instead, although it’s debatable whether counting characters or syllables is more helpful.
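The post doesn’t give the ARI formula; it is commonly published as 4.71 × (characters ÷ words) + 0.5 × (words ÷ sentences) - 21.43, with the result read as an approximate U.S. grade level. This minimal sketch assumes the three counts have already been taken (characters meaning letters and digits); the function name is illustrative.

```python
def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    if words == 0 or sentences == 0:
        raise ValueError("need at least one word and one sentence")
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
```

For example, a 100-word, 5-sentence passage with 400 characters scores about 7.4.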

Coleman-Liau Index

Creators Meri Coleman and T. L. Liau constructed this readability score for the Office of Education to standardize textbooks in the United States. Like the ARI, it operates on the assumption that characters per word is a better indicator of readability than syllables. From Wikipedia: “L is the average number of letters per 100 words and S is the average number of sentences per 100 words.”
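The formula that L and S plug into is commonly given as CLI = 0.0588 × L - 0.296 × S - 15.8. The sketch below converts raw counts into those per-100-word averages first; counting letters, words, and sentences is assumed to happen elsewhere, and the function name is illustrative.

```python
def coleman_liau_index(letters: int, words: int, sentences: int) -> float:
    """CLI = 0.0588 * L - 0.296 * S - 15.8, with L and S measured per 100 words."""
    if words == 0:
        raise ValueError("need at least one word")
    L = letters / words * 100    # average number of letters per 100 words
    S = sentences / words * 100  # average number of sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8
```

A 100-word sample with 450 letters and 5 sentences gives L = 450 and S = 5, for a score of about 9.2.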

Dale-Chall Readability Formula

The Dale-Chall Readability Formula uses a different kind of input. Instead of using the number of characters in a word, it calculates the approximate grade level of a text by measuring “hard words.”

What exactly are hard words? The Dale-Chall list contains about 3,000 words known by at least eighty percent of the children in the fifth grade. Words not on the list are considered difficult. The higher the score, the higher the text’s grade level.
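The post doesn’t give the formula itself; the widely cited version computes 0.1579 × (percentage of difficult words) + 0.0496 × (average sentence length), adding 3.6365 when more than 5% of the words are difficult. In this sketch, `familiar` is a hypothetical stand-in for the roughly 3,000-word Dale-Chall list, which isn’t reproduced here, and words are assumed to be already stripped of punctuation.

```python
def dale_chall_score(words: list[str], sentences: int, familiar: set[str]) -> float:
    """Dale-Chall score from a list of words and a set of familiar (easy) words."""
    if not words or sentences == 0:
        raise ValueError("need at least one word and one sentence")
    difficult = sum(1 for w in words if w.lower() not in familiar)
    pct_difficult = 100 * difficult / len(words)
    avg_sentence_length = len(words) / sentences
    score = 0.1579 * pct_difficult + 0.0496 * avg_sentence_length
    if pct_difficult > 5:
        score += 3.6365  # adjustment for texts with more than 5% difficult words
    return score
```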

Flesch Reading Ease

Unlike the first two indices, the Flesch Reading Ease calculates readability by the average sentence length and the average number of syllables per word. Text is rated on a scale from one to a hundred; the lower the score, the harder the text is to read. Plain English is set at 65, with the average word containing about one and a half syllables. The average sentence contains 15 to 20 words.

By the way, the above passage received a score of 73.7 on the Flesch Reading Ease, which means it’s slightly easier to read than plain English.
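The widely published form of the formula is 206.835 - 1.015 × (words ÷ sentences) - 84.6 × (syllables ÷ words). The sketch takes the three counts as inputs; counting syllables automatically is the hard part and isn’t shown here.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    if words == 0 or sentences == 0:
        raise ValueError("need at least one word and one sentence")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
```

For example, 100 words in 5 sentences with 150 syllables (20 words per sentence, 1.5 syllables per word) scores about 59.6.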

Flesch-Kincaid Grade Level

The Flesch-Kincaid Grade Level is a companion index to the Flesch Reading Ease, and uses the same inputs (average sentence length + average number of syllables per word). However, the measurements are weighted differently, and so it produces a score as an approximate grade level (e.g. 1-12, or higher).
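The grade-level version is commonly published as 0.39 × (words ÷ sentences) + 11.8 × (syllables ÷ words) - 15.59. It takes the same inputs as the Reading Ease sketch but weights them differently; the function name is illustrative.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    if words == 0 or sentences == 0:
        raise ValueError("need at least one word and one sentence")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

A 100-word, 5-sentence, 150-syllable sample comes out at about grade 9.9.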

Fry Readability Formula

Instead of taking a complete word count, the Fry Readability Formula randomly chooses three 100-word samples throughout the text. It then counts the average number of syllables and the average number of sentences per hundred words, and plots them onto a graph. The intersection of the two averages represents the appropriate reading level. It is used widely in the healthcare industries.

Gunning Fog Index

The Gunning Fog Index is calculated from the average sentence length and the percentage of complex words. Its inventor, Robert Gunning, complained that then-current writing was too complex (had too much fog) and needed to be simplified. Complex words in this case are defined as those with three or more syllables. Yet while complex words can be a good indicator of readability, the index fails to account for the fact that words with three or more syllables are not necessarily difficult to comprehend.
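The index is usually written as 0.4 × [(words ÷ sentences) + 100 × (complex words ÷ words)]. In this simplified sketch, the count of complex words (three or more syllables) is assumed to come from a separate syllable counter; Gunning’s full procedure also excludes some words, such as proper nouns, from that count, which isn’t shown here.

```python
def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """0.4 * ((words / sentences) + 100 * (complex_words / words))."""
    if words == 0 or sentences == 0:
        raise ValueError("need at least one word and one sentence")
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))
```

Twenty words per sentence with 10% complex words gives a Fog Index of 12.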

LIX

LIX, or the Läsbarhetsindex Swedish Readability Formula, was specially designed for the readability of texts in foreign languages. Its formula uses (A) the number of words, (B) the number of periods, and (C) the number of long words (more than six letters). Multicultural teachers prefer its emphasis on long words and average sentence length to predict readability.

\text{LIX} = \frac{A}{B} + \frac{C \cdot 100}{A}
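The LIX formula maps directly onto code, with A the number of words, B the number of periods, and C the number of long words. The helper for counting long words is an illustrative assumption that treats any word with more than six letters as long and expects words already stripped of punctuation.

```python
def count_long_words(words: list[str]) -> int:
    """C in the LIX formula: words with more than six letters."""
    return sum(1 for w in words if len(w) > 6)


def lix(a_words: int, b_periods: int, c_long_words: int) -> float:
    """LIX = A / B + (C * 100) / A."""
    if a_words == 0 or b_periods == 0:
        raise ValueError("need at least one word and one period")
    return a_words / b_periods + (c_long_words * 100) / a_words
```

A 100-word passage with 5 periods and 30 long words scores 20 + 30 = 50.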

Linsear Write

Similar to the Flesch readability indices, Linsear Write helps calculate a text’s readability by sentence length and the number of “hard words” - words with three or more syllables. It was adopted by the U.S. Air Force to grade the readability of their flight manuals.
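The post doesn’t spell out the calculation. As it is commonly described: take a 100-word sample, score 1 point for each easy word (two syllables or fewer) and 3 points for each hard word (three or more syllables), divide by the number of sentences, then halve the result if it is above 20, or subtract 2 and halve it otherwise. The sketch takes per-word syllable counts as input, assumed to be computed elsewhere.

```python
def linsear_write(syllables_per_word: list[int], sentences: int) -> float:
    """Linsear Write grade level for a (roughly 100-word) sample."""
    if sentences == 0:
        raise ValueError("need at least one sentence")
    # Easy words (two syllables or fewer) score 1 point; hard words score 3.
    points = sum(3 if s >= 3 else 1 for s in syllables_per_word)
    provisional = points / sentences
    if provisional > 20:
        return provisional / 2
    return (provisional - 2) / 2
```

With 90 easy words and 10 hard words (120 points), 5 sentences give a grade of 12.0, while 8 sentences give 6.5.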

Raygor Estimate Graph

A simple readability estimate, the Raygor Estimate allows you to calculate an approximate grade level by taking 100 words from your text and counting the number of sentences. Then you count the number of words within your sample that contain six or more letters. Plot these two counts on the Raygor Estimate Graph, and you will receive an approximate U.S. grade level.

SMOG

SMOG is a playful acronym for “Simple Measure of Gobbledygook,” and was developed as a replacement for the Gunning Fog Index. It is estimated by taking three 10-sentence samples from a piece of text, counting the words with three or more syllables, taking the square root of that count (rounded to the nearest perfect square), and then adding three.
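Besides the quick estimate described above, SMOG is commonly published with a more precise formula: 1.0430 × √(polysyllables × 30 ÷ sentences) + 3.1291, where polysyllables are words of three or more syllables. Both versions are sketched below; the polysyllable counts are assumed inputs, and the function names are illustrative.

```python
import math


def smog_grade(polysyllables: int, sentences: int) -> float:
    """Precise SMOG: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    if sentences == 0:
        raise ValueError("need at least one sentence")
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


def smog_quick(polysyllables_in_30_sentences: int) -> int:
    """Quick estimate: square root of the nearest perfect square, plus three.

    For whole-number counts, rounding the square root to the nearest integer
    picks the same root as rounding to the nearest perfect square.
    """
    return 3 + round(math.sqrt(polysyllables_in_30_sentences))
```

With 36 polysyllabic words in 30 sentences, the quick estimate is 3 + 6 = 9, while the precise formula gives about 9.4.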

Spache Readability Formula

This formula was designed for texts at or below the third-grade level, combining the percentage of “unfamiliar words” with the average number of words per sentence. “Unfamiliar words,” in this case, are words that students in third grade or below are not expected to understand. It is recommended to use the Spache Readability Formula for those in third grade or below, and the Dale-Chall Readability Formula for those in fourth grade or above.

Ready to get started using PaperRater’s premium services, including our twelve readability indices?

Sign up today and receive the following:

  • Longer document uploads (up to 20 pages!)
  • Enhanced plagiarism detection
  • Faster processing
  • NEW readability indices
  • No banner ads
  • File uploads (.doc, .docx, .txt, .odt and .rtf documents)
And remember to try our FREE plagiarism checker and online grammar check for a full report on your use of grammar, spelling, transitional phrases and more!