
Marymount University IT Structure of Programming Paper


Question Description

1)

a) Using one of the corpora from the last lab, calculate the average number of tokens per sentence.

b) Using the same or a different corpus, which category has the longest sentences on average, and which has the shortest?
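One possible way to approach both parts of question 1 is sketched below. It assumes the lab corpus is NLTK's Brown corpus, which is a guess, since the assignment does not name the corpus. The helper names (`average_tokens_per_sentence`, `longest_and_shortest`, `brown_by_category`) are made up for illustration. The arithmetic is kept separate from the corpus loading so it can be checked on a small hand-made example first.

```python
def average_tokens_per_sentence(sentences):
    """Return the mean number of tokens per sentence.

    `sentences` is an iterable of token lists, e.g. [["A", "cat", "."], ...].
    """
    total_tokens = 0
    sentence_count = 0
    for tokens in sentences:
        total_tokens += len(tokens)
        sentence_count += 1
    # Avoid dividing by zero on an empty corpus.
    return total_tokens / sentence_count if sentence_count else 0.0


def longest_and_shortest(sentences_by_category):
    """Return (longest, shortest, averages) for a dict of category -> sentences."""
    averages = {
        category: average_tokens_per_sentence(sentences)
        for category, sentences in sentences_by_category.items()
    }
    longest = max(averages, key=averages.get)
    shortest = min(averages, key=averages.get)
    return longest, shortest, averages


def brown_by_category():
    """Assumption: the lab corpus is NLTK's Brown corpus (not stated in the task)."""
    import nltk
    from nltk.corpus import brown

    nltk.download("brown", quiet=True)  # fetches the corpus if it is missing
    return {cat: brown.sents(categories=cat) for cat in brown.categories()}
```

Under that assumption, `longest_and_shortest(brown_by_category())` would give the answer for part b), and passing `brown.sents()` to `average_tokens_per_sentence` would give the answer for part a). Note that NLTK counts punctuation marks as tokens, so the averages run slightly higher than a word count would.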

2) Download your own corpus from https://www.gutenberg.org/

a) How many sentences are in the document (use NLTK to split the sentences)? How does this differ from the number of lines in the file (readlines)?

b) After tokenizing the sentences, find 3 errors and describe why you think each error might have occurred. What in the algorithm might have gone wrong?
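A minimal sketch for question 2 follows, assuming the Gutenberg book has been saved locally as a UTF-8 text file. The function names and the two heuristics in `flag_suspicious` are illustrative assumptions, not part of the assignment or of NLTK. NLTK's `sent_tokenize` relies on the Punkt model, so typical errors are splits after abbreviations it does not recognize, and headings or tables of contents merged into one "sentence" because they contain no terminal punctuation.

```python
def count_lines_and_sentences(path, split_sentences, encoding="utf-8"):
    """Return (line_count, sentence_count) for the text file at `path`.

    `split_sentences` is any callable mapping a string to a list of sentences,
    for example nltk.tokenize.sent_tokenize.
    """
    with open(path, encoding=encoding) as handle:
        lines = handle.readlines()
    text = "".join(lines)
    return len(lines), len(split_sentences(text))


def flag_suspicious(sentences, min_chars=3):
    """Flag likely segmentation errors using two rough heuristics (assumptions)."""
    flagged = []
    for index, sentence in enumerate(sentences):
        stripped = sentence.strip()
        if len(stripped) < min_chars:
            flagged.append((index, "very short fragment", sentence))
        elif stripped[0].islower():
            # A lowercase start often means the previous "sentence" was cut
            # short, e.g. after an abbreviation ending in a period.
            flagged.append((index, "starts with a lowercase letter", sentence))
    return flagged


def nltk_splitter():
    """Return NLTK's sentence splitter, downloading its model if needed."""
    import nltk
    from nltk.tokenize import sent_tokenize

    # The resource name depends on the NLTK version; requesting both is a
    # precaution, and an unknown name is simply reported as not found.
    for resource in ("punkt", "punkt_tab"):
        nltk.download(resource, quiet=True)
    return sent_tokenize
```

With a hypothetical file name, `count_lines_and_sentences("book.txt", nltk_splitter())` returns both counts. They usually differ because Gutenberg texts are hard-wrapped at a fixed width, so one sentence often spans several lines, while blank lines and license text add lines that contain no sentences. The flagged items are only candidates and still need to be read by hand to decide whether each is a real error.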

