Tool for lexical analysis of text data
---text1.tx---
name of the analyzed file
Char count: 761674
number of characters in the file (without white-spaces)
Word count: 188912
number of words in the file
Dict size: 10762
dictionary size: number of different words used
Most used words (>1%): [a,and,had..<more>]
all words that make up for more than 1% of the text
'love' count: 60
number of times that the word ‘love’ occurred
'hate' count: 4
number of times that the word ‘hate’ occurred**
'music' count: 4
number of times that the word ‘music’ occurred
vowels %: 38
what part of all characters are vowels (a,o,i,e,u)
a:e count ratio: 1.54
the ratio of ‘a’ to ‘e’ occurrences
[ G -> 2.16] [ R -> 5.36] ....<and the rest>
% of in whole text of all the letters
- The main goal is to implement all classes exactly how they're presented in the class diagram. Please, analyse it carefully multiple times and make sure that your implementation is compatibly with it at the end.
- Don't focus to much on the validity of your results. For now, the software design is your priority,
- Remember about different kind access modifiers. Small tip: you need to use the default access modifier to implement the code according to the UML class diagram.
- Private methods are not shown on the diagram. You are free to implement any necessary/helpful private methods in any class.
- Create custom implementation of iterator pattern.
- The input files will only consist of alphabetic characters and white spaces.
- The input file will consist both upper and lower case characters. You should use only one case to ensure that the results of the analysis will be consistent.
- Handle only those exceptions which are required to be handled by the functions and methods you use.
- The structure of namespaces is not restricted by anything but clean code principles.
- Don't think too much about the idea of interfaces. Just do it. We'll discuss this topic later.
It's an interface. Create the interface with required methods signuteres.
class responsible for reading text files and providing its content through implementing **IterableText ** interface. The content will be provided through methods **CharIterator() ** and WordIterator().
successively iterates over alphabetic characters (single letters) of the text. It skips all other characters (like white-spaces). Implements MoveNext() and HasNext() from **Iterator ** interface. NOTE: Chars are represented as Strings for the sake of simplicity. Remember that characters’ case (UPPER or lower) should be consistent in our analysis.
successively iterates over words of the text. Words in the text are separated with white-spaces. Implements MoveNext() and HasNext() from **Iterator ** interface.
It's the dataset. It is initialized with Iterator< and provides functions for statistical analysis of the text elements (Strings). NOTE: You may choose any data structure to store the processed data. The choice will affect the amount of code you will have to write.
Returns sum of all occurrences (in the dataset) of the elements given as arguments ( elems). Note: One string may be given as well. NOTE: To clarify - read how params keyword works
Returns the number of unique elements of the dataset. It’s the size so-called Author’s Dictionary - the set of different words the author of the text uses.
Returns the total number of all elements in the dataset. This will be total number of words or total number of characters.
Returns all the elements that occur more than _n __times _ in the dataset
This class should be responsible for presenting the data to the user. You are free to implement any functions you find sensible. NOTE: In case of this class you don’t have to follow the UML diagram.
The Main class of the program. It should allow the user to pass to text files for analysis as CLI arguments. Also, measure the time of the program's execution. For example:
dotnet run -- text1.txt text2.txt