- Python 3
- pymystem3: Russian lemmatizer by Yandex
https://pypi.org/project/pymystem3/
pip install pymystem3
- FFmpeg: command-line tool for audio conversion
https://ffmpeg.org/download.html
- The OpenRussian dictionary, downloaded as CSV files.
- Specifically, the following files are needed in _materials\openrussian:
  - words.csv
  - translations.csv
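Before running the pipeline, it can help to confirm that the items listed above are in place. The following is a minimal sketch, not part of the repository: the function name `missing_prerequisites` is hypothetical, and it only checks that pymystem3 is importable, that an `ffmpeg` executable is on PATH, and that the two dictionary files exist in the folder you pass in.

```python
import importlib.util
import shutil
from pathlib import Path

# File names come from the list above; the folder is supplied by the caller.
REQUIRED_DICTIONARY_FILES = ("words.csv", "translations.csv")


def missing_prerequisites(openrussian_dir):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # find_spec returns None when the package cannot be located.
    if importlib.util.find_spec("pymystem3") is None:
        problems.append("pymystem3 is not installed (pip install pymystem3)")
    # shutil.which returns None when no matching executable is on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    for name in REQUIRED_DICTIONARY_FILES:
        if not (Path(openrussian_dir) / name).is_file():
            problems.append(f"missing dictionary file: {name}")
    return problems
```

Called from the solution root, for example as `missing_prerequisites(Path("_materials") / "openrussian")`, it returns the list of problems found.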
Prerequisite:
- A Google service account key for access to the Speech API
https://console.cloud.google.com/apis/library/speech.googleapis.com
https://console.cloud.google.com/apis/credentials
- Place ServiceAccountKey.json into the solution root.
- Edit the values at the top of GoogleTranscriber.cs (projectId and bucketName) to match what you set up in the API console.
The code in Tool itself takes care of uploading the FLAC file, initiating the transcription, polling for progress, and retrieving the transcribed text.
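The poll-then-retrieve part of that sequence follows a common pattern for long-running jobs. The sketch below illustrates the pattern only; it is not the code in GoogleTranscriber.cs and does not call the Speech API. `check_status` and `fetch_result` are hypothetical callables standing in for whatever the real client exposes.

```python
import time


def wait_until_done(check_status, fetch_result, poll_seconds=5.0,
                    timeout_seconds=3600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll check_status() until it reports completion, then return fetch_result().

    check_status: hypothetical callable returning (done: bool, progress_percent: int).
    fetch_result: hypothetical callable returning the transcribed text.
    sleep and clock are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        done, progress = check_status()
        if done:
            return fetch_result()
        if clock() >= deadline:
            raise TimeoutError(
                f"transcription still at {progress}% after {timeout_seconds} s"
            )
        sleep(poll_seconds)
```

A fixed polling interval keeps the sketch short; a real client might prefer a growing interval to reduce API calls on long recordings.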
Prerequisite:
- A subscription key for the speech-to-text service in Azure Cognitive Services
https://portal.azure.com/#blade/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/SpeechServices
How to:
- Transcription is performed by a dedicated tool in the repository, MSTranscriber.
- Use from-mp3.cmd to create a WAV file with the required parameters. Upload the file to a publicly available URL.
- Create a TranslationConfig.json in the solution root (you can use the sample file). Enter your subscription key, and fill in the URL, language, and output file name.
- Build and run MSTranscriber. The working directory must be the solution root.
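The conversion script in the repository is the authoritative source for the WAV parameters. As a rough illustration of the kind of ffmpeg call involved, the sketch below builds a command for 16-bit PCM output. The function names are hypothetical, and the 16 kHz mono defaults are assumptions, not values taken from from-mp3.cmd.

```python
import subprocess


def wav_conversion_command(mp3_path, wav_path, sample_rate=16000, channels=1):
    """Build an ffmpeg argument list that converts an MP3 to 16-bit PCM WAV.

    sample_rate and channels are assumed defaults, not taken from from-mp3.cmd.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(mp3_path),       # input file
        "-ac", str(channels),      # number of audio channels
        "-ar", str(sample_rate),   # sample rate in Hz
        "-c:a", "pcm_s16le",       # 16-bit little-endian PCM
        str(wav_path),
    ]


def convert(mp3_path, wav_path):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(wav_conversion_command(mp3_path, wav_path), check=True)
```

Building the argument list separately from running it makes the parameters easy to inspect and adjust before any file is written.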
To prepare, open the Properties of the Tool project in Visual Studio (right-click in Solution Explorer), and under Debug set the solution root as the working directory.
- Convert the mp3 file to other formats:
Scripts\from-mp3.bat SAMPLE
- Prepare a cleaned-up plain text file of the text, one line per paragraph. This will live as _work\SAMPLE-orig.txt
- In the Tool project, edit the line in Main() to match the current conversion:
doOrigAlignRus("SAMPLE", (decimal)0.35, "Чехов: Анна на шее");
- Edit the method doOrigAlignRus to uncomment the return statement.
- Run the Tool.
- Edit Scripts\rulem.py to work on SAMPLE, and execute:
python3 rulem.py
- Edit the method doOrigAlignRus to comment out the return statement again.
- Run the Tool again. The second run skips transcription and finishes the annotation because SAMPLE-lem.txt is now available.
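To show what the lemmatization step in the middle of this workflow involves, here is a minimal sketch built on pymystem3's `Mystem.lemmatize`, which returns a list of lemmas interleaved with whitespace and punctuation tokens. It is not the repository's rulem.py: the function names are hypothetical, and the output layout (one space-joined line of lemmas per paragraph) is an assumption. The real script defines the format the Tool expects.

```python
def lemmatize_paragraphs(paragraphs, lemmatize):
    """Return one space-joined line of lemmas per input paragraph.

    lemmatize: callable taking a string and returning a list of tokens,
    as pymystem3's Mystem.lemmatize does.
    """
    lines = []
    for paragraph in paragraphs:
        tokens = lemmatize(paragraph)
        # Drop whitespace-only tokens, including the trailing newline Mystem appends.
        lemmas = [token.strip() for token in tokens if token.strip()]
        lines.append(" ".join(lemmas))
    return lines


def lemmatize_file(src_path, dst_path):
    """Read one paragraph per line from src_path and write lemmatized lines to dst_path."""
    from pymystem3 import Mystem  # imported here so the helper above works without it

    mystem = Mystem()
    with open(src_path, encoding="utf-8") as src:
        paragraphs = [line.rstrip("\n") for line in src]
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in lemmatize_paragraphs(paragraphs, mystem.lemmatize):
            dst.write(line + "\n")
```

Passing the lemmatizer in as a callable keeps the per-paragraph logic usable without the Mystem binary, which pymystem3 downloads on first use.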
The player is based on Vue.js; it is currently built with Node.js v14. The prerequisites are:
- Node.js v14 (download page)
- Yarn (download page)
- Vue CLI (download page)
- In the ProsePlayer folder, restore the Node modules:
yarn
- Build the app:
yarn build
The outcome is in the dist folder, which can be published as-is.
Notes:
- Before building, make sure the data for the episodes is in public/media. Alternatively, the same files can be copied directly into dist/media, or the equivalent folder in the published tool online. The pre-processing script copies its outputs into this folder by itself.
- To develop the tool (run it locally, with live reload), you need:
yarn serve
- You only need to run yarn again to update the Node modules if the content of package.json has changed.
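If you copy episode data into an already built dist folder by hand, a small script can do it. The sketch below is an assumption-laden illustration: `copy_media` is a hypothetical helper, and it simply mirrors a source folder into the media subfolder of the build output you point it at.

```python
import shutil
from pathlib import Path


def copy_media(source_dir, dist_dir):
    """Copy everything under source_dir into dist_dir/media, keeping subfolders.

    Returns the sorted relative paths of all files now present in the target.
    """
    target = Path(dist_dir) / "media"
    # dirs_exist_ok lets the copy merge into an existing folder (Python 3.8+).
    shutil.copytree(source_dir, target, dirs_exist_ok=True)
    return sorted(
        p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file()
    )
```

Run from the ProsePlayer folder, a call such as `copy_media("public/media", "dist")` would refresh the published data without a rebuild.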