Description:
Part-of-speech tagged and lemmatized corpora on wind power in 7 languages: English, French, German, Spanish, Russian, Latvian and Chinese.
Resource type:
corpus
Resource availability:
available for commercial use
available for research purposes
free
Can the resource be directly downloaded?:
Yes
Modality:
text
Production date:
2012
Domain:
Format explanation:
*.txt files containing clean text
*.xml files containing the DublinCore metadata of the corresponding *.txt files
*.xmi files containing the tagged and lemmatized corpus in XMI format, compliant with the UIMA Type System for TTC TermSuite
*.tsv files containing the tagged and lemmatized corpus in TSV format (tab-separated values), i.e. one word per line, 3 columns per word (the word itself, its category and its lemma)
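A minimal sketch of reading the *.tsv layout described above (one word per line; three columns: word, category, lemma). The function name, the `Token` type, the sample rows and the tag labels are hypothetical illustrations, not part of the resource. The sketch also assumes the files have no header row and skips any line that does not have exactly three columns.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    """One corpus line: the word itself, its category and its lemma."""
    word: str
    category: str
    lemma: str


def read_tagged_tsv(lines: Iterable[str]) -> Iterator[Token]:
    """Yield a Token for every well-formed three-column line."""
    # QUOTE_NONE keeps quotation marks in the text as ordinary characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            # Assumption: blank or malformed lines are skipped, not reported.
            continue
        yield Token(*row)


# Hypothetical sample; the real tag set depends on the tagger that was used.
sample = "turbines\tnoun\tturbine\nrotate\tverb\trotate\n"
tokens = list(read_tagged_tsv(io.StringIO(sample)))
lemmas = [t.lemma for t in tokens]
```

To read a real file, pass an open file object instead of the in-memory sample, for example `open(path, encoding="utf-8", newline="")`; the UTF-8 encoding is an assumption and is not stated in this record.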