StanfordNLP
StanfordNLP:
- can be used to build NLP models for many languages
- the StanfordNLP library supports 53 human languages
- ships with a collection of pre-trained, state-of-the-art models
- models are built on PyTorch and can be trained on your own data
- provides an official Python wrapper for CoreNLP
setup: StanfordNLP:
conda create -n stanfordnlp python=3.7.1
source activate stanfordnlp
pip install stanfordnlp
import stanfordnlp
stanfordnlp.download('en')
- language model size: the English model is 1.96 GB
error possibilities:
- an older version of PyTorch might crash
- check your PyTorch version with:
pip freeze | grep torch
- a non-GPU machine with limited memory will throw a memory error
pipelines
- a pipeline is a sequence of data processing tasks and algorithms (tokenization, tagging, parsing, etc.) applied to the text; see the sketch after this list
- processors:
- tokenize (tokenization)
- mwt (multi-word token expansion)
- lemma (lemmatization)
- pos (part-of-speech tagging)
- depparse (dependency parsing)
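a minimal sketch of building and running a pipeline (the sample sentence is illustrative):
import stanfordnlp
# build an English pipeline with the processors listed above
nlp = stanfordnlp.Pipeline(processors='tokenize,mwt,pos,lemma,depparse', lang='en')
# running the pipeline on a piece of text returns a Document object
doc = nlp("The quick brown fox could not jump over the lazy dog.")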
tokenization
- splits the text into sentences and tokens
- a token contains an index and a list of words (more than one word if it is a multi-word token)
- every word object contains useful info (index, lemma, POS tag, morphological features)
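for example, walking the tokens of the doc from the pipeline sketch above:
for sentence in doc.sentences:
    for token in sentence.tokens:
        # token.words holds more than one entry for multi-word tokens
        print(token.index, token.text, [word.text for word in token.words])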
lemmatization
- the lemma property of a word is its root (dictionary) form, e.g. the lemma of "ran" is "run"
- the .lemma attribute gives the lemma of each word
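for example, printing each word next to its lemma (assuming the same doc):
for sentence in doc.sentences:
    for word in sentence.words:
        # prints the root form of each word
        print(word.text, '->', word.lemma)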
pos
- the POS (part-of-speech) tagger gives the part-of-speech property of each word in a sentence
- ex: could: modal verb, for: preposition
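for example (same doc; word.pos holds the treebank tag):
for word in doc.sentences[0].words:
    # e.g. could -> MD (modal verb), over -> IN (preposition)
    print(word.text, word.pos)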
dependency extraction
- gives the grammatical relationships between the words in a sentence
- sentence.print_dependencies()
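for example, printing the dependencies of the first sentence of the doc from above:
# each printed triple is (word, index of its governor, dependency relation)
doc.sentences[0].print_dependencies()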
StanfordNLP for the Hindi language:
stanfordnlp.download('hi')
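a sketch of running a Hindi pipeline after the download (the sample sentence, "I go to school", is illustrative):
hindi_nlp = stanfordnlp.Pipeline(lang='hi')
hindi_doc = hindi_nlp("मैं स्कूल जाता हूँ।")
# works the same as the English pipeline, e.g.:
hindi_doc.sentences[0].print_dependencies()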
CoreNLP
- time-tested, industry-grade toolkit
- known for its performance and accuracy
setup (CoreNLP)
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip
unzip stanford-corenlp-full-2018-10-05.zip
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
- export CORENLP_HOME as the location of the unzipped folder (to make sure stanfordnlp knows the location of CoreNLP):
export CORENLP_HOME=stanford-corenlp-full-2018-10-05/
annotators:
- read the text
- locate target entities
- highlight them
- chosen from a predetermined list of labels ('tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'depparse', 'coref'):
- ssplit: sentence splitting (dividing the text into sentences)
- ner: named entity recognition (categorizes parts of the sentence, e.g. person, location)
- depparse: dependency parsing (outputs based on dependency extraction)
- coref: coreference resolution (finds mentions of the same entity in a text, ex: 'deepika' and 'she' refer to the same entity: deepika)
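a minimal sketch of annotating text through the server with stanfordnlp's CoreNLP client (the annotator set and sentence are illustrative):
from stanfordnlp.server import CoreNLPClient
# starts (or attaches to) a CoreNLP server and sends text for annotation
with CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'depparse', 'coref'],
                   timeout=30000, memory='4G') as client:
    ann = client.annotate("Deepika has a dog. She loves him.")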
Dependency Parsing and POS:
- print(dependency_parse)
- print(token.pos)
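a sketch showing where dependency_parse and token come from, assuming the ann annotation from the client sketch above:
sentence = ann.sentence[0]                      # first sentence of the annotation
dependency_parse = sentence.basicDependencies   # dependency graph of the sentence
print(dependency_parse)
token = sentence.token[0]                       # first token of the sentence
print(token.pos)                                # its POS tag, e.g. 'NNP'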
Named Entity Recognition and Co-Reference Chains:
- print(token.ner)
- print(ann.corefChain)
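similarly, assuming token and ann from above:
print(token.ner)        # named entity tag, e.g. 'PERSON' ('O' if none)
print(ann.corefChain)   # chains of mentions referring to the same entity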
Pros and Cons:
Pros:
- multiple languages
- going to be the official Python interface to CoreNLP (improves functionality and ease of use)
- fast
- straightforward setup in python
Cons:
- large language models (the English model alone is ~2 GB)
- quickly scripting a prototype is not feasible (the large models take time to download and load)
- missing visualization features