Introduction to spaCy
2017-10-19
@auzen_
Goal
Get motivated to use spaCy
Be able to try spaCy right away
What's spaCy?
"Industrial-Strength" NLP Library
Fastest in the world
written in Cython
Get things done
easy to install
simple API
Deep learning
interoperates seamlessly with TensorFlow, Keras, Scikit-Learn, Gensim
Fastest in the world
Syntactic parsing
(Choi et al., IJCNLP, 2015)
https://spacy.io/docs/api/
Fastest in the world
Detailed speed comparison
https://spacy.io/docs/api/
Get things done
Installation:
$ pip install spacy
$ python -m spacy download en
Load model and process text:
import spacy
nlp = spacy.load('en')
doc = nlp('Can you process this text?')
Get things done
POS tagging:
for token in doc:
    print(token, token.pos_)
Can VERB
you PRON
process VERB
this DET
text NOUN
? PUNCT
Get things done
Dependency parsing:
for token in doc:
    print('{} -({})-> {}'.format(token.head, token.dep_, token))
process -(aux)-> Can
process -(nsubj)-> you
process -(ROOT)-> process
text -(det)-> this
process -(dobj)-> text
process -(punct)-> ?
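The head/relation/token lines above describe a tree. As a rough illustration of how to read them, the sketch below rebuilds that tree from the printed triples using only the standard library. It does not call spaCy; `ARCS`, `build_tree`, and `render` are hypothetical names, and keying the tree by word string only works here because every word in the sentence is distinct.

```python
from collections import defaultdict

# (head, relation, token) triples copied from the parser output above.
ARCS = [
    ("process", "aux", "Can"),
    ("process", "nsubj", "you"),
    ("process", "ROOT", "process"),
    ("text", "det", "this"),
    ("process", "dobj", "text"),
    ("process", "punct", "?"),
]


def build_tree(arcs):
    """Return (root, children); children maps a head to its (relation, dependent) pairs."""
    root = None
    children = defaultdict(list)
    for head, dep, token in arcs:
        if dep == "ROOT":
            # In the output above the root token is listed as its own head.
            root = token
        else:
            children[head].append((dep, token))
    return root, dict(children)


def render(node, children, depth=0, dep="ROOT"):
    """Return the subtree under `node` as indented lines, one token per line."""
    lines = ["{}{} ({})".format("  " * depth, node, dep)]
    for child_dep, child in children.get(node, []):
        lines.extend(render(child, children, depth + 1, child_dep))
    return lines


root, children = build_tree(ARCS)
print("\n".join(render(root, children)))
```

Printed this way, "this" appears nested under "text", which in turn hangs off the root verb "process".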
Get things done
Named entity recognition:
doc = nlp('The current capital of Japan is Tokyo.')
print(doc.ents)
(Japan, Tokyo)
What's next?
More about spaCy
Natural Language Processing in 10 Lines of Code
How spaCy Works
Integrate with deep learning libraries
Deep Learning with custom pipelines and Keras
Sense2vec with spaCy and Gensim
Try spaCy on the website
Dependency parsing
Named entity recognition
Sentence similarity
textacy
higher-level NLP built on spaCy
Documentation / GitHub / API Reference
textacy is a Python library for performing higher-level natural language processing (NLP) tasks, built on the high-performance spaCy library.
textacy focuses on tasks facilitated by the ready availability of tokenized, POS-tagged, and parsed text.
Features
https://github.com/chartbeat-labs/textacy
Features that look useful for our lab
textacy.preprocess.preprocess_text
fix "broken" unicode
replace all URL strings with 'URL'
replace all email strings with 'EMAIL'
replace all phone number strings with 'PHONE'
replace all number-like strings with 'NUMBER'
...
Features that look useful for our lab
textacy.preprocess.preprocess_text
from textacy.preprocess import preprocess_text
text = 'ここの研究室すごいよ!!! http://www.cl.ecei.tohoku.ac.jp'
preprocess_text(text, no_urls=True)
'ここの研究室すごいよ!!! *URL*'
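To give a feel for what this kind of normalization involves, here is a minimal standard-library sketch that replaces URLs and email addresses with placeholder tokens. It is not textacy's implementation: the function name `replace_urls_and_emails`, both regular expressions, and the `*EMAIL*` placeholder are assumptions made for illustration, and the patterns are far cruder than a production library would use.

```python
import re

# Deliberately simple patterns for illustration only (assumed, not taken
# from textacy); real URL and email matching handles many more cases.
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def replace_urls_and_emails(text, url_token="*URL*", email_token="*EMAIL*"):
    """Replace URL-like and email-like substrings with placeholder tokens."""
    # URLs go first so that a URL containing "@" is not half-matched as an email.
    text = URL_RE.sub(url_token, text)
    text = EMAIL_RE.sub(email_token, text)
    return text


# Same sentence as the slide (roughly: "This lab is great!!!").
print(replace_urls_and_emails("ここの研究室すごいよ!!! http://www.cl.ecei.tohoku.ac.jp"))
```

Doing the substitution before tokenization keeps long, unique strings such as URLs from inflating the vocabulary.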
Features that look useful for our lab
textacy.extract.pos_regex_matches
from textacy.extract import pos_regex_matches
from textacy.constants import POS_REGEX_PATTERNS
list(pos_regex_matches(nlp('Can you process this text?'),
                       POS_REGEX_PATTERNS['en']['NP']))
[this text]
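The idea of matching a regular expression over POS tags can be sketched without textacy or spaCy. The example below is an illustration under stated assumptions, not textacy's code: the tagged tokens are copied from the POS tagging output earlier in the deck, and `NP_PATTERN` and `pos_pattern_matches` are hypothetical, with a noun-phrase pattern much simpler than the one textacy ships.

```python
import re

# (token, coarse POS tag) pairs copied from the POS tagging output earlier in the deck.
TAGGED = [
    ("Can", "VERB"), ("you", "PRON"), ("process", "VERB"),
    ("this", "DET"), ("text", "NOUN"), ("?", "PUNCT"),
]

# Hypothetical, simplified noun-phrase pattern: an optional determiner,
# any number of adjectives, then one or more nouns.
NP_PATTERN = r"(?:<DET> )?(?:<ADJ> )*(?:<NOUN> )+"


def pos_pattern_matches(tagged, pattern):
    """Yield lists of tokens whose tag sequence matches `pattern`."""
    # Render the tags as one string, e.g. "<VERB> <PRON> <VERB> ...".
    tags = "".join("<{}> ".format(tag) for _, tag in tagged)
    for match in re.finditer(pattern, tags):
        # Each tag contributes exactly one ">", so counting them before a
        # character offset converts that offset into a token index.
        start = tags[:match.start()].count(">")
        end = tags[:match.end()].count(">")
        yield [token for token, _ in tagged[start:end]]


print(list(pos_pattern_matches(TAGGED, NP_PATTERN)))
```

For the example sentence this yields the single span "this text", the same phrase the slide reports.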
... and more!