spaCy

spaCy is an open-source, industrial-strength Natural Language Processing (NLP) library for Python. It is designed for building real-world products and performing large-scale information extraction tasks efficiently.

Features

Support for 75+ languages
84 trained pipelines for 25 languages
Multi-task learning with pretrained transformers (e.g., BERT)
Pretrained word vectors
Linguistically motivated tokenization
Components for:
- Named Entity Recognition (NER)
- Part-of-Speech (POS) tagging
- Dependency parsing
- Sentence segmentation
- Text classification
- Lemmatization
- Morphological analysis
- Entity linking
- Span categorization
Extensible with custom components and attributes
Support for custom models in PyTorch, TensorFlow, and other frameworks
Built-in visualizers for syntax and NER
Easy model packaging, deployment, and workflow management
Production-ready training system
Robust and rigorously evaluated accuracy
State-of-the-art speed
Large Language Model (LLM) Integration:
- The spacy-llm package for integrating LLMs into NLP pipelines
- Modular system for prototyping and prompting
- Structured outputs from unstructured LLM responses, with no training data required
Reproducible training for custom pipelines:
- Comprehensive configuration system for training runs
- Easily rerun and track experiments
End-to-end workflows:
- Project system for managing data transformation, preprocessing, and training steps
- Source asset download, command execution, checksum verification, and caching
Benchmarks:
- Transformer-based pipelines with state-of-the-art accuracy
- Multiple pretrained pipelines with published accuracy metrics on datasets such as OntoNotes 5.0 and CoNLL-2003
Ecosystem:
- Wide variety of plugins and integrations
- Community resources, online course, and interactive learning tools
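
To make a few of the features above concrete (tokenization, sentence segmentation, rule-based entity recognition, and extension with custom components and attributes), here is a minimal sketch. It assumes spaCy v3 is installed and deliberately uses a blank English pipeline, so no trained pipeline has to be downloaded. The component name `word_counter`, the attribute `word_count`, the entity patterns, and the sample sentence are all illustrative choices, not part of spaCy itself.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical custom attribute: number of non-punctuation tokens in a Doc.
# The guard avoids an error if the extension was already registered.
if not Doc.has_extension("word_count"):
    Doc.set_extension("word_count", default=0)


@Language.component("word_counter")  # illustrative component name
def word_counter(doc: Doc) -> Doc:
    """Store the number of non-punctuation tokens on the Doc."""
    doc._.word_count = sum(1 for token in doc if not token.is_punct)
    return doc


def build_pipeline() -> Language:
    # A blank pipeline contains only the language-specific tokenizer.
    nlp = spacy.blank("en")
    # Rule-based sentence segmentation (splits on sentence-final punctuation).
    nlp.add_pipe("sentencizer")
    # Rule-based entity recognition driven by the example patterns below.
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(
        [
            {"label": "ORG", "pattern": "Explosion"},
            {"label": "GPE", "pattern": "Berlin"},
        ]
    )
    # The custom component runs last.
    nlp.add_pipe("word_counter", last=True)
    return nlp


nlp = build_pipeline()
doc = nlp("Explosion is based in Berlin. It develops spaCy.")

sentences = [sent.text for sent in doc.sents]
entities = [(ent.text, ent.label_) for ent in doc.ents]

print(nlp.pipe_names)
print(sentences)
print(entities)
print(doc._.word_count)
```

A trained pipeline would replace the rule-based pieces with statistical components such as a tagger, parser, and NER model. The blank pipeline is used here only so the sketch runs without extra downloads.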

Pricing

spaCy is open source and free to use. It is released under the MIT license.
