API
This part of the documentation lists the full API reference of all public classes and functions.
Word Embedding Representation
Embedeval uses a generic internal representation for Word Embeddings.
This representation does not depend on the source format of the Embedding, so a Task can be independent of the Embedding format.
However, a Task may still reference, and therefore depend on, the source format of the Embedding.
class embedeval.embedding.WordEmbedding
Representation of a loaded, immutable Word Embedding.
This interface should be implemented to represent concrete parsed Word Embeddings of a particular type.
A Word Embedding always consists of a one-dimensional vector of words and, for each word, an n-dimensional vector representing its position in the vector space.
get_word_vector(word: str) → numpy.array
Get the word vector for the given word from the Word Embedding.
path
Get the path to the Word Embedding file.
shape
Get the shape of the Embedding.
The first value in the tuple is the number of words and the second value is the vector size of each word.
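To make the interface concrete, the following is a minimal, dependency-free stand-in that mirrors the three documented members. It does not import Embedeval: the class name `InMemoryEmbedding` and its constructor are hypothetical, and plain tuples are used where the documented interface returns `numpy.array`.

```python
from typing import Dict, Tuple


class InMemoryEmbedding:
    """Hypothetical stand-in mirroring the documented WordEmbedding members."""

    def __init__(self, path: str, word_vectors: Dict[str, Tuple[float, ...]]):
        self._path = path
        # Copy the mapping so later changes to the caller's dict do not leak in.
        self._word_vectors = dict(word_vectors)

    @property
    def path(self) -> str:
        """Path of the file the Embedding was loaded from."""
        return self._path

    @property
    def shape(self) -> Tuple[int, int]:
        """(number of words, vector size of each word)."""
        vector_size = len(next(iter(self._word_vectors.values()), ()))
        return (len(self._word_vectors), vector_size)

    def get_word_vector(self, word: str) -> Tuple[float, ...]:
        """Return the vector stored for the given word."""
        return self._word_vectors[word]


embedding = InMemoryEmbedding(
    "example.vec",  # illustrative path, no file is read
    {"king": (0.1, 0.2, 0.3), "queen": (0.2, 0.1, 0.4)},
)
print(embedding.shape)  # (2, 3)
```

A Task that only relies on these three members works with any Embedding format, which is the format independence described above.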
Task API
Tasks are the core of Embedeval.
A concrete evaluation Task must subclass Task and implement the evaluation algorithm.
class embedeval.task.Task
Base class for the Task API.
Subclass this Task to automatically register an evaluation Task in the Task Registry.
The Task evaluation algorithm must be implemented in the evaluate() method.
NAME = ''
Holds the name of this Task. This name is used for discovery.
evaluate(embedding: embedeval.embedding.WordEmbedding) → embedeval.taskreport.TaskReport
Evaluate this Task on the given Word Embedding.
The evaluation algorithm should always produce comparable statistics or measures that can be provided to the user to verify the quality of the given Word Embedding.
This measure must be returned as a string from this method.
It should contain everything the user needs to verify the Embedding.
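The subclass-and-register pattern described above can be sketched as follows. This is not Embedeval's implementation: the stand-in `Task` class, its `registry` attribute, and the use of `__init_subclass__` are assumptions chosen to show one common way to register subclasses automatically in Python. The `VocabularySizeTask` and `FakeEmbedding` names are hypothetical. Because the `TaskReport` constructor is not documented in this section, the sketch returns a plain string, as the description of `evaluate()` states.

```python
from typing import Dict, Type


class Task:
    """Hypothetical stand-in for the documented Task base class."""

    NAME = ""
    # Assumed registry: maps each Task NAME to its class.
    registry: Dict[str, Type["Task"]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register every subclass as soon as its class body has been executed.
        Task.registry[cls.NAME] = cls

    def evaluate(self, embedding) -> str:
        raise NotImplementedError("Subclasses implement the evaluation here")


class VocabularySizeTask(Task):
    """Illustrative Task that reports the shape of the Embedding."""

    NAME = "vocabulary-size"

    def evaluate(self, embedding) -> str:
        words, vector_size = embedding.shape
        return f"{words} words with {vector_size} dimensions each"


class FakeEmbedding:
    """Smallest object that offers the documented shape attribute."""

    shape = (2, 3)


# Discovery by name: look the class up in the registry, then evaluate.
task_class = Task.registry["vocabulary-size"]
report = task_class().evaluate(FakeEmbedding())
print(report)  # 2 words with 3 dimensions each
```

Registering in `__init_subclass__` means that defining the subclass is enough to make it discoverable by its `NAME`; no separate registration call is needed.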
Implemented Word Embedding Parsers
Embedeval implements parsers for some well-known Word Embedding formats.
class embedeval.parsers.word2vec_gensim.KeyedVectorsWordEmbedding(path, keyed_vectors)
Represents a word2vec KeyedVectors specific Word Embedding.
The word2vec file is parsed by gensim.
The gensim KeyedVectors instance is made available in the self.keyed_vectors attribute.
get_word_vector(word: str) → numpy.array
Get the word vector for the given word from the Word Embedding.
keyed_vectors = None
Holds the gensim KeyedVectors instance.
path
Get the path to the Word Embedding file.
shape
Get the shape of the Embedding.
The first value in the tuple is the number of words and the second value is the vector size of each word.
class embedeval.parsers.word2vec_simple.SimpleWordEmbedding(path, word_vectors)
Represents a word2vec specific Word Embedding.
This Word Embedding should only be used for small datasets because it is implemented purely in Python and is therefore somewhat slow.
get_word_vector(word: str) → numpy.array
Get the word vector for the given word from the Word Embedding.
path
Get the path to the Word Embedding file.
shape
Get the shape of the Embedding.
The first value in the tuple is the number of words and the second value is the vector size of each word.
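As an illustration of what a pure-Python word2vec parser involves, the sketch below reads the word2vec text format, in which a header line gives the word count and vector size and each following line holds a word and its values. It is not the parser shipped with Embedeval: `parse_word2vec_text` is a hypothetical helper, and it is an assumption that the text format (rather than the binary one) is the relevant input.

```python
from typing import Dict, List, Tuple


def parse_word2vec_text(text: str) -> Tuple[Tuple[int, int], Dict[str, List[float]]]:
    """Parse word2vec text data: a 'count size' header, then one word per line."""
    lines = text.strip().splitlines()
    word_count, vector_size = (int(value) for value in lines[0].split())

    word_vectors: Dict[str, List[float]] = {}
    for line in lines[1:]:
        # The first token is the word, the remaining tokens are its vector.
        word, *values = line.split()
        vector = [float(value) for value in values]
        if len(vector) != vector_size:
            raise ValueError(f"Expected {vector_size} values for {word!r}")
        word_vectors[word] = vector

    if len(word_vectors) != word_count:
        raise ValueError(f"Header announced {word_count} words")
    return (word_count, vector_size), word_vectors


sample = """\
2 3
king 0.1 0.2 0.3
queen 0.2 0.1 0.4
"""
shape, vectors = parse_word2vec_text(sample)
print(shape)             # (2, 3)
print(vectors["queen"])  # [0.2, 0.1, 0.4]
```

Every value is converted in a Python-level loop, which is why an approach like this suits small datasets only.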
Top-Level Package Exports
The embedeval Python package exports the Task API as top-level names:
from embedeval import (
    Task,
    TaskReport,
    EmbedevalError,
)