API

This part of the documentation is the full API reference for all public classes and functions.

Word Embedding Representation

Embedeval uses a generic internal representation for Word Embeddings. This representation does not depend on the source format of the Embedding, so a Task can be written independently of the Embedding format. However, a Task may still reference, and therefore depend on, the source format of the Embedding.

class embedeval.embedding.WordEmbedding

Representation of a loaded immutable Word Embedding

This interface should be implemented to represent concrete parsed Word Embeddings of a particular type.

A Word Embedding always consists of a one-dimensional vector of words and, for each word, an n-dimensional vector representing its position in the vector space.

get_word_vector(word: str) → numpy.array

Get the word vector for the given word from the Word Embedding

get_words() → List[str]

Get a list of all words in the Word Embedding

path

Get the path to the Word Embedding file

shape

Get the shape of the Embedding.

The first value in the tuple is the number of words and the second value is the vector size of each word.
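
The interface described above can be illustrated with a small in-memory stand-in. This is only a sketch under stated assumptions, not Embedeval's own code: WordEmbeddingSketch and InMemoryEmbedding are hypothetical names, the path is assumed to be a pathlib.Path, and vectors are plain tuples instead of numpy arrays so that the example runs without third-party packages.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from types import MappingProxyType
from typing import List, Mapping, Tuple


class WordEmbeddingSketch(ABC):
    """Hypothetical stand-in mirroring the documented WordEmbedding interface."""

    @property
    @abstractmethod
    def path(self) -> Path: ...

    @property
    @abstractmethod
    def shape(self) -> Tuple[int, int]: ...

    @abstractmethod
    def get_words(self) -> List[str]: ...

    @abstractmethod
    def get_word_vector(self, word: str) -> Tuple[float, ...]: ...


class InMemoryEmbedding(WordEmbeddingSketch):
    """Toy embedding backed by a read-only mapping of word -> vector."""

    def __init__(self, path: Path, vectors: Mapping[str, Tuple[float, ...]]):
        self._path = path
        # Copy into a read-only view so the embedding stays immutable.
        self._vectors = MappingProxyType({w: tuple(v) for w, v in vectors.items()})

    @property
    def path(self) -> Path:
        return self._path

    @property
    def shape(self) -> Tuple[int, int]:
        # (number of words, vector size of each word)
        vector_size = len(next(iter(self._vectors.values()), ()))
        return (len(self._vectors), vector_size)

    def get_words(self) -> List[str]:
        return list(self._vectors)

    def get_word_vector(self, word: str) -> Tuple[float, ...]:
        return self._vectors[word]


embedding = InMemoryEmbedding(
    Path("toy.vec"),
    {"king": (0.1, 0.2, 0.3), "queen": (0.1, 0.25, 0.35)},
)
print(embedding.shape)        # (2, 3)
print(embedding.get_words())  # ['king', 'queen']
```

A real implementation would subclass embedeval.embedding.WordEmbedding and return numpy arrays from get_word_vector(), as the signatures above state.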

Task API

Tasks are the core of Embedeval. The Task base class must be subclassed by a concrete evaluation Task, which implements the evaluation algorithm.

class embedeval.task.Task

Base Class for the Task API

Subclass this Task to automatically register an evaluation Task with the Task Registry.

The Task Evaluation Algorithm must be implemented in the evaluate() method.

NAME = ''

Holds the name of this Task. This name is used for Task discovery.

evaluate(embedding: embedeval.embedding.WordEmbedding) → embedeval.taskreport.TaskReport

Evaluate this Task on the given Word Embedding

The evaluation algorithm should always produce some kind of comparable statistics or measures which can be provided to the user to verify the quality of the given Word Embedding.

This measure must be returned from this method as a TaskReport.

It should contain everything needed by the user to verify the Embedding.
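
The subclass-and-register pattern described above could be sketched as follows. The sketch makes several assumptions that are not confirmed by this reference: TaskSketch, TaskReportSketch and TASK_REGISTRY are hypothetical stand-ins, the use of __init_subclass__ is one possible way to implement automatic registration and may differ from what Embedeval actually does, and the report fields (title, body) are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Type


@dataclass(frozen=True)
class TaskReportSketch:
    """Hypothetical report type; the real TaskReport fields are not documented here."""
    title: str
    body: str


# Hypothetical registry mapping a Task NAME to its class.
TASK_REGISTRY: Dict[str, Type["TaskSketch"]] = {}


class TaskSketch:
    """Stand-in base class that registers every named subclass."""

    NAME = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only subclasses that set a NAME are made discoverable.
        if cls.NAME:
            TASK_REGISTRY[cls.NAME] = cls

    def evaluate(self, embedding) -> TaskReportSketch:
        raise NotImplementedError("subclasses implement the evaluation algorithm")


class VocabularySizeTask(TaskSketch):
    """Toy task: reports how many words the embedding contains."""

    NAME = "vocabulary-size"

    def evaluate(self, embedding) -> TaskReportSketch:
        words: List[str] = embedding.get_words()
        return TaskReportSketch(title=self.NAME, body=f"{len(words)} words")


class TinyEmbedding:
    """Minimal object offering the get_words() method used above."""

    def get_words(self) -> List[str]:
        return ["king", "queen", "man"]


task_cls = TASK_REGISTRY["vocabulary-size"]  # discovery by NAME
report = task_cls().evaluate(TinyEmbedding())
print(report.body)  # 3 words
```

With the real package, the subclass would derive from embedeval.task.Task and return an embedeval.taskreport.TaskReport instead of the stand-ins used here.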

Implemented Word Embedding Parsers

Embedeval implements parsers for some well-known Word Embedding formats.

class embedeval.parsers.word2vec_gensim.KeyedVectorsWordEmbedding(path, keyed_vectors)

Represents a word2vec KeyedVectors specific Word Embedding

The word2vec file will be parsed by gensim.

The gensim KeyedVectors instance is made available in the self.keyed_vectors attribute.

get_word_vector(word: str) → numpy.array

Get the word vector for the given word from the Word Embedding

get_words() → List[str]

Get a list of all words in the Word Embedding

keyed_vectors = None

Holds the gensim KeyedVectors instance

path

Get the path to the Word Embedding file

shape

Get the shape of the Embedding.

The first value in the tuple is the number of words and the second value is the vector size of each word.

class embedeval.parsers.word2vec_simple.SimpleWordEmbedding(path, word_vectors)

Represents a word2vec specific Word Embedding

This Word Embedding should only be used for small datasets because it is implemented in pure Python and is therefore comparatively slow.

get_word_vector(word: str) → numpy.array

Get the word vector for the given word from the Word Embedding

get_words() → List[str]

Get a list of all words in the Word Embedding

path

Get the path to the Word Embedding file

shape

Get the shape of the Embedding.

The first value in the tuple is the number of words and the second value is the vector size of each word.
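
To give an idea of what a pure-Python word2vec parser involves, here is a minimal sketch for the word2vec text format, in which a header line holds the number of words and the vector size, followed by one line per word. It is not the SimpleWordEmbedding implementation: parse_word2vec_text is a hypothetical helper, and it returns plain lists of floats rather than numpy arrays to stay dependency-free.

```python
from typing import Dict, Iterable, List, Tuple


def parse_word2vec_text(
    lines: Iterable[str],
) -> Tuple[Tuple[int, int], Dict[str, List[float]]]:
    """Parse the word2vec text format into (shape, word -> vector)."""
    iterator = iter(lines)
    # Header line: "<number of words> <vector size>"
    word_count, vector_size = (int(field) for field in next(iterator).split())

    vectors: Dict[str, List[float]] = {}
    for line in iterator:
        if not line.strip():
            continue  # tolerate blank lines
        # Each data line: the word followed by its vector components.
        word, *values = line.split()
        if len(values) != vector_size:
            raise ValueError(
                f"expected {vector_size} values for {word!r}, got {len(values)}"
            )
        vectors[word] = [float(value) for value in values]

    if len(vectors) != word_count:
        raise ValueError(
            f"header announced {word_count} words, found {len(vectors)}"
        )
    return (word_count, vector_size), vectors


sample = ["2 3", "king 0.1 0.2 0.3", "queen 0.1 0.25 0.35"]
shape, vectors = parse_word2vec_text(sample)
print(shape)             # (2, 3)
print(vectors["queen"])  # [0.1, 0.25, 0.35]
```

The returned shape follows the same convention as the shape property: number of words first, vector size second. Parsing line by line in Python like this is simple but slow for large vocabularies, which is consistent with the advice to prefer the gensim-backed parser for bigger files.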

Top-Level Package Exports

The embedeval Python package exports the Task API as top-level names:

from embedeval import (
    Task,
    TaskReport,
    EmbedevalError
)