

This model solves the Slot Filling task using Levenshtein search and different neural network architectures for NER. To read about NER without slot filling, please address the NER documentation. This model serves for solving the DSTC 2 Slot Filling task. In most cases, the NER task can be formulated as:

Given a sequence of tokens (words, and maybe punctuation symbols) provide a tag from a predefined set of tags for each token in the sequence.

For NER task there are some common types of entities used as tags:

  • persons

  • locations

  • organizations

  • expressions of time

  • quantities

  • monetary values

Furthermore, to distinguish adjacent entities with the same tag, many applications use the BIO tagging scheme. Here “B” denotes the beginning of an entity, “I” stands for “inside” and is used for all words comprising the entity except the first one, and “O” means the absence of an entity. Example with dropped punctuation:
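The original example did not survive extraction; the following is an invented illustration of such a BIO tagging (the utterance and tags are assumptions, not taken from the DSTC 2 data):

```python
# Illustrative only: a plausible BIO tagging of a restaurant-search utterance.
tokens = ["i", "want", "chinese", "food", "in", "the", "city", "centre"]
tags   = ["O", "O", "B-FOOD", "O", "O", "O", "B-LOC", "I-LOC"]

# Print each token with its tag, one per line.
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```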

In the example above, FOOD means the food tag, LOC means the location tag, and “B-” and “I-” are prefixes identifying beginnings and continuations of the entities.

Slot Filling is a typical step after NER. It can be formulated as:

Given an entity of a certain type and a set of all possible values of this entity type, provide a normalized form of the entity.

In this model, the Slot Filling task is solved by a Levenshtein distance search across all known entities of a given type.

For example, there is an entity of “food” type:

chainese

It is definitely misspelled. The set of all known food entities is {‘chinese’, ‘russian’, ‘european’}. The nearest known entity from the given set is chinese, so the output of the Slot Filling system will be chinese.
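The normalization step can be sketched as follows (a minimal illustration of the idea, not DeepPavlov's actual implementation):

```python
# Normalize an extracted entity by picking the known value with the
# smallest Levenshtein (edit) distance.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]

def normalize(entity: str, known_values: set) -> str:
    """Return the known value nearest to `entity` by edit distance."""
    return min(known_values, key=lambda v: levenshtein(entity, v))

print(normalize("chainese", {"chinese", "russian", "european"}))  # -> chinese
```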


Configuration of the model¶

Configuration of the model can be performed in code or in a JSON configuration file. To train the model you need to specify four groups of parameters:

  • dataset_reader

  • dataset_iterator

  • chainer

  • train

In the subsequent text we show the parameter specification in the config file. However, the same notation can be used to specify parameters in code by replacing the JSON with a Python dictionary.
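Written as a Python dictionary, the top level of the config might look as follows (a sketch of the layout only; the contents of each group are described in the sections that follow):

```python
# Sketch: the four parameter groups of a DeepPavlov-style config.
config = {
    "dataset_reader": {},    # how to read and parse the data
    "dataset_iterator": {},  # how to batch and shuffle the data
    "chainer": {},           # the model pipeline and supplementary parts
    "train": {},             # training parameters
}
```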

Dataset Reader¶

The dataset reader is a class which reads and parses the data. It returns a dictionary with three fields: “train”, “test”, and “valid”. The basic dataset reader is “ner_dataset_reader”. The dataset reader config part with “ner_dataset_reader” should look like:
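A sketch of this part, assuming the class name given in the text and a placeholder data path:

```python
# Sketch only: data_path is a placeholder for your local DSTC 2 folder.
dataset_reader = {
    "class_name": "ner_dataset_reader",
    "data_path": "dstc2",
}
```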

where class_name refers to the basic NER dataset reader class and data_path is the path to the folder with the DSTC 2 dataset.


Dataset Iterator¶

For simple batching and shuffling you can use “dstc2_ner_iterator”. The part of the configuration file for the dataset iterator looks like:

'dataset_iterator': {'class_name': 'dstc2_ner_iterator'}

There are no additional parameters in this part.


Chainer¶


The chainer part of the configuration file contains the specification of the neural network model and supplementary things such as vocabularies. The chainer part must have the following form:
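A sketch of that form (the variable names "x" and "y" are assumptions; the component specifications inside "pipe" are elided here):

```python
# Sketch of the general chainer layout.
chainer = {
    "in": ["x"],             # regular input, used in inference and train mode
    "in_y": ["y"],           # ground-truth answers, used only for training
    "pipe": [],              # pre-processing modules, vocabularies, the model
    "out": ["y_predicted"],  # where predictions are taken from
}
```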

The inputs and outputs must be specified in the pipe. “in” means regular input that is used for inference and train mode. “in_y” is used for training and usually contains ground truth answers. The “out” field stands for model prediction. The model inside the pipe must have an output variable with the name “y_predicted” so that “out” knows where to get predictions.

The major part of “chainer” is “pipe”. The “pipe” contains the pre-processing modules, vocabularies and the model. However, we can use existing pipelines:
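A sketch of such pipe entries (component names and the config path are assumptions by analogy with DeepPavlov configs):

```python
# Sketch: a lazy tokenizer that splits the raw string into tokens,
# followed by a pre-trained NER model loaded from an existing config.
pipe = [
    {"class_name": "lazy_tokenizer", "in": ["x"], "out": ["x_tokens"]},
    {"config_path": "configs/ner/ner_dstc2.json",  # placeholder path
     "in": ["x_tokens"], "out": ["x_tokens", "tags"]},
]
```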

This part will initialize an already existing pre-trained NER module. The only thing that needs to be specified is the path to the existing config. The preceding lazy tokenizer serves to extract tokens from a raw string of text.

The following component in the pipeline is the slotfiller:
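A sketch of what this entry might look like (the load path is a placeholder assumption):

```python
# Sketch: the slotfiller consumes tokens and tags and emits slot values.
slotfiller = {
    "class_name": "dstc_slotfilling",
    "in": ["x_tokens", "tags"],
    "out": ["slots"],
    "load_path": "slotfill_dstc2/dstc_slot_vals.json",  # placeholder path
}
```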

The slotfiller takes the tags and tokens to perform normalization of extracted entities. The normalization is performed via fuzzy Levenshtein search in the dstc_slot_vals dictionary. The output of this component is a dictionary of slot values found in the input utterances.

The main part of the dstc_slotfilling component is the slot values dictionary. The dictionary has the following structure:
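A sketch of that structure with invented example values (entity type, then normalized value, then its spelling variations):

```python
# Sketch: entity type -> normalized value -> list of known variations.
dstc_slot_vals = {
    "food": {
        "chinese": ["chinese", "chainese", "chinees"],
        "russian": ["russian", "rusian"],
    },
    "area": {
        "north": ["north", "nort"],
    },
}
```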

The slotfiller will perform fuzzy search through all variations of all entity values of a given entity type. The entity type is determined by the NER component.

The last part of the config is metadata:
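A sketch of a metadata section (the URL and paths are placeholders, not real download links):

```python
# Sketch: deployment variables and download sources for pre-trained files.
metadata = {
    "variables": {"ROOT_PATH": "~/.deeppavlov"},  # placeholder root
    "download": [
        {"url": "http://example.com/slotfill_dstc2.tar.gz",  # placeholder URL
         "subdir": "models"},
    ],
}
```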

It contains information for deployment of the model and URLs for downloading pre-trained models.


You can see all parts together in deeppavlov/configs/ner/slotfill_dstc2.json.

Usage of the model¶

Please see an example of training a Slot Filling model and using it for prediction:
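A sketch using DeepPavlov's Python API (the config attribute name follows the file mentioned above; download=True fetches pre-trained files, so this requires network access and the deeppavlov package):

```python
from deeppavlov import configs, build_model, train_model

# Train the model described by the config (optional if you only need
# the pre-trained one):
# train_model(configs.ner.slotfill_dstc2)

# Build the (pre-trained) model and run it on a raw utterance.
slotfill = build_model(configs.ner.slotfill_dstc2, download=True)
print(slotfill(["i want some chainese food"]))
```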

This example assumes that the working directory is the root of the project.

Slotfilling without NER¶

An alternative approach to the Slot Filling problem could be fuzzy search for each instance of each slot value inside the text. This approach is realized in the slotfill_raw component. The component uses a needle-in-a-haystack search to find slot values directly in the input text.

The main advantage of this approach is the elimination of a separate Named Entity Recognition module. However, the absence of the NER module makes this model less robust to noise (words with similar spelling), especially for long utterances.

Usage example:
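A sketch by analogy with the previous example; the config name slotfill_dstc2_raw is an assumption, and download=True requires network access:

```python
from deeppavlov import configs, build_model

# Build the raw (NER-free) slotfiller and run it on an utterance.
slotfill_raw = build_model(configs.ner.slotfill_dstc2_raw, download=True)
print(slotfill_raw(["i want some chainese food"]))
```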

