

Author: Het Patel

Conditional Random Fields (CRFs) are probabilistic discriminative graphical models with many applications in gene prediction, image recognition, natural language processing, neural sequence labelling, etc. As you may have guessed, they are usually used to learn from sequential information.

What is a graphical model?

A graphical model is a probabilistic model that uses a graph to express the conditional dependence between random variables. There are two main types of graphical models: Bayesian networks and Markov Random Fields. CRFs fall under the category of Markov Random Fields. CRFs are used when information about neighbouring labels is important for predicting the label of a particular item.

What is a discriminative model?

There are two kinds of models, generative and discriminative:

  • Generative models describe how a label vector y can probabilistically generate a feature vector X, i.e. they model the joint distribution p(X, y). Example: Naive Bayes.

  • Discriminative models describe how to take a feature vector X and map it to an output vector y, i.e. they model p(y | X) directly. Example: Logistic Regression.

Mathematics behind CRFs

CRFs are used when the input data points are correlated with each other, i.e. the model must consider neighbouring (for example, previous) information when predicting on new data. Note also that both the input and the output are sequences.

For this we use feature functions, each of which takes several inputs and returns a real number (usually 0 or 1):

  • $f(X, i, y_{i-1}, y_i)$

  • X = input vectors

  • i = position of the point we want to predict

  • $y_{i-1}$ = label of point i-1

  • $y_i$ = label of point i

  • The conditional probability of the labels given the input is written $p(y \mid X)$

  • X = input sequence

  • y = output sequence of labels
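
As a concrete illustration, here is a minimal Python sketch of feature functions for a hypothetical part-of-speech example. Each function takes $(X, i, y_{i-1}, y_i)$ and returns 0 or 1. The three functions, the labels DET/NOUN/VERB and the weights are assumptions chosen for illustration (in a trained CRF the weights $\lambda_j$ are learned). The score of a label sequence is the weighted sum of all feature values over all positions, which is the quantity that a CRF exponentiates and normalizes to get $p(y \mid X)$.

```python
START = "<START>"  # placeholder label used for y_{i-1} at the first position (assumption)

# Hypothetical feature functions f_j(X, i, y_prev, y_i), each returning 0 or 1
def f_the_is_det(X, i, y_prev, y_i):
    return 1 if X[i].lower() == "the" and y_i == "DET" else 0

def f_det_then_noun(X, i, y_prev, y_i):
    return 1 if y_prev == "DET" and y_i == "NOUN" else 0

def f_s_suffix_verb(X, i, y_prev, y_i):
    return 1 if X[i].endswith("s") and y_i == "VERB" else 0

FEATURES = [f_the_is_det, f_det_then_noun, f_s_suffix_verb]
WEIGHTS = [2.0, 1.5, 1.0]  # hand-picked lambda_j values, not learned

def sequence_score(X, y, features, weights):
    """Sum over positions i and features j of lambda_j * f_j(X, i, y_{i-1}, y_i)."""
    total = 0.0
    for i in range(len(X)):
        y_prev = y[i - 1] if i > 0 else START
        for f, w in zip(features, weights):
            total += w * f(X, i, y_prev, y[i])
    return total

X = ["The", "dog", "barks"]
print(sequence_score(X, ["DET", "NOUN", "VERB"], FEATURES, WEIGHTS))  # 4.5
print(sequence_score(X, ["NOUN", "NOUN", "NOUN"], FEATURES, WEIGHTS))  # 0.0
```

A labelling that agrees with the features scores higher and therefore receives a higher conditional probability.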

To get the predicted label sequence, we choose the y that maximizes this probability:

$$y' = \arg\max_{y} p(y \mid X)$$
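
Finding the argmax by enumerating every label sequence is exponential in the sequence length. For a linear-chain CRF, where each feature only looks at $y_{i-1}$ and $y_i$, the standard way to compute it is the Viterbi dynamic-programming algorithm. Below is a minimal Python sketch under stated assumptions: the three feature functions, the hand-picked weights and the DET/NOUN/VERB labels are hypothetical, and a brute-force search is included only to check the result on a tiny input. Since Z(X) is the same for every y, maximizing the unnormalized score is enough.

```python
from itertools import product

START = "<START>"  # placeholder for y_{i-1} at position 0 (assumption)

# Hypothetical feature functions f_j(X, i, y_prev, y_i) and hand-picked weights
FEATURES = [
    lambda X, i, yp, y: 1 if X[i].lower() == "the" and y == "DET" else 0,
    lambda X, i, yp, y: 1 if yp == "DET" and y == "NOUN" else 0,
    lambda X, i, yp, y: 1 if X[i].endswith("s") and y == "VERB" else 0,
]
WEIGHTS = [2.0, 1.5, 1.0]
LABELS = ["DET", "NOUN", "VERB"]

def local_score(X, i, y_prev, y_i):
    return sum(w * f(X, i, y_prev, y_i) for f, w in zip(FEATURES, WEIGHTS))

def viterbi(X, labels):
    """argmax over y of the summed local scores, in O(n * |labels|^2) time."""
    if not X:
        return []
    best = [{y: local_score(X, 0, START, y) for y in labels}]
    back = [{}]
    for i in range(1, len(X)):
        best.append({})
        back.append({})
        for y in labels:
            # best previous label for a prefix ending in label y at position i
            prev = max(labels, key=lambda yp: best[i - 1][yp] + local_score(X, i, yp, y))
            best[i][y] = best[i - 1][prev] + local_score(X, i, prev, y)
            back[i][y] = prev
    path = [max(labels, key=lambda y: best[-1][y])]
    for i in range(len(X) - 1, 0, -1):  # follow back-pointers to position 0
        path.append(back[i][path[-1]])
    return path[::-1]

def brute_force(X, labels):
    """Check by enumerating all |labels|^n sequences (tiny inputs only)."""
    def total(y):
        return sum(local_score(X, i, y[i - 1] if i else START, y[i]) for i in range(len(X)))
    return list(max(product(labels, repeat=len(X)), key=total))

X = ["The", "dog", "barks"]
print(viterbi(X, LABELS))      # ['DET', 'NOUN', 'VERB']
print(brute_force(X, LABELS))  # ['DET', 'NOUN', 'VERB']
```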

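Here the conditional probability takes the form

$$p(y \mid X, \lambda) = \frac{1}{Z(X)} \exp\left( \sum_{i=1}^{n} \sum_{j} \lambda_j f_j(X, i, y_{i-1}, y_i) \right)$$

where each $f_j$ is a feature function, $\lambda_j$ is its weight, n is the length of the sequence, and Z(X) is the normalization function, which sums over all possible label sequences $\tilde{y}$:

$$Z(X) = \sum_{\tilde{y}} \exp\left( \sum_{i=1}^{n} \sum_{j} \lambda_j f_j(X, i, \tilde{y}_{i-1}, \tilde{y}_i) \right)$$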
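
Because Z(X) sums over every possible label sequence, computing it naively is exponential in the sequence length. The sketch below (Python, standard library only) computes log Z(X) both by brute-force enumeration and with the forward algorithm, a standard dynamic-programming method for linear-chain CRFs, and then uses it to evaluate p(y | X). The two features, the weights and the 'A'/'B' labels are hypothetical; working in log space with log-sum-exp avoids overflow.

```python
import math
from itertools import product

START = "<START>"  # placeholder for y_{i-1} at position 0 (assumption)

# Hypothetical features f_j(X, i, y_prev, y_i) with hand-picked weights lambda_j
FEATURES = [
    lambda X, i, yp, y: 1 if X[i] > 0 and y == "A" else 0,  # positive input -> label A
    lambda X, i, yp, y: 1 if yp == y else 0,                # label repeats
]
WEIGHTS = [1.2, 0.8]
LABELS = ["A", "B"]

def local_score(X, i, y_prev, y_i):
    return sum(w * f(X, i, y_prev, y_i) for f, w in zip(FEATURES, WEIGHTS))

def sequence_score(X, y):
    return sum(local_score(X, i, y[i - 1] if i else START, y[i]) for i in range(len(X)))

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_z_brute_force(X):
    """log Z(X) by enumerating all |LABELS|^n label sequences."""
    return logsumexp([sequence_score(X, y) for y in product(LABELS, repeat=len(X))])

def log_z_forward(X):
    """log Z(X) with the forward algorithm in O(n * |LABELS|^2) time."""
    # alpha[y] = log of the summed exp-scores of all prefixes ending in label y
    alpha = {y: local_score(X, 0, START, y) for y in LABELS}
    for i in range(1, len(X)):
        alpha = {y: logsumexp([alpha[yp] + local_score(X, i, yp, y) for yp in LABELS])
                 for y in LABELS}
    return logsumexp(list(alpha.values()))

def probability(X, y):
    """p(y | X) = exp(score(X, y)) / Z(X), computed in log space."""
    return math.exp(sequence_score(X, y) - log_z_forward(X))

X = [3, -1, 2, 5]
print(abs(log_z_brute_force(X) - log_z_forward(X)) < 1e-9)  # True
print(round(sum(probability(X, y) for y in product(LABELS, repeat=len(X))), 6))  # 1.0
```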

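Maximum likelihood estimation is used to find the parameters λ. Given m training examples $(X^k, y^k)$, we minimize the negative log-likelihood loss:

$$L(X, \lambda, y) = -\log \prod_{k=1}^{m} p(y^k \mid X^k, \lambda) = -\sum_{k=1}^{m} \log\left[ \frac{1}{Z(X^k)} \exp\left( \sum_{i=1}^{n} \sum_{j} \lambda_j f_j(X^k, i, y^k_{i-1}, y^k_i) \right) \right]$$

Its partial derivative with respect to each weight $\lambda_j$ is

$$\frac{\partial L(X, \lambda, y)}{\partial \lambda_j} = -\sum_{k=1}^{m} F_j(y^k, X^k) + \sum_{k=1}^{m} \sum_{y} p(y \mid X^k, \lambda) F_j(y, X^k)$$

where $F_j(y, X) = \sum_{i=1}^{n} f_j(X, i, y_{i-1}, y_i)$. The first term is the number of times feature j fires on the training labels; the second is how often the current model expects it to fire.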

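To move λ towards its optimal value we use gradient descent with learning rate α, stepping against the gradient:

$$\lambda_j \leftarrow \lambda_j + \alpha \left( \sum_{k=1}^{m} F_j(y^k, X^k) - \sum_{k=1}^{m} \sum_{y} p(y \mid X^k, \lambda) F_j(y, X^k) \right)$$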
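
The loss, gradient and update rule can be checked numerically on a toy problem. The sketch below (pure Python) computes the negative log-likelihood and its gradient (expected feature counts minus observed ones) by enumerating all label sequences, which only works for very short sequences; practical CRF trainers use the forward-backward algorithm and usually a quasi-Newton optimizer such as L-BFGS instead. It then runs plain gradient descent. The features, the 'small'/'big' labels, the training data, α = 0.1 and the number of steps are all hypothetical.

```python
import math
from itertools import product

START = "<START>"  # placeholder for y_{i-1} at position 0 (assumption)
LABELS = ["small", "big"]

# Hypothetical feature functions f_j(X, i, y_prev, y_i)
FEATURES = [
    lambda X, i, yp, y: 1 if X[i] > 3 and y == "big" else 0,
    lambda X, i, yp, y: 1 if yp == y else 0,
]

# Hypothetical training data: (input sequence X^k, label sequence y^k)
DATA = [([1, 5, 6], ["small", "big", "big"]), ([6, 2], ["big", "small"])]

def global_features(X, y):
    """F_j(y, X) = sum_i f_j(X, i, y_{i-1}, y_i) for every feature j."""
    counts = [0.0] * len(FEATURES)
    for i in range(len(X)):
        y_prev = y[i - 1] if i > 0 else START
        for j, f in enumerate(FEATURES):
            counts[j] += f(X, i, y_prev, y[i])
    return counts

def loss_and_gradient(weights):
    """Negative log-likelihood L and dL/dlambda_j over DATA (brute-force Z(X))."""
    loss, grad = 0.0, [0.0] * len(weights)
    for X, y_true in DATA:
        candidates = [global_features(X, y) for y in product(LABELS, repeat=len(X))]
        scores = [sum(w * F for w, F in zip(weights, Fs)) for Fs in candidates]
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        F_true = global_features(X, y_true)
        loss -= sum(w * F for w, F in zip(weights, F_true)) - log_z
        for Fs, s in zip(candidates, scores):
            p = math.exp(s - log_z)  # p(y | X, lambda)
            for j in range(len(weights)):
                grad[j] += p * Fs[j]  # expected feature count
        for j in range(len(weights)):
            grad[j] -= F_true[j]  # observed feature count
    return loss, grad

def train(alpha=0.1, steps=200):
    weights = [0.0] * len(FEATURES)
    for _ in range(steps):
        _, grad = loss_and_gradient(weights)
        # lambda_j <- lambda_j - alpha * dL/dlambda_j
        weights = [w - alpha * g for w, g in zip(weights, grad)]
    return weights

initial_loss, _ = loss_and_gradient([0.0, 0.0])
weights = train()
final_loss, _ = loss_and_gradient(weights)
print(final_loss < initial_loss, weights[0] > 0)  # True True
```

A finite-difference comparison of `loss_and_gradient` is a cheap way to confirm that the analytic gradient has the right signs.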

Some Applications of CRFs

Let us understand the applications through a simple problem:

Suppose we have two dice that look identical: one is fair and the other is biased. Given a sequence X of rolls, predict which die was used for each roll.

We can use CRFs to solve this problem.
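
A minimal sketch of how a CRF could label the rolls, assuming (as in the classic "occasionally dishonest casino" example) that the biased die favours 6 and that the same die tends to be used for several rolls in a row. The labels 'fair'/'loaded', the three feature functions and their weights are hypothetical and hand-picked rather than learned; decoding uses the Viterbi algorithm.

```python
START = "<START>"  # placeholder for y_{i-1} at position 0 (assumption)
LABELS = ["fair", "loaded"]

# Hypothetical feature functions f_j(X, i, y_prev, y_i) and hand-picked weights
FEATURES = [
    lambda X, i, yp, y: 1 if X[i] == 6 and y == "loaded" else 0,  # 6s suggest the biased die
    lambda X, i, yp, y: 1 if X[i] != 6 and y == "fair" else 0,    # other faces suggest the fair die
    lambda X, i, yp, y: 1 if yp == y else 0,                      # same die as the previous roll
]
WEIGHTS = [1.0, 1.0, 1.5]

def local_score(X, i, y_prev, y_i):
    return sum(w * f(X, i, y_prev, y_i) for f, w in zip(FEATURES, WEIGHTS))

def viterbi(X):
    """Highest-scoring die label for each roll under the weighted features."""
    best = [{y: local_score(X, 0, START, y) for y in LABELS}]
    back = [{}]
    for i in range(1, len(X)):
        best.append({})
        back.append({})
        for y in LABELS:
            prev = max(LABELS, key=lambda yp: best[i - 1][yp] + local_score(X, i, yp, y))
            best[i][y] = best[i - 1][prev] + local_score(X, i, prev, y)
            back[i][y] = prev
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for i in range(len(X) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

rolls = [3, 1, 6, 2, 5, 6, 6, 6, 6, 6, 4, 2]
print(viterbi(rolls))
# The lone 6 at index 2 stays 'fair'; the run of five 6s (indices 5-9) becomes 'loaded'.
```

With these weights each switch between dice forfeits the 1.5 "same die" bonus, so a lone 6 (worth +1 as 'loaded') does not justify two switches, while a run of five 6s (+5) does.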

CRFs are often used in NLP. One example is part-of-speech (POS) tagging: the part of speech of a word depends on the previous words, and by using feature functions that exploit this, a CRF can determine which part of speech each word of a sentence corresponds to. Another example is named entity recognition (NER), such as finding proper nouns.
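
POS and NER taggers usually generate their observation features from templates over the current word and its neighbours, which is how "depends on the previous words" gets encoded. A minimal Python sketch of such a template, producing one feature dictionary per token in the dictionary style commonly used with python-crfsuite. The feature names and the choice of templates are assumptions; a linear-chain CRF combines these observation features with weights on label-to-label transitions.

```python
def word2features(sentence, i):
    """Feature dictionary for the token at position i (hypothetical template)."""
    word = sentence[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.suffix3": word[-3:],
        "word.istitle": word.istitle(),
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
    }
    if i > 0:  # features of the previous word
        prev = sentence[i - 1]
        features["-1:word.lower"] = prev.lower()
        features["-1:word.istitle"] = prev.istitle()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(sentence) - 1:  # features of the next word
        nxt = sentence[i + 1]
        features["+1:word.lower"] = nxt.lower()
        features["+1:word.istitle"] = nxt.istitle()
    else:
        features["EOS"] = True  # end of sentence
    return features

def sentence2features(sentence):
    return [word2features(sentence, i) for i in range(len(sentence))]

feats = sentence2features(["John", "lives", "in", "London"])
print(feats[0]["word.istitle"], feats[0]["BOS"], feats[1]["-1:word.lower"])  # True True john
```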

More generally, CRFs can be used to label any sequence in which the variables are interdependent.

Classification using CRFs

Finally, we can apply CRFs to a text classification problem. The process has three steps:


  1. Annotating training data

  2. Annotations using General Architecture for Text Engineering (GATE)

  3. Building and Training

  • Annotating training data

Annotation is the process of tagging each word with its label. The annotated text needs to be in XML format so that it can be used to train the model.
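
A minimal sketch of turning annotated XML into (token, label) training pairs with Python's standard `xml.etree.ElementTree`. It assumes a simple hypothetical inline format in which entity tags such as `<PERSON>` wrap the annotated words directly under one root element, and untagged words get the label 'O'. GATE's own export formats may differ, so the parsing would need adapting to the actual files.

```python
import xml.etree.ElementTree as ET

def inline_xml_to_pairs(xml_string):
    """Return (token, label) pairs from one annotated text.

    Assumes a hypothetical inline format: entity tags such as <PERSON>
    sit directly under a single root element; text outside them is 'O'.
    """
    root = ET.fromstring(xml_string)
    pairs = []

    def add(text, label):
        for token in (text or "").split():  # naive whitespace tokenization
            pairs.append((token, label))

    add(root.text, "O")  # text before the first tag
    for child in root:
        add("".join(child.itertext()), child.tag)  # words inside the tag
        add(child.tail, "O")  # text after the tag
    return pairs

sample = "<doc>I met <PERSON>John Smith</PERSON> in <LOCATION>Paris</LOCATION> .</doc>"
print(inline_xml_to_pairs(sample))
# [('I', 'O'), ('met', 'O'), ('John', 'PERSON'), ('Smith', 'PERSON'),
#  ('in', 'O'), ('Paris', 'LOCATION'), ('.', 'O')]
```

To process a whole folder of exported files, the function could be applied to the contents of each file returned by `pathlib.Path(folder).glob("*.xml")`.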

  • Annotations using General Architecture for Text Engineering (GATE)

Using the GATE framework, export the annotated emails/texts in XML format and save them in one folder.

Link to GATE:

  • Building and Training

Install python-crfsuite, the Python binding for CRFsuite, which is imported as pycrfsuite (pip install python-crfsuite).

Here’s the structure of the code for reference:

  1. Importing all the modules.

  2. Define the helper functions.

  3. Import the annotated training set.

  4. Feature generation.

  5. Create the train and test data sets.

  6. Test the model.

  7. Measure the performance of the model.

  8. Predict on new data.
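
The structure above could be sketched as follows with python-crfsuite (imported as pycrfsuite). This is a hedged, minimal outline rather than the author's code: the feature template, the c1/c2/max_iterations values, the 80/20 split and all helper names are assumptions. The numbered comments map to the listed steps, and pycrfsuite is imported inside the functions that need it so the helpers run on their own.

```python
import random

# 1. Modules: pycrfsuite is imported lazily in train_model() and tag().

# 2. Helper functions (hypothetical feature template)
def word2features(tokens, i):
    word = tokens[i]
    return {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "-1:word.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "+1:word.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

def sent2features(pairs):
    """4. Features for one annotated sentence given as (token, label) pairs."""
    tokens = [token for token, _ in pairs]
    return [word2features(tokens, i) for i in range(len(tokens))]

def sent2labels(pairs):
    return [label for _, label in pairs]

def train_test_split(sentences, test_fraction=0.2, seed=0):
    """5. Shuffle and split the annotated sentences into (train, test)."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train_model(train_sents, model_path="model.crfsuite"):
    import pycrfsuite  # pip install python-crfsuite
    trainer = pycrfsuite.Trainer(verbose=False)
    for pairs in train_sents:
        trainer.append(sent2features(pairs), sent2labels(pairs))
    trainer.set_params({"c1": 0.1, "c2": 0.01, "max_iterations": 100})  # assumed values
    trainer.train(model_path)
    return model_path

def tag(sentences_tokens, model_path="model.crfsuite"):
    """6./8. Predict labels for lists of tokens with a trained model."""
    import pycrfsuite
    tagger = pycrfsuite.Tagger()
    tagger.open(model_path)
    return [tagger.tag([word2features(toks, i) for i in range(len(toks))])
            for toks in sentences_tokens]

def evaluate(gold, predicted, label):
    """7. Token accuracy plus precision and recall for one label."""
    pairs = [(g, p) for gs, ps in zip(gold, predicted) for g, p in zip(gs, ps)]
    accuracy = sum(g == p for g, p in pairs) / len(pairs) if pairs else 0.0
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

Assuming `corpus` is a list of annotated sentences in (token, label) form (step 3, built from the annotated XML files), a run might be `train_sents, test_sents = train_test_split(corpus)`, then `train_model(train_sents)`, then `predicted = tag([[t for t, _ in s] for s in test_sents])` and `evaluate([sent2labels(s) for s in test_sents], predicted, "PERSON")`, where the label name is hypothetical.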










