How To Write Pig UDF Example In Java

What is a Pig UDF?

Pig ships with a set of built-in functions that can be used in a Pig script without writing any extra code. When a requirement is not covered by the built-in functions, you can write your own custom function, called a UDF (user-defined function). The steps below show how to write a simple Pig UDF in Java.

Steps to create Pig UDF

Step 1 :-

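Open Eclipse and create a Java class, for example Ucfirst.java. Put it in a package named myudfs, because the Pig script later in this post calls the function as myudfs.Ucfirst.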

Step 2 :-

Add the Hadoop and Pig JAR files to the project's build path:

Right-click the project -> Build Path -> Configure Build Path -> Libraries -> Add External JARs -> select the JAR files from the Hadoop and Pig lib folders, plus the other JAR files in the Hadoop folder -> click OK.

Step 3 :-

With these JARs on the build path, the Pig classes compile in Eclipse without errors. The basic step in a Pig UDF is to extend EvalFunc, parameterized with the data type of the value the function returns. This UDF returns a String:

public class Ucfirst extends EvalFunc<String>

Step 4 :-

Override the exec method:

public String exec(Tuple input) throws IOException {
    if (input == null || input.size() == 0)
        return null;

The return type is String, matching the type given to EvalFunc. Pig passes each row of the text file to exec as a Tuple. The method first checks whether the tuple is missing or empty, and returns null if it is.

Step 5 :-

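The rest of the method is a try/catch block. The logic goes in the try block:

    try {
        String str = (String) input.get(0);
        // A null or empty field has no first character to return.
        if (str == null || str.isEmpty())
            return null;
        char ch = str.toUpperCase().charAt(0);
        return String.valueOf(ch);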

Step 6 :-

The catch block is only for exception handling; for example, it can wrap the exception in an IOException and rethrow it.
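The logic from the try block in Step 5 can be tried outside Pig before packaging the JAR. The sketch below is only an illustration: UcfirstLogic and firstUpper are hypothetical names that are not part of Pig, and the sample values are made up. It keeps the same first-character logic and adds guards for null and empty fields, assuming such fields should simply yield null.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pig: it holds the try-block logic
// so it can be run and checked without the Pig JARs on the classpath.
final class UcfirstLogic {

    // Returns the first character of value in upper case,
    // or null when value is null or empty (assumed behavior).
    static String firstUpper(String value) {
        if (value == null || value.isEmpty()) {
            return null;
        }
        char ch = value.toUpperCase().charAt(0);
        return String.valueOf(ch);
    }

    public static void main(String[] args) {
        // Made-up sample fields: a normal value, an empty field, a null field.
        List<String> samples = Arrays.asList("mango", "", null);
        for (String s : samples) {
            System.out.println(firstUpper(s));
        }
    }
}
```

Inside the UDF, exec could call such a helper on the value read from input.get(0), which keeps the Pig-specific code small and the string logic easy to check on its own.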

How to run this UDF in Pig?

Step 1 :-

Right-click the project -> Export -> Java -> JAR file, and create the JAR (ucfirst.jar in this example).

Step 2 :-

Register the JAR in Pig with a REGISTER statement of the form: REGISTER jarname;

Step 3 :-

Write The Pig Script

REGISTER ucfirst.jar;
A = LOAD 'sample.txt' AS (logid:chararray);
B = FOREACH A GENERATE myudfs.Ucfirst(logid);
DUMP B;

In the script above, myudfs is the package name and Ucfirst is the class name.

Save the script as ucfirst.pig and run it in local mode:

pig -x local ucfirst.pig

Output

(M)
(S)
(R)
(R)

That is how to write a simple Pig UDF in Java.

Comments

  1. thanks for the example using pig UDF

  2. Thanks for the example.

    I am practicing the example below on my system. The UDF (in Pig) isn't working when the data file contains null values.
    Please let me know what's wrong here.
    /home/cloudera/pig/climate.txt
    1970,34,Blr
    1959,40,chn
    1940,,del
    1950,,

    Upper.java
    ----------
    package upper;
    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class Upper extends EvalFunc<String>
    {
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0)
                return null;
            try {
                String str = (String) input.get(0);
                return str.toUpperCase();
            } catch (Exception e) {
                throw new IOException("Caught exception processing input row ", e);
            }
        }
    }

    pig -x local
    REGISTER '/home/cloudera/pig/Upper.jar';
    A = LOAD '/home/cloudera/pig/climate.txt' using PigStorage(',')
    as (year:int, temp:int, city:chararray);
    B = FOREACH A GENERATE year,temp,Upper(city);

    This isn't working for the dataset above, but it works if I don't have any null values.
    Can you please let me know what's wrong here?
