Hive - Hive Functions - 《Coco的技术宝典》

Built-in functions
User-defined functions

Built-in functions

List all functions

show functions;

View a function's definition and description

hive> desc function to_unix_timestamp;
OK
to_unix_timestamp(date[, pattern]) - Returns the UNIX timestamp
hive> desc function extended to_unix_timestamp;
OK
to_unix_timestamp(date[, pattern]) - Returns the UNIX timestamp
Converts the specified time to number of seconds since 1970-01-01.
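To make "number of seconds since 1970-01-01" concrete, the same arithmetic can be reproduced outside Hive. The following is only a minimal Python sketch: it assumes the input is interpreted as UTC (Hive may apply its own default time zone, so results can differ by the zone offset), and the function name `to_unix_seconds` is hypothetical. Note that Python's `strptime` patterns differ from the pattern syntax Hive accepts.

```python
import calendar
import time


def to_unix_seconds(text, pattern="%Y-%m-%d %H:%M:%S"):
    """Seconds between 1970-01-01 00:00:00 UTC and `text`, read as UTC."""
    # time.strptime parses the string; calendar.timegm treats the result as UTC.
    return calendar.timegm(time.strptime(text, pattern))


# Exactly one day after the epoch: 24 * 60 * 60 seconds.
print(to_unix_seconds("1970-01-02 00:00:00"))  # 86400
```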

User-defined functions

Java implementation

Step 1: Write the UDF and compile it into a jar

package com.example.hive.udf;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
public final class Lower extends UDF {
    // The evaluate method may be overloaded.
    // evaluate must not have a void return type, but it may return null.
    public String evaluate(final String s) {
        if (s == null) {
            return null;
        }
        return s.toLowerCase();
    }
}

<dependency>
  <groupId>org.apache.hadoop.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>0.7.1</version>
  <scope>provided</scope>
</dependency>
<build>
    <plugins>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <executions>
          <execution>
            <id>make-assembly-jar</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
            <configuration>
              <finalName>${project.artifactId}</finalName>
              <appendAssemblyId>false</appendAssemblyId>
              <descriptorRefs>
                <descriptorRef>jar-with-dependencies</descriptorRef>
              </descriptorRefs>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>
  </build>

Step 2: Add the jar to the Hive classpath

hive> add jar /tmp/my_jar.jar;
hive> list jars;
-- Since Hive 0.13, the jar location can be given when the UDF is registered
CREATE FUNCTION myfunc AS 'SomeClass' USING JAR 'hdfs:///path/to/jar';

Step 3: Register the UDF

CREATE TEMPORARY FUNCTION my_lower AS 'com.example.hive.udf.Lower';
-- Since Hive 0.13, permanent functions can also be registered (in the current db or in a named db)
CREATE FUNCTION my_db.my_lower AS 'com.example.hive.udf.Lower';

Python implementation

Step 1: Write the script

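#!/usr/bin/python
import sys

for line in sys.stdin:
    # Strip the trailing newline so that print() does not emit a blank line after each row.
    print(line.rstrip("\n").lower())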
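The per-line logic is easier to check locally when it lives in a function that accepts any pair of text streams. The following is a minimal Python 3 sketch under that assumption; the function name `lower_lines` is hypothetical and not part of Hive. It strips the newline that each input line carries, so exactly one newline is written per output row.

```python
import io


def lower_lines(src, dst):
    """Write a lowercased copy of every line in src to dst."""
    for line in src:
        # Each input line still ends with its newline; strip it and
        # write exactly one newline per output row.
        dst.write(line.rstrip("\n").lower() + "\n")


# Local check with in-memory streams (no Hive needed).
sample_out = io.StringIO()
lower_lines(io.StringIO("Hello\nWORLD\n"), sample_out)
print(sample_out.getvalue(), end="")

# In the script shipped to Hive, the call would instead be
# (after `import sys`):
#     lower_lines(sys.stdin, sys.stdout)
```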

Step 2: Add the file

ADD FILE /tmp/lower.py;

Usage:

-- General form
select transform(<columns>)
using 'python <python_script>'
as (<columns>)
from <table>;

-- Example
select transform(t)
using 'python lower.py'
as (lowered_t string)
from my_table;
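With the default row format, Hive feeds each row to the script as one line of tab-separated strings and writes NULL as the literal `\N`; the script's output is parsed the same way. A plain `lower()` would turn `\N` into `\n`, which Hive would then read as an ordinary string instead of NULL. The sketch below shows one way to guard against that, mirroring the null check in the Java UDF; the names `HIVE_NULL`, `lower_field`, and `transform` are hypothetical.

```python
import io

HIVE_NULL = "\\N"  # how Hive's default row format spells NULL


def lower_field(value):
    """Lowercase one column value, leaving Hive's NULL marker untouched."""
    return value if value == HIVE_NULL else value.lower()


def transform(src, dst):
    """Lowercase every tab-separated column of every input row."""
    for line in src:
        fields = line.rstrip("\n").split("\t")
        dst.write("\t".join(lower_field(f) for f in fields) + "\n")


# Local check: the second row starts with a NULL column.
demo = io.StringIO()
transform(io.StringIO("Foo\tBAR\n\\N\tBaz\n"), demo)
print(demo.getvalue(), end="")
```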

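Comparison of the two approaches

Java has to reference an external jar that contains the Hive API, whereas Python needs no external packages;
A Java UDF has to be packaged as a jar, whereas a Python script can simply be uploaded as a file;
A Java UDF can be applied to a specific column of a single record, and its result can be used directly as a condition in a HiveQL WHERE clause. A Python "UDF" must read and write the specified columns in bulk through TRANSFORM, so it cannot conveniently be applied to a single column.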
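To illustrate the last point: a transform script that should change only one column still has to receive and re-emit every column the query wants to keep. The following is a minimal sketch under the assumption that rows arrive as tab-separated text with NULL written as `\N` (Hive's default for TRANSFORM); the function name `lower_column` and the two-column sample input are hypothetical. Any filtering on the result would then happen in an outer query rather than in the same WHERE clause.

```python
import io


def lower_column(src, dst, index):
    """Lowercase only the column at `index`; copy all other columns as-is."""
    for line in src:
        fields = line.rstrip("\n").split("\t")
        if fields[index] != "\\N":  # keep Hive's NULL marker intact
            fields[index] = fields[index].lower()
        dst.write("\t".join(fields) + "\n")


# Local check: column 0 is an id that passes through unchanged.
demo = io.StringIO()
lower_column(io.StringIO("1\tHello\n2\tWORLD\n"), demo, index=1)
print(demo.getvalue(), end="")
```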