Connecting with Python

  • Before you start, install a few dependencies:
  pip install sasl
  pip install thrift
  pip install thrift-sasl
  pip install PyHive

  import pandas as pd
  from pyhive import presto

  # The "~" values are placeholders: substitute your own Presto host and port.
  conn = presto.connect(protocol='https', host="~.com", port=~, username="your_username", password="your_password")
  cursor = conn.cursor()
  sql = "select * from hive.tmp.adjust_adid0408 limit 10"
  cursor.execute(sql)
  res = cursor.fetchall()
  print(res)
  for i in res:
      print(i)
  # Convert to a DataFrame
  df = pd.DataFrame(res)
  # The column names still have to be written out by hand, which is painful with 100+ fields
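
The hand-written column names can usually be avoided, because DB-API cursors expose a `description` attribute whose entries each start with a column name. Below is a minimal sketch of that idea, assuming the PyHive cursor fills in `description` after `execute` as the DB-API specification describes. An in-memory `sqlite3` database stands in for Presto so the snippet runs anywhere; the helper name `fetch_with_columns` and the `demo_orders` table are made up for illustration.

```python
import sqlite3


def fetch_with_columns(cursor, sql):
    """Run sql and return (column_names, rows) for any DB-API 2.0 cursor."""
    cursor.execute(sql)
    # Each entry of cursor.description is a sequence whose first item is the column name.
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    return columns, rows


# sqlite3 is only a stand-in here; with PyHive you would pass conn.cursor() instead.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("create table demo_orders (order_id integer, amount real)")
demo_conn.execute("insert into demo_orders values (1, 9.5), (2, 20.0)")

columns, rows = fetch_with_columns(demo_conn.cursor(), "select * from demo_orders limit 10")
print(columns)
print(rows)
```

The two return values can then be handed to pandas as `pd.DataFrame(rows, columns=columns)`, so the DataFrame gets its column names without any of them being typed by hand.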

Plain Python can make the connection, but for data analysis pandas is far more pleasant to work with.

Connecting with pandas

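  from pyhive import presto
  import pandas as pd

  # The "~" and "#" values are placeholders for your own host, port, and credentials.
  conn = presto.connect(protocol='https', host="~.com", port=~, username="#", password="#")
  # read_sql_query runs the query and returns a DataFrame with the column names already filled in
  df = pd.read_sql_query("select * from hrder_detail limit 20", conn)
  # df = pd.read_sql("select * from hrder_detail limit 20", conn)
  df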

[Figure 1: Connecting to Hive with Python and analyzing the data with pandas]