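The OVER clause is useful for windowing and analytics: it computes a function over a group of rows (a window) while keeping every row in the result.
In practice, it feels like Stata's bysort prefix combined with egen:
// Stata code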
bysort varlist: egen varname = func() ...
See the references below for further information.
- Official documentation: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=31819589
- Hive OVER examples in practice: https://blog.csdn.net/sherri_du/article/details/53312085
- Summary of Hive windowing and analytics, including row_number(), rank(), dense_rank(), etc.: https://www.jianshu.com/p/12eaf61cf6e1 ★
- A short story of row_number(), rank() and dense_rank(): https://www.cnblogs.com/cc11001100/p/8978279.html
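To make the difference between row_number(), rank() and dense_rank() concrete, here is a small sketch that runs all three over a tiny in-memory table. It uses Python's sqlite3 module purely as a stand-in for Hive, on the assumption that the bundled SQLite is 3.25 or newer (the first version with window functions); the OVER (PARTITION BY ... ORDER BY ...) syntax is the same. The table name scores and its columns are made up for illustration.

```python
import sqlite3

# Hypothetical data: two groups, with a tie on score inside group "a".
rows = [("a", 90), ("a", 90), ("a", 80), ("b", 70), ("b", 60)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (dim TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)

query = """
SELECT dim,
       score,
       row_number() OVER (PARTITION BY dim ORDER BY score DESC) AS rn,
       rank()       OVER (PARTITION BY dim ORDER BY score DESC) AS rk,
       dense_rank() OVER (PARTITION BY dim ORDER BY score DESC) AS drk
FROM scores
ORDER BY dim, score DESC, rn
"""
result = conn.execute(query).fetchall()

# Columns: dim, score, row_number, rank, dense_rank
for r in result:
    print(r)
```

Within group "a" the two tied rows get row numbers 1 and 2, but share rank 1; the next row then gets rank 3 (rank leaves a gap) and dense_rank 2 (dense_rank does not).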
Earlier, when I wanted to draw a random sample within each group, I should have done it this way: number the rows of each group in random order with rand(), then filter on the row number.
SELECT *
FROM (
    SELECT varlist,
           row_number() OVER (PARTITION BY dim ORDER BY rand()) AS rn
    FROM ...
) AS tmp
-- keep at most 10 randomly chosen rows per dim
WHERE tmp.rn <= 10
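The sampling idea can be tried locally with a small sketch like the one below. It is not Hive: it uses Python's sqlite3 module (assuming SQLite 3.25 or newer), where the random function is spelled random() rather than rand(). The table t, its columns and the group sizes are invented for the example. Rows in each group are numbered in random order, and keeping row numbers up to the sample size yields at most that many rows per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dim TEXT, val INTEGER)")

# Hypothetical data: group "a" has 25 rows, group "b" has only 4.
data = [("a", i) for i in range(25)] + [("b", i) for i in range(4)]
conn.executemany("INSERT INTO t VALUES (?, ?)", data)

sample_size = 10
query = """
SELECT dim, val
FROM (
    SELECT dim, val,
           row_number() OVER (PARTITION BY dim ORDER BY random()) AS rn
    FROM t
) AS tmp
WHERE tmp.rn <= ?
"""
sample = conn.execute(query, (sample_size,)).fetchall()

# Count how many rows were drawn from each group.
counts = {}
for dim, _ in sample:
    counts[dim] = counts.get(dim, 0) + 1
print(counts)
```

A group smaller than the sample size, such as "b" here, is returned in full, so the filter gives "up to N per group" rather than exactly N.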