1. Tips

/short statement is better
collapse the two sql statements from before into one
/

2. SQL Window Functions

Case1

•Business Use Case – We want to isolate only rows for employees with the their most recent salaries

•Could be still
active employees, i.e. to_date = ‘9999-01-01’ or employees who have left the company**, what was their salary when they left?

Screen Shot 2020-04-28 at 6.37.10 PM.png

3. Fact and Demension Table Structures

basic fact table techniques

1.fact types

1) Additive facts

can be summed across any of the dimensions associated with the fact table.
eg: Sales can be summed by weeks, months, years

2) Semi-additive

can be summed across some dimensions, but not all
eg: daily balance in checking amount
Screen Shot 2020-04-28 at 7.18.41 PM.png

3) Non-additive

such as ratios or percentages.
eg: it doesn’t make sense that discounts of different items are added together

2.table types

1) Transaction fact table

2) Periodic snapshot fact table

summarizes many measurement events occuring during over a standard period

3) Accumulating snapshot fact table

a sequence of events captured in the same row

4) Factless table

Screen Shot 2020-04-28 at 7.22.26 PM.png

5) Transaction fact table VS Periodic snapshot fact table

Screen Shot 2020-04-28 at 7.21.37 PM.png
for periodic snapshot fact table, even during a period nothing happens, it still has a record.

Screen Shot 2020-04-28 at 7.39.48 PM.png

6) Accummulating snapshot fact table

Screen Shot 2020-04-28 at 7.40.31 PM.png

3.Primary Keys to use in Dimension Tables

1) Natural Key

  • Found in the raw data
  • identifies unique row (e.g. CPT Code from lecture 2, or city from HW1 location table)

    2) Surrogate Key

  • Artificially generated, usually an integer from 1 to n (e.g. the Index ID from HW1)

  • can’t be the Natural Key

    4.Dimension Types

    1) Snowflaked Dimensions

    Hierarchical relationship is normalized across several tables, e.g. city table links to state on state_id per HW1

2) De-normalized Flattened Dimension

Intentionally violate 3NF for ease of use, all attributes available on one row in the dimension table

3) Multiple Hierarchies in Dimensions

e.g. geography

4) Calendar Date Dimensions

attached to fact tables to allow dates, months, quarters, years, week start date, etc

5) Degenerate Dimension

– no content except for the PK, usually stored in a fact table but nowhere else

6) Junk Dimensions

misc. low cardinality flags all stored together in one junk dimension (i.e. not a single entity)

4. ERD补充

  • 两个pk组成一个unique pk,不一定要重新设一个unique pk
  • char和varchar区别

Screen Shot 2020-04-28 at 8.27.13 PM.png

  • non-identifying relationship ——> dash line join ❌
  • 一个表中可以有多个pk,共同定义行

Screen Shot 2020-04-28 at 8.27.16 PM.png