The power of two

Finding the middle ground

Three proposed alternative midpoint functions are given below.

int mid_A(int low, int high)
{
    return low + ((high - low) / 2);
}
int mid_B(int low, int high)
{
    return ((unsigned int)low + (unsigned int)high) >> 1;
}
int mid_C(int low, int high)
{
    return (low & high) + ((low ^ high) >> 1);
}

a. Both mid_A and mid_B work correctly as long as both inputs are non-negative. This constraint is never clearly stated, just implicit in the original context that low and high are array indexes. Things may not be quite so rosy if one or both inputs is negative. Unambiguously identify in which situations a negative input causes trouble for mid_A and demonstrate with a specific call that returns an incorrect result.
b. mid_B also has its own problem with negative inputs. Unambiguously identify in which situations negative inputs are trouble for mid_B and demonstrate with a specific call that returns an incorrect result.
c. Knuth’s mid_C is the real deal, correct result for all inputs, no possibility of overflow! How this approach works is not at all obvious as first glance, so invest some quality time to work through the bitwise manipulation and integer representation until you see how it all fits together. Hint: consider the bitwise representation and its relationship to a binary polynomial/powers of 2. Once you have it figured out, write a short explanation of how and why it works. Your answer may focus solely on the case where both inputs are nonnegative inputs. (While it is possible to generalize to the case when one or more inputs is negative, it is trickier to reason about.)
d. Including an optional haiku about bitwise magic and the genius of Don Knuth will earn a smile from your grader. We’ll post the best ones to the forum!
mid_C是高德纳写的，算法分析的鼻祖，代码的原理，我的理解是这样的，首先观察二进制表示下，数值的加法和除2：加法就是，两个数字对应二进制位都是1时进位，都是0时不变，互异时直接或上去，为1；除2就是所有位向右移位。low & high能取出所有二进制相同的位，而直接与上的作用就是，让需要进位的位不进位，也就是代替了进位后向左移1位又除2向右移1位的操作，而同时为0的情况就自动被处理了，因为所有有效0都在最高的有效1的右边，最高的1没有进位，当然0相当于也自动右移了；后面的(low ^ high) >> 1就取出了所有互异的部分，然后需要除2直接右移1位。这样两部分加起来就完成了除2操作。

Ground zero

The code below was taken from musl (http://www.musl-libc.org) and lightly paraphrased. The task is to determine whether any of the 8 bytes in a long is a zero byte. The simple and naive approach would mask out each byte and test it individually. The function below manages to simultaneously test all bytes. What?! Puzzling out the operation of this extraordinary function is going to take some effort, but I promise it will be worth it!

bool has_zero_byte(unsigned long val)
{
    unsigned long ones = ~0UL/UCHAR_MAX;
    unsigned long highs = ones << (CHAR_BIT - 1);
    return ((val - ones) & (~val & highs)) != 0;
}

a. Describe the bitmask constructed by the expression ~0UL/UCHAR_MAX . The suffixes on the constant 0UL are critical. Explain the effect on the expression if the L were removed. Now explain the effect of removing the U .
b. The original version of the code expressed the second line as:
unsigned long high = (ones * (UCHAR_MAX/2+1))
Demonstrate why the original and replacement expressions are equivalent.
c. Describe what is computed by the expression val - ones .
d. Describe what is computed by the expression ~val & highs .
e. Now take it over the finish line: explain how combining those two expressions produces the desired result.
QQ图片20210409170707.jpg
第一行是64个1除以8个1的11111111，所以第一行的结果是每个byte都是0000 0001，也就是0x01，所以ones就是0x101010101010101。
第二行将ones左移7次，也就是8位的错开了，highs为0x8080808080808080。
第三行，val - ones也就是在每个byte上都减1，如果val某个byte是0，那么就会向高位的byte借位，那么必然在结果中该byte全是1，如果最高位byte全是0，那么就是val小于ones，就会溢出，而这时ones最高位byte只是1，溢出回来同样导致结果最高位byte全是1。~val & highs将val取了反，如果取反前val某个byte最高位是0，那么和highs取&就会得到1，其他非最高位只能是0。所以，如果val某个byte全是0，那么 ((val - ones) & (~val & highs))就是非0，函数返回1。

Review and comment starter code

Linux输入man strtol

Cellular automata

元胞自动机
The Nature of Code Simulating Natural Systems with Processing by Daniel Shiffman (z-lib.org).pdf这本书的第七章讲元胞自动机。

// each cell is drawn as LIVE_STR or EMPTY_STR
// uncomment the alternate #define below to use + as LIVE_STR
// if your terminal doesn't play nice with inverted box
//#define LIVE_STR INVERTED_BOX
#define LIVE_STR "+"
#define EMPTY_STR " "

#define MSB sizeof(long)*CHAR_BIT-1
const int POW2[3] = {1, 2, 4};

void ruleset2array(unsigned char ruleset, int a[]){
/*
translate the bit pattern of ruleset into a bool array
eg:
ruleset 15=00001110
a[] = {0, 1, 1, 1, 0, 0, 0, 0}
*/
  for (int i = 0; i < CHAR_BIT; i++)
    a[i] = (('\1' << i) & ruleset) > '\0';
}

unsigned long acyclicShiftL(unsigned long num, int shift){
// returns num << shift without the cyclic behavior
  if ((shift <= MSB) && shift >= 0)
    return num << shift;
  else
    return 0UL;
}

int findPatternCode(int i, unsigned long gen){
/*
find pattern code
eg if bits are 110, patternCode=6
i: center bit
gen: bit pattern
*/
  int out = 0;
  for (int j = 1; j >= -1; j--)//j is step bit: 1, 0, -1
    out += ((acyclicShiftL(1L, i+j) & gen) > 0) * POW2[j+1];
  //at the start, look at 64th, 63rd, 62nd bits
  //there is no 64th. ok. gives 0 contribution
  return out;
}

unsigned long advance(unsigned long cur_gen, unsigned char ruleset){
// advance cur_gen
  int ruleArray[CHAR_BIT];
  ruleset2array(ruleset, ruleArray);
  unsigned long out = 0;
  for (int i = MSB; i >= 0; i--){
    // start from the 63rd bit
    int patternCode = findPatternCode(i, cur_gen);
    //at the start, look at 64th, 63rd, 62nd bits
    //there is no 64th. ok. gives 0 contribution
    out += (long)ruleArray[patternCode] << i;
    //at the start, feed result to 63nd bit
  }

  return out;
}

void draw_generation(unsigned long gen){
// draw gen, starting from the MSB
  const char *font[] = { EMPTY_STR, LIVE_STR };

  //printf("%s\n", font[gen != 0]);
  for (int i = MSB; i >= 0; i--){
    long bit = 1L << i;
    printf("%s", font[(gen & bit) > 0]);
  }
  printf("\n");
}

元胞自动机的位级实现，代码逻辑很棒。

UTF-8

#define US (unsigned short)

int to_utf8(unsigned short cp, unsigned char seq[]){
// takes in a utf-8 code point
// returns a utf-8 encoded sequence

  //printf("%hu", (short)0x0800);
  unsigned short edges[4] = {US 0x0000,
                             US 0x0080,
                             US 0x0800};
  if (edges[0] <= cp && cp < edges[1]){
    seq[0] = cp;
    return 1;
  } else if (edges[1] <= cp && cp < edges[2]){
    seq[0] = '\xC0' + (char)((cp & US 0x07C0) >> 6);
    seq[1] = '\x80' + (char) (cp & US 0x003F);
    return 2;
  } else if (edges[2] <= cp){
    seq[0] = '\xE0' + (char)((cp & US 0xF000) >> 12);
    seq[1] = '\x80' + (char)((cp & US 0x0FC0) >> 6);
    seq[2] = '\x80' + (char) (cp & US 0x003F);
    return 3;
  } else
    exit(1);
  return 0;
}

Stanford CS107 2021 WINTER 课程笔记