Go 101

Go basics

Following a tour of go, the notes are taken at this link: https://github.com/sgzerolc/go101

  1. Types

    Character sets: ASCII: 0-127; Unicode defines code point.

    UTF-8 encoding: English remains the same. The char under 128 takes up one byte. Only char >=128 will take >= 2 bytes.

    The problem is how many bytes should a char consume? To interpret a string, we need to know how it is encoded first. Go uses UTF-8 encoding.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    bool
    string

    int int8 int16 int32 int64
    uint uint8 uint16 uint32 uint64 uintptr

    byte // alias for uint8
    rune // alias for int32; represents a Unicode code point
    float32 float64
    complex64 complex128

    fmt.Printf() prints the string, while fmt.Sprintf() creates a string and returns it.

  2. style: naming, go uses camel style while c uses snake style.

  3. const: iota, Greek letter (9th), representing incremental values. When it is used, it simplifies the creation of incrementing values.

  4. defer: last-in-first-out order. Execute after return before control is returned to the caller.

    1
    2
    3
    4
    5
    go func() {
    mu.Lock()
    defer mu.Unlock()
    // modifies data
    }()

References:

  1. go tutorial, a tour of go, gobyexample
  2. how to write go code, memory model
  3. http://nil.csail.mit.edu/6.824/2022/notes/l-rpc.txt

Intro

1
2
3
go build xx.go
go run xx.go
go test -run singleTest
  1. Go is better than C++ in a way of threads (goroutines) and garbage collection.
  2. The go runtime executes goroutines on all available cores in parallel. If the cores are less than runnable goroutines, the runtime will pre-emptively time-share the cores among threads.
  3. Channels, queue-like synchronization primitive, are synchronized between many threads. It means that some goroutine should be waiting when a goroutine sends a message on the channel. At a high level, consider a channel as a struct holding a buffer and a lock.
  4. Go’s sync.Cond, conditional variables.
  5. Go’s select
  6. sync.WaitGroup: waits for multiple threads to finish.
  7. Do a periotic task: time.Sleep()
  8. The view of performance on threads:
    • how many threads can be run in parallel? If the machine has 16 cores, then it’s 16 threads (goroutines).
    • how many threads are needed to take up the full capacity of one service? (100/(100*0.1) = 10 threads)
  9. Channels only work within one program. RPC allows talking to other programs over the internet.
  10. A slice contains a pointer to an array, a start and end index to the array. It is more flexible than go array.
  11. Printf is a common debugging tool.
  12. synchronous RPC calls vs asynchronous RPC calls
    • It depends which one and when needs the reply. When the reply is needed to proceed, use synchronous calls. However, in a server-client model, the client can launch many RPCs by asynchronous calls, giving the client has other tasks on its plate or the connection to server is broken from time to time.
    • Send one RPC without waiting for the result: create a goroutine and have it make a synchronous call to wait.
  13. Go has generics for what c++ style inheritance support.
  14. A mutex in a struct cannot be made a value receiver. The mutex is copied and loses its purpose.
  15. DRF-SC: data-race-free programs execute in a sequentially consistent manner.
  16. Don’t write infinite loops
  17. select and timeout: select must send or receive values from channels.
  18. Goroutines: when a function creates a goroutine and returns, the goroutine will continue running. Goroutines run independently of the function that created them. They will only stop when:
    • The goroutine’s function completes execution
    • The entire program exits
    • The goroutine is explicitly stopped

Go memory model: -> How to write threaded code?

The Go memory model specifies the conditions under which reads of a variable in one goroutine can be guaranteed to observe values produced by writes to the same variable in a different goroutine.

It means to serialize a program that modifies data accessed by multiple goroutines using synchronization primitives.

synchronized before: Cond arranges that a call to [Cond.Broadcast] or [Cond.Signal] “synchronizes before” any Wait call that it unblocks.

The relation A happens before B: B observes what A has observed. “All code after B must be able to observe everything up to A.”

common bugs

  1. i of a closure references outer i which has been mutated

  2. WaitGroup is reused before previous Wait has returned in multiple go routines

    1
    2
    3
    4
    for {
    wg:=sync.WaitGroup{}
    go funcfile2(&wg)
    }
  3. the initialization statement is only executed once before the loop starts.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    func main() {
    // Initialization statement runs only once
    for s, val := check(); s != 1; {
    fmt.Println("Inside loop:", s, val)

    // Update the loop variable manually (or it would be an infinite loop)
    s++
    }
    }
  4. When using rpc, encode a struct (such as a mutex) which has no exported fields. Unexported fields (starting with lowercase) are private to the package where they are defined.

  5. xx

debugging

  1. Printf: Use DPrintf in util.go. Can put in test_test.go to see the phase of the test
    ex: DPrintf(“TESTACTION: leader disconnects”)

  2. use -race to help detect data races

  3. SIGQUIT: the default SIGQUIT handler for Go prints stack traces for allgoroutines (and then exits)

    Ctrl+\ will send SIGQUIT to current process

  4. Dealing with leaking goroutines

    • use ps to see the running processes

    • send SIGQUIT or SIGKILL using kill -QUIT pid or kill -KILL pid

  5. parallel: running tests in parallel makes it easier to find concurrency bugs

syntax

  1. The ... is Go’s variadic parameter syntax - it allows a function to take a variable number of arguments.
  2. x

Threads

  1. per-thread state: program counter, stack, registers, returned value

  2. alternative to threads: event-driven: code that explicitly interleaves activities, in a single thread

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    func isYellow() {
    while true:
    wait_for(Yellow)
    return
    }
    func isWhite() {
    while true:
    wait_for(White)
    return
    }

    main() {
    while events {
    isYellow(evens_i)
    isWhite(events_i)
    }
    }
    • keep a table of state about each activities
    • one event loop checks for new input, advances to next step and updates state.
    • the good side is getting I/O concurrency without thread costs. The bad side is no real multi-core speedup and hard to program.
  3. Takeaways from web crawler example: I use channel + waitgroup, terminated by a thread that keeps waiting for fetches.

    • A: serial crawler: depth first, recursively call Fetch() + map.
    • B: concurrent mutex: waitgroup + map with mutex, parallel Fetch() after first fetch.
    • C: concurrent channel: channel + map

    A+B: Terminated by exhaustion and synchronized by wg.Wait().

    B requires mutex lock. Because there may be two threads of Fetch accessing same url of map (shared). Only one of them should read map[url] = false. C doesn’t use lock because there is no sharing of map.

    C: Terminated by using local counter of goroutines. Why break if n == 0?

    1
    2
    3
    4
    n = 1 (first worker 1)
    n += 1 (first worker: starts its worker 2. Let's say it has one worker) -> n = 2
    n -= 1 (first worker finishes its job) -> n = 1
    when worker 2 has done, n -= 1 -> n = 0, terminates.
  4. locks + sharing V.S. channels

    • state – sharing and locks
    • communication – channels

    (F) For the 6.824 labs, I recommend sharing+locks for state, and sync.Cond or channels or time.Sleep() for waiting/notification.

    (Robert) I personally avoid channels for the most part and program with shared memory and use (mutexes and condvars) instead

  5. go run -race crawler.go, detects races even when output is correct

  6. Lock of Go does not enforce any relationship between locks and data: locks protect invariants, not “locks protect access to shared data”

  7. Condition variables: wait, signal, broadcast

    how to avoid bugs: 1) lock around use. 2) check condition (a read) in loop

  8. Channels: an object that can communicate and synchronize.

    • Channels are cheap
    • Sender blocks until the receiver receives
    • avoid deadlock
  9. immediately-invoked function expression: func is an “anonymous function”. It creates a lexical scope that is local

    1
    2
    3
    4
    5
    func parent() {
    go func(u string) { // parameters
    fmt.Printf("%s\n", u)
    }("dog") // -> argument to pass in
    }

RPC

  1. RPC is a piece of distributed system machinery that provides an esay interface of client-server communication

  2. A toy key-value server: It does Put(key,value), Get(key)->value.

    • Declare Args and Reply struct for each server handler.
    • Client: connect() -> Dial(); get(), put(); Call() asks the RPC library to perform the call. The library takes in arguments, sends requests, waits and puts the output to reply.
    • Server: it must declare an object with methods as RPC handlers; register() -> net.Listen(), server accepts network connections; Get(), Put() must lock; The library creates a new goroutine for each request, dispatches the named object, and writes the reply to the network connection.
    • binding to which server: determined by the arguments to Dial() for go’s RPC that are server name/port. Name or configuration server for big systems.
    • marshalling: format data into packets
      • can pass: strings, arrs, objects, maps, &c
      • cannot pass: channels or functions
  3. Go RPC uses at-most-once form

failure handling

  1. (simple) best-effor RPC: Calls() waits for response until timeout. If none arrives, re-send the request. Repeat a few times. Give up and return an error. incorrect for many situations. It’s ok only when the operations take no effect if repeated such as read-only operations.

  2. at-most-once RPC: client re-sends req if no anwer while server detects duplicate requests and returns previous reply instead of following ones.

    • How to avoid duplicates? Client attaches unique ID (XID) with each req to identify duplicate ones.
    • What if two clients use the same XID? Use big random number
    • How to avoid a huge seen [xid] table? Per-client RPC seq numbers and client includes “seen all repliest <= (before) X” with every RPC. The server can keep O(# clients) state rather than O(# XIDs)
    • Server must discard info about old RPCs or old clients. When to discard?
    • How to handle a duplicate req while the previous one is executing? “Pending” flag per executing RPC; wait or ignore
  3. exactly once:

    • unbounded retries
    • duplicate detection
    • fault-tolerant service