SC test&debug

Sam Li

2021-10-29

software engineering › 6.031 software construction

R3: Testing
R9: Avoiding debugging
- Assertions
R13: Debugging

everything is hard.

R3: Testing

Goal: systematic testing

Validation includes:

Formal reasoning: verification constructs a formal proof that a program is correct.
code review: have others read your code
Testing

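Testing is hard 👀

Exhaustive testing is infeasible: the space of possible inputs is usually far too large to cover.
Haphazard testing (just try it and see if it works) is unlikely to find bugs unless the program is so buggy that an arbitrarily chosen input fails, in which case it probably contains many more bugs.
- This is probably what I usually do, which is why I never really learned to write tests.
Random or statistical testing works poorly for software: behavior is discontinuous and discrete across the space of possible inputs.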

Test-first programming

terms: module, spec, implementation, test case, test suite
Steps
- Spec: types of parameters and constraints, type of return value and how return value relates to the inputs
- Test: write tests that exercise the spec
- Implement: write the implementation
Write the tests first. Leaving testing until the end makes finding bugs painful 😖

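Systematic testing

Goal: a test suite that is

Correct: a correct implementation passes every test when run
Thorough: the suite reports a failure when the implementation has a bug
Small: as few test cases as possible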

> Choosing test cases by partitioning

divide the input space into subdomains, which form a partition
Include boundaries in the partition
- Bugs happen at boundaries
  - 0, empty, null
  - Maximum and minimum
  - Emptiness for collection types
  - first and last element of a sequence
- why bugs happen at boundaries?
  - Off-by-one mistake
  - Special case
  - places of discontinuity in the code’s behavior
Use multiple partitions
- How to partition the input space
  - Cartesian product
  - Consider each partition independently, then consider how they interact
  - Keep the test suite as small as possible while still covering the input space
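
As a rough illustration of partitioning, consider a hypothetical `max(a, b)` function (the function, the class name, and the chosen subdomains are assumptions, not taken from the reading). One test per subdomain plus a few boundary values gives a small suite:

```java
// Hypothetical example: choosing test inputs for max(a, b) by partitioning.
class PartitionExample {
    // Function under test (an assumed stand-in for any implementation of the spec).
    static int max(int a, int b) {
        return a >= b ? a : b;
    }

    // Minimal check helper so the sketch runs without JUnit.
    static void check(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // Partition on the relationship between a and b: a < b, a == b, a > b
        check(2, max(1, 2));     // a < b
        check(9, max(9, 9));     // a == b, the boundary between the other two subdomains
        check(-5, max(-5, -6));  // a > b, both negative

        // Boundary values: 0, minimum, maximum
        check(0, max(0, -1));
        check(Integer.MAX_VALUE, max(Integer.MAX_VALUE, Integer.MIN_VALUE));
        check(Integer.MIN_VALUE, max(Integer.MIN_VALUE, Integer.MIN_VALUE));
    }
}
```

Each call covers at least one subdomain that no other call covers, which is what keeps the suite small.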

> Automated unit testing: JUnit

Unit test: a test for an individual module, run in isolation
All JUnit assertions that compare values take the expected value first and the actual value second
Every assertion method accepts an optional message argument, which is shown when the assertion fails
assertTrue(..., "...");

> Documenting the testing strategy

Document the partitions and subdomains at the top of the test class
For each test case, add a comment saying which subdomains it covers

// Assumes JUnit 5 (Jupiter); adjust the imports for another JUnit version.
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.math.BigInteger;
import org.junit.jupiter.api.Test;

public class Multiply {
  /*
   * Testing strategy
   *
   * cover the cartesian product of these partitions:
   *   partition on a: positive, negative, 0
   *   partition on b: positive, negative, 0
   *   partition on a: 1, !=1
   *   partition on b: 1, !=1
   *   partition on a: small (fits in a long value), or large (doesn't fit)
   *   partition on b: small, large
   *
   * cover the subdomains of these partitions:
   *   partition on signs of a and b:
   *      both positive
   *      both negative
   *      different signs
   *      one or both are 0
   */

  // covers a is positive, b is negative,
  //        a fits in long value, b fits in long value,
  //        a and b have different signs
  @Test
  public void testDifferentSigns() {
      assertEquals(BigInteger.valueOf(-146), BigInteger.valueOf(73).multiply(BigInteger.valueOf(-2)));
  }

  // covers a = 1, b != 1, a and b have same sign
  @Test
  public void testIdentity() {
      assertEquals(BigInteger.valueOf(33), BigInteger.valueOf(1).multiply(BigInteger.valueOf(33)));
  }
}

Testing methods

> Glass box and black box testing

Goal: a thorough set of test cases
Difference:
- Black box testing chooses test cases only from the specification
- Glass box testing chooses test cases with knowledge of the implementation of the function

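> Coverage

Statement coverage: is every statement run by some test case?
- Coverage tool: counts the number of times each statement is run by your test suite
Branch coverage: for every if/while/… condition, are both the true and the false direction taken by some test case?
Path coverage: is every possible combination of branches (every path through the program) taken by some test case?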
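
The difference between statement and branch coverage is easier to see on a tiny function. The sketch below uses a hypothetical `clampToZero` (the name and the class are assumptions made for illustration):

```java
// Hypothetical function used only to contrast statement and branch coverage.
class CoverageExample {
    static int clampToZero(int x) {
        int result = x;
        if (x < 0) {
            result = 0;  // runs only when the condition is true
        }
        return result;
    }
}
```

The single test `clampToZero(-3)` executes every statement, so statement coverage is complete, yet the `if` is only ever taken in its true direction. Branch coverage additionally needs an input such as `clampToZero(4)` that takes the false direction.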

> Unit and integration testing

A unit test tests a single module in isolation
An integration test tests a combination of modules
Isolating a higher-level module is hard
- Write stub versions of the modules that it calls; such a stub is often called a mock object
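
A minimal sketch of a stub, assuming a hypothetical higher-level `Greeter` module that depends on a lower-level `HourClock` module (all names here are invented for illustration):

```java
// Lower-level module, seen by Greeter only through this interface.
interface HourClock {
    int currentHour(); // 0..23
}

// Higher-level module under test.
class Greeter {
    private final HourClock clock;

    Greeter(HourClock clock) {
        this.clock = clock;
    }

    String greeting() {
        return clock.currentHour() < 12 ? "Good morning" : "Good afternoon";
    }
}

// Stub version of the lower-level module: always returns a fixed, predictable hour.
class FixedClock implements HourClock {
    private final int hour;

    FixedClock(int hour) {
        this.hour = hour;
    }

    @Override
    public int currentHour() {
        return hour;
    }
}
```

With the stub, `new Greeter(new FixedClock(9)).greeting()` can be tested without depending on the real time of day, so a failure points at `Greeter` rather than at the clock.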

> Automated regression testing

Regression testing: run the existing test cases again after every change to the code
- Changes include bug fixes, new features, and optimizations
- A test is good if it elicits a bug
- Save the tests that exposed bugs as regression tests, so the bug is not reintroduced
- Test-first debugging: when a bug arises, first write a test case that elicits it, then fix it
Automated testing: running the tests and checking the results automatically
- Test driver
- Using JUnit
Automated regression testing: use the two in combination

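Summary

Iterative test-first programming:

Practice

Write a spec
Write tests that exercise the spec. As you find problems, iterate on the spec and the tests
Write an implementation. As you find problems, iterate on the spec, the tests, and the implementation

Writing tests helps you understand the spec and fix it
Plan for iteration

Large spec: write one part of the spec, then write the tests and the implementation for that part, and repeat
Complex test suite: pick a few important partitions and write a small test suite, write a simple implementation that passes it, then iterate
Tricky implementation: first write a brute-force solution that passes the tests, to confirm that you really understand the spec and the tests. Then move on to the harder implementation.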
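
The "brute force first" advice can be sketched with search over a sorted array. The two method names below are borrowed from the debugging section of these notes; the code itself is an illustrative assumption, not the course's implementation:

```java
class SearchExample {
    // Step 1: brute-force version, written first to confirm the spec and the tests are understood.
    // Returns an index i such that a[i] == target, or -1 if target is absent.
    static int linearSearch(int[] a, int target) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == target) {
                return i;
            }
        }
        return -1;
    }

    // Step 2: the harder implementation; requires a to be sorted in ascending order.
    static int binarySearch(int[] a, int target) {
        int lo = 0;
        int hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids overflow of (lo + hi)
            if (a[mid] < target) {
                lo = mid + 1;
            } else if (a[mid] > target) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }
}
```

Once both exist, the simple version doubles as an oracle: on sorted arrays with distinct elements the two methods should return the same index for every target.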

R9: Avoiding debugging

First defense: make bugs impossible

By immutability
By static checking

Second defense: localize bugs

By defensive programming, such as checking preconditions and throwing an exception when they are violated
By incremental development

Write a little at a time and test as you go (unit tests and regression tests)
By modularity

Modularity means dividing up a system into components, or modules, each of which can be designed, implemented, tested, reasoned about, and reused separately from the rest of the system.
By encapsulation

Encapsulation means building walls around a module so that the module is responsible for its own internal behavior, and bugs in other parts of the system can’t damage its integrity.
- Access control: use public, private, and protected to control the visibility and accessibility of variables and methods.
- Variable scope: minimize the scope of variables
  - Always declare a loop variable in the for-loop initializer.
    Instead of:
    int i;
    for (i = 0; i < 100; ++i) {
    Better:
    for (int i = 0; i < 100; ++i) {
  - Declare a variable only when you first need it, and in the innermost curly-brace block that you can.
  - Avoid global variables.
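
Defensive precondition checks and access control can be combined in one small class. The `Account` class below is a hypothetical example written for these notes:

```java
// Hypothetical class illustrating a precondition check and access control.
class Account {
    private long balanceCents; // private: other modules cannot change it directly

    void deposit(long cents) {
        // Fail fast, close to the caller's mistake, instead of letting a bad value propagate.
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive, got " + cents);
        }
        balanceCents += cents;
    }

    long balanceCents() {
        return balanceCents;
    }
}
```

Because the field is private and `deposit` rejects bad arguments, a wrong balance can only come from code inside `Account`, which localizes the search when something goes wrong.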

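Assertions

Java tip: a serious problem with Java assertions is that they are off by default.

Assertions cost extra time at run time because they check data, so they are switched off in normal runs. It is best to turn them on during testing (run the JVM with -ea) to help locate programming errors.
@Test
public void testAssertionsEnabled() {
    assertThrows(AssertionError.class, () -> { assert false; });
}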

What to assert?

Method argument requirements: preconditions
Method return value requirements: postconditions
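
A small sketch of both kinds of assertion, using a hypothetical integer square root (the method and class names are assumptions). The `assert` statements only run when the JVM is started with `-ea`:

```java
class AssertExample {
    // Returns the integer square root of x: the largest r such that r * r <= x.
    static int isqrt(int x) {
        // Method argument requirement (precondition).
        assert x >= 0 : "precondition violated: x = " + x;

        int r = (int) Math.sqrt(x);

        // Method return value requirement (postcondition); long arithmetic avoids int overflow.
        assert (long) r * r <= x && (long) (r + 1) * (r + 1) > x
                : "postcondition violated: r = " + r;
        return r;
    }
}
```

If either assertion fails, the failure is reported at the point where the spec was first violated, rather than later when a wrong value surfaces somewhere else.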

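When to assert?

As you write the code, while you still remember the invariants of the program.

What not to assert?

External conditions

External conditions include whether a file exists, whether the network is available, and whether input data is correct. Assertions should check whether the program itself conforms to its spec.

When an assertion fails, it indicates that the program has run off the rails in some sense, into a state in which it was not designed to function properly.
Side effects: assertions may be disabled, so the program's correctness must not depend on whether the asserted expression is executed.
Covering all cases: when a conditional statement or switch does not cover all the possible cases, do not rely on an assert statement to block the illegal cases, because it can be turned off. Instead, throw an exception in the illegal cases so that the check always happens.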
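
Blocking the illegal cases of a switch with an exception might look like the sketch below. The `vowelIndex` method is a hypothetical example, and the choice of `AssertionError` as the exception type is an assumption:

```java
class VowelExample {
    // Assumes (hypothetically) that callers only pass lowercase vowels.
    static int vowelIndex(char c) {
        switch (c) {
            case 'a': return 0;
            case 'e': return 1;
            case 'i': return 2;
            case 'o': return 3;
            case 'u': return 4;
            default:
                // Illegal case: thrown unconditionally, so the check runs even without -ea.
                throw new AssertionError("not a vowel: " + c);
        }
    }
}
```

Unlike `assert false;`, the `throw` cannot be switched off, so an unexpected character never falls through silently.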

R13: Debugging

Learn systematic debugging techniques.

Reproduce the bug

Have a small, repeatable test case that reproduces the bug. The goal is then to make this test pass.

First work on reducing the size of the buggy input to something manageable that still exhibits the same (or very similar) bug.
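
One naive way to reduce a failing input is to keep halving it for as long as one half still fails. The sketch below is only an illustration of that idea under stated assumptions: the input is a string, `fails` is a hypothetical predicate that runs the buggy code and reports whether the bug still shows, and `fails.test(input)` is true on entry.

```java
import java.util.function.Predicate;

class ShrinkExample {
    // Repeatedly replaces the input with one of its halves while that half still fails.
    static String shrink(String input, Predicate<String> fails) {
        String current = input;
        while (current.length() > 1) {
            int mid = current.length() / 2;
            String left = current.substring(0, mid);
            String right = current.substring(mid);
            if (fails.test(left)) {
                current = left;
            } else if (fails.test(right)) {
                current = right;
            } else {
                break; // neither half alone reproduces the bug; stop here
            }
        }
        return current;
    }
}
```

This stops early when the bug needs text from both halves, so the result is smaller but not necessarily minimal.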

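Finding the bug with the scientific method

Problem: where is the bug?

The scientific method:

1. Study the data

Look at the test input that causes the bug, and examine the incorrect results, failed assertions, and stack traces.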

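2. Hypothesize

Think about your program as a flow of data, or steps in an algorithm, and try to rule out whole sections of the program at once. Binary search can help minimize the search space.

Slicing: find all the parts of the computation that produced a particular value.

If that value is wrong, the bug lies in some step of this slice, so work backwards from the wrong result through all the code that contributed to it. Good design choices help narrow the search. One is immutability, which tells us whether a variable can be affected by other parts of the code. Note what final does and does not guarantee (final int bonus vs. final Sale s: s can never be reassigned, but the Sale object it refers to may still be mutated). Another is scope minimization, such as preferring local variables to global variables.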

With instance variables, the slicing search might have to expand to include the entire class. For a global variable (gasp), the search expands to include the entire program.
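
The `final int bonus` vs. `final Sale s` distinction mentioned above can be made concrete. The `Sale` class here is a hypothetical stand-in with one mutable field:

```java
// Hypothetical mutable class.
class Sale {
    int amount; // mutable field

    Sale(int amount) {
        this.amount = amount;
    }
}

class FinalExample {
    static int demo() {
        final int bonus = 10;          // immutable value: no later code can change it
        final Sale s = new Sale(100);  // the reference is fixed, but the object is not

        s.amount = 250;                // allowed: mutates the object that s refers to
        // s = new Sale(0);            // would not compile: cannot reassign a final variable
        // bonus = 20;                 // would not compile either

        return s.amount + bonus;       // 250 + 10
    }
}
```

When slicing backwards from a wrong value, `bonus` can be ruled out after reading its declaration, while every piece of code that can reach the `Sale` object stays in the slice.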
Delta debugging: find two closely related test cases that bracket the bug, in the sense that one succeeds and one fails. One hypothesis is that the bug lies in the lines of code that differ between the passing run and the failing run (the delta).

Use version control to compare the code in the two situations (A passes and B fails, or A fails and B passes) until you find the change that causes the failure.
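
When the passing and failing cases are two versions in the history, binary search over the versions in between narrows the delta quickly. The sketch below assumes versions are numbered, that `passes` is a hypothetical predicate standing for "check out that version and run the failing test", and that every version after the first bad one also fails:

```java
import java.util.function.IntPredicate;

class BisectExample {
    // Returns the first failing version in (good, bad], given that version `good` passes
    // and version `bad` fails.
    static int firstBadVersion(int good, int bad, IntPredicate passes) {
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2;
            if (passes.test(mid)) {
                good = mid;  // the bug was introduced after mid
            } else {
                bad = mid;   // the bug was introduced at or before mid
            }
        }
        return bad;
    }
}
```

The difference between the returned version and the one just before it is then the smallest delta to inspect.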
Prioritize hypotheses: trust lower-level code until you have found good reason not to. In order of trust: Java compiler and runtime, OS, hardware > well-tested code > your code.

For example, the series of problems I blamed on the db pj0 autograder actually came from my unfamiliarity with parts of smart pointer usage.

3. Experiment

Make a prediction and test it. The best experiment is a probe.

Print debugging
- Print statements: write good debugging print statements to keep track of what the code does
  
  Remember to remove the print statements once testing is done
- Logging
  
  Logging can be switched on or off globally with a single variable
  
  A logging framework like Log4j can also direct the logging to a file or to a server across the network, can log structured data as well as human-readable strings, and can be used in deployment, not just development.
Assertions

The advantage is that you do not need to inspect variables by hand; you only state the condition.
Breakpoints with a debugger such as gdb
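
The logging probe can be sketched with the JDK's built-in `java.util.logging` package rather than Log4j (a substitution made here to keep the example dependency-free; the logger name and the `sum` method are assumptions):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LoggingExample {
    // A single logger whose level acts as the global on/off switch for the probes.
    static final Logger LOG = Logger.getLogger("debug.example");

    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
            LOG.info("after adding " + v + ", total = " + total); // probe
        }
        return total;
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.OFF);   // probes disabled for a normal run
        System.out.println(sum(new int[] {1, 2, 3}));

        LOG.setLevel(Level.INFO);  // probes enabled while debugging
        System.out.println(sum(new int[] {1, 2, 3}));
    }
}
```

Unlike print statements, the probes can stay in the code: switching the level back to `Level.OFF` silences them without any edits to `sum`.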

Notes on some of these methods:

Swap components: don’t do it unless you have good reason
If you suspect a module and have a different implementation of it that satisfies the same interface, then one experiment you can do is to try swapping in the alternative.
- If you suspect your binarySearch() implementation, then substitute a simpler linearSearch() instead.
- If you suspect java.util.ArrayList, swap in java.util.LinkedList instead.
One bug at a time
- Keep a bug list
- Don’t get distracted from the bug you’re working on. Keep your code changes focused on careful, controlled probes of one bug at a time.
  
  Chasing one bug can lead you to another, and you end up debugging recursively, which is bad because you may have a hard time popping your mental stack to return to the original bug. And don’t edit your code arbitrarily while you are debugging, because you don’t know whether those changes might affect your debugging experiments.
Don’t fix yet: a mere probe is better than an attempted fix of the hypothesized bug

Fix the real problem.

You may only have masked the error rather than found the real problem. I think that is what happened to me in db pj1.

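4. Repeat

Repeat the steps above: if the hypothesis was right, narrow the region where the bug can be; if it was wrong, revise the hypothesis to fit the observations.

When to apply this method? The 10-minute rule: if ten minutes of debugging by eye has not found the bug, switch to the scientific method.
Take notes during this process: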

Hypothesis. Based on what you’ve learned so far, what’s your next hypothesis about the location or cause of the bug?

Experiment. What are you about to try that will shed light on the hypothesis, by verifying or falsifying it?

Predictions. What do you expect, based on your hypothesis, to be the result of the experiment?

Observations. What actually happened when you did the experiment?

Fixing the bug

Change the code only once you have found the bug and understand its cause.

First ask what kind of error it is:

Coding error
- Misspelled variable, interchanged method parameters
Design error -> step back and revisit your design
- Underspecified or insufficient interface

Method:

Look for related bugs, and newly-created ones.
Undo debugging probes.
Make a regression test

Make sure that:

(a) the bug is fixed

(b) no new bugs have been introduced

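Tips

Keep an audit trail: useful once you have gone through many cycles of the scientific method (or of debugging by eye) looking for a bug

Keep a log in a text file of what you did, in what order, and what happened as a result.
- the hypothesis you are exploring now
- the experiment you are trying now to test that hypothesis
- what you observe as a result of the experiment:
  - whether the test passed or failed this time
  - the program output, especially your own debugging messages
  - any stack traces
Check the plug

When repeated iteration still gets nowhere, question your assumptions (the external conditions). If a computer will not start, first suspect that it is not plugged in, the power is off, or the battery is dead, rather than that the machine or the switch is broken. For example, the db autograder only supports the current year's project (with the right function names).
- Make sure your source code and object code are up to date.
If YOU didn’t fix it, it isn’t really fixed

If you do not understand why the problem went away, it is not solved, only hidden for a while. This is especially true of bugs in concurrent programs.

Systematic debugging therefore helps us understand why a problem is solved rather than temporarily hiding it. That is why you make the bug show itself first: you must find the cause while the bug is still present, so that you really understand what you are doing.

You want to see that your change caused the system to transition from failing to working, and understand why.
Get a fresh view

Rubber duck debugging: explain why your code should work and what it is doing.

Rubber duck -> a colleague who knows what you are working on -> staff/mentor
- Minimizing your bug will help you make a minimal, reproducible example to post on Stack Overflow.
Sleep on it. Trade latency for efficiency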

Summary

In this reading, we looked at how to debug systematically:

reproduce the bug as a test case, and put it in your regression suite

find the bug using the scientific method:

generate hypotheses using slicing, binary search, and delta debugging

use minimally-invasive probes, like print statements, assertions, or a debugger, to observe program behavior and test the prediction of your hypotheses

fix the bug thoughtfully