Talk:Black-Box Optimization using Bayesian Optimization


Critique 1

Super interesting article! Some minor feedback points:

  • You could maybe start with an example where someone might need Bayesian optimization, to better motivate the article.
  • Maybe correct the definition of a black-box function, or specify that it is specific to Bayesian optimization. Do all black-box functions map from R^d to R? I am not certain about this statement.
  • Could you introduce what objective functions in BO look like before the surrogate model section? Maybe give an example to make it easier for the reader to follow.
  • Adding to David's points about EI, could you explain how the first term signifies exploitation and the second exploration? (See the formula sketched below for what I mean by the two terms.)
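For reference, I believe the standard closed form of EI under a GP surrogate (for maximization, writing mu(x) and sigma(x) for the posterior mean and standard deviation and f* for the current best observation) is roughly the following; the notation is mine, not necessarily the article's:

    % Expected Improvement, split into its two terms
    \mathrm{EI}(x) = \underbrace{(\mu(x) - f^{*})\,\Phi(Z)}_{\text{exploitation}}
                   + \underbrace{\sigma(x)\,\varphi(Z)}_{\text{exploration}},
    \qquad Z = \frac{\mu(x) - f^{*}}{\sigma(x)}

where \Phi and \varphi are the standard normal CDF and PDF. The first term is large where the posterior mean already beats the best observation so far (exploitation); the second is large where the posterior uncertainty sigma(x) is big (exploration). Spelling this out in the article would answer the question directly.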
NIKHILSHENOY (talk) 03:21, 14 February 2023

Critique 2

Easy to understand with the example presented. Apart from David's comments, I would like to add the following:

1. I think you need to explain more about the iteration graphs (at the end). It would be best if you also put labels on the axes of the plots.
2. It would be better if you could add a conclusion to the article.
3. Please explain the noisy objective function plot, and add labels to that graph as well.
4. It was confusing how BO updates the GP model in every iteration. Does it retrain the model? (See the sketch below for what I think happens.)
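On point 4: my understanding (please correct the article if this is wrong) is that the GP surrogate is simply re-fit on all observations collected so far at every iteration. A minimal sketch of what I mean, assuming a scikit-learn GaussianProcessRegressor; the helper names and objective `f` here are mine, not the article's:

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(X_cand, gp, f_best):
    # EI = (mu - f_best) * Phi(Z) + sigma * phi(Z), with Z = (mu - f_best) / sigma
    mu, sigma = gp.predict(X_cand, return_std=True)
    z = (mu - f_best) / np.maximum(sigma, 1e-9)
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(f, low, high, n_init=3, n_iter=10):
    rng = np.random.default_rng(0)
    X = rng.uniform(low, high, size=(n_init, 1))   # initial design points
    y = np.array([f(x[0]) for x in X])             # noisy observations of f
    for _ in range(n_iter):
        # Re-fit ("retrain") the GP on ALL data gathered so far.
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(X, y)
        # Pick the next point by maximizing EI over a dense grid (crude but simple).
        X_cand = np.linspace(low, high, 1000).reshape(-1, 1)
        x_next = X_cand[np.argmax(expected_improvement(X_cand, gp, y.max()))]
        # Evaluate the expensive objective once and append the new observation.
        X = np.vstack([X, x_next.reshape(1, 1)])
        y = np.append(y, f(x_next[0]))
    return X[np.argmax(y)][0], y.max()

So nothing fancier than re-fitting each round on the growing dataset; if the article does something different (e.g. incremental updates), it would help to say so explicitly.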

ANUBHAVGARG (talk) 03:05, 14 February 2023

Some feedback

  • You have a funny definition of the black-box functions for which Bayesian optimization is applicable. It requires smooth functions with bounded derivatives; it doesn't work for functions such as the one that is 1 on irrational numbers and 0 on rational numbers. I am not sure why you think "Black-box functions are usually complex"; it is more that BO methods are particularly useful for functions that are expensive to evaluate, for which there are only noisy observations, and for which it is difficult/impossible to compute the derivative (so gradient-based methods are not applicable).
  • Separate what it is trying to do -- maximize a function given only noisy observations of its value at a few points, while minimizing the number of observations -- from how it does it (using a surrogate model).
  • I thought that it *is* "easy to mathematically model the Neural Network's accuracy" -- it is just difficult/expensive to compute.
  • You use "optimize" and then maximize, without telling us that the aim is to maximize.
  • It is okay to just explain Expected Improvement, but then it should be explained better than "considering the highest expected improvement over the current best observation"; you say formally how it is defined, but we need better intuition as to what expected improvement is meant to measure (why does the first term represent exploitation and the second exploration?).
  • In your pseudo-code, what is "bounds"? The two blocks of pseudo-code need to be related -- see the sketch after this list for what I would guess "bounds" means and where it gets used.
  • In your end-to-end example, the function is a funny choice as it doesn't have an obvious maximum in the plot. (Tell us where the maximum is!) It might be better to have an example that goes up and down multiple times.
  • Your prior (that gives the variance) seems very poor. Perhaps justify how a prior can be obtained.
  • Explain the qualitative difference in the left plot between iterations 7 & 8 (the plots go from rectangles to curves), and between 5 & 6 (the error bars get very big).
  • The English needs fixing.
  • I don't think you need more content; it just needs to be explained more clearly.
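On the "bounds" point: my guess is that it is the box constraint on the search domain, used when maximizing the acquisition function to propose the next query point. A sketch of how I would expect the two pseudo-code blocks to connect (the names here are mine and purely illustrative, not the article's):

import numpy as np
from scipy.optimize import minimize

def propose_next_point(acquisition, gp, y_best, bounds, n_restarts=10):
    """Maximize the acquisition over the box `bounds` = [(low_1, high_1), ...],
    i.e. the search domain of the black-box function."""
    rng = np.random.default_rng(0)
    best_x, best_val = None, -np.inf

    def neg_acq(x):
        # Minimizing the negative acquisition == maximizing the acquisition.
        return -np.asarray(acquisition(x.reshape(1, -1), gp, y_best)).ravel()[0]

    for _ in range(n_restarts):
        # Random restarts inside the box to avoid getting stuck in a poor local optimum.
        x0 = np.array([rng.uniform(lo, hi) for lo, hi in bounds])
        res = minimize(neg_acq, x0=x0, bounds=bounds, method="L-BFGS-B")
        if -res.fun > best_val:
            best_x, best_val = res.x, -res.fun
    return best_x

Whether or not this matches what you intend, stating what "bounds" is and showing where this step sits inside the main loop would tie the two blocks of pseudo-code together.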
DavidPoole (talk) 01:29, 14 February 2023