Great question!

I'm gonna try to give an alternative (and somewhat novel) point of view, which is not necessarily ready for production scenarios but is something I'm very fond of. It involves #AutoML 🤓

First read this thread to understand the setting and the challenges 👇 https://twitter.com/haltakov/status/1304507409256374272
One major problem here is that the performance of our system as a whole does not depend on *just* the machine learning component.

Or, said in a different way: the metric we want to optimize (the final system performance) is not a differentiable function of our input data (the image).
There are a couple ways we can deal with optimizing non-differentiable pipelines.

🪄 My preferred one: use an AutoML optimizer (<insert a blatantly self-indulgent link to @auto_goal here>).

This allows us to simultaneously train the ML component (as usual) and tune the rest of the pipeline.
This way, the full system is optimized with respect to any black-box metric, while each component is tuned in the most efficient way we know for that specific sub-problem.

There still remains the problem of deciding which metric to optimize in the ML traffic-light detector.
Now here comes the magic. 💡 We can transform that decision into an optimizable variable itself.

We train the detector as usual, but pass in some parameterized "metric" that comes from "outside" the ML domain.

For example, a loss function that minimizes alpha*FN + (1-alpha)*FP, where FN and FP count the false negatives and false positives.
Then, we make "alpha" a parameter of the AutoML optimizer. It doesn't matter that the relation between "alpha" and the final performance is non-differentiable. That is not required.
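
To make this concrete, here's a minimal sketch of what that parameterized loss could look like (PyTorch, a binary detector; `weighted_detector_loss` is a name I made up for illustration):

```python
import torch

def weighted_detector_loss(probs, targets, alpha):
    """Penalize missed detections (FN) with weight `alpha` and
    false alarms (FP) with weight `1 - alpha`."""
    eps = 1e-7
    probs = probs.clamp(eps, 1 - eps)
    # Soft false-negative term: target is 1 but predicted probability is low.
    fn_term = -(targets * torch.log(probs))
    # Soft false-positive term: target is 0 but predicted probability is high.
    fp_term = -((1 - targets) * torch.log(1 - probs))
    return (alpha * fn_term + (1 - alpha) * fp_term).mean()
```

With alpha close to 1 the detector is pushed to never miss a light; with alpha close to 0 it is pushed to never raise a false alarm.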

The overall metric to optimize is, I guess, something like minimizing the number of crashes.
At this point, we have two layers of optimization: a nested loop, if you will.

The inner loop receives a value for "alpha" and trains the neural network as usual.

The outer loop selects a value for "alpha", calls the inner loop, puts that trained NN in the car, and evaluates.
The evaluation can be anything, even something non-differentiable. For example, we can take that NN, put it in a simulated car in GTA, run a few simulations, and see how many pedestrians we kill.
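
Something like this, as a hedged sketch (`train_detector` and `evaluate_in_simulator` are hypothetical stand-ins for your training code and simulator, and I'm using plain random search where a real AutoML optimizer would do something smarter, e.g., Bayesian or evolutionary search):

```python
import random

def inner_loop(alpha):
    # Train the neural network as usual, just with the parameterized loss.
    return train_detector(
        loss_fn=lambda probs, targets: weighted_detector_loss(probs, targets, alpha)
    )

def outer_loop(trials=20):
    best_alpha, best_crashes = None, float("inf")
    for _ in range(trials):
        alpha = random.random()                 # outer loop proposes a value
        model = inner_loop(alpha)               # inner loop: regular NN training
        crashes = evaluate_in_simulator(model)  # black-box, non-differentiable
        if crashes < best_crashes:
            best_alpha, best_crashes = alpha, crashes
    return best_alpha
```

Note the outer loop never needs gradients: it only sees a number come out of the simulator.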

Here "alpha" is a hyper-parameter, it controls indirectly how the model behaves.
Now it should be intuitive that we can change that internal loss function as we want. We can select from a list of loss functions and compose them in interesting ways.

As long as we can describe a parameterizable search space of hyperparameters, we can optimize it with AutoML.
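
For instance, a search space could look like this (the format is illustrative only, not the actual API of AutoGOAL or any other AutoML library):

```python
search_space = {
    "loss": ["weighted_fn_fp", "focal", "cross_entropy"],  # categorical choice
    "alpha": (0.0, 1.0),             # FN/FP trade-off weight (continuous)
    "gamma": (0.5, 5.0),             # focal-loss focusing parameter
    "learning_rate": (1e-5, 1e-2),   # regular hyper-parameters fit here too
}
```

The outer optimizer samples a configuration from this space, the inner loop trains with it, and the black-box evaluation decides which configurations survive.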
Dunno how well this "solution" will work, though. That is something the AutoML community is still working on: getting these ideas out of theory and lab problems and into actually solving real-world problems.

I have a lot of optimism here, but there is still a long road ahead.
Finally, check @svpino's answer, which comes from a very different perspective, as someone who has actually worked on these types of problems for real: https://twitter.com/svpino/status/1304515899475591168?s=20