Oh ok, you want to claim this is compressing the entirety of the internet in a model that isn’t even 1 terabyte of data and be unimpressed that is something.
But it isn’t compression. It is a mathematical fact that neural networks are universal function approximators, this is undisputed, and analytic functions are continuous so to be an analytical function approximator it must be able to fill in the gaps between discrete data points by itself, which necessarily means spiting out data outside of the input distribution, data it has not seen.
Oh ok, you want to claim this is compressing the entirety of the internet in a model that isn’t even 1 terabyte of data and be unimpressed that is something.
But it isn’t compression. It is a mathematical fact that neural networks are universal function approximators, this is undisputed, and analytic functions are continuous so to be an analytical function approximator it must be able to fill in the gaps between discrete data points by itself, which necessarily means spiting out data outside of the input distribution, data it has not seen.