What’s the difference between fit and fit_transform in scikit-learn models?


To standardize the data (give it zero mean and unit standard deviation), you subtract the mean and then divide the result by the standard deviation:

$$x' = \frac{x - \mu}{\sigma}$$
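As a quick worked example of the formula above, here is a minimal sketch in plain Python. The sample values are made up for illustration, and the population standard deviation (dividing by n rather than n - 1) is assumed, which matches what StandardScaler computes.

```python
import statistics

# Made-up sample values, purely for illustration.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = statistics.mean(x)       # mean of the sample
sigma = statistics.pstdev(x)  # population standard deviation (divides by n)

# Apply x' = (x - mu) / sigma to every value.
x_scaled = [(v - mu) / sigma for v in x]
```

For this sample the mean is 5 and the standard deviation is 2, so the value 9 maps to (9 - 5) / 2 = 2.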

You do that on the training set. But then you have to apply the same transformation to your test set (e.g. in cross-validation), or to newly obtained examples before predicting on them, and you have to use the exact same two parameters $\mu$ and $\sigma$ that you computed on the training set.

Hence, every scikit-learn transformer's fit() just calculates the parameters (e.g. $\mu$ and $\sigma$ in the case of StandardScaler) and saves them as the object's internal state. Afterwards, you can call its transform() method to apply the transformation to any particular set of examples.
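A minimal sketch of this split with StandardScaler might look as follows. The arrays X_train and X_test are made-up illustration data; mean_ and scale_ are the attributes where the fitted scaler keeps its per-feature mean and standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: one feature, a training set and a separate test set.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_test = np.array([[3.0], [6.0]])

scaler = StandardScaler()
scaler.fit(X_train)  # only computes and stores the parameters

mu = scaler.mean_      # per-feature mean learned from X_train
sigma = scaler.scale_  # per-feature standard deviation learned from X_train

# transform() reuses the stored training parameters on any data.
X_test_scaled = scaler.transform(X_test)
```

Note that the test set is scaled with the training mean and standard deviation, so the transformed test data will generally not have zero mean or unit variance itself.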

fit_transform() joins these two steps and is used for the initial fitting of parameters on the training set $x$, while also returning the transformed $x'$. By default, the transformer object just calls fit() first and then transform() on the same data, although some transformers implement the combined step more efficiently.
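To illustrate the equivalence, the sketch below compares the one-step and two-step versions on a small made-up array (X_train is hypothetical illustration data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training data with two features.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

# One step: learn the parameters and return the transformed data.
one_step = StandardScaler().fit_transform(X_train)

# Two steps: fit() returns the fitted scaler itself, so the calls can be chained.
two_steps = StandardScaler().fit(X_train).transform(X_train)

# Both routes should give the same standardized array.
same = np.allclose(one_step, two_steps)
```

On test data or new examples you would call only transform() on the already fitted scaler. Calling fit_transform() there would re-estimate the parameters from the new data instead of reusing the training ones.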
