Inference

Inference is the phase of the machine learning lifecycle in which an already-trained model is deployed in a production environment to process new, previously unseen data and produce predictions. Unlike the training phase, which demands enormous computational capacity and time to optimize the model's weights, inference prioritizes fast, efficient responses, often under real-time constraints. Optimizing this process (e.g., through model compression or quantization) is critical for reducing costs and improving user experience, and it enables AI applications to run on mobile devices as well.
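To make the quantization idea concrete, here is a minimal sketch (with hypothetical weight values and helper names) of symmetric linear quantization: float32 weights are mapped to the int8 range to shrink model size and speed up inference, at the cost of a small reconstruction error.

```python
def quantize(weights, bits=8):
    """Symmetric linear quantization of floats to signed integers."""
    qmax = 2 ** (bits - 1) - 1               # e.g. 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]  # integer codes in [-qmax, qmax]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integer codes."""
    return [x * scale for x in q]

# Hypothetical weight values for illustration
weights = [0.82, -0.41, 0.05, -0.99, 0.33]
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q)        # small integers instead of float32 values
print(max_err)  # reconstruction error bounded by roughly scale / 2
```

The 4x storage saving (int8 vs. float32) and cheaper integer arithmetic are what make such techniques attractive for latency- and cost-sensitive inference; production frameworks apply the same principle per layer or per channel.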