NanoNets OCR for Handwritten Notes

Nanonets has released Nanonets-OCR-s, a small, state-of-the-art 3B-parameter image-to-markdown OCR model that goes far beyond traditional text extraction. The model is available on Hugging Face and integrated with their docext tool for immediate use. The Medium post can be found here.

Image Segmentation with PaliGemma 2 mix, Transformers, Docker, FastAPI, and GitHub Actions

In today’s fast-paced machine learning landscape, deploying AI models is just as important as developing them. In this blog post, we walk through an image segmentation application built with Google’s PaliGemma 2 Mix model and Transformers, containerized with Docker, and served through a FastAPI backend. We also discuss the CI/CD pipeline that uses GitHub Actions to automate building the Docker image and pushing it to Docker Hub. Let’s explore this service, why we chose these technologies, and how you can get started and use the service yourself!

Chat with Qwen3 on your iPhone: A Step-by-Step Guide

Have you ever wanted to run a powerful large language model directly on your iPhone without sending your data to the cloud? Thanks to Apple’s MLX Swift framework, you can now run the remarkably capable Qwen3 models right on your iPhone.

Image Segmentation with PaliGemma 2 Mix and MLX

In this post, we explore Google’s PaliGemma 2 Mix vision-language model (VLM) and its ability to perform image segmentation. What’s interesting is that we perform this task using only Apple’s MLX framework and MLX-VLM. This eliminates the dependency on JAX/Flax found in Google’s original segmentation script and lets us fully and seamlessly utilise Apple’s unified memory. The Medium post can be found here.

Qwen2.5-vl with MLX-VLM

In this post, we are going to show a tutorial on using the Qwen2.5-VL model with MLX-VLM for visual understanding tasks. We are going to cover:
