Machine learning (ML) on the edge is key to enabling a new breed of IoT and autonomous-system applications. Departing from the traditional cloud-centric architecture allows new deployments to be more power-efficient, offer better privacy, and reduce inference latency. At the core of this paradigm is TinyML, a framework for executing ML models on low-power embedded devices. TinyML allows pre-trained ML models to be imported onto the edge to provide ML-as-a-Service (MLaaS) to IoT devices. This article presents a TinyMLaaS (TMLaaS) architecture for future IoT deployments. The TMLaaS architecture inherently involves several design trade-offs in terms of energy consumption, security, privacy, and latency. We also describe how the TMLaaS architecture can be implemented, deployed, and maintained for large-scale IoT deployments. We demonstrate the feasibility of implementing the TMLaaS architecture through a case study.