Context
Water supply disruptions, whether planned maintenance or unplanned incidents, affect millions of consumers across Selangor, Kuala Lumpur, and Putrajaya. Air Selangor needed a centralized system to manage the full lifecycle of disruption events: scheduling planned works, tracking active disruptions in real time, coordinating response teams across regions, and providing accurate information for public communication. Before iERP, this coordination relied heavily on manual processes and fragmented tools.
Constraints
- Critical operational system — downtime or inaccuracy during an active disruption directly impacts public communication and response coordination
- Real-time monitoring requirements for active disruption events across multiple concurrent regions
- Multi-region coverage spanning all Selangor districts, Kuala Lumpur, and Putrajaya
- Integration with existing Air Selangor operational workflows and notification systems
- Must support concurrent users from operations, communications, and management teams during peak events
- Deployed within the shared Alibaba Cloud Kubernetes environment
Architecture
iERP is built on Laravel, with Livewire for reactive UI components that deliver real-time dashboard updates without the overhead of a separate SPA frontend. The system manages the complete disruption lifecycle, from scheduling and impact zone mapping through active monitoring to post-event reporting.
- Backend: Laravel handling business logic, the scheduling engine, and data persistence.
- UI layer: Livewire components for reactive updates, so operations teams see status changes in real time without manual refreshing.
- Infrastructure: Containerized deployment on the shared Alibaba Cloud Kubernetes cluster.
Key Decisions
- Livewire over a separate SPA: For an internal operations tool, Livewire provided the right balance — reactive UI updates for monitoring dashboards without the complexity and maintenance burden of a separate frontend application and API layer. This kept the development footprint lean and reduced the number of moving parts in a critical system.
- Single integrated platform: Combined disruption scheduling, real-time monitoring, team coordination, and reporting into one system rather than separate tools. This eliminated data synchronization issues and gave operations teams a single source of truth during disruption events.
- Shared Kubernetes deployment: Leveraged the established Air Selangor cloud infrastructure. Kubernetes provided the reliability mechanisms needed for a critical system: automatic restarts, health checks, and resource isolation through a dedicated namespace with defined resource limits.
- Event-driven status tracking: Disruption events flow through defined status transitions (Scheduled → Active → Monitoring → Resolved), with each transition triggering relevant notifications and dashboard updates for different stakeholder groups.
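The status transitions described above can be sketched as a small state machine. This is a minimal illustration in Python rather than the production Laravel code. The class, the `ALLOWED` table, and the listener signature are hypothetical names introduced here, and the sketch assumes each status may only advance to the next one in the stated order.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    SCHEDULED = "Scheduled"
    ACTIVE = "Active"
    MONITORING = "Monitoring"
    RESOLVED = "Resolved"


# Assumption: a disruption can only move forward, one step at a time.
ALLOWED: Dict[Status, Status] = {
    Status.SCHEDULED: Status.ACTIVE,
    Status.ACTIVE: Status.MONITORING,
    Status.MONITORING: Status.RESOLVED,
}

# A listener receives the disruption plus its old and new status.
Listener = Callable[["Disruption", Status, Status], None]


@dataclass
class Disruption:
    region: str
    status: Status = Status.SCHEDULED
    listeners: List[Listener] = field(default_factory=list)

    def transition(self, new_status: Status) -> None:
        """Move to new_status if allowed, then notify every listener."""
        if ALLOWED.get(self.status) is not new_status:
            raise ValueError(
                f"Illegal transition: {self.status.value} -> {new_status.value}"
            )
        old, self.status = self.status, new_status
        for notify in self.listeners:
            notify(self, old, new_status)
```

A listener here stands in for whatever a transition triggers, such as a notification to a stakeholder group or a dashboard refresh. Rejecting out-of-order transitions in one place keeps every consumer of the status field working from the same lifecycle rules.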
Security & Reliability
- Role-based access controls with separate permission levels for operations staff, communications teams, and management viewers
- System reliability prioritized given its role in emergency response — health monitoring, automatic recovery, and alerting configured at the infrastructure level
- Deployed within a secured Kubernetes namespace with defined resource limits
- Audit trail for all disruption schedule changes and status updates — critical for post-event review and accountability
- Data integrity safeguards ensuring scheduling and monitoring records remain consistent under concurrent access
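One common way to combine an audit trail with consistency under concurrent access is optimistic locking: each record carries a version number, and an update is rejected if it was based on a stale version. The source does not say which mechanism iERP uses, so the sketch below is an assumption for illustration only. All names (`ScheduleRecord`, `AuditEntry`, `StaleUpdateError`) are hypothetical, and the record lives in memory rather than in a database.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


class StaleUpdateError(Exception):
    """Raised when an edit is based on an outdated copy of the record."""


@dataclass(frozen=True)
class AuditEntry:
    user: str
    field_name: str
    old_value: str
    new_value: str
    at: datetime


class ScheduleRecord:
    """Hypothetical disruption schedule with a version counter and audit log."""

    def __init__(self, start_time: str) -> None:
        self.start_time = start_time
        self.version = 1
        self.audit_log: List[AuditEntry] = []

    def update_start_time(self, user: str, new_start: str, expected_version: int) -> None:
        # Reject the write if someone else changed the record in the meantime.
        if expected_version != self.version:
            raise StaleUpdateError(
                f"expected version {expected_version}, found {self.version}"
            )
        # Record who changed what before applying the change.
        self.audit_log.append(
            AuditEntry(user, "start_time", self.start_time, new_start,
                       datetime.now(timezone.utc))
        )
        self.start_time = new_start
        self.version += 1
```

In a database-backed system the same check would typically be a conditional update on the version column, so that two operators editing the same schedule cannot silently overwrite each other, and every accepted change leaves an entry for post-event review.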
Execution
- Designed the disruption lifecycle model and system architecture
- Developed the iERP system using Laravel and Livewire with reactive monitoring dashboards
- Built the scheduling engine for planned disruption management with regional impact mapping
- Deployed on Alibaba Cloud Kubernetes infrastructure with reliability configurations
- Integrated with Air Selangor's operational workflows and notification channels
- Maintained system reliability and evolved features over a 5-year operational lifecycle
Outcome
- Served as Air Selangor's primary emergency response planning and monitoring tool for 5 years
- 100% staff adoption across all operational teams
- System capacity of up to 300,000 visitors per day
- Centralized emergency response across all regions, reducing call centre dependency during crises
- Improved transparency and public trust during water disruption events
- Replaced fragmented manual coordination processes with a centralized, real-time system
- Supports SDG 13 (Climate Action) through improved crisis response coordination
Recognition
- Malaysia Technology Excellence Awards 2022