DevOps Engineer

Taipei City, Permanent
We are seeking a DevOps Engineer to design and optimize container infrastructure for AI scheduling and LLM workloads. This role focuses on building scalable, high-performance systems across compute, storage, and networking layers, with deep integration into Kubernetes and observability stacks.
  • Be part of an AI startup building a next-generation cloud platform.
  • Own and scale mission-critical backend systems with global impact.

About Our Client

The company is a cutting-edge startup focused on revolutionizing AI infrastructure through its GPU cloud platform. With operations in Singapore, Taiwan, and the Bay Area, it is backed by industry leaders and partners with top-tier GPU and cloud providers. The company fosters a high-growth, innovation-driven culture where engineers are empowered to build, scale, and make a global impact.

Job Description

  • Architect and develop container-related subcomponents for AI workloads, including runtime, storage, and networking plugins.
  • Optimize Kubernetes-based infrastructure to support heterogeneous computing environments and large-scale LLM training/inference.
  • Build observability features such as monitoring, logging, alerting, and auditing tailored to AI container systems.
  • Contribute to the development of unified platforms for containerized AI workloads, ensuring stability, scalability, and cost-efficiency.
  • Collaborate with cross-functional teams to integrate infrastructure with scheduling, orchestration, and developer-facing APIs.
  • Maintain and improve CI/CD pipelines to streamline deployment and operational workflows.
  • Automate infrastructure tasks to enhance system reliability and reduce manual overhead.
  • Ensure security and compliance across containerized environments.

The Successful Applicant

  • 3+ years of experience in Kubernetes platform development, with hands-on expertise in container runtimes (e.g., containerd, runc), storage, and networking.
  • Strong familiarity with Kubernetes internals, including device plugins and custom exporters.
  • Experience with observability tools such as Prometheus, Grafana, and the EFK stack (Elasticsearch, Fluentd, Kibana).
  • Proficiency in scripting and automation for CI/CD and infrastructure management.
  • Solid understanding of cloud platforms (AWS, GCP, Azure) and infrastructure-as-code tools.
  • Prior experience in the GPU or cloud service provider space is highly preferred.
  • Bilingual in English and Chinese, with strong communication skills for cross-regional collaboration.
  • Ability to translate complex technical requirements into scalable, production-ready solutions.

What's on Offer

  • Opportunity to shape the container infrastructure powering the future of AI.
  • Work with cutting-edge technologies in a high-growth, high-ownership environment.
  • Competitive compensation, equity, and career growth in a global startup.
  • Collaborate with world-class engineers, partners, and customers in the AI and cloud ecosystem.

Contact
Nick Wei
Quote job ref
JN-092025-6833966
Phone number
+886 2 8729 8222

Job summary

Job function
IT
Specialisation
IT Development
Area of specialisation
Technology & Telecoms
Location
Taipei City
Contract Type
Permanent

Diversity & Inclusion at Michael Page

We don't just accept difference; we celebrate it. We encourage applicants from all backgrounds to apply for this role and are committed to building inclusive, diverse workplaces where everyone can thrive. If you require any support or reasonable adjustments during the recruitment process, please let us know.