HPC Engineer - InfiniBand
NexGen Cloud is a rapidly growing IaaS company focused on providing innovative cloud solutions and infrastructure services. Our GPU cloud infrastructure solutions accelerate development in industries such as Artificial Intelligence & Machine Learning, VFX & Rendering, Data Science & IoT, and Computer-Aided Engineering & MDO.
We are dedicated to helping our clients navigate the complexities of the digital world and achieve success through cutting-edge, scalable, secure and affordable solutions.
At the company's heart stands a group of very talented, experienced, and motivated individuals who want to make a positive change and a lasting impact on the tech world.
Position Summary:
We are looking for an experienced HPC Engineer with proven expertise in deploying large-scale GPU infrastructures, particularly with InfiniBand technologies. The ideal candidate will have a strong background in HPC environments and a track record of successful implementations for major industry providers.
Key Responsibilities:
- Design, deploy, and manage large-scale GPU infrastructures for HPC applications.
- Optimize HPC systems for performance and scalability, with a focus on InfiniBand.
- Deploy and manage NVIDIA UFM Enterprise.
- Build management and monitoring tools that give the Operations team oversight of complex InfiniBand networks.
- Design and perform acceptance testing procedures for new cluster deployments.
- Collaborate with cross-functional teams to integrate HPC solutions into existing environments.
- Troubleshoot and resolve complex technical issues related to HPC deployments.
- Stay updated with the latest advancements in HPC technologies and best practices.
- Develop and maintain documentation for HPC infrastructure and processes.
Qualifications and Skills:
- Proven experience in deploying and managing large-scale GPU infrastructures.
- In-depth knowledge of InfiniBand networking.
- Strong understanding of HPC architectures and performance optimization techniques.
- Experience with Linux operating systems and scripting languages (e.g., Python, Bash).
- Excellent problem-solving skills and the ability to work independently and as part of a team.
- Strong communication skills and the ability to collaborate effectively with technical and non-technical stakeholders.
Good to have:
- Experience with containerization technologies (e.g., Docker, Kubernetes).
- Familiarity with cloud platforms (e.g., AWS, Azure, Google Cloud) and hybrid HPC environments.
- Certification in relevant HPC technologies or platforms.
What We Offer:
- Competitive salary.
- Opportunity to work with a diverse team of talented professionals who are passionate about technology and innovation.
- A collaborative and supportive work environment that encourages professional growth and development.
- Exposure to cutting-edge technologies and the opportunity to make a significant impact on the future of cloud computing.
We encourage applications from candidates of all backgrounds and experiences. Our commitment to diversity and inclusion drives our success as a company and reflects our dedication to fostering a diverse and innovative workforce.
Join the NexGen Cloud team, where innovation, collaboration, and growth are at the heart of everything we do. If you are a passionate, talented, and motivated individual looking to make a difference, apply now!
- Department: Tech
- Locations: Remote
- Remote status: Fully Remote
Workplace
Your work at NexGen Cloud will have a real impact. We're not about mundane tasks. We're about tackling meaningful projects that make a difference. Whether it's developing innovative solutions for our clients, contributing to research, or driving digital transformation, your work will be purposeful and meaningful.