Audvik Labs

BACKGROUND

DataLab, a leading Data Science and Modeling platform, empowers Explorers and Builders to create cutting-edge models and tools. Business end users rely on these tools for advanced R&D work. To further enhance the platform’s capabilities, the HPC and Extended Compute bare metal services were upgraded to the latest packages and versions, alongside architectural modifications.

PROBLEM STATEMENT

After the upgrade, certain software, such as Turbomole, showed degraded performance on HPC Workers because of throttling. In addition, although the Extended Compute service handled user submissions effectively, app installations confused admins because they required unnecessary user switches. The goal was to resolve these issues while ensuring optimal performance, cost-efficiency, and a seamless user experience.

SOLUTION

  • The solution involved upgrading and enhancing the HPC and Extended Compute bare metal services. Specific measures included selecting Azure File Share tiers based on bandwidth, IOPS, and throttling limits; optimizing storage costs; integrating OneDrive; enabling Condor_ssh on nodes for job monitoring; and using HTCondor to start jobs on spot instance machines to reduce costs.
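The tier selection described above can be illustrated with a minimal sketch. It assumes the selection rule is "cheapest tier whose IOPS and throughput limits cover the workload with some headroom", which is one plausible reading and not necessarily the procedure the project followed. The names `ShareTier`, `pick_tier`, and `EXAMPLE_TIERS` are hypothetical, and every figure is a placeholder, not a real Azure limit or price.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ShareTier:
    """Hypothetical description of a file share tier."""
    name: str
    max_iops: int                 # IOPS ceiling before throttling
    max_throughput_mibps: int     # bandwidth ceiling before throttling
    cost_per_gib_month: float     # relative storage cost


def pick_tier(
    tiers: Sequence[ShareTier],
    required_iops: float,
    required_throughput_mibps: float,
    headroom: float = 1.2,
) -> Optional[ShareTier]:
    """Return the cheapest tier that covers the workload plus headroom.

    Returns None when no tier is large enough.
    """
    candidates = [
        t for t in tiers
        if t.max_iops >= required_iops * headroom
        and t.max_throughput_mibps >= required_throughput_mibps * headroom
    ]
    return min(candidates, key=lambda t: t.cost_per_gib_month, default=None)


# Placeholder figures for illustration only (assumption, not Azure data).
EXAMPLE_TIERS = [
    ShareTier("tier-a", max_iops=1_000, max_throughput_mibps=60, cost_per_gib_month=0.02),
    ShareTier("tier-b", max_iops=10_000, max_throughput_mibps=300, cost_per_gib_month=0.06),
    ShareTier("tier-c", max_iops=100_000, max_throughput_mibps=1_000, cost_per_gib_month=0.16),
]

if __name__ == "__main__":
    choice = pick_tier(EXAMPLE_TIERS, required_iops=5_000, required_throughput_mibps=100)
    print(choice.name if choice else "no tier fits")
```

In practice the workload figures would come from measuring the affected application on the HPC Workers, and the tier limits from the provider's published targets.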

RESULT

  1. Optimized Storage: Implementation of cost-effective long-term storage solutions.
  2. User Experience Improvement: Integration of OneDrive and direct conversion of “mduser” to user accounts, eliminating the need for unnecessary user switches during app installations.
  3. Performance Enhancement: Condor_ssh enabled for efficient job monitoring and HTCondor utilized to start jobs on spot instance machines, significantly reducing costs.

AT A GLANCE

Objectives

  1. Enhance the performance of HPC workers, particularly addressing the throttling effect on applications like Turbomole.
  2. Streamline the user experience on Extended Compute by eliminating the need for admin user switches during app installations.

Achievements

The project delivered enhanced performance, a streamlined user experience, and optimized costs for the DataLab HPC and Extended Compute services.

Tech Stack

  • Kubernetes
  • Jupyter
  • PyCharm