Industrializing AI Software Development


Asim Razzaq, CEO of Yotascale and former Head of Platform Engineering at PayPal.

Large language models (LLMs) are ushering in a revolutionary era with their remarkable capabilities. From enhancing everyday applications to transforming complex systems, generative AI is becoming an integral part of our lives.

However, the surge in demand for AI-powered solutions exposes a critical challenge: the scarcity of the computational resources required to meet the growing appetite for language- and voice-based interfaces. This scarcity creates a pressing need for cost-efficient platforms that can support the development and deployment of LLMs.

Industrializing AI software development means transforming how AI systems are developed, deployed and maintained, moving from a research-driven, ad hoc approach to a structured, systematic and scalable industrial process. By focusing on cloud cost optimization and platform engineering, businesses can foster growth, profitability and innovation in the field of AI.

The Challenge Of Compute Demand

According to industry experts, demand for computing resources outstrips supply by a factor of 10. This scarcity is a significant determinant of success for AI companies because access to cost-effective computing is crucial. Surprisingly, some companies have allocated more than 80% of their total capital raised to computing resources alone, as Andreessen Horowitz reports in its publication "Navigating the High Cost of AI Compute." This escalating cost necessitates a more strategic approach both to managing cloud-based computational expenses and to how we industrialize the development of AI.

Platform Engineering And Cost Efficiency

Platform engineering, a now-prevalent software engineering discipline, focuses on optimizing costs while delivering the advanced functionality needed to build modern digital applications. Distributed clusters handle data pipelines and input/output (I/O) processes, supporting the expansion of neural networks. To manage workload costs effectively, businesses prioritize access to high-value resources, such as graphics processing units (GPUs), for their most critical applications. By incorporating cost efficiency into the platform engineering that underpins LLMs, businesses can establish a virtuous cycle that drives growth and profitability.
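
To make the idea of workload prioritization more concrete, here is a minimal Python sketch. It is not tied to any particular scheduler or cloud provider; the workload names, priorities and hourly GPU rate are hypothetical. It simply grants a limited pool of GPUs to the most critical workloads first and estimates the resulting daily spend.

```python
from dataclasses import dataclass

# Hypothetical hourly GPU rate; real rates vary by provider and instance type.
GPU_HOURLY_RATE = 2.50

@dataclass
class Workload:
    name: str
    priority: int        # lower number = more critical
    gpus_requested: int
    hours_per_day: float

def allocate_gpus(workloads, gpu_capacity):
    """Grant scarce GPUs to the most critical workloads first."""
    granted, daily_cost = [], 0.0
    for w in sorted(workloads, key=lambda w: w.priority):
        if w.gpus_requested <= gpu_capacity:
            gpu_capacity -= w.gpus_requested
            granted.append(w.name)
            daily_cost += w.gpus_requested * w.hours_per_day * GPU_HOURLY_RATE
    return granted, daily_cost

if __name__ == "__main__":
    demo = [
        Workload("fraud-inference", priority=1, gpus_requested=8, hours_per_day=24),
        Workload("nightly-finetune", priority=2, gpus_requested=16, hours_per_day=6),
        Workload("research-experiments", priority=3, gpus_requested=32, hours_per_day=12),
    ]
    granted, cost = allocate_gpus(demo, gpu_capacity=32)
    print(f"Granted: {granted}; estimated daily GPU spend: ${cost:,.2f}")
```

In a real platform, this kind of logic would typically live in a cluster scheduler or admission controller rather than in application code, but the principle is the same: scarce, expensive capacity flows to the workloads that matter most.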

Leading digital companies like Netflix and Uber have created platform engineering teams that build scalable, efficient software infrastructure for delivering their services. Netflix's platform engineering team, for instance, has built open-source tools such as Spinnaker and Nebula that have streamlined its deployment processes and helped it manage services more efficiently. Uber's engineering team created a platform called Michelangelo that manages the deployment, serving and monitoring of its machine learning (ML) models. By handling these aspects centrally, Michelangelo has reportedly cut the time to deploy ML models from months to days and delivered significant cost savings.

The Benefits Of Cost Management

Traditionally, cost management has been deprioritized during periods of innovation in favor of engineering agility and enablement. With the advent of AI, however, those priorities can no longer serve as an excuse for neglecting cost management. In an era of constrained cloud computing resources, efficiency becomes a mission-critical element of AI and ML development and operations. By actively managing costs, businesses can navigate difficult financial periods and position themselves for sustainable growth. Recovered savings, in turn, can fund additional innovations that might otherwise have been priced out of the market.

Economic Metrics For Software Industrialization

At the business level, cost management aligns with familiar economic metrics such as gross profit margins, capital preservation and cost attribution by product, instance and operation. At the product or service level, however, new economic metrics specific to AI and ML emerge: rightsizing ModelOps, prioritizing workloads across algorithm components and optimizing inference operations. By tracking these metrics, businesses can make informed decisions that drive efficiency and cost-effectiveness in their AI initiatives.
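
To illustrate the arithmetic behind such unit economics, the short Python sketch below computes two possible product-level metrics, cost per 1,000 inferences and gross margin, using entirely made-up numbers.

```python
# Illustrative unit-economics calculation for a single AI feature.
# All figures are hypothetical and exist only to show the arithmetic.

monthly_revenue = 250_000.00        # revenue attributed to the feature
monthly_gpu_cost = 90_000.00        # inference and training compute
monthly_other_cogs = 30_000.00      # storage, networking, third-party APIs
monthly_inferences = 60_000_000     # requests served

cost_per_1k_inferences = (monthly_gpu_cost + monthly_other_cogs) / (monthly_inferences / 1_000)
gross_margin = (monthly_revenue - monthly_gpu_cost - monthly_other_cogs) / monthly_revenue

print(f"Cost per 1,000 inferences: ${cost_per_1k_inferences:.2f}")
print(f"Gross margin: {gross_margin:.1%}")
```

Tracked per product and per model, figures like these show whether optimization work, such as rightsizing instances or batching inference, is actually improving margins.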

Industrializing AI Software Development

In the new era of AI development, site reliability engineering (SRE), finance and engineering teams will play a pivotal role in driving AI industrialization. SRE teams will ensure the reliability and performance of AI systems, finance teams will focus on capital allocation and cost optimization, and engineering teams will be responsible for building platforms that balance cost efficiency and advanced functionality.

By aligning these teams around cost management, businesses can build AI-based systems that drive innovation, growth and profitability. This new approach to AI industrialization, driven by SRE, finance and engineering teams, has the potential to unlock unprecedented innovation and to establish a solid foundation for reliably producing high-quality, efficient and sustainable AI systems at scale.

