Dan Bin responds to investors' "doubts": DeepSeek's arrival will increase global demand for computing power, not reduce it
On February 12, Orient Harbour published an article responding to investor inquiries. Dan Bin's view is that DeepSeek's achievements will increase, not weaken, global demand for AI computing power. The market's biggest misconception is to pit algorithms, computing power, and data against one another; in reality the three are complementary. AI applications in both China and the US will throw up all kinds of investment opportunities, while the business models of large-model companies will keep being challenged: only by staying ahead in model innovation can they hold on to a large user base and pricing power and recoup their heavy up-front exploration costs, and that is getting harder and harder.

Around the turn of the year, DeepSeek, a team incubated by a Chinese quantitative fund, released the V3 foundation model and the R1 reasoning model in quick succession, stunning the world with performance that rivals OpenAI's strongest models at a far lower inference cost. Orient Harbour received inquiries from many investors; the three most common questions were:

1) Chinese teams, despite constrained computing power, have developed world-leading AI models. Does this mean future AI progress no longer requires computing power?

2) The DeepSeek team optimized its use of GPUs by editing the PTX instruction set. Does this mean the CUDA moat can be bypassed, so that domestic chips can be used freely in the future?

3) What investment opportunities and risks will the falling cost and democratization of Chinese models bring?

On the first question, Orient Harbour's view is that DeepSeek's achievements will increase, not decrease, global demand for AI computing power.

First, the market's biggest misconception is to set algorithms, computing power, and data against one another, as if algorithmic progress competes with or substitutes for computing power and data. In reality the three are complementary. Over seventy years of AI history, progress has required all three, and whenever one was blocked, AI stagnated: the first AI wave was halted by flawed algorithms, the second by insufficient computing power. The third wave has leapt forward in the internet era precisely because algorithms, computing power, and big data all advanced together. By the same token, progress in any one of the three raises the value of the other two, much as, in a family, the father's career success creates more opportunities for the children's development and his wife's business ventures.

Suppose that when algorithms were inefficient, one chip could serve 10 users in a given use case, and that with improved algorithms the same chip can now serve 100. If the chip's price has not risen, its value has increased tenfold, not fallen. And when a good's value rises tenfold while its price stays flat, demand for it must rise; that is elementary economics.
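To spell out the arithmetic (a minimal sketch; $P$ is the chip's price and $v$ the value delivered per user, notation we introduce here):

$$\text{cost per user: }\frac{P}{10}\longrightarrow\frac{P}{100},\qquad\text{value per chip: }10v\longrightarrow 100v.$$

With $P$ unchanged, the chip's value-to-price ratio rises tenfold, and for any ordinary demand curve a tenfold rise in delivered value at a flat price pushes quantity demanded up, not down.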
Why does the market wrongly pit algorithms against computing power? Perhaps because of the current rivalry between China and the US. When Chinese model companies achieve algorithmic and engineering breakthroughs under constrained computing resources, market psychology readily projects "China versus the US" onto "algorithms versus computing power," and with a dash of "mysterious Eastern power," Wall Street easily amplifies the surprise.

Second, driving down the cost of mature AI models and democratizing them has been the dominant trend of the past two years. DeepSeek's new-year "gift package" of cheap, open models arrived in the role of a follower; beyond the emotional jolt of being "from China" and "open source," it fits squarely into this trend, on the necessary road to universal adoption. But cutting the cost of mature models and exploring frontier models are two different things. Aspiring to lead in the era of AI models demands computing power and resources that are anything but negligible, which is precisely the ambition of many giants besides OpenAI.

Technology generally develops along a path of "innovate, follow, cut costs." Frontier explorers spend enormous amounts of money and time on experimentation before finding an effective technical route and commercializing it; then a wave of followers replicates the product along the explorer's lines and optimizes the engineering further. Those cost optimizations in turn flow back to the explorer, and both sides benefit. The pattern holds in familiar fields such as innovator versus generic drugs, Tesla versus Chinese electric-vehicle makers, and TSMC versus other foundries, and it holds for large models too. In most large-model capabilities today (chatbots, real-time multimodal models, logical-reasoning models, and so on), OpenAI temporarily plays the explorer, followed by North America's other four majors (Gemini, Claude, xAI, Llama); close behind the North Americans come China's internet giants (ByteDance's Doubao, Alibaba's Qwen, Baidu's Ernie, Tencent's Hunyuan) and a cluster of model startups (DeepSeek, Zhipu's GLM, MiniMax's Hailuo, Moonshot's Kimi), while outside China and the US there are few followers at all.

The cost curves on the two race tracks OpenAI has opened over the past two years, "GPT-4" and "o1," tell the story well: since GPT-4 launched in April 2023, followers have cut the cost of models of equal performance by a factor of 1,000 in a year and a half, three orders of magnitude; since the o1 series launched in September 2024, the follower DeepSeek R1 has cut costs 27-fold in three months, more than an order of magnitude, and the follower Gemini 2.0 Flash Thinking has cut them 100-fold over the same period, two orders of magnitude. That is why we call democratization and cost reduction the biggest trend of the AI era so far, a trend DeepSeek has not escaped. Yet people remain so absorbed in the shock of DeepSeek that even Google's more dramatic cost reduction goes undiscussed.
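Translating the letter's own figures into monthly rates (our arithmetic): on the GPT-4 track, a 1,000-fold cut over 18 months means

$$1000^{1/18}\approx 1.47,$$

i.e. same-performance costs fell by roughly a third every month for a year and a half; on the o1 track, DeepSeek R1's 27-fold cut over three months means costs fell by

$$27^{1/3}=3$$

per month, and Gemini 2.0 Flash Thinking's 100-fold cut by

$$100^{1/3}\approx 4.6$$

per month.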
How can followers cut costs by several orders of magnitude relative to explorers? We touched on this above; the methods are many, and the published walk-throughs of DeepSeek's technical reports cover them in detail, so we will not belabor them. Beyond engineering innovation, data distillation, and the steady decline of compute prices over time, the biggest gap between explorer and follower is the cost of exploration itself, just as the cost gap between innovator drugs and generics lies in experimentation and clinical trials. DeepSeek is no different from the American followers in this respect: to stand at the frontier of the era rather than merely follow, the price it would have to pay is many times what it pays now.

Moreover, as AI costs fall sharply, the inference demand created by mass adoption of AI applications is exactly where computing power is needed most. In our annual reflections we compared the inference costs of the o1 model: at an output price of $55 per million tokens, using reasoning models in agent applications was all but impossible. Yet in under a month, competitors' engineering optimizations cut reasoning-model costs by a factor of 100, and the long-awaited agent application ecosystem is approaching fast.

DeepSeek has also popularized a concept: the Jevons paradox, the economic phenomenon in which improving the efficiency of a resource's use increases rather than decreases its total consumption. It was first observed in nineteenth-century coal consumption: when Watt's improved steam engine raised the efficiency of coal use (coal consumption per unit of power fell by 75%), coal-fired steam engines spread through factories, railways, and ships, accelerating total coal consumption and pushing coal prices up. The same happens when cars become more fuel-efficient (fuel burned per kilometer falls, so mileage and total fuel consumption rise sharply) and when LEDs save energy (lights burn longer and are installed in more places, so total energy use rises). Before a technology is fully diffused, a fall in unit resource consumption tends to drive total resource consumption up. The same can happen with AI models, for the AI era is only beginning.

Consider "per-capita computing power" again. If AI is destined to penetrate hundreds of industries and touch the world's 8 billion people, then at today's global deployment of roughly 4,500 exaFLOPS of AI computing, each person gets about 0.6 TOPS, which leaves enormous headroom: the chip in a single autonomous car needs more than 500 TOPS, and the computing power of Tesla's latest FSD chip, AI5, is estimated above 1,500 TOPS. Provided the efficiency of computing keeps improving substantially, AI compute consumption still has vast room to grow. Indeed, since DeepSeek's release we have watched rental prices in the spot market for computing power (the small share of capacity outside long-term contracts) climb rapidly into a shortage.
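The per-capita figure checks out on the back of an envelope (our arithmetic, treating FLOPS and TOPS interchangeably, as above):

$$\frac{4{,}500\ \text{exaFLOPS}}{8\times10^{9}\ \text{people}}=\frac{4.5\times10^{21}}{8\times10^{9}}\approx 5.6\times10^{11}\ \text{ops per second}\approx 0.6\ \text{TOPS per person},$$

so a single 500-TOPS autonomous-driving chip already represents nearly a thousand times today's per-capita allocation.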
Many AI application companies have begun testing DeepSeek's model, adding to the compute shortage, and DeepSeek's own site has repeatedly crashed or refused connections as its user count surged to 40 million (ByteDance's Doubao, by comparison, stands at 60 million). Moreover, in this month's earnings reports, Microsoft, Meta, Google, and Amazon all raised their 2025 capital expenditure on AI equipment in preparation for the coming market in inference applications.

On the second question, Orient Harbour's view is that CUDA has not been bypassed; if anything, its moat has been reinforced. DeepSeek's V3 paper describes how, to squeeze more efficiency out of NVIDIA chips, the team was not content with writing high-level CUDA and instead edited the low-level PTX instruction layer directly, modifying how tasks were allocated to the stream processors of the H800 chip and thereby improving the communication efficiency and stability of the whole interconnect to a degree. Many readers saw this and concluded that DeepSeek did not use CUDA software but reprogrammed the GPU in PTX assembly, and therefore that the team could bypass CUDA and replicate its training on other vendors' chips. That is a serious misunderstanding.

First, what is PTX? NVIDIA's chips serve a wide range of top-level applications: game graphics, autonomous driving, large language models, scientific simulation, and more. Accelerating each domain on a GPU requires a corresponding software library, such as OptiX for ray tracing in games or TensorRT-LLM for accelerating large language models. Meanwhile, the low-level hardware design of NVIDIA's chips has evolved from the Pascal and Volta architectures of the past to the now-familiar Ampere, Hopper, and Blackwell generations, with continual upgrades in process node, numeric precision, instruction-set complexity, and more. With software and hardware both iterating constantly, compatibility becomes a problem: developers worry whether the software they design today will still run on future chip architectures. NVIDIA's answer is a dedicated "intermediate representation," PTX, that connects software to hardware. However software and hardware change, code need only be translated through PTX to target different GPU architectures and generate the corresponding machine code. It is like trade between China and Europe, where merchants speak many different languages: if everyone routes through one common tongue, no Chinese merchant needs to learn every European language. PTX plays this role of a "universal translation layer" in computing: the high-level language of CUDA software is compiled into the PTX intermediate representation, which is then lowered into SASS, the native instruction language NVIDIA GPUs actually execute (this last step is proprietary).
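To make the layering concrete, here is a minimal sketch built around a hypothetical toy kernel (our example, not DeepSeek's code). The CUDA C++ below is what a developer ordinarily writes; nvcc lowers it to PTX, and NVIDIA's assembler ptxas lowers that PTX to architecture-specific SASS, the proprietary final step just described:

```cuda
// toy_scale.cu -- a toy kernel used only to illustrate the toolchain layers.
//
// The two lowering steps (real nvcc/ptxas invocations):
//   nvcc -ptx toy_scale.cu -o toy_scale.ptx        # CUDA C++ -> PTX (open IR)
//   ptxas -arch=sm_90 toy_scale.ptx -o toy.cubin   # PTX -> SASS (closed step)
//
// The same PTX file can be re-lowered for Ampere, Hopper, Blackwell, and so
// on, but only NVIDIA's backend can perform that final step -- which is why
// PTX cannot simply be retargeted at other vendors' chips.
__global__ void scale(float* data, float k, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) data[i] *= k;
}
```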
To help CUDA developers adapt to GPU hardware, NVIDIA exposes PTX for editing: developers can write CUDA code and also tune the PTX layer directly to optimize how their code executes on different GPU architectures. Picture a CEO (the CUDA code) handing tasks to a sales director (PTX), who breaks them down and assigns them to individual salespeople (the SM stream processors); if the CEO thinks the director's allocation is inefficient, he can step in and reassign the work himself to improve parallel efficiency. DeepSeek's use of PTX (Parallel Thread Execution) to optimize task execution is therefore something the "editability" of NVIDIA's architecture explicitly permits. NVIDIA, in turn, often folds developers' innovative PTX-level engineering back into its official CUDA operators, one of the ways the CUDA ecosystem is fed by its own users. The chips of AMD, Huawei, and Cambricon also have an intermediate representation layer (IR code), but their IR is not open to this kind of editing.

With these principles understood, it is clear that DeepSeek's PTX-level optimization of hardware task execution does not bypass CUDA; it strengthens and nourishes the CUDA ecosystem. First, PTX is part of CUDA. CUDA is not merely software; it encompasses PTX and the underlying hardware architecture, and the name itself stands for "Compute Unified Device Architecture." It is precisely this tightly coupled hardware-software design that lets CUDA stay efficient, compatible, and optimizable through rapid GPU iteration; PTX is in essence an intermediate representation (IR), just another expression of CUDA code. Second, PTX can be parsed and executed only by NVIDIA GPUs. Editing PTX instructions lets users develop and optimize at a lower level within the CUDA ecosystem, exploiting NVIDIA's GPU hardware more efficiently, not bypassing or escaping its architectural boundaries. The PTX instruction set is designed specifically for NVIDIA GPUs; it does not fit other vendors' GPUs or computing architectures and cannot be ported to non-NVIDIA chips. Third, DeepSeek can edit PTX only because NVIDIA has opened instruction-level PTX optimization to developers, whereas the intermediate representation layers of other chips (Huawei's Ascend, AMD's GPUs, Google's TPUs) are far less open, and developers generally cannot edit their low-level instruction sets directly.

In short, to bypass CUDA completely there are two main paths: redesign, at the high-level-language layer, an entire suite of GPU acceleration libraries and development frameworks spanning many industries, which demands vast time, money, and ecosystem support; or compile CUDA code into an IR other than PTX to fit other vendors' GPU hardware, at the price of compatibility and optimization. AMD's HIP converter for migrating CUDA code to AMD GPUs, for instance, still suffers performance loss and adaptation costs, much like running Windows on an Apple computer: technically feasible, but performance, compatibility, and experience usually trail the native environment.
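And the "CEO stepping in" mechanism itself, again as a minimal hypothetical sketch rather than anything from DeepSeek's codebase: CUDA exposes inline PTX through the asm statement, so a developer can hand-place individual PTX instructions without ever leaving the CUDA toolchain, which is exactly why the technique feeds the CUDA ecosystem instead of bypassing it.

```cuda
#include <cstdio>

// Hypothetical kernel: adds 1 to each element, but performs the addition via
// a hand-written PTX instruction instead of letting nvcc choose one. This is
// the sanctioned "editability" of the PTX layer discussed above.
__global__ void add_one(const int* in, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int y;
        // Inline PTX: 32-bit integer add. nvcc passes this line through to
        // the PTX output verbatim; only NVIDIA's ptxas can lower it to SASS,
        // so it cannot run on a non-NVIDIA chip.
        asm volatile("add.s32 %0, %1, 1;" : "=r"(y) : "r"(in[i]));
        out[i] = y;
    }
}

int main() {
    const int n = 8;
    int h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    add_one<<<1, n>>>(d_in, d_out, n);                    // launch 8 threads
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) printf("%d ", h_out[i]);  // prints 1 ... 8
    printf("\n");
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```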
For now, there is hardly a better alternative.

On the third question, Orient Harbour's view is that AI applications in China and the US alike will create all kinds of investment opportunities, while the business models of large-model companies will continue to be challenged.

In a single month, DeepSeek single-handedly ran an "AI literacy campaign" for the entire country, while matching or even surpassing most American models in capability and inference cost. Its more important contribution, though, is to have demonstrated an efficient recipe: take a large model that was trained with reinforcement learning and already possesses reasoning ability, use it for distillation by generating sample data containing chains of thought, and fine-tune small models on that data with direct supervision. Compared with applying reinforcement learning to small models directly, this reproduces the large model's reasoning ability far more effectively. After the R1 release, companies and universities worldwide rushed to replicate the approach, fine-tuning small models on chain-of-thought data, so reasoning ability spread rapidly beyond the DeepSeek ecosystem and the democratization of reasoning models suddenly accelerated. (A schematic of this distillation objective appears at the end of this section.)

The AI application opportunities we see in the United States will therefore also be widely realized in the Chinese market. The one caveat is the compute gap between China and the US, which may keep widening as compute controls tighten, for instance through a ban on NVIDIA's H20 chip. Models like DeepSeek's have already been adapted to domestic chips, but domestic chips still fall short in architecture, software acceleration libraries, and cluster capability, which may degrade the quality of inference services behind AI products. As more users run more kinds of AI applications at the same time, inference delays and "server busy" errors may become commonplace.

Shortly after the release of R1, OpenAI shipped the o3 model on schedule and opened a free trial. o3 is a qualitative leap over o1, and OpenAI keeps the "leader's" crown for now. But in the game of explorers and followers, if the explorer's pace of innovation cannot outrun the followers' pace of cost-cutting replication, the explorer's up-front costs will never be recovered and its business model cannot close the loop. If followers are blocked by patent barriers or network effects, or if the explorer keeps innovating and stays ahead, the explorer can hold premium pricing on frontier products while cutting prices on previous-generation products to squeeze followers as they catch up, keeping the business model sound; that is the strategy TSMC runs with its process nodes. In large models, however, with neither network effects nor patent protection, OpenAI and the other would-be leaders can keep their edge only by continuously shipping more advanced models, sustaining a large user base and pricing power to offset their heavy exploration costs. That is getting harder and harder.

These are Orient Harbour's main views on the three questions. 2025 is destined to be a year of high market volatility.
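Returning to the distillation recipe described above: schematically (our notation, not DeepSeek's), the RL-trained teacher $\pi_{\mathrm{RL}}$ (for example, R1) samples chain-of-thought traces, and the small student $\pi_{\theta}$ is fine-tuned on them with the ordinary supervised next-token loss:

$$\mathcal{D}=\bigl\{(x,y):y\sim\pi_{\mathrm{RL}}(\cdot\mid x)\bigr\},\qquad \mathcal{L}(\theta)=-\,\mathbb{E}_{(x,y)\sim\mathcal{D}}\sum_{t}\log\pi_{\theta}\bigl(y_{t}\mid x,y_{<t}\bigr),$$

where each $y$ contains both the chain of thought and the final answer; no reinforcement learning is applied to the student at all.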
Even after working through all these details, however, we must return to the main theme of investing: in the AI era, the wheels of the age are visibly accelerating. At the same time, we should recognize that amid the high volatility, the US stock market in 2025 is expected to see over $2 trillion of capital inflows, lending support to valuations and stability. Corporate buybacks are expected to reach $1 trillion, bolstering investor confidence by shrinking share counts and lifting earnings per share (EPS), especially as the tech giants keep enlarging their buyback programs. Total dividends of S&P 500 companies are expected to reach $600 billion, and their stability and predictability attract long-term investors, above all pension and 401(k) accounts. In addition, pension and other long-term investment accounts are expected to contribute over $400 billion of inflows, which typically go to passively managed vehicles such as S&P 500 ETFs, providing the market with steady liquidity.
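The components cited above do sum to the headline figure:

$$\underbrace{\$1.0\,\text{T}}_{\text{buybacks}}+\underbrace{\$0.6\,\text{T}}_{\text{dividends}}+\underbrace{\$0.4\,\text{T}}_{\text{pension inflows}}=\$2.0\,\text{T}.$$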