
Architecture of a high frequency trading system


Algorithmic trading has been at the centre stage of the trading world for more than a decade now. As a result, it has become a highly competitive market that is heavily dependent on technology, and the basic architecture of automated trading systems that execute algorithmic strategies has undergone major changes over the past decade and continues to do so. For firms, especially those using high frequency trading systems, it has become a necessity to innovate on technology in order to compete in the world of algorithmic trading, making the field a hotbed for advances in computer and network technologies. In this post we demystify the architecture behind automated trading systems, compare the new architecture with the traditional trading architecture, and look at some of the major components behind these systems.

Traditional Architecture

Any trading system, conceptually, is nothing more than a computational block that interacts with the exchange on two different streams: it receives market data, and it sends order requests and receives replies from the exchange. The market data that is received typically informs the system of the latest order book. It might contain some additional information like the volume traded so far, and the last traded price and quantity for a scrip. However, to make a decision on the data, the trader might need to look at old values or derive certain parameters from history. To cater to that, a conventional system would have a historical database to store the market data and tools to use that database. Analysis would also involve a study of the past trades by the trader, hence another database for storing the trading decisions. Last, but not least, there is a GUI interface for the trader to view all this information on the screen.

The entire trading system can be broken down into:
- The exchange(s): the external world
- The server: the market data receiver, the store of market data, and the store of orders generated by the user
- The application: taking inputs from the user (including the trading decisions), an interface for viewing the information (including the data and orders), and an order manager sending orders to the exchange

New Architecture

The traditional architecture could not scale up to the needs and demands of automated trading with DMA. The latency between the origin of an event and the order generation went beyond the dimension of human control and entered the realm of milliseconds and microseconds. So the tools to handle market data and its analysis needed to adapt accordingly. Order management also needs to be more robust and capable of handling many more orders per second. Since the time frame is so small compared to human reaction time, risk management also needs to handle orders in real time and in a completely automated way. For example, even if the generation time for an order is 1 millisecond (which is a lot compared to the latencies we see today), the system is still capable of making a thousand trading decisions in a single second. Each of those trading decisions then needs to go through risk management within the same second to reach the exchange. This is just a problem of complexity.

Since the architecture now involves automated logic, many traders can be replaced by a single automated trading system. This adds scale to the problem. Each of the logical units generates orders, and many such units running in parallel mean a very large number of orders every second. This means that the decision-making and order-sending part needs to be much faster than the market data receiver in order to match the rate of data. Hence, the level of infrastructure that this module demands has to be far superior compared to that of a traditional system (discussed in the previous section). The application layer, now, is little more than a user interface for viewing and providing parameters to the CEP (Complex Event Processing) engine, where the strategy logic resides. The problem of scaling also leads to an interesting situation.
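To make the flow concrete, here is a minimal, hypothetical sketch of the event-driven pipeline described above: market data in, strategy decision, automated pre-trade risk check, order out. The class and method names are illustrative only and do not refer to any specific platform.

```python
from dataclasses import dataclass

# Hypothetical sketch of the event-driven flow described above:
# market data event -> strategy decision (CEP) -> pre-trade risk check -> order out.

@dataclass
class MarketData:
    symbol: str
    bid: float
    ask: float
    last_price: float
    last_qty: int

@dataclass
class Order:
    symbol: str
    side: str      # "BUY" or "SELL"
    price: float
    qty: int

class Strategy:
    """Stands in for the CEP logic: turns market data events into order intents."""
    def on_market_data(self, md: MarketData):
        # Toy rule, purely illustrative: buy when the spread is unusually wide.
        if md.ask - md.bid > 0.05:
            return Order(md.symbol, "BUY", md.bid, 100)
        return None

class RiskManager:
    """Automated pre-trade checks that every order must pass within the same tick."""
    def __init__(self, max_order_qty=1_000, max_notional=1_000_000):
        self.max_order_qty = max_order_qty
        self.max_notional = max_notional

    def approve(self, order: Order) -> bool:
        return (order.qty <= self.max_order_qty and
                order.price * order.qty <= self.max_notional)

def on_event(md: MarketData, strategy: Strategy, rms: RiskManager, send_to_exchange):
    """One pass through the pipeline for a single market data event."""
    order = strategy.on_market_data(md)
    if order is not None and rms.approve(order):
        send_to_exchange(order)

if __name__ == "__main__":
    md = MarketData("ABC", bid=100.00, ask=100.10, last_price=100.05, last_qty=50)
    on_event(md, Strategy(), RiskManager(), send_to_exchange=print)
```

In a real system each of these blocks is a separate, heavily optimized module; the sketch only shows how they are wired together per event.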
Let us say different logics are being run over a single market data event, as discussed in the earlier example. However, there might be common pieces of complex calculation that need to be run for most of the logic units, for example the calculation of greeks for options. If each logic were to function independently, each unit would do the same greek calculation, unnecessarily using up processor resources. To optimize away this redundancy, complex repeated calculations are typically hived off into a separate calculation engine which provides the greeks as an input to the CEP (a small sketch of this idea appears at the end of this section).

Although the application layer is primarily a view, some of the risk checks (which are now resource-hungry operations owing to the problem of scale) can be offloaded to the application layer, especially those that concern the sanity of user inputs, like fat finger errors. The rest of the risk checks are now performed by a separate Risk Management System (RMS) within the Order Manager (OM), just before releasing an order. However, some risk checks may be particular to certain strategies and some might need to be done across all strategies. Hence the RMS itself involves a strategy-level RMS (SLRMS) and a global RMS (GRMS). It might also involve a UI to view the SLRMS and the GRMS.

With innovations come necessities. Since the new architecture was capable of scaling to many strategies per server, the need to connect to multiple destinations from a single server emerged. So the order manager came to host several adaptors to send orders to multiple destinations and receive data from multiple exchanges. Each adaptor acts as an interpreter between the protocol that is understood by the exchange and the protocol of communication within the system. Multiple exchanges mean multiple adaptors. However, to add a new exchange to the system, a new adaptor has to be designed and plugged into the architecture, since each exchange follows its own protocol, optimized for the features that the exchange provides. To avoid this hassle of adaptor addition, standard protocols have been designed. The most prominent amongst them is the FIX (Financial Information eXchange) protocol (see our post on introduction to the FIX protocol). This not only makes it manageable to connect to different destinations on the fly, but also drastically reduces the go-to-market time when it comes to connecting with a new destination. (For a detailed tutorial, see Connecting FXCM over FIX.)

The presence of standard protocols also makes it easy to integrate with third party vendors, for analytics or market data feeds. In addition, simulation becomes very easy, as receiving data from the real market and sending orders to a simulator is just a matter of using the FIX protocol to connect to the simulator instead. The simulator itself can be built in-house or procured from a third party vendor. Similarly, recorded data can simply be replayed, with the adaptors being agnostic to whether the data is being received from the live market or from a recorded data set. A sketch of this adaptor idea follows below.
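As a rough illustration of hiving off a common calculation into a shared engine, the sketch below computes Black-Scholes greeks once per update and hands the same result to every strategy that needs it. The formulas are the standard Black-Scholes ones; the class names and the toy threshold are hypothetical.

```python
import math

def _norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

class GreeksEngine:
    """Shared calculation engine: computes greeks once and serves all strategies."""

    def compute(self, spot, strike, rate, vol, t_years):
        # Standard Black-Scholes d1, then call delta, gamma and vega.
        d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / (vol * math.sqrt(t_years))
        return {
            "delta": _norm_cdf(d1),
            "gamma": _norm_pdf(d1) / (spot * vol * math.sqrt(t_years)),
            "vega": spot * _norm_pdf(d1) * math.sqrt(t_years),
        }

class OptionStrategy:
    """One of many logic units; it consumes greeks instead of recomputing them."""
    def __init__(self, name):
        self.name = name

    def on_greeks(self, greeks):
        if greeks["delta"] > 0.6:
            print(f"{self.name}: delta {greeks['delta']:.2f} above threshold, hedging")

# One engine feeds many strategies, avoiding redundant per-strategy calculations.
engine = GreeksEngine()
strategies = [OptionStrategy("strat-A"), OptionStrategy("strat-B")]
greeks = engine.compute(spot=105.0, strike=100.0, rate=0.05, vol=0.2, t_years=0.5)
for s in strategies:
    s.on_greeks(greeks)
```

The point is simply that the expensive calculation happens once per market data update, not once per strategy.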
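The adaptor idea can also be sketched in a few lines. Everything below (class names, the toy message format) is hypothetical and heavily simplified; a real adaptor would sit on top of a FIX engine or an exchange's native API. The point is that the rest of the system sees only one interface, so a live venue, a simulator and a recorded data set are interchangeable.

```python
from abc import ABC, abstractmethod
from typing import Optional

class ExchangeAdaptor(ABC):
    """Translates between the system's internal messages and one destination's protocol.
    Adding a new exchange only means writing a new adaptor; nothing else changes."""

    @abstractmethod
    def send_order(self, order: dict) -> None: ...

    @abstractmethod
    def next_market_data(self) -> Optional[dict]: ...

class ToyFixAdaptor(ExchangeAdaptor):
    """Heavily simplified stand-in for a FIX adaptor, showing only the translation step."""

    def __init__(self, transport):
        self.transport = transport  # anything with send(str) and receive() -> str

    def send_order(self, order: dict) -> None:
        # Encode the internal order dict as a simplified FIX-like string
        # (35=D new order, 55 symbol, 54 side, 38 quantity, 44 price).
        msg = f"35=D|55={order['symbol']}|54={order['side']}|38={order['qty']}|44={order['price']}"
        self.transport.send(msg)

    def next_market_data(self) -> Optional[dict]:
        raw = self.transport.receive()
        if not raw:
            return None
        fields = dict(kv.split("=", 1) for kv in raw.split("|"))
        return {"symbol": fields.get("55"), "price": float(fields.get("270", "nan"))}

class ReplayAdaptor(ExchangeAdaptor):
    """Replays a recorded data set; the rest of the system cannot tell the difference."""

    def __init__(self, recorded_ticks):
        self.ticks = iter(recorded_ticks)

    def send_order(self, order: dict) -> None:
        pass  # orders are dropped during a pure data replay

    def next_market_data(self) -> Optional[dict]:
        return next(self.ticks, None)
```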
Emergence of low latency architectures

With the building blocks of an algorithmic trading system in place, strategies could now be optimized on their ability to process huge amounts of data in real time and make quick trading decisions. But with the advent of standard communication protocols like FIX, the technology entry barrier to set up an algorithmic trading desk became lower, and hence the field became more competitive. As servers got more memory and higher clock frequencies, the focus shifted towards reducing the latency of decision making.

Over time, reducing latency became a necessity for many reasons, like:
- The strategy makes sense only in a low latency environment.
- Survival of the fittest: competitors pick you off if you are not fast enough.
To know more about latency, catch our past webinar.

The problem, however, is that latency is really an overarching term that encompasses several different delays, and quantifying all of them in one generic term does not usually make much sense. Although it is very easily understood, it is quite difficult to quantify. It therefore becomes increasingly important how the problem of reducing latency is approached. If we look at the basic life cycle:
1. A market data packet is published by the exchange.
2. The packet travels over the wire.
3. The packet arrives at a router on the server side.
4. The router forwards the packet over the network on the server side.
5. The packet arrives on the Ethernet port of the server.
6. The adaptor then parses the packet and converts it into a format internal to the algorithmic trading platform.
7. This packet now travels through the several modules of the system (CEP, tick store, etc.).
8. The CEP analyses the event and sends an order request.
9. The order request again goes through the reverse of the cycle as the market data packet.

High latency at any of these steps ensures a high latency for the entire cycle. Hence latency optimization usually starts with the first step in this cycle that is in our control, i.e. the packet's travel over the wire. The easiest thing to do here is to shorten the distance to the destination by as much as possible. Colocations are facilities provided by exchanges to host the trading server in close proximity to the exchange. The following diagram illustrates the gains that can be made by cutting the distance.

For any kind of high frequency strategy involving a single destination, colocation has become a de facto must. However, strategies that involve multiple destinations need some careful planning. Several factors, like the time taken by the destination to reply to order requests and its comparison with the ping time between the two destinations, must be considered before making such a decision. The decision may depend on the nature of the strategy as well.

Network latency is usually the first step in reducing the overall latency of an algorithmic trading system. However, there are plenty of other places where the architecture can be optimized.

Propagation latency

Propagation latency signifies the time taken to send the bits along the wire, constrained of course by the speed of light. Several optimizations have been introduced to reduce propagation latency apart from reducing the physical distance. For example, the estimated roundtrip time for an ordinary fibre route between Chicago and New York is around 14 to 15 milliseconds; Spread Networks announced latency improvements which brought the estimated roundtrip time down to roughly 13 milliseconds. Microwave communication was adopted further by firms such as Tradeworx, bringing the estimated roundtrip time to just over 8 milliseconds, against a theoretical straight-line, speed-of-light minimum of a little over 7.5 milliseconds. Continuing innovations are pushing the boundaries of science and fast reaching the theoretical limit of the speed of light. The latest developments in laser communication, earlier adopted in defence technologies, have further shaved an already thin latency by nanoseconds over short distances.
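The roundtrip figures quoted above follow from simple physics. The sketch below recomputes them; the straight-line distance and the fibre refractive index are approximate assumptions, not figures from the article.

```python
# Back-of-the-envelope propagation latency for the Chicago <-> New York route.
# The distance and refractive index below are approximate assumptions.

C = 299_792.458          # speed of light in vacuum, km/s
DISTANCE_KM = 1_145      # rough straight-line distance Chicago -> New York, km
FIBER_INDEX = 1.47       # typical refractive index of optical fibre

def roundtrip_ms(distance_km: float, speed_km_s: float) -> float:
    """Round-trip propagation time in milliseconds."""
    return 2 * distance_km / speed_km_s * 1_000

# Microwave and laser links travel close to the vacuum speed of light.
print(f"theoretical straight-line minimum : {roundtrip_ms(DISTANCE_KM, C):.1f} ms")
# Light in fibre travels at roughly c / 1.47, and real fibre routes are longer than straight lines.
print(f"straight-line fibre               : {roundtrip_ms(DISTANCE_KM, C / FIBER_INDEX):.1f} ms")
print(f"realistic fibre route (~1,300 km) : {roundtrip_ms(1_300, C / FIBER_INDEX):.1f} ms")
```

Run as-is, this prints roughly 7.6 ms for the straight-line speed-of-light minimum and about 13 ms for a realistic fibre route, which is why microwave links (travelling near c along a nearly straight path) can undercut even the best fibre.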
Network processing latency

Network processing latency signifies the latency introduced by routers, switches, etc. The next level of optimization in the architecture of an algorithmic trading system is the number of hops a packet takes to travel from point A to point B. For example, a packet could travel the same distance via two different paths, but it may have two hops on the first path versus three hops on the second. Assuming the propagation delay is the same, the routers and switches each introduce their own latency, and as a rule of thumb, the more the hops, the more latency is added. Network processing latency may also be affected by what we refer to as microbursts. Microbursts are defined as a sudden increase in the rate of data transfer which may not necessarily affect the average rate of data transfer. Since algorithmic trading systems are rule based, all such systems will react to the same event in the same way. As a result, a lot of participating systems may send orders at once, leading to a sudden flurry of data transfer between the participants and the destination, i.e. a microburst.

The following diagram represents what a microburst is. The first figure shows a 1 second view of the data transfer rate; we can see that the average rate is well below the available bandwidth of 1 Gbps. However, if we dive deeper and look at the second image (the 5 millisecond view), we see that the transfer rate has spiked above the available bandwidth several times each second. As a result, the packet buffers on the network stack, both in the network endpoints and in the routers and switches, may overflow. To avoid this, a bandwidth that is much higher than the observed average rate is usually allocated for an algorithmic trading system.

Serialization latency

Serialization latency signifies the time taken to pull the bits on and off the wire. A packet of around 1,500 bytes transmitted on a T1 line (1.544 Mbps) would produce a serialization delay of about 8 milliseconds, while the same packet on a 56 kbps modem would take over 200 milliseconds. A 1 Gbps Ethernet line reduces this latency to roughly 11 to 12 microseconds (the arithmetic is sketched in the snippet further below).

Interrupt latency

Interrupt latency signifies the latency introduced by interrupts while receiving packets on a server. It is defined as the time elapsed between when an interrupt is generated and when the source of the interrupt is serviced. When is an interrupt generated? Interrupts are signals to the processor emitted by hardware or software indicating that an event needs immediate attention. The processor responds by suspending its current activity, saving its state and handling the interrupt. Whenever a packet is received on the NIC, an interrupt is sent to handle the bits that have been loaded into the receive buffer of the NIC. The time taken to respond to this interrupt not only affects the processing of the newly arriving payload, but also the latency of the existing processes on the processor. Solarflare introduced OpenOnload, which implements a technique known as kernel bypass, where the processing of the packet is not left to the operating system kernel but handed to the userspace itself. The entire packet is directly mapped into user space by the NIC and is processed there. As a result, interrupts are completely avoided and the rate of processing each packet is accelerated. The following diagram clearly demonstrates the advantages of kernel bypass.

Application latency

Application latency signifies the time taken by the application to process. This depends on the number of packets to be processed, the processing allocated to the application logic, the complexity of the calculations involved, programming efficiency, etc. Increasing the number of processors on the system would in general reduce the application latency; the same is the case with an increased clock frequency. A lot of algorithmic trading systems take advantage of dedicating processor cores to essential elements of the application, like the strategy logic for example. This avoids the latency introduced by processes switching between cores (a minimal sketch of core pinning appears further below). Similarly, if the programming of the strategy has been done keeping in mind the cache sizes and the locality of memory access, then there will be a lot of memory cache hits, resulting in a further reduction of latency. To facilitate this, a lot of systems use very low-level programming languages to optimize the code to the specific architecture of the processors. Some firms have even gone to the extent of burning complex calculations onto hardware using Field Programmable Gate Arrays (FPGAs).
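The serialization numbers quoted earlier are straightforward arithmetic: packet size in bits divided by the link rate. A quick sketch, assuming a standard 1,500-byte Ethernet frame:

```python
# Serialization delay = packet size in bits / link bit rate.
# The 1,500-byte packet size is an assumption (a standard Ethernet MTU).

PACKET_BYTES = 1_500
PACKET_BITS = PACKET_BYTES * 8

links_bps = {
    "56 kbps modem": 56_000,
    "T1 (1.544 Mbps)": 1_544_000,
    "1 Gbps Ethernet": 1_000_000_000,
    "10 Gbps Ethernet": 10_000_000_000,
}

for name, bps in links_bps.items():
    delay_s = PACKET_BITS / bps
    # Print in the most readable unit.
    if delay_s >= 1e-3:
        print(f"{name:18s}: {delay_s * 1e3:8.2f} ms")
    else:
        print(f"{name:18s}: {delay_s * 1e6:8.2f} us")
```

This prints roughly 214 ms for the modem, about 8 ms for the T1 line and about 12 microseconds for 1 Gbps Ethernet, in line with the figures above.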
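Dedicating cores can also be sketched at the operating-system level. The snippet below pins the current process to a single core using os.sched_setaffinity, which is available on Linux; the core number is an arbitrary assumption and real deployments pick cores that have been isolated from the general scheduler.

```python
import os

# Minimal sketch (Linux-only): pin the current process to one CPU core so the
# strategy logic is not migrated between cores by the scheduler.
# Core number 2 is an arbitrary assumption; the machine must have at least 3 cores.

STRATEGY_CORE = 2

def pin_to_core(core_id: int) -> None:
    os.sched_setaffinity(0, {core_id})   # 0 = the calling process
    print(f"pinned to core(s): {os.sched_getaffinity(0)}")

if __name__ == "__main__":
    pin_to_core(STRATEGY_CORE)
    # ... run the latency-critical strategy loop here ...
```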
Levels of sophistication

With increasing complexity comes increasing cost, and the following diagram aptly illustrates this. The world of high frequency algorithmic trading has entered an era of intense competition. With each participant adopting new methods of ousting the competition, technology has progressed by leaps and bounds. Modern day algorithmic trading architectures are quite complex compared to their early-stage counterparts; accordingly, advanced systems are more expensive to build, both in terms of time and money.
