Algorithmic Trading with R (2024)

Nana Boateng

February 25, 2018

  • Loading Packages
  • Setting Up Your Workspace
  • A Simple Trading Strategy: Trend Following

Loading Packages

In this post, I will show how to use R to collect the stocks listed on loyal3, get historical data from Yahoo and then perform a simple algorithmic trading strategy. Along the way, you will learn some web scraping, a function hitting a finance API and an htmlwidget to make an interactive time series chart.

Setting Up Your Workspace

The TTR package is used to construct “Technical Trading Rules”; it can also perform more sophisticated calculations and is worth exploring. The dygraphs library is a wrapper for a fast, open source JavaScript charting library. It is one of the htmlwidgets that make R charts more dynamic and part of an HTML file instead of a static image. Lastly, the lubridate package is used for easy date manipulation.

library(rvest)
library(pbapply)
library(TTR)
library(dygraphs)
library(lubridate)
library(tidyquant)
library(timetk)
pacman::p_load(dygraphs, DT)

Data Collection

Before you can look up individual daily stock prices to build your trading algorithm, you need to collect all available stock tickers. The list of stock tickers, or symbols, can be scraped from the MarketWatch page used below. The first thing to do is declare stock.list as a URL string. Next, use read_html() so your R session will create an Internet session and collect all the HTML on the page as an XML node set. A CSS selector that starts with a dot, such as “.lk01” in the code below, matches elements by class. Pass it to html_nodes() to select only the XML data associated with those nodes. Lastly, use html_text() so the actual text values for the company names are collected.

website <- read_html("https://www.marketwatch.com/tools/industry/stocklist.asp?bcind_ind=9535&bcind_period=3mo")
table <- html_table(html_nodes(website, "table")[[4]], fill = TRUE)
stocks.symbols <- table$X2
stocks.names <- table$X3

table1 <- table[-1, -1]
colnames(table1) <- table[1, -1]
DT::datatable(table1)

stock.list <- "https://www.marketwatch.com/tools/industry/stocklist.asp?bcind_ind=9535&bcind_period=3mo"
stocks <- read_html(stock.list)
stocks.names <- html_nodes(stocks, ".lk01")
stocks.names <- html_text(stocks.names)
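
The scraping steps described above can be tried without a network connection. The following is a minimal sketch, assuming rvest is installed; the HTML snippet, the class names `company` and `symbol`, and all variable names are made up for illustration and are not the MarketWatch markup.

```r
library(rvest)

# A tiny, made-up HTML page standing in for the real stock list
page <- read_html('
  <html><body>
    <table>
      <tr><td class="company">Apple Inc.</td><td class="symbol">AAPL</td></tr>
      <tr><td class="company">Microsoft Corp.</td><td class="symbol">MSFT</td></tr>
    </table>
  </body></html>')

# Select nodes by CSS class (leading dot), then pull out their text
company_names <- html_text(html_nodes(page, ".company"))
company_symbols <- html_text(html_nodes(page, ".symbol"))

# html_table() converts a <table> node into a data frame
stock_table <- html_table(html_nodes(page, "table")[[1]])

print(company_names)
print(stock_table)
```

Selecting by class and converting the whole table are two routes to the same data; the table route keeps symbols and names aligned row by row.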

We delete the rows with no entries to reduce the number of rows in the table. First we replace the empty strings with NA, and then we remove the rows that contain these NAs with the complete.cases function.

table1[table1 == ""] <- NA
table1 <- table1[complete.cases(table1$Symbol), ]
# to keep columns with no NA:
# table1 <- table1[, colSums(complete.cases(table1)) == 0]
# table1 %>% filter(complete.cases(.))
DT::datatable(table1)
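
The two cleaning steps can be checked on a small example. This is a sketch in base R on a made-up table; the names `stock_tbl` and `cleaned` are hypothetical and only mimic the shape of the scraped data.

```r
# Made-up table with blank symbols, mimicking scraped rows without entries
stock_tbl <- data.frame(
  Symbol = c("AAPL", "", "MSFT", ""),
  Name   = c("Apple Inc.", "", "Microsoft Corp.", "Unlisted Co."),
  stringsAsFactors = FALSE
)

# Step 1: turn empty strings into NA
stock_tbl[stock_tbl == ""] <- NA

# Step 2: keep only the rows whose Symbol is present
cleaned <- stock_tbl[complete.cases(stock_tbl$Symbol), ]

print(cleaned)
```

Passing only the Symbol column to complete.cases() keeps a row even if other columns are missing; passing the whole data frame would drop any row with an NA anywhere.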


# Three-year window ending today, formatted as yyyymmdd
end.date <- Sys.Date()
start.date <- Sys.Date() - years(3)
start.date <- gsub('-', '', start.date)
end.date <- gsub('-', '', end.date)
start.date
## [1] "20150225"
end.date
## [1] "20180225"
# The symbols vector holds our tickers.
symbols <- c("SPY", "EFA", "IJS", "EEM", "AGG")
# The prices object will hold our raw price data
prices <- getSymbols(symbols, src = 'yahoo', from = "2005-01-01", auto.assign = TRUE, warnings = FALSE) %>%
  # Extract (transformed) data from a suitable OHLC object, e.g. getSymbols('IBM', src = 'yahoo'); Ad(IBM)
  map(~Ad(get(.))) %>%
  # reduce() combines from the left, reduce_right() combines from the right
  reduce(merge) %>%
  `colnames<-`(symbols)
head(prices)
##                 SPY      EFA      IJS      EEM      AGG
## 2005-01-03 92.46423 36.88009 49.51314 17.48657 66.25936
## 2005-01-04 91.33437 36.17309 48.62529 16.94819 66.19465
## 2005-01-05 90.70410 36.14990 47.72091 16.74072 66.16875
## 2005-01-06 91.16528 36.14990 48.00587 16.72934 66.21410
## 2005-01-07 91.03462 35.98765 47.38643 16.76173 66.19465
## 2005-01-10 91.46503 36.14990 47.79525 16.78273 66.16232
library(quantmod)
tickers <- c("AAPL", "MSFT", "GOOGL", "IBM", "FB")
getSymbols(tickers)
## [1] "AAPL" "MSFT" "GOOGL" "IBM" "FB"
closePrices <- do.call(merge, lapply(tickers, function(x) Cl(get(x))))

ParallelMap package

We take advantage of the parallelMap package to reduce the time it takes to read the stock data into R.

library(parallelMap)
# Option 1: start in socket mode and create 2 processes on localhost
# (socket workers would also need the price objects, e.g. via parallelExport())
# parallelStartSocket(2)
# Option 2: multicore mode; forked workers see the objects created by getSymbols()
parallelStartMulticore(cpus = 6)
f = function(x) Cl(get(x))    # define our job
y = parallelMap(f, tickers)   # like R's Map but in parallel
mapdata <- do.call(cbind, y)
parallelStop()
mapdata %>% head()
##            AAPL.Close MSFT.Close GOOGL.Close IBM.Close FB.Close
## 2007-01-03   11.97143      29.86    234.0290     97.27       NA
## 2007-01-04   12.23714      29.81    241.8719     98.31       NA
## 2007-01-05   12.15000      29.64    243.8388     97.42       NA
## 2007-01-08   12.21000      29.93    242.0320     98.90       NA
## 2007-01-09   13.22429      29.96    242.9930    100.07       NA
## 2007-01-10   13.85714      29.66    244.9750     98.89       NA
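
The same look-up-by-name job can be sketched with the base `parallel` package, which makes the socket-mode requirement explicit: each worker is a fresh R process, so the objects must be copied to it first. The objects `AAA` and `BBB`, the vector `toy_tickers` and the function `last_value` are made up for illustration; this is not the parallelMap code above.

```r
library(parallel)

# Made-up stand-ins for the objects that getSymbols() would create
AAA <- c(10, 11, 12)
BBB <- c(20, 21, 22)
toy_tickers <- c("AAA", "BBB")

# Job: look an object up by name and return its last value
last_value <- function(x) {
  v <- get(x)
  v[length(v)]
}

cl <- makeCluster(2)   # two socket workers on localhost
# Copy the objects to every worker; without this, get() fails on the workers
clusterExport(cl, varlist = toy_tickers, envir = environment())
res <- parLapply(cl, toy_tickers, last_value)
stopCluster(cl)

names(res) <- toy_tickers
print(unlist(res))
```

For a handful of small series the cost of starting workers and copying data can outweigh the gain, so timing both the serial and the parallel version is worthwhile before committing to either.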

Bioconductor

library(BiocParallel)
f = function(x) Ad(get(x))
options(MulticoreParam = quote(MulticoreParam(workers = 4)))
param <- SnowParam(workers = 2, type = "SOCK")
vec = c(tickers[1], tickers[2], tickers[3], tickers[4])
# vec = c(paste0(quote(tickers), "[", 1:length(tickers), "]", collapse = ","))
multicoreParam <- MulticoreParam(workers = 7)
bio = bplapply(tickers, f, BPPARAM = multicoreParam)
biodata <- do.call(cbind, bio)
biodata %>% head()
##            AAPL.Adjusted MSFT.Adjusted GOOGL.Adjusted IBM.Adjusted
## 2007-01-03      8.104137      22.85825       234.0290     74.69364
## 2007-01-04      8.284014      22.81997       241.8719     75.49223
## 2007-01-05      8.225021      22.68984       243.8388     74.80882
## 2007-01-08      8.265641      22.91184       242.0320     75.94528
## 2007-01-09      8.952269      22.93480       242.9930     76.84376
## 2007-01-10      9.380682      22.70515       244.9750     75.93764
##            FB.Adjusted
## 2007-01-03          NA
## 2007-01-04          NA
## 2007-01-05          NA
## 2007-01-08          NA
## 2007-01-09          NA
## 2007-01-10          NA
AdjustedPrices <- biodata
dateWindow <- c("2016-01-01", "2017-09-01")
dygraph(AdjustedPrices, main = "Value", group = "stock") %>%
  dyRebase(value = 100) %>%
  dyRangeSelector(dateWindow = dateWindow)


end <- Sys.Date()
start <- Sys.Date() - years(3)
prices <- tq_get(symbols, get = "stock.prices", from = start, to = end)
prices %>% head()
## # A tibble: 6 x 8
##   symbol date        open  high   low close    volume adjusted
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1 SPY    2015-02-25   212   212   211   212  73061700      199
## 2 SPY    2015-02-26   212   212   211   211  72697900      199
## 3 SPY    2015-02-27   211   212   211   211 108076000      198
## 4 SPY    2015-03-02   211   212   211   212  87491400      199
## 5 SPY    2015-03-03   211   212   210   211 110325800      199
## 6 SPY    2015-03-04   210   210   209   210 105665800      198
pacman::p_load(dygraphs)
library(parallel)
# Calculate the number of cores
no_cores <- detectCores() - 1
# Initiate cluster
cl <- makeCluster(no_cores)
f <- function(x) Ad(get(x))
AdjustedPrices <- do.call(merge, bplapply(vec, f, BPPARAM = multicoreParam))
# The cluster above is not used by bplapply(); release its workers
stopCluster(cl)
AdjustedPrices %>% head()
##            AAPL.Adjusted MSFT.Adjusted GOOGL.Adjusted IBM.Adjusted
## 2007-01-03      8.104137      22.85825       234.0290     74.69364
## 2007-01-04      8.284014      22.81997       241.8719     75.49223
## 2007-01-05      8.225021      22.68984       243.8388     74.80882
## 2007-01-08      8.265641      22.91184       242.0320     75.94528
## 2007-01-09      8.952269      22.93480       242.9930     76.84376
## 2007-01-10      9.380682      22.70515       244.9750     75.93764
dateWindow <- c("2016-01-01", "2017-09-01")
dygraph(AdjustedPrices, main = "Value", group = "stock") %>%
  dyRebase(value = 100) %>%
  dyRangeSelector(dateWindow = dateWindow)


library(plotly)
library(quantmod)
getSymbols("AAPL", src = 'yahoo')
## [1] "AAPL"
# basic example of ohlc charts
df <- data.frame(Date = index(AAPL), coredata(AAPL))
df <- tail(df, 30)
# custom colors
i <- list(line = list(color = '#FFD700'))
d <- list(line = list(color = '#0000ff'))
p <- df %>%
  plot_ly(x = ~Date, type = "ohlc",
          open = ~AAPL.Open, close = ~AAPL.Close,
          high = ~AAPL.High, low = ~AAPL.Low,
          increasing = i, decreasing = d)
p


library(plotly)
biodatadf <- tk_tbl(biodata, timetk_idx = TRUE) %>% rename(Date = index)
data <- biodatadf %>% filter(Date > "2014-01-11")
p <- plot_ly(data, x = ~Date, y = ~AAPL.Adjusted, name = 'AAPL', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~MSFT.Adjusted, name = 'MSFT', mode = 'lines') %>%
  add_trace(y = ~IBM.Adjusted, name = 'IBM', mode = 'lines') %>%
  add_trace(y = ~GOOGL.Adjusted, name = 'GOOGL', mode = 'lines') %>%
  layout(title = "Visualizing Adjusted Stock Prices",
         xaxis = list(title = "Time"),
         yaxis = list(title = "Adjusted Prices"))
p


A Simple Trading Strategy: Trend Following

In the code below, you will visualize a simple momentum trading strategy. Basically, you calculate the 200 day and 50 day moving averages of a stock price. On any given day that the 50 day moving average is above the 200 day moving average, you would buy or hold your position. On days where the 200 day moving average is above the 50 day moving average, you would sell your shares. This strategy is called a trend following strategy. The sign of the difference between the two moving averages represents the stock’s momentum. The TTR package provides SMA() for calculating a simple moving average.
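
The rule described above can be written down in a few lines. This is a minimal sketch on made-up random-walk prices, not real market data; the helper `sma()` is a hypothetical base R stand-in for TTR's SMA(), and the labels "buy/hold" and "sell" are chosen for illustration.

```r
# Hypothetical helper: trailing simple moving average in base R,
# returning NA until n observations are available
sma <- function(x, n) as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))

set.seed(42)
# Made-up random-walk prices, not real market data
price <- 100 + cumsum(rnorm(300))

fast <- sma(price, 50)
slow <- sma(price, 200)

# Buy or hold while the 50-day average is above the 200-day average, otherwise sell
signal <- ifelse(fast > slow, "buy/hold", "sell")

# The first 199 days have no 200-day average, hence no signal
print(sum(is.na(signal)))
print(table(signal))
```

The leading NAs are the reason a series needs well over 200 observations before the rule says anything at all.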

tail(SMA(AdjustedPrices$AAPL.Adjusted, 200))
##                 SMA
## 2018-02-15 159.0440
## 2018-02-16 159.1823
## 2018-02-20 159.3203
## 2018-02-21 159.4425
## 2018-02-22 159.5518
## 2018-02-23 159.6714
tail(SMA(AdjustedPrices$AAPL.Adjusted, 50))
##                 SMA
## 2018-02-15 170.1216
## 2018-02-16 170.1911
## 2018-02-20 170.2617
## 2018-02-21 170.3104
## 2018-02-22 170.3868
## 2018-02-23 170.4574
data.frame(sma200 = SMA(AdjustedPrices$AAPL.Adjusted, 200),
           sma50 = SMA(AdjustedPrices$AAPL.Adjusted, 50)) %>% head()
##            SMA SMA.1
## 2007-01-03  NA    NA
## 2007-01-04  NA    NA
## 2007-01-05  NA    NA
## 2007-01-08  NA    NA
## 2007-01-09  NA    NA
## 2007-01-10  NA    NA
sdata <- biodatadf %>% select(-Date)
# sdata <- tk_xts(data, date_var = Date)
df_50 = as.data.frame.matrix(apply(sdata, 2, SMA, 50))
colnames(df_50) = paste0(colnames(df_50), "_sma50")
df_200 = as.data.frame.matrix(apply(sdata, 2, SMA, 200))
colnames(df_200) = paste0(colnames(df_200), "_sma200")
df_all <- cbind.data.frame(Date = biodatadf$Date, df_200, df_50) %>% drop_na()
# sma 50
f50 <- function(x) SMA(x, 50)
# sma 200
f200 <- function(x) SMA(x, 200)
# library(pryr)
# data %>% plyr::colwise() %>% f50
df_all <- tk_xts(df_all, date_var = Date)
df_all %>% head()
##            AAPL.Adjusted_sma200 MSFT.Adjusted_sma200 GOOGL.Adjusted_sma200
## 2013-03-07             57.49335             24.96830              341.9314
## 2013-03-08             57.46856             24.96565              342.5098
## 2013-03-11             57.43212             24.96036              343.0621
## 2013-03-12             57.39270             24.95521              343.6297
## 2013-03-13             57.34667             24.95290              344.1699
## 2013-03-14             57.30539             24.95172              344.7151
##            IBM.Adjusted_sma200 FB.Adjusted_sma200 AAPL.Adjusted_sma50
## 2013-03-07            167.0124           25.67010            49.92382
## 2013-03-08            167.0851           25.61875            49.77925
## 2013-03-11            167.1485           25.58930            49.66265
## 2013-03-12            167.2180           25.57345            49.52154
## 2013-03-13            167.2968           25.54885            49.39153
## 2013-03-14            167.3918           25.51890            49.22393
##            MSFT.Adjusted_sma50 GOOGL.Adjusted_sma50 IBM.Adjusted_sma50
## 2013-03-07            23.98699             380.5125           169.7385
## 2013-03-08            24.00741             381.7339           170.0599
## 2013-03-11            24.02904             382.9947           170.3837
## 2013-03-12            24.04962             384.2091           170.7027
## 2013-03-13            24.07753             385.4634           171.0965
## 2013-03-14            24.10652             386.6061           171.5250
##            FB.Adjusted_sma50
## 2013-03-07           28.8342
## 2013-03-08           28.8548
## 2013-03-11           28.8874
## 2013-03-12           28.9230
## 2013-03-13           28.9464
## 2013-03-14           28.9548
dim(df_all)
## [1] 1252 10

The custom function mov.avgs() accepts the price series of a single stock and calculates its moving averages. The function first checks the number of rows in the data. If there are fewer than 2*260 rows, it returns a data frame of moving averages filled with “NA”. I chose this number because there are about 260 weekdays (roughly 250 trading days) in a year, so this checks that the time series is about 2 years or more in length. Loyal3 sometimes can get access to IPOs, and if the stock is newly public there will not be enough data for a 200 day moving average. If the number of rows is at least 2*260, the function creates a data frame with the 200 and 50 day moving averages as columns. Using colnames, I declare the column names. The last part of the function uses complete.cases to check for values in the 200 day moving average column. Any rows that do not have a value are dropped in the final result.

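mov.avgs <- function(df) {
  # df: the price series of a single stock
  if (NROW(df) < (2 * 260)) {
    # too short for a 200 day average: NA placeholders
    x <- data.frame(NA, NA)
  } else {
    x <- data.frame(SMA(df, 200), SMA(df, 50))
  }
  colnames(x) <- c('sma_200', 'sma_50')
  # drop rows without a 200 day moving average
  x <- x[complete.cases(x$sma_200), ]
  return(x)
}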
dplyr::pull(sdata, AAPL.Adjusted)%>%head()
## [1] 8.104137 8.284014 8.225021 8.265641 8.952269 9.380682
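
To see how the length check behaves, here is a sketch of the same idea on made-up price vectors. The function `mov_avgs_sketch()`, its `min_rows` argument and the helper `sma()` are hypothetical base R stand-ins, not the author's mov.avgs() or TTR's SMA(); unlike the function above, the sketch returns a single NA row for a short series so that the two cases are easy to tell apart.

```r
# Hypothetical base R stand-in for TTR::SMA()
sma <- function(x, n) as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))

# Sketch of the same idea as mov.avgs(), for a plain numeric price vector
mov_avgs_sketch <- function(price, min_rows = 2 * 260) {
  if (length(price) < min_rows) {
    # Too little history: return a single row of NA placeholders
    return(data.frame(sma_200 = NA_real_, sma_50 = NA_real_))
  }
  out <- data.frame(sma_200 = sma(price, 200), sma_50 = sma(price, 50))
  # Drop the leading rows where the 200-day average is undefined
  out[complete.cases(out$sma_200), ]
}

set.seed(1)
short_series <- 50 + cumsum(rnorm(300))   # about 1.2 years of made-up prices
long_series  <- 50 + cumsum(rnorm(600))   # about 2.3 years of made-up prices

print(nrow(mov_avgs_sketch(short_series)))
print(nrow(mov_avgs_sketch(long_series)))
```

For the long series, 199 of the 600 rows are dropped because the 200-day average needs 200 observations before it produces its first value.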

The dygraph() call below creates the chart object from the Apple moving-average columns of df_all. This object is passed to dySeries() in the next 2 lines. You can refer to a column by name, so the two dySeries() calls each plot a line, one for the “AAPL.Adjusted_sma50” values and one for the “AAPL.Adjusted_sma200” values. The object is forwarded again to dyRangeSelector() to adjust the selector’s height. Lastly, I added some shading to mark a period when you would have wanted to buy or hold the equity and a period when you should have sold your shares or stayed away, depending on your position.

var = names(df_all)[str_detect(names(df_all), "AAPL")]
df_all[, var] %>% head()
##            AAPL.Adjusted_sma200 AAPL.Adjusted_sma50
## 2013-03-07             57.49335            49.92382
## 2013-03-08             57.46856            49.77925
## 2013-03-11             57.43212            49.66265
## 2013-03-12             57.39270            49.52154
## 2013-03-13             57.34667            49.39153
## 2013-03-14             57.30539            49.22393
# works with dataframe
# select(df_all, contains("AAPL"))
# equivalent to above
# vars = names(df_all)[grepl('AAPL', names(df_all))]
# df_all[, vars] %>% head()
dateWindow = c("2014-01-01", "2018-02-01")
dygraph(df_all[, var], main = 'Apple Moving Averages') %>%
  dySeries('AAPL.Adjusted_sma50', label = 'sma 50') %>%
  dySeries('AAPL.Adjusted_sma200', label = 'sma 200') %>%
  dyRangeSelector(height = 30) %>%
  dyShading(from = '2016-01-01', to = '2016-9-01', color = '#CCEBD6') %>%
  dyShading(from = '2016-9-01', to = '2017-01-01', color = '#FFE6E6') %>%
  dyRangeSelector(dateWindow = dateWindow)


The Apple moving averages with shaded regions for buying/holding versus selling
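
The shaded periods in the chart are typed in by hand. As a sketch of how such periods could be derived from the signal instead, the base R function rle() collapses a daily TRUE/FALSE series into runs with a start and an end date. The dates, the `above` vector and the `periods` table are made up for illustration.

```r
# Made-up daily signal: TRUE when the 50-day average is above the 200-day average
dates <- seq(as.Date("2016-01-01"), by = "day", length.out = 10)
above <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)

# rle() collapses the signal into runs of identical values
runs <- rle(above)
run_end   <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

periods <- data.frame(
  from   = dates[run_start],
  to     = dates[run_end],
  action = ifelse(runs$values, "buy/hold", "sell"),
  stringsAsFactors = FALSE
)
print(periods)
```

Each row of such a table could then supply the from and to arguments of one dyShading() call, with the color chosen by the action.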

In summary, here is a breakdown of the concepts and packages used in the article:

  1. Loading Packages:

    • The author loads various R packages for different purposes.
    • Notable packages include rvest for web scraping, TTR for Technical Trading Rules, dygraphs for interactive time series charts, lubridate for easy date manipulation, tidyquant for financial data analysis, and timetk for time series data manipulation.
  2. Data Collection:

    • Web scraping is performed using the rvest package to collect stock tickers from MarketWatch.
    • The quantmod package is used to collect historical stock prices from Yahoo Finance.
    • Parallel processing is implemented using the parallelMap package for efficient stock data retrieval.
  3. Bioconductor and Parallel Processing:

    • The BiocParallel package from Bioconductor is utilized for parallel processing.
    • bplapply() with a MulticoreParam backend applies the price extraction function to each ticker in parallel.
  4. Data Visualization:

    • dygraph is used for creating interactive time series charts, allowing users to visualize adjusted stock prices over time.
    • plot_ly from the plotly package is employed for creating OHLC (open, high, low, close) charts.
  5. A Simple Trading Strategy: Trend Following:

    • The article introduces a simple trend-following trading strategy based on 50-day and 200-day moving averages.
    • The TTR package is used to calculate Simple Moving Averages (SMA) for the specified time periods.
    • The strategy involves buying or holding when the 50-day moving average is above the 200-day moving average and selling when the opposite is true.
  6. Custom Functions:

    • The author defines a custom function mov.avgs() to calculate moving averages for stock data.
    • This function considers the length of the time series and generates 200-day and 50-day moving averages, dropping rows with incomplete data.
  7. Data Visualization of Trading Strategy:

    • The article concludes by visualizing the trading strategy using dygraphs.
    • Shaded regions are added to highlight periods when one should buy or hold versus sell.

The presented R code demonstrates an approach to collecting financial data, implementing parallel processing for efficiency, and visualizing a simple trend-following trading strategy. The combination of various packages showcases the versatility of R in algorithmic trading and financial data analysis.


FAQs

Can R be used for algorithmic trading?

R is a language that is specifically designed for statistical analysis and data visualization. It is often used in combination with other languages, such as Python or C++, to develop algorithmic trading systems that require complex statistical models.

Is algorithmic trading illegal?

No. Algorithmic trading is legal, and using a trading algorithm is not prohibited in itself. Some investors may contend that this type of trading creates an unfair trading environment that adversely impacts markets. However, there's nothing illegal about it.

Is Algotrading worth it?

Algorithmic trading can save time and effort, because the rules are executed automatically once they are coded. It also makes trade entries and exits consistent, since positions are opened and closed exactly as the rules specify.

What is the best language for algorithmic trading?

Java remains a dominant force in the realm of algorithmic trading systems, particularly for high-frequency trading (HFT) applications. Known for its performance, scalability, and platform independence, Java is well-suited for building complex trading systems that require low latency and high throughput.

What language is best for trading bots?

The choice of programming language for your trading bot largely depends on your specific requirements, trading strategy, and personal preferences. Python is an excellent choice for beginners and those focusing on data analysis. On the other hand, Java and C++ excel in high-frequency trading environments.

Is R good for stock analysis?

Traders can use R to analyze historical price data and identify patterns that can be used to predict future price movements.

Can I sell R code?

Don't worry, you're not going to violate R's license. As long as what you're selling doesn't include source code that's covered under another license, or any binaries made from that source code, you're in the clear for copyright.

Can you use R for finance?

R is a statistical analysis tool that is widely used in the finance industry.

Is algo trading really profitable?

Algo trading can be profitable, but profits are not guaranteed. It suits someone who wants to trade alongside a full-time job: strategies can be developed in spare time and are then executed by the system during working hours.

Why does algo trading fail?

Over-optimization, also referred to as curve-fitting, is when a trading system is excessively tuned to conform precisely to historical data. The algorithm is optimized to such an extent that it performs exceptionally well on the past data but fails to perform similarly on new, unseen data.

How much do Algo traders make?

Algorithmic Trader salary in India ranges between ₹ 2.5 Lakhs to ₹ 100.0 Lakhs with an average annual salary of ₹ 20.0 Lakhs. Salary estimates are based on 31 latest salaries received from Algorithmic Traders.

Who is the most successful algo trader?

Jim Simons, who built mathematical models to beat the market. Even back in the 1980s, when computers were far less common, he developed his own algorithms that made tremendous returns. Since 1988, Renaissance Technologies has reportedly not had a single year of negative returns.

How hard is algo trading?

While algorithmic trading offers numerous benefits, it also presents challenges:

  • Technical Complexity: Developing and maintaining algorithms requires strong programming skills.
  • Data Quality: The quality and accuracy of data used for trading are crucial.

What is the success rate of algorithmic trading?

The success rate of algorithmic trading varies depending on several factors, such as the quality of the algorithm, market conditions, and the trader's expertise. While it is difficult to pinpoint an exact success rate, some studies estimate that around 50% to 60% of algorithmic trading strategies are profitable.

Can you use R for data mining?

Once you have installed R and a development environment, you can start exploring the vast array of packages and tools available for data mining. Some of the most popular packages for data mining in R include: caret: This package provides a range of functions for training and evaluating machine learning models in R.

What math is used in algorithmic trading?

A firm base of statistics, calculus, and linear algebra will impact the overall quality of your ideas and what you will be able to do with them, but there is no substitute for implementing multiple strategies and almost always having to discard them.

What platform to use for algo trading?

Neuroshell is a popular choice for traders looking to build sophisticated low-frequency trading bots. Algo Wizard is a user-friendly software platform that allows traders to create and test trading strategies without any programming knowledge.

Which technology is used in algo trading?

Several AI technologies are commonly employed in algorithmic trading, including machine learning, natural language processing (NLP), and deep learning. Machine Learning: Machine learning algorithms analyze historical market data to identify patterns and make predictions about future price movements.
