Download And Unzip files using Power BI Desktop and R Script
- Prathy Kamasani

- Apr 4, 2016
- 3 min read
Updated: Nov 14
Most of the open data source providers supply data in a compressed format, especially when files are large. That's what I noticed with Company House data. I wanted to explore company house data using Power BI. But, I wasn't at all looking forward to downloading each file then unzip so that I can import into Power BI model. I have many files to work with, so I wanted to automate things. At least, I wanted to unzip all files using Power BI. Quite quickly, I understood Power BI currently doesn't support all compressed formats. Chris Webb explains Here, how to work with compressed files and why currently all compressed formats aren't supported.
However, Power BI supports R Script, which means I can easily automate my steps. Until Power BI Desktop supports unzip functionality, we can use new R Script in the Power BI Desktop. Power BI support documented very well regarding how to run R scripts in Power BI. Using R Script, I can download and unzip company house files.
R Script as a source in Power BI Desktop:
Microsoft recommends it's better to write and execute R Scripts in RStudio( or any other R tools) before you use it in Power BI. Power BI doesn't have a rich R interface, but I found it does give reasonable error information. Currently, R Script imports only data frames to Power BI.
Here is a simple R data frame in RStudio and Power BI Desktop

RScript in R Studio

RScript data frame in R Studio

R Script In Power BI
Power BI Desktop uses R. Execute to execute R Script.
Download and Unzip from web using Power BI and R Script:
I wanted to download more than one file which has different URL with an incremental number. Hence, I used for loop to get the file number I wanted to download. With few lines of R script, I can easily download and unzip all these files.
R script I used in R Studio:
for(i in 0:4){URLFull="http://download.companieshouse.gov.uk/BasicCompanyData-2016-03-01-part"i=i+1SourceURL=paste(URLFull,i,"_5.zip",sep="")Destination="\\\\Mac\\Home\\Desktop\\Working\\csv\\"DestinationURL=paste(Destination,i,".zip",sep="")download.file(url=SourceURL, destfile=DestinationURL, method = "auto", quiet=FALSE)unzipfolder="\\\\Mac\\Home\\Desktop\\Working\\csv\\CSVFiles"unzip (zipfile=DestinationURL,exdir=unzipfolder)}This script doesn't return any data frame. If I use the same query in PowerBI Desktop, it would still download and unzip files, but I wouldn't be able to use it as a step in the Power BI query.
We can either return an empty R Data Frame and create a separate query in Power BI to read the unzipped files or use a blank query as a source instead of R Script as the source, and do all operations in the single query.
Same script as above in Power BI Desktop returning empty data frame:
for(i in 0:1){URLFull="http://download.companieshouse.gov.uk/BasicCompanyData-2016-03-01-part"i=i+1SourceURL=paste(URLFull,i,"_5.zip",sep="")Destination="\\\\Mac\\Home\\Desktop\\Working\\csv\\"DestinationURL=paste(Destination,i,".zip",sep="")download.file(url=SourceURL, destfile=DestinationURL, method = "auto", quiet=FALSE)unzipfolder="\\\\Mac\\Home\\Desktop\\Working\\csv\\CSVFiles"unzip (zipfile=DestinationURL,exdir=unzipfolder)}df= data.frame()Alternatively, same R Script without returning any data frame, this executes the R Script and does other Power Query operations:
letSource = R.Execute("for(i
in
0:4)#(lf)#(lf){#(lf)#(lf)URLFull=""http:
//download.companieshouse.gov.uk/BasicCompanyData-2016-03-01-part""#(lf)#(lf)i=i+1#(lf)#(lf)SourceURL=paste(URLFull,
i,""_5.zip"",
sep = """")#(lf)#(lf)Destination=""\\\\Mac\\Home\\Desktop\\Working\\csv\\""#(lf)#(lf)DestinationURL=paste(Destination,
i,"".zip"",
sep = """")#(lf)#(lf)download.file(url=SourceURL,
destfile = DestinationURL,
method = ""auto"",
quiet = FALSE)#(lf)#(lf)unzipfolder=""\\\\Mac\\Home\\Desktop\\Working\\csv\\CSVFiles""#(lf)#(lf)unzip (zipfile=DestinationURL,
exdir = unzipfolder)#(lf)}"),
df1 = Folder.Files("\\Mac\Home\Desktop\Working\csv\CSVFiles")indf1
Downloaded R Files in Power BI
That's all; as usually now we can do additional operations in Power BI. However, there are few things to consider:
There's a 30 min timeout limit on all R Scripts, if you're downloading many large files, Power BI may raise a timeout error.
Every time you refresh PowerBI it would re-run the R Script; work around for this to run R Script using one query and do other Power BI operations in another query so you can disable the query which using R Script.
Among these are few other things I noticed:
I always had to import a data frame. Otherwise, it would still execute the R script, but I would have nothing in Power BI. In fact, I had to cancel the query, which didn't save R Script either.
If I cancel Power query refresh, it will lock the files or folders which were in the process – need to close and reopen the Power BI to unlock!
If I want to change the R Script, before it finishes execution or Power BI refresh, again Power BI caches the script or values I given in the script, which mean I need to close and reopen Power BI. E.g. After I canceled the query and If I changed the for loop condition, it will still use the condition I given before I cancelled the query.
Overall it's exciting to see R Script in Power BI. It enables many features those Power BI doesn't support currently. Remember you never know what incredible Power BI team can deliver; an element which is missing today may appear in the next minute 🙂
Till next time,
Prathy 🙂



Comments