File Path Parsing

Posted under tips on April 24 , 2019 by Swanand Nirgudkar

Extracting relevant information from a file path may seem an obvious task on the surface but achieving it in a single step can easily get tedious.

Though there are multiple ways of parsing a path in SheetKraft, a fairly easy solution using Regex is bound to achieve the same in most cases. For example let's consider a file path C:\Users\New User\Desktop\Example.txt. Using just Excel one often has to use a formula like

MID(A1,FIND("*",SUBSTITUTE(A1,"\","*",LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))))+1,LEN(A1))

Where A1 is the address of the reference cell. Moreover one further has to use functions like LEFT and LEN to get the parent directory. On the contrary, if we use SheetKraft the same can be reduced to

RegexSearch.SK(A1,"(.*)\\(.*)","$2")

In this formula, the second argument is a Regex containing two capturing groups "(.*)" set apart by a separator "\", where the additional "\" is used for escaping. The third argument specifies that we are using the second capture group. Hence, simply replacing the $2 with $1 will fetch us the parent diectory. Also in case the separator is a forward slash "/" as in case of urls, the desired result can be achieved by substituting "\\" in the formula with "/".

Notes:

  • It would be better to use a Regex like (.*)\\([^\\]+)$. Here we are saying that the last group should not contain a backslash, must have at least one character and the string should end after it. The former Regex does work but relies on the Regex engine being more greedy on the first group than the second which is a dubious feature.
  • Multiple capture groups can be obtained out of RegexSearch by passing an array in the third argument. For instance, {"$1","$2"} will produce an array output containing the parent directory path and filename.
  • Any Regex with more than one .* can lead to an ambiguity about which one is more greedy. For example a\b\c can be considered as a\b followed by c or a followed by b\c. Even if there are definite rules regarding which one should the Regex engine prefer, performance with multiple greedy parts in a Regex will be poorer.


TAGGED:SheetKraftRegexFileCapture Group


Recent Posts


Database Table Indexing

January 03, 2020 bySwanand Nirgudkar

For an automated process any operation involving database is relatively expensive since it involves…

Continue Reading

Truncating Database Transaction Log

November 02, 2019 byNidhisha Shetty

Databases are usually in Full Recovery mode. This means that every transaction is retained in the…

Continue Reading

ActivityRunner Not Starting

October 11, 2019 bySwanand Nirgudkar

One may face issue while applying SheetKraft patch wherein ActivityRunner service refuses to start…

Continue Reading

Reshape instead of Fill-Down condition

May 02, 2019 bySaloni Gude

Originally published on Thu, 2 Mar, 2017 at 6:59 AM Fill is used to generate multiple values from an…

Continue Reading

Let's talk

Address

7th floor, Unit No. 701, C Wing,
Kailas Business Park, S. Veer Savarkar Marg,
Park Site, Vikhroli (W), Mumbai-400079.

 

Phone

+91 22 62362043

 

For General Queries & Technical Support

contact@quantumphinance.com

 

For Sale and Demos

sales@quantumphinance.com