One of the most difficult parts of gaining a deeper understanding of Regex is learning to think through how to encapsulate your logic with the right operators for all the characters you don’t want to select. While it’s fairly straightforward to know what you want your query’s output to look like, much of writing regular expressions is writing queries that exclude characters. When writing Regex, it’s important to think about the problem sequentially and narrow down the selected data step by step until you have what you’re looking for.
Recursive Step-By-Step
Tip: Open a new Rubular tab for every step. This helps eliminate uncertainty from your output.
- Determine the exact output you’d like to retrieve from the string.
- Write the simplest Regex expression you can think of to select the string you’re looking for. Next, determine which unintended characters the query is also selecting.
- From the list of unintended characters, take note of each one’s character type. For each, determine whether it’s a digit, white-space, a word character, etc.
- In a new Rubular tab, determine which selector lets you select as many of the characters you don’t want as possible.
- Ideally your “don’t want” query should not select your desired output, but the difficult part is recognizing that at this stage it’s okay if it does.
- Repeat the above steps until you’re certain that you have the best query to select all of the characters that you do want in one tab, and the best query to select all of the characters that you don’t want in another tab.
- Next, ask yourself if there are any easily identifiable boundaries on either end of the desired output. These boundaries can be any combination of characters; even white-space counts. Note that if you rely on string-specific boundaries to narrow your query, you’ll need to ensure that all data sent to the query is standardized to include them.
- Now, in yet another Rubular tab, begin iterating on combining your want and don’t want queries using your tool kit of 1. encapsulation, 2. start/end queries that specify location, and 3. operators like |, {1}, +, etc.
- As you work through combining your queries, always stay focused on finding exactly what you need to do to NOT select the unwanted items that are currently being selected.
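The steps above can be sketched as a small worked example. This is only an illustration under assumptions: the sample string, the goal of extracting the order number, and the variable names are all made up here, and the code uses Python’s `re` module rather than Rubular itself. The constructs shown (`\d`, `+`, `{4}`, `\b`, and a capture group) should behave the same way on this input in Rubular.

```python
import re

# Hypothetical sample input; the desired output is only the order number "4521".
text = "Order #4521 shipped to Berlin on 2024-03-15"

# Simplest "want" query: any run of digits.
# It selects the order number, but also the pieces of the date.
want = re.findall(r"\d+", text)

# Unintended selections are everything except the desired output.
unintended = [match for match in want if match != "4521"]

# "Don't want" query: the unintended digits all belong to a date,
# so one query for the date shape covers all of them at once.
dont_want = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# Boundary: the desired digits are always preceded by "#" in this sample.
# Combine the boundary with encapsulation (a capture group) so that only
# the digits after "#" are returned and the date is no longer selected.
combined = re.findall(r"#(\d+)\b", text)

print(want)        # ['4521', '2024', '03', '15']
print(unintended)  # ['2024', '03', '15']
print(dont_want)   # ['2024-03-15']
print(combined)    # ['4521']
```

Note that the combined query only works if every input string puts a `#` in front of the order number, which is the standardization caveat from the boundary step.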
It’s helpful to follow Rubular’s Regex quick reference from top to bottom and left to right.
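As a rough sketch of the character-type step, the snippet below walks a string of unintended characters and labels each one using a few common character classes. The `classify` helper, its labels, and the sample string are hypothetical and written in Python for illustration; they are not part of Rubular.

```python
import re

# A few common character classes. Digits are checked before word
# characters because \w also matches digits.
CLASSES = [
    ("digit", r"\d"),
    ("whitespace", r"\s"),
    ("word character", r"\w"),
]

def classify(char):
    """Return the label of the first class that matches a single character."""
    for label, pattern in CLASSES:
        if re.fullmatch(pattern, char):
            return label
    return "other"

# Hypothetical unintended selection: a date followed by a trailing space.
unintended = "2024-03-15 "

# Group the distinct unintended characters by their type.
summary = {}
for ch in unintended:
    summary.setdefault(classify(ch), set()).add(ch)

for label, chars in summary.items():
    print(label, sorted(chars))
```

Grouping the characters this way makes it easier to see which single selector would cover the largest share of what you don’t want.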