File Searching Tools
Eric Roberts
Hoffman Lab
Presentation Guide
● In-File Searching
● File / Path Searching
● Fuzzy Filtering
● Code Searching
Searching in files
● grep
○ ed (editor) command: g/re/p
■ Global, Regular Expression, Print
$ grep PATTERN [FILE...]
# Example
$ grep '^chr1' *.bed
○ '^chr1' is a regular expression
○ *.bed is a Bash glob (the shell expands it to all BED files in the current directory)
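The difference between the quoted regular expression and the unquoted glob can be seen in a small experiment. This is a minimal sketch, assuming a scratch directory with made-up file names and contents; none of it comes from the original slides.

```shell
#!/usr/bin/env bash
# Sketch: why the regex is single-quoted while the file glob is not.
# All file names and contents below are invented for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'chr1\t0\t100\nchr2\t0\t100\n' > a.bed
printf 'chr1\t200\t300\nchrX\t0\t50\n' > b.bed

# The shell expands *.bed to "a.bed b.bed" before grep starts;
# the quoted '^chr1' reaches grep unchanged as a regular expression.
quoted=$(grep '^chr1' *.bed)
printf '%s\n' "$quoted"

# Unquoted, a pattern containing glob characters can be rewritten by the shell.
# Here chr[1-3] matches an existing file name, so the shell replaces it.
touch chr1
expanded=$(echo chr[1-3])      # the shell turns this into: chr1
literal=$(echo 'chr[1-3]')     # quoting keeps the pattern intact
printf 'unquoted: %s\nquoted:   %s\n' "$expanded" "$literal"
```

If no file name happens to match, Bash passes the unquoted pattern through unchanged, which is why missing quotes can go unnoticed for a long time.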
grep examples
$ zcat segway.bed.gz | grep '^chr1'
$ grep --recursive --include '*.bed' '^chr1'
○ '^chr1' is a regular expression
○ '*.bed' is a (Bash-style) glob, not a regex
● Can be used without file arguments
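Both forms above can be tried on throwaway data. The following is a sketch under stated assumptions: the directory layout and file contents are hypothetical, and `gzip -dc` stands in for `zcat` because it behaves the same on Linux and is more portable elsewhere.

```shell
#!/usr/bin/env bash
# Sketch: grep reading standard input, and grep searching recursively.
# Directory layout and file contents are invented for this example.
workdir=$(mktemp -d)
mkdir -p "$workdir/run1" "$workdir/run2"
printf 'chr1\t0\t100\tlabel0\nchr2\t0\t100\tlabel1\n' > "$workdir/run1/segway.bed"
printf 'chr1\t500\t900\tlabel2\n' > "$workdir/run2/other.bed"
printf 'chr1 appears here too\n' > "$workdir/run2/notes.txt"
gzip -c "$workdir/run1/segway.bed" > "$workdir/run1/segway.bed.gz"

# No file argument: grep filters whatever arrives on standard input.
from_stdin=$(gzip -dc "$workdir/run1/segway.bed.gz" | grep '^chr1')

# --recursive walks the directory tree; --include keeps only names matching the glob.
# notes.txt also starts with chr1 but is skipped because it does not match '*.bed'.
recursive=$(cd "$workdir" && grep --recursive --include '*.bed' '^chr1' . | sort)

printf 'stdin:\n%s\nrecursive:\n%s\n' "$from_stdin" "$recursive"
```

The recursive form prefixes each hit with the file it came from, which the standard-input form cannot do.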
Regular Expressions
● A syntax for specifying patterns of text
^chr[1-3]
Matches all lines that start with “chr1”, “chr2” or “chr3” (so “chr10” matches too)
● Different languages/libraries have different syntax
○ PCRE - Perl Compatible Regular Expressions
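The `^chr[1-3]` pattern can be checked against a handful of sample names. This is an illustrative sketch: the chromosome names are made up, and the stricter second pattern is one possible way to require a whole-name match, not something the slides prescribe.

```shell
#!/usr/bin/env bash
# Sample chromosome names, one per line (illustrative values only).
names=$(printf '%s\n' chr1 chr2 chr3 chr4 chr10 chrX xchr1)

# ^ anchors the match at the start of the line; [1-3] is one character from 1, 2, 3.
# chr10 is kept because it also starts with "chr1"; xchr1 is dropped by the anchor.
prefix_matches=$(printf '%s\n' "$names" | grep '^chr[1-3]')

# To accept only chr1, chr2 and chr3 as a whole name, also require
# whitespace or end of line right after the digit (-E enables alternation with |).
exact_matches=$(printf '%s\n' "$names" | grep -E '^chr[1-3]([[:space:]]|$)')

printf 'prefix:\n%s\nexact:\n%s\n' "$prefix_matches" "$exact_matches"
```

In a tab-separated BED file the `[[:space:]]` branch matches the tab after the chromosome column, so the stricter pattern works on real rows as well as bare names.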
Fancy new grep alternatives
● Ack (2005)
○ Entirely in Perl
● Ag / The Silver Searcher (2011)
○ Inspired by Ack, written in C
● Rg / Ripgrep (2016)
○ Similar in spirit to Ag, written in Rust
- Recursive by default
- Filetype detection
- Obeys .*ignore files
- Speed ranges from slower (Ack) to much faster (Rg) than grep
Ripgrep example
$ rg --glob '*.bed' '^chr1'
Searching for files
● find
$ find [OPTIONS] [PATH...] [EXPRESSION]
# Example
$ find . -name '*.bed'
○ '*.bed' is a glob, quoted so that find (not the shell) matches it against file names
● fd (alternative to find)
$ fd [PATTERN] [PATH...]
# Example
$ fd '\.bed$'
○ '\.bed$' is a regular expression
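The glob-versus-regex split for file names can be reproduced with find and grep alone, which is useful on machines where fd is not installed. A minimal sketch, assuming a hypothetical directory tree created just for the demonstration:

```shell
#!/usr/bin/env bash
# Sketch: selecting files by glob (find -name) and by regex (find piped into grep).
# The directory tree below is invented for illustration.
workdir=$(mktemp -d)
mkdir -p "$workdir/data/sub"
touch "$workdir/data/a.bed" "$workdir/data/sub/b.bed" \
      "$workdir/data/bed_notes.txt" "$workdir/data/c.bed.gz"

# -name takes a glob and compares it with the file name only (not the full path).
by_glob=$(cd "$workdir" && find . -name '*.bed' | sort)

# For a regular expression, list every file and let grep filter the paths.
# \. is a literal dot and $ anchors the end, so c.bed.gz and bed_notes.txt are dropped.
by_regex=$(cd "$workdir" && find . -type f | grep '\.bed$' | sort)

printf 'glob:\n%s\nregex:\n%s\n' "$by_glob" "$by_regex"
```

Both selections return the same two paths here; the regex form becomes more interesting once the pattern needs to look at directory names too, since grep sees the whole path.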
Fuzzy Filtering
● FZF
○ Takes an input source and lets you interactively fuzzy-search it
○ Defaults to a file listing as its source (similar to `find . -type f`)
● Example input sources:
○ Files
○ Bash History
○ Process Lists
Brief Live Demo!
Code Searching
● Cscope (1980s)
○ C/C++, Java only
● Ctags
○ Lots of languages, support in lots of editors
- Needs an index/database to be rebuilt periodically
- Can be found almost everywhere
Language Server Protocol
● External process that acts as a context-aware server for your code and communicates with your editor
● Open standard between Microsoft, Red Hat, and Codenvy
● Common searching tasks:
○ Find all references
○ Go to definition
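To make the editor-to-server communication concrete, here is a sketch of what a single "Go to definition" request looks like on the wire. The protocol frames each JSON-RPC 2.0 message with a `Content-Length` header; the file URI and cursor position below are hypothetical, and a real session would first exchange an `initialize` request before the server answers anything else.

```shell
#!/usr/bin/env bash
# Sketch of one Language Server Protocol request as an editor would send it.
# The document URI and the position are made-up values for illustration.
body='{"jsonrpc":"2.0","id":1,"method":"textDocument/definition","params":{"textDocument":{"uri":"file:///home/user/project/main.py"},"position":{"line":41,"character":8}}}'

# Content-Length counts the bytes of the JSON body, not characters.
length=$(printf '%s' "$body" | wc -c | tr -d ' ')

# Header lines end with \r\n, and an empty line separates headers from the body.
message=$(printf 'Content-Length: %s\r\n\r\n%s' "$length" "$body")
printf '%s\n' "$message"
```

"Find all references" uses the same framing with the method `textDocument/references`; only the JSON body changes.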
Summary
● Keep using grep and find for portability
● Consider using much faster and more convenient alternatives (e.g. rg and fd) when possible
● There is very likely a tool to help contextualize code on any machine you use
● Consider using plugins / language servers for your editor of choice when possible
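One way to combine the first two points in a script is to prefer rg when it is installed and fall back to grep otherwise. This is only a sketch: `search_lines` is a hypothetical helper name, the sample file is made up, and the pattern should stay within syntax that both tools interpret the same way.

```shell
#!/usr/bin/env bash
# search_lines PATTERN FILE
# Hypothetical helper: print matching lines using rg when available, else grep.
search_lines() {
    pattern=$1
    file=$2
    if command -v rg >/dev/null 2>&1; then
        # Suppress line numbers and file names so the output matches plain grep.
        rg --no-line-number --no-filename -e "$pattern" -- "$file"
    else
        grep -e "$pattern" -- "$file"
    fi
}

# Illustrative input file with two BED-like rows.
sample=$(mktemp)
printf 'chr1\t0\t100\nchr2\t5\t50\n' > "$sample"

hits=$(search_lines '^chr1' "$sample")
printf '%s\n' "$hits"
```

Scripts that have to run on machines you do not control are usually better off calling grep directly, as the summary recommends.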
Questions


Editor's Notes

  • #2: This presentation focuses on searching, mostly searching for files and for the contents of files. It is not just about how to use grep and find, though I’ll briefly go over those since they are still very important tools. In practice, this presentation introduces alternatives and a few tools that I’ve been using consistently over the past few years.
  • #4: To search in files I have to touch on the ubiquitous grep command. The name comes from a very common command in the original UNIX editor, somewhat aptly named ed: g/re/p, for Global Regular Expression Print. Regular expressions are a syntax for specifying patterns of text. Here I am giving a very brief but important example of using grep. The pattern portion of grep is a regular expression; the file portion is often a bash glob. Bash globs are *not* regular expressions. This glob pattern expands to all BED files in the current directory. Single quotes around the regular expression are also really important, so that bash doesn’t accidentally interpret it as a glob and expand it.
  • #5: Here are examples of grep with no files specified. Grep reads from standard input if no files are specified, and implicitly looks in every file under the current directory when recursive is specified. Note again the mix of glob patterns and regular expressions, and the need to single-quote them so the bash shell doesn’t expand them.
  • #6: I’m going to go over regular expressions only very briefly, since they are often language specific, each with lots of documentation and examples. The most notable regular expression syntax “standard”, if there is one, is called “PCRE”. It is a regular expression syntax that seeks to be compatible with the one that comes with the Perl language. Search tools will very often have an option for PCRE compliance, or will state when and how they differ.
  • #7: These range from slower to much faster than grep in most cases. There are lots of pages of benchmarks covering lots of cases. It is not uncommon to see a 10x speedup of rg over grep. Rg also has an explicit PCRE2 flag, if you prefer that to Rust’s own regex syntax/engine. These alternatives are very convenient and much faster. That said, grep is still very important to know and use for a few very good reasons: portability (if your bash script needs to search, you should use grep) and older or limited systems where Perl or Rust are unavailable. I also excluded Pt, the Platinum Searcher, written in Go. I have no experience with it, but it is very likely also a better stand-in replacement for grep in most cases.
  • #9: Fd also respects ignore files. Fd seems to benchmark much faster than find. Fd also lets you search for files using a regular expression. You could always do this, but you would have to run find without any filter and then pipe into grep (or ripgrep).
  • #10: Should show examples how this looks
  • #12: Should show examples how this looks
  • #13: While there are a dime-a-dozen code-based plugins (to help search) for each editor, which are fine, it is probably worth mentioning a new rising standard in the world of code editors: the Language Server Protocol. It is the default in the Visual Studio Code editor, where the standard came from. If your editor supports the protocol, you should have effectively the same code-context features as Visual Studio Code. Of course, searching isn’t all that a language server provides. For example, it can also rename a variable across your entire codebase and provide autocompletion, documentation, type definitions, etc.