TMCnet News

Look Under the Hood of Global File Search Solutions
[August 21, 2018]

With enterprises and OEMs looking to integrate search tools into their own products, Enterprise File Search (EFS) has emerged as one of today's hottest technology trends - with several recent high-profile IPOs and multimillion-dollar venture capital investments. However, a new iteration called Global File Search (GFS) may quickly surpass it, according to experts at software startup Cloudtenna.

GFS solutions are designed to search on-premise repositories, such as network servers and storage, email apps, cloud file services, and popular hosted collaboration suites that store documents. But search technologies and architectures vary greatly, and older search solutions lack connectors to many of these user file repositories. Enterprises and vendors evaluating GFS solutions should look closely at these three important criteria.

Speed-Accuracy Tradeoff

How fast are search results returned after submitting a query? Seconds? Minutes? A good experience requires results in under a second. And how quickly are file permission updates reflected in search results? It's critical that users only see search results for files they have permission to view.

All GFS software scans and indexes connected repositories to create its own file reference database. Access permissions, commonly referred to as ACLs, are then applied to the index to ensure each user only sees file results they have permission to view. The speed at which this takes place depends on the GFS software's underlying technology. The two common approaches have been "query-time binding," which enforces permissions at the time a user performs their search, and "early binding," which pre-processes file permissions according to a set schedule. Cloudtenna is introducing a new approach called "real-time binding," which builds its index and then performs consistency checks so any deltas are captured at the time a security change is made.

These three fundamentally different approaches deliver varying speeds and quality of results. Query-time binding ensures that security permissions are enforced in real time, but it suffers from high latency that significantly slows search results. Early binding trades off security to deliver a positive search-time user experience, but will frequently return results that don't reflect up-to-the-minute file permissions, compromising file security. Real-time binding achieves speeds as fast as early binding but maintains accuracy because it works continually in the background.
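The difference between the two established approaches can be illustrated with a minimal Python sketch. It is not taken from any vendor's product: the `Repository`, `QueryTimeSearch`, and `EarlyBindingSearch` classes and the sample file are hypothetical, and a plain dictionary stands in for a real search index.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical repository that owns the live ACLs (file path -> allowed users)."""
    acls: dict = field(default_factory=dict)

    def can_read(self, user, path):
        return user in self.acls.get(path, set())


class QueryTimeSearch:
    """Query-time binding: check live permissions for every hit on every search."""

    def __init__(self, repo, index):
        self.repo = repo
        self.index = index  # file path -> indexed text

    def search(self, user, term):
        hits = [path for path, text in self.index.items() if term in text]
        # One live permission check per hit: always current, but costly at scale.
        return [path for path in hits if self.repo.can_read(user, path)]


class EarlyBindingSearch:
    """Early binding: filter against an ACL snapshot refreshed on a schedule."""

    def __init__(self, repo, index):
        self.repo = repo
        self.index = index
        self.snapshot = {}
        self.refresh()

    def refresh(self):
        # Stands in for the scheduled pre-processing job.
        self.snapshot = {path: set(users) for path, users in self.repo.acls.items()}

    def search(self, user, term):
        return [
            path for path, text in self.index.items()
            if term in text and user in self.snapshot.get(path, set())
        ]


repo = Repository(acls={"plan.docx": {"alice", "bob"}})
index = {"plan.docx": "quarterly plan"}
query_time = QueryTimeSearch(repo, index)
early = EarlyBindingSearch(repo, index)

repo.acls["plan.docx"].discard("bob")  # bob's access is revoked

stale = early.search("bob", "plan")          # ['plan.docx'] until the next refresh
live = query_time.search("bob", "plan")      # []
early.refresh()
after_refresh = early.search("bob", "plan")  # []
```

In this toy setup the early-binding search keeps returning the revoked file until its snapshot is refreshed, while the query-time search pays for a permission check on every hit.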



Security Concerns

GFS software tools need to approach security and access control differently than EFS in order to return a list of files the specific searcher is authorized to view. After files are scanned and indexed, the GFS tool understands the organization's access control structures. If the software uses early binding, file permissions may be out of date by as much as a week. This means users can find and access files they are not allowed to view.


On the other hand, query-time binding, while inefficient and cumbersome, keeps ACLs and permissions current because it performs lengthy, system-intensive join operations to apply file permissions at the time of the query.

GFS solutions that use real-time binding keep indexes updated with the latest ACLs to accommodate changes in file permissions as they happen, such as when an executive leaves the company. Real-time binding requires machine learning to match the speeds necessary to run continuously and ensure an always up-to-date permissions map.
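As a rough illustration of the idea, the Python sketch below applies permission changes to the index's own ACL map as they arrive and runs a consistency check against the live ACLs to catch missed deltas. It is an assumption about how such a design could be structured, not Cloudtenna's implementation: the `RealTimeBoundIndex` class, its method names, and the sample data are hypothetical, and the machine learning component mentioned above is left out entirely.

```python
class RealTimeBoundIndex:
    """Hypothetical index that keeps its own ACL map current as changes arrive."""

    def __init__(self, documents, acls):
        self.documents = dict(documents)  # file path -> indexed text
        self.acl_map = {path: set(users) for path, users in acls.items()}

    def on_permission_change(self, path, user, granted):
        # Apply a single delta at the moment the security change is made.
        users = self.acl_map.setdefault(path, set())
        if granted:
            users.add(user)
        else:
            users.discard(user)

    def reconcile(self, live_acls):
        # Consistency check: correct any entry that drifted from the live ACLs
        # and report which paths needed fixing.
        drifted = []
        for path in set(self.acl_map) | set(live_acls):
            live = set(live_acls.get(path, set()))
            if self.acl_map.get(path, set()) != live:
                self.acl_map[path] = live
                drifted.append(path)
        return sorted(drifted)

    def search(self, user, term):
        # No repository round-trip at query time: the local ACL map is the filter.
        return sorted(
            path for path, text in self.documents.items()
            if term in text and user in self.acl_map.get(path, set())
        )


index = RealTimeBoundIndex(
    documents={"plan.docx": "quarterly plan", "memo.txt": "plan memo"},
    acls={"plan.docx": {"alice", "exec"}, "memo.txt": {"alice"}},
)

# An executive leaves the company and loses access.
index.on_permission_change("plan.docx", "exec", granted=False)
exec_results = index.search("exec", "plan")  # []

# A background check finds a grant that never arrived as an event.
drifted = index.reconcile({"plan.docx": {"alice"}, "memo.txt": {"alice", "bob"}})
bob_results = index.search("bob", "plan")    # ['memo.txt']
```

Searches here are filtered against a local map, as in early binding, but the map is updated per change instead of per schedule.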

Scalability

Several GFS options break down at scale based on how they are built. They attempt to mask that architectural limitation by capping the number of files they can accommodate per software instance. This can be acceptable to midsized organizations or departments with fewer than 200,000 files, or those using GFS as a point solution for a single repository such as a custom-built search function on a website. Enterprise organizations with considerably more files will find the costs untenable. More licenses, management, compute hardware, supporting infrastructure, and/or virtual compute instances add up rapidly. These types of GFS licensing may be per-seat for enterprise customers, but there are also the upfront and ongoing costs incurred in integration, especially in the case of OEM partners.
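The way per-instance caps multiply cost can be shown with simple arithmetic. In the Python sketch below, the 200,000-file cap and the 5,000,000-file estate are illustrative assumptions only, and `instances_needed` is a hypothetical helper; no specific product's limits are implied.

```python
def instances_needed(total_files, files_per_instance):
    """Number of capped software instances required to cover all files."""
    if files_per_instance <= 0:
        raise ValueError("files_per_instance must be positive")
    # Ceiling division without floating-point rounding.
    return -(-total_files // files_per_instance)


needed = instances_needed(5_000_000, 200_000)  # 25
```

Under these assumed figures, each of the 25 instances would bring its own licensing, compute, and management overhead.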

Aside from user and file limitations, many GFS systems are subject to repository limits. Most accommodate local machines and on-premise network shares in filers and NAS; fewer work across file sync-and-share services and clouds (Google Drive, Box, Dropbox, and Microsoft OneDrive). GFS should also search files in email applications (Outlook and Gmail) and SaaS applications (including Salesforce, Slack, Jira, or Confluence).

"GFS solutions must be built and integrated properly based on each organization's or OEM's individual requirements," said Aaron Ganek, Cloudtenna CEO. "In a modern enterprise with thousands of employees and millions of files across dozens of repositories, data management and security are complex challenges that GFS solutions can alleviate or aggravate depending on their architectures."

Cloudtenna's DirectSearch™ works universally across on-premise repositories, cloud file storage services, and hosted/online applications. The search-once-and-done tool can find files by name, sender, date, file type, keyword, content, and other attributes regardless of where they are stored. DirectSearch uses machine learning intelligence, natural language processing, and automation to deliver relevant results and rankings fast - in 400-600 milliseconds.

About Cloudtenna

Cloudtenna was founded to bring order to file chaos with a suite of AI-powered applications for file management. Cloudtenna's team has decades of experience in both enterprise infrastructure and cloud file management services at leading companies including Rhapsody Networks, Oxygen Cloud, Symantec, Sun Microsystems, NetApp, EMC, Fusion.io, and VERITAS. The team has developed over 20 successful OEM programs from the ground up. Its executives are complemented by engineers who have made key contributions to the NetApp WAFL and VxFS code bases, among other file systems. Together, the Cloudtenna team is revolutionizing how people work with files inside the enterprise with the next generation of file management, file analytics, auditing, and governance. For more information visit www.cloudtenna.com.

