Facebook AI Introduces ‘Neural Databases’, A New Approach Which Enables Machines to Search Unstructured Data and Connect The Fields of Databases and NLP

Databases are essential components of nearly every computer program and online service. However, they can be rigid structures that constrain how the data can actually be used. A schema must be defined in advance for each database, and queries must have well-defined semantics, typically written in SQL (Structured Query Language). This makes it hard to take advantage of unstructured data, because everything has to fit the predefined schema.

Facebook’s new approach, called neural databases, will allow machines to search unstructured data – such as vast collections of text or recordings of songs. This could one day enable users to run complex queries like “what is the third-longest entry about a Russian novelist?” directly on Wikipedia.

A neural database bridges an important gap between the fields of databases and NLP, allowing people to pose ad hoc queries like “how many teams won away games by more than three points?” This is possible because, unlike a standard database, which only holds information in a strict format, a neural database works directly on unstructured sources such as text. Existing neural networks alone fall short here, since they cannot run such queries over collections that have no structure at all. Machine learning models are powerful for tasks where data semantics are unclear, but they lack the benefits of composition that database queries have. This makes it difficult or impossible to extend them to closely related predictions like “What percentage of reviews are positive?”, or to more specific questions like “How many directors under 30 released positively reviewed horror movies in the 1970s?”

The proposed neural database architecture operates over textual facts with parallelizable, non-blocking operators before aggregating the results. Its three core components are a support set generator (SSG), which retrieves small sets of relevant facts; a parallel, non-blocking neural operator, which generates intermediate answers that can be unioned to produce the final answer; and an optional aggregation stage. The architecture works because it plays to what neural networks do best: reasoning over a small set of facts.
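The three stages can be illustrated with a toy sketch. This is not Facebook's implementation: the real system uses trained neural models for retrieval and reasoning, whereas the version below substitutes plain string matching and a regular expression so that it runs without any model. All names here (FACTS, support_set_generator, neural_operator, run_query) and the example facts are hypothetical and chosen only for illustration.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Toy fact store: each fact is one free-text sentence, with no schema.
FACTS = [
    "Lions beat Tigers 24 to 17 in an away game.",
    "Bears beat Wolves 10 to 9 in an away game.",
    "Hawks beat Owls 30 to 20 in a home game.",
    "Tigers signed a new coach in March.",
]


def support_set_generator(query_terms, facts):
    """Stage 1 (assumed stand-in): return small support sets of relevant facts.

    Each support set here is a single fact that mentions every query term.
    """
    return [[fact] for fact in facts if all(term in fact for term in query_terms)]


def neural_operator(support_set, min_margin):
    """Stage 2 (assumed stand-in): produce an intermediate answer for one support set.

    A regex plays the role of the neural reader. The result is a set so that
    intermediate answers from different support sets can be unioned.
    """
    match = re.search(r"(\w+) beat \w+ (\d+) to (\d+)", support_set[0])
    if match is None:
        return set()
    winner = match.group(1)
    margin = int(match.group(2)) - int(match.group(3))
    return {winner} if margin > min_margin else set()


def run_query(facts, query_terms, min_margin):
    """Run the three stages: retrieve, operate in parallel, then aggregate."""
    support_sets = support_set_generator(query_terms, facts)
    # Support sets are independent of each other, so they can be processed in parallel.
    with ThreadPoolExecutor() as pool:
        partial_answers = list(
            pool.map(lambda s: neural_operator(s, min_margin), support_sets)
        )
    answers = set().union(*partial_answers)
    # Stage 3 (optional aggregation): here, a count over the unioned answers.
    return len(answers)


# "How many teams won away games by more than three points?"
result = run_query(FACTS, ["beat", "away game"], min_margin=3)
```

The point of the design is that no single step has to read the whole collection: each operator call sees only one small support set, and counting happens outside the model in an ordinary aggregation step.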

This research by Facebook AI indicates that the method can scale: it can reason over many sets of facts as their number increases.


Neural databases, which combine machine learning and traditional database technologies, could one day enable people to access the entire world’s information. They would be able to search through any data that is available online but, like text or images, is not stored in a standard database format. This includes newspaper articles from 80 years ago or genetic sequence records – anything! Since so much of the world’s knowledge already exists outside of conventional databases, neural databases will likely play an important role in research as well as in everyday tasks before long.

Code: https://github.com/facebookresearch/NeuralDB

Paper (Neural Databases): https://arxiv.org/pdf/2010.06973.pdf

Paper (Database reasoning over text): https://aclanthology.org/2021.acl-long.241.pdf

Source: https://ai.facebook.com/blog/using-ai-for-database-queries-on-any-unstructured-data-set