Evolution of Retrieval-Augmented Generation (RAG) Models from Big Data Analytics (Big D) to Graph-Based Legal AI Systems (SwarmLexAI)

Jose Marchena

Apr 1


Co-Presenters: Individual Presentation

College: The Dorothy and George Hennings College of Science, Mathematics and Technology

Major: Computer Science

Faculty Research Mentor: Yulia Kumar

Abstract:

The evolution of Retrieval-Augmented Generation (RAG) models has significantly advanced AI-driven document analysis, knowledge retrieval, and contextual understanding. This research presents a two-phase trajectory, from Big D, a RAG-based system designed for large-scale business document analysis, to SwarmLexAI, a GraphRAG- and Lazy GraphRAG-based approach tailored to multilingual legal document retrieval. Phase I (Big D) addressed the core big data challenges of volume, velocity, and variety by integrating Latent Semantic Indexing (LSI) and NLP techniques within an expanded "spectrum of Vs" framework that adds dimensions such as value, validity, and visualization. Key features included data preprocessing (tokenization, lemmatization, and stop-word removal), precise document retrieval (RAG and LSI techniques), and multi-dimensional data visualization (word clouds, sentiment charts, and PCA-based 3D tokenization plots). Applied primarily to financial document analysis, Big D demonstrated the power of AI to extract actionable insights from complex datasets, bridging the gap between raw data and strategic decision-making. Phase II (SwarmLexAI) extended these capabilities by incorporating GraphRAG and Lazy GraphRAG to analyze multilingual legal documents in English, Spanish, German, and French. Using swarm-based multi-agent AI systems, spaCy, and fastText, this phase focused on graph-based knowledge extraction: identifying legal entities, clauses, and cross-lingual relationships. SwarmLexAI implemented graphical document representations that dynamically map semantic and structural relationships within legal texts, enabling comparative legal analysis across jurisdictions. The research contrasts GraphRAG, which excels at in-depth structured legal analysis, with Lazy GraphRAG, which is optimized for speed and high-level document insights, highlighting the trade-offs between accuracy and efficiency.
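The abstract does not say how Big D's LSI retrieval was implemented. Purely as an illustration of the general technique, here is a minimal pure-Python sketch of textbook LSI ranking. It rests on several assumptions that are not from the source: a hypothetical, tiny stop-word list; raw term counts (no TF-IDF weighting); lemmatization omitted; and a truncated SVD computed by power iteration on A^T A instead of a library SVD routine. All function names and the sample corpus are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (a real pipeline would use a fuller one).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is", "are", "by", "with"}


def preprocess(text):
    """Lowercase, tokenize on letter runs, and drop stop words (no lemmatization here)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def term_document_matrix(docs):
    """Return (vocab, A), where A[t][j] is the raw count of vocab[t] in docs[j]."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for c in counts] for t in vocab]


def top_eigenpairs(C, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration plus deflation."""
    n = len(C)
    C = [row[:] for row in C]  # deflation mutates the matrix, so work on a copy
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic, non-uniform start vector
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the eigenvalue, i.e. a squared singular value of A.
        lam = sum(v[i] * C[i][j] * v[j] for i in range(n) for j in range(n))
        if lam <= 1e-10:
            break  # the remaining spectrum is numerically zero
        pairs.append((lam, v))
        for i in range(n):  # deflate: C <- C - lam * v v^T
            for j in range(n):
                C[i][j] -= lam * v[i] * v[j]
    return pairs


def cosine(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 0.0 if nx == 0.0 or ny == 0.0 else sum(a * b for a, b in zip(x, y)) / (nx * ny)


def lsi_rank(docs, query, k=2):
    """Rank docs against query in a k-dimensional LSI space; returns (score, index) pairs.

    With A = U S V^T, document j maps to row j of V_k, and the query is folded in
    as q_hat = S_k^-1 U_k^T q, computed here as (v_i . A^T q) / sigma_i^2.
    """
    vocab, A = term_document_matrix(docs)
    n_terms, n_docs = len(vocab), len(docs)
    # C = A^T A (documents x documents); its eigenvectors are the right singular vectors.
    C = [[sum(A[t][i] * A[t][j] for t in range(n_terms)) for j in range(n_docs)]
         for i in range(n_docs)]
    pairs = top_eigenpairs(C, k)
    q_counts = Counter(preprocess(query))
    atq = [sum(A[t][j] * q_counts[vocab[t]] for t in range(n_terms)) for j in range(n_docs)]
    q_hat = [sum(v[j] * atq[j] for j in range(n_docs)) / lam for lam, v in pairs]
    scores = [(cosine(q_hat, [v[j] for _, v in pairs]), j) for j in range(n_docs)]
    return sorted(scores, reverse=True)


# Hypothetical sample corpus: two financial and two legal snippets.
docs = [
    "Quarterly revenue and profit grew in the finance report",
    "The finance report shows revenue decline and rising costs",
    "The court ruled the contract clause void under German law",
    "A contract clause on liability was disputed in court",
]
for score, j in lsi_rank(docs, "German law"):
    print(f"{score:.2f}  {docs[j]}")
```

With k = 2, the financial and legal snippets separate onto different latent dimensions, so the query "German law" should rank both legal documents above the financial ones. The fourth document (index 3) scores highly even though it contains neither query term, which is the kind of semantic matching LSI is meant to provide. A production system would more likely rely on a library SVD (for example scikit-learn's `TruncatedSVD`) and TF-IDF weighting.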
Findings indicate that graph-augmented retrieval enhances contextual reasoning and knowledge representation, making it essential for legal AI systems that require precision, interpretability, and adaptability. Future research will explore interactive graph visualizations, hybrid models combining GraphRAG and Lazy GraphRAG, cross-domain applications, dynamic graph updates, and optimized entity extraction. This research underscores the potential of GraphRAG methodologies to transform intelligent document processing, offering a scalable, multilingual AI framework applicable to business intelligence, legal compliance, and knowledge-driven decision-making.
