The Machine Learning Developers Summit (MLDS) 2023 concluded last week with numerous keynote sessions by industry experts. The sessions also included presentations of research papers by academics and professionals in the field, who shared their work and key findings with industry experts and attendees. The presented papers have been published in Lattice – The Machine Learning Journal, hosted and managed by the Association of Data Scientists (ADaSci). Here are the top five papers presented during MLDS.
By Rohan Kumar and Parimesh Panda, Data Scientists at Genpact
This research, presented by a team of data scientists at Genpact, aims to shorten demand forecast model training cycles by leveraging unsupervised techniques. It addresses a problem faced by retail manufacturers: predicting customer demand for each product with high forecast accuracy is computationally expensive.
The authors used a clustering-based demand forecasting framework to identify clusters of products with similar customer purchasing behaviour, and applied it to predict customer demand for more than 500 dairy products over the next eight weeks. A comparative study of computational time for product-level versus cluster-level model training is presented to quantify the reduction in computational cost.
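To make the cluster-level idea concrete, here is a minimal sketch in plain Python. It is not the Genpact team's implementation: the k-means routine, the linear-trend "model", and every function name are assumptions chosen for illustration. Products are clustered on the shape of their sales history, and one trend model is fitted per cluster instead of one per product.

```python
def normalise(series):
    """Scale a sales history to sum to 1 so clustering sees shape, not volume."""
    total = sum(series)
    return [x / total for x in series] if total else list(series)

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means on equal-length lists; returns (labels, centroids)."""
    # Farthest-point initialisation: deterministic and avoids duplicate seeds.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def fit_trend(y):
    """Least-squares line through (0, y[0]), (1, y[1]), ...; returns (slope, intercept)."""
    n = len(y)
    x_mean, y_mean = (n - 1) / 2, sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def cluster_forecast(histories, k, horizon):
    """Fit one trend model per cluster, then rescale it to each product's volume."""
    names = list(histories)
    profiles = [normalise(histories[n]) for n in names]
    labels, centroids = kmeans(profiles, k)
    models = [fit_trend(c) for c in centroids]  # k models, not len(names) models
    forecasts = {}
    for name, label in zip(names, labels):
        slope, intercept = models[label]
        total, t0 = sum(histories[name]), len(histories[name])
        forecasts[name] = [
            max(0.0, (intercept + slope * (t0 + h)) * total) for h in range(horizon)
        ]
    return labels, forecasts
```

With this structure, the number of models to train grows with the number of clusters rather than the number of products, which is where the saving in training time would come from.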
2. Visualization techniques for the training of Empirical Deep Reinforcement Learning (DRL) agents with continuous state and action spaces
By Gaurav Adke, Senior Data Scientist at Michelin
Visualising the reinforcement learning environment and an agent’s learning dynamics is vital for debugging and for better understanding the learnt policy. For environments that involve optimising real-world multidimensional spaces with continuous variables, such as chemical process parameters, observing the agent’s behaviour through visualisation is challenging.
In his research paper, Gaurav presented a reinforcement learning agent developed to optimise the production process of rubber mix for the tyre industry. This research attempts to visualise an agent’s training and inference for high-dimensional state space problems with continuous state and action spaces.
By Paritosh Sinha, Senior Data Scientist at Uber
Most engineering product improvements are driven by feedback from users and engineers. B2C products are used to target customers, send personalised communications, manage order requests, and track event-level actions and failures to improve product performance. However, the volume of failure logs and their unstructured nature often hinder the detection of underlying themes in event failures.
This paper by Paritosh discusses an efficient approach to tuning and leveraging a language model for embedding generation. The embeddings are then grouped, using a weighted clustering technique, into auto-detectable failure themes. The paper also presents methods for managing embeddings that help improve the algorithm’s performance while retaining its focus on efficiency and computation time.
By Shekar Ramachandran, Senior Member Technical Staff at Intel
Federated learning makes it possible to leverage AI/ML techniques while preserving localised data privacy. However, owing to its decentralised nature, federated learning faces several optimisation issues. This paper by Shekar identifies the problem of incoming network congestion at the aggregator in a federated scenario and proposes a statistical significance test to address it. The network is further optimised by implementing a requirement-based, request–response communication architecture that reduces unnecessary training rounds. The research also targets the well-known problem of model bias caused by label bias at the clients in a cross-device federated learning setting.
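The article does not say which significance test the paper uses or what it is applied to. Purely as an illustration of the general idea, the sketch below assumes a client compares its per-batch losses before and after local training and contacts the aggregator only when the change is statistically significant. The function names, the use of Welch's statistic with a normal approximation, and the `alpha` threshold are all assumptions, not details from the paper.

```python
from statistics import NormalDist, mean, variance

def welch_statistic(a, b):
    """Welch's statistic for two samples with possibly unequal variances."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    if se == 0:
        # Degenerate case: no spread in either sample.
        return 0.0 if mean(a) == mean(b) else float("inf")
    return (mean(a) - mean(b)) / se

def should_send_update(losses_before, losses_after, alpha=0.05):
    """Hypothetical client-side gate: send only if the loss change is significant.

    Uses a normal approximation to the p-value, which is rough for small
    samples; a t-distribution would be the more careful choice.
    """
    t = welch_statistic(losses_before, losses_after)
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return p_value < alpha
```

Under these assumptions, a client whose local training barely changed its losses would stay silent for that round, reducing the traffic arriving at the aggregator.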
By Taaniya Arora, Senior Data Scientist at Crux Intelligence
Query Autocompletion (QAC) is a common feature of text-based input applications, in which a user’s partially typed prefix is completed. It has primarily been studied for applications involving search-based queries that are short sequences or phrases.
Taaniya and her team have presented a novel approach to QAC for a question–answering system in an augmented analytics platform where queries are essentially business and analytical questions in natural language. In this research, the team has proposed an approach involving a combination of semantic search and natural language generation via beam search for completing questions. To enable generative completion in natural language and handle unseen prefixes, they have used a pre-trained distilgpt2 model that is fine-tuned for question completion tasks. In addition, they described a method to synthesise training data from limited available past queries for fine-tuning the model and generating quality results for completion.
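Beam search itself can be shown with a toy example. The sketch below swaps the paper's fine-tuned distilgpt2 model for a simple bigram count model built from a handful of made-up past queries, and omits the semantic search component entirely. All names and the sample queries are hypothetical. At each step it keeps only the `beam_width` highest-scoring partial completions.

```python
import math
from collections import Counter, defaultdict

END = "</s>"  # end-of-question marker (illustrative)

def train_bigram(queries):
    """Count which token follows which across past queries."""
    counts = defaultdict(Counter)
    for q in queries:
        tokens = q.lower().split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def next_token_logprobs(counts, prev):
    """Log-probability of each token that has followed `prev`."""
    options = counts.get(prev, {})
    total = sum(options.values())
    return {tok: math.log(n / total) for tok, n in options.items()}

def beam_search(counts, prefix, beam_width=2, max_new_tokens=6):
    """Return up to `beam_width` completed questions, best first."""
    start = prefix.lower().split()
    if not start:
        return []
    beams = [(0.0, start)]  # (cumulative log-probability, tokens)
    finished = []
    for _ in range(max_new_tokens):
        candidates = []
        for score, tokens in beams:
            for tok, lp in next_token_logprobs(counts, tokens[-1]).items():
                if tok == END:
                    finished.append((score + lp, tokens))
                else:
                    candidates.append((score + lp, tokens + [tok]))
        if not candidates:
            break
        # Prune: keep only the best `beam_width` partial completions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    finished.sort(key=lambda c: c[0], reverse=True)
    return [" ".join(tokens) for _, tokens in finished[:beam_width]]

past_queries = [
    "what is total revenue by region",
    "what is total revenue by product",
    "what is total revenue by region",
    "what is average order value",
]
model = train_bigram(past_queries)
completions = beam_search(model, "what is")
```

Here the top completion for the prefix "what is" is "what is total revenue by region", the most frequent continuation in the sample queries. A bigram model cannot handle unseen prefixes, which is the gap a fine-tuned generative model is meant to fill.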
Analytics India Magazine received close to 400 research paper submissions for MLDS 2023. The research reviewing committee selected the top 26 for presentation based on the quality of the research work. All of these papers are available on the Lattice website.