Top 5 Papers Presented at MLDS 2023

The top Machine Learning research papers presented during MLDS 2023.
The Machine Learning Developers Summit (MLDS) 2023 concluded last week with numerous keynote sessions by industry experts. The summit also featured presentations of research papers authored by academics and professionals in the field, who presented their work and key findings before industry experts and attendees. The papers have been published in Lattice – The Machine Learning Journal, hosted and managed by the Association of Data Scientists (ADaSci). Here are the top five papers presented during MLDS.

1. Application of Clustering for Computationally Light Short-Term Demand Forecasting

By Rohan Kumar and Parimesh Panda, Data Scientists at Genpact

This research work, presented by a team of data scientists at Genpact, aims to shorten demand forecast model training cycles by leveraging unsupervised techniques. It addresses a problem faced by retail manufacturers: predicting customer demand for each product at high forecast accuracy requires heavy computation.

They used a clustering-based demand forecasting framework to identify clusters of products with similar customer purchasing behaviour, and applied it to predict customer demand for more than 500 dairy products over the next eight weeks. The paper also compares computational time for product-level and cluster-level model training to quantify the savings in computational cost.
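
The general idea of cluster-level forecasting can be illustrated with a minimal sketch. This is not the authors' implementation: the product names and demand figures are invented, plain k-means on normalised demand shapes stands in for whatever clustering method the paper uses, and a moving average stands in for the forecasting model. One forecast is fitted per cluster and then split across the cluster's products by historical share, so two models are fitted here instead of four.

```python
def normalise(series):
    """Scale a demand series to shares of its total, so clustering compares shape, not volume."""
    total = sum(series)
    return [x / total for x in series] if total else list(series)

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    # Deterministic farthest-point initialisation (an assumption made for reproducibility).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recompute centroids as member means.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_level_forecast(history, k, window=2):
    """Fit one naive model per cluster and allocate its forecast back to products."""
    names = list(history)
    labels = kmeans([normalise(history[n]) for n in names], k)
    forecast = {}
    for c in range(k):
        members = [n for n, label in zip(names, labels) if label == c]
        if not members:
            continue
        # Cluster-level series: total weekly demand over all member products.
        weekly_total = [sum(week) for week in zip(*(history[n] for n in members))]
        # Stand-in model: moving average of the last `window` weeks.
        cluster_pred = sum(weekly_total[-window:]) / window
        grand_total = sum(weekly_total)
        for n in members:
            share = sum(history[n]) / grand_total if grand_total else 0.0
            forecast[n] = cluster_pred * share
    return labels, forecast

# Hypothetical weekly demand for four products (illustrative numbers only).
history = {
    "milk_1l": [100, 102, 98, 100],
    "milk_2l": [50, 51, 49, 50],
    "ice_cream": [10, 20, 40, 80],
    "kulfi": [5, 10, 20, 40],
}
labels, forecast = cluster_level_forecast(history, k=2)
print(labels)
print(forecast)
```

The saving grows with the ratio of products to clusters, which is the kind of trade-off the paper's product-level versus cluster-level timing comparison examines.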

2. Visualization techniques for the training of Empirical Deep Reinforcement Learning (DRL) agents with continuous state and action spaces

By Gaurav Adke, Senior Data Scientist at Michelin

Visualising the reinforcement learning environment and an agent’s learning dynamics is a vital step for debugging and for better understanding the learnt policy. In environments that optimise real-world multidimensional spaces with continuous variables, such as chemical process parameters, observing an agent’s behaviour through visualisation is challenging.

In his research paper, Gaurav presented a reinforcement learning agent developed to optimise the production process of rubber mix for the tyre industry. This research attempts to visualise an agent’s training and inference for high-dimensional state space problems with continuous state and action spaces.


3. Weighted clustering on fast sentence embeddings to determine themes from large unstructured data

By Paritosh Sinha, Senior Data Scientist at Uber

Most engineering product improvements are driven by feedback from users and engineers. B2C products are used to target customers, send personalised communications, manage order requests, and track event-level actions and failures to improve product performance. However, the volume of failure logs and their unstructured nature often hinder the detection of underlying themes in event failures.

This paper by Paritosh discusses a unique and highly efficient approach to tuning and leveraging a language model for embedding generation. The embeddings are then grouped with a weighted clustering technique so that failures fall into automatically detectable themes. The paper also presents distinctive methods for managing embeddings that improve the algorithm’s performance while keeping its focus on efficiency and computation time.
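
As a rough illustration of what weighted clustering over embeddings can look like, the sketch below runs a weighted k-means in which each distinct failure message is weighted by how often it occurs. The article does not say how the paper defines its weights, so occurrence counts are an assumption here. The log messages and the two-dimensional "embeddings" are also invented; in practice the vectors would come from the tuned language model.

```python
from collections import Counter

def weighted_centroid(vectors, weights):
    """Weighted mean of vectors: heavier points pull the centroid towards themselves."""
    total = sum(weights)
    dims = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(dims)]

def weighted_kmeans(vectors, weights, init_centroids, iters=10):
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: weighted mean of each cluster's members.
        for c in range(len(centroids)):
            idx = [i for i, label in enumerate(labels) if label == c]
            if idx:
                centroids[c] = weighted_centroid(
                    [vectors[i] for i in idx], [weights[i] for i in idx]
                )
    return labels, centroids

# Hypothetical failure logs; duplicates are collapsed and become weights.
logs = [
    "payment timeout", "payment timeout", "payment timeout",
    "payment declined", "push token invalid", "push token expired",
]
counts = Counter(logs)
messages = list(counts)

# Hand-made 2-D vectors standing in for real sentence embeddings (assumption).
toy_embeddings = {
    "payment timeout": [1.0, 0.0],
    "payment declined": [0.8, 0.2],
    "push token invalid": [0.0, 1.0],
    "push token expired": [0.2, 0.8],
}
vectors = [toy_embeddings[m] for m in messages]
weights = [counts[m] for m in messages]

labels, centroids = weighted_kmeans(vectors, weights, init_centroids=[vectors[0], vectors[2]])
for message, label in zip(messages, labels):
    print(label, message)
```

Collapsing duplicates before clustering means each distinct message is embedded once, which is one plausible way to keep computation time down on large log volumes.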

4. EthicalFL – A Federated Learning Framework with Bias Mitigation

By Shekar Ramachandran, Senior Member Technical Staff at Intel

Federated learning lets practitioners leverage AI/ML techniques while preserving the privacy of localised data. However, owing to its decentralised nature, federated learning faces several optimisation issues. This paper by Shekar identifies the problem of incoming network congestion at the Aggregator in a federated scenario and proposes a statistical significance test to address it. The network is further optimised by implementing a requirement-based, request–response communication architecture that reduces unnecessary training rounds. The research also targets the well-known bias problem caused by label bias at the clients in a cross-device federated learning setting.

5. IntelliQSense: An intelligent, real-time Query Autocompletion Framework using GPT-2

By Taaniya Arora, Senior Data Scientist at Crux Intelligence

Query Autocompletion (QAC) is a common feature of text-based input applications, in which a user’s partially typed prefix is completed. It has primarily been studied for applications involving search-based queries that are short sequences or phrases.

Taaniya and her team presented a novel approach to QAC for a question–answering system in an augmented analytics platform, where queries are essentially business and analytical questions in natural language. The team proposed combining semantic search with natural language generation via beam search to complete questions. To enable generative completion in natural language and handle unseen prefixes, they used a pre-trained distilgpt2 model fine-tuned for question completion. They also described a method to synthesise training data from the limited past queries available, both for fine-tuning the model and for generating quality completions.
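
Beam search itself can be sketched in a few lines. The example below is only an illustration of the decoding step: a small hand-written next-token probability table (`NEXT`, entirely hypothetical) stands in for the fine-tuned distilgpt2 model, and the semantic search component of the framework is not covered.

```python
import math

# Hypothetical next-token probabilities, standing in for a language model.
NEXT = {
    "what": {"is": 0.6, "are": 0.4},
    "is": {"the": 0.9, "<eos>": 0.1},
    "are": {"the": 0.5, "sales": 0.5},
    "the": {"revenue": 0.7, "margin": 0.3},
    "revenue": {"<eos>": 1.0},
    "margin": {"<eos>": 1.0},
    "sales": {"<eos>": 1.0},
}

def beam_search(prefix, beam_width=2, max_new_tokens=4):
    """Return up to `beam_width` completed sequences, ranked by log-probability."""
    beams = [(0.0, list(prefix))]  # (cumulative log-probability, tokens so far)
    finished = []
    for _ in range(max_new_tokens):
        candidates = []
        for logp, tokens in beams:
            # Tokens missing from the toy table simply end the sequence.
            for tok, p in NEXT.get(tokens[-1], {"<eos>": 1.0}).items():
                candidate = (logp + math.log(p), tokens + [tok])
                if tok == "<eos>":
                    finished.append(candidate)
                else:
                    candidates.append(candidate)
        # Keep only the `beam_width` most probable unfinished sequences.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if not beams:
            break
    # Sequences still unfinished after max_new_tokens are dropped in this sketch.
    finished.sort(key=lambda c: c[0], reverse=True)
    return [" ".join(tokens[:-1]) for _, tokens in finished[:beam_width]]

completions = beam_search(["what"])
print(completions)
```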

Analytics India Magazine received close to 400 research paper submissions for presentation at MLDS 2023. The research reviewing committee selected the top 26 based on the quality of the research work, and all of them are available on the Lattice website.

Dr. Vaibhav Kumar
Dr. Vaibhav Kumar is a seasoned data science professional with great exposure to machine learning and deep learning. He has good exposure to research, where he has published several research papers in reputed international journals and presented papers at reputed international conferences. He has worked across industry and academia and has led many research and development projects in AI and machine learning. Along with his current role, he has also been associated with many reputed research labs and universities where he contributes as visiting researcher and professor.
