A definition of a generalized dialogue graph is proposed to describe the structure of a dialogue in terms of a corpus of homogeneous dialogues. Constructing such a graph is a relevant task in modern conversational artificial intelligence, but few works report meaningful results, and they often lack a complete description of the algorithms and do not publish implementation code. This paper proposes a method for constructing a generalized dialogue graph, implemented in Python and made publicly available. Experiments were carried out on open data, and the results are described.
- Title A generalized dialogue graph construction and visualization based on a corpus of dialogues
- Publisher
Tomsk State University
- Issue Prikladnaya Diskretnaya Matematika - Applied Discrete Mathematics 59
- Date:
- DOI 10.17223/20710410/59/7
Keywords
dialogue system, NLP, graph, dialogue graph, clustering, embeddings

A generalized dialogue graph construction and visualization based on a corpus of dialogues | Prikladnaya Diskretnaya Matematika - Applied Discrete Mathematics. 2023. № 59. DOI: 10.17223/20710410/59/7