Salvino A. Salvaggio   
At a moment when many are preparing for the evening of December 31st, the End of Year speech of the President of the Italian Republic is broadcast right on time at 8:30pm. A tradition born (or almost) with the constitutional establishment of the Italian Republic itself, the year-end message has outlasted every trend and survived the technological refoundation of the mediascape. From the beginning it also established itself as a television format able to bring together millions of TV viewers, even though they are often engrossed in final preparations for the evening’s celebrations.
It is not our concern here to question the appropriateness of such an annual occurrence nor, much less, the reasons that favor or exhaust the continuation of this republican ritual. We simply take advantage of the fact that 67 years of presidential year-end wishes now form a documentary corpus rich enough to become the object of data analysis.
For over a decade, the statistical analysis of written documents has become a common practice and a consolidated scientific instrument (Jockers, 2014; Gries, 2013; Baayen, 2008). Although data analysis has been fully incorporated in the field of literary studies, little has been undertaken to investigate the textual or verbal production of public administration with equal quantitative rigor. This work contributes to filling the gap by exploring all the Italian Presidents’ New Year speeches from 1949 to 2015 bringing the tools and methodologies of data science into the field of political studies.
1. Style and elocution
First of all, we must note that not all the Presidents of the Italian Republic have addressed the Nation seven times for the year-end wishes. Luigi Einaudi remained in office for 6 years, Antonio Segni for 2 years, Giorgio Napolitano for 9 (a first mandate of 7 years and a second he resigned from after 2 years) and Sergio Mattarella has taken on the New Year’s Eve practice only once to date. Three types of data are of immediate help:
- While the total number of words spoken depends on the number of years in office, the average number of words per speech gives a better indication of which Presidents were more concise and which more loquacious.
- The average number of words per sentence helps to understand the linguistic style of each President.
- The average number of words spoken per minute gives a valid idea of the rhythm and speed of elocution.
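The three measures above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used for the article: the function name `speech_stats`, the naive sentence splitter and the sample text are all assumptions, and the duration of each speech has to be supplied separately.

```python
import re

def speech_stats(text: str, duration_minutes: float) -> dict:
    """Word count, mean words per sentence and words per minute for one speech."""
    # Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words are runs of word characters or apostrophes; \w covers accented letters.
    words = re.findall(r"[\w']+", text)
    return {
        "words": len(words),
        "words_per_sentence": len(words) / len(sentences),
        "words_per_minute": len(words) / duration_minutes,
    }

# Invented two-sentence sample, assumed to last half a minute.
sample = "Buon anno a tutti. Il mio pensiero va ai giovani e alle famiglie."
stats = speech_stats(sample, duration_minutes=0.5)
print(stats)
```

Averaging `words` over a President's speeches gives the words/speech figure; the other two values can be averaged the same way or pooled, depending on the convention chosen.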
Table: summary statistics per President. Columns: New Year speeches, Total words, Words per speech, Words per sentence, Words per minute. Rows include Oscar Luigi Scalfaro and Carlo Azeglio Ciampi.
Two significant elements can then be drawn from the previous basic data:
- The mutations of the stylistic forms unique to each President
- The changes in elocution rhythms of each of them
1.1. Stylistic mutations
The stylistic mutations over the history of the year-end speeches appear with great clarity. First of all, a progressive shortening of the sentences, almost halving their length, is evident: from approximately 42 words/sentence between 1949 and 1968 to approximately 22 words during the seven-year terms of Presidents Scalfaro and Ciampi (with the exception of President Cossiga, who used much longer sentences). More recently, President Napolitano reversed the trend, adopting a more sophisticated language built on longer sentences than his predecessors (approximately 28 words/sentence on average). Finally, in his first year-end address, President Mattarella built his speech on very brief sentences (17-18 words on average), sharp and of great impact, in a style more journalistic than institutional (similar to Presidents Ciampi, Scalfaro and Pertini).
1.2. Elocutionary evolutions
Given the impact that the profound mutation of the media ecosystem has had in recent decades on oral and written expression in general, one might expect a similar effect on the rhythm and speed of the Presidents’ elocution. The dominant media have progressively shifted (from newspapers to radio to TV to Internet to social media), and formats increasingly favor brief forms of communication. These changes have spread so pervasively through every fold of society that it is natural to expect a matching transformation of presidential elocution, reflecting the passage from a sacred conception of the Presidency to one more in line with the zeitgeist. Actually, none of this seems to apply to the presidential year-end addresses. Instead of an acceleration of the elocution, the contrary is observed: Presidents Gronchi, Segni, Saragat and Leone spoke much more rapidly than Presidents Pertini, Cossiga, Scalfaro, Ciampi, Napolitano and Mattarella.
One detail is perhaps also worth noting: although all of the Presidents were seated to deliver their year-end wishes, not all of them were seated in the same manner. For most of the addresses, the Presidents sat at their desk; Presidents Pertini, Scalfaro and Mattarella at times used an armchair in a Quirinale sitting room, without a desk. In 1984 President Pertini even sat next to a fireplace, in a simple setting far from the formality and golden adornments of the palace. In 2011 and 2012, President Napolitano sat at the desk in an informal way, sideways (and with his jacket unbuttoned in 2011). On these two occasions, the average number of words spoken per minute increased from 110 to 120, which suggests (though it is only a general indication, to be followed over the years) that changes in the setting of the year-end speech could have an impact on the elocution.
Although the seven-year term of President Scalfaro is characterized by the most rambling year-end speeches in the whole history of the Italian Republic (with an all-time record of 5,013 words spoken on December 31, 1997), a trend toward longer well-wishing messages is undeniable, from an average of fewer than 500 words/speech before 1960 to an average of more than 1,500 words after 1980. This increasing trend, however, shows a setback in 1999-2000 and has reversed slightly over the last 15 years, with speeches that count, for the most part, between approximately 1,800 and 2,250 words.
In this type of public speech, which the TV broadcast makes available to all in a sort of formal and recurrent celebration (defined by President Saragat in 1964 as a “spiritual communion for the entire nation”), the choice and use of terms by the Presidents are certainly not left to chance. After all, since the early years, the year-end presidential addresses have been interpreted and analysed in fine detail by journalists, analysts and politicians, besides being followed by a very wide audience. It is therefore useful to highlight the words and, behind them, the themes used most frequently by the Presidents, in an attempt to outline a map of topics that were historically relevant or of personal interest to each speaker.
Across all the year-end wishes delivered by all the Presidents, it is not surprising that Italy is the main and dominant theme, together with the final balance of the year and suggestions, proposals and wishes for the new year. Peace and young people also occupy a prominent position in the presidential wishes, and politics, Europe, freedom and employment are topics that many Presidents have taken on, on December 31.
The study of the absolute frequency of words also provides clear indications of the overall landscape of the presidential wishes. These appear as an expression of unity for the country, actual or hoped for, of gathering around shared patriotic symbols, and of celebration, if not construction, of the national narrative. But not all Presidents insist in the same way on the same themes. Beyond the points of convergence, the lexical ecosystem of each President varies greatly from the overall ‘average’, illustrating specific and multifaceted thematic geographies. The word clouds highlight the personal concerns and the current trends which hold, in different ways, the attention of each President.
Word clouds by President: L. Einaudi, G. Gronchi, A. Segni, G. Saragat, G. Leone, S. Pertini, F. Cossiga, O. L. Scalfaro, C. A. Ciampi, G. Napolitano, S. Mattarella.
4. Word associations
The identification of structures of association (words which are often used together), in pairs (bi-grams) or in longer structures (tri-grams or n-grams), is a pillar in the processing and analysis of contents in natural language.
Besides the obvious expressions and clichés typical of year-end speeches (such as ‘happy year’, ‘new year’, ‘this year’) and the references to the nation (‘Italian people’, ‘national unity’, ‘common good’, ‘constitutional charter’), some unexpected expressions among the most cited may come as a surprise. Without a quantitative analysis of the language, it is difficult, for example, to realize how self-referential these messages are, with a strong recurrence of expressions such as ‘president republic’, ‘head of state’, ‘I would like to say’ (vorrei dire), or how often ‘armed forces’ and ‘law enforcement’ appear in the wishes. Additionally, the analysis of the n-grams removes every doubt about the themes of central importance for Italy in these almost seventy years of contemporary history. ‘European union’, ‘united nations’, ‘international community’ and ‘middle east’ recall Italy’s place in the network of international relationships but also large-scale geo-political problems. In the same way, expressions like ‘social justice’, ‘social economic’, ‘public debt’ or ‘organized crime’ allude to internal problems which have marked the recent history of the country. Finally, we can reasonably ask whether the presence of ‘john paul’ and ‘paul ii’ among the most used bigrams suggests a proximity, perhaps even more spiritual than geographical, between the Quirinale and the Vatican. After all, over the 67 years analysed here, the trigram ‘help from god’ appears more often than ‘jobs’…
Purely as an example, here is a partial and selective list of trigrams used at least 10 times.
- forze dell ordine
- capo dello stato
- il popolo italiano
- del nostro paese
- per la pace
- presidente della repubblica
- tutti gli italiani
- tutti i cittadini
- senso di responsabilità
- la libertà e
- il capo dello
- il mio pensiero
- le forze politiche
- contro il terrorismo
- economico e sociale
- giovanni paolo ii
- presidente del consiglio
- della nostra società
- il bene comune
- aiuto di dio
- pace nel mondo
- per la giustizia
- dell unione europea
- della persona umana
- delle nazioni unite
- posti di lavoro
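Trigram counts like those listed above can be reproduced in principle with a simple sliding window over the tokens. The sketch below is only an illustration under stated assumptions: the function name `ngram_counts` and the toy text are invented, and the tokenizer (lowercasing, splitting on non-word characters so that "dell'ordine" yields "dell" and "ordine") is a guess at how forms such as "forze dell ordine" arise, not a description of the software actually used.

```python
from collections import Counter
import re

def ngram_counts(text: str, n: int) -> Counter:
    """Count n-grams of lowercase word tokens in a text."""
    # Assumption: tokens are runs of word characters; apostrophes split tokens.
    tokens = re.findall(r"\w+", text.lower())
    # zip over n shifted copies of the token list yields every window of size n.
    # Note: windows run across sentence boundaries in this simple version.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

# Invented toy text, not taken from any presidential speech.
toy = ("Il capo dello stato saluta il popolo italiano. "
       "Il capo dello stato ringrazia le forze dell'ordine.")
trigrams = ngram_counts(toy, 3)
print(trigrams.most_common(3))
```

Keeping only entries with a count of at least 10 over the full corpus would give a list comparable to the one above.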
As already seen, the historical period, the major national or international events, the most acute problems, the plights faced by government institutions or by the population, and the particular interests of each President all have a considerable impact on the choice of terminology in the year-end speeches. To illustrate this selection, let’s focus on examples relating to seven specific themes:
- unemployment
- work and workers
- young people
- culture
- terrorism and terror
- reform
- homeland
Presented as graphs (titled Frequency of theme in each speech), how and how often the Presidents evoke a specific theme across the 67 year-end addresses becomes easier to perceive. Each graph captures one topic and shows the number of times the theme is cited in each message. For example, the first graph, on unemployment, shows that the term was used 6 times in the year-end message of 1984 and 5 times in 1979, 1981 and 1983 (President Sandro Pertini), or 4 times in 1992 (President Scalfaro).
This, however, provides only half of the analysis. The significance of a theme is not measured only by the overall absolute frequency of the relevant term in the single speeches but also by the relative frequency of the term in the message compared to the average use of the same term in the Italian language in general in the same historical period or year. And that comparison is given in the second graph (entitled Relative frequencies of theme).
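The two measures can be sketched as follows. This is a minimal sketch, not the author's code: the function name `theme_frequency`, the Italian pattern and the sample sentence are assumptions, and the baseline value standing in for the general-language frequency is purely hypothetical.

```python
import re

def theme_frequency(speech: str, pattern: str) -> tuple[int, float]:
    """Absolute number of matches and matches per word for one speech."""
    lowered = speech.lower()
    words = re.findall(r"\w+", lowered)
    hits = len(re.findall(pattern, lowered))
    return hits, hits / len(words)

# Invented sample sentence and an assumed Italian pattern for 'unemployment'.
speech = ("La disoccupazione colpisce i giovani. "
          "Contro la disoccupazione serve lavoro.")
hits, relative = theme_frequency(speech, r"\bdisoccupazione\b")

# Hypothetical frequency of the same term in the language at large
# for the same year; a real value would come from an external corpus.
baseline = 0.00002
ratio = relative / baseline
print(hits, relative, ratio)
```

A ratio well above 1 would indicate that the theme is over-represented in the speech compared with general usage in that year.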
5.2. Work and workers
5.3. Young people
5.5. Terrorism and terror
It should be stressed that in each of these seven examples, the difference in the relative frequency of the theme (lemma) between the Italian language in general (in those years) and the presidential year-end speeches is not fortuitous. The p-value of the independence t-test is always lower than 0.001.
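For readers unfamiliar with the Welch variant of the t-test mentioned in the notes, the statistic and its degrees of freedom can be computed with the standard library alone. The sketch below uses invented numbers and a hypothetical function name `welch_t`; it stops short of the p-value, which requires the cumulative distribution of Student's t (available, for instance, through `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each sample mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Invented yearly relative frequencies (arbitrary units), for illustration only.
speeches = [2.0, 4.0, 6.0, 8.0]
language = [1.0, 2.0, 3.0, 4.0]
t_stat, dof = welch_t(speeches, language)
print(round(t_stat, 3), round(dof, 2))
```

Unlike the classic two-sample t-test, this form does not assume that the two series have equal variances, which suits a comparison between a small set of speeches and a very large reference corpus.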
From this type of analysis it is possible to verify, for example, that the theme of terrorism occupies a significant position for President Pertini, surely because his seven-year term was, from this point of view, a difficult and tragic period for Italy, just as the issues of unemployment and young people were central to him. But it is also noticeable that, overall, for Italian Presidents, terrorism is written in the national historical memory as a phenomenon associated as much (if not more) with the late 1970s and early 1980s as with the episodes of the last 10-15 years, from 2001 on. President Napolitano seems instead to be preoccupied not only by the theme of employment but also, and above all, by the question of State reform, whereas he refers to the homeland only 4 times in 9 years; a theme that, instead, was dear to Presidents Cossiga and Scalfaro. The theme of culture has mainly attracted the attention of Presidents Cossiga and Scalfaro. The statistical analysis would need further fine-tuning, though, to understand whether they meant culture in the anthropological or the artistic sense.
Once the algorithm is established, this type of analysis has the advantage of being extendable to any topic, offering a historical reading rooted in quantified data and not only in the inspiration of the researcher. The comparison between the “traditional” socio-political analysis and the statistical reading of the data on themes such as mafia, north-south inequalities, migratory flows, innovation, to name just a few, would surely be instructive.
6. Sentiment analysis
The statistical analysis of sentiment was developed from a combination of quantitative methodologies aimed at measuring, quantifying and classifying the opinions and sentiments expressed in documents (a textual corpus) through words which have a positive, negative or neutral semantic connotation. For example, depending on the lexical composition of the text, the sentence “Last quarter the European economy has benefitted from the favorable price of oil” would be classified as positive, whereas the sentence “In the same period the labor market has continued to suffer, penalizing particularly young people.” would be considered negative. In recent years, the development of the discipline has accelerated strongly, fuelled by the desire to better understand the evolution of the various types of content published by hundreds of millions of social media users (despite the difficulties of automatically tracking sentiments in brief texts; Thelwall, 2010) and by the countless websites that let users publish reviews of products and services (Galitsky, 2009; Cataldi et al., 2013).
Normally used for scientific purposes but also to improve marketing efficiency and expand business opportunities, sentiment analysis applied to historical, administrative or institutional documents still remains embryonic, above all in the case of documents written in Italian. Consequently, asking whether the “truth” of the Presidential year-end speeches in Italy lies as much in the analysis of the data behind them as in the more traditional political analysis can only stimulate an innovative line of research and trigger considerations rooted in the extension of data science to the study of governments and institutions.
The first step in carrying out the sentiment analysis of the Presidential messages consists of reducing the lexical complexity through the lemmatization of the text, that is, the substitution of each term with its lemma of reference. For example, the last sentence of the first year-end address by President Einaudi in 1949, “such I am sure is the common vote and such is my personal wish which is directed heartfelt and with affection at this hour to each Italian in and out of the boundaries of the country”, becomes “such to be sure to be the common vote and such to be my personal wish which to direct heartfelt and with affection at the hour to each Italian in and out of the boundary of the country”. This makes the dictionaries (word lists) easier to manage: thanks to these simplifications, they do not need to include plurals or conjugated and derived forms, and can be limited to infinitives or simple forms. For a basic sentiment analysis, once the speeches are lemmatized, each term (lemma) receives a positive value (+1) if it expresses a positive sentiment, opinion, attitude or concept, a negative value (-1) if it conveys a negative one, and a neutral value (0) otherwise (Vryniotis, 2013). After the values of the single words are summed by sentence, each sentence of each presidential speech can be characterized by 3 absolute values: the sum of the “positive sentiments” (a positive integer), the sum of the “negative sentiments” (a negative integer), and the overall sum of the sentiments expressed in that sentence (a positive or negative integer according to the dominant sentiments).
The distribution of the overall sums of sentiment shows that the year-end messages predominantly convey positive concepts, opinions and sentiments. From 1949 to the present day, the sentences of the speeches are in fact positioned on the positive segment of the axis, with an average sentiment value of +3.7 and a median value of +3.
However, for greater accuracy, these raw values (positive sum, negative sum and overall sum) are weighted by the total number of words in the sentence. In this way 3 percentage values are obtained (positive sentiments, negative sentiments, and overall percentage). In combining the values of the single sentences per message and, later, combining the messages per President, some granularity is lost but a clearer view of the whole picture is gained (particularly if a polynomial function is used to smooth the coarseness of the single values).
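The per-sentence scoring just described can be sketched with a toy dictionary. Everything in the example is an assumption made for illustration: the five-entry `LEXICON`, the function name `score_sentence` and the lemmatized sample sentence are invented, and a real analysis would rely on a far larger Italian sentiment dictionary applied after proper lemmatization.

```python
# Toy sentiment dictionary: lemma -> +1 (positive) or -1 (negative).
# Any lemma missing from the dictionary is treated as neutral (0).
LEXICON = {"speranza": 1, "pace": 1, "fiducia": 1, "crisi": -1, "paura": -1}

def score_sentence(lemmas: list[str]) -> tuple[int, int, int]:
    """Return (positive sum, negative sum, overall sum) for one sentence."""
    values = [LEXICON.get(lemma, 0) for lemma in lemmas]
    positive = sum(v for v in values if v > 0)
    negative = sum(v for v in values if v < 0)
    return positive, negative, positive + negative

# Invented, already lemmatized sentence.
sentence = ["la", "crisi", "non", "spegnere", "la", "speranza", "di", "pace"]
result = score_sentence(sentence)
print(result)  # (2, -1, 1)
```

A lookup of this kind ignores negation and context ("non" is simply neutral here), which is one reason the results of a basic lexicon approach should be read as indications rather than measurements.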
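The weighting step can be illustrated as follows. The function names and the figures are hypothetical, and the way sentences are combined into a speech-level value (pooling the counts before dividing) is only one possible convention, assumed here because the text does not specify it. The polynomial smoothing is left out of this sketch.

```python
def weighted_scores(pos: int, neg: int, n_words: int) -> tuple[float, float, float]:
    """Positive, negative and overall sentiment as percentages of the word count."""
    return (100 * pos / n_words,
            100 * neg / n_words,
            100 * (pos + neg) / n_words)

def speech_scores(sentences: list[tuple[int, int, int]]) -> tuple[float, float, float]:
    """Pool (pos, neg, n_words) triples over a speech, then weight once.

    Assumption: pooling counts rather than averaging sentence percentages.
    """
    pos = sum(s[0] for s in sentences)
    neg = sum(s[1] for s in sentences)
    words = sum(s[2] for s in sentences)
    return weighted_scores(pos, neg, words)

# Invented (pos, neg, n_words) values for two sentences.
sentences = [(2, -1, 8), (1, 0, 12)]
per_sentence = [weighted_scores(*s) for s in sentences]
per_speech = speech_scores(sentences)
print(per_sentence[0], per_speech)
```

Pooling gives long sentences more weight than short ones; averaging the sentence percentages instead would treat every sentence equally, and the two conventions can give noticeably different values.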
For example, President Mattarella’s year-end speech appears to be mainly positive even though there are distinctly negative statements.
The sentiment analysis —which perhaps should be called opinion mining in this case— can also help in making visible underlying structures in the construction of the speeches.
The following two graphs illustrate President Napolitano’s year-end addresses in 2007 and 2011. A resemblance of the narrative structure is observed: after the beginning well wishes (which can be translated with a high positive sentiment index), the President evokes negative facts, thoughts, opinions, situations (this significantly decreases the overall index value) to then mark a high, expressing again clearly positive sentiments and opinions which slowly “dull” to the final wishes. In other words: after taking off at top speed, then suddenly the bad news arrives which is immediately counterbalanced, first by a few but strong very positive opinions, then by a long series of still positive expressions, up to the conclusion of the message which fosters a pronounced hope for the year to come.
The long term analysis of the aggregated data per speech can benefit from this approach as well. Overall, the level of positive sentiment expressed in the year-end well wishes by all the Presidents does not vary much. Although some significant oscillations are observed year over year (with a maximum range of variation over the years between +28.5% in 2003 and +16.8% in 1981), the total trend remains stable —relative stability likely due, at least in part, to the characteristic of this particular communication exercise which tends to highlight the positive wishes and greetings repeated over and over during the speech.
On the other hand, the proportion of negative sentiment grows over the years, but with oscillations that seem to follow the evolution of the economic and social crises. In fact, it is possible to observe a rapid deterioration in the sentiments expressed between 1959 and 1980-1981 (practically a doubling of the negative lemmas), followed by a decrease in pessimism from 1981 to 2000, and again a progressive darkening of the skies from 2001-2002 on.
The sentiments expressed by each single President show some variability, with President Pertini who, to date, ranks at the extremes both for the highest percentage of negative sentiments (lemmas) and for the lowest percentage of positive ones. Beyond the differences in the Presidents’ personalities, the historical period surely influenced the volume of negative sentiments expressed by Presidents Leone and Pertini. For this same reason, it would be useful to look closely at the next year-end addresses to understand whether the negative sentiments expressed by President Mattarella confirm a trend which seems to begin with President Napolitano or whether they are an isolated instance, given that President Mattarella has so far had only one occasion to deliver his year-end wishes to the nation.
This work provides unique insights into the institution’s textual production and its variation over time in three ways:
- Descriptive statistics were used to quantify the way(s) each Italian President speaks to the Nation. Among other things, they made it possible to differentiate elocutionary styles: concise (202 words/speech) or verbose (3,513 words/speech), direct (17 words/sentence) or convoluted (49 words/sentence), slow (95 words/minute) or fast (142 words/minute), as well as variations from the means. When applied to the time series, the descriptive analysis shows the mutations of the elocutionary styles over time and the fact that they are not always in line with the zeitgeist.
- Natural language processing methods highlighted the frequency and associations of single or groups of words. This was useful to extract the features of the New Year speeches overall but also the main interests of each President (with the oldest President in the history of the Italian Republic being the most worried about the future of the young generation). Quantified examples are given for 7 themes: unemployment, work/job, youth, culture, terrorism, reform, and homeland. Absolute and relative frequencies of these themes were computed and compared to the average frequency of the same themes in the language overall for the same period. Supported by meaningful independence t-tests and confidence intervals, this approach showed the comparative evolution of the recurrence of the 7 topics. But it also showed it can be generalized to any theme.
- After a “sentiment dictionary” was built, quantitative sentiment analysis (opinion mining) was applied to quantify the expression of ideas, opinions, and statements as positive or negative based on the wording. Relevant differences between Presidents emerge with, at the 2 extremes, President Pertini (18% positive sentiments against 9% negative) and President Gronchi (27% positive sentiments against 4.5% negative). Also, historical trends become more visible: towards more pessimism in the 1980s, followed by a slightly stronger optimism in the 1990s and again more negative sentiments from 2000 onward. Sentiment analysis also made it obvious that some Presidents built their narratives following recurrent “sentiment/opinion patterns”. The most evident case is President Napolitano, who alternates good and bad news in such a specific manner that it becomes a signature pattern structuring some of his speeches.
Although little has been done to date to incorporate data science in the field of textual analysis of political content, this work shows the early benefits of such an approach. Textual and verbal production of public administrations can be investigated with quantitative rigor opening new lines of research. Quantitative methods such as descriptive statistics, natural language processing, and sentiment analysis (opinion mining) prove to be highly valuable tools capable of bringing a strong contribution to enrich and enhance political sciences.
Baayen, R.H., (2008). Analyzing Linguistic Data, Cambridge University Press.
Baccianella, S., Esuli, A. and Sebastiani, F. (2010). “Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining”, in Calzolari, N. et al., editors, Proceedings of LREC, 2200–2204. http://is.gd/VLTKqB
Basile, V. and Nissim, M. (14 June 2013). “Sentiment analysis on Italian tweets”, Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 100–107. http://is.gd/io82rB
Casotto, P. (2012). Sentiment Analysis for the Italian Language, Tesi di Dottorato, Dipartimento di matematica e Informatica, Universita’ degli Studi di Udine.
Cataldi, M., Ballatore, A., Tiddi, I., Aufaure, M.-A. (22 June 2013). “Good location, terrible food: detecting feature sentiment in user-generated reviews”, Social Network Analysis and Mining, 3 (4): 1149–1163. http://is.gd/Tlp0Fc
Charpentier, A. (22 February 2016). “Clusters of Texts”, Freakonometrics. http://is.gd/32mB13
Davidov, D., Tsur, O. and Rappoport, A. (2010). “Enhanced sentiment learning using twitter hashtags and smileys”, Proceedings of the 23rd International Conference on Computational Linguistics, COLING ’10, 241–249, Stroudsburg, PA, USA.
Galitsky, B. and McKenna, E.W. (12 November 2009). “Sentiment Extraction from Consumer Reviews for Providing Product Recommendations”, Patent US–20090282019-A1. http://is.gd/ioVBsb
Gonzales-Ibanez, R., Muresan, S. and Wacholder, N. (June 2011). “Identifying sarcasm in twitter: A closer look”, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 581–586, Portland, OR, USA.
Gries, S.Th. (2013). Statistics for Linguistics with R, Second Edition, De Gruyter Mouton, Berlin.
Jockers, M.L., (2014). Text Analysis with R for Students of Literature, Springer.
Kennette, L.N., Wurm, L.H. and Van Havermaet, L.R. (2010). “Change detection: The effects of linguistic focus, hierarchical word level and proficiency”, The Mental Lexicon, 5(1), 47–86.
Kulkarni V., Rfou R., Perozzi B. and Skiena S. (2015). “Statistically Significant Detection of Linguistic Change”, Proceedings of the 24th International Conference on World Wide Web, 2015.
Lebeau, J. (20 January 2016). “State of the Union Speeches and Data”, More or Less Numbers. http://is.gd/NrJIRS
Liu, B. (2012). Sentiment Analysis and Opinion Mining, Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers.
Lapowsky, I. (13 January 2016). “The True Message of the State of the Union Is in the Data”, WIRED. http://is.gd/7ULEdC
Mani, I. (2010). The Imagined Moment. Time, Narrative, and Computation, University of Nebraska Press.
Mejova, Y. (16 November 2009). Sentiment Analysis: An Overview. http://is.gd/C1U9OJ
Taboada, M., Brooke, J., Tofiloski, M., Voll, K. and Stede, M. (June 2011). “Lexicon-based methods for sentiment analysis”, Comput. Linguist., 37(2):267–307.
Thelwall, M., Buckley, K., Paltoglou, G., Cai, D. and Kappas, A. (2010). “Sentiment strength detection in short informal text”, Journal of the American Society for Information Science and Technology, 61 (12): 2544–2558. http://is.gd/yRnRCq
Vryniotis, V. (23 September 2013). “The importance of Neutral Class in Sentiment Analysis”, Machine Learning Blog & Software Development News. http://is.gd/FidS0k
 This document is the result of an analysis carried out by the author and reflects only and exclusively the opinions of the author. Therefore, this document does not involve in any way, neither directly nor indirectly, none of the employers, past or present, of the author. The author confirms that he has no conflict of interest, and has never worked, neither in a remunerated capacity nor gratuitously, for the President of the Italian Republic, and has never been appointed in any role or capacity by the Quirinale. Notwithstanding, the author informs the readers that he was honored with the title of Commander by the President of the Italian Republic (Decree August 1 2007).
 eMail: salvino [dot] salvaggio [at] gmail [dot] com
 The Author thanks Paolo Barbesino, Paolo Gasparini, Gaetano Palumbo for the comments to the previous versions.
 The first year-end message was delivered by President Luigi Einaudi on December 31, 1948. Enrico de Nicola – elected interim President on June 28, 1946, then again June 26, 1947 after his resignation, and again President of the Italian Republic from January 1, 1948 – did not deliver any year-end wishes to the nation during his two years at the Quirinale.
 Printed paper, radio and film distribution of the Istituto Luce in the early years, live television from 1954 on.
 In fact, since his first appearance on TV in 1954, the program often opens with a wide view which captures a room in the Quirinale palace to then progressively zoom in on the framing of the bust and face of the President speaking to the nation looking directly into the camera.
 Record held by President Giorgio Napolitano watched by approximately 13 million viewers on December 31, 2014.
 A much formalized ritual, with few surprises or formal variations but also significant, as shown by the clustering text which is unable to isolate clear or specific groups (Charpentier, 2016).
 Highlighted by change point detection to identify not only one or more potential changes but also the moment in which they occur (Kulkarni et al., 2015; Kennette et al., 2010).
 The variation to the average is statistically significant and cannot therefore be attributed to chance (t-test).
 It would be useful —but unfortunately the information is not available— to know the average speed of elocution of the Italian population in general, year by year, from 1949 on to compare to the data of the presidential speeches. Also, data on duration of President Einaudi’s speeches are not available.
 It is not therefore clear whether President Scalfaro was at the Quirinale for the message on December 31 1997 or not.
 p-value of t-test is approximately 0.00016.
 Precisely 481.8 words.
 With the significant exception of 1991, the year in which President Cossiga went on TV to deliver his wishes, saying in substance that he would not be saying anything else, all in just 419 words and in less than 4 minutes. It must be noted that, from a statistical point of view, the messages by Presidents Pertini and Scalfaro specifically are the main contributors to the general trend (regression).
 The terms “italy”, “italians” are pronounced almost 700 times in 67 messages.
 With words like italy, italians, young people, population, country, life, liberty, citizens, democracy, trust, responsibility.
 Natural Language Processing (NLP).
 Which in the statistical analysis stands for ‘president of the republic’.
 Obtained with AntConc 3.4.3 software.
 The themes were chosen without any claim to representativeness.
 The following regex terms or expressions were used in the search for occurrences: ‘unemployment|underemployment’, ‘work’, ‘culture’, ‘terror’, ‘reform’, ‘\bhomeland|patriotism|patriot|patriots’, ‘\byoung people|youth|youthful|\b’
 This second analysis is made possible by Google Books, which provides researchers with the frequency of all the words used in millions of books, year by year, language by language, from approximately 1500 to 2009. It must be underlined that the corpus built by Google Books does not include texts published in mass media (newspapers, Internet), which might modify the frequency data by adding a more pronounced component of modernity.
 Welch Method.
 Usually called sentiment analysis or opinion mining. See Mejova, 2009 for an overall description.
 For example to add quantitative depth to the literal analysis, or in a political setting to better understand the evolutions of the orientation of the electorate.
 To evaluate the level of adhesion to advertisements or specific brands.
 The easy availability of numerous English dictionaries of words associated with values for various sentiments and opinions has energized research on English-language content (Baccianella et al., 2010; Gonzales-Ibanez et al., 2011; Liu, 2012). The same is taking place for Spanish and, to a lesser degree, for other languages. In Italian, unfortunately, the availability of such instruments still suffers from significant shortcomings, and the laudable efforts of isolated researchers are not enough to bridge the double gap of, on one side, lists of words with the corresponding quantification of the positive or negative connotation and, on the other, classifications of the same words by type of sentiment or opinion. Having a base dictionary available which signals the positive (+1) or negative (-1) value of an adjective is the first essential step for this approach. But knowing also whether the same adjective belongs to a specific category of sentiment (for example anger, joy, sadness, fear, surprise, trust, etc.) makes it possible to enrich the analysis. In Italian, see for example the software sentiment-Italian-lang by Giuseppe Ragusa – https://github.com/gragusa?tab=repositories, Casotto, P. (2012), Basile and Nissim (2013).
 For greater precision, the lemmas can be quantified on a real scale that includes the intermediate values between 0 and 1, 0 and -1.
 In this type of graph, the order of sequence of the phrases in the speech act as a temporal axis (x-axis)- this approach is usually defined as novelistic time (Mani, 2010).
 This should be verified in detail.