Author: MARTÍ JIMÉNEZ PARREU
Abstract
When building indices, we often have to deal with qualitative factors that are hard to quantify; this is why surveys such as the OECD Better Life Index exist – we want to know what people deem relevant. In these cases we lack a point of reference or a defined set from which to draw weights. Therefore, we propose an alternative to solve this problem.
The objective of this project is to use the Google Trends dataset to create weights for indices based on the relative popularity of qualitative components. Each of these dimensions is bijectively mapped to a quantitative indicator, thus defining a composite index with “dynamic weights”. Furthermore, this paper contains the theoretical framework justifying this approach and a generalized methodology covering each step of the process: 1. Indicator-Component Relationship, 2. Keyword Generation, 3. Keyword Selection, 4. Extraction of popularity measures and 5. Weighting indices using relative popularity.
To show why this technique might be useful, we present an application of this method to the Human Development Index (HDI) for Spain in the 2005-2023 period. The results show how applying user-based weights leads to a statistically different HDI calculation, which might point to the possibility that the common assumption of equal weighting of HDI dimensions leads to an underestimation of the index. Moreover, search engine modeling shows that people value the health dimension significantly more than the others. More importantly, this analysis calls for reconsidering our approach to weighting indices, since in some cases using preferences to weight components could result in a more representative index.
1. Introduction
The economic analysis of broad concepts such as welfare or productivity through indices is often limited both by a static view of the components’ weights and by a disregard for difficult-to-measure qualitative factors.
More problems arise if we consider that the selection of a weighting technique can significantly change the meaning of the indicator; hence, statistical techniques (factor analysis, regression, CLI…) are designed to minimize error. However, the multidimensional characteristics of different components can hinder their aggregation: you might not be able to add apples and oranges.1 When it comes to measuring vague concepts, making a connection between qualitative and quantitative concepts becomes an arduous task. This problem is further complicated by the fact that broadening an indicator’s scope directly lowers its precision.2
In some sense, especially for heterogeneous, loosely defined economic indicators that are hard to weight, we need a common denominator between two or more components, along with ranking hierarchies, i.e. preferences. A clue to what people deem important might lie in what is worth their attention, and in this modern era that is crystal clear: online activity.
Digital presence reveals a lot of information; however, it might fail to identify the preferences of underdeveloped or restricted countries/regions that do not use the internet in the same volume as others (like the US or Europe) or simply do not use the internet at all. (This is especially applicable to continents such as Africa.)3
Notwithstanding the previous concern, the range of internet access is widening rapidly, and as a consequence more than two thirds of the world population now use the internet (68% in 2024). This figure is still forecast to grow over the upcoming decade, which shows that worldwide digital preferences will become more representative as time elapses.4
If we interpret the internet through the theory of network externalities, it is plausible to predict that at some point in the following years a high-level equilibrium with near-full participation will be reached. Consequently, this sampling bias will disappear over time.

In a nutshell, the usefulness of data on the demand for information is steadily increasing, which justifies making full use of it. From a more theoretical standpoint, in the same way that Hayek defined the price system as an indicator of aggregate unorganized knowledge (dictating how information changes),5 we could interpret online indicators such as popularity or website visits equivalently.
It is precisely because this aggregate concern around a certain topic or website represents unspecialized knowledge that we can use it as an indicator of what people care about. The justification is that the demand for information arises from behavioral decision-making; for instance, “out of everything I could be doing right now, I chose to browse photos of kittens”. I am showing a preference towards cats; it is likely that I like cats or might even want to adopt one.
This same logic extends to more interesting online activities, such as uncertainty-related news searches, with current examples like social-network-based uncertainty indices.6 Once again, these patterns portray the level of uncertainty of the public. That is, when individuals worry about something, they will look up and interact with that thing more intensively than if, on the contrary, nothing noteworthy were occurring.
Now, for the purposes of this article, we are trying to find a generalized combination of the two notions mentioned above: a worry expressed towards a particular preferred topic. To do this, the most reasonable approach seems to be using keywords related to a particular topic (e.g. people worrying about prices expresses a worry about the cost of living).
This is relevant because keywords allow us to make a connection between a concept and its importance by quantifying its relative search frequency in terms of others.
In this way, an Explicit-Type, User-Weighting Method for composite indicators can be developed. Consequently, this project is an attempt to explore an alternative for index creation that considers public perception as weights that change over time.
2. Methods
The methodology starts by identifying the required demographic traits of the desired target population (country, age, etc.) and the relevant technical considerations. These distinctions must be defined before proceeding any further.
We will create a model that relates each component of an index to the public’s preferences towards it. At least for our example, these preferences will be modeled by keywords in the most popular search engine, i.e. Google (since it holds 90% of the world’s search engine market share7). Consequently, the database used will be Google Trends.8
The process mainly involves the following steps:
- Indicator-component relationship
- Keyword generation
- Keyword selection
- Extraction of popularity measures
- Weighting indices using relative popularity
It is of critical importance to clarify the caveats of this analysis in this document. There is a broad range of issues associated with each step of this methodology: ensuring each keyword is representative of its corresponding component, making sure each selected keyword is the common attribute of the bundle of keywords generated from the component (ideally addressed with factor analysis), ensuring that all the keywords cover the union of the semantic groups of the topics in meaning and frequency (involving keyword generation and clustering), keeping selected keywords as uncorrelated among themselves as possible (to better estimate the union of topics), and making sure selected keywords are relevant for the database.
Although we will try to address some of the previous problems by setting assumptions, personalized scores and requirements, fully accounting for such anomalies involves advanced statistical techniques that would overextend the scope of this paper.
For now, we will focus on the general procedure without diving into specifics. The methodology is as follows: we start with an (already defined) index that we want to weight using the popularity of its components. (figure 2)

Now, we proceed to present each of the steps in more detail along with their theoretical justifications.
Indicator-component relationship
Conditions:
- The relationship between indicators and components should be bijective.
$f : \text{Indicator}_i \to \text{Component}_i$
- The weights of each indicator and component should behave bidirectionally.
$w_{\text{Indicator}_i} \leftrightarrow w_{\text{Component}_i}$

This means we use the predetermined labels (components) of each indicator as a reference. We do not have to use the word that explicitly represents the component, because it might not be relevant to the database (which can be confirmed by computing the average popularity level later on); if it is not, we can try other names for the component. We know this is not an arbitrary decision because we can check whether our choice is right in the keyword generation section (and start again if our revision tells us we are wrong).
Keyword generation
We want to estimate the full set (total frequency) of searches related to all components, $\Omega$ (so that we can compare them). To do so, we create an estimator $\hat{\Omega}$.

We will build our estimator so that it is consistent. Let us define a keyword generative function $g : \text{component}_i \to \mathcal{P}(\text{keywords}_i)$ that creates $k_i$ keywords for each component (input).

If we iterate this process for each component, we notice that as $k_i \to \infty$ the entire set of keywords related to each component is exhausted. Therefore, the union of the keyword sets should approximate the entire set of the union of the components.
$$\hat{\Omega} = \bigcup_{i=1}^{n} g(\text{component}_i) \to \Omega \quad \text{as } k_i \to \infty$$
Therefore, we have a “denominator” with which to compare our weights. Visually, we have the following:

The idea of the previous figure is to represent each bundle of keywords generated by each component as a part of the estimator. Some keywords have been placed outside this circle to account for cases when generated keywords do not belong to our topic of interest. That is our topic match: a binary indicator function that counts how many of the generated keywords are actually relevant to what we want to measure (with a 1) and which are not (with a 0). Moreover, we will assign each keyword a similarity score with respect to its component, so that we can choose the most representative keyword.
With this data, we can estimate a matching percentage from which to infer how representative the chosen component is, from 0-100% (by dividing the number of matching keywords by the size of the corresponding component’s keyword pool).
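A minimal Python sketch of this bookkeeping is given below. It assumes the unofficial pytrends client as a stand-in for the generative function $g$ (via its related-queries call), and the topic-match labels are placeholders that would in practice be filled in by hand or by an LLM; the function and variable names are illustrative, not part of the original implementation.

```python
# Sketch of the keyword-generation step, assuming the unofficial pytrends client.
from pytrends.request import TrendReq


def generate_keywords(component, timeframe="2005-01-01 2023-12-31"):
    """g: component -> list of (keyword, similarity score) pairs from Google Trends."""
    pytrends = TrendReq(hl="en-US", tz=360)
    pytrends.build_payload([component], timeframe=timeframe)
    top = pytrends.related_queries()[component]["top"]  # DataFrame: 'query', 'value'
    return [] if top is None else list(top.itertuples(index=False, name=None))


components = ["Health", "Education", "Prices"]
omega_hat = {c: generate_keywords(c) for c in components}  # estimator of Omega

# topic_match: 1 if a generated query is relevant to the component, else 0.
# Placeholder labels here; in practice they come from manual review or an LLM.
topic_match = {c: [1] * len(kws) for c, kws in omega_hat.items()}

for c in components:
    n = len(omega_hat[c])
    match_pct = 100 * sum(topic_match[c]) / n if n else 0.0
    print(f"{c}: {n} generated keywords, matching percentage = {match_pct:.0f}%")
```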
Keyword selection
Although using the entire set of keywords would be ideal, to avoid computational complications when dealing with large-scale keyword samples we simplify the problem. We know that we should obtain a keyword that represents the others, that the set of keywords should be comparable, and that the keywords should be significant for the database.
That is why, for each component, we will select the word with the highest similarity score that matches our topic, as sketched below. Then, in the next phase, we will confirm the comparability of the keywords by checking whether the ranges of their normalized popularity intersect. That means that if at some point the topics could possibly be equally important, then they are comparable.
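Reusing the hypothetical omega_hat and topic_match structures from the generation sketch above, the selection rule can be written as follows:

```python
# Selection sketch: per component, keep the topic-matching keyword with the
# highest similarity score.
def select_keyword(keywords, matches):
    """keywords: list of (query, similarity); matches: parallel list of 0/1 flags."""
    candidates = [(query, score) for (query, score), m in zip(keywords, matches) if m == 1]
    return max(candidates, key=lambda pair: pair[1])[0]


selected = {c: select_keyword(omega_hat[c], topic_match[c]) for c in components}
print(selected)  # in this paper's example: health care, school education, gas prices
```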
Extraction of popularity measures
We want to find the popularity of each keyword in terms of the others so that we can compare them.
We will use a database or algorithm containing search engine data to identify these magnitudes; it needs to accept a keyword as input and output its popularity relative to others.
Afterwards, we define the function $\mathrm{Pop}(x_{i,t})$ such that $P : x_i \to \text{Popularity}_{i,t}$, so that by applying it to every value we obtain a table with the popularity of each component or keyword for each time period.
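As a hedged sketch of this extraction step, the unofficial pytrends client can query the selected keywords together, which returns their search interest on a shared 0-100 scale, i.e. relative to one another (the keyword list and timeframe follow this paper's example; the client itself is an assumption, not the only possible source):

```python
# Extraction sketch: relative popularity of the selected keywords over time.
from pytrends.request import TrendReq

keywords = ["health care", "school education", "gas prices"]
pytrends = TrendReq(hl="en-US", tz=360)
pytrends.build_payload(keywords, timeframe="2004-01-01 2025-05-31")
popularity = pytrends.interest_over_time().drop(columns="isPartial")
# `popularity` is a DataFrame indexed by date, one column of 0-100 scores per keyword.
```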
With the previous data, we check whether the keywords are appropriate by analyzing the ranges of their relative popularity over a given period. If they all intersect at some interval, we continue with the process; if at least one does not, then we assume they are not similar and we return to selecting another set of keywords.
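This intersection test can be written compactly: normalize each period so the shares sum to 1, then check whether the maximum of the per-keyword minima lies below the minimum of the per-keyword maxima (a sketch reusing the popularity table from the extraction step above):

```python
# Comparability check sketch: do the normalized popularity ranges share an interval?
shares = popularity.div(popularity.sum(axis=1), axis=0)  # each row sums to 1
lower, upper = shares.min().max(), shares.max().min()    # max of minima, min of maxima
if lower <= upper:
    print(f"Ranges intersect on [{lower:.3f}, {upper:.3f}]: keywords are comparable")
else:
    print("No common interval: return to keyword selection")
```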
Weighting indices using relative popularity
We will define each of the weights by dividing by the sum of all the relative popularity levels:

$$w_{i,t} = \frac{\mathrm{Pop}(x_{i,t})}{\sum_{j=1}^{n} \mathrm{Pop}(x_{j,t})}$$
At this point, we have a dataset with weights associated with each keyword at each specific period of time. Subsequently, we can proceed to define the “new” index.
Once we obtain each weight associated with the popularity of a component, we can use the bidirectional relationship with the indicator weights to define popularity as the weight of that indicator in the index. We just have to take into account that the original calculation method of the index determines how the weights have to be incorporated.
$$\text{Index}(w_{\text{old}}) \to \text{Index}(w_{\text{new}})$$
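A short sketch of the weight construction, reusing the popularity table from the extraction step above (the per-period normalization is the same one used in the comparability check):

```python
# Weight construction sketch: w_{i,t} = Pop(x_{i,t}) / sum_j Pop(x_{j,t}).
weights = popularity.div(popularity.sum(axis=1), axis=0)
assert (weights.sum(axis=1).round(6) == 1.0).all()  # weights sum to 1 in every period
print(weights.head())
```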
3. Applications
Let us apply this method to a typical index, the Human Development Index (HDI), which contains the following indicators (for each of which we can establish a bijective connection with a component):
- Life expectancy at birth → Health
- Mean years of schooling & expected years of schooling → Education
- Gross National Income (GNI) per capita (PPP-adjusted) → Standard of living
The HDI formula is:

$$\text{HDI} = \left( I_{\text{Health}} \cdot I_{\text{Education}} \cdot I_{\text{Income}} \right)^{1/3}$$
Note: in the calculations, the additional necessary steps of averaging the expected and actual years of schooling and taking the logarithm of GNI have been carried out.
To apply the public perception method, we can define a weighted geometric mean:

$$\text{HDI}_t = I_{\text{Health},t}^{\,w_{\text{Health},t}} \cdot I_{\text{Education},t}^{\,w_{\text{Education},t}} \cdot I_{\text{Income},t}^{\,w_{\text{Income},t}}, \qquad \sum_i w_{i,t} = 1$$
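A minimal Python sketch of both the regular and the weighted geometric mean is given below; `dim` is assumed to be a DataFrame holding the three normalized dimension indices per year, with columns aligned to the popularity-based `weights` table (both names are illustrative):

```python
# HDI aggregation sketch: unweighted vs. popularity-weighted geometric means.
import numpy as np
import pandas as pd


def hdi_weighted(dim: pd.DataFrame, weights: pd.DataFrame) -> pd.Series:
    """HDI_t = prod_i dim_{i,t} ** w_{i,t}, with the weights summing to 1 each year."""
    return np.exp((weights * np.log(dim)).sum(axis=1))


def hdi_regular(dim: pd.DataFrame) -> pd.Series:
    """Standard HDI: equal-weight (1/3) geometric mean of the three dimension indices."""
    equal_w = pd.DataFrame(1 / 3, index=dim.index, columns=dim.columns)
    return hdi_weighted(dim, equal_w)
```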
To model component preferences, we will use worldwide data for the period 2005-2023 (accessed on May 22nd, 2025). Furthermore, when considering the components, we should select those that are relevant for the database. Thus, let us generate the associated related searches so that we can find out whether words/phrases such as “Health”, “Education” or “Standard of living” are representative.
Indicator-component relationship
At an initial glance, if we plug the words “Health”, “Education” and “Standard of living” into the Google Trends website, we see that they present average relative popularity scores (measured from 0 to 100) of 64.63813, 32.93385 and < 1 respectively; the first two words are comparable, but the third is not (it is not significant enough to be used).
Therefore we must find related words that are comparable and relevant; a good guess is “prices”, as it represents worry about the cost of living. Running the comparison once again, we find that their average popularity levels are 65.03891, 33.15175 and 26.9144 respectively; now they are comparable. If we quickly observe the average popularity levels of each component name over the corresponding time period, health is considered more important than education or prices (Pop(Health) = 0.5227563, Pop(Education) = 0.259618, Pop(Prices) = 0.2176257).
These values can be interpreted as the percentage of the total popularity that belongs to a component or keyword (e.g. 52.27% of the popularity belongs to Health).


Figures 6 and 7 show that the importance people attribute to each HDI dimension is not the same as the weights we usually assign to them.
Keyword generation
Moving on to keyword generation, we find related queries (keywords or topics) that are related to the components and that represent a preference towards a certain topic. We identify two parameters for each keyword:
- Similarity score: each word is given a score from 0 to 100 based on how similar it is to its component. (Automatically provided by Google Trends)
- Topic match: a binary check, done either by hand or with an LLM, that indicates whether a query is related to our topic of interest ((1) green) or not ((0) red).

We can observe how some words are related (green) and others are not (red). We must highlight that the topic match has been done by hand, although for large datasets using an LLM would be necessary. Additionally, keyword generation for each component in our example is limited to the 25 related queries produced by Google Trends’ algorithm.
Keyword selection
For each component, we will simply select the keyword with the highest similarity score that is related to our topic of interest.
If we consider how well each bundle of generated keywords matches our topic of interest, we can infer whether the chosen component is appropriate. (To check our choice of components, we observe the match percentage of a component; if it falls below a certain threshold, the process should be restarted at the indicator-component relationship step to select new initial components.) The matching percentages for “Health”, “Education” and “Prices” are 92%, 84% and 76%, which is sufficient for this example. (The chosen threshold depends on the level of rigor required, e.g. 75%.)
We could identify, as expected, that “Health” is the most representative component name while “Prices” is the least representative one.
According to our guidelines we choose “health care”, “school education” and “gas prices”. Moreover, if we consider the ranges of relative popularity normalized to 1, in this case they do intersect, which means that they are strictly comparable: Pop(Health care) ∈ [0.282609, 0.847458], Pop(School education) ∈ [0.110169, 0.348485], Pop(Gas prices) ∈ [0.042373, 0.580247]; all of the keywords intersect at [0.282609, 0.348485]. Additionally, because we also want to calculate the components for comparison, we apply the same requirement to them; in this case they do not fulfill it, although they come quite close at some points. We will still compute them, but keeping this in mind: Pop(Health) ∈ [0.43939, 0.63478], Pop(Education) ∈ [0.2, 0.35714], Pop(Prices) ∈ [0.14286, 0.29126].

Extraction of popularity measures
We can begin by observing the searches for our selected keywords from January 2004 to May 2025; the results are shown in figure 10.

Given the specificity of each keyword, they are quite unstable and sensitive to short-term trends, owing to a smaller search volume compared to their components. If we move on to obtaining the popularity of the components for the sake of comparison, we find similar popularity distributions (figure 11).

In contrast, the evolution of the components’ popularity is more stable and less sensitive to short-term trends, given their greater search volume compared to keywords. We make a few relevant observations:
- It appears that, as topics get more general (from keywords to components), they’re less likely to experience acute oscillations and they follow a more predictable trend.
- Even though keywords and components are different, their relative popularity differences are very similar; this is because the keyword is the most relevant subset of its component. (We could perform a test of similarity of distributions, but that would overstretch this article.)
Weighting indices using relative popularity
Now that we have the relative popularity of each keyword or component, we can proceed to generate the weights for each point in time. To do so, we normalize the relative popularity observations so that they sum to 1 at each point in time. (As previously done to check for comparability)


Now, let us put the weights into full effect; the HDI will vary over time through two effects:
- Changes in: Life expectancy at birth, Mean years of schooling / expected years of schooling and Gross National Income (GNI) per capita (PPP-adjusted)
- Changes in: Public Perception of Health, Education and Prices (Income).
The general expression of our new HDI is the following:

$$\text{HDI}_t^{\text{new}} = \prod_{i \in \{\text{Health},\, \text{Education},\, \text{Income}\}} I_{i,t}^{\,w_{i,t}}, \qquad w_{i,t} = \frac{\mathrm{Pop}(x_{i,t})}{\sum_{j} \mathrm{Pop}(x_{j,t})}$$
Let us, for instance, apply this method to Spain (2005-2023); we will use world preferences (as done until now) to model the sizes of the weights. We thus obtain the time series chart in figure 14, showing the same HDI calculated with varying weights: Regular HDI (assumes equal weights), Keywords HDI (weights vary according to preferences, quantified by searches of keywords) and Components HDI (weights vary according to preferences, quantified by searches of components).

There are a few observations we can make before finishing this example.
- Specific weighting objects like keywords might have more unequal weights, therefore significantly affecting the index.
- According to world preferences (which mostly value Health over Education and Standard of living), the HDI for Spain is underestimated. (That is, countries with a higher life expectancy should have a greater HDI.)
Let us check the differences between each possible pair of HDIs in order to determine whether we can state that they are statistically equivalent or not. We will perform a Student’s t-test on the difference between the two averages (assuming equal variances, given that the sample sizes are the same).
After running the test, we find that:
p-value(Regular HDI, Keywords HDI)*** = 4.563e-06
p-value(Regular HDI, Components HDI)*** = 0.004538
p-value(Components HDI, Keywords HDI)* = 0.07691
Therefore, we can state that the Regular and Keywords HDI are likely not equal, and the same holds for the Regular and Components HDI. As expected, the evidence that the Components and Keywords HDI differ is weaker; they are highly correlated, since the keywords score 100 in similarity to their components by construction.
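For reference, these pairwise comparisons can be reproduced with SciPy’s Student’s t-test; the three series names below are hypothetical placeholders for the annual HDI series for Spain (2005-2023) computed above:

```python
# Pairwise Student's t-tests (equal variances assumed, as in the text).
from scipy import stats

pairs = {
    "Regular vs Keywords": (hdi_regular_spain, hdi_keywords_spain),
    "Regular vs Components": (hdi_regular_spain, hdi_components_spain),
    "Components vs Keywords": (hdi_components_spain, hdi_keywords_spain),
}
for name, (a, b) in pairs.items():
    t_stat, p_val = stats.ttest_ind(a, b, equal_var=True)
    print(f"{name}: t = {t_stat:.3f}, p-value = {p_val:.3g}")
```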
These results give us another perspective on how the HDI is perceived: we can check whether our definition and valuation of Health, Education or Standard of living (the weights) are the same in different time periods. On a more general note, for indices that involve public perception, we can compare the index calculated with regular weights against the calculation with user-based weights (in order to check whether there are any differences, or whether one is more representative, which might depend on the index considered).
Coming back to our example, the results show that user-based weights produce differences in the index computation compared to regular weighting. However, equal weights remain useful for comparing values over time, which means that replacing them might still not be justifiable. Even so, we could at least reconsider the way we view weights and start considering what others think.
4. Discussion
This method gives economists another view of any index that satisfies the initial conditions, which means that we could make statistical comparisons to find out whether a given index is coherent with reality. More promising candidates for specific public-perception weighting are left open for further study, such as the Rule of Law Index, the Global Innovation Index, the Social Progress Index, the Corruption Perceptions Index, the Climate Change Performance Index, and many more.
There is room for a large-scale analysis with this method: using semantic groups, using web scraping of specific news sites for keyword generation and selection (in order to isolate the “worry effect”), using factor analysis to eliminate significantly correlated topics so as to better estimate the set of the union of the components, and taking a generally more optimized approach.
Some questions persist: we still do not know how to establish strict qualitative-quantitative relationships in a sufficiently clear manner, and it is even harder to eliminate noise; however, as our capabilities for analyzing information improve, these conundrums will eventually be solved and we might find answers we did not expect.
The idea of quantifying something that is qualitative by nature is complex yet not impossible. With more information on individuals’ behavioral preferences becoming available in the future, new opportunities will arise, and the only thing we have to do is take advantage of them.
References
1 Sharpe, A., & Andrews, B. (2012). An assessment of weighting methodologies for composite indicators: The case of the index of economic well-being.
2 Quiros-Romero, G., & Reinsdorf, M. B. (2020). Measuring economic welfare: What and how?
3 International Telecommunication Union. (n.d.). Individuals using the Internet (% of population) [Data set]. World Telecommunication/ICT Indicators Database.
4 Statista. (n.d.). Number of internet users worldwide in 2024, by subregion (in millions).
5 Hayek, F. A. (1945). The use of knowledge in society. American Economic Review, 35(4), 519–530.
6 Economic Policy Uncertainty. (n.d.). Twitter-based Uncertainty Indices.
7 Knowledge at Wharton. (2025). Why Google dominates the search engine market. Knowledge at Wharton, Business Journal.
8 Google. (n.d.). Google Trends [Database].
9 United Nations. (n.d.). Per capita GNI at current prices (US dollars) [Data set]. UNdata.
10 United Nations. (n.d.). Life expectancy at birth for both sexes combined (years) [Data set]. UNdata.
11 Ritchie, H., Roser, M., Ortiz-Ospina, E., & Hasell, J. (2023). Expected years of schooling [Data set]. Our World in Data.