CASTEMO stands for “Computer-Assisted Semantic Text Modelling”. It is a specific approach to the collection of structured, deeply interconnected data following the natural syntactic structure of the written word. This means that you can preserve the essential lexical, syntactic, and semantic features of the original expression, the contextual embeddedness of the collected data as well as conflicting evidence and information given in a non-indicative modality, such as questions and conditional sentences. While doing this, you can also add further layers of annotation on top of the textual layer.
What is Computer-Assisted Semantic Text Modelling?
Computer-Assisted Semantic Text Modelling (CASTEMO) is a human-controlled, computer-assisted way of collecting data from texts.
CASTEMO allows you to:
- gather richly structured data from texts, ideal for systematic querying and quantitative analyses;
- comprehensively model the different semantic and syntactic dimensions of texts;
- follow a truly data-driven (source-driven) approach, rather than capturing a set of variables governed by a list of predefined hypotheses;
- preserve the natural syntactic “subject-predicate-object1-object2” structure;
- preserve the original expressions;
- work easily with multi-language resources;
- preserve the exact order in which information is given;
- preserve full information on “who is speaking”, “when/where are they speaking,” etc., through a hierarchical model of text parts combined with metadata describing these text parts at any level;
- enrich the model of the text with analytical layers (e.g., editorial classifications and inferences), while keeping the levels clearly distinct;
- record conflicting evidence: because CASTEMO models the sentences of a text as statements, two statements can present conflicting information without any issue, and you can choose which information to prefer during analysis rather than during collection;
- collect data selectively or exhaustively, i.e., capturing just some aspects of your texts or capturing every sentence;
- classify information in ways best adapted to the given project and set of sources.
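As a minimal illustration of the point about conflicting evidence, the following Python sketch records two contradictory sentences side by side and postpones the choice between them until analysis. The `Statement` class, its field names, and the example sentences are hypothetical and invented for this illustration; they are not the schema used by InkVisitor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One sentence of a source, modelled as subject-predicate-object (hypothetical shape)."""
    source: str     # text part in which the sentence occurs
    order: int      # position of the sentence within that text part
    subject: str
    predicate: str
    object1: str


# Two witnesses give conflicting accounts; both are recorded exactly as given.
statements = [
    Statement("deposition of witness A", 1, "Peter", "met", "the preacher in Toulouse"),
    Statement("deposition of witness B", 1, "Peter", "met", "the preacher in Albi"),
]


def about(subject: str, predicate: str, data: list[Statement]) -> list[Statement]:
    """Return every statement on a subject/predicate pair, in collection order."""
    return [s for s in data if s.subject == subject and s.predicate == predicate]


# Collection keeps both versions; preferring one is an analysis-time decision.
candidates = about("Peter", "met", statements)
preferred = [s for s in candidates if s.source == "deposition of witness B"]
```

Because nothing is discarded at collection time, a later analysis can apply a different preference rule to the same data.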
CASTEMO enables you to handle the complexity of textual sources by representing:
- the epistemic level, to differentiate actual textual content, editorial interpretation of textual content, and freer editorial inference which goes beyond the text;
- editorial certainty levels (the dictionary we opted for is: not stated, certain, almost certain, probable, possible, dubious, false), which can be added to any statement, property, or any actant’s involvement in a statement;
- positive/negative logic (to also represent negative statements);
- modality (indication, question, condition, probability, wish…) and mood variant (realis/irrealis);
- conflicting information;
- various temporal and spatial relations (incl. relative), frequency, duration;
- and, if you want, even nuances such as partitivity to express iteration (e.g., “he went on giving food to the heretics for three years”: the food is one Object, but you can mark that it was given partitively);
- thus, in summary, CASTEMO allows most of what natural language allows.
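The annotation dimensions listed above can be sketched as small controlled vocabularies attached to each statement. This is a hedged sketch only: the class names, field names, default values, and example sentences are assumptions made for illustration. Only the certainty levels, the named modalities, and the realis/irrealis distinction come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Epistemic(Enum):
    TEXTUAL = "textual content"
    INTERPRETATION = "editorial interpretation"
    INFERENCE = "editorial inference"


class Certainty(Enum):
    # The certainty dictionary quoted in the list above.
    NOT_STATED = "not stated"
    CERTAIN = "certain"
    ALMOST_CERTAIN = "almost certain"
    PROBABLE = "probable"
    POSSIBLE = "possible"
    DUBIOUS = "dubious"
    FALSE = "false"


class Logic(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


class Modality(Enum):
    # Only the modalities named in the text; the original list is open-ended.
    INDICATION = "indication"
    QUESTION = "question"
    CONDITION = "condition"
    PROBABILITY = "probability"
    WISH = "wish"


class MoodVariant(Enum):
    REALIS = "realis"
    IRREALIS = "irrealis"


@dataclass
class AnnotatedStatement:
    text: str
    # The defaults below are assumptions chosen for this sketch.
    epistemic: Epistemic = Epistemic.TEXTUAL
    certainty: Certainty = Certainty.NOT_STATED
    logic: Logic = Logic.POSITIVE
    modality: Modality = Modality.INDICATION
    mood_variant: MoodVariant = MoodVariant.REALIS


def textual_only(data: list[AnnotatedStatement]) -> list[AnnotatedStatement]:
    """Keep only what the text itself says, dropping editorial layers."""
    return [s for s in data if s.epistemic is Epistemic.TEXTUAL]


# A negative statement recorded as such, not silently omitted.
negative = AnnotatedStatement("He did not eat meat.", logic=Logic.NEGATIVE)

# An editorial inference going beyond the text, flagged as merely probable.
inference = AnnotatedStatement(
    "The witness was related to the accused.",
    epistemic=Epistemic.INFERENCE,
    certainty=Certainty.PROBABLE,
)
```

Filtering on the epistemic field, as `textual_only` does, is one way to keep what the text says apart from what the editor thinks it means.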
CASTEMO requires the adoption of some basic data model structures:
- data objects are expressed as entities (typically Persons, Groups, Actions, Events, Concepts, Objects, Locations, Resources, and Territories aka Texts);
- entities are related through statements with a syntactic structure (subject, predicate, object1, and object2);
- statements are also a type of entity, and can be related within statements (e.g. to express subordinate clauses) and properties;
- properties of entities have the following syntax: origin (i.e. entity to which you append the property) – property type – property value;
- properties are always read with the “has” verb: e.g., “Tom is a baker” would be modelled as “Tom – has – [occupation] – baker” (the property type will be expressed at a different epistemic level if it is not expressed in the text);
- the concepts of editorial certainty, logic, modality, mood variant, and partitivity.
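The basic structures above (entities, statements that are themselves entities, and properties read with “has”) can be sketched in a few lines of Python. All class and field names here are hypothetical, the example sentence about Mary and Peter is invented, and rendering an editor-supplied property type in square brackets is an assumption based on the “Tom” example; none of this is InkVisitor's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    label: str
    kind: str  # e.g. "Person", "Concept", "Action", "Statement"
    properties: list[Property] = field(default_factory=list)


@dataclass
class Property:
    # Read as: origin - has - property type - property value.
    prop_type: Entity
    value: Entity
    type_in_text: bool = True  # False when the editor supplies the property type


@dataclass
class Statement(Entity):
    # Statements are entities too, so one statement can fill a slot of another.
    subject: Optional[Entity] = None
    predicate: Optional[Entity] = None
    object1: Optional[Entity] = None
    object2: Optional[Entity] = None


def read_property(origin: Entity, prop: Property) -> str:
    """Render a property with the 'has' verb; brackets mark an editor-supplied type."""
    type_label = prop.prop_type.label if prop.type_in_text else f"[{prop.prop_type.label}]"
    return f"{origin.label} - has - {type_label} - {prop.value.label}"


# "Tom is a baker": the text does not name the property type, so the editor adds it.
tom = Entity("Tom", "Person")
tom.properties.append(
    Property(Entity("occupation", "Concept"), Entity("baker", "Concept"), type_in_text=False)
)

# "Mary said that Tom gave bread to Peter": the subordinate clause is a
# statement used as object1 of the main statement.
gave = Statement(
    "Tom gave bread to Peter", "Statement",
    subject=tom, predicate=Entity("gave", "Action"),
    object1=Entity("bread", "Object"), object2=Entity("Peter", "Person"),
)
said = Statement(
    "Mary said that Tom gave bread to Peter", "Statement",
    subject=Entity("Mary", "Person"), predicate=Entity("said", "Action"),
    object1=gave,
)

print(read_property(tom, tom.properties[0]))  # Tom - has - [occupation] - baker
```

Making `Statement` a subclass of `Entity` is one simple way to let statements appear inside other statements; a production model might use identifiers and a separate relation table instead.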
However, CASTEMO does not require you to accept a very specific data model (ontology).
You do not need to accept any of our:
- specific dictionaries (e.g., you can opt for different certainty levels);
- specific taxonomies in the DISSINET Concepts and Actions lexico-semantic network (e.g., apple as fruit and fruit as comestible – you might have different choices);
- specific data collection guidelines – we have ours in DISSINET, but you are welcome to make your decisions as befits your project.
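To show what swapping a dictionary might look like in practice, here is a small hedged sketch in which the certainty vocabulary is a project-level setting rather than a fixed part of the model. The function name and the alternative three-level scale are hypothetical; only the default levels are taken from the text.

```python
# Certainty levels quoted earlier in this text.
DEFAULT_CERTAINTY = (
    "not stated", "certain", "almost certain",
    "probable", "possible", "dubious", "false",
)


def make_certainty_checker(levels):
    """Build a validator for whichever certainty dictionary a project adopts."""
    allowed = frozenset(levels)

    def check(value: str) -> str:
        if value not in allowed:
            raise ValueError(f"unknown certainty level: {value!r}")
        return value

    return check


# One project keeps the default dictionary; another opts for its own scale.
check_default = make_certainty_checker(DEFAULT_CERTAINTY)
check_custom = make_certainty_checker(("high", "medium", "low"))

check_default("probable")  # accepted
check_custom("medium")     # accepted
```

The model only needs some agreed vocabulary for each dimension; which values it contains is a decision for each project.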
Who should be interested in CASTEMO and InkVisitor?
Is CASTEMO right for you and your research?
If any of the following sound like you, then please do contact us to explore further:
- I am interested in a holistic analytical exploration of a text or collection of texts.
- I want to collect structured and queryable data while remaining very close to the texts. Preserving the syntactic structure, context, and order in which the information is given is important for my research.
- I want to have the structure of data fully under my control.
- I am interested in the complex webs of social, spatiotemporal, and discursive connections between pieces of data.
- I want to record both what the text says and what I think it means. But I also want to keep those epistemic levels clearly separated in the data.
- I want to be able to grasp the near-totality of the source and thus make various data projections without returning to collect further data from this source.
Want to know more?
Check the article “Model the source first!”, which explains the principles of CASTEMO in more detail. If CASTEMO and InkVisitor are of interest to you, feel free to contact us to discuss your needs and get testing access to the InkVisitor Sandbox.