Updating Cogworks Umbraco Find and Replace: V7 to V8

I Initially thought it would not be too hard a task, unfortunately I was wrong...

My V8 Umbraco challenge

My task was to update the Cogworks Umbraco package, "Find and Replace" from Umbraco v7 to v8. I Initially thought it would not be too hard a task and that it wouldn’t take too long. But unfortunately I was wrong...

I encountered quite a lot of complexity around updating Umbraco references from V7 to V8 and adjusting some Umbraco specific logic. Here the story begins with database structure and queries.

Altering the T-SQL queries

The first task was writing the T-SQL queries for checking how data was stored and finding some relations between published / unpublished / saved versions.

Initially, I checked whether the database query from the previous package version would work. It soon became apparent that the database structure had changed quite significantly.

So I decided to explore and find dependencies as a way to achieve my goals. I started with research around the following T-SQL query:

DECLARE @phrase AS NVARCHAR(MAX) = N'%TestPage%';
DECLARE @contentId AS NVARCHAR(MAX) = N'%-1%';

WITH GetAllSearchedData
AS
(
   SELECT
        upd.[id] AS [propertyDataId]
        ,upd.[versionId]
        ,udv.[published]
        ,ucv.[current]
        ,ud.[published] [documentPublished]
        ,ucv.[versionDate]
        ,upd.[varcharValue]
        ,upd.[textValue]
        ,cpt.[Alias]
        ,cpt.[Name] AS [nodeName]
        ,cpt.[contentTypeId]
        ,udv.[templateId]
        ,un.[id] AS [nodeid]
    FROM [umbracoPropertyData] [upd]
        JOIN [cmsPropertyType] [cpt] ON cpt.[id] = upd.[propertyTypeId]
        JOIN [umbracoContentVersion] [ucv] ON ucv.[id] = upd.[versionId]
        JOIN [umbracoDocumentVersion] [udv] ON udv.[id] = ucv.[id]
        JOIN [cmsPropertyTypeGroup] [cptg] ON cptg.[id] = cpt.[propertyTypeGroupId]
        JOIN [umbracoNode] [un] ON un.[id] = cptg.[contenttypeNodeId]
        JOIN [umbracoContent] [uc] ON uc.[nodeId] = ucv.[nodeId]
        JOIN [umbracoDocument] [ud] ON ud.[nodeId] = uc.[nodeId]
    WHERE 
        un.[path] LIKE @contentId
        AND (upd.[textValue] LIKE @phrase OR upd.[varcharValue] LIKE @phrase) 
)
SELECT
    *
FROM GetAllSearchedData

I get such results of saved content:

First saved and published data

propertyDataId | versionId | published | current | documentPublished | versionDate | varcharValue | textValue | Alias | nodeName | contentTypeId | templateId | nodeid
2372 | 1155 | 1 | 0 | 1 | 2020-01-20 11:17:44.090 | TestPage | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2373 | 1156 | 0 | 1 | 1 | 2020-01-20 11:17:44.090 | TestPage | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2376 | 1155 | 1 | 0 | 1 | 2020-01-20 11:17:44.090 | NULL | TestPage | seoMetaDescription | Description | 1084 | 1075 | 1084
2377 | 1156 | 0 | 1 | 1 | 2020-01-20 11:17:44.090 | NULL | TestPage | seoMetaDescription | Description | 1084 | 1075 | 1084

Save and Published

propertyDataId | versionId | published | current | versionDate | varcharValue | textValue | Alias | nodeName | contentTypeId | templateId | nodeid
2372 | 1155 | 0 | 0 | 1 | 2020-01-20 11:17:44.090 | TestPage | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2382 | 1156 | 1 | 0 | 1 | 2020-01-20 11:38:16.877 | TestPage2 | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2383 | 1157 | 0 | 1 | 1 | 2020-01-20 11:38:16.877 | TestPage2 | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2376 | 1155 | 0 | 0 | 1 | 2020-01-20 11:17:44.090 | NULL | TestPage | seoMetaDescription | Description | 1084 | 1075 | 1084
2386 | 1156 | 1 | 0 | 1 | 2020-01-20 11:38:16.877 | NULL | TestPage2 | seoMetaDescription | Description | 1084 | 1075 | 1084
2387 | 1157 | 0 | 1 | 1 | 2020-01-20 11:38:16.877 | NULL | TestPage2 | seoMetaDescription | Description | 1084 | 1075 | 1084

Unpublished

propertyDataId | versionId | published | current | documentPublished | versionDate | varcharValue | textValue | Alias | nodeName | contentTypeId | templateId | nodeid
2372 | 1155 | 0 | 0 | 0 | 2020-01-20 11:17:44.090 | TestPage | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2382 | 1156 | 1 | 0 | 0 | 2020-01-20 11:38:16.877 | TestPage2 | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2392 | 1157 | 0 | 1 | 0 | 2020-01-20 11:38:53.497 | TestPage2 | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2376 | 1155 | 0 | 0 | 0 | 2020-01-20 11:17:44.090 | NULL | TestPage | seoMetaDescription | Description | 1084 | 1075 | 1084
2386 | 1156 | 1 | 0 | 0 | 2020-01-20 11:38:16.877 | NULL | TestPage2 | seoMetaDescription | Description | 1084 | 1075 | 1084
2394 | 1157 | 0 | 1 | 0 | 2020-01-20 11:38:53.497 | NULL | TestPage2 | seoMetaDescription | Description | 1084 | 1075 | 1084

Save and Publish

propertyDataId | versionId | published | current | documentPublished | versionDate | varcharValue | textValue | Alias | nodeName | contentTypeId | templateId | nodeid
2372 | 1155 | 0 | 0 | 1 | 2020-01-20 11:17:44.090 | TestPage | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2382 | 1156 | 0 | 0 | 1 | 2020-01-20 11:38:16.877 | TestPage2 | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2397 | 1157 | 1 | 0 | 1 | 2020-01-20 11:42:28.470 | TestPage3 | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2398 | 1158 | 0 | 1 | 1 | 2020-01-20 11:42:28.470 | TestPage3 | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2376 | 1155 | 0 | 0 | 1 | 2020-01-20 11:17:44.090 | NULL | TestPage | seoMetaDescription | Description | 1084 | 1075 | 1084
2386 | 1156 | 0 | 0 | 1 | 2020-01-20 11:38:16.877 | NULL | TestPage2 | seoMetaDescription | Description | 1084 | 1075 | 1084
2401 | 1157 | 1 | 0 | 1 | 2020-01-20 11:42:28.470 | NULL | TestPage3 | seoMetaDescription | Description | 1084 | 1075 | 1084
2402 | 1158 | 0 | 1 | 1 | 2020-01-20 11:42:28.470 | NULL | TestPage3 | seoMetaDescription | Description | 1084 | 1075 | 1084

Save

propertyDataId | versionId | published | current | documentPublished | versionDate | varcharValue | textValue | Alias | nodeName | contentTypeId | templateId | nodeid
2372 | 1155 | 0 | 0 | 1 | 2020-01-20 11:17:44.090 | TestPage | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2382 | 1156 | 0 | 0 | 1 | 2020-01-20 11:38:16.877 | TestPage2 | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2397 | 1157 | 1 | 0 | 1 | 2020-01-20 11:42:28.470 | TestPage3 | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2407 | 1158 | 0 | 1 | 1 | 2020-01-20 11:47:02.167 | TestPage4 | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2376 | 1155 | 0 | 0 | 1 | 2020-01-20 11:17:44.090 | NULL | TestPage | seoMetaDescription | Description | 1084 | 1075 | 1084
2386 | 1156 | 0 | 0 | 1 | 2020-01-20 11:38:16.877 | NULL | TestPage2 | seoMetaDescription | Description | 1084 | 1075 | 1084
2401 | 1157 | 1 | 0 | 1 | 2020-01-20 11:42:28.470 | NULL | TestPage3 | seoMetaDescription | Description | 1084 | 1075 | 1084
2409 | 1158 | 0 | 1 | 1 | 2020-01-20 11:47:02.167 | NULL | TestPage4 | seoMetaDescription | Description | 1084 | 1075 | 1084

Save And Publish

propertyDataId | versionId | published | current | documentPublished | versionDate | varcharValue | textValue | Alias | nodeName | contentTypeId | templateId | nodeid
2372 | 1155 | 0 | 0 | 1 | 2020-01-20 11:17:44.090 | TestPage | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2382 | 1156 | 0 | 0 | 1 | 2020-01-20 11:38:16.877 | TestPage2 | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2397 | 1157 | 0 | 0 | 1 | 2020-01-20 11:42:28.470 | TestPage3 | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2412 | 1158 | 1 | 0 | 1 | 2020-01-20 11:47:40.860 | TestPage4 | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2413 | 1159 | 0 | 1 | 1 | 2020-01-20 11:47:40.860 | TestPage4 | NULL | pageTitle | Page Title | 1083 | 1075 | 1083
2376 | 1155 | 0 | 0 | 1 | 2020-01-20 11:17:44.090 | NULL | TestPage | seoMetaDescription | Description | 1084 | 1075 | 1084
2386 | 1156 | 0 | 0 | 1 | 2020-01-20 11:38:16.877 | NULL | TestPage2 | seoMetaDescription | Description | 1084 | 1075 | 1084
2401 | 1157 | 0 | 0 | 1 | 2020-01-20 11:42:28.470 | NULL | TestPage3 | seoMetaDescription | Description | 1084 | 1075 | 1084
2416 | 1158 | 1 | 0 | 1 | 2020-01-20 11:47:40.860 | NULL | TestPage4 | seoMetaDescription | Description | 1084 | 1075 | 1084
2417 | 1159 | 0 | 1 | 1 | 2020-01-20 11:47:40.860 | NULL | TestPage4 | seoMetaDescription | Description | 1084 | 1075 | 1084

Two versions of the document (published and current) are always created, one of which is current for the document and it is the version that will always become the version that will be published - in this case when we work on this document.

The subject of specific document versions such as drafts, saved version and published will be explained in the next blog post as it's an interesting topic in itself!

On first impressions it looked as though this query worked fine (if you just filter by published), but after looking more closely I discovered there was something wrong with the data structure. I noticed some badly joined tables, so in this case we lost some important data.

Understanding and debugging Umbraco CMS code

So to investigate what was going on, I decided to look into the Umbraco codebase to understand the behavior. The first thing I wanted to find was how Umbraco manipulates the content and then how they query the database to retrieve it from the DB.

I found that the file ‘DocumentRepository.cs’ contains the logic for storing the new items (published and current) using the method ‘PersistNewItem’ and from this code we can learn how they managed it.

protected override void PersistNewItem(IContent entity)
{
    entity.AddingEntity();

    var publishing = entity.PublishedState == PublishedState.Publishing;

    // ensure that the default template is assigned
    if (entity.TemplateId.HasValue == false)
        entity.TemplateId = entity.ContentType.DefaultTemplate?.Id;

    // sanitize names
    SanitizeNames(entity, publishing);

    // ensure that strings don't contain characters that are invalid in xml
    // TODO: do we really want to keep doing this here?
    entity.SanitizeEntityPropertiesForXmlStorage();

    // create the dto
    var dto = ContentBaseFactory.BuildDto(entity, NodeObjectTypeId);

    // derive path and level from parent
    var parent = GetParentNodeDto(entity.ParentId);
    var level = parent.Level + 1;

    // get sort order
    var sortOrder = GetNewChildSortOrder(entity.ParentId, 0);

    // persist the node dto
    var nodeDto = dto.ContentDto.NodeDto;
    nodeDto.Path = parent.Path;
    nodeDto.Level = Convert.ToInt16(level);
    nodeDto.SortOrder = sortOrder;

    // see if there's a reserved identifier for this unique id
    // and then either update or insert the node dto
    var id = GetReservedId(nodeDto.UniqueId);
    if (id > 0)
        nodeDto.NodeId = id;
    else
        Database.Insert(nodeDto);

    nodeDto.Path = string.Concat(parent.Path, ",", nodeDto.NodeId);
    nodeDto.ValidatePathWithException();
    Database.Update(nodeDto);

    // update entity
    entity.Id = nodeDto.NodeId;
    entity.Path = nodeDto.Path;
    entity.SortOrder = sortOrder;
    entity.Level = level;

    // persist the content dto
    var contentDto = dto.ContentDto;
    contentDto.NodeId = nodeDto.NodeId;
    Database.Insert(contentDto);

    // persist the content version dto
    var contentVersionDto = dto.DocumentVersionDto.ContentVersionDto;
    contentVersionDto.NodeId = nodeDto.NodeId;
    contentVersionDto.Current = !publishing;
    Database.Insert(contentVersionDto);
    entity.VersionId = contentVersionDto.Id;

    // persist the document version dto
    var documentVersionDto = dto.DocumentVersionDto;
    documentVersionDto.Id = entity.VersionId;
    if (publishing)
        documentVersionDto.Published = true;
    Database.Insert(documentVersionDto);

    // and again in case we're publishing immediately
    if (publishing)
    {
        entity.PublishedVersionId = entity.VersionId;
        contentVersionDto.Id = 0;
        contentVersionDto.Current = true;
        contentVersionDto.Text = entity.Name;
        Database.Insert(contentVersionDto);
        entity.VersionId = contentVersionDto.Id;

        documentVersionDto.Id = entity.VersionId;
        documentVersionDto.Published = false;
        Database.Insert(documentVersionDto);
    }

    // persist the property data
    var propertyDataDtos = PropertyFactory.BuildDtos(entity.ContentType.Variations, entity.VersionId, entity.PublishedVersionId, entity.Properties, LanguageRepository, out var edited, out var editedCultures);
    foreach (var propertyDataDto in propertyDataDtos)
        Database.Insert(propertyDataDto);

    // if !publishing, we may have a new name != current publish name,
    // also impacts 'edited'
    if (!publishing && entity.PublishName != entity.Name)
        edited = true;

    // persist the document dto
    // at that point, when publishing, the entity still has its old Published value
    // so we need to explicitly update the dto to persist the correct value
    if (entity.PublishedState == PublishedState.Publishing)
        dto.Published = true;
    dto.NodeId = nodeDto.NodeId;
    entity.Edited = dto.Edited = !dto.Published || edited; // if not published, always edited
    Database.Insert(dto);

    //insert the schedule
    PersistContentSchedule(entity, false);

    // persist the variations
    if (entity.ContentType.VariesByCulture())
    {
        // bump dates to align cultures to version
        if (publishing)
            entity.AdjustDates(contentVersionDto.VersionDate);

        // names also impact 'edited'
        // ReSharper disable once UseDeconstruction
        foreach (var cultureInfo in entity.CultureInfos)
            if (cultureInfo.Name != entity.GetPublishName(cultureInfo.Culture))
                (editedCultures ?? (editedCultures = new HashSet<string>(StringComparer.OrdinalIgnoreCase))).Add(cultureInfo.Culture);

        // insert content variations
        Database.BulkInsertRecords(GetContentVariationDtos(entity, publishing));

        // insert document variations
        Database.BulkInsertRecords(GetDocumentVariationDtos(entity, editedCultures));
    }

    // refresh content
    entity.SetCultureEdited(editedCultures);

    // trigger here, before we reset Published etc
    OnUowRefreshedEntity(new ScopedEntityEventArgs(AmbientScope, entity));

    // flip the entity's published property
    // this also flips its published state
    // note: what depends on variations (eg PublishNames) is managed directly by the content
    if (entity.PublishedState == PublishedState.Publishing)
    {
        entity.Published = true;
        entity.PublishTemplateId = entity.TemplateId;
        entity.PublisherId = entity.WriterId;
        entity.PublishName = entity.Name;
        entity.PublishDate = entity.UpdateDate;

        SetEntityTags(entity, _tagRepository);
    }
    else if (entity.PublishedState == PublishedState.Unpublishing)
    {
        entity.Published = false;
        entity.PublishTemplateId = null;
        entity.PublisherId = null;
        entity.PublishName = null;
        entity.PublishDate = null;

        ClearEntityTags(entity, _tagRepository);
    }

    entity.ResetDirtyProperties();

    // troubleshooting
    //if (Database.ExecuteScalar<int>($"SELECT COUNT(*) FROM {Constants.DatabaseSchema.Tables.DocumentVersion} JOIN {Constants.DatabaseSchema.Tables.ContentVersion} ON {Constants.DatabaseSchema.Tables.DocumentVersion}.id={Constants.DatabaseSchema.Tables.ContentVersion}.id WHERE published=1 AND nodeId=" + content.Id) > 1)
    //{
    //    Debugger.Break();
    //    throw new Exception("oops");
    //}
    //if (Database.ExecuteScalar<int>($"SELECT COUNT(*) FROM {Constants.DatabaseSchema.Tables.DocumentVersion} JOIN {Constants.DatabaseSchema.Tables.ContentVersion} ON {Constants.DatabaseSchema.Tables.DocumentVersion}.id={Constants.DatabaseSchema.Tables.ContentVersion}.id WHERE [current]=1 AND nodeId=" + content.Id) > 1)
    //{
    //    Debugger.Break();
    //    throw new Exception("oops");
    //}
}

Looking at the code I realized how Umbraco is receiving documents. In the same file we have the method ‘GetBaseQuery’:-

protected Sql<ISqlContext> GetBaseQuery(QueryType queryType, bool current)
{
    var sql = SqlContext.Sql();

    switch (queryType)
    {
        case QueryType.Count:
            sql = sql.SelectCount();
            break;
        case QueryType.Ids:
            sql = sql.Select<DocumentDto>(x => x.NodeId);
            break;
        case QueryType.Single:
        case QueryType.Many:
            // R# may flag this ambiguous and red-squiggle it, but it is not
            sql = sql.Select<DocumentDto>(r =>
                r.Select(documentDto => documentDto.ContentDto, r1 =>
                    r1.Select(contentDto => contentDto.NodeDto))
                .Select(documentDto => documentDto.DocumentVersionDto, r1 =>
                    r1.Select(documentVersionDto => documentVersionDto.ContentVersionDto))
                .Select(documentDto => documentDto.PublishedVersionDto, "pdv", r1 =>
                    r1.Select(documentVersionDto => documentVersionDto.ContentVersionDto, "pcv")))

                // select the variant name, coalesce to the invariant name, as "variantName"
                .AndSelect(VariantNameSqlExpression + " AS variantName");
            break;
    }

    sql
        .From<DocumentDto>()
        .InnerJoin<ContentDto>().On<DocumentDto, ContentDto>(left => left.NodeId, right => right.NodeId)
        .InnerJoin<NodeDto>().On<ContentDto, NodeDto>(left => left.NodeId, right => right.NodeId)

        // inner join on mandatory edited version
        .InnerJoin<ContentVersionDto>()
            .On<DocumentDto, ContentVersionDto>((left, right) => left.NodeId == right.NodeId)
        .InnerJoin<DocumentVersionDto>()
            .On<ContentVersionDto, DocumentVersionDto>((left, right) => left.Id == right.Id)

        // left join on optional published version
        .LeftJoin<ContentVersionDto>(nested =>
            nested.InnerJoin<DocumentVersionDto>("pdv")
                    .On<ContentVersionDto, DocumentVersionDto>((left, right) => left.Id == right.Id && right.Published, "pcv", "pdv"), "pcv")
            .On<DocumentDto, ContentVersionDto>((left, right) => left.NodeId == right.NodeId, aliasRight: "pcv")

        // TODO: should we be joining this when the query type is not single/many?
        // left join on optional culture variation
        //the magic "[[[ISOCODE]]]" parameter value will be replaced in ContentRepositoryBase.GetPage() by the actual ISO code
        .LeftJoin<ContentVersionCultureVariationDto>(nested =>
            nested.InnerJoin<LanguageDto>("lang").On<ContentVersionCultureVariationDto, LanguageDto>((ccv, lang) => ccv.LanguageId == lang.Id && lang.IsoCode == "[[[ISOCODE]]]", "ccv", "lang"), "ccv")
            .On<ContentVersionDto, ContentVersionCultureVariationDto>((version, ccv) => version.Id == ccv.VersionId, aliasRight: "ccv");

    sql
        .Where<NodeDto>(x => x.NodeObjectType == NodeObjectTypeId);

    // this would ensure we don't get the published version - keep for reference
    //sql
    //    .WhereAny(
    //        x => x.Where<ContentVersionDto, ContentVersionDto>((x1, x2) => x1.Id != x2.Id, alias2: "pcv"),
    //        x => x.WhereNull<ContentVersionDto>(x1 => x1.Id, "pcv")
    //    );

    if (current)
        sql.Where<ContentVersionDto>(x => x.Current); // always get the current version

    return sql;
}

In all cases, every time I ran this query, the current value was set to true. So I decided to use this query and play around with it.

A light at the end of the SQL tunnel

Using the SQL above, I played around with it directly in the database. Finally after some fiddling, I got a working solution for our package.

SELECT
    [pcv].[text] AS [NodeName],
    [cv].[id] AS [VersionId],
    [cpt].[Alias] AS [PropertyAlias],
    [upd].[varcharValue] AS [VarcharValue],
    [upd].[textValue] AS [TextValue],
    [pdv].[published]
FROM [umbracoDocument] AS [d]
    INNER JOIN [umbracoContent] AS [c] ON [d].[nodeId] = [c].[nodeId]
    INNER JOIN [umbracoNode] AS [n] ON [c].[nodeId] = [n].[id]
    INNER JOIN [umbracoContentVersion] AS [cv] ON ([d].[nodeId] = [cv].[nodeId])
    INNER JOIN [umbracoDocumentVersion] AS [dv] ON ([cv].[id] = [dv].[id])
    LEFT JOIN [umbracoContentVersion] [pcv]
    INNER JOIN [umbracoDocumentVersion] [pdv] ON (([pcv].[id] = [pdv].[id]) AND [pdv].[published] = 1) ON ([d].[nodeId] = [pcv].[nodeId])
    LEFT JOIN [umbracoContentVersionCultureVariation] [ccv]
    INNER JOIN [umbracoLanguage] [lang] ON (([ccv].[languageId] = [lang].[id]) AND ([lang].[languageISOCode] = N'[[[ISOCODE]]]')) ON ([cv].[id] = [ccv].[versionId])
    INNER JOIN [umbracoPropertyData] [upd] ON [upd].[versionId] = [pcv].[id]
    INNER JOIN [cmsPropertyType] [cpt] ON [cpt].[id] = [upd].[propertyTypeId]
WHERE
    [cv].[current] = 1 AND
    [d].[published] = 1 AND
    ([d].[edited] = 0) AND
    [n].[path] LIKE @contentId AND
    ([upd].[textValue] LIKE @phrase OR [upd].[varcharValue] LIKE @phrase)
ORDER BY [d].[nodeId] ASC

But I decided to translate this query to a strongly typed query.

The return of the issue!

Now that I had the T-SQL query working, it was time to convert it to a strongly typed query, but welcome back to the issue!

                var sql = scope.SqlContext.Sql();

                sql = sql.Select<PropertyDataDto>("upd", upd => upd.VarcharValue);
                sql = sql.AndSelect<PropertyDataDto>("upd", upd => upd.TextValue);
                sql = sql.AndSelect<ContentVersionDto>("cv", cv => cv.Id); // VersionId
                sql = sql.AndSelect<ContentVersionDto>("pcv", pcv => pcv.Text); // NodeName
                sql = sql.AndSelect<PropertyTypeDto>("upt", upt => upt.Alias); // PropertyAlias
                sql = sql.AndSelect<PropertyTypeDto>("upt", upt => upt.Name); // Name

                sql = sql.From<DocumentDto>();

                sql = sql.InnerJoin<ContentDto>()
                    .On<ContentDto, DocumentDto>((left, right) => left.NodeId == right.NodeId);

                sql = sql.InnerJoin<NodeDto>()
                    .On<NodeDto, ContentDto>((left, right) => left.NodeId == right.NodeId);

                sql = sql.InnerJoin<ContentVersionDto>("cv")
                    .On<ContentVersionDto, DocumentDto>((left, right) => left.NodeId == right.NodeId, "cv");

                sql = sql.InnerJoin<DocumentVersionDto>()
                    .On<DocumentVersionDto, ContentVersionDto>((left, right) => left.Id == right.Id, aliasRight: "cv");

                sql = sql.LeftJoin<ContentVersionDto>(nested =>
                        nested.InnerJoin<DocumentVersionDto>("pdv")
                            .On<DocumentVersionDto, ContentVersionDto>(
                                (left, right) => left.Id == right.Id && left.Published, "pdv", "pcv"), "pcv")
                    .On<ContentVersionDto, DocumentDto>((left, right) => left.NodeId == right.NodeId, "pcv");

                sql = sql.LeftJoin<ContentVersionCultureVariationDto>(nested =>
                        nested.InnerJoin<LanguageDto>("lang")
                            .On<LanguageDto, ContentVersionCultureVariationDto>(
                                (left, right) => left.Id == right.LanguageId && left.IsoCode == "[[[ISOCODE]]]", "lang",
                                "ccv"), "ccv")
                    .On<ContentVersionDto, ContentVersionCultureVariationDto>(
                        (left, right) => left.Id == right.VersionId, "cv", "ccv");

                sql = sql.InnerJoin<PropertyDataDto>("upd")
                    .On<PropertyDataDto, ContentVersionDto>((left, right) => left.VersionId == right.Id, "upd", "cv");

                sql = sql.InnerJoin<PropertyTypeDto>("upt")
                    .On<PropertyTypeDto, PropertyDataDto>((left, right) => left.Id == right.PropertyTypeId, "upt", "upd");

                sql = sql.Where<ContentVersionDto>(c => c.Current, "cv");
                sql = sql.Where<DocumentDto>(d => d.Published);
                sql = sql.Where<DocumentDto>(d => !d.Edited);
                sql = sql.Where<NodeDto>(n => n.Path.Contains(contentId.ToString()));

                sql = sql.Where<PropertyDataDto>(p =>
                    (p.TextValue != null &&
                        p.TextValue.Contains(phrase)) ||
                    (p.VarcharValue != null &&
                        p.VarcharValue.Contains(phrase)));

Using the the code above I get the following exception - "Argument data type ntext is invalid for argument 1 of upper function". So I took a look at the NPoco library to find how they manage to translate the “Contains” method to a SQL query. I noticed that NPoco doesn't see attributions on the columns data type annotation (it doesn't read that it’s [N]TEXT) and adds the translate "Contains" method to the “like” statement and then wraps the first argument with the upper function. You can see it on this code snippet:

protected virtual object VisitColumnAccessMethod(MethodCallExpression m)
{
    var expression = (PartialSqlString)Visit(m.Object);

    if (_projection && expression is MemberAccessString)
        return expression;

    string statement;
    List<Object> args = this.VisitExpressionList(m.Arguments);

    switch (m.Method.Name)
    {
        case "ToUpper":
            statement = string.Format("upper({0})", expression);
            break;
        case "ToLower":
            statement = string.Format("lower({0})", expression);
            break;
        case "StartsWith":
            statement = CreateLikeStatement(expression, CreateParam(EscapeParam(args[0]) + "%"));
            break;
        case "EndsWith":
            statement = CreateLikeStatement(expression, CreateParam("%" + EscapeParam(args[0])));
            break;
        case "Contains":
            statement = CreateLikeStatement(expression, CreateParam("%" + EscapeParam(args[0]) + "%"));
            break;
protected virtual string CreateLikeStatement(PartialSqlString expression, string param)
{
    return string.Format("upper({0}) like {1} escape '{2}'", expression, param, EscapeChar);
}

And this is basically what is causing the issue. Following the remarks in the Microsoft documentation we can see that we don't have upper function that we are allowed to use on (N)Varchar type, so the upper function is not allowed (more info).

Summary

Finally after a few experiments I decide to use the following code:-

var likePhrase = $"'%{phrase}%'";

var textQuery = new Sql($@"AND
(
    (
        ([upd].[textValue] is not null)
        AND [upd].[textValue] LIKE {likePhrase}
    )
    OR
    (
        ([upd].[varcharValue] is not null)
        AND [upd].[varcharValue] LIKE {likePhrase}
    )
)");

sql = sql.Append(textQuery);

Conclusion

In the end we decided to revert back to using the SQL query instead of the strongly typed query. We did it, because of the issue with keeping models up-to-date with any breaking changes in the database models in future versions of Umbraco.

Translating a SQL query to a strongly typed query taught us a lot, at first glance you don’t think it is going to be too much of a challenge, however there are some issues with NPoco, for example:

  • anonymous types in the Select element (changing the name of the column)
  • appending the upper function to the contains method
  • orderby is not possible to be used to select a column alias
  • not allowing common table expression

So, in the end the final query looks like this:-


var sqlParams = new
{
    phrase = $"%{phrase}%",
    contentId = $"%{contentId}%"
};

var sqlQuery = $@"
    SELECT
          [upd].[varcharValue] AS [VarcharValue]
        , [upd].[textValue] AS [TextValue]
        , [cv].[id] AS [VersionId]
        , [pcv].[text] AS [NodeName]
        , [upt].[Alias] AS [PropertyAlias]
        , [upt].[Name] AS [PropertyName]
    FROM [umbracoDocument]
        INNER JOIN [umbracoContent] ON ([umbracoContent].[nodeId] = [umbracoDocument].[nodeId])
        INNER JOIN [umbracoNode] ON ([umbracoNode].[id] = [umbracoContent].[nodeId])
        INNER JOIN [umbracoContentVersion] [cv] ON ([cv].[nodeId] = [umbracoDocument].[nodeId])
        INNER JOIN [umbracoDocumentVersion] ON ([umbracoDocumentVersion].[id] = [cv].[id])
        LEFT JOIN [umbracoContentVersion] [pcv]
            INNER JOIN [umbracoDocumentVersion] [pdv] ON (([pdv].[id] = [pcv].[id]) AND [pdv].[published] = 1)
        ON ([pcv].[nodeId] = [umbracoDocument].[nodeId])
        LEFT JOIN [umbracoContentVersionCultureVariation] [ccv]
            INNER JOIN [umbracoLanguage] [lang] ON (([lang].[id] = [ccv].[languageId]) AND ([lang].[languageISOCode] =N'[[[ISOCODE]]]'))
        ON ([cv].[id] = [ccv].[versionId])
        INNER JOIN [umbracoPropertyData] [upd] ON ([upd].[versionId] = [cv].[id])
        INNER JOIN [cmsPropertyType] [upt] ON ([upt].[id] = [upd].[propertyTypeId])
    WHERE ([cv].[current] = 1)
        AND ([umbracoDocument].[published] = 1)
        AND ([umbracoDocument].[edited] = 0)
        AND ([umbracoNode].[path] LIKE upper(@contentId))
        AND
        (
            (
                [upd].[textValue] is not null AND [upd].[textValue] LIKE @phrase
            )
            OR
            (
                [upd].[varcharValue] is not null AND [upd].[varcharValue] LIKE @phrase
            )
        )
    ORDER BY ([umbracoDocument].[nodeId])";

var sql = scope.SqlContext.Sql(sqlQuery, sqlParams);

var results = await scope.Database
    .QueryAsync<ContentDataModel>(sql);

Whilst we were looking at this we also decided to add some security to protect against SQL Injection attacks - the initial versions of the code were MVP.

The topic of how Umbraco stores/manage versions of the content (saved, draft, published) is going to be a separate blog post. We took the approach to work only on Published document variations - by design.

Working on this package has taught me a lot about the internal mechanisms of both Umbraco and NPoco. We’d welcome any feedback you have about this post and our Find and Replace package.

  • Image for Redefining Outsourcing: Canopius' Voyage to a Paperless Insurance Platform Strategy

    Redefining Outsourcing: Canopius' Voyage to a Paperless Insurance Platform

  • Image for Data Visualisation: Behind Bestinvest’s Grow My Money tool Build

    Data Visualisation: Behind Bestinvest’s Grow My Money tool

  • Image for Cloud scalability: Azure’s new cost-saving features explained Build

    Cloud scalability: Azure’s new cost-saving features explained

  • Image for 5 Inspiring Companies That Use Digital Solutions to Save the Planet News

    5 Inspiring Companies That Use Digital Solutions to Save the Planet

Ready to collaborate ?

Get in touch to see how we can transform your digital presence.

Send us a message